A few years ago, Chris Olah gave a machine learning seminar for GiveWell staff who were interested (i.e. curious people with no ML background).
In one session, he introduced the manifold hypothesis, and its corollary that platonic concepts might exist on an objective manifold, such that the (objective) relationships between concepts can be explored by moving through that space. Example: King - Man + Woman = Queen
(I'm probably butchering the idea here, but that's not important for current purposes.)
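(For readers who haven't seen the word-vector version of this idea, here's a minimal sketch. The 2-D vectors and the dimension labels are invented for illustration; real embeddings like word2vec are learned from data and have hundreds of dimensions.)

```python
# Toy sketch of word-vector arithmetic: "King - Man + Woman ≈ Queen".
# The vectors below are hand-made; real embeddings are learned from text.
import numpy as np

embeddings = {
    # invented basis: (royalty, maleness)
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def nearest(vec, exclude=()):
    """Return the word whose embedding is most cosine-similar to `vec`."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], vec))

result = nearest(
    embeddings["king"] - embeddings["man"] + embeddings["woman"],
    exclude=("king",),
)
print(result)  # queen
```

The point isn't the arithmetic itself, but that geometric structure in the embedding space seems to track semantic structure.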
Chris was super enthusiastic as he explained this idea. Once I understood it a little, my reaction was something like: "huh, that's pretty cool."
I think part of the difference between Chris's reaction to the manifold hypothesis and mine is explained by his greater mastery of the material (greater mastery leading to greater appreciation of the phenomenon & its subtleties).
But I think another (large) part of the difference is explained by personal affinity to the material. (e.g. if there was another idea similar in character to the manifold hypothesis but totally new to both Chris & myself, I posit that learning about this new idea would be much more exciting for Chris.)
I'm not claiming that this difference is innate & immovable. I think I could grow more interested in manifold-hypothesis-like ideas, and Chris could grow less.
I am claiming that given our personal histories to date, our genetic makeup, and our beliefs about ourselves & the world, manifold-hypothesis-like ideas are much more salient to Chris than to me.
Okay, sure, but why bring this up?
I'm currently holding that a lot of the useful work to be done on AI alignment is technical work. This is the conjunction of "first-principles work like MIRI is doing is basically math research" and "AI alignment stuff that falls out of AI development is basically software engineering."
To the extent that useful work on AI alignment is technical work, my contribution is mediated by how much technical capacity I can contribute.
I think in this context, "technical capacity" basically flows from interest in manifold-hypothesis-like problems.
For whatever reason, I don't think that this kind of interest can be manufactured by exhortations like "holy shit this research area might be a really big deal for humanity."
Humans don't appear to be built in a way that allows for easy conversion of considered views into personal saliency.
This is in tension with another view I hold, something like "I enjoy working on problems that seem important & neglected." I think there's truth in that, but it's not the full story.
Consider "thinking about humanitarian issues and what matters" as an idea of the same kind as "thinking about the manifold hypothesis."
I think I get more juice out of thinking about ethics & comparative humanitarian impact than Chris Olah does, in the same way he gets more juice out of thinking about the manifold hypothesis than I do. I don't know why this is true. (Well, I do: some combination of our genetics, personal histories, and beliefs about the world. But that's not a very satisfying answer.)
It would be a mistake to take "I enjoy thinking about ethics & comparative humanitarian impact" and infer from it that after some period of thinking about what's worth doing (and deciding on a worthwhile path forward), my interest in thinking about what's worth doing will easily convert into interest in the object-level thing I settle on.
I regularly make this mistake when thinking about career moves. Another way of saying this: ideas about what problems seem most important from a God's-eye view do not easily or simply map to ideas about what I should do, specifically.
Maybe this seems obvious, but I've only come to appreciate it in the last few months. (And if I'm feeling grandiose I'm tempted to claim that large swaths of EA are making the same simple-mapping error.)
These days I'm feeling bearish about non-technical work on AI alignment, and technical progress on AI alignment is heavily mediated by personal interest in the subject matter. But personal interest isn't fixed, and to some extent it increases as you learn more about a research area.
So maybe I should tool up on AI, and see how my interest responds?
This sounds pretty good, except that I already tried that (via an online Nanodegree program) and it didn't go super well.
This doesn't rule out another attempt, but my life is set up similarly now to how it was then (a lot of unstructured time, no clear social or community accountability for doing the thing), and the Nanodegree's accountability mechanisms proved insufficient to motivate me consistently. (It annoyed me to pay each month for program enrollment, which probably led to more work than I would have done otherwise, but not enough to get me to complete the program. And being graded by random internet instructors didn't motivate me at all, because it felt super arbitrary: "why do I care what these people think?", "so what if I do a sloppy job on this assignment? who in my life is going to notice?")
That attempt failed to bootstrap intrinsic motivation in the ideas; this failure makes me less excited about another similar-seeming attempt. Especially when I contrast it with my self-directed progress in other directions, which has gone better (e.g. writing essays about ethics).
I'm coming to a point where it doesn't feel like there's much of a choice here. To sum up:
- Technical AI work hinges on interest in technical problems
- Currently I don't have much interest in technical problems (and I don't really know why not)
- I'm not the sort of creature who can (explicitly) will himself into being interested in a problem-space
- A previous attempt to cultivate interest in technical problems didn't work well
A big constraint on thinking about what to do next is how interesting I find the problem-spaces I'm considering.