AI alignment researchers don’t (seem to) stack
(Status: another point I find myself repeating frequently.)
One of the reasons I suspect we need a lot of serial time to solve the alignment problem is that alignment researchers don’t seem to me to “stack”. Where “stacking” means something like, quadrupling the size of your team of highly skilled alignment researchers lets you finish the job in ~1/4 of the time.
It seems to me that whenever somebody new and skilled arrives on the alignment scene, with the sort of vision and drive that lets them push in a promising direction (rather than just doing incremental work that has little chance of changing the strategic landscape), they push in a new direction relative to everybody else. Eliezer Yudkowsky and Paul Christiano don’t have any synergy between their research programs. Adding John Wentworth doesn’t really speed up either of them. Adding Adam Shimi doesn’t really speed up any of the previous three. Vanessa Kosoy isn’t overlapping with any of the other four.
Sure, sometimes one of our visionary alignment-leaders finds a person or two that sees sufficiently eye-to-eye with them and can speed things along (such as Diffractor with Vanessa, it seems to me from a distance). And with ops support and a variety of other people helping out where they can, it seems possible to me to take one of our visionaries and speed them up by a factor of 2 or so (in a simplified toy model where we project ‘progress’ down to a single time dimension). But new visionaries aren’t really joining forces with older visionaries; they’re striking out on their own paths.
And to be clear, I think that this is fine and healthy. It seems to me that this is how fields are often built, with individual visionaries wandering off in some direction, and later generations following the ones who figured out stuff that was sufficiently cool (like Newton or Laplace or Hamilton or Einstein or Grothendieck). In fact, the phenomenon looks even more wide-ranging than that, to me: When studying the Napoleonic wars, I was struck by the sense that Napoleon could have easily won if only he’d been everywhere at once; he was never able to find other generals who shared his spark. Various statesmen (Bismarck comes to mind) proved irreplaceable. Steve Jobs never managed to find a worthy successor, despite significant effort.
Also, I’ve tried a few different ways of getting researchers to “stack” (i.e., of getting multiple people capable of leading research, all leading research in the same direction, in a way that significantly shortens the amount of serial time required), and have failed at this. (Which isn’t to say that you can’t succeed where I failed!)
I don’t think we’re doing something particularly wrong here. Rather, I’d say: the space to explore is extremely broad; humans are sparsely distributed in the space of intuitions they’re able to draw upon; people who have an intuition they can follow towards plausible alignment-solutions are themselves pretty rare; most humans don’t have the ability to make research progress without an intuition to guide them. Each time we find a new person with an intuition to guide them towards alignment solutions, it’s likely to guide them in a whole new direction, because the space is so large. Hopefully at least one is onto something.
But, while this might not be an indication of an error, it sure is a reason to worry. Because if each new alignment researcher pursues some new pathway, and can be sped up a little but not a ton by research-partners and operational support, then no matter how many new alignment visionaries we find, we aren’t much decreasing the amount of time it takes to find a solution.
Like, as a crappy toy model, if every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.
And of course, in real life, different paths have different lengths, and adding new people decreases the amount of time required at least a little in expectation. But not necessarily very much, and not linearly.
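To make that "a little, but not linearly" concrete, here's a minimal sketch of the toy model, not anything from the post itself: each visionary pursues an independent path whose length is drawn at random, and the field is "done" when the shortest path finishes. The choice of distribution (lognormal) and its parameters are assumptions invented purely for illustration.

```python
# Toy simulation (illustrative assumptions only): expected years until the
# fastest of n independent research paths pans out, as a function of n.
import random
import statistics

def expected_completion_time(n_visionaries, trials=10_000):
    """Mean years until the shortest of n independent paths completes."""
    samples = []
    for _ in range(trials):
        # Each visionary's path takes a random number of years to pay off
        # (lognormal with a mean around ~34 years; an arbitrary assumption).
        path_lengths = [random.lognormvariate(3.4, 0.5) for _ in range(n_visionaries)]
        samples.append(min(path_lengths))  # done when the shortest path finishes
    return statistics.mean(samples)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} visionaries -> ~{expected_completion_time(n):.1f} years in expectation")
```

Under these made-up numbers, going from 1 to 16 visionaries only shaves the expected time from roughly 34 years down to roughly 18: each doubling of headcount buys less than the one before, which is the sublinear behavior the paragraph above is gesturing at.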
(And all this is to say nothing of how the workable paths might not even be visible to the first generation of visionaries; the intuitions that lead one to a solution might be the sort of thing that you can only see if you’ve been raised with the memes generated by the partial-successes and failures of failed research pathways, as seems-to-me to have been the case with mathematics and physics regularly in the past. But I digress.)
So8res—well said. This seems like an accurate take on a major problem, and it fits well with what I’ve observed about the rates of progress in various new and emerging academic fields.
Your last paragraph is especially important—often, the first generation of visionary researchers working on a fresh problem offer such intellectually compelling and novel insights that they sweep up a lot of young talent into their world-view. The young talent initially just follows in their tracks, adding a few details and epicycles to their initial models. It often takes at least 20-30 years for a younger generation, after that first flush of enthusiastic field-building, to develop any serious critiques of the initial visions, or to find any common ground between different visionaries.
The result is that major, new, intellectually demanding fields usually take at least 30-40 years to mature to the point that they can become ‘normal science’, with a large, multi-generational, smoothly functioning ecosystem of ideas, critiques, data, and advances that aren’t overly locked into the original, fallible insights of the field’s founders.
The field of AI alignment is maybe 10-15 years old, depending on when we start counting. That leaves at least another 25-35 years before we can expect it to achieve even a modest degree of maturity and applicability.
And I can’t think of any historical examples of any people or groups successfully accelerating this generational time-scale for field maturation. It seems pretty deeply woven into the social psychology of human research cultures.
This reminds me of attitudes toward quantum physics. Most current physics professors I’ve met have a sort of learned-helplessness relationship to quantum interpretations, subscribing to something like “shut up and calculate” (i.e., don’t even try to understand). There is an attitude that quantum mechanics is too strange and therefore impossible to understand. Whereas the newer generation of post-docs and grad students don’t shy away from quantum interpretations and discussions of ontology. However, this falls a bit outside your model, since quantum mechanics is ~100 years old.
Nice post!
Have there been any attempts to estimate what fraction of the total research hours required to solve AI alignment must be serial time?