What if we don’t need a “Hard Left Turn” to reach AGI?
Epistemic Status: My day job involves building Large Language Models (LLMs), but I only read AI safety literature as a casual hobby, and may have missed some important points. All predictions I make about the future of AI naturally have a very high level of uncertainty.
Over the last few years, I’ve noticed a contradiction in the beliefs of AGI prognosticators in the rationalist-adjacent community. The amazing recent progress in AI has caused many to dramatically update their timelines for the emergence of AGI. While there have been advances in other areas of AI, most of the major accomplishments come from Large Language Models (LLMs) such as GPT-3, and I don’t believe the non-LLM progress would elicit anywhere near the same level of excitement on its own. In other words, LLM progress has caused many people to update their timelines for AGI. But it has not caused those same people to update their predictions for how an AGI might act, or the conceptual frameworks we should use to understand its behavior.
Language Models can do more than model language: 2022 has been a good year for AI research, but I believe that, in retrospect, Minerva will be considered the most important advance of the year. Minerva is an LLM trained to take math word problems and generate a step-by-step solution. In a particularly impressive result, Minerva reached ~50% accuracy on the MATH dataset, far better than the previous state of the art of ~10%. This may not sound impressive until you realize how tough the MATH dataset is. This isn’t a dataset of ‘plug in the numbers’ word problems from an introductory algebra class; it’s drawn from high-school math competitions designed to challenge the best mathematics students. Estimates of human performance are spotty, but the original MATH paper gave the dataset to a smart computer science undergraduate, who scored around 40%. Personally, I have a degree in statistics and I’d probably struggle to beat Minerva at these problems. It seems conservative to say that Minerva is around the 90th percentile of human performance at solving math problems.
Language models are trained to predict the next token in a large corpus of unstructured text. But the results on the MATH dataset can’t be explained by simple pattern matching: to adapt so readily to complex and difficult math problems, Minerva has to have learned something that functions like general mathematical reasoning. That’s not to say that Minerva is an AGI; it clearly isn’t. But something important is going on here.
Maybe scaling really is all we need: There’s no reason to expect performance to plateau at this level. My very conservative estimate is that, within the next two years, a model will surpass the performance of very smart people who haven’t specifically trained on math problems (i.e., 80-90% accuracy on MATH). What comes next?
I find it helpful to think of AI progress in terms of order-of-magnitude (OOM) improvements in effective model capacity, over, say, the next 25 years. Scaling can come from three places: more spending, better hardware, and better algorithms.
The Manhattan Project cost ~$20 billion in today’s dollars, while training runs for today’s largest models cost ~$20 million. If AGI comes to look increasingly plausible, it’s possible to imagine the effective size of current models growing by 2-5 OOMs from increased spending alone.
GPU performance per dollar doubles roughly every 2.5 years; over 25 years that is about ten doublings, or roughly 3 OOMs, so call it 2-4 OOMs from more efficient hardware (where the upper end of the range is driven by increasingly specialized hardware).
Progress from improved algorithms is harder to estimate, but we can make some guesses. Chain-of-thought prompting alone, released this year, improves performance on downstream math reasoning by the equivalent of 1-2 OOMs of scale. If we get a major breakthrough every 4-6 years, each worth 1-2 OOM equivalents, that’s 4-8 OOMs from improved algorithms over 25 years. This would be incredibly slow by the AI field’s recent standards (the first modern LLM was released only ~4 years ago), but I want to keep the estimates conservative.
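For readers unfamiliar with the technique, here is a minimal sketch of what a chain-of-thought prompt looks like. The `generate` function is a hypothetical stand-in for whatever LLM completion API you have available, not a real library call.

```python
# Minimal chain-of-thought prompting sketch: the exemplar demonstrates
# intermediate reasoning steps, and the model imitates that pattern on the
# new question. `generate` is a hypothetical placeholder for an LLM call.

def generate(prompt: str) -> str:
    raise NotImplementedError("substitute your LLM API of choice")

cot_prompt = """\
Q: A teacher buys 7 packs of 12 pencils and gives 3 pencils to each of
25 students. How many pencils are left?
A: Let's think step by step. She buys 7 * 12 = 84 pencils and gives away
3 * 25 = 75, so 84 - 75 = 9 pencils are left. The answer is 9.

Q: A train travels at 60 km/h for 2.5 hours, then at 80 km/h for 1.5 hours.
How far does it travel in total?
A: Let's think step by step."""

# The completion spells out its reasoning before committing to an answer,
# which is where most of the benchmark gains come from.
# answer = generate(cot_prompt)
```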
Put together, we can expect roughly 8-17 orders of magnitude of improvement over current models over the next 25 years. You might move some of the timelines here up, especially around algorithmic advances, to match your overall sense of AGI timelines.
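As a sanity check on the arithmetic, here is a tiny back-of-the-envelope tally of the ranges above. All of the inputs are the hedged guesses from this post, not measured quantities.

```python
# Back-of-the-envelope tally of the OOM estimates above. Every number here
# is a rough guess from the text, not a measurement.
import math

years = 25

# Spending: ~$20M per model today vs. ~$20B Manhattan-Project-scale spending
# is 3 OOMs; widen to 2-5 to cover lower or higher willingness to spend.
spend_ooms = (2, 5)

# Hardware: performance/$ doubling every ~2.5 years -> 10 doublings in 25
# years, about 3 OOMs; call it 2-4 with specialized hardware at the high end.
doublings = years / 2.5
print(f"hardware central estimate: {doublings * math.log10(2):.1f} OOMs")
hardware_ooms = (2, 4)

# Algorithms: a breakthrough every 4-6 years (~4 over 25 years), each worth
# 1-2 OOM equivalents.
breakthroughs = years // 6
algo_ooms = (breakthroughs * 1, breakthroughs * 2)   # (4, 8)

low = spend_ooms[0] + hardware_ooms[0] + algo_ooms[0]
high = spend_ooms[1] + hardware_ooms[1] + algo_ooms[1]
print(f"total: {low}-{high} OOMs of effective capacity")   # 8-17 OOMs
```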
What happens when we take a model that can beat most humans at mathematical reasoning, and then build another model that is around a billion times more powerful?
What do we really need for a pivotal act?
As the power of models increases, it’s useful to think about what we actually need from an AGI. I believe it is sufficient for that AGI to be able to make drastically superhuman advances in engineering and technology. It seems likely that a model with a billion times the effective power of Minerva would be able to match (or exceed) the competence of a human engineer across a wide range of tasks. And, given access to the compute needed to train such a model, we can scale up its inference speed until a lot of effective thinking happens near-instantaneously.
We can provide this model with a description of the problem, the relevant technical background, and the ability to write out its chain of thought, query the technical literature, and direct lab techs to run experiments or build (carefully human-checked) devices. Then we can direct this model to do things like “invent cold fusion” (or “build a nanobot swarm that can destroy all other GPUs”), and it would accomplish this in a small amount of real-world time.
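To make that setup concrete, here is a rough sketch of the kind of loop I have in mind. Every function name below is a hypothetical placeholder (there is no real API behind `model_step`, `search_literature`, or `run_checked_experiment`); the key property is that the model only ever emits text, and humans review anything that touches the real world.

```python
# Hedged sketch of the "non-agentic research assistant" setup described above.
# All functions are hypothetical placeholders, not real APIs.

from dataclasses import dataclass, field

@dataclass
class ResearchState:
    problem: str                                       # goal plus technical background
    notes: list[str] = field(default_factory=list)     # model's written chain of thought
    results: list[str] = field(default_factory=list)   # human-checked outcomes fed back in

def model_step(state: ResearchState) -> str:
    """Hypothetical LLM call: continue the chain of thought, possibly ending
    with a literature query or a proposed experiment."""
    raise NotImplementedError

def search_literature(query: str) -> str:
    """Hypothetical retrieval step over the technical literature."""
    raise NotImplementedError

def run_checked_experiment(proposal: str) -> str:
    """Humans vet the proposal, lab techs run it, and the outcome comes back
    as text. The model never acts on the world directly."""
    raise NotImplementedError

def research_loop(problem: str, max_steps: int = 1000) -> ResearchState:
    state = ResearchState(problem=problem)
    for _ in range(max_steps):
        thought = model_step(state)
        state.notes.append(thought)
        if thought.startswith("QUERY:"):
            state.results.append(search_literature(thought[len("QUERY:"):]))
        elif thought.startswith("EXPERIMENT:"):
            state.results.append(run_checked_experiment(thought[len("EXPERIMENT:"):]))
        elif thought.startswith("FINAL ANSWER:"):
            break
    return state
```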
This is a very different form of AI
Notably, the model could accomplish all of this despite having little more “agency” than modern LLMs. My central point is that we can easily imagine singularity-producing systems that do not look fundamentally different from modern AI.
Speculating wildly, alignment of this type of AGI may be much easier. This AGI might not have any goals or any motivation to deceive humans. It might not even have any knowledge that the real world is a concept distinct from the various works of fiction in the training corpus. But it would be extremely good at solving open-ended general research questions.
This isn’t the definitive future roadmap for AGI research. It could be the case that this approach doesn’t scale, or that it does scale but some reinforcement learning breakthrough (likely combined with LLMs) gets us to AGI faster. But this is the plausible roadmap to AGI that requires the fewest major breakthroughs, and I haven’t seen people in the AI safety community discuss it as a possibility. It seems worthwhile to spend more effort on the kinds of alignment issues that would arise from this type of AI.
Nate / Eliezer / others I’ve seen arguing for a sharp left turn appeal to an evolution → human capabilities analogy: evolution’s outer optimization process built a much faster human inner optimization process whose capability gains vastly outstripped those evolution built into humans directly. They seem to expect a similar thing to happen with SGD, creating some inner thing which is not SGD and which gains capabilities much faster than SGD can “insert” them into the AI. Then, just as human civilization exploded in capabilities over a tiny evolutionary timeframe, so too will AIs explode in capabilities over a tiny “SGD timeframe”.
I think this is very wrong, and that “evolution → human capabilities” is a very bad reference class for making predictions about “AI training → AI capabilities”. We don’t train our AIs via an outer optimizer over possible inner learning processes, where each inner learning process is initialized from scratch, takes billions of inner learning steps before the outer optimization process takes one step, and is then deleted after the outer optimizer’s single step. Obviously, such a “two layer” training process would experience a “sharp left turn” once each inner learner became capable of building on the progress made by previous inner learners (which happened in humans via culture and technological progress passed from one generation to the next).
However, this “sharp left turn” does not occur because the inner learning process is inherently better / more foomy / etc. than the outer optimizer. It happens because you devoted billions of times more resources to the inner learning processes, but then deleted each inner learner after a short amount of time. Once the inner learning processes become capable enough to pass their knowledge along to their successors, you get what looks like a sharp left turn. But that sharp left turn only happens because the inner learners have found a kludgy workaround past the crippling flaw of all getting deleted shortly after initialization.
In my frame, we’ve already figured out and applied the “sharp left turn” to our AI systems, in that we don’t waste our compute on massive amounts of incredibly inefficient neural architecture search or hyperparameter tuning[1]. We know that, for a given compute budget, the best way to spend it on capabilities is to train a single big model in accordance with the empirical scaling laws discovered in the Chinchilla paper, not to split the compute budget across millions of different training runs for vastly tinier models with slightly different architectures / training processes. The marginal return on architecture tweaking is much lower than the return to direct scaling.
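As a concrete illustration of “one big run, allocated per the scaling laws”, here is a minimal sketch using the common rounded approximations from the Chinchilla results: training compute C ≈ 6·N·D FLOPs, and compute-optimal data D ≈ 20·N tokens, so both parameters and tokens grow roughly as the square root of compute. These are rules of thumb, not the paper’s exact fitted constants.

```python
# Rough Chinchilla-style allocation: given a training compute budget C (FLOPs),
# pick parameter count N and token count D with the rounded rules of thumb
#   C ~= 6 * N * D   and   D ~= 20 * N.
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    n_params = math.sqrt(compute_flops / (6 * 20))
    n_tokens = 20 * n_params
    return n_params, n_tokens

for c in (1e21, 1e23, 1e25):   # illustrative budgets, two OOMs apart
    n, d = chinchilla_optimal(c)
    print(f"C = {c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```

The point of the sketch is just that, for a fixed budget, the scaling laws tell you to pour compute into one appropriately sized run rather than splitting it across thousands of tiny architecture-search runs.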
(Also, we don’t delete our AIs well before they’re fully trained and start again from scratch using the same number of parameters. I feel a little silly to be emphasizing this point so often, but I think it really does get to the crux of the matter. Evolution’s sharp left turn happened because evolution spent compute in a shockingly inefficient manner for increasing capabilities. Once you condition on this specific failure mode of evolution, there really is nothing else to be explained here, and no reason to suppose some general tendency towards “sharpness” in inner capability gains.)
[1] It can be useful to do hyperparameter tuning on smaller versions of the model you’re training. My point is that relatively little of your compute budget should go into such tweaking.
Hi! Thanks for this post. What you are describing matches my understanding of Prosaic AGI, where no significant technical breakthrough is needed to get to safety-relevant capabilities.
Discussion of the implications of scaling large language models is a thing, and your input would be very welcome!
On the title of your post: the “hard left turn” term is left undefined; I assume it’s a reference to Soares’s “sharp left turn”.
I think the worry is that if the AGI is able to reason well about a diverse range of real world situations, then it probably has a world model and doesn’t just think that the world is a corpus of fictional texts.
I agree with this, although I think people would say that the “general” part of AGI means that domain-specific AIs won’t count. That’s just semantics, though.
It seems reasonable to me that narrow AIs focused on one specific thing will outperform “general” AI on most tasks, the same way that the winner of an Olympic triathlon is usually not the winner of the individual swimming, running, or cycling events. If this is true, then there is dramatically less incentive to make an AI that is “general” instead of a boatload of hyper-specific ones for whatever purposes you need.
Hey!
I’d search the “list of lethalities” post for “Facebook AI Research” (especially point 5)
TL;DR: Even if one group makes an AI which doesn’t have such strong capabilities, Facebook AI Research can still build a dangerous AI 6 months later.
Yudkowsky also points out that the AGI might suggest a plan that is too complicated for us to understand, and that if we could understand it, we’d have come up with it ourselves. This seems wrong to me (because “understanding” a plan is easier than “coming up with” it, in some cases at least), but I’m guessing it’s part of what he’d reply, if that helps.
I’m sympathetic to the basic possibilities you’re outlining. I touched on some similar-ish ideas in this high-level visualization of how the future might play out.
Thanks for the post! Wanted to flag a typo: “ To easily adapt to performing complex and difficult math problems, Minerva has That’s not to say that Minerva is an AGI—it clearly isn’t.”