What if we don’t need a “Hard Left Turn” to reach AGI?

Epistemic Status: My day job involves building Large Language Models (LLMs), but I only read AI safety literature as a casual hobby, and may have missed some important points. All predictions I make about the future of AI naturally have a very high level of uncertainty.

Over the last few years, I’ve noticed a contradiction in the beliefs of AGI prognosticators in the rationalist-adjacent community. The rapid progress in AI has caused many to dramatically shorten their timelines for the emergence of AGI. While there have been advances in other areas of AI, most of the major accomplishments come from Large Language Models (LLMs) such as GPT-3, and I don’t believe the non-LLM progress would elicit anywhere near the same level of excitement on its own. In other words, it is LLM progress that has caused many to update their timelines for AGI. However, it has not caused those same people to update their predictions for how an AGI might act, or the conceptual frameworks we should use to understand its behavior.

Language Models can do more than model language: 2022 has been a good year for AI research, but I believe that, in retrospect, Minerva will be considered the most important advance of the year. Minerva is an LLM trained to take math word problems and generate step-by-step solutions. In a particularly impressive result, Minerva reached ~50% accuracy on the MATH dataset, far better than the previous state-of-the-art result of ~10%. That may not sound impressive until you realize how tough the MATH dataset is. These aren’t ‘plug in the numbers’ word problems from an introductory algebra class; they are taken from high-school math competitions and designed to challenge the best mathematics students. Estimates of human performance are spotty, but the original MATH paper had a smart Computer Science undergraduate attempt the dataset, and they scored around 40%. Personally, I have a degree in statistics and I’d probably struggle to beat Minerva at these problems. It seems conservative to say that Minerva is around the 90th percentile of human performance at solving math problems.
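
To make the task concrete, here is the rough shape of a (problem, solution) pair; this is a simplified example I made up for illustration, and real MATH problems are considerably harder:

```python
# A simplified, made-up illustration of the (problem, step-by-step solution)
# pairs that Minerva-style models are evaluated on. Real MATH problems are
# competition-level and considerably harder than this one.
example = {
    "problem": "The quadratic x^2 - 5x + 6 = 0 has two real roots. "
               "What is the sum of the roots?",
    "solution": "Factor the quadratic: x^2 - 5x + 6 = (x - 2)(x - 3). "
                "The roots are x = 2 and x = 3, so their sum is 2 + 3 = 5. "
                "Final answer: 5",
}
```

The model only ever sees the problem text and must produce the full written-out solution, with the final answer checked for correctness.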

Language models are trained to predict the next token in a large corpus of unstructured text. But the results on the MATH dataset can’t be explained by simple pattern matching: to adapt so readily to complex and difficult math problems, Minerva has to have learned something that functions like general mathematical reasoning. That’s not to say that Minerva is an AGI; it clearly isn’t. But something important is going on here.
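
For concreteness, the only training signal a language model receives is the next-token prediction loss over its corpus,

$$\mathcal{L}(\theta) = -\sum_{t} \log p_\theta(x_t \mid x_{<t}),$$

where $x_t$ is the $t$-th token and $\theta$ the model parameters. Nothing in this objective mentions algebra; whatever mathematical ability Minerva has emerged as a byproduct of making this one loss small (its fine-tuning on arXiv papers and technical web pages uses the same objective).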

Maybe scaling really is all we need: There’s no reason to expect performance to plateau at this level. My very conservative estimate is that, within the next two years, a model will be able to surpass the performance of very smart people who haven’t specifically trained on math problems (i.e., 80-90% accuracy on MATH). What comes next?

I find it helpful to think of AI progress in terms of order-of-magnitude (OOM) improvements in effective model capacity over, say, the next 25 years. Scaling can come from three places: more spending, better hardware, and better algorithms.

  • The Manhattan Project cost ~$20 billion in today’s dollars, and training a current large model costs ~$20 million. If AGI comes to look increasingly plausible, it’s possible to imagine the scale of current models increasing by 2-5 OOMs from increased spending alone.

  • GPU performance per dollar doubles roughly every 2.5 years, which works out to 2-4 OOMs from more efficient hardware over that period (the upper end of this range driven by increasingly specialized hardware).

  • Progress from improved algorithms is harder to estimate, but we can make some guesses. Chain-of-thought prompting alone, released this year, improves performance on downstream math reasoning by the equivalent of 1-2 OOMs of scale (see the example after this list). If we get a major breakthrough every 4-6 years, each worth 1-2 OOM equivalents, that’s 4-8 OOMs from improved algorithms. This would be incredibly slow by the AI field’s recent pace (the first modern LLM was released only ~4 years ago), but I want to keep the estimates conservative.

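For readers who haven’t seen it, chain-of-thought prompting just means showing the model worked examples that spell out their reasoning before the answer. The exemplar below is my own made-up illustration, not taken from the paper:

```python
# Chain-of-thought prompting: the few-shot exemplars in the prompt demonstrate
# written-out reasoning, which the model then imitates on the new problem.
# This is a minimal, made-up illustration of the prompt format.
cot_prompt = """\
Q: A shop sells pens at 3 for $2. How much do 12 pens cost?
A: 12 pens is 12 / 3 = 4 groups of three pens.
   Each group costs $2, so the total is 4 * 2 = $8.
   The answer is $8.

Q: Natalia had 48 stickers, gave half of them away, then bought 9 more.
   How many stickers does she have now?
A:"""

# The model is expected to continue with its own reasoning, e.g.:
#   "Half of 48 is 24, so she has 24 left. 24 + 9 = 33. The answer is 33."
```

Minerva combines this style of prompting with majority voting over many sampled solutions, which accounts for a significant part of its benchmark gain.
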
Put together, we can expect roughly 8-17 orders of magnitude of improvement over current models over the next 25 years. You might move some of the timelines here up, especially around algorithmic advances, to match your overall sense of AGI timelines.
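
As a sanity check, here is that back-of-the-envelope arithmetic spelled out; the per-source ranges are just the rough guesses from the list above, not measured quantities:

```python
import math

# Rough, illustrative ranges (in OOMs over ~25 years) from the guesses above.
spending   = (2, 5)   # training budgets growing from ~$20M toward Manhattan-Project scale
hardware   = (2, 4)   # GPU performance per dollar doubling every ~2.5 years
algorithms = (4, 8)   # occasional breakthroughs worth ~1-2 OOM equivalents each

# Hardware check: 25 years / 2.5 years per doubling = 10 doublings,
# and 10 doublings is about 3 OOMs, inside the (2, 4) range above.
hardware_ooms = (25 / 2.5) * math.log10(2)   # ~3.0

low  = spending[0] + hardware[0] + algorithms[0]   # 8
high = spending[1] + hardware[1] + algorithms[1]   # 17

print(f"hardware alone: ~{hardware_ooms:.1f} OOMs")
print(f"combined estimate: {low}-{high} OOMs of effective capacity")
```

For reference, the “billion times more powerful” figure used below is 9 OOMs, comfortably inside this range.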

What happens when we take a model that can beat most humans at mathematical reasoning, and then build another model that is around a billion times more powerful?

What do we really need for a pivotal act?

As the power of models increases, it’s useful to think about what we actually need from an AGI. I believe it is sufficient for that AGI to be able to make drastically superhuman advances in engineering and technology. It seems likely that a model with a billion times the effective power of Minerva would be able to match (or exceed) the competence of a human engineer at a wide range of tasks. And, since we already have access to the compute needed to train the model, we can scale up the model’s inference throughput until a great deal of effective thinking happens near-instantaneously (a back-of-the-envelope calculation follows).
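
To see why training-scale compute buys so much inference, here is a rough sketch using the common approximations that training a dense transformer costs about 6ND FLOPs and generating one token costs about 2N FLOPs (N parameters, D training tokens). The specific numbers are placeholders I chose for illustration, not estimates of any real system, and the sketch ignores utilization and memory-bandwidth details:

```python
# Back-of-the-envelope: how much "thinking" does training-scale hardware buy
# at inference time? Approximations: training ~6*N*D FLOPs, generation ~2*N
# FLOPs per token. All numbers below are hypothetical placeholders.
N = 500e9          # parameters
D = 10e12          # training tokens
train_days = 90    # wall-clock length of the hypothetical training run

train_flops = 6 * N * D
cluster_flops_per_sec = train_flops / (train_days * 24 * 3600)

# Repurpose the same cluster for inference after training finishes.
tokens_per_sec = cluster_flops_per_sec / (2 * N)
print(f"~{tokens_per_sec:,.0f} generated tokens per second")
# With these placeholders the algebra simplifies to 3*D / train_seconds,
# i.e. about 3.9 million tokens per second: thousands of parallel copies of
# the model "thinking" far faster than any human team could.
```

The exact figure doesn’t matter; the point is that whoever can afford to train such a model can also afford to run an enormous amount of it.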

We can provide this model with a description of the problem, the relevant technical background, and the ability to write out its chain of thought, query the technical literature, and direct lab techs to run experiments or to build (carefully human-checked) devices. Then we can direct this model to do things like “invent cold fusion” (or “build a nanobot swarm that can destroy all other GPUs”), and it would accomplish this in a small amount of real-world time. A sketch of what such a setup might look like follows.
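
Here is a deliberately simple sketch of that loop. Every name in it is a hypothetical placeholder; the property I care about is that the model only ever emits text, while humans review and carry out anything that touches the real world:

```python
from dataclasses import dataclass

# All names below are hypothetical placeholders sketching the setup described
# above: the model proposes text, humans vet and execute real-world actions.

@dataclass
class Action:
    kind: str      # "search_literature" | "request_experiment" | "final_report"
    payload: str

def search_papers(query: str) -> str:                 # placeholder literature search
    return f"(search results for: {query})"

def human_checked_experiment(protocol: str) -> str:   # humans vet and run the protocol
    return f"(lab results for vetted protocol: {protocol})"

def run_research_loop(model, task: str, max_steps: int = 1000) -> str:
    notes = [f"Task: {task}"]
    for _ in range(max_steps):
        # The model writes its next chunk of chain-of-thought plus a proposed action.
        thought, action = model.propose_next_step("\n".join(notes))
        notes.append(f"Thought: {thought}")

        if action.kind == "search_literature":
            notes.append(f"Papers: {search_papers(action.payload)}")
        elif action.kind == "request_experiment":
            notes.append(f"Result: {human_checked_experiment(action.payload)}")
        elif action.kind == "final_report":
            return action.payload

    return "\n".join(notes)  # step budget exhausted without a final report
```

Nothing in this loop requires the model to have persistent goals of its own; at every step it is still just producing the most plausible next chunk of text given its context.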

This is a very different form of AI

Notably, the model would be able to accomplish all of this despite having little more “agency” than modern LLMs. My central point is that we can easily imagine singularity-producing systems that do not look fundamentally different from modern AI.

Speculating wildly, aligning this type of AGI may be much easier. This AGI might not have any goals, or any motivation to deceive humans. It might not even represent the real world as something distinct from the various works of fiction in its training corpus. But it would be extremely good at solving open-ended general research questions.

This isn’t the definitive future roadmap for AGI research. It could be the case that this approach doesn’t scale, or that it does scale but some reinforcement learning breakthrough (likely combined with LLMs) gets us to AGI faster. But this is the plausible roadmap to AGI that requires the fewest major breakthroughs, and I haven’t seen people in the AI safety community discuss it as a possibility. It seems worthwhile to spend more effort on the kinds of alignment issues this type of AI would raise.