Some very harsh criticism of Leopold Aschenbrenner's recent AGI forecasts in the comments on this Metaculus question. People who are following this more closely than me will be able to say whether or not they are reasonable:
I didn't read all the comments, but Order's are obvious nonsense, of the "(a+b^n)/n = x, therefore God exists" tier. E.g., take this comment:
But something like 5 OOMs seems very much in the realm of possibilities; again, that would just require another decade of trend algorithmic efficiencies (not even counting algorithmic gains from unhobbling).
Here he claims that 100,000x improvement is possible in LLM algorithmic efficiency, given that 10x was possible in a year. This seems unmoored from reality—algorithms cannot infinitely improve, you can derive a mathematical upper bound. You provably cannot get better than Ω(n log n) comparisons for sorting a randomly distributed list. Perhaps he thinks new mathematics or physics will also be discovered before 2027?
This is obviously invalid. The existence of a theoretical complexity upper bound (which, incidentally, Order doesn't have numbers for) doesn't mean we are anywhere near it, numerically. Those aren't even the same level of abstraction! Furthermore, we have clear theoretical proofs for how fast sorting can get, without, AFAIK, any such theoretical limits for learning. "Algorithms cannot infinitely improve" is irrelevant here; it's the slightly more mathy way to say a deepity like "you can't have infinite growth on a finite planet," without actual relevant semantic meaning[1].
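To make the gap between an asymptotic bound and practical efficiency concrete, here is a minimal illustrative sketch (not from the original comments): both routines below perform Θ(n log n) comparisons, yet the built-in sort is typically well over an order of magnitude faster on CPython, purely from constant-factor and implementation improvements. Exact timings will vary by machine and interpreter.

```python
import random
import time

def merge_sort(xs):
    """Plain-Python merge sort: Theta(n log n) comparisons, big constant factor."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

data = [random.random() for _ in range(10**6)]

t0 = time.perf_counter(); merge_sort(data); t1 = time.perf_counter()
t2 = time.perf_counter(); sorted(data); t3 = time.perf_counter()

# Same asymptotic class, very different wall-clock cost: the comparison lower
# bound says nothing about how much constant-factor headroom remains.
print(f"pure-Python merge sort: {t1 - t0:.2f}s; built-in sorted(): {t3 - t2:.2f}s")
```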
Numerical improvements happen all the time, sometimes by OOMs. No “new mathematics or physics” required.
Frankly, as a former active user of Metaculus, I feel pretty insulted by his comment. Does he really think no one on Metaculus took CS 101?
It’s probably true that every apparently “exponential” curve becomes a sigmoid eventually, but knowing this fact doesn’t let you time the transition. You need actual object-level arguments and understanding, and even then it’s very, very hard (as people arguing against Moore’s Law or for “you can’t have infinite growth on a finite planet” found out).
To be clear, I also have high error bars on whether traversing 5 OOMs of algorithmic efficiency in the next five years is possible, but that’s because of a) high error bars on diminishing returns to algorithmic gains, and b) a tentative model that most algorithmic gains in the past were driven by compute gains, rather than exogenous to it. Algorithmic improvements in ML seem much more driven by the “f-ck around and find out” paradigm than by deep theoretical or conceptual breakthroughs; if we model experimentation gains as a function of quality-adjusted researchers multiplied by compute multiplied by time, it’s obvious that the compute term is the one that’s growing the fastest (and thus the thing that drives the most algorithmic progress).
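A minimal sketch of that "quality-adjusted researchers × compute × time" framing; the growth rates below are made-up assumptions, purely for illustration:

```python
# Illustrative only: the growth rates below are assumptions, not measured values.
researcher_growth = 1.2   # suppose quality-adjusted researchers grow ~20%/year
compute_growth = 4.0      # suppose compute available for experiments grows ~4x/year
years = 5

researcher_factor = researcher_growth ** years   # ~2.5x over 5 years
compute_factor = compute_growth ** years         # ~1024x over 5 years
experimentation_factor = researcher_factor * compute_factor

print(f"researchers: {researcher_factor:.1f}x, compute: {compute_factor:.0f}x, "
      f"experimentation capacity: {experimentation_factor:.0f}x")
# Under these assumptions, nearly all of the growth in experimentation capacity
# comes from the compute term, which is the point being made above.
```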
In the future I would recommend reading the full comment. Admitting your own lack of knowledge (not having read the comments) and then jumping to “obviously nonsense” and “insulting” and “Does he really think no one on Metaculus took CS 101?” is not an amazing first impression of EA. You selected the one snippet where I was discussing a complicated topic (ease of algorithmic improvements) instead of low hanging and obviously wrong topics like Aschenbrenner seemingly being unable to do basic math (3^3) using his own estimates for compute improvements. I consider this to be a large misrepresentation of my argument and I hope that you respond to this forthcoming comment in good faith.
Anyway, I am crossposting my response from Metaculus, since I responded there at length:
...there is a cavernous gap between:
- we don’t know the lower bound computational complexity
versus
- 100,000x improvement is very much in the realm of possibilities, and
- if you extend this trendline on a log plot, it will happen by 2027, and we should take this seriously (aka there is nothing that makes [the usual fraught issues with extending trendlines](https://xkcd.com/605/) appear here)
I find myself in the former camp. If you question that a sigmoid curve is likely, there is no logical basis to believe that 100,000x improvement in LLM algorithm output speed at constant compute (Aschenbrenner’s claim) is likely either.
Linch’s evidence to suggest that 100,000x is likely is:
- Moore’s Law happened [which was a hardware miniaturization problem, not strictly an algorithms problem, so doesn’t directly map onto this. But it is evidence that humans are capable of log plot improvement sometimes]
- “You can’t have infinite growth on a finite planet” is false [it is actually true, but we are not utilizing Earth anywhere near fully]
- “Numerical improvements happen all the time, sometimes by OOMs” [without cited evidence]
None of these directly show that 100,000x improvement in compute or speed is forthcoming for LLMs specifically. They are attempts to map other domains onto LLMs without a clear correspondence. Most domains don’t let you do trendline expansion like this. But I will entertain it, and provide a source to discuss (since they did not): [How Fast Do Algorithms Improve? (2021)](https://ieeexplore.ieee.org/document/9540991)
Some key takeaways:
1. Some algorithms do exhibit better-than-Moore’s-Law improvements when compared to brute force, although the likelihood of this is ~14% over the course of the entire examined time window (80 years). I would also add, from looking at the plots, that many of these historical improvements happened when computer science was still relatively young (1970s-1990s), and it is not obvious that this is so common nowadays with more sophisticated research in computer science. The actual yearly probability is super low (<1%), as you can see in the state diagram at the bottom of the charts in Figure 1 (a back-of-the-envelope conversion follows this list): https://ieeexplore.ieee.org/document/9540991/figures#figures
2. Moore’s Law has slowed down, at least for CPUs. Although there is still further room in GPUs / parallel compute, the slowdown in CPUs is not a good portent for the multi-decade outlook of continued GPU scaling.
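For concreteness, here is the back-of-the-envelope conversion behind point 1, assuming (purely for illustration) a constant, independent chance in each year:

```python
# Convert "~14% chance over an 80-year window" into an implied per-year rate,
# assuming a constant, independent probability each year (a simplification).
p_window = 0.14
years = 80
p_yearly = 1 - (1 - p_window) ** (1 / years)
print(f"implied per-year probability ≈ {p_yearly:.2%}")  # ≈ 0.19%, consistent with "<1%"
```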
Some other things I would add:
1. LLMs already rest on decades of algorithmic advancements, for example, matrix multiplication. I would be very surprised if any algorithmic advancements can make matrix multiplication on the order of O(n^2) with a reasonable constant; it is a deeply researched field of study and gains in it are harder to reach every year. We in theory have O(n^2.371552), but the constant in front (hidden in big O notation) is infeasibly large (see the crossover sketch after this list). Overall this one seems to have hit diminishing returns since 1990:
![](https://upload.wikimedia.org/wikipedia/commons/5/5b/MatrixMultComplexity_svg.svg)
2. There are currently trillions of dollars per year in LLMs and the current algorithmic improvements are the best we can muster. (Most of the impressive results recently have been compute driven, not algorithmic driven.) This implies that the problem might actually be very difficult instead of easy.
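As a rough illustration of how a huge hidden constant defers any practical payoff: the true leading constant of the O(n^2.371552) algorithm is not published as a single clean number, so the value below is a made-up stand-in, used only to show where the crossover against schoolbook O(n^3) would land.

```python
# Hypothetical constant C = 1e20 (illustrative stand-in, not the real value).
# C * n**omega < n**3  <=>  n > C ** (1 / (3 - omega))
C = 1e20
omega = 2.371552
crossover_n = C ** (1 / (3 - omega))
print(f"crossover matrix size n ≈ {crossover_n:.2e}")  # ~1e32, far beyond any feasible matrix
```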
These two points nudge me in the direction that LLM algorithmic improvement might actually be harder than for other algorithms, and therefore lead me to think there is a much less than 1% chance of a big-O improvement in any given year. Sure, a priori, ML model improvements have seemed ad hoc to an outside viewer, but the fact that we still haven’t done better than ad hoc improvements also implies something about the problem’s difficulty.
I appreciate that you replied! I’m sorry if I was rude. I think you’re not engaging with what I actually said in my comment, which is pretty ironic. :)
(E.g., there are multiple misreadings. I’ve never interacted with you before, so I don’t really know if they’re intentional.) (I replied more substantively on Metaculus.)
The Metaculus timeline is already highly unreasonable given the resolution criteria (remind me to write this up soon), and even these people think Aschenbrenner is unmoored from reality.
No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don’t think that’s very fair.
I actually think the two Metaculus questions are just bad questions. The detailed resolution criteria don’t necessarily match what we intuitively think of as AGI or transformative AI, or obviously capture anything that important, and it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what “AGI” is.
All the tasks in both AGI questions are quite short, so it’s easy to imagine an AI beating all of them and yet not being able to replace most human knowledge workers, because it can’t handle long-running tasks. It’s also just not clear how performance on benchmark questions and the Turing test translates to competence with even short-term tasks in the real world. So even if you think AGI in the sense of “AI that can automate all knowledge work” (let alone all work) is far away, it might make sense to think we are only a few years from a system that can resolve these questions ‘yes’.
On the other hand, resolving the questions ‘yes’ could conceivably lag the invention of some very powerful and significant systems, perhaps including some that some reasonable definition would count as AGI.
As someone points out in the comments of one of the questions: right now, any mainstream LLM will fail the Turing test, however smart, because if you ask “how do I make chemical weapons” it’ll read you a stiff lecture about why it can’t do that, as it would violate its principles. In theory, that could remain true even if we reach AGI. (The questions only resolve ‘yes’ if a system that can pass the Turing test is actually constructed; it’s not enough for this to be easy to do if OpenAI or whoever wants to.) And the stronger of the two questions requires that a system can do a complex manual task. Fair enough, some reasonable definitions of “AGI” do require machines that can match humans at every manual dexterity-based cognitive task. But a system that could automate all knowledge work but not handle piloting a robot body would still be quite transformative.
Which particular resolution criteria do you think it’s unreasonable to believe will be met by 2027/2032 (depending on whether it’s the weak AGI question or the strong one)?
Two of the four in particular stand out. First, the Turing test one, exactly for the reason you mention: asking the model to violate the terms of service is surely an easy way to win. That’s the resolution criterion, so unless the Metaculus users think that will be solved in 3 years (on top of everything else needed to successfully pass the imitation game), the estimates should be higher. Second, the SAT-passing one requires “having less than ten SAT exams as part of the training data”, which is very unlikely in current frontier models, and labs probably aren’t keen to share what exactly they have trained on.
it is just unclear whether people are forecasting on the actual resolution criteria or on their own idea of what “AGI” is.
No reason to assume an individual Metaculus commentator agrees with the Metaculus timeline, so I don’t think that’s very fair.
I don’t know if it is unfair. This is Metaculus! Premier forecasting website! These people should be reading the resolution criteria and judging their predictions according to them. Just going off personal vibes on how much they ‘feel the AGI’ feels like a sign of epistemic rot to me. I know not every Metaculus user agrees with this, but the timeline is shaped by the aggregate: 2027/2032 are very short timelines, and those are median community predictions. This is my main issue with the Metaculus timelines atm.
I actually think the two Metaculus questions are just bad questions.
I mean, I do agree with you in the sense that they don’t fully match AGI, but that’s partly because ‘AGI’ covers a bunch of different ideas and concepts. It might well be possible for a system to satisfy these conditions but not replace knowledge workers; perhaps a new market focusing on automation and employment might be better, but that also has its issues with operationalisation.
What I meant to say was unfair was basing “even Metaculus users think Aschenbrenner’s stuff is bad, and they have short timelines” off the reaction to Aschenbrenner of only one or two people.
Which particular resolution criteria do you think it’s unreasonable to believe will be met by 2027/2032 (depending on whether it’s the weak AGI question or the strong one)?