Two pieces of evidence commonly cited for near-term AGI are AI 2027 and the METR time horizons graph. AI 2027 is open to multiple independent criticisms, one of which is its use of the METR graph to forecast near-term AGI (or AI capabilities more generally); that use is not supported by the data and methodology behind the graph.
Two strong criticisms that apply specifically to the AI 2027 forecast are:
It depends crucially on the subjective intuitions or guesses of the authors. If you don't personally share the authors' intuitions, or don't trust that those intuitions are likely correct, then there is no particular reason to take AI 2027's conclusions seriously.
Credible critics argue that the headline results of the AI 2027 timelines model are largely baked in by the authors' modelling decisions, irrespective of what data the model uses. In other words, AI 2027's conclusions are not really determined by its data. Like the intuitions discussed in the previous bullet point, they are effectively a restatement of the pre-existing beliefs and assumptions the authors chose to embed in their timelines model.
AI 2027 is largely based on extrapolating the METR time horizons graph. The following criticisms of the METR time horizons graph therefore extend to AI 2027:
METR employees themselves sometimes (but not always) clearly disclose serious problems and limitations of the time horizons graph. Note the wide gap between the caveated description of what the graph actually measures and the interpretation of the graph as a strong indicator of rapid, exponential improvement in general AI capabilities.
Gary Marcus, a cognitive scientist and AI researcher, and Ernest Davis, a computer scientist and AAAI fellow, co-authored a blog post on the METR graph that looks at how the graph was made and concludes that “attempting to use the graph to make predictions about the capacities of future AI is misguided”.
Nathan Witkin, a research writer at NYU Stern’s Tech and Society Lab, published a detailed breakdown of some of the problems with METR’s methodology. He concludes that it’s “impossible to draw meaningful conclusions from METR’s Long Tasks benchmark” and that the METR graph “contains far too many compounding errors to excuse”. Witkin calls out a specific tweet from METR, which presents the METR graph in the broad, uncaveated way that the AI 2027 authors interpret it. He calls the tweet “an uncontroversial example of misleading science communication”.
Since AI 2027 leans so heavily on this interpretation of the METR graph to make its forecast, it is hard to see how AI 2027 could be credible if its interpretation of the METR graph is not credible.
It’s worth contrasting AI 2027 and similar forecasts of near-term AGI with expert opinion:
76% of AI experts think it is unlikely or very unlikely that existing approaches to AI, which include LLMs, will scale to AGI. (See page 66 of the AAAI 2025 survey. See also the preceding two pages about open research challenges in AI — such as continual learning, long-term planning, generalization, and causal reasoning — none of which are about scaling more, or at least not uncontroversially so. For an example of a specific, prominent AI researcher who emphasizes the importance of fundamental AI research over scaling, Ilya Sutskever believes that further scaling will be inadequate to get to AGI.)
Expert surveys about AGI timelines are not necessarily reliable, but the AI Impacts survey in late 2023 found that AI researchers’ median year for AGI is 20 to 90 years later than the AI 2027 scenario.
Two overall takeaways:
There are good reasons to be highly skeptical of AI 2027 and the METR time horizons graph as evidence for near-term AGI or for a rapid, exponential increase in general AI capabilities.
Peer review in academic research is designed to catch these sorts of flaws prior to publication. This means flaws can be fixed, claims can be moderated and properly caveated, or publication can be prevented entirely so that research below a certain threshold of quality or rigour is not given the stamp of approval. (This helps readers know what's worth paying attention to and what isn't.) Research published via blogs and self-published reports doesn't go through academic peer review, and may fall below the standards of academic publishing. In the absence of peer review doing quality control, deeply flawed research, or deeply flawed interpretations of research, may propagate.
[Adapted from this comment.]