Here is the situation we're in with regard to near-term prospects for artificial general intelligence (AGI), and why I'm extremely skeptical of predictions that we'll see AGI within 5 years.
-Current large language models (LLMs) have extremely limited capabilities. For example, they can't score above 5% on the ARC-AGI-2 benchmark, they can't automate any significant amount of human labour,[1] and they can only augment human productivity in minor ways in limited contexts.[2] They make ridiculous mistakes all the time, like saying something that happened in 2025 caused something that happened in 2024, while listing the dates of the events. They struggle with things that are easy for humans, like playing hangman.
-The capabilities of LLMs have been improving slowly. There is only a modest overall difference between GPT-3.5 (the original ChatGPT model), which came out in November 2022, and newer models like GPT-4o, o4-mini, and Gemini 2.5 Pro.
-There are signs of diminishing returns to scaling for LLMs. Increasing the size of models and the size of the pre-training data doesn't seem to be producing the desired results anymore. LLM companies have turned to scaling test-time compute to eke out more performance gains, but how far can that go? (A toy illustration of what diminishing returns look like appears after this list.)
-There may be certain limits to scaling that are hard or impossible to overcome. For example, once you've trained a model on all the text that exists in the world, you can't keep training on exponentially[3] more text every year. Current LLMs might be fairly close to running out of exponentially[4] more text to train on, if they haven't run out already.[5] (A back-of-the-envelope version of this appears after this list.)
-A survey of 475 AI experts found that 76% think it's "unlikely" or "very unlikely" that "scaling up current AI approaches" will lead to AGI. So, we should be skeptical of the idea that just scaling up LLMs will lead to AGI, even if LLM companies manage to keep scaling them up and improving their performance by doing so.
-Few people have any concrete plan for how to build AGI (beyond just scaling up LLMs). The few people who do have a concrete plan disagree fundamentally on what the plan should be. All of these plans are in the early-stage research phase. (I listed some examples in a comment here.)
-Some of the scenarios people are imagining where we get to AGI in the near future involve a strange, exotic, hypothetical process wherein a sub-AGI AI system can automate the R&D that gets us from a sub-AGI AI system to AGI. This requires two things to be true: 1) that doing the R&D needed to create AGI is not a task that would require AGI or human-level AI, and 2) that, in the near term, AI systems somehow advance to the point where they're able to do meaningful R&D autonomously. Given that I can't even coax o4-mini or Gemini 2.5 Pro into playing hangman properly, and given the slow improvement of LLMs and the signs of diminishing returns to scaling I mentioned, I don't see how (2) could be true. The arguments for (1) feel very speculative and handwavy.
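To make the diminishing-returns point above a bit more concrete, here is a minimal sketch. It assumes, purely for illustration, that loss falls off as a power law in the amount of training data, which is the rough shape reported in the scaling-law literature; the constants below are made up, not fitted to any real model. Under that assumption, each successive doubling of the training data buys a smaller absolute improvement than the one before it.

```python
# Toy illustration of diminishing returns to data scaling.
# Assumes loss(D) = E + B / D**beta, a power-law shape; the constants
# are invented for illustration and are not fitted to any real model.

E = 1.7      # hypothetical irreducible loss
B = 400.0    # hypothetical scale constant
beta = 0.3   # hypothetical data exponent

def loss(tokens):
    return E + B / tokens ** beta

prev = loss(1e9)
for doublings in range(1, 11):
    tokens = 1e9 * 2 ** doublings
    cur = loss(tokens)
    print(f"{tokens:.2e} tokens: loss {cur:.4f} "
          f"(gain from this doubling: {prev - cur:.4f})")
    prev = cur
```

The exact numbers are irrelevant; the point is the shape. On a power law, every further doubling of data (or compute) yields a smaller gain than the last, which is what "diminishing returns to scaling" means here.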
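And here is a back-of-the-envelope version of the running-out-of-text point. The two quantities below are rough assumptions on my part, not figures from this post: a current frontier training set on the order of 15 trillion tokens, and a total stock of publicly available human-written text on the order of 300 trillion tokens (loosely in the spirit of the Epoch AI estimate cited in footnote 5). The point is just how quickly "2x or 5x or 10x per year" growth eats through any fixed stock of text.

```python
import math

# Back-of-the-envelope: how long can a training dataset keep growing
# exponentially before it exhausts the stock of human-written text?
# Both quantities are rough assumptions for illustration only.
current_dataset = 15e12    # assumed size of a current training set, in tokens
available_text = 300e12    # assumed stock of public human-written text, in tokens

for growth in (2, 5, 10):  # the colloquial "2x or 5x or 10x" per year
    years = math.log(available_text / current_dataset, growth)
    print(f"At {growth}x per year, the text runs out in about {years:.1f} years")
```

Whatever exact numbers you plug in, exponential growth closes the gap to any fixed ceiling within a handful of years, which is why the running-out-of-text worry is about the near term rather than the distant future.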
Given all this, I genuinely can't understand why some people think there's a high chance of AGI within 5 years. I guess the answer is that they probably disagree on most or all of these individual points.
Maybe they think the conventional written question-and-answer benchmarks for LLMs are fair apples-to-apples comparisons of machine intelligence and human intelligence. Maybe they are really impressed with the last 2 to 2.5 years of progress in LLMs. Maybe they are confident that no limits to scaling or diminishing returns to scaling will stop progress anytime soon. Maybe they are confident that scaling up LLMs is a path to AGI. Or maybe they think LLMs will soon be able to take over the jobs of researchers at OpenAI, Anthropic, and Google DeepMind.
I have a hunch (just a hunch) that it's not a coincidence many people's predictions are converging (or herding) around 2030, give or take a few years, and that 2029 has been the prophesied year for AGI since Ray Kurzweil's book The Age of Spiritual Machines in 1999. It could be a coincidence. But I have a sense that there has been a lot of pent-up energy around AGI for a long time and ChatGPT was like a match in a powder keg. I don't get the sense that people formed their opinions about AGI timelines in 2023 and 2024 from a blank slate.
I think many people have been primed for years by people like Ray Kurzweil and Eliezer Yudkowsky and by the transhumanist and rationalist subcultures to look for any evidence that AGI is coming soon and to treat that evidence as confirmation of their pre-existing beliefs. You don't have to be directly influenced by these people or by these subcultures to be influenced. If enough people, or a few prominent people, are influenced by them, you end up being influenced all the same. And when it comes to making predictions, people seem to have a bias toward herding, i.e., making their predictions more similar to the predictions they've heard, even if that makes their predictions less accurate.
The process by which people come up with the year they think AGI will happen seems especially susceptible to herding bias. You ask yourself when you think AGI will happen. A number pops into your head that feels right. How does this happen? Who knows.
If you try to build a model to predict when AGI will happen, you still can't get around it. Some of your key inputs to the model will require you to ask yourself a question and wait a moment for a number to pop into your head that feels right. The process by which this happens will still be mysterious. So, the model is ultimately no better than pure intuition because it is pure intuition.
I understand that, in principle, it's possible to make more rigorous predictions about the future than this. But I don't think that applies to predicting the development of a hypothetical technology where there is no expert agreement on the fundamental science underlying that technology, and not much in the way of fundamental science in that area at all. That seems beyond the realm of ordinary forecasting.
[1] This post discusses LLMs and labour automation in the section "Real-World Adoption".

[2] One study I found had mixed results. It looked at the use of LLMs to aid people working in customer support, which seems like it should be one of the easiest kinds of jobs to automate using LLMs. The study found that the LLMs increased productivity for new, inexperienced employees but decreased productivity for experienced employees who already knew the ins and outs of the job:

These results are consistent with the idea that generative AI tools may function by exposing lower-skill workers to the best practices of higher-skill workers. Lower-skill workers benefit because AI assistance provides new solutions, whereas the best performers may see little benefit from being exposed to their own best practices. Indeed, the negative effects along measures of chat quality – RR [resolution rate] and customer satisfaction – suggest that AI recommendations may distract top performers or lead them to choose the faster or less cognitively taxing option (following suggestions) rather than taking the time to come up with their own responses.

[3] I'm using "exponentially" colloquially to mean every year the LLM's training dataset grows by 2x or 5x or 10x, something along those lines. Technically, if the training dataset increased by 1% a year, that would be exponential, but let's not get bogged down in unimportant technicalities.

[4] Yup, still using it colloquially.

[5] Epoch AI published a paper in June 2024 that predicts LLMs will exhaust the Internet's supply of publicly available human-written text between 2026 and 2032.