I don't think the pay for a person solving problems for Waymos would need to be much higher than taxi drivers' pay, but even if it were 1.5x or 2x, one remote helper overseeing many vehicles would still mean an order-of-magnitude reduction in labor cost. Of course there is additional hardware cost for autonomous vehicles, but I think that can be paid off with a reasonable duty cycle. So then, if you grant that the geofencing is less than a 20x safety advantage, I think there is an economic case for the chimeras, as you say.
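To make the arithmetic explicit, here is a back-of-envelope sketch. The pay multiple and the number of vehicles per remote helper are illustrative assumptions, not actual Waymo figures:

```python
# Back-of-envelope labor cost per vehicle: remote helpers vs. one driver per taxi.
# All inputs are illustrative assumptions, not actual Waymo figures.
driver_cost = 1.0          # normalized annual cost of one taxi driver
pay_multiple = 2.0         # remote helper paid 2x a taxi driver (pessimistic case)
vehicles_per_helper = 20   # one remote helper overseeing 20 vehicles at once

cost_per_vehicle = driver_cost * pay_multiple / vehicles_per_helper
reduction = driver_cost / cost_per_vehicle
print(cost_per_vehicle)  # 0.1 of a driver's cost per vehicle
print(reduction)         # 10.0, i.e. an order-of-magnitude labor-cost reduction
```

Even at double pay, the reduction only disappears if each helper can handle just one or two vehicles, which is the crux of the disagreement below.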
This could be true, but you have to account for other elements of the cost structure. For example, can you improve the ratio of engineers to autonomous vehicles from the current ratio of around 1:1 or 1:2 to something like 1:1000 or 1:10,000?
It seems like Waymo is using methods that scale with engineer labour rather than learning methods that scale with data and compute. So, deploying more vehicles in more areas would require commensurately more engineers, and, beyond being too expensive, there simply are not enough of them on Earth.
As for AGI, I do think that some people conflate the 50%-success time horizon on coding tasks reaching ~one month with AGI, when it really only means a superhuman coder (I think something like a 50% success rate is appropriate for long coding tasks, because humans rarely produce code that works on the first try).
It doesn't mean that, either. METR has found that current frontier AI systems are worse in real-world, practical use cases than not using AI at all.
Automatically gradable benchmarks generally don't seem to have much to do with the ability to do tasks in the real world. So, predicting real-world performance from benchmark performance seems to be an invalid inference.
Anecdotally, what I hear from people who say they find AI coding assistants useful is that it saves them the time it would take to copy and paste code from Stack Exchange. I have never heard anything along the lines of "it came up with a new idea" or "it was creative". Yet this is what human-level coding would require.
However, once you get to a superhuman coder, the logic goes that progress in AI research will dramatically accelerate. In AI 2027, I think there was an assumption that it would take something like 90 years at the current rate of progress to get to AGI, but when things accelerate, it ends up happening in a year or two.
The obvious objection to this as it pertains to the initial advent of AGI, rather than to superintelligence, is that this is a chicken-and-egg problem. If you need AGI to do AGI R&D, AGI can't help you develop AGI because you haven't invented it yet. You would need a sub-human AI that can do tasks that speed up AI research and AI engineering. And that seems dubious. Is automating this kind of work not an AGI-level problem?
You're right it's not fully general, but AI systems are much more general than they were 7 years ago.
I don't know if I believe this. LLMs are impressive, but their scope is fairly narrow. They have memorized most of the important digital/digitized text that's available. Their next-token prediction, plus everything layered on top of it (instruction fine-tuning, reinforcement learning from human feedback (RLHF), and chain-of-thought reasoning), results in some impressive behaviours. But they are extremely brittle. They routinely make errors on very basic tasks.
I think of LLMs as another type of AI system that is proficient in one area, comparable to game-playing RL agents. LLMs are good at many text-related tasks (including math and coding, which are also text), but they aren't able to generalize beyond the text-related tasks they have massive amounts of training data for. They don't do well outside of text-related tasks, they don't do well with novelty, they frequently fail to reason properly, and so on.
So, I'm not sure LLMs are all that much more general than previous systems like MuZero and AlphaZero.
Part of generality, or generalization, is that you should see positive transfer learning, i.e., having skills in some domains should improve the AI system's skills in other domains. But it seems like we see the opposite: negative transfer learning. Training an AI on many diverse, heterogeneous tasks from multiple domains seems to hurt performance. That's narrowness, not generality.
LessWrong is also known for having very short timelines, but I think the last survey indicated a median time for AGI of 2040. So I do think that it is a vocal minority in EA and LW that have median timelines before 2030.
That's very interesting, if you remember correctly. I would be interested in seeing survey data both for LessWrong and for EA.
If you need AGI to do AGI R&D, AGI can't help you develop AGI because you haven't invented it yet. You would need a sub-human AI that can do tasks that speed up AI research and AI engineering. And that seems dubious. Is automating this kind of work not an AGI-level problem?
So, I'm not sure LLMs are all that much more general than previous systems like MuZero and AlphaZero.
I don't think you can have it both ways: A superhuman coder (that is actually competent, which you don't think AI assistants are now) is relatively narrow AI, but would accelerate AI progress. A superhuman AI researcher is more general (which would drastically speed up AI progress), but is not fully general. I would argue that LLMs now are more general than the range of AI researcher tasks (though LLMs are currently not good at all of those tasks), because LLMs can competently discuss philosophy, economics, political science, art, history, engineering, science, etc.
For example, can you improve the ratio of engineers to autonomous vehicles from the current ratio of around 1:1 or 1:2 to something like 1:1000 or 1:10,000?
I'm claiming that they could approach an overall staff-to-vehicle ratio of 1:10 if the number of real-time helpers (who don't have to be engineers) and vehicles were dramatically scaled up, and that's enough for profitability.
I would be interested in seeing survey data both for LessWrong and for EA.
The 2023 LessWrong survey was median 2040 for singularity, and 2030 for "By what year do you think AI will be able to do intellectual tasks that expert humans currently do?". The second question was ambiguous, and some people put it in the past. I haven't seen a similar survey result for EAs, but I expect longer timelines than LW.
I don't think you can have it both ways: A superhuman coder (that is actually competent, which you don't think AI assistants are now) is relatively narrow AI, but would accelerate AI progress. A superhuman AI researcher is more general (which would drastically speed up AI progress), but is not fully general.
I definitely disagree with this. Hopefully what I say below will explain why.
I would argue that LLMs now are more general than the range of AI researcher tasks (though LLMs are currently not good at all of those tasks), because LLMs can competently discuss philosophy, economics, political science, art, history, engineering, science, etc.
The "general" in artificial general intelligence doesn't just refer to having a large repertoire of skills. Generality is about the ability to learn: to generalize beyond what a system has seen in its training data. An artificial general intelligence doesn't just need to have new skills; it needs to be able to acquire new skills, including skills that have never existed in history before, by developing them itself, just as humans do.
If a new video game comes out today, I'm able to play that game and develop a new skill that has never existed before.[1] I will probably get the hang of it in a few minutes, with a few attempts. That's general intelligence.
AlphaStar was not able to figure out how to play StarCraft using pure reinforcement learning. It just got stuck using its builders to attack the enemy, rather than figuring out how to use its builders to make buildings that produce units that attack. To figure out the basics of the game, it needed to do imitation learning on a very large dataset of human play. Then, after imitation learning, to get as good as it did, it needed to do an astronomical amount of self-play, around 60,000 years of playing StarCraft. That's not general intelligence. If you need to copy a large dataset of human examples to acquire a skill, and millennia of training on automatically gradable, relatively short time horizon tasks (which often don't exist in the real world), that's something, and it's even something impressive, but it's not general intelligence.
Let's say you wanted to apply this kind of machine learning to AI R&D. The necessary conditions don't apply. You don't have a large dataset of human examples to train on. You don't have automatically gradable, relatively short time horizon tasks with which to do reinforcement learning. And if the tasks require real world feedback and can't be simulated, you certainly don't have 60,000 years.
I like what the AI researcher François Chollet has to say about this topic in this video from 11:45 to 20:00. He draws the distinction between crystallized behaviours and fluid intelligence, between skills and the ability to learn skills. I think this is important. This is really what the whole topic of AGI is about.
Why have LLMs absorbed practically all text on philosophy, economics, political science, art, history, engineering, science, and so on and not come up with a single novel and correct idea of any note in any of these domains? They are not able to generalize enough to do so. They can generalize or interpolate a little bit beyond their training data, but not very much. It's that generalization ability (which is mostly missing in LLMs) that's the holy grail in AI research.
I'm claiming that they could approach an overall staff-to-vehicle ratio of 1:10 if the number of real-time helpers (who don't have to be engineers) and vehicles were dramatically scaled up, and that's enough for profitability.
There are two concepts here. One is remote human assistance, which Waymo calls fleet response. The other is Waymo's approach to the engineering problem. I was saying that I suspect Waymo's approach to the engineering problem doesn't scale. I think it probably relies on engineers doing too much special-casing that doesn't generalize well when a modest amount of novelty is introduced. So, Waymo currently has something like 1,500 engineers to operate in the comparatively small geofenced areas where it currently operates. If it wanted to expand where it drives to a 10x larger area, would its techniques generalize to that larger area, or would it need to hire commensurately more engineers?
I suspect that Waymo faces the problem of trying to do far too much essentially by hand, just adding incremental fix after fix as problems arise. The ideal would be to, instead, apply machine learning techniques that can learn from data and generalize to new scenarios and new driving conditions. Unfortunately, current machine learning techniques do not seem to be up to that task.
The 2023 LessWrong survey was median 2040 for singularity, and 2030 for "By what year do you think AI will be able to do intellectual tasks that expert humans currently do?". The second question was ambiguous, and some people put it in the past.
Okay, well maybe the play testers and the game developers have developed the skill before me, but then at some point one of them had to be the first person in history to ever acquire the skill of playing that game.
So I do think that it is a vocal minority in EA and LW that have median timelines before 2030.
Now we have some data on AGI timelines for EA (though it was only 34 responses, so of course there could be large sample bias): about 15% expect it by 2030 or sooner.
But 47% (16 out of 34) put their median year no later than 2032 and 68% (23 out of 34) put their median year no later than 2035, so how significant a finding this is depends how much you care about those extra 2-5 years, I guess.
Only 12% (4 out of 34) of respondents to the poll put their median year after 2050. So, overall, respondents overwhelmingly see relatively near-term AGI (within 25 years) as at least 50% likely.
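The cited percentages follow directly from the raw counts; a quick sanity check using the 34-response figure given above:

```python
# Sanity-check the poll percentages quoted above (n = 34 responses).
n = 34
no_later_than_2032 = 16
no_later_than_2035 = 23
after_2050 = 4

pct = lambda k: round(100 * k / n)
print(pct(no_later_than_2032))  # 47
print(pct(no_later_than_2035))  # 68
print(pct(after_2050))          # 12
```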
Thank you. Well, that isn't surprising at all.
Quoting myself:
Now we have some data on AGI timelines for EA (though it was only 34 responses, so of course there could be large sample bias): about 15% expect it by 2030 or sooner.
But 47% (16 out of 34) put their median year no later than 2032 and 68% (23 out of 34) put their median year no later than 2035, so how significant a finding this is depends how much you care about those extra 2-5 years, I guess.
Only 12% (4 out of 34) of respondents to the poll put their median year after 2050. So, overall, respondents overwhelmingly see relatively near-term AGI (within 25 years) as at least 50% likely.