What I just said: AI systems acting like a toddler or a cat would make me think AGI might be developed soon.
I'm not sure FrontierMath is any more meaningful than any other benchmark, including those on which LLMs have already gotten high scores. But I don't know.
I asked about genuine research creativity, not AGI, but I don't think this conversation is going anywhere at this point. It seems obvious to me that "does stuff mathematicians say makes up the building blocks of real research" is meaningful evidence that the chance that models will do research-level maths in the near future is not ultra-low, given that capabilities do increase with time. I don't think this is analogous to IQ tests or the bar exam, and for other benchmarks, I would really need to see what you're claiming is the equivalent of the transfer from FrontierMath Tier 4 to real math that was intuitive but failed.
What percentage probability would you assign to your ability to accurately forecast this particular question?
I'm not sure why you're interested in getting me to forecast this. I haven't ever made any forecasts about AI systems' ability to do math research. I haven't made any statements about AI systems' current math capabilities. I haven't said that evidence of AI systems' ability to do math research would affect how I think about AGI. So, what's the relevance? Does it have a deeper significance, or is it just a random tangent?
If there is a connection to the broader topic of AGI or AI capabilities, I already gave a bunch of examples of evidence I would consider to be relevant and that would change my mind. Math wasn't one of them. I would be happy to think of more examples as well.
I think a potentially good counterexample to your argument about FrontierMath → original math research is natural language processing → replacing human translators. Surely you would agree that LLMs have mastered the basic building blocks of translation? So, 2-3 years after GPT-4, why is demand for human translators still growing? One analysis claims that growth is counterfactually less than it would have been without the increase in the usage of machine translation, but demand is still growing.
I think this points to the difficulty of making these sorts of predictions. If, back in 2015, someone had described to you the capabilities and benchmark performance of GPT-4 in 2023, as well as the rate of scaling of new models and of progress on benchmarks, would you have thought that demand for human translators would continue to grow for at least the next 2-3 years?
I don't have any particular point other than that what seems intuitively obvious in the realm of AI capabilities forecasting may in fact be false, and that I am skeptical of hazy extrapolations.
The most famous example of a failed prediction of this sort is Geoffrey Hinton's prediction in 2016 that radiologists' jobs would be fully automated by 2021. Almost ten years after that prediction, both the number of radiologists and their salaries are still growing. AI tools that assist in interpreting radiology scans exist, but the evidence is mixed on whether they actually help or hinder radiologists (and possibly harm patients).