What do you make of the fact that METR’s time horizon graph and METR’s study on AI coding assistants point in opposite directions? The graph says: exponential progress! Superhuman coders! AGI soon! Singularity! The study says: overhyped product category, useless tool, tricks people into thinking it helps them when it actually hurts them.
Pretty interesting, no?
Yep, I wouldn’t have predicted that. I guess the standard retort is: Worst case! Existing large codebase! Experienced developers!
I know there are software tools I use >once a week that wouldn’t have existed without AI models. They’re not very complicated, but they’d have been annoying to code up myself, and I wouldn’t have done it. I wonder whether there’s a slowdown in less harsh scenarios, but the value of information probably doesn’t justify running such a study.
I dunno. I’ve done a bunch of calibration practice[1]; this feels like a 30%, so I’m calling 30%. My probability went up recently, mostly because some subjectively judged capabilities that I was expecting didn’t start showing up.
My Metaculus calibration around 30% isn’t great (I’m overconfident there), and I’m trying to keep that in mind. My Fatebook record is slightly overconfident in that range, and who can tell with Manifold.
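For concreteness, here’s a minimal sketch of the kind of per-bucket calibration check that sites like Metaculus and Fatebook report: group resolved forecasts by stated probability and compare with how often they actually resolved yes. The binning, function name, and data below are illustrative placeholders, not either site’s actual methodology.

```python
# A minimal sketch of per-bucket calibration: group resolved forecasts by the
# nearest 10% bucket and compare stated probability with observed frequency.
# The forecasts below are made-up illustrative data.
from collections import defaultdict

def calibration_by_bucket(forecasts):
    """forecasts: iterable of (stated_probability, resolved_yes) pairs."""
    buckets = defaultdict(list)
    for prob, outcome in forecasts:
        buckets[round(prob * 10) / 10].append(outcome)
    return {
        bucket: (sum(outcomes) / len(outcomes), len(outcomes))
        for bucket, outcomes in sorted(buckets.items())
    }

# If forecasts stated around 30% resolve "yes" at a rate far from 30%,
# the forecaster is miscalibrated in that range.
example = [(0.30, 0), (0.28, 1), (0.33, 0), (0.30, 0), (0.32, 1), (0.27, 1)]
for bucket, (observed, n) in calibration_by_bucket(example).items():
    print(f"stated ~{bucket:.0%}: resolved yes {observed:.0%} of the time (n={n})")
```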
There’s a longer discussion to be had about that oft-cited METR time horizons graph, one that warrants a post of its own.
My problem with how people interpret the graph is that they slip quickly and wordlessly from step to step in a chain of inferences. The chain goes something like:
AI model performance on a set of very limited benchmark tasks → AI model performance on software engineering in general → AI model performance on everything humans do
I don’t think these inferences are justifiable.
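To make the first link in that chain concrete, here is a sketch of the extrapolation people typically run on the graph: assume the task time horizon keeps doubling at a fixed rate and project when it crosses some threshold. The starting horizon and doubling time below are illustrative placeholders rather than METR’s reported figures, and running the projection this way is exactly the move being questioned above.

```python
# A sketch of the naive extrapolation: assume the 50%-success task time horizon
# doubles at a fixed rate, and project when it crosses some threshold.
# The starting horizon and doubling time are illustrative, not METR's figures.
import math

def months_until_horizon(current_hours, target_hours, doubling_months):
    """Months until an exponentially doubling time horizon reaches target_hours."""
    return doubling_months * math.log2(target_hours / current_hours)

current = 1.0    # assumed current horizon: 1-hour tasks
doubling = 7.0   # assumed doubling time in months
for target_hours, label in [(40, "one work week"), (2000, "one work year")]:
    months = months_until_horizon(current, target_hours, doubling)
    print(f"{label} ({target_hours}h): ~{months:.0f} months, under these assumptions")
```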