Yep, I wouldn’t have predicted that. I guess the standard retort is: Worst case! Existing large codebase! Experienced developers!
I know that there’s software tools I use >once a week that wouldn’t have existed without AI models. They’re not very complicated, but they’d’ve been annoying to code up myself, and I wouldn’t have done it. I wonder if there’s a slowdown in less harsh scenarios, but it’s probably not worth the value of information of running such a study.
I dunno. I’ve done a bunch of calibration practice[1], this feels like a 30%, I’m calling 30%. My probability went up recently, mostly because some subjectively judged capabilities that I was expecting didn’t start showing up.
My Metaculus calibration around 30% isn’t great (I’m overconfident there), and I’m trying to keep that in mind. My Fatebook record is slightly overconfident in that range, and who can tell with Manifold.
There’s a longer discussion of that oft-discussed METR time horizons graph that warrants a post of its own.
My problem with how people interpret the graph is that they slip quickly and wordlessly from step to step in a chain of inferences that I don’t think can be justified. The chain is something like:
AI model performance on a set of very limited benchmark tasks → AI model performance on software engineering in general → AI model performance on everything humans do
I don’t think these inferences are justifiable.