There’s a longer discussion of that oft-discussed METR time horizons graph that warrants a post of its own.
My problem with how people interpret the graph is that they slip quickly and wordlessly from step to step in a chain of inferences that I don’t think can be justified. The chain looks something like:
AI model performance on a set of very limited benchmark tasks → AI model performance on software engineering in general → AI model performance on everything humans do
I don’t think these inferences are justifiable.