Executive summary: This reflective, critical post argues that the AI 2027 “benchmarks and gaps” timelines model is not meaningfully better than the authors’ earlier model, because it relies heavily on sparse, subjective forecasts and weakly justified modeling choices that are more likely to reflect systematic bias than to provide reliable information about AI timelines.
Key points:
The author argues that the benchmarks-and-gaps model’s added complexity is unjustified and increases the risk of overfitting or embedding the forecasters’ desired conclusions.
The model’s main formal component is a logistic fit to RE-Bench saturation, which the AI 2027 authors themselves now suggest may not belong in the model.
After RE-Bench saturation, the model largely consists of summing forecasters’ estimates for seven gaps, making it closer to an opinion aggregation exercise than a substantive model.
The author contends that under conditions of extreme uncertainty, such forecasts are likely dominated by systematic bias rather than informative signal.
A detailed case study of the “feedback loops” gap shows that the forecasts rely on sparse evidence, intuitive guesses, and reasoning often disconnected from the final quantitative estimates.
Removing progress speedups shifts timelines later but does not change the qualitative conclusion, indicating that speedups are not the core problem with the model.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.