Disclaimer: I work for Metaculus.
You can now forecast on how much AI benchmark progress will continue to be underestimated by the Metaculus Community Prediction (CP) on this Metaculus question! Thanks @Javier Prieto for prompting us to think more about this and inspiring this question!
Predict a distribution with a mean of
- ≈0.5, if you expect the CP to be decently calibrated or just aren’t sure about the direction of bias,
- >0.5, if you think the CP will continue to underestimate AI benchmark progress,
- <0.5, if you think the CP will overestimate AI benchmark progress, e.g. by overreacting to this post.
Here is a Colab Notebook to get you started with some simulations.
And don’t forget to update your forecasts on the underlying AI benchmark progress questions if the CP on this one has a mean far away from 0.5!
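For what it’s worth, here is a minimal sketch of one way to simulate this. The setup is purely illustrative and is not the actual resolution mechanism: it assumes the question resolves as the average percentile at which realized benchmark scores land within the CP’s predictive distributions, with normal CP distributions and a made-up `bias` parameter.

```python
import numpy as np
from scipy.stats import norm

# Illustrative assumption: the question resolves as the average percentile
# (PIT value) of the realized benchmark score under the CP's predictive
# distribution, so a calibrated CP averages ~0.5 and a CP that
# underestimates progress averages >0.5.

rng = np.random.default_rng(0)
n_questions = 30   # hypothetical number of underlying benchmark questions
n_sims = 10_000
bias = 0.3         # true scores sit this many CP standard deviations above the CP mean

cp_mean, cp_sd = 0.0, 1.0  # stylised CP predictive distribution per question

resolution_values = []
for _ in range(n_sims):
    true_scores = rng.normal(cp_mean + bias * cp_sd, cp_sd, size=n_questions)
    percentiles = norm.cdf(true_scores, loc=cp_mean, scale=cp_sd)  # PIT values
    resolution_values.append(percentiles.mean())

print(f"simulated resolution value ≈ {np.mean(resolution_values):.3f}")  # > 0.5 if the CP underestimates
```

With `bias = 0` the simulated mean sits at ≈0.5; pushing it positive (CP underestimates) or negative (CP overestimates) moves the mean accordingly.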
[Disclaimer: I work for FutureSearch]
To add another perspective: Reasoning helps when aggregating forecasts. Consider one of the motivating examples for extremising, where, IIRC, some US president is handed several (well-calibrated, say) estimates of ≈70% for P(the head of some terrorist organisation is in location X). If these estimates came from different sources, the aggregate ought to be higher than 70%, whereas if they are all based on the same few sources, 70% may be one’s best guess.
This is also something that a lot of forecasters may just do subconsciously when considering different points of view (which may be something as simple as different base rates or something as complicated as different AGI arrival models).
So from an engineering perspective there is a lot of value in providing rationales, even if they don’t show up in the final forecasts.
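To make the extremising intuition above concrete, here is a toy sketch. The shared 50% prior and the independence assumption are mine, and pooling in log-odds space relative to a prior is just one simple way to formalise “different sources should push the aggregate past 70%”, not the only or official method.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Three well-calibrated forecasters each say 70%.
estimates = np.array([0.7, 0.7, 0.7])
prior = 0.5  # assumed shared prior, purely for illustration

# Same few sources: simple averaging, the best guess stays at 70%.
same_sources = estimates.mean()

# Different sources: treat each forecast as independent evidence and combine
# it in log-odds space relative to the shared prior (naive-Bayes-style
# aggregation, one simple formalisation of extremising).
independent = sigmoid(logit(prior) + np.sum(logit(estimates) - logit(prior)))

print(f"same sources:      {same_sources:.2f}")   # 0.70
print(f"different sources: {independent:.2f}")    # ~0.93
```

Whether the naive aggregate or the extremised one is appropriate depends on knowing how the estimates were produced, which is exactly the kind of rationale that is useful to have even if it never appears in the final number.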