Curious what you mean by this. One version of chance is “uniform prediction of AGI over future years” which obviously seems worse than Metaculus, but perhaps you meant a more specific baseline?
Personally, I think forecasts like these are rough averages of what informed individuals would think about these questions. Yes, you shouldn’t defer to them, but it’s also useful to recognize how that community’s predictions have changed over time.
Hi Gabriel,
I am not sure how much to trust Metaculus in general, but I do not think it is obvious that their AI predictions should be ignored. For what it is worth, Epoch attributed a weight of 0.23 to Metaculus in the judgement-based forecasts of their AI Timelines review. Holden, Ajeya and AI Impacts got smaller weights, whereas Samotsvety got a higher one.
However, one comment I made here may illustrate what Guy is presumably referring to:
The mean Brier scores of Metaculus’ predictions (and Metaculus’ community predictions) are (from here):
For all the questions:
At resolve time (N = 1,710), 0.087 (0.092).
For 1 month prior to resolve time (N = 1,463), 0.106 (0.112).
For 6 months (N = 777), 0.109 (0.127).
For 1 year (N = 334), 0.111 (0.145).
For 3 years (N = 57), 0.104 (0.133).
For 5 years (N = 8), 0.182 (0.278).
For the questions of the category artificial intelligence:
At resolve time (N = 46), 0.128 (0.198).
For 1 month prior to resolve time (N = 40), 0.142 (0.205).
For 6 months (N = 21), 0.119 (0.240).
For 1 year (N = 13), 0.107 (0.254).
For 3 years (N = 1), 0.007 (0.292).
Note:
For the questions of the category artificial intelligence:
Metaculus’ community predictions made 6 months or more prior to resolve time perform about as badly as, or worse than, always predicting 0.5, as their mean Brier score is similar to or higher than 0.25. [Maybe this is what Guy is pointing to.]
Metaculus’ predictions perform significantly better than Metaculus’ community predictions.
Questions for which the Brier score can be assessed for a longer time prior to resolve, i.e. the ones with longer lifespans, tend to have lower base rates (I found a correlation of −0.129 among all questions). This means it is easier to achieve a lower Brier score:
Predicting 0.5 for a question whose base rate is 0.5 will lead to a Brier score of 0.25 (= 0.5*(0.5 − 1)^2 + (1 − 0.5)*(0.5 − 0)^2).
Predicting 0.1 for a question whose base rate is 0.1 will lead to a Brier score of 0.09 (= 0.1*(0.1 − 1)^2 + (1 − 0.1)*(0.1 − 0)^2).
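To make the arithmetic in the two examples above easy to check, here is a minimal Python sketch. The function name `expected_brier` is hypothetical (it is not from Metaculus or any library), and it assumes a single constant prediction applied to questions that each resolve positively with probability equal to the base rate.

```python
def expected_brier(prediction: float, base_rate: float) -> float:
    """Expected Brier score of a constant prediction.

    Assumption: each question resolves "yes" with probability `base_rate`,
    so the score is a weighted average of the two squared errors.
    """
    error_if_yes = (prediction - 1) ** 2  # outcome = 1
    error_if_no = (prediction - 0) ** 2   # outcome = 0
    return base_rate * error_if_yes + (1 - base_rate) * error_if_no


# The two cases worked through above.
print(round(expected_brier(0.5, 0.5), 3))  # 0.25
print(round(expected_brier(0.1, 0.1), 3))  # 0.09

# Always predicting 0.5 scores 0.25 whatever the base rate is.
print(round(expected_brier(0.5, 0.1), 3))  # 0.25
```

The last line is the reason 0.25 works as the coin-flip benchmark: a constant 0.5 gets that score regardless of the base rate, whereas predicting the base rate itself scores base_rate*(1 − base_rate), which shrinks as the base rate moves away from 0.5.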
Agree that they shouldn’t be ignored. By “you shouldn’t defer to them,” I just meant that it’s useful to also form one’s own inside view models alongside prediction markets (perhaps comparing to them afterwards).
What I mean is “these forecasts give no more information than flipping a coin to decide whether AGI would come in time period A vs. time period B”.
I have my own, rough, inside views about if and when AGI will come and what it would be able to do, and I don’t find it helpful to quantify them into a specific probability distribution. And there’s no “default distribution” here that I can think of either.
Gotcha, I think I still disagree with you for most decision-relevant time periods (e.g. I think they’re likely better than chance on estimating AGI within 10 years vs 20 years).