My analysis suggests that the experts did a fairly good job of forecasting (Brier score = 0.21), and would have been less accurate if they had systematically predicted each development in AI to come 1.5× later (Brier score = 0.26) or 1.5× sooner (Brier score = 0.29) than they actually predicted.
Important missing context from this is that Brier score = 0.25 is what you would get if you predicted randomly (i.e., put 50% on everything no matter what, or assigned 0% or 100% by coin flip). So that means here that systematically predicting “later” or “sooner” would make you “worse than random” whereas the actual predictions are “better than random” (though not by much).
So I think the main takeaways are: (1) predicting this AI stuff is very hard, and (2) as far as we can tell so far, we are at least not systematically biased toward predicting progress too slow or too fast. (1) is not great, but (2) is at least reassuring.
Hmm I disagree on the numbers—have I got something wrong in the below?
If you assigned 0% or 100% by coin flip, you would get a Brier score of 0.5 (half the time you would get 0, half the time you would get 1), and if you assigned a random probability between 0% and 100% for every question, you would get a Brier score of 0.33. If you put 50% on everything you would indeed get 0.25.
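These three baselines are easy to check with a quick simulation (my own sketch, not part of the original analysis; outcomes are assumed to be fair coin flips independent of the forecasts):

```python
import random

random.seed(0)
N = 100_000

# Outcomes resolve yes/no by a fair coin flip, independent of the forecast.
outcomes = [1.0 if random.random() < 0.5 else 0.0 for _ in range(N)]

def brier(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# 1. Assign 0% or 100% by coin flip -> expected Brier score 0.5
extreme = [random.choice([0.0, 1.0]) for _ in range(N)]
# 2. Assign a uniformly random probability -> expected Brier score 1/3
uniform = [random.random() for _ in range(N)]
# 3. Put 50% on everything -> Brier score exactly 0.25
constant = [0.5] * N

print(round(brier(extreme, outcomes), 2))   # ~0.5
print(round(brier(uniform, outcomes), 2))   # ~0.33
print(brier(constant, outcomes))            # 0.25
```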
As the experts had to give 10%, 50%, and 90% forecasts, if they had done this at random they would have ended up with a score of 0.36 [1].
So I think they—including the bullish and bearish groups—still did a fair bit better than random, which would be 0.36 in this context. And all simulated groups did better than the ‘randomized’ group which got a Brier score of 0.31 in my randomization. This does seem like worthwhile context to add though.
[1] [(1-0.1)^2 + (0-0.1)^2 + (1-0.5)^2 + (0-0.5)^2 + (1-0.9)^2 + (0-0.9)^2] / 6 ≈ 0.36
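Evaluating the footnote's arithmetic directly (my own check of the expression, pairing each forecast level with one event that happened and one that didn't):

```python
# One resolved-true (outcome 1) and one resolved-false (outcome 0) case
# at each of the 10%, 50%, and 90% forecast levels.
terms = [(1 - 0.1) ** 2, (0 - 0.1) ** 2,
         (1 - 0.5) ** 2, (0 - 0.5) ** 2,
         (1 - 0.9) ** 2, (0 - 0.9) ** 2]
print(round(sum(terms) / len(terms), 3))  # 0.357, i.e. ~0.36
```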
No, it was me who got this wrong. Thanks!