My take is that we should give little weight to Metaculus. From footnote 2 of the post:
“Why do I give little weight to Metaculus’s views on AI? Primarily because of the incentives to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily that forecasts aren’t aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on that question.”
(Edited to add: I see the post you linked also includes the “Metaculus prediction”, which theoretically performs significantly better than the community prediction by weighting stronger predictors more heavily. But if you look at its actual track record, it doesn’t do much better than the community prediction. For binary questions, its log score at resolve time is 0.438 vs. 0.426 for the community prediction; across all times, it is 0.280 vs. 0.261. For continuous questions, its log score at resolve time is 2.19 vs. 2.12; across all times, it is 1.57 vs. 1.55.)
That said:
“Or is forecasting risks just really super unreliable and not a thing to put much weight on?”
I wouldn’t want people to overestimate the precision of the estimates in this post! Take them as a few data points among many. I also think it’s very healthy for the community if many people are forming inside views about AI risk, though I understand it’s difficult and had a hard time with it myself for a while.
Ah, the answer was in the footnotes all along. Silly me. Thank you for the reply!