Looking at the rolling performance of your method (optimize the extremization factor on the last 100 questions and use that to predict), the median, and the geometric mean of odds, I find they have been roughly indistinguishable over the last ~200 questions. If I look at the exact numbers, extremized_last_100 does win marginally, but looking at that chart I’d have a hard time saying “there’s a 70% chance it wins over the next 100 questions”. If you’re interested in betting at 70% odds, I’d take that bet.
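(For concreteness, a rough sketch of the kind of rolling comparison I have in mind, assuming a hypothetical data layout of one row per question with the individual forecast probabilities and the 0/1 outcome; this is not my actual pipeline, and the factor grid and log-loss scoring are just illustrative choices.)

```python
# Rolling comparison: median vs geometric mean of odds vs geo mean of odds
# extremized with a factor refit on the previous 100 resolved questions.
import numpy as np

def geo_mean_odds(probs):
    """Pool individual probabilities via the geometric mean of odds."""
    odds = probs / (1 - probs)
    pooled = np.exp(np.mean(np.log(odds)))
    return pooled / (1 + pooled)

def extremize(p, d):
    """Raise the odds of a pooled probability to the power d (d > 1 pushes away from 50%)."""
    odds = (p / (1 - p)) ** d
    return odds / (1 + odds)

def log_loss(p, outcome):
    return -(outcome * np.log(p) + (1 - outcome) * np.log(1 - p))

def best_factor(pooled, outcomes, grid=np.linspace(1.0, 3.0, 41)):
    """Pick the extremization factor that minimises log loss on past questions (arbitrary grid)."""
    losses = [np.mean([log_loss(extremize(p, d), y) for p, y in zip(pooled, outcomes)])
              for d in grid]
    return grid[int(np.argmin(losses))]

def rolling_scores(questions, window=100):
    """questions: list of (np.array of individual forecast probs, outcome in {0, 1})."""
    pooled = [geo_mean_odds(probs) for probs, _ in questions]
    medians = [np.median(probs) for probs, _ in questions]
    outcomes = [y for _, y in questions]
    scores = {"median": [], "geo_mean_odds": [], "extremized_last_100": []}
    for t in range(window, len(questions)):
        d = best_factor(pooled[t - window:t], outcomes[t - window:t])
        scores["median"].append(log_loss(medians[t], outcomes[t]))
        scores["geo_mean_odds"].append(log_loss(pooled[t], outcomes[t]))
        scores["extremized_last_100"].append(log_loss(extremize(pooled[t], d), outcomes[t]))
    return {k: np.mean(v) for k, v in scores.items()}
```

Any proper scoring rule would do in place of log loss; the point is only that extremized_last_100 has to win out of sample, not on the window it was fit on.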
There seems to be a long tradition of extremizing in the academic literature (see the reference in the post above), though on the other hand empirical studies have been sparse, and e.g. Satopaa et al. are cheating by choosing the extremization factor with the benefit of hindsight.
No offense, but the academic literature can do one.
In this case I didn’t try very hard to find an extremization factor that would work (I made just two attempts), so I didn’t need to mine for one. But obviously we cannot generalize from a single example.
Again, I don’t find this very persuasive, given what I already knew about the history of Metaculus’ underconfidence.
Extremizing has an intuitive justification, as accounting for the different pieces of information held across experts, which gives it some weight (pun not intended). On the other hand, every extra parameter in the aggregation is another chance to shoot ourselves in the foot.
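(To spell out the operation: with the geometric-mean-of-odds pooling from the post, extremizing just raises the pooled odds to a power, so that power is the one extra parameter in question.)

$$
\operatorname{odds}(\hat p) \;=\; \left( \prod_{i=1}^{n} \operatorname{odds}(p_i)^{1/n} \right)^{d}, \qquad d > 1 \text{ pushes } \hat p \text{ away from } 50\%.
$$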
I think extremizing might make sense if the other forecasts aren’t public, since the forecasts would then be somewhat more independent. When the other forecasts are public, I think extremizing makes less sense, and doubly so when the forecasts are coming from a betting market.
Intuitively, shouldn’t the overall confidence of a community be roughly continuous over time? If so, the level of underconfidence on recent questions should be a good indicator of its confidence over the next few questions.
I find this the most persuasive. I think it ultimately depends on how you think people adjust for their past calibration. It’s taken the community ~5 years to reduce its underconfidence, so maybe it’ll take another 5 years; if people update immediately instead, I would expect this to be very unpredictable.