Sorry I didn’t see this sooner! You and @MaxRa are right, the title is a bit dramatic; indeed, in Q3 and Q4 we got null results. The −8.9 head-to-head score (I like this scoring mechanism a lot) is pretty impressive in my opinion, but again, it is not statistically significant, and Max’s point about effect size is well taken (−11.3 to −8.9).
We’ll take your feedback into account when we have the Q1 results!
On how bots compare to average human forecasters: Several of the bots are certainly better than the median forecaster on Metaculus. But relative to the community prediction (CP, which is a bit more complicated than the average of the forecasts on a given question), the bot team is worse, though again not significantly. I think we’ll include a bots vs CP analysis in the Q1 post, or as a separate thing, soon.