A quick and crude comparison of epidemiological expert forecasts versus Metaculus forecasts for COVID-19
Katherine Milkman on Twitter notes how far off the epidemiological expert forecasts were in the linked sample:
https://twitter.com/katy_milkman/status/1244668082062348291
The experts gave an average estimate of 20,000 cases, while the actual U.S. total by the stated date was 122,653. That is off by a factor of roughly 6.13.
I was curious how this compares to the Metaculus community forecast (note: not the machine-learning-weighted one, just the simple median of user predictions). Unfortunately the interface doesn’t show the full distribution at a given date; it only reports what the median was at that time. If the expert central tendency was off by a factor of 6.13, how far off was Metaculus?
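To pin down the metric I’m using, here is a minimal Python sketch of the “off by a factor of X” calculation. The survey numbers are the ones quoted above; the Metaculus median is a made-up placeholder, since I couldn’t read the real snapshot off the interface:

```python
# Multiplicative ("off by a factor of X") error for a point forecast.
# The survey numbers are the 538 figures quoted above; the Metaculus
# median is a HYPOTHETICAL placeholder, not a real snapshot.

def error_factor(forecast: float, outcome: float) -> float:
    """How many times the forecast missed the outcome, in either direction."""
    return max(outcome / forecast, forecast / outcome)

expert_avg = 20_000        # experts' average estimate (538 survey)
actual = 122_653           # actual U.S. case count by the stated date
print(error_factor(expert_avg, actual))        # ~6.13

metaculus_median = 60_000  # placeholder value, NOT the real snapshot
print(error_factor(metaculus_median, actual))  # ~2.04
```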
I looked into it in this document:
Sadly a direct comparison is not really feasible, since the two groups weren’t predicting the same questions. But suppose all predictions of importance were entered into platforms such as Good Judgment Open or Metaculus. Then comparisons between groups could be trivial and continuous. This isn’t even “experts versus non-experts”; the relevant comparison is at the platform level. It is “untrackable, unworkable one-off PDFs of somebody’s projections” versus proper scoring and aggregation over time. Since Metaculus accounts can be entirely anonymous, why wouldn’t we want every expert to enter their forecasts into a track record? That would make it possible to find out whether a given person is a dart-throwing chimp. You should assume half of them are.
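To make “proper scoring” concrete, here is a minimal sketch of how a track record under the Brier score would expose a dart-throwing chimp. The forecasts and outcomes below are made up, not anyone’s real record:

```python
import numpy as np

# Brier score: a proper scoring rule for binary forecasts (lower is
# better). A "dart-throwing chimp" who answers 0.5 on everything
# scores exactly 0.25, so a track record quickly separates real
# skill from guessing. All forecast data here are invented.

def brier(probs: np.ndarray, outcomes: np.ndarray) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return float(np.mean((probs - outcomes) ** 2))

outcomes = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # resolved questions
expert   = np.array([0.9, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.4])
chimp    = np.full(8, 0.5)

print(brier(expert, outcomes))  # ~0.075, clearly beats the chimp
print(brier(chimp, outcomes))   # 0.25 exactly
```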
There have been some claims that the 538 article put the wrong date on the experts’ forecasts. We contacted them but couldn’t establish whether that’s true, so unfortunately I wouldn’t rely on the 538 article by itself.
If I’m reading this tweet thread correctly, Anna Wiederkehr from 538 seems to say the graphic was correct and the error really was that large:
https://twitter.com/wiederkehra/status/1245026159529807873?s=20
She implies this further in another tweet:
https://twitter.com/wiederkehra/status/1244956725150650368?s=20
It’s worth noting that epidemiologists can model disease spread given particular assumptions or policy choices, but they do not make a career of predicting policy, which is much of what this question is about.
I messaged Khorton on Twitter; the following paraphrases what I said:
Metaculus predictions are now featured in those surveys (yay!), so for the first survey that includes them I was able to make a more direct, head-to-head comparison.
tl;dr: The experts have broadly outperformed the Metaculus aggregate predictions, though the differences were not especially large.
UPDATE: With more data, Metaculus users have come out ahead again.
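For anyone curious about the mechanics of the head-to-head comparison: once both groups answer the same questions, it is just a matter of scoring each aggregate with the same proper rule and comparing. A minimal sketch with invented numbers:

```python
import numpy as np

# Head-to-head comparison of two forecast sources on the SAME set of
# resolved binary questions: score each aggregate with the same proper
# rule and compare the means. All numbers below are made up.

def brier(probs: np.ndarray, outcomes: np.ndarray) -> float:
    return float(np.mean((probs - outcomes) ** 2))

outcomes  = np.array([1, 0, 1, 0, 1])
experts   = np.array([0.8, 0.3, 0.6, 0.2, 0.7])  # expert survey aggregate
metaculus = np.array([0.7, 0.2, 0.8, 0.4, 0.9])  # community median

print("experts:  ", brier(experts, outcomes))    # 0.084
print("metaculus:", brier(metaculus, outcomes))  # 0.068
# The lower mean Brier score wins on this question set; with only a
# handful of questions, small differences are mostly noise.
```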
Thank you very much! Apologies for not replying to your earlier comment. I had predicted that the Metaculus community prediction would outperform the surveys, and it is gratifying to see that it has.