Hey, I want to give a more directly informative answer later, but since this might color other people’s questions too, I just want to flag that I don’t think I’m a better forecaster than all of the 989+ people below me on the leaderboards, and I also would not be surprised if I’m better than some of the people above me. There are several reasons for this:
Reality is often underpowered. While medium-term covid-19 forecasting is less prone to this problem than many other EA questions, you still face a lot of fundamental uncertainty about how good you actually are. Being correct on one question often relies on a “bet” that’s loosely correlated with being correct on another question. At or near the top, there are not enough questions to tell whether you just got lucky in a bunch of correlated ways that others slightly below you in the ranks got unlucky on, or whether you’re actually more skilled. The differences come down to things like whether you “called” it correctly at 90% when others put 80%, or conversely whether you were well calibrated at 70% when others were overconfident (or just unlucky) at 90%. (The first toy simulation after this list illustrates how little signal a handful of questions carries.)
Metaculus rankings are a composite measure of both activity and accuracy (all forecasting leaderboards have to be, otherwise the top 10 would be dominated by people who are overconfident and happened to be right on a few questions; the second simulation below shows this effect). For all I know, people who answered <10 covid-19 questions on Metaculus are actually amazing forecasters; they just chose a different platform after a brief trial, or they almost always answer non-covid questions on Metaculus.
Garden of forking paths in which selection criteria you choose. For example, I’m below the top 50 on Metaculus overall (my excuse: I only joined 4 months ago) and below the top 20 on some specific covid-19 subtournaments (though I’m in the top 10 on others; my excuse there is that I didn’t participate in many of those questions). But you only get so many excuses, and it’s hard to pick a fully neutral prior.
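To make the first point concrete, here’s a toy Monte Carlo sketch (my own construction, nothing to do with Metaculus’s actual scoring): a perfectly calibrated forecaster who predicts 80% and an overconfident one who predicts 90% answer the same 20 binary questions, each of which truly resolves yes 80% of the time. In this setup the overconfident forecaster ends up with the better (lower) Brier score roughly a fifth of the time, purely by luck:

```python
# Toy model: how often does an overconfident forecaster beat a
# perfectly calibrated one over a small number of questions?
import random

def brier(forecast: float, outcome: int) -> float:
    """Brier score for one binary question (lower is better)."""
    return (forecast - outcome) ** 2

def simulate(n_questions: int = 20, true_p: float = 0.8,
             n_trials: int = 100_000) -> float:
    b_wins = 0
    for _ in range(n_trials):
        outcomes = [1 if random.random() < true_p else 0
                    for _ in range(n_questions)]
        score_a = sum(brier(0.8, o) for o in outcomes)  # calibrated
        score_b = sum(brier(0.9, o) for o in outcomes)  # overconfident
        if score_b < score_a:
            b_wins += 1
    return b_wins / n_trials

if __name__ == "__main__":
    print(f"Overconfident forecaster wins ~{simulate():.0%} of trials")
```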
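And for the second point, a similarly artificial sketch of why a leaderboard ranked purely by average accuracy gets dominated by overconfident, low-volume forecasters: with enough entrants, someone who predicts 99% on just a handful of questions will occasionally hit a lucky streak and post a near-perfect average Brier score, beating every calibrated forecaster with a long track record:

```python
# Toy leaderboard ranked purely by average Brier score (my construction,
# not Metaculus's actual system). Half the forecasters are calibrated
# (predict the true probability), half are overconfident (predict 0.99);
# each answers either a few questions or many.
import random

def run_leaderboard(n_forecasters: int = 1000, true_p: float = 0.7,
                    seed: int = 0) -> None:
    rng = random.Random(seed)
    scores = []
    for i in range(n_forecasters):
        overconfident = i % 2 == 0
        n_questions = rng.choice([5, 50])  # low- vs high-volume
        forecast = 0.99 if overconfident else true_p
        total = sum((forecast - (1 if rng.random() < true_p else 0)) ** 2
                    for _ in range(n_questions))
        scores.append((total / n_questions, overconfident, n_questions))
    scores.sort()  # lower average Brier = "better"
    print("Top 10 by average Brier score:")
    for avg, overconfident, n in scores[:10]:
        kind = "overconfident" if overconfident else "calibrated"
        print(f"  {avg:.4f}  {kind}  ({n} questions)")

if __name__ == "__main__":
    run_leaderboard()
```

In runs of this toy model, the top 10 is essentially always overconfident forecasters who answered only 5 questions and got lucky on all of them.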
At worst, I can only be moderately confident that I’m somewhat above average at medium-term predictions of a novel pandemic, though I also don’t want to be falsely humble. My best guess is that a) my track record reflects more underlying skill than that worst case suggests, and b) that skill generalizes to other domains to a significant degree.
A lifetime ago, when I interviewed Liv Boeree about poker and EA, one thing she said really stuck with me. (I was dumb and didn’t include it in the final interview, so hopefully I haven’t butchered this paraphrase completely.) Roughly speaking: in poker, even among professionals, the true measure of skill isn’t how much money you make (because poker is both a high-skill and high-luck game, and there’s so much randomness); a better measure is the approval of your peers who are themselves professional poker players.
This was really memorable to me because I had always thought of poker as an extremely objective game with a very clear winning criterion (I suppose, from the outside, so is forecasting). If you can’t even have a clear and externally legible metric for poker, what hope does anything significantly fuzzier have?
That said, I do think this is a question of degree rather than kind. The rankings are an okay proxy for minimal competence: you probably shouldn’t put much trust in forecasters (at least in their capacity as forecasters) who are below the 50th percentile in the domain they’re asked to forecast on, and maybe not in those below the 90th percentile either, unless there are strong countervailing reasons.
That was a lot of good discussion of epistemics, and I valued it highly, but I was also hoping for some hot forecasting tips. ;) I’ll try asking the question differently.
I understood your intent! :) I actually plan to answer the spirit of your question on Sunday, just decided to break the general plan to “not answer questions until my official AMA time” because I thought the caveat was sufficiently important to have in the open!