Hey, I want to give a more directly informative answer later, but since this might color other people's questions too, I just want to flag that I don't think I'm a better forecaster than all the 989+ people below me on the leaderboards, and I also wouldn't be surprised if I'm better than some of the people above me on the leaderboard. There are several reasons for this:
Reality is often underpowered. While medium-term covid-19 forecasting is less prone to this issue than many other EA questions, you still have a lot of fundamental uncertainty about how good you actually are. Being correct on one question often relies on a "bet" that's loosely correlated with being correct on another question. At or near the top, there aren't enough questions to tell whether you just got lucky in a bunch of correlated ways that others slightly below you in the ranks got unlucky on, or whether you're actually more skilled (see the simulation sketch after this list). The differences are things like whether you "called" it correctly at 90% when others put 80%, or conversely whether you were appropriately calibrated at 70% when others were overconfident (or just unlucky) at 90%.
Metaculus rankings are a composite measure of both activity and accuracy (all forecasting leaderboards have to be this way; otherwise the top 10 would be dominated by people who are overconfident and right on a few questions). For all I know, people who answered <10 covid-19 questions on Metaculus are actually amazing forecasters; they just chose a different platform after a brief trial, or they almost always answer non-covid questions on Metaculus.
Garden of forking paths in which selection criteria you choose. For example, I'm below the top 50 on Metaculus overall (my excuse is that I only joined 4 months ago) and below the top 20 on some specific covid-19 subtournaments (though I'm also in the top 10 on others; my excuse is that I didn't participate in many of those questions). But you only get so many excuses, and it's hard to pick a fully neutral prior.
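To make the "underpowered" point concrete, here's a minimal simulation sketch in Python. Everything in it is made up (the noise model, the question count, the scoring); it's not Metaculus's actual scoring, and it doesn't even model the correlated-bets problem, which makes things worse. The point is just that with ~40 questions, the leaderboard rarely sorts forecasters by true skill:

```python
import random

random.seed(0)

N_FORECASTERS = 200
N_QUESTIONS = 40    # roughly the size of one subtournament
N_TRIALS = 200

times_best_skill_wins = 0
for _ in range(N_TRIALS):
    # Each forecaster's "skill" is how little noise they add to the
    # (unknown) true probability of each question; lower is better.
    noise = [random.uniform(0.05, 0.25) for _ in range(N_FORECASTERS)]
    brier = [0.0] * N_FORECASTERS
    for _ in range(N_QUESTIONS):
        p_true = random.uniform(0.05, 0.95)
        outcome = 1 if random.random() < p_true else 0
        for i in range(N_FORECASTERS):
            p = min(0.99, max(0.01, random.gauss(p_true, noise[i])))
            brier[i] += (p - outcome) ** 2   # lower Brier = better

    most_skilled = noise.index(min(noise))
    leaderboard_winner = brier.index(min(brier))
    times_best_skill_wins += (most_skilled == leaderboard_winner)

print(f"Most-skilled forecaster tops the leaderboard in "
      f"{times_best_skill_wins / N_TRIALS:.0%} of simulated tournaments")
```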
At worst, I can only be moderately confident that I'm somewhat above average at medium-term predictions of a novel pandemic, though I also don't want to be falsely humble. My best guess is that a) my position reflects some underlying skill more than luck, and b) there's some significant generalizability to other domains.
A lifetime ago, when I interviewed Liv Boeree about poker and EA, one thing she said really stuck with me. (I was dumb and didn't include it in the final interview, so hopefully I haven't butchered this rephrasing completely.) Roughly speaking: among professionals, the true measure of poker skill isn't how much money you make (because poker is both a high-skill and high-luck game, there's a lot of randomness); a better measure is the approval of your peers who are themselves professional poker players.
This was really memorable to me because I'd always had the impression of poker as an extremely objective game with a very clear winning criterion (I guess, from the outside, so is forecasting). If you can't even have a clear and externally legible metric for poker, what hope does anything significantly fuzzier have?
That said, I do think this is a question of degree rather than kind. The rankings are an okay proxy for minimal competence. You probably shouldn't put much trust in forecasters (at least in their capacity as forecasters) who are below the 50th percentile in the domain they're asked to forecast on, and maybe not in those below the 90th percentile either, unless there are strong countervailing reasons.
This was a lot of good discussion of epistemics, which I highly valued, but I was also hoping for some hot forecasting tips. ;) I'll try asking the question differently.
I understood your intent! :) I actually plan to answer the spirit of your question on Sunday; I just decided to break my general plan to "not answer questions until my official AMA time" because I thought the caveat was sufficiently important to have out in the open!
What do you think helps make you a better forecaster than the other 989+ people?
I’ll instead answer this as:
What helps you have a higher rating than most of the people below you on the leaderboard?
I probably answered more questions than most of them.
I update my forecasts more quickly than most of them, particularly in March and April.
Activity has consistently been shown to be one of the strongest (often the strongest) predictors of overall accuracy in the academic literature.
I suspect I have a much stronger intuitive sense of probability/calibration.
For example, 17% (1:5 odds) intuitively feels very different to me from 20% (1:4 odds), and my sense is that this isn't too common.
This could just be arrogance, however; there isn't enough data for me to actually check this for real predictions (as opposed to calibration games). A sketch of the kind of check I mean appears at the end of this list.
I feel like I actually have lower epistemic humility than most forecasters in the top 100 or so on Metaculus, where "epistemic humility" is defined narrowly as "willingness to make updates based on arguments I don't find internally plausible, just because others believe them."
The caveat is that I'm making this comparison solely to top X% (in either activity or accuracy) forecasters.
I suspect a fair number of other forecasters are just wildly overconfident (in both senses of the term).
Certainly, non-forecasters (TV pundits, say, or just people I see on the internet) frequently seem very overconfident for what seems to me like bad reasons.
A certain epistemic attitude that I associate with both Silicon Valley and Less Wrong/rationalist culture is "strong opinions, held lightly."
This is where you strongly believe concrete, explicit, and overly specific models of the world, but quickly update whenever someone points out a hole in your reasoning.
I suspect this attitude is good for things like software design and maybe novel research, but is bad for having good explicit probabilities for Metaculus-style questions.
I’m a pretty competitive person, and I care about scoring well.
This might be surprising, but I think a lot of forecasters don’t.
Some forecasters just want to record their predictions publicly and be held accountable to them, or want to cultivate more epistemic humility by seeing themselves be wrong.
I think these are perfectly legitimate uses of forecasting, and I actively encourage my friends to use Metaculus and other prediction platforms to do this.
However, it should not be surprising that people who want to score well end up on average scoring better.
So I do a bunch of things like meditate on my mistakes and try really hard to do better. I think most forecasters, including good ones, do this much less than I do.
I know more facts about covid-19.
I think the value of this is actually exaggerated, but it probably helps a little.
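On the calibration point above: here's a minimal sketch of the check I'd want to run, assuming you can export your resolved predictions as (stated probability, outcome) pairs. The helper is hypothetical, not anything Metaculus provides, and the data at the bottom is made up:

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: iterable of (stated_probability, outcome) pairs,
    where outcome is 1 if the event happened and 0 otherwise."""
    buckets = defaultdict(list)
    for p, outcome in predictions:
        buckets[round(p, 1)].append(outcome)   # group into ~10%-wide buckets
    for p in sorted(buckets):
        hits = buckets[p]
        print(f"stated ~{p:.0%}: resolved YES {sum(hits) / len(hits):.0%} "
              f"of the time (n={len(hits)})")

# Toy usage with made-up data:
calibration_table([(0.17, 0), (0.22, 0), (0.7, 1), (0.68, 1), (0.9, 1)])
```

With only a handful of resolved questions per bucket, the observed frequencies are too noisy to distinguish, say, a 17%-forecaster from a 20%-forecaster, which is exactly the data problem I mean.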
_____
What do you think other forecasters do to make them have a higher rating than you? [Paraphrased]
Okay, a major caveat here is that I think there's plenty of heterogeneity among forecasters. Another is that I obviously don't have clear insight into why other forecasters are better than me (otherwise I'd have done better!). However, in general, I'm guessing they:
Have more experience with forecasting.
I started in early March, and I think many of them had already been forecasting for a year or more (some for 5+ years!).
I think experience probably helps a lot in building intuition and avoiding a lot of subtle (and not-so-subtle!) mistakes.
Usually forecast more questions than me.
It takes me some effort to forecast on new questions, particularly if the template is different from questions I've forecasted on before, or if they aren't something I've thought about in a non-forecasting context.
I know some people in the Top 10 literally forecast all questions on Metaculus, which seems like a large time commitment to me.
Update their forecasts more quickly than me, particularly in May and June.
Back in March and April, I was *super* "on top of my game." But right now I have a backlog of old predictions, and I'm >30 days behind on the earliest one (as in, the last time I updated that prediction was 30+ days ago).
This is partially due to doing more covid forecasting at my day job, partially due to having some other hobbies, and partially due to general fatigue/loss of interest (akin to the lockdown fatigue others report).
On average, they’re more inclined to do simple mathematical modeling (Guesstimate, Excel, Google Sheets, foretold etc), whereas personally I’m often (not always) satisfied with a few jotted notes on a Google Doc plus a simple arithmetic calculator.
There are also more specific reasons some other forecasters are better than me, though I don't think these apply to all or even most of the forecasters above me:
JGalt seems to read the news both more and more efficiently than I do, and probably knows much more factual information than me.
In particular, I recall many times when I saw interesting news on Twitter or elsewhere, wanted to bring it to Metaculus, and bam, JGalt had already linked it ahead of me.
This is practically a running meme among Metaculus users that JGalt has read all the news.
Lukas Gloor and Pablo Stafforini plausibly have stronger internal causal models of various covid-19-related issues.
datscilly often decomposes questions more cleanly than me and (unlike me and several other forecasters) appears to aggressively prioritize not updating on irrelevant information.
He also cares about scores more than I do.
I think Pablo, datscilly, and some others started predicting on covid-19 questions almost as soon as the pandemic started, so they've built up more experience than me not only in general forecasting, but also in forecasting covid-19 questions specifically.
At least, this is what I can gather from their public comments and (in some cases) private conversations. It's much harder for me to tell how forecasters who are above me on the leaderboard but otherwise mostly silent think.
1.) This is amazing, thank you. Strongly upvoted—I learned a lot.
2.) Can we have an AMA with JGalt where he teaches us how to read all the news?