Some learnings I had from forecasting in 2020
crossposted from my own short-form
Here are some things I’ve learned from spending a decent fraction of the last 6 months either forecasting or thinking about forecasting, with an eye towards beliefs that I expect to be fairly generalizable to other endeavors.
Before reading this post, I recommend brushing up on Tetlock’s work on (super)forecasting, particularly Tetlock’s 10 commandments for aspiring superforecasters.
1. Forming (good) outside views is often hard but not impossible.
I think there is a common belief/framing in EA and rationalist circles that coming up with outside views is easy, and that the real difficulties are a) originality in inside views and b) deciding how much to trust outside views versus inside views.
I think this is directionally true (original thought is harder than synthesizing existing views), but it hides a lot of the details. It’s often quite difficult to come up with good outside views that apply to a situation, and to balance them against each other. See Manheim and Muehlhauser for some discussions of this.
2. For novel out-of-distribution situations, “normal” people often trust centralized data/ontologies more than is warranted.
See here for a discussion. I believe something similar is true for trust of domain experts, though this is more debatable.
3. The EA community overrates the predictive validity and epistemic superiority of forecasters/forecasting.
(Note that I think this is an improvement over the status quo in broader society, where by default approximately nobody trusts generalist forecasters at all.)
I’ve had several conversations where EAs will ask me to make a prediction, I’ll think about it a bit and say something like “I dunno, 10%?” and people will treat it as a fully informed prediction to base decisions on, rather than as just another source of information among many.
I think this is clearly wrong. In almost any situation where you are a reasonable person and have spent 10x (sometimes 100x or more!) as much time thinking about a question as I have, you should trust your own judgment on that question much more than mine.
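One way to make “just another source of information among many” concrete is to pool forecasts rather than adopt any single one. Below is a minimal sketch, assuming a weighted average in log-odds space with weights standing in for how much effort went into each forecast; the pooling method, the weights, and the function names are illustrative assumptions, not something the post prescribes.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def pooled_probability(forecasts: list[tuple[float, float]]) -> float:
    """Weighted mean of forecasts in log-odds space.

    Each item is (probability, weight). The weights are hypothetical
    stand-ins for how much relevant thought went into each forecast.
    """
    total_weight = sum(w for _, w in forecasts)
    mean_log_odds = sum(w * logit(p) for p, w in forecasts) / total_weight
    return sigmoid(mean_log_odds)


# A quick "I dunno, 10%?" pooled with your own estimate made after ~10x the effort.
quick_take = (0.10, 1.0)
careful_estimate = (0.40, 10.0)
print(round(pooled_probability([quick_take, careful_estimate]), 2))
```

With these made-up weights the pooled number lands close to the careful estimate: the quick guess nudges your view a little but doesn’t replace it.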
To a first approximation, good forecasters have three things: 1) They’re fairly smart. 2) They’re willing to actually do the homework. 3) They have an intuitive sense of probability.
This is not nothing, but it’s also pretty far from everything you want in an epistemic source.
4. The EA community overrates Superforecasters and Superforecasting techniques.
I think the types of questions and responses the various Good Judgment organizations are interested in represent a particular way of looking at the world. I don’t think it is always applicable (an easy EA-relevant example: your Brier score is basically the same if you give 0% to events that deserve 1%, and vice versa), and it’s bad epistemics to collapse all of “figure out the future in a quantifiable manner” into a single paradigm.
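The 0%-versus-1% point can be checked directly. Below is a minimal sketch comparing the Brier score with the logarithmic score, a different proper scoring rule that the post doesn’t mention, used here only as a contrast because it does sharply separate the two forecasts.

```python
import math


def brier(p: float, outcome: int) -> float:
    """Squared error between forecast probability p and a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def log_score(p: float, outcome: int) -> float:
    """Natural log of the probability given to what actually happened (higher is better)."""
    prob_of_outcome = p if outcome == 1 else 1 - p
    return math.log(prob_of_outcome) if prob_of_outcome > 0 else -math.inf


for outcome in (0, 1):
    for p in (0.0, 0.01):
        print(f"outcome={outcome} p={p}: Brier={brier(p, outcome):.4f}, log={log_score(p, outcome):.3f}")
```

On either outcome, the Brier scores for 0% and 1% differ by at most about 0.02, while the log score gives an infinite penalty to a 0% forecast on an event that then happens.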
Likewise, I don’t think there’s a clear dividing line between good forecasters and GJP-certified Superforecasters, so many of the issues I mentioned in #3 are just as applicable here.
I’m not sure how to compress everything I’ve learned on this topic into a few short paragraphs, but the tl;dr is that before I started forecasting, I trusted superforecasters much more than I trusted other EAs, and now I consider their opinions and forecasts “just” an important component of my overall thinking, rather than a clear epistemic superior to defer to.
5. Good intuitions are really important.
I think there’s a Straw Vulcan approach to rationality where people think “good” rationality is about suppressing your System 1 in favor of clear thinking and logical propositions from your System 2. I think there’s plenty of evidence that this is wrong*. For example, the cognitive reflection test was originally supposed to measure how well people suppress their “intuitive” answers in order to think through the question and give the right “unintuitive” answers. However, we’ve since learned (from one fairly good psych study; it may not replicate, but it accords with my intuitions and recent experiences) that more “cognitively reflective” people also had more accurate initial answers when they didn’t have time to think the question through.
On a more practical level, I think a fair amount of good thinking is using your System 2 to train your intuitions, so you have better and better first impressions and taste for how to improve your understanding of the world in the future.
*I think my claim so far is fairly uncontroversial; for example, I expect CFAR to agree with a lot of what I say.
6. Relatedly, most of my forecasting mistakes are due to emotional rather than technical reasons.
Here’s a Twitter thread from May exploring why; I think I still mostly stand by it.
This seems to be true and also to be an emerging consensus (at least here on the forum).
I’ve only been forecasting for a few months, but it’s starting to seem to me like forecasting does have quite a lot of value—as valuable training in reasoning, and as a way of enforcing a common language around discussion of possible futures. The accuracy of the predictions themselves seems secondary to the way that forecasting serves as a calibration exercise. I’d really like to see empirical work on this, but anecdotally it does feel like it has improved my own reasoning somewhat. Curious to hear your thoughts.
Thanks for the comment!
Can you point to some examples?
This seems right to me. I think society as a whole underprices forecasting, and EA underprices a bunch of subniches within forecasting (even if they overrate predictive validity specifically).
I think this is right. I think to some degree, the value of forecasting is similar to what Parfit ascribes to thought experiments:
Similarly, I think a lot of the value of writing down probabilities and distributions is as a way to maintain internal coherence/validity, and to help represent and bring to the forefront what I actually believe.
This sounds right to me. Stefan Schubert has a fun comparison of forecasting and analytic philosophy.
Do your opinion updates extend from individual forecasts to aggregated ones? In particular, how reliable do you think the Metaculus median AGI timeline is?
On the one hand, my opinion of Metaculus predictions worsened as I saw how the ‘recent predictions’ showed people piling in on the median on some questions I watch. On the other hand, my opinion of Metaculus predictions improved as I found out that performance doesn’t seem to fall as a function of ‘resolve minus closing’ time (see https://twitter.com/tenthkrige/status/1296401128469471235). Are there some observations which have swayed your opinion in similar ways?
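The ‘resolve minus closing’ check can be sketched roughly like this: bucket resolved questions by the gap between closing and resolution, then compare mean Brier scores across buckets. The records, bucket boundaries, and function names below are hypothetical placeholders, not Metaculus data and not necessarily the method behind the linked analysis.

```python
from collections import defaultdict

# Hypothetical records: (probability at close, outcome 0/1,
# days from question close to resolution). Placeholder values only.
records = [
    (0.8, 1, 5), (0.3, 0, 12), (0.6, 1, 40),
    (0.2, 0, 90), (0.7, 0, 200), (0.9, 1, 400),
]


def gap_bucket(days: int) -> str:
    """Label a question by how long it took to resolve after closing."""
    if days < 30:
        return "<30d"
    if days < 365:
        return "30-364d"
    return ">=365d"


def mean_brier_by_gap(rows: list[tuple[float, int, int]]) -> dict[str, float]:
    """Average Brier score within each close-to-resolution bucket."""
    scores: dict[str, list[float]] = defaultdict(list)
    for p, outcome, days in rows:
        scores[gap_bucket(days)].append((p - outcome) ** 2)
    return {label: sum(s) / len(s) for label, s in scores.items()}


print(mean_brier_by_gap(records))
```

If performance really doesn’t degrade with longer gaps, the bucket averages should stay roughly flat on real data; six toy rows obviously can’t show that either way.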
With regard to the AGI timeline, it’s important to note that Metaculus’ resolution criteria are quite different from a ‘standard’ interpretation of what would constitute AGI[1] (or human-level AI[2], superintelligence[3], transformative AI, etc.). It’s also unclear what proportion of forecasters have read this fine print (interested to hear others’ views on this), which further complicates interpretation.
[1] OpenAI Charter
[2] expert survey
[3] Bostrom
Agreed. I’ve been trying to help out a bit with Matt Barnett’s new question here. The feedback period is still open, so chime in if you have ideas!
I suspect most Metaculites are accustomed to paying attention to how a question’s operationalization deviates from its intent, FWIW. Personally, I find the Montezuma’s Revenge criterion quite important; without it, the question would be far from AGI.
My intent in bringing up this question was more to ask how Linch thinks about the reliability of long-term predictions that have no obvious frequentist-friendly track record to look at.
Can you say more about this? I ask because this behavior seems consistent with an attitude of epistemic deference towards the community prediction when individual predictors perceive it to be superior to what they can themselves predict given their time and ability constraints.
Sure, at an individual level deference usually makes for better predictions, but at a community level, deference-as-the-norm can dilute the weight of those who are informed and predict differently from the median. Excessive numbers of deferential predictions also obscure how reliable the median prediction is, and thus make it harder for others to update on the median in an informed way.
As you say, it’s better if people contribute information where their relative value-add is greatest, so I’d say it’s reasonable for people to have a 2:1 ratio of questions on which they deviate from the median to questions on which they follow it. My vague impression is that the actual ratio may be lower, especially for people predicting events with time horizons under a year. I think you, Linch, and other heavier Metaculus users may have a better-informed impression here, though, so I’d be happy to see disagreement.
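A toy example of the dilution effect: if several predictors copy the current median before better-informed forecasts arrive, those later forecasts move the median much less. The numbers below are made up purely for illustration.

```python
from statistics import median

early = [0.2, 0.3, 0.4]           # independent early forecasts
deferrers = [median(early)] * 4   # predictors who copy the then-current median (0.3)
informed = [0.7, 0.75, 0.8]       # later forecasts from better-informed predictors

without_deference = median(early + informed)
with_deference = median(early + deferrers + informed)

print(f"median without deferrers: {without_deference:.2f}")
print(f"median with deferrers:    {with_deference:.2f}")
```

The deferrers add no new information, yet they hold the median at 0.30 instead of letting it move to about 0.55, and someone looking at ten forecasts can’t tell that only six of them were independent.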
I think it would be interesting to have a Metaculus on which for every prediction you have to select a general category for your update e.g. “New Probability Calculation”, “Updated to Median”, “Information source released”, etc. Seeing the various distributions for each would likely be quite informative.
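The suggested update categories could be represented as a small tagged log. This is a hypothetical sketch: the `Update` record, its fields, and the sample entries are invented for illustration. It groups probability shifts by stated reason and also tallies median-following updates against all others, the kind of ratio discussed in the previous comment.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Update:
    """One hypothetical forecast update tagged with the reason for it."""
    question_id: int
    old_p: float
    new_p: float
    category: str  # e.g. "New Probability Calculation", "Updated to Median"


# Invented sample log for illustration only.
update_log = [
    Update(1, 0.30, 0.45, "New Probability Calculation"),
    Update(1, 0.45, 0.35, "Updated to Median"),
    Update(2, 0.60, 0.80, "Information source released"),
    Update(3, 0.10, 0.20, "Updated to Median"),
]


def shifts_by_category(updates: list[Update]) -> dict[str, list[float]]:
    """Group the size and direction of each probability shift by its stated reason."""
    shifts: dict[str, list[float]] = defaultdict(list)
    for u in updates:
        shifts[u.category].append(round(u.new_p - u.old_p, 4))
    return dict(shifts)


counts = Counter(u.category for u in update_log)
following = counts["Updated to Median"]
other = sum(counts.values()) - following

print(shifts_by_category(update_log))
print(f"other updates : median-following updates = {other}:{following}")
```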
I think the best individual forecasters are, on average, better than the aggregate Metaculus forecasts at the moment they make the prediction, especially if they’ve spent a while on it. I’m less sure once you account for prediction lag (the Metaculus and community predictions are usually better at incorporating new information), and my assessment there will depend on a bunch of details.
I think as noted by matthew.vandermerwe, the Metaculus question operationalization for “AGI” is very different from what our community typically uses. I don’t have a strong opinion on whether a random AI Safety person will do better on that operationalization.
For something closer to what EAs care about, I’m pretty suspicious of the current forecasts given for existential risk/GCR estimates (for example in the Ragnarok series), and generally do not think existential risk researchers should strongly defer to them (though I suspect the forecasts/comments are good enough that they’re generally worth reading for most x-risk researchers studying the relevant questions).