Hi Dan,

Thanks for writing this! Some (weakly held) points of skepticism:
I find it a bit nebulous what you do and don’t count as a rationale. Similarly to Eli,* I think on some readings of your post, “forecasting” becomes very broad and just encompasses all of research. Obviously, research is important!
Rationales are costly! Taking that into account, I think there is a role to play for “just the numbers” forecasting, e.g.:
Sometimes you just want to defer to others, especially if an existing track record establishes that the numbers are reliable. For instance, when looking at weather forecasts, or (at least until last year) looking at 538’s numbers for an upcoming election, it would be great if you understood all the details of what goes into the numbers, but the numbers themselves are plenty useful, too.
Even without a track record, just-the-number forecasts give you a baseline of what people believe, which allows you to observe big shifts. I’ve heard many people express things like “I don’t defer to the Metaculus community prediction on AGI arrival, but it was surely informative to see by how much it has moved over the last few years”.
Just-the-number forecasts let you spot disagreements with other people, which helps you find out where talking about rationales/models is particularly important.
I’m worried that in the context of getting high-stakes decision makers to use forecasts, some of the demand for rationales is due to lack of trust in the forecasts. Replying to this demand with AI-generated rationales might merely shift the skeptical take from “they’re just making up numbers” to “it’s all based on LLM hallucinations”, which I’m not sure really addresses the underlying problem.
*OTOH, I think Eli is also hinting at a definition of forecasting that is too narrow. I do think that generating models/rationales is part of forecasting as it is commonly understood (including in EA circles), and certainly don’t agree that forecasting by definition means that little effort was put into it! Maybe the right place to draw the line between forecasting rationales and “just general research” is asking “is the model/rationale for the most part tightly linked to the numerical forecast?” If yes, it’s forecasting, if not, it’s something else.
> on some readings of your post, “forecasting” becomes very broad and just encompasses all of research.
[Disclaimer: I’m working for FutureSearch]

To add another perspective: reasoning helps with aggregating forecasts. Consider one of the motivating examples for extremising, where, IIRC, some US president is handed several (well-calibrated, say) estimates around ≈70% for P(head of some terrorist organisation is in location X). If these estimates came from different, independent sources, the aggregate ought to be higher than 70%, whereas if they are all based on the same few sources, 70% may be one’s best guess.
This is also something that a lot of forecasters may just do subconsciously when considering different points of view (which may be something as simple as different base rates or something as complicated as different AGI arrival models).
So from an engineering perspective there is a lot of value in providing rationales, even if they don’t show up in the final forecasts.
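The extremising intuition above can be sketched numerically. This is only a hypothetical illustration: the pooling rule (averaging log-odds) and the extremising parameter `d` are my own assumptions, not anything specified in the thread.

```python
import math

def extremize(probs, d=2.5):
    """Pool probability estimates by averaging their log-odds, then
    extremize by multiplying the pooled log-odds by d.
    d > 1 pushes the aggregate away from 0.5, which is appropriate when
    the individual estimates draw on independent evidence; d = 1 (no
    extremizing) is appropriate when they share the same sources."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled = sum(log_odds) / len(log_odds)
    extremized = d * pooled
    return 1 / (1 + math.exp(-extremized))

# Three well-calibrated 70% estimates from independent sources:
print(extremize([0.7, 0.7, 0.7], d=2.5))  # noticeably above 0.7
# The same three numbers derived from shared sources:
print(extremize([0.7, 0.7, 0.7], d=1.0))  # stays at 0.7
```

With `d = 2.5` (a commonly suggested ballpark, but still an assumption here), three independent 70% estimates aggregate to roughly 0.89, matching the intuition that independent corroboration should move the aggregate past any single estimate.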
Yeah, I do like your four examples of “just the numbers” forecasts that are valuable: weather, elections, what people believe, and “where is there lots of disagreement?”. I’m more skeptical that these are useful rather than merely curiosity-satisfying.
Election forecasts are a case in point. People will usually prepare for all outcomes regardless of the odds. And if you work in politics, deciding who to choose for VP or where to spend your marginal ad dollar, you need models of voter behavior.
The best case for just-the-numbers is probably your point (b), shift-detection. I echo your point that many people seem struck by the shift in AGI risk on the Metaculus question.
> I’m worried that in the context of getting high-stakes decision makers to use forecasts, some of the demand for rationales is due to lack of trust in the forecasts.
Undoubtedly some of it is. Anecdotally, though, high-level folks frequently take one (or zero) glances at the calibration chart, nod, and then say “but how am I supposed to use this?”, even on questions I pick to be highly relevant to them, just like the paper I cited finding “decision-makers lacking interest in probability estimates.”
Even if you’re (rightly) skeptical about AI-generated rationales, I think the point holds for human rationales. One example: Why did DeepMind hire Swift Centre forecasters when they already had Metaculus forecasts on the same topics, as well as access to a large internal prediction market?
> I do think that generating models/rationales is part of forecasting as it is commonly understood (including in EA circles), and certainly don’t agree that forecasting by definition means that little effort was put into it! Maybe the right place to draw the line between forecasting rationales and “just general research” is asking “is the model/rationale for the most part tightly linked to the numerical forecast?” If yes, it’s forecasting, if not, it’s something else.
Thanks for clarifying! Would you consider OpenPhil worldview investigations reports, such as Scheming AIs, Is power-seeking AI an existential risk, Bio Anchors, and Davidson’s takeoff model, to be forecasting? It seems to me that they are forecasting in a relevant sense, and (for all except Scheming AIs, maybe?) in the sense you describe of a rationale tightly linked to a numerical forecast, but that they wouldn’t fit under the OP forecasting program area (correct me if I’m wrong).
Maybe it’s not worth spending too much time on these terminological disputes; perhaps the relevant question for the community is what the scope of your grantmaking program is. If the months-to-years-long reports above indeed wouldn’t be covered, then it seems to me that the amount of effort spent is a relevant dimension of what counts as “research with a forecast attached” vs. “forecasting as generally understood in EA circles and covered under your program”, so it might be worth clarifying the boundaries there. If you would indeed consider reports like the worldview investigations ones to fall under your program, then never mind, but it would still be good to clarify, as I’d guess most people would not expect that.
I think it’s borderline whether reports of this type are forecasting as commonly understood, but would personally lean no in the specific cases you mention (except maybe the bio anchors report).
I really don’t think that this intuition is driven by the amount of time or effort that went into them, but rather the percentage of intellectual labor that went into something like “quantifying uncertainty” (rather than, e.g. establishing empirical facts, reviewing the literature, or analyzing the structure of commonly-made arguments).
As for our grantmaking program: I expect we’ll have a more detailed description of what we want to cover later this year, where we might also address points about the boundaries to worldview investigations.