I don’t think this is an accurate summary of the disagreement, but I’ve tried to clarify my point twice already, so I’m going to leave it at that.
BenjaminTereick
1. Are you referring to your exchange with David Mathers here?
3. I’m not sure what you’re saying here. Just to clarify what my point is: you’re arguing in the post that the slow scenario actually describes big improvements in AI capabilities. My counterpoint is that this scenario is not given a lot of weight by the respondents, suggesting that they mostly don’t agree with you on this.
Thanks for the replies!
I agree that the scenario question could have been phrased better in hindsight, or maybe could have included an option like “progress falling behind all three scenarios”. I also agree that, given the way the question was asked, the summary on p. 38 is slightly inaccurate. (It doesn’t seem like a big issue to me, but that’s probably downstream of my disagreeing that the “slow” scenario describes AGI-like capabilities.)
Fair.
I’m not saying that the responses show that there’s no framing effect. I’m saying that they seem to indicate that at least for most respondents, the description of the slow scenario didn’t seem wildly off as a floor of what could be expected.
I forgot to include the text of the question in my post. I just added it now.
I think it would also be fair to include the disclaimer in the question I quoted above.
There are several capabilities mentioned in the slow progress scenario that seem indicative of AGI or something close, such as the ability of AI systems to largely automate various kinds of labour (e.g. research assistant, software engineer, customer service, novelist, musician, personal assistant, travel agent)
I would read the scenario as AI being able to do some of the tasks required by these jobs, but not to fully replace humans doing them, which I would think is the defining characteristic of slow AI progress scenarios.
I’m confused by this, for a few reasons:
The question asks what scenario a future panel would believe is “best matching” the general level of AI progress in 2030, so if things fell short of the “slow” scenario, it would still be the best matching. This point is also reinforced by the instructions: “Reasonable people may disagree with our characterization of what constitutes slow, moderate, or rapid AI progress. Or they may expect to see slow progress observed with some AI capabilities and moderate or fast progress in others. Nevertheless, we ask you to select which scenario, in sum, you feel best represents your views” (p. 104).
There are several other questions in the survey that allow responses indicating very low capability levels or low societal impact. If there is a huge framing effect in the scenario question, it would have to strongly affect answers to these other questions, too (which I think is implausible), or else you should be able to show a mismatch between these questions and the scenario question (which I don’t think there is).
The actual answers don’t seem to reflect the view that most respondents believe that the slow scenario represents a very high bar (unless, again, you believe the framing effect is extremely strong): “By 2030, the average expert thinks there is a 23% chance that the state of AI most closely mirrors a “rapid” progress scenario [...]. Conversely, they give a 28% chance of a slow-progress scenario, in which AI is a widely useful assisting technology but falls short of transformative impact”.
Aside from these methodological points, I’m also surprised that you believe that the slow scenario constitutes AI that is “either nearly AGI or AGI outright”. Out of curiosity, which capability mentioned in the “slow” scenario do you think is the most implausible by 2030? To me, most of these seem pretty close to what we already have in 2025.
[disclaimer: I recommended a major grant to FRI this year, and I’ve discussed LEAP with them several times]
Seems true. But then again, it feels suspicious that a cleverly worded and funny Quick Take would just happen to be one of the rare true generalizing statements about human psychology…
Thanks, that’s a helpful clarification! “Allowed” still feels like a strong choice of words, but I can see that the line between that and “I’m not sure if this will be perceived as annoying” is blurry, and also, the latter feels frustrating enough.
I’m only speaking in personal capacity here, but my strong preference would always be for these questions to be raised!
I’m not sure if I’m allowed to ask this [...].
Maybe you don’t mean this literally, but I find this really sad, kind of horrifying even. Who do you think wouldn’t allow you to ask this question, and why?
Disagree-voted. I think there are issues with the Neglectedness heuristic, but I don’t think the N in ITN is fully captured by I and T.
For example, one possible rephrasing of ITN (certainly not covering all the ways in which it is used) is:
1. Would it be good to solve problem P?
2. Can I solve P?
3. How many other people are trying to solve P?
I think this is a great way to decompose some decision problems. For instance, it seems very useful for thinking about prioritizing research, because (3) helps you answer the important question “If I don’t solve P, will someone else?” (even if this is also affected by 2).
(edited. Originally, I put the question “If I don’t solve P, will someone else?” under 3., which was a bit sloppy)
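The counterfactual question above (“If I don’t solve P, will someone else?”) can be made concrete with a toy expected-value calculation. This is a sketch of my own, not anything from the original discussion; the function name, parameters, and numbers are all illustrative assumptions:

```python
def counterfactual_value(
    value_if_solved: float,       # Q1: how good is it if P gets solved?
    p_i_solve: float,             # Q2: chance I solve P if I try
    p_solved_without_me: float,   # Q3 (roughly): chance P is solved anyway
) -> float:
    """Expected value my effort adds beyond what would happen without me."""
    # P is solved if I succeed, or if someone else does (treated as independent).
    p_solved_with_me = 1 - (1 - p_i_solve) * (1 - p_solved_without_me)
    return value_if_solved * (p_solved_with_me - p_solved_without_me)

# A crowded problem (high chance others solve it) shrinks my counterfactual
# contribution, even if my own ability and the problem's value are unchanged:
print(counterfactual_value(100, 0.5, 0.1))  # neglected problem
print(counterfactual_value(100, 0.5, 0.9))  # crowded problem
```

The point of the toy model is just that question (3) enters through `p_solved_without_me`: holding importance and tractability fixed, more people working on P reduces the value my own effort adds.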
I think it’s borderline whether reports of this type are forecasting as commonly understood, but would personally lean no in the specific cases you mention (except maybe the bio anchors report).
I really don’t think that this intuition is driven by the amount of time or effort that went into them, but rather by the percentage of intellectual labor that went into something like “quantifying uncertainty” (rather than, e.g., establishing empirical facts, reviewing the literature, or analyzing the structure of commonly-made arguments).

As for our grantmaking program: I expect we’ll have a more detailed description of what we want to cover later this year, where we might also address points about the boundaries to worldview investigations.
Hi Dan,
Thanks for writing this! Some (weakly-held) points of skepticism:

I find it a bit nebulous what you do and don’t count as a rationale. Similarly to Eli,* I think on some readings of your post, “forecasting” becomes very broad and just encompasses all of research. Obviously, research is important!
Rationales are costly! Taking that into account, I think there is a role to play for “just the numbers” forecasting, e.g.:
Sometimes you just want to defer to others, especially if an existing track record establishes that the numbers are reliable. For instance, when looking at weather forecasts, or (at least until last year) looking at 538’s numbers for an upcoming election, it would be great if you understood all the details of what goes into the numbers, but the numbers themselves are plenty useful, too.
Even without a track record, just-the-number forecasts give you a baseline of what people believe, which allows you to observe big shifts. I’ve heard many people express things like “I don’t defer to Metaculus on AGI arrival, but it was surely informative to see how much the community prediction has moved over the last few years”.
Just-the-number forecasts let you spot disagreements with other people, which helps you find out where talking about rationales/models is particularly important.
I’m worried that, in the context of getting high-stakes decision-makers to use forecasts, some of the demand for rationales is due to a lack of trust in the forecasts. Replying to this demand with AI-generated rationales might just shift the skeptical take from “they’re just making up numbers” to “it’s all based on LLM hallucinations”, which I’m not sure really addresses the underlying problem.
*OTOH, I think Eli is also hinting at a definition of forecasting that is too narrow. I do think that generating models/rationales is part of forecasting as it is commonly understood (including in EA circles), and certainly don’t agree that forecasting by definition means that little effort was put into it!
Maybe the right place to draw the line between forecasting rationales and “just general research” is asking “is the model/rationale for the most part tightly linked to the numerical forecast?” If yes, it’s forecasting, if not, it’s something else.
As the program is about forecasting, what is your stance on the broader field of foresight & futures studies? Why is forecasting more promising than some other approaches to foresight?
We are open to considering projects in “forecasting-adjacent” areas, and projects that combine forecasting with ideas from related fields are certainly well within the scope of the program.
As for projects that would exclusively rely on other approaches: My worry is that non-probabilistic foresight techniques typically don’t have more to show in terms of evidence for their effectiveness, while being more ad hoc from a theoretical perspective.
Just confirming that informing our own decisions was part of the motivation for past grants, and I expect it to play an important role for our forecasting grants in the future.
[The forecasting money] seems to have overwhelmingly gone to community forecasting sites like Manifold and Metaculus. I don’t see anything like “paying 3 teams of 3 forecasters to compete against each other on some AI timelines questions”.
That’s directionally true, but I think “overwhelmingly” isn’t right.
We did not fund Manifold.
One of our largest forecasting grants went to FRI, which is not a platform.
While it’s fair to say that Metaculus is mostly a platform, it also runs externally-funded tournaments, and has a pro forecaster service.
There were a few grants to more narrowly defined projects.
Most of these are currently not assigned to forecasting as a cause area, but you can find them here (searching for “forecast” in our grants database); see especially those before August 2021. [Update: we have updated the labels, and these grants are now listed here.]
I expect that we’ll make more of these types of grants now that forecasting is a designated area with more capacity.
I’m glad to see the debate on decision relevance in the comments! I think that if we end up considering forecasting a successful focus area in 5-10 years, thinking hard about the value-add to decision-making will likely have played a crucial role in this success.
As for my own view, I do agree that judgmental / subjective probability forecasting hasn’t been as much of a success story as one might have expected about 10 years ago. I also agree that many of the stories people tell about the impact of forecasting naturally raise questions like “so why isn’t this a huge industry now? Why is this project a non-profit?”. We are likely to ask questions of this kind to prospective grantees way more often than grantmakers in other focus areas.
However, I (unsurprisingly) also disagree with the stronger claim that the lack of a large judgmental forecasting industry is conclusive evidence that forecasting doesn’t provide value and is just an EA hobby horse. While I don’t have capacity to engage in this debate deeply, a few points of rebuttal:

I do think there have been some successes. For instance, the XPT mentioned in this comment certainly affected the personal beliefs of some people in the EA community, and thereby had an influence on resource allocation and career decisions.
Forecasting, as such, is a large industry. I’d assign considerable weight to the idea that making judgmental forecasting a success of the kind that model-driven forecasting has been in areas like finance, marketing, or sports is a harder but solvable task. There might simply be a free-riding problem for investing the resources necessary to figure out the best way to make it work.
As a related indirect argument, forecasting has a pretty straightforward a priori case (more accurate information leads to better decision-making), and there are plenty of candidate explanations for why its widespread adoption would have been difficult despite forecasting having the potential to be widely useful (e.g. I’m sympathetic to the points made by MaxRa here). Thus, even after updating on the observation that judgmental forecasting hasn’t conquered the world yet, I don’t think we should assign high confidence that it will forever stay a niche industry.
As others have pointed out, only a fairly small fraction of Open Phil’s spending has gone into forecasting so far (about 1%), and this is unlikely to dramatically change in the future. The forecasting community doesn’t need to become a multi-billion industry to justify that level of spending.
This RFP is closing soon: it closes after February 6, so make sure to submit by then!