All views are my own rather than those of any organizations/groups that I’m affiliated with. Trying to share my current views relatively bluntly. Note that I am often cynical about things I’m involved in. Thanks to Adam Binks for feedback.
Edit: See also child comment for clarifications/updates.
Edit 2: I think the grantmaking program has different scope than I was expecting; see this comment by Benjamin for more.
Following some of the skeptical comments here, I figured it might be useful to quickly write up some personal takes on forecasting’s promise and the subareas I’m most excited about. Here “forecasting” is defined as things I would expect to be in the scope of OpenPhil’s program to fund (edit: i.e. things in the vein of “Tetlockian superforecasting” or general prediction markets/platforms, in which questions are often answered by lots of people spending a little time on them, without much incentive to provide deep rationales).
Overall, most forecasting grants that OP has made seem much lower EV than the AI safety grants (I’m not counting grants that seem more AI-y than forecasting-y, e.g. Epoch, and I believe these wouldn’t be covered by the new grantmaking program). In part due to my ASI timelines (10th (edit: ~15th) percentile ~2027, median ~late 2030s), I’m most excited about forecasting grants that are closely related to AI, though I’m not super confident that no non-AI related ones are above the bar.
I generally agree with the view that I’ve heard repeated a few times that EAs significantly overrate forecasting as a cause area, while the rest of the world significantly underrates it.
I think EAs often overrate superforecasters’ opinions; they’re not magic. A lot of superforecasters aren’t great (not just at general reasoning, but even at geopolitical forecasting); there’s plenty of variation in quality.
General quality: Becoming a superforecaster selects for some level of intelligence, open-mindedness, and intuitive forecasting sense among the small group of people who actually make 100 forecasts on GJOpen. There are tons of people (e.g. I’d guess very roughly 30-60% of AI safety full-time employees?) who would become superforecasters if they bothered to put in the time.
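For concreteness on what “forecasting skill” means here: GJOpen-style platforms typically score forecasters with the Brier score (a minimal sketch; real platforms use time-averaged and relative variants, and the numbers below are hypothetical):

```python
# Brier score: mean squared error between probabilistic forecasts and
# binary outcomes (1 = happened, 0 = didn't). Lower is better;
# a constant 50% forecast scores 0.25.
def brier_score(forecasts, outcomes):
    assert len(forecasts) == len(outcomes) and forecasts
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who said 90% on two events that happened and 20% on one that didn't:
print(brier_score([0.9, 0.9, 0.2], [1, 1, 0]))  # 0.02
```

Superforecaster status is roughly a function of sustained scores like this across many questions, which is why it selects for the traits above only among people who bother to forecast a lot.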
Some background: as I’ve written previously I’m intuitively skeptical of the benefits of large amounts of forecasting practice (i.e. would guess strong diminishing returns).
Specialties / domain expertise: Contra a caricatured “superforecasters are the best at any forecasting question” view, consider a grantmaker deciding whether to fund an organization. They are, whether explicitly or implicitly, forecasting a distribution of outcomes for the grant. But I’d guess most would agree that superforecasters would do significantly worse than grantmakers at this “forecasting question”. A similar argument could be made for many intellectual jobs, which could be framed as forecasting. The question of whether superforecasters are relatively better isn’t “Is this task answering a forecasting question?” but rather “What are the specific attributes of this forecasting question?”
Some people seem to think that the key difference between questions superforecasters are good at vs. smart domain experts is whether the questions are *resolvable* or *short-term*. I tend to think the main differences are along the axes of *domain-specificity* and *complexity*, though these are of course correlated with the other axes. Superforecasters are selected for being relatively good at short-term, often geopolitical questions.
As I’ve written previously: It varies based on the question/domain how much domain expertise matters, but ultimately I expect reasonable domain experts to make better forecasts than reasonable generalists in many domains.
There’s an extreme here where e.g. forecasting what the best chess move is obviously better done by chess experts rather than superforecasters.
So if we think of a spectrum from geopolitics to chess, it’s very unclear to me where things like long-term AI forecasts land.
This intuition seems to be consistent with the lack of quality existing evidence described in Arb’s report (which debunked the “superforecasters beat intelligence experts without classified information” claim!).
Similarly, I’m skeptical of the straw rationalist view that highly liquid, well-run prediction markets would be an insane societal boon, rather than a more moderate-to-large one (hard to operationalize, hope you get the vibe). See here for related takes. This might change with superhuman AI forecasters though, whose “time” might be more plentiful.
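As an aside on mechanics: the “liquidity” of such markets is often provided by an automated market maker such as Hanson’s logarithmic market scoring rule (LMSR), where a parameter b sets how much capital it takes to move the price (a toy sketch, not modeled on any particular platform):

```python
import math

# LMSR cost function for a binary market. q_yes, q_no are outstanding
# shares; b is the liquidity parameter (higher b = deeper market,
# prices move less per trade but the market maker's worst-case subsidy grows).
def lmsr_cost(q_yes, q_no, b):
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

def lmsr_price_yes(q_yes, q_no, b):
    """Current implied probability of YES."""
    e_yes = math.exp(q_yes / b)
    return e_yes / (e_yes + math.exp(q_no / b))

b = 100
# A trader pays the cost difference to buy 10 YES shares in a fresh market:
cost = lmsr_cost(10, 0, b) - lmsr_cost(0, 0, b)
print(round(lmsr_price_yes(10, 0, b), 3))  # price nudges from 0.5 to ~0.525
```

The point of the “highly liquid” qualifier above is roughly that b (or its equivalent) is large enough that informed traders can profitably correct mispricings without moving the price past their own estimate.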
Historically, OP-funded forecasting platforms (Metaculus, INFER) seem to be underwhelming on publicly observable impact per dollar (in terms of usefulness for important decision-makers, user activity, rationale quality, etc.). Maybe some private influence over decision-makers makes up for it, but I’m pretty skeptical.
Tbh, it’s not clear that these and other platforms currently provide more value to the world than the opportunity cost of the people who spend time on them. E.g. I was somewhat addicted to Metaculus and then later Manifold for a bit, and spent more time on these than I would reflectively endorse (though it’s plausible that they were mostly replacing something worse, like social media). I resonate with some of the comments on the EA Forum post mentioning that it’s a very nerd-sniping activity; forecasting to move up a leaderboard (esp. w/ quick-resolving questions) is quite addictive to me compared to normal work activities.
I’ve heard arguments that getting superforecasted probabilities on things is good because they’re more legible/credible because they’re “backed by science”. I don’t have an airtight argument against this, but it feels slimy to me due to my beliefs above about superforecaster quality.
Regarding whether forecasting orgs should try to make money, I’m in favor of pushing in that direction as a signal of actually providing value, though it’s of course a balance re: the incentives there and will depend on the org strategy.
The types of forecasting grants I’d feel most excited about atm are, roughly ordered, and without a claim that any are above OpenPhil’s GCR bar (and definitely not exhaustive, and biased toward things I’ve thought about recently):
Making AI products for forecasting/epistemics in the vein of FutureSearch and Elicit. I’m also interested in more lightweight forecasting/epistemic assistants.
FutureSearch and systems in recent papers are already pretty good at forecasting, and I expect substantial improvements soon with next-gen models.
I’m excited about making AIs push toward what’s true rather than what sounds right at first glance or is pushed by powerful actors.
However, even if we have good forecasting/epistemics AIs, I’m worried that they won’t convince people of the truth, since people are irrational and variance in their beliefs is often explained by status/power-seeking, vibes, social circles, etc. It seems especially hard to change people’s minds on very tribal things, which seem correlated with the most important beliefs to change.
AI friends might actually be more important than AI forecasters for epistemics, but that doesn’t mean AI forecasters are useless.
I might think/write more about this soon. See also Lukas’s Epistemics Project Ideas and ACX on AI for forecasting.
Judgmental forecasting of AI threat models, risks, etc., involving a mix of people who have AI / dangerous domain expertise and/or a very strong forecasting track record (>90th percentile superforecaster), ideally as many people as possible who have both. Not sure how helpful it will be, but it seems plausibly worth more people trying.
In particular, forecasting that can help inform risk assessment / RSPs seems like a great thing to try. See also discussion of the Delphi technique in the context of AGI risk assessment here. Malcolm Murray at GovAI is running a Delphi study to get estimates of likelihood and impact of various AI risks from experts.
This is related to a broader class of interventions that might look somewhat like a “structured review process” in which one would take an in-depth threat modeling report and have various people review and contribute their own forecasts in addition to qualitative feedback. My sense is that when superforecasters reviewed Joe Carlsmith’s p(doom) forecast in a similar vein that the result wasn’t that useful, but the exercise could plausibly be more useful with better quality reviews/forecasts. It’s unclear whether this would be a good use of resources above the usual ad-hoc/non-forecasting review process, but might be worth trying more.
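If reviewers in such a structured process each submit probabilities, one also has to pick an aggregation method; a common recommendation in the forecasting community is the geometric mean of odds rather than the simple mean of probabilities (a sketch with hypothetical numbers; the choice of pooling method is itself debated):

```python
import math

def pool_mean(probs):
    """Simple arithmetic mean of probabilities."""
    return sum(probs) / len(probs)

def pool_geo_mean_odds(probs):
    """Aggregate probabilities via the geometric mean of their odds."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean_log_odds = sum(log_odds) / len(log_odds)
    odds = math.exp(mean_log_odds)
    return odds / (1 + odds)

# Hypothetical p(doom)-style estimates from three reviewers:
reviews = [0.02, 0.10, 0.50]
print(round(pool_mean(reviews), 3))           # 0.207
print(round(pool_geo_mean_odds(reviews), 3))  # ~0.116
```

Note how the two methods can diverge substantially on exactly the kind of wide-spread estimates a threat-model review would produce, which is one reason the aggregation choice is worth making explicit rather than ad hoc.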
Forecasting tournaments on AI questions with large prize pools: I think these historically have been meh (e.g. the Metaculus one attracted few forecasters, wasn’t fun to forecast on (for me at least), and I’d guess significantly improved ~no important decisions), but I think it’s plausible things could go better now as AIs are much more capable, there are many more interesting and maybe important things to predict, etc.
Crafting forecasting questions that are cruxy on threat models / intervention prioritization between folks working on AI safety
It’s kind of wild that there has been so little success on this front. See frustrations from Alex Turner: “I think it’s not a coincidence that many of the ‘canonical alignment ideas’ somehow don’t make any testable predictions until AI takeoff has begun.” I worry that this will take a bunch of effort and not get very far (see Paul/Eliezer finding only a somewhat related bet re: their takeoff speeds disagreement), but it seems worth giving a more thorough shot with different participants.
I’m relatively excited about doing things within the AI safety group rather than between this group and others (e.g. superforecasters) because I expect the results might be more actionable for AI safety people. (edit: I got feedback that this bullet was too tribal and I think that might be right. I think that a better distinction might be preferring inclusion of people who’ve thought deeply about the future of AI, rather than e.g. superforecaster generalists)
I incorporated some snippets of a reflections section from a previous forecasting retrospective above, but there’s a little that I didn’t include if you’re inclined to check it out.
I feel like I need to reply here, as I work in this industry and am more inclined to defend it.
First, to be clear, I generally agree a lot with Eli on this. But I’m more bullish on epistemic infrastructure than he is.
Here are some quick things I’d flag. I might write a longer post on this issue later.
I’m similarly unsure about a lot of existing forecasting grants and research. In general, I’m not very excited about most academic-style forecasting research at the moment, and I don’t think there are many technical groups at all (maybe ~30 full time equivalents in the field, in organizations that I could see EAs funding, right now?).
I think that for further funding in this field to be exciting, funders should really work on designing/developing this field to emphasize the very best parts. The current median doesn’t seem great to me, but I think the potential has promise, and think that smart funding can really triple-down on the good stuff. I think it’s sort of unfair to compare forecasting funding (2024) to AI Safety funding (2024), as the latter has had much more time to become mature. This includes having better ideas for impact and attracting better people. I think that if funders just “funded the median projects”, then I’d expect the field to wind up in a similar place to where it is now—but if funders can really optimize, then I’d expect them to be taking a decent-EV risk. (Decent chance of failure, but some chance at us having a much more exciting field in 3-10 years.)
I’d prefer funders focus on “increasing wisdom and intelligence” or “epistemic infrastructure” than on “forecasting specifically”. I think that the focus on forecasting is over-limiting. That said, I could see an argument to starting from a forecasting angle, as other interventions in “wisdom and intelligence / epistemic infrastructure” are more speculative.
If I were deploying $50M here, I’d probably start out by heavily prioritizing prioritization work itself—work to better understand this area and what is exciting within it. (I explain more of this in the wisdom/intelligence post above). I generally think that there’s been way too little good investigation and prioritization work in this area.
Like Eli, I’m much more optimistic about “epistemic work to help EAs” than I am about “epistemic work to help all of society”, at very least in the short-term. Epistemics/forecasting work involves a lot of marginal cost per population helped, and I believe that “helping N EAs” is often much more impactful than helping N people from most other groups. (This is almost true by definition, for people of almost any particular background.)
I’d like to flag that I think that Metaculus/Manifold/Samotsvety/etc forecasting has been valuable for EA decision-making. I’d hate to give this up or de-prioritize this sort of strategy.
I don’t particularly trust EA decision-making right now. It’s not that I think I could personally do better, but rather that we are making decisions about really big things, and I think we have a lot of reason for humility. When choosing between “trying to better figure out how to think and what to do” vs. “trying to maximize the global intervention that we currently think is highest-EV,” I’m nervous about us ignoring the former and going all-in on the latter. That said, some of the crux might be that I’m less certain about our current marginal AI Safety interventions than I think Eli is.
Personally, around forecasting, I’m most excited about ambitious, software-heavy proposals. I imagine that AI will be a major part of any compelling story here.
I’d also quickly flag that around AI Safety—I agree that in some ways AI safety is very promising right now. There seems to have been a ton of great talent brought in recently, so there are some excellent people (at very least) to give funding to. I think it’s very unfortunate how small the technical AI safety grantmaking team is at OP. Personally I’d hope that this team could quickly get to 5-30 full time equivalents. However, I don’t think this needs to come at the expense of (much) forecasting/epistemics grantmaking capacity.
I think you can think of a lot of “EA epistemic/evaluation/forecasting work” as “internal tools/research for EA”. As such, I’d expect that it could make a lot of sense for us to allocate ~5-30% of our resources to it. Maybe 20% of that would be on the “R&D” to this part—perhaps more if you think this part is unusually exciting due to AI advancements. I personally am very interested in this latter part, but recognize it’s a fraction of a fraction of the full EA resources.
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting, and Sage which has done several forecasting projects. And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
Thoughts on some of your bullet points:
2. I think that for further funding in this field to be exciting, funders should really work on designing/developing this field to emphasize the very best parts. The current median doesn’t seem great to me, but I think the potential has promise, and think that smart funding can really triple-down on the good stuff. I think it’s sort of unfair to compare forecasting funding (2024) to AI Safety funding (2024), as the latter has had much more time to become mature. This includes having better ideas for impact and attracting better people. I think that if funders just “funded the median projects”, then I’d expect the field to wind up in a similar place to where it is now—but if funders can really optimize, then I’d expect them to be taking a decent-EV risk. (Decent chance of failure, but some chance at us having a much more exciting field in 3-10 years.)
I was trying to compare previous OP forecasting funding to previous AI safety funding. It’s not clear to me how different these were; sure, OP didn’t have a forecasting program, but AI safety was also very short-staffed. And re: the field maturing, idk; Tetlock has been doing work on this for a long time, and my impression is that AI safety also had very little effort going into it until the mid-to-late 2010s. I agree that funding potentially promising exploratory approaches is good, though.
3. I’d prefer funders focus on “increasing wisdom and intelligence” or “epistemic infrastructure” than on “forecasting specifically”. I think that the focus on forecasting is over-limiting. That said, I could see an argument to starting from a forecasting angle, as other interventions in “wisdom and intelligence / epistemic infrastructure” are more speculative.
Seems reasonable. I did like that post!
4. If I were deploying $50M here, I’d probably start out by heavily prioritizing prioritization work itself—work to better understand this area and what is exciting within it. (I explain more of this in the wisdom/intelligence post above). I generally think that there’s been way too little good investigation and prioritization work in this area.
Perhaps, but I think you gain a ton of info from actually trying to do stuff and iterating. I think prioritization work can sometimes seem more intuitively great than it ends up being, relative to the iteration strategy.
6. I’d like to flag that I think that Metaculus/Manifold/Samotsvety/etc forecasting has been valuable for EA decision-making. I’d hate to give this up or de-prioritize this sort of strategy.
I would love for this to be true! Am open to changing my mind based on a compelling analysis.
7. I don’t particularly trust EA decision-making right now. It’s not that I think I could personally do better, but rather that we are making decisions about really big things, and I think we have a lot of reason for humility. When choosing between “trying to better figure out how to think and what to do” vs. “trying to maximize the global intervention that we currently think is highest-EV,” I’m nervous about us ignoring the former and going all-in on the latter. That said, some of the crux might be that I’m less certain about our current marginal AI Safety interventions than I think Eli is.
There might be some difference in perceptions of the direct EV of marginal AI Safety interventions. There might also be differences in beliefs in the value of (a) prioritization research vs. (b) trying things out and iterating, as described above (perhaps we disagree on absolute value of both (a) and (b)).
8. Personally, around forecasting, I’m most excited about ambitious, software-heavy proposals. I imagine that AI will be a major part of any compelling story here.
Seems reasonable, though I’d guess we have different views on which ambitious, AI-related, software-heavy projects are most promising.
9. I’d also quickly flag that around AI Safety—I agree that in some ways AI safety is very promising right now. There seems to have been a ton of great talent brought in recently, so there are some excellent people (at very least) to give funding to. I think it’s very unfortunate how small the technical AI safety grantmaking team is at OP. Personally I’d hope that this team could quickly get to 5-30 full time equivalents. However, I don’t think this needs to come at the expense of (much) forecasting/epistemics grantmaking capacity.
I think you might be understating how fungible OpenPhil’s efforts are between AI safety (particularly governance team) and forecasting. Happy to chat in DM if you disagree. Otherwise reasonable point, though you’d ofc still have to do the math to make sure the forecasting program is worth it.
(edit: actually maybe the disagreement is still in the relative value of the work, depending on what you mean by “much” grantmaking capacity)
10. I think you can think of a lot of “EA epistemic/evaluation/forecasting work” as “internal tools/research for EA”. As such, I’d expect that it could make a lot of sense for us to allocate ~5-30% of our resources to it. Maybe 20% of that would be on the “R&D” to this part—perhaps more if you think this part is unusually exciting due to AI advancements. I personally am very interested in this latter part, but recognize it’s a fraction of a fraction of the full EA resources.
Seems unclear what should count as internal research for EA, e.g. are you counting OP’s worldview investigation team / AI strategy research in general? And re: AI advancements: they improve the promise of AI for forecasting/epistemics work, but they also shorten timelines, which points toward direct AI safety technical/governance work.
First, again, overall, I think we generally agree on most of this stuff.
Perhaps, but I think you gain a ton of info from actually trying to do stuff and iterating. I think prioritization work can sometimes seem more intuitively great than it ends up being, relative to the iteration strategy.
I agree to an extent. But I think there are some very profound prioritization questions that haven’t been researched much, and that I don’t expect us to gain much insight on from experimentation in the next few years. I’d still like us to do experimentation (if I were in charge of a $50M fund, I’d start spending it soon, just not as quickly as I would otherwise). For example:
How promising is it to improve the wisdom/intelligence of EAs vs. others?
How promising are brain-computer-interfaces vs. rationality training vs. forecasting?
What is a good strategy to encourage epistemic-helping AI, where philanthropists could have the most impact?
What kinds of benefits can we generically expect from forecasting/epistemics? How much should we aim for EAs to spend here?
I would love for this to be true! Am open to changing my mind based on a compelling analysis.
We might be disagreeing a bit on what the bar for “valuable for EA decision-making” is. I see a lot of forecasting like accounting—it rarely leads to a clear and large decision, but it’s good to do, and steers organizations in better directions. I personally rely heavily on prediction markets for key understandings of EA topics, and see that people like Scott Alexander and Zvi seem to as well. I know less about the inner workings of OP, but the fact that they continue to pay for predictions on their own questions seems like a sign. All that said, I think that ~95%+ of Manifold and a lot of Metaculus is not useful at all.
I think you might be understating how fungible OpenPhil’s efforts are between AI safety (particularly governance team) and forecasting
I’m not sure how much to focus on OP’s narrow choices here. I found it surprising that Javier went from governance to forecasting, and that previously it was the (very small) governance team that did forecasting. It’s possible that if I evaluated the situation and had control of it, I’d recommend that OP move marginal resources to governance from forecasting. But I’m a lot less interested in that question than in “is forecasting competitive with some EA activities, and how can we do it well?”
Seems unclear what should count as internal research for EA, e.g. are you counting OP worldview investigation team / AI strategy research in general?
Just chatted with @Ozzie Gooen about this and will hopefully release audio soon. I probably overstated a few things / gave a false impression of confidence in the parent in a few places (e.g., my tone was probably a little too harsh on non-AI-specific projects); hopefully the audio convo will give a more nuanced sense of my views. I’m also very interested in criticisms of my views and others sharing competing viewpoints.
Also want to emphasize the clarifications from my reply to Ozzie:
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting, and Sage which has done several forecasting projects (and am for the most part more pessimistic about forecasting than others in these groups/orgs). And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
All views are my own rather than those of any organizations/groups that I’m affiliated with. Trying to share my current views relatively bluntly. Note that I am often cynical about things I’m involved in. Thanks to Adam Binks for feedback.
Edit: See also child comment for clarifications/updates.
Edit 2: I think the grantmaking program has different scope than I was expecting; see this comment by Benjamin for more.
Following some of the skeptical comments here, I figured it might be useful to quickly write up some personal takes on forecasting’s promise and what subareas I’m most excited about (where “forecasting” (edit: is defined as things in the vein of “Tetlockian superforecasting” or general prediction markets/platforms, in which questions are often answered by lots of people spending a little time on them, without much incentive to provide deep rationales)
is defined as things I would expect to be in the scope of OpenPhil’s program to fund).Overall, most forecasting grants that OP has made seem much lower EV than the AI safety grants (I’m not counting grants that seem more AI-y than forecasting-y, e.g. Epoch, and I believe these wouldn’t be covered by the new grantmaking program). In part due to my ASI timelines (
10th(edit: ~15th) percentile ~2027, median ~late 2030s), I’m most excited about forecasting grants that are closely related to AI, though I’m not super confident that no non-AI related ones are above the bar.I generally agree with the view that I’ve heard repeated a few times that EAs significantly overrate forecasting as a cause area, while the rest of the world significantly underrates it.
I think EAs often overrate superforecasters’ opinions, they’re not magic. A lot of superforecasters aren’t great (at general reasoning, but even at geopolitical forecasting), there’s plenty of variation in quality.
General quality: Becoming a superforecaster selects for some level of intelligence, open-mindedness, and intuitive forecasting sense among the small group of people who actually make 100 forecasts on GJOpen. There are tons of people (e.g. I’d guess very roughly 30-60% of AI safety full-time employees?) who would become superforecasters if they bothered to put in the time.
Some background: as I’ve written previously I’m intuitively skeptical of the benefits of large amounts of forecasting practice (i.e. would guess strong diminishing returns).
Specialties / domain expertise: Contra a caricturized “superforecasters are the best at any forecasting questions” view, consider a grantmaker deciding whether to fund an organization. They are, whether explicitly or implicitly, forecasting a distribution of outcomes for the grant. But I’d guess most would agree that superforecasters would do significantly worse than grantmakers at this “forecasting question”. A similar argument could be made for many intellectual jobs, which could be framed as forecasting. The question on whether superforecasters are relatively better isn’t “Is this task answering a forecasting question“ but rather “What are the specific attributes of this forecasting question”.
Some people seem to think that the key difference between questions superforecasters are good at vs. smart domain experts are in questions that are *resolvable* or *short-term*. I tend to think that the main differences are along the axes of *domain-specificity* and *complexity*, though these are of course correlated with the other axes. Superforecasters are selected for being relatively good at short-term, often geopolitical questions.
As I’ve written previously: It varies based on the question/domain how much domain expertise matters, but ultimately I expect reasonable domain experts to make better forecasts than reasonable generalists in many domains.
There’s an extreme here where e.g. forecasting what the best chess move is obviously better done by chess experts rather than superforecasters.
So if we think of a spectrum from geopolitics to chess, it’s very unclear to me where things like long-term AI forecasts land.
This intuition seems to be consistent with the lack of quality existing evidence described in Arb’s report (which debunked the “superforecasters beat intelligence experts without classified information” claim!).
Similarly, I’m skeptical of the straw rationalist view that highly liquid well-run prediction markets would be an insane societal boon, rather than a more moderate-large one (hard to operationalize, hope you get the vibe). See here for related takes. This might change with superhuman AI forecasters though, whose “time” might be more plentiful.
Historically, OP-funded forecasting platforms (Metaculus, INFER) seem to be underwhelming on publicly observable impact per dollar (in terms of usefulness for important decision-makers, user activity, rationale quality, etc.). Maybe some private influence over decision-makers makes up for it, but I’m pretty skeptical.
Tbh, it’s not clear that these and other platforms currently provide more value to the world than the opportunity cost of the people who spend time on them. e.g. I was somewhat addicted to Metaculus then later Manifold for a bit and spent more time on these than I would reflectively endorse (though it’s plausible that they were mostly replacing something worse like social media). I resonate with some of the comments on the EA Forum post mentioning that it’s a very nerd-sniping activity; forecasting to move up a leaderboard (esp. w/quick-resolving questions) is quite addicting to me compared to normal work activities.
I’ve heard arguments that getting superforecasted probabilities on things is good because they’re more legible/credible because they’re “backed by science”. I don’t have an airtight argument against this, but it feels slimy to me due to my beliefs above about superforecaster quality.
Regarding whether forecasting orgs should try to make money, I’m in favor of pushing in that direction as a signal of actually providing value, though it’s of course a balance re: the incentives there and will depend on the org strategy.
The types of forecasting grants I’d feel most excited about atm are, roughly ordered, and without a claim that any are above OpenPhil’s GCR bar (and definitely not exhaustive, and biased toward things I’ve thought about recently):
Making AI products for forecasting/epistemics in the vein of FutureSearch and Elicit. I’m also interested in more lightweight forecasting/epistemic assistants.
FutureSearch and systems in recent papers are already pretty good at forecasting, and I expect substantial improvements soon with next-gen models.
I’m excited about making AIs push toward what’s true rather than what sounds right at first glance or is pushed by powerful actors.
However, even if we have good forecasting/epistemics AIs, I’m worried that it won’t convince people of the truth since people are irrational and often variance in their beliefs is explained by gaining status/power, vibes, social circles, etc. It seems especially hard to change people’s minds on very tribal things, which seem correlated with the most important beliefs to change.
AI friends might actually be more important than AI forecasters for epistemics, but that doesn’t mean AI forecasters are useless.
I might think/write more about this soon. See also Lukas’s Epistemics Project Ideas and ACX on AI for forecasting.
Judgmental forecasting of AI threat models, risks, etc., involving a mix of people who have AI / dangerous-domain expertise and/or a very strong forecasting track record (>90th percentile superforecaster), ideally with as many people as possible who have both. I’m not sure how helpful it will be, but it seems maybe worth more people trying.
In particular, forecasting that can help inform risk assessment / RSPs seems like a great thing to try. See also discussion of the Delphi technique in the context of AGI risk assessment here. Malcolm Murray at GovAI is running a Delphi study to get estimates of likelihood and impact of various AI risks from experts.
This is related to a broader class of interventions that might look somewhat like a “structured review process”, in which one would take an in-depth threat modeling report and have various people review it and contribute their own forecasts in addition to qualitative feedback. My sense is that when superforecasters reviewed Joe Carlsmith’s p(doom) forecast in a similar vein, the result wasn’t that useful, but the exercise could plausibly be more useful with better-quality reviews/forecasts. It’s unclear whether this would be a good use of resources relative to the usual ad-hoc/non-forecasting review process, but it might be worth trying more.
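For readers unfamiliar with the Delphi technique mentioned above: it iterates expert estimates with structured feedback between rounds, with the hope that estimates converge toward a better-informed consensus. A toy numeric sketch (assuming a simple shift-toward-the-median update rule; this is my own simplification for illustration, not the methodology of GovAI’s actual study, which involves qualitative rationales as well):

```python
# Toy sketch of Delphi-style probability aggregation (illustrative only;
# real Delphi studies exchange qualitative feedback, not just numbers).
from statistics import median

def delphi_round(estimates, weight=0.3):
    """One round: each expert shifts partway toward the group median."""
    m = median(estimates)
    return [e + weight * (m - e) for e in estimates]

def run_delphi(estimates, rounds=3, weight=0.3):
    """Run several feedback rounds and return the final group median."""
    for _ in range(rounds):
        estimates = delphi_round(estimates, weight)
    return median(estimates)

# Hypothetical expert estimates of some AI-risk probability:
initial = [0.01, 0.05, 0.10, 0.30, 0.60]
print(run_delphi(initial))  # estimates tighten around the median, 0.10
```

Note that under this simplified update rule the median itself is stable while the spread shrinks; real Delphi rounds can move the central estimate too, since experts update on each other’s arguments rather than just the numbers.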
Forecasting tournaments on AI questions with large prize pools: I think these historically have been meh (e.g. the Metaculus one attracted few forecasters, wasn’t fun to forecast on (for me at least), and I’d guess significantly improved ~no important decisions), but I think it’s plausible things could go better now as AIs are much more capable, there are many more interesting and maybe important things to predict, etc.
Crafting forecasting questions that are cruxy for threat models / intervention prioritization among folks working on AI safety.
It’s kind of wild that there has been so little success on this front. See frustrations from Alex Turner: “I think it’s not a coincidence that many of the ‘canonical alignment ideas’ somehow don’t make any testable predictions until AI takeoff has begun.” I worry that this will take a bunch of effort and not get very far (see Paul/Eliezer finding only a somewhat related bet re: their takeoff speeds disagreement), but it seems worth giving it a more thorough shot with different participants.
I’m relatively excited about doing things within the AI safety group rather than between this group and others (e.g. superforecasters) because I expect the results might be more actionable for AI safety people. (edit: I got feedback that this bullet was too tribal and I think that might be right. I think that a better distinction might be preferring inclusion of people who’ve thought deeply about the future of AI, rather than e.g. superforecaster generalists)
I incorporated some snippets of a reflections section from a previous forecasting retrospective above, but there’s a little that I didn’t include if you’re inclined to check it out.
I feel like I need to reply here, as I work in this industry and tend to defend it more.
First, to be clear, I generally agree a lot with Eli on this. But I’m more bullish on epistemic infrastructure than he is.
Here are some quick things I’d flag. I might write a longer post on this issue later.
I’m similarly unsure about a lot of existing forecasting grants and research. In general, I’m not very excited about most academic-style forecasting research at the moment, and I don’t think there are many technical groups at all (maybe ~30 full-time equivalents in the field right now, in organizations that I could see EAs funding?).
I think that for further funding in this field to be exciting, funders should really work on designing/developing the field to emphasize its very best parts. The current median doesn’t seem great to me, but I think the potential has promise, and smart funding can really triple down on the good stuff. I think it’s sort of unfair to compare forecasting funding (2024) to AI safety funding (2024), as the latter has had much more time to mature. This includes having better ideas for impact and attracting better people. If funders just “funded the median projects”, then I’d expect the field to wind up in a similar place to where it is now. But if funders can really optimize, then I’d expect them to be taking a decent-EV risk (decent chance of failure, but some chance of a much more exciting field in 3-10 years).
I’d prefer funders focus on “increasing wisdom and intelligence” or “epistemic infrastructure” rather than on “forecasting specifically”. I think the focus on forecasting is overly limiting. That said, I could see an argument for starting from a forecasting angle, as other interventions in “wisdom and intelligence / epistemic infrastructure” are more speculative.
If I were deploying $50M here, I’d probably start out by heavily prioritizing prioritization work itself—work to better understand this area and what is exciting within it. (I explain more of this in the wisdom/intelligence post above). I generally think that there’s been way too little good investigation and prioritization work in this area.
Like Eli, I’m much more optimistic about “epistemic work to help EAs” than “epistemic work to help all of society”, at least in the short term. Epistemics/forecasting work carries significant marginal costs for any given population it helps, and I believe that “helping N EAs” is often much more impactful than helping N people from most other groups. (This is almost true by definition for people of any given background.)
I’d like to flag that I think that Metaculus/Manifold/Samotsvety/etc forecasting has been valuable for EA decision-making. I’d hate to give this up or de-prioritize this sort of strategy.
I don’t particularly trust EA decision-making right now. It’s not that I think I could personally do better, but rather that we are making decisions about really big things, and I think we have a lot of reason for humility. When choosing between “trying to better figure out how to think and what to do” vs. “trying to maximize the global intervention that we currently think is highest-EV,” I’m nervous about us ignoring the former and going all-in on the latter. That said, some of the crux might be that I’m less certain about our current marginal AI Safety interventions than I think Eli is.
Personally, around forecasting, I’m most excited about ambitious, software-heavy proposals. I imagine that AI will be a major part of any compelling story here.
I’d also quickly flag that, regarding AI safety: I agree that in some ways it is very promising right now. There seems to have been a ton of great talent brought in recently, so there are (at the very least) some excellent people to give funding to. I think it’s very unfortunate how small the technical AI safety grantmaking team at OP is. Personally I’d hope that this team could quickly grow to 5-30 full-time equivalents. However, I don’t think this needs to come at the expense of (much) forecasting/epistemics grantmaking capacity.
I think you can think of a lot of “EA epistemic/evaluation/forecasting work” as “internal tools/research for EA”. As such, I’d expect it could make a lot of sense for us to allocate ~5-30% of our resources to it. Maybe 20% of that would go to the “R&D” part, perhaps more if you think this part is unusually exciting due to AI advancements. I personally am very interested in this latter part, but recognize it’s a fraction of a fraction of the full EA resources.
Thanks Ozzie for sharing your thoughts!
A few things I want to clarify up front:
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting, and Sage which has done several forecasting projects. And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
Thoughts on some of your bullet points:
I was trying to compare previous OP forecasting funding to previous AI safety funding. It’s not clear to me how different these were; sure, OP didn’t have a forecasting program, but AI safety was also very short-staffed. And re: the field maturing: idk, Tetlock has been doing work on this for a long time, and my impression is that AI safety also had very little effort going into it until the mid-to-late 2010s. I agree that funding potentially promising exploratory approaches is good, though.
Seems reasonable. I did like that post!
Perhaps, but I think you gain a ton of info from actually trying to do stuff and iterating. I think prioritization work can sometimes seem more intuitively great than it ends up being, relative to the iteration strategy.
I would love for this to be true! I’m open to changing my mind based on a compelling analysis.
There might be some difference in perceptions of the direct EV of marginal AI Safety interventions. There might also be differences in beliefs in the value of (a) prioritization research vs. (b) trying things out and iterating, as described above (perhaps we disagree on absolute value of both (a) and (b)).
Seems reasonable, though I’d guess we have different views on which ambitious AI-related, software-heavy projects are most promising.
I think you might be understating how fungible OpenPhil’s efforts are between AI safety (particularly governance team) and forecasting. Happy to chat in DM if you disagree. Otherwise reasonable point, though you’d ofc still have to do the math to make sure the forecasting program is worth it.
(edit: actually maybe the disagreement is still in the relative value of the work, depending on what you mean by “much” grantmaking capacity)
Seems unclear what should count as internal research for EA; e.g., are you counting OP’s worldview investigation team / AI strategy research in general? And re: AI advancements: they both improve the promise of AI for forecasting/epistemics work and shorten timelines, which points toward direct AI safety technical/gov work.
Thanks for the replies! Some quick responses.
First, again, overall, I think we generally agree on most of this stuff.
I agree to an extent. But I think there are some very profound prioritization questions that haven’t been researched much, and that I don’t expect us to gain much insight into from experimentation in the next few years. I’d still like us to do experimentation (if I were in charge of a $50M fund, I’d start spending it soon, just not as quickly as I would otherwise). For example:
How promising is it to improve the wisdom/intelligence of EAs vs. others?
How promising are brain-computer-interfaces vs. rationality training vs. forecasting?
What is a good strategy to encourage epistemic-helping AI, where philanthropists could have the most impact?
What kinds of benefits can we generically expect from forecasting/epistemics? How much should we aim for EAs to spend here?
We might be disagreeing a bit on what the bar for “valuable for EA decision-making” is. I see a lot of forecasting like accounting: it rarely leads to a single clear and large decision, but it’s good to do and steers organizations in better directions. I personally rely heavily on prediction markets for key understandings of EA topics, and people like Scott Alexander and Zvi seem to as well. I know less about the inner workings of OP, but the fact that they continue to pay for predictions on questions tailored to their needs seems like a good sign. All that said, I think that ~95%+ of Manifold and a lot of Metaculus is not useful at all.
I’m not sure how much to focus on OP’s narrow choices here. I found it surprising that Javier went from governance to forecasting, and that previously it was the (very small) governance team that did forecasting. It’s possible that if I evaluated the situation, and had control of it, I’d recommend that OP move marginal resources from forecasting to governance. But I’m a lot less interested in that question than in “is forecasting competitive with some EA activities, and how can we do it well?”
Yep, I’d count these.
Just chatted with @Ozzie Gooen about this and will hopefully release audio soon. I probably overstated a few things / gave a false impression of confidence in the parent in a few places (e.g., my tone was probably a little too harsh on non-AI-specific projects); hopefully the audio convo will give a more nuanced sense of my views. I’m also very interested in criticisms of my views and others sharing competing viewpoints.
Also want to emphasize the clarifications from my reply to Ozzie:
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting, and Sage which has done several forecasting projects (and am for the most part more pessimistic about forecasting than others in these groups/orgs). And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
Audio/podcast is here:
https://forum.effectivealtruism.org/posts/fsnMDpLHr78XgfWE8/podcast-is-forecasting-a-promising-ea-cause-area