Forecasting is Way Overrated, and We Should Stop Funding It
Summary
EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very useful, yet it continues to receive substantial EA funding. We should cut it off.
My Experience with Forecasting
For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I’m still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI bets. I say this to suggest that I would gain status from forecasting being seen as useful, but I think, to the contrary, that the EA community should stop funding it.
I’ve written a few comments throughout the years that I didn’t think forecasting was worth funding. You can see some of these here and here. Finally, I have gotten around to making this full post.
Solution Seeking a Problem
When talking about forecasting, people often ask questions like “How can we leverage forecasting into better decisions?” This is the wrong way to go about solving problems. You solve problems by starting with the problem, and then you see which tools are useful for solving it.
The way people talk about forecasting is very similar to how people talk about cryptocurrency/blockchain. People have a tool they want to use, whether that be cryptocurrency or forecasting, and then try to solve problems with it because they really believe in the solution, but I think this is misguided. You have to start with the problem you are trying to solve, not the solution you want to apply. A lot of work has been put into building up forecasting, making platforms, hosting tournaments, etc., on the assumption that it was instrumentally useful, but continuing this without concrete gains is pretty dangerous.
We’ve Funded Enough Forecasting that We Should See Tangible Gains
It’s not the case that forecasting/prediction markets are merely in their infancy. A lot of money has gone into forecasting. On the EA side of things, it’s near $100M. If I convince you later on in this post that forecasting hasn’t given any fruitful results, it should be noted that this isn’t for lack of trying/spending.
The Forecasting Research Institute received grants in the 10s of millions of dollars. Metaculus continues to receive millions of dollars per year to maintain a forecasting platform and conduct some forecasting tournaments. The Good Judgment Project and the Swift Centre have received millions of dollars for doing research and studies on forecasting and teaching others about forecasting. Sage has received millions of dollars to develop forecasting tools. Many others, like Manifold, have also been given millions by the EA community in grants/investments at high valuations, diverting money away from other EA causes. We have grants for organizations that develop tooling, even entire programming languages like Squiggle, for forecasting.
On the for-profit side of things, the money gets even bigger. Kalshi and Polymarket have each raised billions of dollars, and other forecasting platforms have also raised 10s of millions of dollars.
Prediction markets have also taken off. Kalshi and Polymarket are both showing all-time highs and month-over-month growth in volume. Both of them have monthly volumes in the 10s of billions of dollars. Total prediction market volume is something like $500B/year, but it just isn’t very useful. We get to know the odds on every basketball player prop, and whether BTC is going to go up or down in the next 5 minutes. While some people suggest that these trivial markets help sharpen skills or identify good forecasters, I don’t think there is any evidence of this; it is more wishful thinking.
If forecasting were really working well and was very useful, you would see the bulk of the money spent not on forecasting platforms but directly on forecasting teams or subsidizing markets on important questions. We have seen very little of this, and instead, we have seen the money go to platforms, tooling, and the like. We already had a few forecasting platforms, the market was going to fund them itself, and yet we continue to create them.
The EA/rationality community has also spent an incredible amount of (wasted) time on forecasting. Lots of people have been employed full-time doing forecasting or adjacent work, but perhaps even larger is the number of part-time hours that have gone into forecasting on Manifold, among other things. I would estimate that thousands of person-years have gone into this activity.
Hits-based Giving Means Stopping the Bets that Don’t Pay Off
You may be tempted to justify forecasting on the grounds of hits-based giving. That is to say, it made sense to try a few grants into forecasting because the payoff could have been massive. But if it was based on hits-based giving, then that implies we should be looking for big payoffs, and that we have to stop funding it if those payoffs don’t materialize.
I want to propose my leading theory for why forecasting continues to receive 10s of millions per year in funding. That is, it has become a feature of EA/rationalist culture. Similar to how EAs seem to live in group houses or be polyamorous, forecasting on prediction markets has become a part of the culture that doesn’t have much to do with impact. This is separate from parts of EA culture that we do for impact/value alignment reasons, like being vegan, donating 10%+ of income, writing on forums, or going to conferences. I submit that forecasting is in the former category.
At this point, if forecasting were useful, you would expect to see tangible results. I can point you to hundreds of millions of chickens that lay eggs and are out of cages, and I can point you to observable families that are no longer living in poverty. I can show you pieces of legislation that have passed or almost passed on AI. I can show you AMF successes, with about 200k lives saved and far lower levels of malaria, not to mention higher incomes and longer life expectancies, and people living longer lives that otherwise wouldn’t be because of our actions. I can go at the individual level, and I can, more importantly, go at the broad statistical level. I don’t think there is very much in the way of “this forecasting happened, and now we have made demonstrably better decisions regarding this terminal goal that we care about”. Despite no tangible results, people continue to have the dream that forecasting will inform better decision-making or lead to better policies. I just don’t see any proof of this happening.
Feels Useful When It Isn’t
Forecasting is a very insidious trap because it makes you think you are being productive when you aren’t. I like to play bughouse and a bunch of different board games. But when I play these games, I don’t claim to do so for impact reasons, on effective altruist grounds. If I spend time learning strategy for these board games, I don’t pretend that this is somehow making the world better off. Forecasting is a dangerous activity, particularly because it is a fun, game-like activity that is nearly perfectly designed to be attractive to EA/rationalist types: you get to be right when others are wrong, bet on your beliefs, and partake in the cultural practice. It is almost engineered to be a time waster for these groups because it provides the illusion that you are improving the world’s epistemics when, in reality, it’s mainly just a game, and it’s fun. You get to feel that you are improving the world’s epistemics, and that there must therefore be some flow-through effects, and thus you can justify the time spent correcting a market from 57% to 53% on some AI forecasting question, or a question about whether the market you are trading on will have an even or odd number of traders, or whether someone will get a girlfriend by the end of the year.
Conclusion
A lot of people still like the idea of doing forecasting. If it becomes an optional, benign activity of the EA community, then it can continue to exist, but it should not continue to be a major target for philanthropic dollars. We are always in triage, and forecasting just isn’t making the cut. I’m worried that we will continue to pour community resources into forecasting, and it will continue to be thought of in vague terms as improving or informing decisions, when I’m skeptical that this is the case.
I don’t disagree with some of the fundamentals of this post. Before diving into that, I want to correct a factual error:
“the Swift Centre have received millions of dollars for doing research and studies on forecasting and teaching others about forecasting”
The Swift Centre for Applied Forecasting has not received millions in funding. The majority of our earnings have been through direct projects with organisations who want to use forecasting to inform their decisions.
On your wider argument: I think forecasting has probably received too much funding, and the vast majority of that has been misallocated to platforms and research. I believe some funding (hundreds of thousands of dollars) is warranted to maintain core platforms like Metaculus as a public information good. Though services like Polymarket can probably fill most of this need in the future (but many useful, informative markets would never reach the necessary volume to be reliable).
Where I think we disagree most is in the application of forecasting and some of the achievements. We’ve worked with frontier AI labs to inform their decisions, are currently advising a U.K. Minister’s team on a central piece of their policy, and are about to start a secondment where I will be advising one of the most influential decision-making committees in the country to help improve their scenario analysis and forecasting. Forecasting, and specifically the science of decision making it is built on, has the ability to structurally improve decisions in institutions: significantly better than asking two or three of your smartest friends. That was just never funded, so instead we conclude forecasting is not useful.
Appreciate the correction. I simply used the totals I saw from Open Phil/CG spreadsheets. I’ll correct the post.
“We’ve worked with frontier AI labs to inform their decisions”
This feels likely net negative to me? But don’t have enough information to know.
We could “forecast” the likelihood of that haha.
I can’t get into specifics. But if you believe activities like evaluating models to test for dangerous behaviour etc. are net negative, then that may give credence to your assumption. As an extra data point on whether we’d do work we thought was net negative: I was Head of Policy at ControlAI and co-authored narrowpath.co, and our forecasters have done numerous AI safety focused projects (with and outside of the Swift Centre, including AI 2027).
Personally, I weakly think any work with AI labs (except perhaps Anthropic) supports dangerous acceleration, but I think the opposing view is almost as strong.
That other stuff sounds way better than working with the labs too ;)
Whilst I strongly disagree with the claim at the object level, many other non-forecasting AI safety interventions work with labs in some way, so even if this were true, the relative penalty applied to AIS forecasting work would be fairly low.
[Relevant context/COI: I’m CEO at the Forecasting Research Institute (FRI), an organization which I co-founded with Phil Tetlock and others. Much of the below is my personal perspective, though it is informed by my work. I don’t speak for others on my team. I’m sharing an initial reply now, and our team at FRI will share a larger post in future that offers a more comprehensive reflection on these topics.]
Thanks for the post — I think it’s important to critically question the value of funds going to forecasting, and this post offers a good opportunity for reflection and discussion.
In brief, I share many of your concerns about forecasting and related research, but I’m also more positive on both its impact so far and its future expected impact.
A summary of some key points:
Much of the impact of forecasting research on specific decision-makers is not public. For example, FRI has informed decisions on frontier AI companies’ capability scaling policies, has advised senior US national security decision-makers, and has informed research at key US and UK government agencies. But, we are not able to share many details of this work publicly. However, there is also public evidence that forecasting research is widely cited and informs discourse and some decision-making (some examples below).
AI timelines, adoption, and risk forecasts play a huge role in both individual career decisions and the broader AI discourse. Forecasting research still seems like one of the best tools available for getting specific and accountable beliefs on these topics. For example, comparing ‘AI safety’ community forecasts to more ‘typical’ experts’ forecasts seems especially important for understanding how much to trust each group’s views. These comparisons will become increasingly relevant for government policymakers over time, especially if there is extremely rapid AI capabilities progress that leads to major societal impacts in the short-run.
When evaluating the impact of FRI-style forecasting research, I think the closest relevant comparison classes are more like broad public goods/measurement-oriented research (e.g., Our World in Data, Epoch) or think-tank research (e.g. GovAI, IAPS). By its nature, the impact of this kind of research tends to be more diffuse and difficult to measure. However, I’d be interested in more intensive comparative evaluation of this type of research and agree that funders should be responsive to evidence about relative impact in these fields.
Forecasting research still has a ton of flaws, and its impact has been far from the dream I’ve long had for it. There are still big challenges around identifying accurate forecasters on questions related to AI, integrating conditional policy forecasts with actual decision-makers’ needs, and combining deep, individual qualitative research with high-quality, group-generated quantitative forecasts.
My extremely simplified narrative is: Tetlock et al. established the modern judgmental forecasting field and created a proof of concept for better forecasts on important topics (“superforecasting”); this work was largely academic. Some forecasting platforms were created to build on that work and apply it to a range of important issues. Targeted efforts to make forecasting more directly useful to decision-makers are relatively nascent (i.e., they have largely begun in the last few years) and are accumulating impact over time, but still have room for improvement.
FRI’s research, in particular, aims to close many of the gaps left by prediction markets and historical forecasting approaches: it is particularly focused on conditional policy forecasts, medium-to-long-run forecasts that do not get much detailed engagement on prediction markets/platforms, and systematically eliciting forecasts from experts who would not typically participate in forecasting platforms but whom decision-makers want to rely on (while also eliciting forecasts from generalists with strong forecasting track records).
However, some factors make the future potential impact of this work look more promising:
AI-enhanced forecasting research is a huge factor that will unlock cheaper, faster, high-quality forecasts on any question of one’s choosing.
The next few years of forecasting AI progress/adoption/impact seem critical, and like they’ll deliver a lot of answers on whose forecasts we should trust. It seems good to be ready to support decision-makers during this time.
Leaders in the AI space seem particularly interested in using forecasting in their decision-making; they tend to be both quantitative and open-minded. This creates more potential for forecasting to be useful. More minorly, prediction markets and forecasting are generally becoming more credible within governments.
More detail on some select points below. This comment already got very long (!), so I’ll reserve more elaboration for a future, more comprehensive post.
Examples of impact
Forecasting research has informed some very important decisions. Unfortunately, many of the details of the relevant evidence here cannot be made public. However, there is evidence of substantial public citation of this research, and some public evidence of affecting particular decisions.
A few examples of relevant impact include:
Forecasting has been particularly relevant for decision-making around capability scaling policies. The near-term magnitude of AI-biorisk, how growing AI capabilities may increase it, and what safeguards need to be in place to respond to it, are highly uncertain. Frontier AI companies, the EU AI Code of Practice, and other governments are trying to track and respond to AI impacts on biorisk, cybersecurity, AI R&D, and other domains. We’ve had substantial engagement with the relevant actors, including some focused partnerships, and believe our work in this area has affected important decisions, though we unfortunately cannot share many of the details publicly.
Our work on ForecastBench, a benchmark of AI’s ability to do forecasting, showed that AI-produced forecasts could catch up to top human forecasters in roughly the next year if trends persist. This generated interest among senior decision-makers in U.S. national security. We cannot share details, but this is another example of important decision-makers paying attention to and using forecasts.
We have completed commissioned research to directly inform grantmaking at Coefficient Giving, and also have indirectly affected grantmaking. For an example of the latter, our work on the Existential Risk Persuasion Tournament (XPT) partially inspired Coefficient Giving (formerly Open Philanthropy) to launch an RFP on improved AI benchmarks. The XPT forecasts predicted that most existing benchmarks would likely saturate in the next few years, and showed that progress on these benchmarks was not crux-y for disagreements about AI impact. We were told that this played a role in the launch and conception of the RFP, and the XPT is cited in the public write-up.
Some examples of more diffuse impacts — e.g., impact on public understanding of AI and research for policymakers or philanthropists, include:
FRI has given presentations to, and has ongoing connections and conversations with, important government agencies such as the Congressional Budget Office, US CAISI, the UK Department of Science, Innovation, and Technology, and others. We cannot share many details, but the potential to inform decisions at these organizations is highly important.
Major reports for policymakers, like the International AI Safety Report, the AI Index, and relevant RAND reports, also prominently cite FRI research.
FRI research is cited in places like the New York Times, The Economist, and Bloomberg to inform readers about the economic impacts of AI, AI-biorisk, general catastrophic and existential risk, AI-enhanced forecasting, and the future of AI more generally.
Forecasts are widely cited in cause prioritization research and by experts in relevant domains: as a few examples, see citations from Ethan Mollick on AI progress, 80,000 Hours on biorisk, Dr. Richard Moulange on AI-biorisk, Tyler Cowen on the economic effects of AI, Will MacAskill on AI progress and risk, etc.
For context: FRI has been operating for a little over 3 years, and we’re accumulating substantially more momentum in terms of connections to top decision-makers as time goes on.
(To be clear: I am mostly discussing FRI here since it’s what I’m most familiar with.)
AI timelines, impact, and adoption forecasts drive a huge amount of career decision-making, attention, etc.
Forecasts about AI timelines and risk have had major effects on people’s career decisions and the broader AI discourse. AI 2027 underlies popular YouTube videos, 80,000 Hours advises people on career decisions based on timelines forecasts, Dario Amodei’s “country of geniuses in a datacenter by 2027” forecast informs a lot of Anthropic’s work and policy outreach, the AI Impacts survey on AI researchers’ forecasts of existential risk is highly cited, etc.
A major reason I got into this field is that many people are making very intense claims about the effect that AI will have on the world soon, and I want to bring as much rigor and reflection as possible to those claims. So far, it looks like most forecasters are substantially underestimating AI capabilities progress (with some exceptions, e.g. on uplift studies); the evidence on forecasts about AI adoption, societal impacts, and risk is less clear, but I expect we will have more evidence soon, particularly from the Longitudinal Expert AI Panel (LEAP), especially as some forecasters are predicting transformative change in the next few years.
As the expected impact and timing of AI progress is sharpened and clarified, talent and money can be allocated more efficiently.
Case study: Economic impacts of AI
In some cases, it looks to me like forecasting research is picking relatively low-hanging fruit.
The economic impact of AI is a prominent topic of public discussion right now, and it is likely that governments will spend many billions of dollars to address it in the coming years.
Currently, economists hold major sway in public policy about the economic impacts of AI. Perhaps you think top economists, as a group, are badly mistaken about the likely near-term impacts of AI, as some Epoch researchers and others believe. Perhaps you think they are likely to be fairly accurate, as Tyler Cowen, Séb Krier, or typical economists believe. It seems like a valuable common sense intervention to at least document what various groups believe, so that when we are making economic policy going forward we can rely on that evidence to determine who is trustworthy. I believe that studies like this one (and its follow-ups) will be the clearest evidence on the topic.
Relevant comparison class for forecasting research
When thinking about the impact and cost-effectiveness of forecasting, I think it’s more appropriate to compare this work to public goods-oriented research organizations (e.g., Our World in Data, Epoch, etc.) and policy-oriented think-tank research (e.g. GovAI, IAPS, CSET, etc.).
I’ve been disappointed by most impact evaluation of think-tanks and public goods-oriented research that I’ve seen. I believe this is partly because it is very difficult to quantify the impact of this type of work because it has diffuse benefits. But, I still think it’s possible to do better and I would like FRI to do better on this front going forward.
That said, I still believe there are reasonable heuristics for why this research area could be highly cost-effective. There are many billions of dollars of philanthropic and government capital being spent on AI policy topics. If there is a meaningful indication that forecasting is changing people’s views on these questions (as I believe there is; see discussion above), it seems reasonable to me to spend a very small fraction of that capital on getting more epistemic clarity.
My critiques of forecasting research
Forecasting research, and FRI’s research in particular, still has major areas for improvement.
Examples of a few key issues:
I’ve been underwhelmed by the accuracy of typical experts and superforecasters on questions about AI capabilities progress (as measured by benchmarks); they often underestimate AI progress (with exceptions). I think this underestimation is a useful fact to document, but it would be much more helpful if our research identified experts you should trust. We’re in the process of identifying ‘Top AI forecasters’ through LEAP and aim to share updates on this soon.
I think forecasting research is at its best when combined with in-depth research reports that provide more narratives and key arguments underlying forecasts. For example, Luca Righetti’s work on estimating (certain kinds of) AI-biorisk provides a lot of valuable analysis that usefully complements our expert panel study on the topic. [Note: Luca is an FRI senior advisor and a co-author of our forecasting study.] For decision-makers to build sufficiently detailed models, and for forecasters to test their arguments, we’d ideally have detailed research like Luca’s on most major topics where we collect forecasts — ideally from a few experts who disagree with each other. Unfortunately, this research often doesn’t readily exist, but we are investigating ways to generate it.
I have been somewhat surprised by how few experts in AI industry, AI policy, and other domains predict transformative impacts of AI similar to what are commonly discussed by AI lab leaders, people in the AI safety community, and others. This has made it harder to have a true horse-race between the ‘transformative AI’ school of thought that seems to drive a lot of discourse and decision-making vs. more gradual views of AI impacts. Though we have some transformative AI forecasters in our studies, in future work we aim to explicitly collect more forecasts from the ‘transformative AI’ school of thought in order to set up clearer comparisons between worldviews and to better anticipate what will happen if the ‘transformative AI’ school makes more accurate forecasts.
I will save other thoughts on how forecasting, and FRI’s research, could be made more useful to decision-makers for a future post.
But, to be clear: I have a lot of genuine uncertainty about whether forecasting research will be sufficiently impactful going forward. There are promising signs, and increasing momentum, but to more fully deliver on its promise, more improvements will be necessary.
Some notes on FRI-style forecasting research vs. other forecasting interventions
On the value of FRI-style forecasting research in particular:
Prediction markets do not have good ways to collect causal policy forecasts, but in our experience, conditional policy forecasts (e.g., how much would various safeguards reduce AI-cyber risk) are often the most helpful forecasts for decision-makers.
Similarly, prediction markets do not create good incentives for longer run forecasts or low-probability forecasts, and incentivize against sharing the rationales behind forecasts. Directly paying and incentivizing relevant experts and forecasters to answer questions is often more useful.
Typical forecasting platforms do not get forecasts from the kinds of experts that policymakers typically rely on, and aren’t the kind of evidence that can easily be cited in government reports. (This may be unfortunate, but it is the current state of the world.)
Reasons for optimism about future impact
Finally, there are a few factors that have the potential to dramatically change the field going forward:
It looks like AI may soon make it >100x cheaper and faster to get high-quality forecasts on any topic of one’s choosing. Policy researchers will be able to ask the precise question they’re interested in, will be able to upload confidential documents to inform forecasts (something we’ve heard is especially important to decision-makers), and will be able to get detailed explanations for all forecasts. AI-produced forecasts will also be much easier to test for accuracy due to the volume of forecasts they can provide, and it will be easier to generate ‘crux’ questions since AI will not get bored of producing huge numbers of conditional forecasts (which are necessary for identifying cruxes). Building benchmarks and tooling to harness AI-produced forecasts will be a much larger part of our work going forward.
The next few years seem very unusual in human history: very thoughtful researchers are predicting “Superhuman Coders” by 2029, with attendant large impacts. There is a spectrum of views, but the scope for disagreement among reasonable people about what the world will look like in 2030 is huge. This is a particularly important time to make predictions testable, update on what we observe, and make better policy and personal decisions on the basis of this information.
People working in the AI space seem particularly interested in using forecasting, perhaps due to a mix of being quantitatively oriented and because they’re facing unusual degrees of uncertainty. This bodes well for forecasting being useful in the coming years. More minorly, it appears that there is a broader cultural change around forecasting-related topics. Prediction markets are increasingly being cited by government officials, and the public is paying more attention to them than ever before. Much of the impact for prediction markets specifically seems negative (e.g. via incentivizing gambling on low-value topics), but the broader cultural shift suggests there may be an opportunity for better uses of forecasting to enter public consciousness as well.
I think this post significantly overstates its conclusion and is plausibly poorly calibrated on the relative value of forecasting.
My main “directional” issues with the post as it’s currently written:
I think it overstates the amount of funding devoted to forecasting on a “worldview” basis.
Most forecasting funding is (iiuc) not going to neartermist causes or particularly fungible with neartermist causes, so pointing to a bunch of neartermist causes to justify better funding options seems irrelevant.
From my perspective, it seems like:
Within animal-welfare-fungible money, very little goes into forecasting, e.g. less than $2M per year
Tbh—I would probably prefer that more money went into some kinds of forecasting on the margin. For example, I think that people are generally too bullish on clean meat, and Linch/Open Phil’s work investigating the difficulty of clean meat has plausibly resulted in better allocation of millions of dollars because there are, in fact, good alternatives (like cage-free campaigns).
Within longtermist/AI-fungible money, maybe $10M/year goes into forecasting, which seems pretty reasonable to me. But I think to get to $10M you need to include projects that seem very promising to me for reasons different from mainstream forecasting infrastructure, e.g. AI 2027, METR.
I think the strongest version of the argument would be attacking AI evals, but I’m unsure whether those are in scope for this post; my impression is that evals useful for forecasting capabilities are a pretty great bet relative to other funding opportunities within the AI space.
So the argument actually seems to be “longtermist funding is not as cost-effective as neartermist funding” which is not totally unreasonable, but clearly needs to engage with the long/neartermist worldview (e.g. moral size of the future) as opposed to just engaging with tangible short-term impact indicators.
I’m less convinced than the OP that funders in particular are overrating forecasting—I just don’t see much effort going into forecasting grantmaking compared to ~every other grantmaking area.
My impression is that a lot of forecasting dollars are funded by organisations that are incentivised to use the money well (e.g. AI companies paying FRI to produce forecasts around safety and capability evaluations for safety planning). I see that others have weighed in on this already so not planning to elaborate on this more.
I agree with some of the post’s vibes and think it’s pointing at real cultural traits of rationalist communities. Though tbh, I think OP is too bearish on the usefulness of betting/making falsifiable predictions for people in EA spaces. I suspect that OP has seen lots of people getting very distracted by futarchy/Manifold etc. (and I do think this is a risk), but culturally I think EA should be pretty into “betting/making falsifiable predictions” and that cluster of epistemic traits, AND I think forecasting infrastructure has a meaningful effect on this. E.g. in two office spaces (out of three that I’ve spent substantial time in), I think Manifold/prediction markets have very clearly made the communities more forecasting-y, and this has had tangible effects on people’s research/choice of projects. This is probably the most explicit example, though most changes are harder to hyperlink.
(Caveat: Slightly self-promoting, sorry, but I hope it’s germane/helpful.) By the way, on the animal welfare forecasting front, see Support Metaculus’ First Animal-Focused Forecasting Tournament and Rethinking the Future of Cultured Meat: An Unjournal Evaluation. I’d leave room for some doubt as to whether the “clean meat forecasting” work led to updates in the right direction.
We’re trying to take the next steps on this with a workshop involving some belief elicitation and forecasting (workshop page, belief elicitation page).
Hi Marcus. Thanks for the post. I broadly agree.
Coefficient Giving’s (CG’s) Forecasting Fund has recently been closed.
I think this is more likely to make forecasting grants useful. They will presumably be assessed with the criteria used to evaluate the non-forecasting grants of the respective fund.
@NunoSempere wrote about the end of CG’s Forecasting Fund in the last edition of the Forecasting Newsletter. Only paid subscribers can check the relevant section.
Right.
I’m not a paid sub to Nuno so I can’t see.
I had this post in my drafts for 3 years. I was happy to see the Forecasting Fund close down, though I still expect CG to make at least $5M of forecasting grants in each of 2026 and 2027.
I don’t have an opinion on whether I would rather the forecasting grants be made within or separate from the Forecasting Fund, so long as the grants are still being made. I see pros and cons.
I think Holly’s post you linked is awesome and is in my Mount Rushmore of posts (top 4 all time).
What are the other 3 on your Mount Rushmore?
Me neither.
CG’s Forecasting Fund granted $15.9M in 2025.
I work at Founders Pledge, which has made many forecasting-related grants, some of them quite recently. Like Marcus, I’ve been fairly successful at forecasting — I am a so-called superforecaster — but have a fair amount of skepticism. My views here are personal ones, not FP’s.
I have some agreements and disagreements with this post. The main point of agreement I have is with Marcus’ “vibe” here: I think forecasting’s apparent status and prominence among EAs outstrip either its prima facie promisingness or the to-date empirical support for its use.
I’m not sure that I agree that too much has been spent on forecasting, and I definitely don’t agree that enough time has passed that we’d know by now whether this work has been useful. We’re talking about a very short period of time here.
I think we’re at risk of conflating a bunch of different kinds of forecasting work:
Investments in calibration: Funding new techniques or experiments in more effective forecasting
Investments in diffusion: Broadly, attempts to “make forecasting a thing” by supporting e.g. new platforms
Investments in capacity: Attempts to propagate or institutionalize formal forecasting at influential institutions
Investments in public goods: Supporting institutions that do good forecasting and which the broader EA ecosystem finds useful.
I hope it’s clear that these are very different kinds of effort and should be considered differently promising. One fairly strongly held view I have is that further investments in precise calibration are probably not worthwhile: as far as I know, there are no consequential institutions that are able to usefully differentiate between the ways they’d respond to a 63% forecast vs a 65% one.
Finance, of course, could benefit from such an edge. But here’s where I find Marcus’ vibe most compelling: if this were really so useful at the moment, then good human forecasters would be much better-paid.
At this point, I think it’s critical to draw another distinction. In funding forecasting work, effective giving orgs are essentially trying to purchase an outcome. I think the best case for forecasting work is that we’re not trying to purchase well-calibrated forecasts but rather institutional forms that generate well-calibrated forecasts.
I investigated FP’s most recent large grant in this space, which seeded a forecasting practice at an international security-focused think tank. Almost everyone I spoke to for that investigation viewed good forecasts as something like an incidental side-effect of the process required to generate them: generating useful questions, formally surfacing critical disagreements, identifying critical paths, decomposing reasoning, generating anchors that can be updated as events progress, making individuals’ judgment intercomparable.
As forecasting proponents have been arguing for years, judgmental forecasting is something like “institutionalized good judgment” — Brier scores are sort of like an OKR for org epistemics. And if you talk to people at the kinds of institutions where EAs are enthusiastic to see forecasting implemented, you’ll find either (a) an eagerness to see these kinds of norms and guardrails put in place or (b) an epistemic posture that makes the need for these guardrails self-evident.
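For readers unfamiliar with the metric: a Brier score is just the mean squared error between stated probabilities and realized binary outcomes, which is what lets it function as a single, accountable number for an organization’s probability judgments. A minimal sketch (my own illustration, with made-up numbers):

```python
# Brier score: mean squared error between probabilistic forecasts (0-1)
# and binary outcomes (0 or 1). Lower is better; always saying 50%
# scores 0.25.
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical track record: three forecasts and what actually happened.
print(brier_score([0.9, 0.3, 0.6], [1, 0, 1]))  # ≈ 0.087
```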
My overall feeling here is one of sympathy for Marcus’ view: I think there is a there there, but I agree that EAs’ native enthusiasm for this kind of work has outrun our rigorous thinking about its usefulness, and I think we could probably use more discipline in that regard.
I tend to agree with the OP, but think there are a couple of other points about subsidising prediction markets which could have had more emphasis:
Prediction markets are markets in which people trade money, which makes it easy for them to function on a for-profit model if there is significant interest in participation. Even if prediction markets are objectively highly valuable, it is not clear there is sufficient altruism-relevant benefit in forecasting quality coming from subsidised rather than non-subsidised platforms to justify the subsidy [1]
Forecasting for profit is zero-sum,[2] which means every superforecaster is balanced out by an equal and opposite amount of money collectively lost by others who are less “well calibrated”. Many people are perhaps happy to net lose money gambled for entertainment or signalling purposes (though perhaps they could part with their cash in other ways which deliver more positive outcomes...), but others may be developing gambling habits which can be extremely self-destructive[3]. I guess this links to Marcus’ “feels useful when it isn’t” point, but it can be much worse than simply a distraction. Negative externalities can be significant, and it is unclear if the positive externalities outweigh them.
I guess without a platform cut/spread you get marginally more precision, but how many forecasts actually need that precision and are sufficiently liquid to get it?
actually worse than zero sum on a for-profit exchange, obviously...
many forms of traditional gambling rely heavily on “whales” with a mixture of non-trivial amounts of money to lose and impulse control problems for much of their volume and profit; some of them ruin their lives doing so, and even more so the people with the same impulse control problems and less starting money. This may not apply to niche prediction markets, but I’m sure people can become addicted to the idea of winning their money back even if they know casino “betting systems” are -EV and don’t like sports or machines with flashing lights
For Kalshi specifically, it seems to have essentially become a backdoor to deregulate sports gambling in every US state. The mass deregulation of gambling in the US this decade feels harmful and like something we’ll probably really regret (legalisation seems fine but not like this).
It doesn’t seem popular to criticise the gambling aspects of prediction markets here, but it does seem strange to me that EAs seem to care a lot about reducing harms from tobacco and alcohol, but seem indifferent to gambling.
Interesting, is sports betting plausibly as bad as tobacco/alcohol in low-income countries?
Like, I think sports betting is plausibly one of the “worst businesses” for the US, comparable to alcohol/tobacco—but my impression is that the EAs that care about tobacco/alcohol don’t care very much about interventions in high-income countries relative to low-income countries.
BOTEC: 4.2 percent of suicides in the state of Victoria in Australia were gambling-related. 19.4 percent of suicides in Hong Kong were gambling-related. 720,000 suicides happen every year. Let’s say 10 percent of all suicides globally are related to gambling. That would be 72,000 gambling-related suicides.
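To make the sensitivity of this BOTEC explicit, here is a minimal sketch that treats the 10 percent global share as the loose assumption it is, bounded by the two cited data points:

```python
# Rough BOTEC: gambling-related suicides per year.
# The global share is a guess anchored on two data points
# (Victoria ~4.2%, Hong Kong ~19.4%); treat it as highly uncertain.
GLOBAL_SUICIDES_PER_YEAR = 720_000

for gambling_share in (0.042, 0.10, 0.194):  # low anchor, guess, high anchor
    deaths = GLOBAL_SUICIDES_PER_YEAR * gambling_share
    print(f"share {gambling_share:>5.1%}: ~{deaths:>9,.0f} gambling-related suicides/year")
```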
Tobacco causes over 7 million deaths per year, while alcohol kills 2.6 million people per year.
Gambling interventions could be cost-effective, in some situations. Especially for larger countries yet to liberalize gambling/betting (like Brazil, India)? Also, now might be a window to lobby for stricter regulations on prediction markets.
Random fun fact: Indonesia has banned gambling, but is the only country where tobacco advertising is still legal.
Random tangent: tobacco advertising also legal in Andorra
I don’t have any data to back this up but I wouldn’t be surprised if in Sub-Saharan Africa sports betting was worse than tobacco and on par with alcohol, it being incredibly widespread and normalized.
Interesting point, and I suspect that there are lower hanging fruit in gambling too: some of the most addictive forms (e.g. fixed odds betting terminals in the UK) are not some intrinsic part of the culture but relatively new innovations promoted by a mere handful of companies and so regulators are much happier grappling with them (in the case of fixed odds betting terminals, I suspect restrictions on how much could be bet per spin were even more popular with the people who actually use them on a regular basis than the wider public!).
In that respect it’s perhaps a lot like animal rights activism requires campaigns focused on winnable battles to be effective (and it might appeal to some people that already have that lobbying skillset)
My belief/experience suggests that the sorts of prediction markets that are profitable and entertaining are not likely to be the ones that are particularly informative for globally impactful EA funding and policy choices.
(But at the same time, I guess it’s the case that some of the EA/rationalist support for prediction platforms has ended up leading to these entertaining but not socially valuable things.)
I have at least three reasons to be hopeful:
I see forecasting catching on with researchers for experimental design, which could easily save a lot of money and help make more progress. Earlier this month we updated a working paper on forecasts using data from the Social Science Prediction Platform to explicitly include results demonstrating their use in power calculations (see the sketch after this list). If, a year from now, forecasts are used in economics research about as much as they are now, that would be evidence for your hypothesis; but from my perspective, the concept that forecasts could be used in this way has only just started to be socialized, at least in my field. I also personally know of at least a couple of large institutions seeking forecasts, and am planning an RCT on how they affect decision-making in the field.
I think LLMs are making forecasting much cheaper and easier.
If humans don’t take up the use of forecasts in decision-making as much as they “should”, well, LLMs may be more likely to in their own pipelines.
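As a sketch of the power-calculation use mentioned in the first point above (my own illustration, not the working paper’s method; the elicited effect sizes are hypothetical): a forecasted effect size plugs directly into a standard sample-size computation.

```python
# Sketch: using forecasted effect sizes to size an experiment.
# Assumes the (hypothetical) elicited forecasts are standardized
# effect sizes (Cohen's d) for the treatment being tested.
from statistics import mean
from statsmodels.stats.power import TTestIndPower

peer_forecasts = [0.15, 0.25, 0.20, 0.30, 0.10]  # hypothetical elicited forecasts
forecast_d = mean(peer_forecasts)

# Solve for the sample size per arm needed to detect the forecasted
# effect at alpha = 0.05 with 80% power.
n_per_arm = TTestIndPower().solve_power(effect_size=forecast_d, alpha=0.05, power=0.8)
print(f"forecasted d = {forecast_d:.2f} -> n per arm ≈ {n_per_arm:.0f}")
```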
That’s not to say that every project previously funded around forecasting was a good use of money. I would probably agree with you regarding most of the projects you have in mind, while disagreeing with the title and framing, which are way too broad.
I definitely have my own gripes about EA/rationalist attitudes towards forecasting (see here), but maybe your objection is a level confusion:
I think when people talk about “leveraging forecasting into better decisions”, they’re saying: “‘Better’ decisions just are decisions guided by the normatively correct beliefs. Namely, they’re decisions that make reasonable-seeming tradeoffs between possible outcomes given the normatively correct beliefs about the plausibility of those outcomes. So our decisions will be more aligned with this standard of ‘better’ if our beliefs are formed by deferring to well-calibrated forecasts.”
E.g. they’re saying, “When navigating AI risk, we’ll make decisions that we endorse more if those decisions are guided by the credences of folks who’ve been unusually successful at forecasting AI developments.”
(At least, that’s the steelman. Maybe I’m being too charitable!)
Whereas you seem to be asking something like: “We already know which beliefs are reasonable. Do these beliefs tell us that ‘plug forecasts into some decision-making procedure’ seems likely to lead to good outcomes (i.e., that this is a ‘useful tool’)?”
(My gripes discussed in the linked post above, FWIW: Re: (1), the typical EA operationalization of “well-calibrated”, and judgments about how to defer to people based on their calibration on some reference class of past questions, are based on very questionable epistemological assumptions. See also this great post.)
I suspect that the main use of forecasting is if you need a probability for something and you don’t really have time to look into it yourself or you wouldn’t trust your judgement even if you did.
I think this is great and makes sense, but this isn’t where 90 percent of the money is going.
Sort of, but that also doesn’t capture the significant accuracy and efficiency benefits of the structured reasoning and communication process that forecasting enables. There are substantial risks and issues with “just looking into an issue yourself”—especially when you are more confident in your judgement (because that’s a clear risk of confirmation bias/overconfidence).
The main use of forecasting is in bringing the core scientific benefits described above to real-world decision-makers. But fundamentally, that hasn’t been funded; instead we’ve funded tournaments and research.
Related comment I made 2 years ago and ensuing discussion: https://forum.effectivealtruism.org/posts/ziSEnEg4j8nFvhcni/new-open-philanthropy-grantmaking-program-forecasting?commentId=7cDWRrv57kivL5sCQ
I think one should distinguish between several things here:
Prediction markets that make a lot of $ and don’t really need more because they do just fine with the profit motive
People spending a lot of time on prediction markets to prove they are a good forecaster
Infrastructure integrated into specific use cases, such as: when a funder is so interested in a question that they will pay for forecasts to inform and improve their other funding; when, for structural reasons, the institution that stands to benefit from the forecasts cannot fund them itself (such as decision-makers who do not have the mandate to support forecasting and are restricted from spending money on it but would use forecasts in their workflow); or basic research with positive externalities.
This post really belabours the first and second bullet points, perhaps because that is where a lot of money has gone, but there can be a lot of value in the third.
“I don’t think there is very much in the way of “this forecasting happened, and now we have made demonstrably better decisions regarding this terminal goal that we care about”.”
I assume some people disagree with this strong claim. One example I’ve heard was AGI timelines and their influence on AI safety field priorities—though I guess one could answer that certain reports or expert opinions were disproportionately more useful than prediction markets.
On a different point, I appreciated Eli Lifland’s past comment on many intellectual activities (such as grantmaking) being forms of forecasting.
This isn’t something I’ve thought a ton about but I think forecasting should plausibly still receive funding in a specific way:
Funders should either pay forecasters to make predictions on important questions, or subsidize prediction markets on those questions.
I don’t think forecasting is a “solution seeking a problem.” There are tons of important but hard-to-predict questions that I’d like better forecasts on! The problem is that the ecosystem hasn’t done a great job of turning dollars into good forecasts.
For example, most of my Metaculus questions are things I wanted answers to, but I tended not to update on the results because the questions usually don’t receive a lot of forecasts. If someone wanted to pay money to get more predictions on questions, I’d learn something useful!
I’m not sure how valuable this is compared to other uses of money (I wouldn’t pay for it myself) but at least it’s better than more general-purpose research on forecasting.
The problem is that prediction markets on useful questions, many years out, suffer from problems due to capital lockup/interest rates, among other things.
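A toy calculation of the lockup problem (my own illustration, with assumed numbers): if collateral earns nothing until resolution, a trader who believes “yes” is worth q will only bid the price up to q discounted at the risk-free rate, so long-dated prices systematically understate beliefs.

```python
# Toy model: maximum "yes" price a rational trader will pay when
# collateral earns nothing until resolution T years out.
# Buying at price p with believed probability q returns q/p in expectation;
# this must beat risk-free compounding (1 + r)**T.
def max_yes_price(q: float, r: float, years: float) -> float:
    return q / (1 + r) ** years

q = 0.90  # trader's true belief (assumed)
r = 0.04  # risk-free rate (assumed)
for years in (1, 5, 10):
    print(f"T={years:>2}y: price cap ≈ {max_yes_price(q, r, years):.2f} (belief {q:.2f})")
```

At ten years out, even a trader who is 90% confident has no incentive to push the price above roughly 0.61, which is one reason long-run markets stay miscalibrated.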
Also, I just don’t think the wisdom of the crowds emerges as much as you’d want. I think you can just ask 3 smart people what they think, and this will elicit more useful info.
The wisdom of crowds effect kicks in with very few forecasts. In the working paper I cite elsewhere in the comments, even 5 forecasts gets you pretty far along into the WoC effect, and 10 even more so. This is for asking people what they think, not prediction markets—the latter should, theoretically, require more forecasts, since seeing the implicit beliefs of others through the market price could lead to herding etc. But the wisdom of crowds effect kicking in for very small N is well established in the literature.
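A quick simulation of the small-N claim (my own sketch with assumed noise parameters, not the working paper’s data): averaging independent noisy forecasts shows the error dropping steeply by N = 5-10 and flattening afterward.

```python
# Simulation: wisdom-of-crowds error reduction as a function of crowd size.
# Assumes independent, unbiased forecasters with Gaussian noise around the truth.
import random

random.seed(0)
TRUE_P, NOISE_SD, TRIALS = 0.6, 0.15, 20_000

def mse_of_average(n: int) -> float:
    total = 0.0
    for _ in range(TRIALS):
        # Each forecaster reports the truth plus noise, clipped to [0, 1].
        avg = sum(min(max(random.gauss(TRUE_P, NOISE_SD), 0.0), 1.0) for _ in range(n)) / n
        total += (avg - TRUE_P) ** 2
    return total / TRIALS

for n in (1, 2, 5, 10, 50):
    print(f"crowd of {n:>2}: MSE = {mse_of_average(n):.5f}")
```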
I have a different takeaway than you, though: we only know about this effect—or about the biases people have and how to adjust their forecasts—because of work on forecasting. I don’t know how we’d know this stylized fact without work on it. For the wisdom of the crowds effect specifically, perhaps you could stop funding early since that one is well known, but it’s sufficiently surprising to most people that there could be value in showing it for more domains, and it is really just one example of what we learn more generally from research on forecasting—and these other results on how to optimally weigh forecasts can shrink error much more even after taking the wisdom of the crowds effect into consideration. (In our work, WoC gets you a ~60% reduction in the MSE, but other small adjustments lead to an additional ~60% reduction in error compared to the WoC estimate, and those aren’t even all the improvements we can make.)
Today I would never run an experiment without using forecasts to help with power calculations. And there is very recent work I’d use to adjust those forecasts, and we’re collectively not near the optimum in terms of learning what we can learn to make more accurate forecasts or integrating them into workflows. As I said elsewhere in the comments, the claims in the OP are far too strong. Even your asking a few experts—that’s something that could be improved on and integrated into workflows and is part of the titular “forecasting”. (It reads to me kind of like: don’t do forecasting, do this other thing which is itself forecasting and is informed by and improved upon by… forecasting.)
A more defensible claim imo would be that there are some projects that are self-supporting and those should not be funded, or that in some but not all cases if the market doesn’t pay for it then it’s not valuable (abstracting from coordination failures and other market failures, or the externalities of basic research).
I think “your mileage may vary” applies quite a lot here. In the context of the Social Science Prediction Platform, you tend to be asking people who have expertise and familiarity with the methods and context, and who are sometimes more experienced than the people posting the questions.
On the other hand, if you post detailed technical questions on a mainstream prediction market or even on Metaculus, I expect (and have the sense) that you don’t get much of this ‘wisdom of the crowds’ dividend.
I agree with some of this. But let me attempt a conciliatory take: less of forecasting money and effort should go to platforms and tournaments, but more should go to identifying existing, nascent forecasts (people using the word “probably” or “unlikely” about empirical matters) and creating markets (even unsubsidized Manifold markets would be helpful on the margin). I think it would be very helpful for someone to go through popular EA forum posts and org research documents and do this systematically.
I started something in this direction here.
But I am also a bit skeptical that creating lots of unsubsidized markets would generate much positive information. My evidence/experience suggests that most people who are involved in these matters don’t want to do a substantial amount of research into the sorts of very nuanced, detailed questions that are the highest value, so the small number of predictions you get might just be noise.
Prediction markets seem to be a great business (mostly gambling, with all the problems associated with it), so “funding” in the sense of investing in them could be sensible, while “funding” in the donation sense is not. (One could then later donate the returns to AMF or similar.)
In general, I’m hesitant to donate to stuff that’s plausibly just a really good business in its own right.
Fair, but cultivating tools used for prediction markets is only a part of this forecasting research funding. And the sorts of questions that EAs want to get predictions over (e.g., number of chickens and cages per year with versus without the production of cell cultured meat) are unlikely to be part of a popular mainstream prediction market.
[COI: I work at the Swift Centre as a forecaster, I have worked for a prediction market, I am very involved in forecasting. It is not my current work however, which is on community notes]
A few points attempting to say things other commenters haven’t, though I largely agree with the critical comments:
I agree that the $100M doesn’t seem super well allocated. Not because forecasting is useless, but because the money flowed to big institutions and platforms rather than smaller, weirder, mechanism-design bets. I like Metaculus, but it has absorbed a lot of money in the last 5 years and not clearly changed much. I don’t know if I think FRI has been worth it, I am glad someone has done the research but, again, how much are we talking? I would have preferred smaller projects were funded on the margin. Coefficient’s strategy in forecasting has felt poor to me, often ignoring the community who in my view come up with the most interesting projects and going for marginal spending on incumbents.
Nobody funds mechanism design or institutional epistemics. I recently spoke to someone at a household-name, enormous tech company who described their institutional process. It was almost unbelievably dysfunctional to me. Who is funding the work to help institutions think better? It doesn’t promise near-term wins and frankly shouldn’t be the priority of any non-research org. So basically no one. Forecasting is an attempt. How much value is there in the joint-stock company, or in democracy? To me, that’s what we are talking about: figuring out fundamentally better ways of making decisions. It is a problem at scale, it is neglected, and, given the deregulation of prediction markets, tractable (though maybe bad; more on that later).
On “feels useful when it isn’t” (point 6): I don’t entirely disagree. I deliberately try not to spend time forecasting unless I’m being paid to. It can be a distraction. Where I disagree is that some forecasting is genuinely mentally sharpening, at least for me, as a thinking discipline. And I think it’s a not-unreasonable status hierarchy. Do I endorse the status that Peter Wildeford or Eli Lifland have gotten from forecasting? Yes. Frankly, who do I not endorse having gotten status from being a forecaster?
Why don’t AI 2027 and Ajeya count? They are tangible forecasting outputs that demonstrably moved discourse and decision-making. AI 2027 is clearly informed by judgemental forecasters and was read by (I think) the Vice President. Habryka said something like ‘too much time has been wasted down the resolution criteria mines’, and I disagree, but even if one agrees, I’m not sure even he thinks the whole field is a waste of time.
Prediction markets may be net-harmful, but not useless. I’ve said publicly I’m less sure PMs are net-positive — bankruptcies and intimate partner violence are real and huge problems that may be as large as any coordination benefits. But ‘bad on net’ and ‘useless’ are different claims, and the latter seems more obviously incorrect to me. I would be more interested in a post entitled “EA forecasting efforts have caused massive harm”.
I’m reconsidering this point. It seems intuitive, but what is the strongest argument that this is “wrong”?
Off topic, but one additional thing I noticed about this list is the glaring lack of tangible advances in technical AI safety. It’s a different case from your post, as it’s about a problem rather than a tool; but I think it still shows something about whether we understand the dangers of AI, and the systems they stem from, enough to do anything about it.
I like bets involving donations, and investments as alternatives to forecasting without money on the line.
That’s still a sort of game/cultural thing rather than a means for more positive impact, though. I’ve seen that around EA basically forever, but I don’t think people who bet on their beliefs have been “more right” than those who don’t.
Hi Guy. The bets would be directly beneficial if the people who are more accurate donate to more cost-effective interventions. In addition, I wonder whether discussions of bets involving donations, and of investments, could be higher quality than discussions of forecasting questions without money on the line. The prospect of winning or losing money usually leads people to investigate their views more.
That seems to be a general cultural view in EA, but what I’m saying is that I’ve yet to see any evidence these bets actually help. I think the notion is unfounded.
This is widely believed to be true outside effective altruism too.