You can give me anonymous feedback here.
elifland
Podcast: Is Forecasting a Promising EA Cause Area?
Just chatted with @Ozzie Gooen about this and will hopefully release audio soon. I probably overstated a few things / gave a false impression of confidence in the parent in a few places (e.g., my tone was probably a little too harsh on non-AI-specific projects); hopefully the audio convo will give a more nuanced sense of my views. I’m also very interested in criticisms of my views and others sharing competing viewpoints.
Also want to emphasize the clarifications from my reply to Ozzie:
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting and with Sage, which has done several forecasting projects (and am for the most part more pessimistic about forecasting than others in these groups/orgs). And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
Thanks Ozzie for sharing your thoughts!
A few things I want to clarify up front:
While I think it’s valuable to share thoughts about the value of different types of work candidly, I am very appreciative of both people working on forecasting projects and grantmakers in the space for their work trying to make the world a better place (and am friendly with many of them). As I maybe should have made more obvious, I am myself affiliated with Samotsvety Forecasting and with Sage, which has done several forecasting projects. And I’m also doing AI forecasting research atm, though not the type that would be covered under the grantmaking program.
I’m not trying to claim with significant confidence that this program shouldn’t exist. I am trying to share my current views on the value of previous forecasting grants and the areas that seem most promising to me going forward. I’m also open to changing my mind on lots of this!
Thoughts on some of your bullet points:
2. I think that for further funding in this field to be exciting, funders should really work on designing/developing this field to emphasize the very best parts. The current median doesn’t seem great to me, but I think the potential has promise, and think that smart funding can really triple-down on the good stuff. I think it’s sort of unfair to compare forecasting funding (2024) to AI Safety funding (2024), as the latter has had much more time to become mature. This includes having better ideas for impact and attracting better people. I think that if funders just “funded the median projects”, then I’d expect the field to wind up in a similar place to it is now—but if funders can really optimize, then I’d expect them to be taking a decent-EV risk. (Decent chance of failure, but some chance at us having a much more exciting field in 3-10 years).
I was trying to compare previous OP forecasting funding to previous AI Safety funding. It’s not clear to me how different these were; sure, OP didn’t have a forecasting program, but AI safety was also very short-staffed. And re: the field maturing, idk, Tetlock has been doing work on this for a long time, and my impression is that AI safety also had very little effort going into it until the mid-to-late 2010s. I agree that funding of potentially promising exploratory approaches is good though.
3. I’d prefer funders focus on “increasing wisdom and intelligence” or “epistemic infrastructure” than on “forecasting specifically”. I think that the focus on forecasting is over-limiting. That said, I could see an argument to starting from a forecasting angle, as other interventions in “wisdom and intelligence / epistemic infrastructure” are more speculative.
Seems reasonable. I did like that post!
4. If I were deploying $50M here, I’d probably start out by heavily prioritizing prioritization work itself—work to better understand this area and what is exciting within it. (I explain more of this in the wisdom/intelligence post above). I generally think that there’s been way too little good investigation and prioritization work in this area.
Perhaps, but I think you gain a ton of info from actually trying to do stuff and iterating. I think prioritization work can sometimes seem more intuitively great than it ends up being, relative to the iteration strategy.
6. I’d like to flag that I think that Metaculus/Manifold/Samotsvety/etc forecasting has been valuable for EA decision-making. I’d hate to give this up or de-prioritize this sort of strategy.
I would love for this to be true! I’m open to changing my mind based on a compelling analysis.
7. I don’t particularly trust EA decision-making right now. It’s not that I think I could personally do better, but rather that we are making decisions about really big things, and I think we have a lot of reason for humility. When choosing between “trying to better figure out how to think and what to do” vs. “trying to maximize the global intervention that we currently think is highest-EV,” I’m nervous about us ignoring the former and going all-in on the latter. That said, some of the crux might be that I’m less certain about our current marginal AI Safety interventions than I think Eli is.
There might be some difference in perceptions of the direct EV of marginal AI Safety interventions. There might also be differences in beliefs about the value of (a) prioritization research vs. (b) trying things out and iterating, as described above (perhaps we disagree on the absolute value of both (a) and (b)).
8. Personally, around forecasting, I’m most excited about ambitious, software-heavy proposals. I imagine that AI will be a major part of any compelling story here.
Seems reasonable, though I’d guess we have different views on which ambitious AI-related, software-heavy projects are most promising.
9. I’d also quickly flag that around AI Safety—I agree that in some ways AI safety is very promising right now. There seems to have been a ton of great talent brought in recently, so there are some excellent people (at very least) to give funding to. I think it’s very unfortunate how small the technical AI safety grantmaking team is at OP. Personally I’d hope that this team could quickly get to 5-30 full time equivalents. However, I don’t think this needs to come at the expense of (much) forecasting/epistemics grantmaking capacity.
I think you might be understating how fungible OpenPhil’s efforts are between AI safety (particularly the governance team) and forecasting. Happy to chat in DM if you disagree. Otherwise reasonable point, though you’d ofc still have to do the math to make sure the forecasting program is worth it.
(edit: actually maybe the disagreement is still in the relative value of the work, depending on what you mean by “much” grantmaking capacity)
10. I think you can think of a lot of “EA epistemic/evaluation/forecasting work” as “internal tools/research for EA”. As such, I’d expect that it could make a lot of sense for us to allocate ~5-30% of our resources to it. Maybe 20% of that would be on the “R&D” to this part—perhaps more if you think this part is unusually exciting due to AI advancements. I personally am very interested in this latter part, but recognize it’s a fraction of a fraction of the full EA resources.
Seems unclear what should count as internal research for EA, e.g. are you counting the OP worldview investigation team / AI strategy research in general? And re: AI advancements, they both improve the promise of AI for forecasting/epistemics work and shorten timelines, which points toward direct AI safety technical/governance work.
All views are my own rather than those of any organizations/groups that I’m affiliated with. Trying to share my current views relatively bluntly. Note that I am often cynical about things I’m involved in. Thanks to Adam Binks for feedback. edit: See also child comment for clarifications/updates.
Following some of the skeptical comments here, I figured it might be useful to quickly write up some personal takes on forecasting’s promise and what subareas I’m most excited about (where “forecasting” is defined as things I would expect to be in the scope of OpenPhil’s program to fund).
Overall, most forecasting grants that OP has made seem much lower EV than the AI safety grants (I’m not counting grants that seem more AI-y than forecasting-y, e.g. Epoch, and I believe these wouldn’t be covered by the new grantmaking program). Due to my ASI timelines (10th percentile ~2027, median ~late 2030s), I’m most excited about forecasting grants that are closely related to AI, though I’m not super confident that no non-AI related ones are above the bar.
I generally agree with the view that I’ve heard repeated a few times that EAs significantly overrate forecasting as a cause area, while the rest of the world significantly underrates it.
I think EAs often overrate superforecasters’ opinions; they’re not magic. A lot of superforecasters aren’t great (not just at general reasoning, but even at geopolitical forecasting); there’s plenty of variation in quality.
General quality: Becoming a superforecaster selects for some level of intelligence, open-mindedness, and intuitive forecasting sense among the small group of people who actually make 100 forecasts on GJOpen. There are tons of people (e.g. I’d guess very roughly 30-60% of AI safety full-time employees?) who would become superforecasters if they bothered to put in the time.
Some background: as I’ve written previously, I’m intuitively skeptical of the benefits of large amounts of forecasting practice (i.e. I’d guess strong diminishing returns).
Specialties / domain expertise: Contra a caricatured “superforecasters are the best at any forecasting question” view, consider a grantmaker deciding whether to fund an organization. They are, whether explicitly or implicitly, forecasting a distribution of outcomes for the grant. But I’d guess most would agree that superforecasters would do significantly worse than grantmakers at this “forecasting question”. A similar argument could be made for many intellectual jobs, which could be framed as forecasting. The question for whether superforecasters are relatively better isn’t “Is this task answering a forecasting question?” but rather “What are the specific attributes of this forecasting question?”
Some people seem to think that the key difference between the questions superforecasters are good at and the ones smart domain experts are good at is whether the questions are *resolvable* or *short-term*. I tend to think that the main differences are along the axes of *domain-specificity* and *complexity*, though these are of course correlated with the other axes. Superforecasters are selected for being relatively good at short-term, often geopolitical questions.
As I’ve written previously: It varies based on the question/domain how much domain expertise matters, but ultimately I expect reasonable domain experts to make better forecasts than reasonable generalists in many domains.
There’s an extreme here: e.g. forecasting what the best chess move is, which is obviously better done by chess experts than by superforecasters.
So if we think of a spectrum from geopolitics to chess, it’s very unclear to me where things like long-term AI forecasts land.
This intuition seems to be consistent with the lack of quality existing evidence described in Arb’s report (which debunked the “superforecasters beat intelligence experts without classified information” claim!).
Similarly, I’m skeptical of the straw rationalist view that highly liquid, well-run prediction markets would be an insane societal boon, rather than a moderate-to-large one (hard to operationalize, hope you get the vibe). See here for related takes. This might change with superhuman AI forecasters though, whose “time” might be more plentiful.
Historically, OP-funded forecasting platforms (Metaculus, INFER) seem to be underwhelming on publicly observable impact per dollar (in terms of usefulness for important decision-makers, user activity, rationale quality, etc.). Maybe some private influence over decision-makers makes up for it, but I’m pretty skeptical.
Tbh, it’s not clear that these and other platforms currently provide more value to the world than the opportunity cost of the people who spend time on them. e.g. I was somewhat addicted to Metaculus then later Manifold for a bit and spent more time on these than I would reflectively endorse (though it’s plausible that they were mostly replacing something worse like social media). I resonate with some of the comments on the EA Forum post mentioning that it’s a very nerd-sniping activity; forecasting to move up a leaderboard (esp. w/quick-resolving questions) is quite addicting to me compared to normal work activities.
I’ve heard arguments that getting superforecasted probabilities on things is good because they’re more legible/credible because they’re “backed by science”. I don’t have an airtight argument against this, but it feels slimy to me due to my beliefs above about superforecaster quality.
Regarding whether forecasting orgs should try to make money, I’m in favor of pushing in that direction as a signal of actually providing value, though it’s of course a balance re: the incentives there and will depend on the org strategy.
The types of forecasting grants I’d feel most excited about atm are, roughly ordered (without a claim that any are above OpenPhil’s GCR bar, definitely not exhaustive, and biased toward things I’ve thought about recently):
Making AI products for forecasting/epistemics in the vein of FutureSearch and Elicit. I’m also interested in more lightweight forecasting/epistemic assistants.
FutureSearch and systems in recent papers are already pretty good at forecasting, and I expect substantial improvements soon with next-gen models.
I’m excited about making AIs push toward what’s true rather than what sounds right at first glance or is pushed by powerful actors.
However, even if we have good forecasting/epistemics AIs, I’m worried that they won’t convince people of the truth, since people are irrational and much of the variance in their beliefs is explained by status/power-seeking, vibes, social circles, etc. It seems especially hard to change people’s minds on very tribal things, which seem correlated with the most important beliefs to change.
AI friends might actually be more important than AI forecasters for epistemics, but that doesn’t mean AI forecasters are useless.
I might think/write more about this soon. See also Lukas’s Epistemics Project Ideas and ACX on AI for forecasting.
Judgmental forecasting of AI threat models, risks, etc., involving a mix of people who have AI / dangerous domain expertise and/or a very strong forecasting track record (>90th percentile superforecaster), ideally as many people as possible who have both. Not sure how helpful it will be, but it seems maybe worth more people trying.
In particular, forecasting that can help inform risk assessment / RSPs seems like a great thing to try. See also discussion of the Delphi technique in the context of AGI risk assessment here. Malcolm Murray at GovAI is running a Delphi study to get estimates of likelihood and impact of various AI risks from experts.
This is related to a broader class of interventions that might look somewhat like a “structured review process”, in which one would take an in-depth threat modeling report and have various people review and contribute their own forecasts in addition to qualitative feedback. My sense is that when superforecasters reviewed Joe Carlsmith’s p(doom) forecast in a similar vein, the result wasn’t that useful, but the exercise could plausibly be more useful with better quality reviews/forecasts. It’s unclear whether this would be a good use of resources above the usual ad-hoc/non-forecasting review process, but it might be worth trying more.
Forecasting tournaments on AI questions with large prize pools: I think these historically have been meh (e.g. the Metaculus one attracted few forecasters, wasn’t fun to forecast on (for me at least), and I’d guess significantly improved ~no important decisions), but I think it’s plausible things could go better now as AIs are much more capable, there are many more interesting and maybe important things to predict, etc.
Crafting forecasting questions that are cruxy on threat models / intervention prioritization between folks working on AI safety
It’s kind of wild that there has been so little success on this front. See frustrations from Alex Turner: “I think it’s not a coincidence that many of the “canonical alignment ideas” somehow don’t make any testable predictions until AI takeoff has begun.” I worry that this will take a bunch of effort and not get very far (see Paul/Eliezer finding only a somewhat related bet re: their takeoff speeds disagreement), but it seems worth giving a more thorough shot with different participants.
I’m relatively excited about doing things within the AI safety group rather than between this group and others (e.g. superforecasters) because I expect the results might be more actionable for AI safety people.
I incorporated some snippets of a reflections section from a previous forecasting retrospective above, but there’s a little that I didn’t include if you’re inclined to check it out.
So in the multi-agent slowly-replacing case, I’d argue that individual decisions don’t necessarily represent a voluntary decision on behalf of society (I’m imagining something like this scenario). In the misaligned power-seeking case, it seems obvious to me that this is involuntary. I agree that it technically could be a collective voluntary decision to hand over power more quickly, though (and in that case I’d be somewhat less against it).
I think emre’s comment lays out the intuitive case for being careful / taking your time, as does Ryan’s. I think the empirics are a bit messy once you take into account benefits of preventing other risks but I’d guess they come out in favor of delaying by at least a few years.
(edit: my point is basically the same as emre’s)
I think there is very likely at some point going to be some sort of transition to a world where AIs are effectively in control. It seems worth it to slow down on the margin to try to shape this transition as best we can, especially slowing it down as we get closer to AGI and ASI. It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse).
Delaying the arrival of AGI by a few years as we get close to it seems good regardless of parameters like the value of involuntary-AI-disempowerment futures. But delaying the arrival by hundreds of years seems more likely to be bad due to the tradeoff with other risks.
If Scott had used language like this, my guess is that the people he was trying to convince would have completely bounced off of his post.
I mostly agree with this; I wasn’t suggesting he include that specific type of language, just that the arguments in the post don’t go through from the perspective of most leader/highly-engaged EAs. Scott has discussed similar topics on ACT here, but I agree the target audience was likely different.
I do think part of his target audience was probably EAs who he thinks are too critical of themselves, as I think he’s written before, but it’s likely a small-ish fraction of his readers.
I do think it would have been clearer if he had included a caveat like “if you think that small changes in the chance of existential risk outweigh ~everything else then this post isn’t for you, read something else instead” but oh well.
Agree with that. I also think if this is the intention, the title should maybe be different: instead of being called “In continued defense of effective altruism” it could be called something else like “In defense of effective altruism from X perspective”. The title seems to me to imply that effective altruism has been positive on its own terms.
Furthermore, people who identify as ~longtermists seemed to be sharing it widely on Twitter without any caveat of the type you mentioned.
And it seems fine to me to argue from the basis of someone else’s premises, even if you don’t think those premises are accurate yourself.
I feel like there’s a spectrum of cases here. Let’s say I, as a member of movement X (in which most people aren’t libertarians), write a post “Libertarian case for X”, where I argue that X is good from a libertarian perspective.
1. Even if those in X usually don’t agree with the libertarian premises, the arguments in the post still check out from X’s perspective. Perhaps the arguments are reframed to show libertarians that X will lead to positive effects on their belief system as well as X’s belief system. None of the claims in the post contradict what the most influential people advocating for X think.
2. The case for X is distorted and statements in the piece are highly optimized for convincing libertarians. Arguments aren’t just reframed; new arguments are created that the most influential people advocating for X would disendorse.
I think pieces or informal arguments close to both (1) and (2) are common in the discourse, but I generally feel uncomfortable with ones closer to (2). Scott’s piece is somewhere in the middle and perhaps even closer to (1) than (2) but I think it’s too far toward (2) for my taste given that one of the most important claims in the piece that makes his whole argument go through may be disendorsed by the majority of the most influential people in EA.
EDIT: Scott has admitted a mistake, which addresses some of my criticism.
(this comment has overlapping points with titotal’s)
I’ve seen a lot of people strongly praising this article on Twitter and in the comments here, but I find some of the arguments weak. Insofar as the goal of the post is to say that EA has done some really good things, I think the post is right. But I don’t think it convincingly argues that EA has been net positive for the world.[1]

First: based on surveys, it seems likely that most (not all!) highly-engaged/leader EAs believe GCRs/longtermist causes are the most important, with a plurality thinking AI x-risk / x-risk more generally is the most important.[2] I will analyze the post from a ~GCR/longtermist-oriented worldview that thinks AI is the most important cause area for the rest of this comment; again, I don’t mean to suggest that everyone holds it, but if something like it is held by a plurality of highly-engaged/leader EAs, it seems highly relevant for the post to be convincing from that perspective.
My overall gripe is exemplified by this paragraph (emphasis mine):
And I think the screwups are comparatively minor. Allying with a crypto billionaire who turned out to be a scammer. Being part of a board who fired a CEO, then backpedaled after he threatened to destroy the company. These are bad, but I’m not sure they cancel out the effect of saving one life, let alone 200,000.
(Somebody’s going to accuse me of downplaying the FTX disaster here. I agree FTX was genuinely bad, and I feel awful for the people who lost money. But I think this proves my point: in a year of nonstop commentary about how effective altruism sucked and never accomplished anything and should be judged entirely on the FTX scandal, nobody ever accused those people of downplaying the 200,000 lives saved. The discourse sure does have its priorities.)
I’m concerned about the bolded part; I’m including the caveat for context. I don’t want to imply that saving 200,000 lives isn’t a really big deal, but I will discuss from the perspective of “cold hard math”.
200,000 lives equals roughly a ~.0025% reduction in extinction risk, or a ~.25% reduction in risk of a GCR killing 80M people, if we care literally zero about future people. To the extent we weight future people, the numbers obviously get much lower.
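To make the equivalence explicit, here’s a rough check of the arithmetic (assuming a current world population of roughly 8 billion; that assumption is mine, not stated in the post):

$$\frac{200{,}000}{8\times 10^{9}} \approx 0.0025\%, \qquad \frac{200{,}000}{80\times 10^{6}} = 0.25\%$$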
The magnitude of the effect size of the board firing Sam, of which the sign is currently unclear IMO, seems arguably higher than .0025% extinction risk and likely higher than 200,000 lives if you weight the expected value of all future people >~100x of that of current people.
The FTX disaster is a bit more ambiguous because some of the effects are more indirect; a quick search for economic costs didn’t find good numbers, but I think a potentially more important thing is that it is likely, to some extent, an indicator of systemic issues in EA that might be quite hard to fix.
The claim that “I’m not sure they cancel out the effect of saving one life” seems silly to me, even if we just look at generally large “value of a life” estimates compared to the economic costs of the FTX scandal.
Now I’ll discuss the AI section in particular. There is little attempt to compare the effect sizes of “accomplishments” (with each other and also with potential negatives, with just a brief allusion to EAs accelerating AGI) or argue that they are net positive. The effect sizes seem quite hard to rank to me, but I’ll focus on some that seem important but potentially net negative (not claiming that they definitely are!), in order of their listing:
“Developed RLHF, a technique for controlling AI output widely considered the key breakthrough behind ChatGPT.”
This is, needless to say, controversial in the AI safety community.
Got two seats on the board of OpenAI, held majority control of OpenAI for one wild weekend, and still apparently might have some seats on the board of OpenAI, somehow?
As I said above, the sign of this still seems unclear, and I’m confused why it’s included when later Scott seems to consider it a negative.
Helped found, and continue to have majority control of, competing AI startup Anthropic, a $30 billion company widely considered the only group with technology comparable to OpenAI’s.
Again, controversial in the AI safety community.
[1] My take is that EA has more likely than not been positive, but I don’t think it’s that clear, and either way I don’t think this post makes a solid argument for it.
[2] As of 2019, EA Leaders thought that over 2x (54% vs. 24%) more resources should go to long-term causes than short-term, with AI getting the most (31% of resources), and the most highly-engaged EAs felt somewhat similarly. I’d guess that the AI figure has increased substantially given rapid progress since 2019/2020 (2020 was the year GPT-3 was released!). We have a 2023 survey of only CEA staff in which 23/30 people believe AI x-risk should be a top priority (though only 13/30 say “biggest issue facing humanity right now”, vs. 6 for animal welfare and 7 for GHW). CEA staff could be selected for thinking AI is less important than those directly working on it, but would think it’s more important than those at explicitly non-longtermist orgs.
These were the 3 snippets I was most interested in:
Under pure risk-neutrality, whether an existential risk intervention can reduce more than 1.5 basis points per billion dollars spent determines whether the existential risk intervention is an order of magnitude better than the Against Malaria Foundation (AMF).
If you use welfare ranges that are close to Rethink Priorities’ estimates, then only the most implausible existential risk intervention is estimated to be an order of magnitude more cost-effective than cage-free campaigns and the hypothetical shrimp welfare intervention that treats ammonia concentrations. All other existential risk interventions are competitive with or an order of magnitude less cost-effective than these high-impact animal interventions.
Even if you think that Rethink Priorities’ welfare ranges are far too high, many of the plausible existential risk interventions are not an order of magnitude more cost-effective than the hypothetical ammonia-treating shrimp welfare intervention or cage-free campaigns.
74 (= 0.074/0.01)
0.074/0.01 is 7.4, not 74.
Eli Lifland on Navigating the AI Alignment Landscape
In an update on Sage introducing quantifiedintuitions.org, we described a pivot we made after a few months:
As stated in the grant summary, our initial plan was to “create a pilot version of a forecasting platform, and a paid forecasting team, to make predictions about questions relevant to high-impact research”. While we built a decent beta forecasting platform (that we plan to open source at some point), the pilot for forecasting on questions relevant to high-impact research didn’t go that well due to (a) difficulties in creating resolvable questions relevant to cruxes in AI governance and (b) time constraints of talented forecasters. Nonetheless, we are still growing Samotsvety’s capacity and taking occasional high-impact forecasting gigs.
[...]
Meanwhile, we pivoted to building the apps contained in Quantified Intuitions to improve and maintain epistemics in EA.
Ought has pivoted ~twice: from pure research on factored cognition to forecasting tools to an AI research assistant.
Discussing how to align Transformative AI if it’s developed very soon
Personally the FTX regrantor system felt like a nice middle ground between EA Funds and donor lotteries in terms of (de)centralization. I’d be excited to donate to something less centralized than EA Funds but more centralized than a donor lottery.
Which part of my comment did you find as underestimating how grievous SBF/Alameda/FTX’s actions were? (I’m genuinely unsure)
Nitpick, but I found the sentence:
Based on things I’ve heard from various people around Nonlinear, Kat and Emerson have a recent track record of conducting Nonlinear in a way inconsistent with EA values [emphasis mine].
A bit strange in the context of the rest of the comment. If your characterization of Nonlinear is accurate, it would seem to be inconsistent with ~every plausible set of values and not just “EA values”.
Appreciate the quick, cooperative response.
I want you to write a better post arguing for the same overall point if you agreed with the title, hopefully with more context than mine.
Not feeling up to it right now and not sure it needs a whole top-level post. My current take is something like (very roughly/quickly written):
New information is currently coming in very rapidly.
We should at least wait until the information comes in a bit slower before thinking seriously in-depth about proposed mitigations so we have a better picture of what went wrong. But “babbling” about possible mitigations seems mostly fine.
An investigation similar to the one proposed here should be started fairly quickly, with the goal of producing an initial version of a report within ~2 months so we can start thinking pretty seriously about what mitigations/changes are needed, even if a finalized report would take longer.
My main thought is that I don’t know why he committed fraud. Was it actually to utility maximize, or because he was just seeking status, or got too prideful or what?
I think either way most of the articles you point to do more good than harm. Being more silent on the matter would be worse.
I’d agree with this if I thought EA right now had a cool head. Maybe I should have said we should wait until EA has a cooler head before launching investigations.
I’d hope that the investigation would be conducted mostly by an independent, reputable entity even if commissioned by EA organizations. Also, “EA” isn’t a fully homogeneous entity and I’d hope that the people commissioning the investigation might be more cool-headed than the average Forum poster.
I thought I would like this post based on the title (I also recently decided to hold off for more information before seriously proposing solutions), but I disagree with much of the content.
A few examples:
It is uncertain whether SBF intentionally committed fraud, or just made a mistake, but people seem to be reacting as if the takeaway from this is that fraud is bad.
I think we can at this point safely say with >95% confidence that SBF basically committed fraud, even if not technically in the legal sense (edit: but it also seems likely to be fraud in the legal sense), and it’s natural to start thinking about the implications of this and in particular to be very clear about our attitude toward the situation if fraud indeed occurred, as looks very likely. Waiting too long has serious costs.
We could immediately launch a costly investigation to see who had knowledge of fraud that occurred before we actually know if fraud occurred or why. In worlds where we’re wrong about whether or why fraud occurred this would be very costly. My suggestion: wait for information to costlessly come out, discuss what happened when not in the midst of the fog and emotions of current events, and then decide whether we should launch this costly investigation.
If we were to wait until we knew close to fully “whether or why fraud occurred”, this might take years as the court case plays out. I think we should get on with it reasonably quickly given that we are pretty confident some really bad stuff went down. Delaying the investigation seems generally more costly to me than the costs of conducting it, e.g. people’s memories decay over time and people have more time to get alternative stories straight.
Adjacently, some are arguing EA could have vetted FTX and Sam better, and averted this situation. This reeks of hindsight bias! Probably EA could not have done better than all the investors who originally vetted FTX before giving them a buttload of money!
Maybe EA should investigate funders more, but arguments for this are orthogonal to recent events, unless CEA believes their comparative advantage in the wider market is high-quality vetting of corporations. If so, they could stand to make quite a bit of money selling this service, and should possibly form a spinoff org.
This seems wrong, e.g. EA leadership had more personal context on Sam than investors. See e.g. Oli here with a personal account and my more abstract argument here.
Thanks Ozzie for chatting! A few notes reflecting on places I think my arguments in the conversation were weak:
It’s unclear what short timelines would mean for AI-specific forecasting. If AI timelines are short it means you shouldn’t forecast non-AI things much, but it’s unclear what it means about forecasting AI stuff. There’s less time for effects to compound but you have more info and proximity to the most important decisions. It does discount non-AI forecasting a lot though, and some flavors of AI forecasting.
I also feel weird about the comparison I made between forecasting and waiting for things to happen in the world. There might be something to it, but I think it is valuable to force yourself to think deeply about what will happen, to help form better models of the world, in order to better interpret new events as they happen.