Whilst I strongly disagree with the claim at the object level, many other non-forecasting AI safety interventions work with labs in some way, so even if this were true, the relative penalty applied to AIS forecasting work would be fairly low.
I think this post significantly overstates its conclusion and is plausibly poorly calibrated on the relative value of forecasting.
My main “directional” issues with the post as it’s currently written:
I think it overstates the amount of funding devoted to forecasting on a “worldview” basis.
Most forecasting funding is (iiuc) not money that would otherwise go to neartermist causes, nor is it particularly fungible with them, so pointing to a bunch of neartermist causes as better funding options seems irrelevant.
From my perspective, it seems like:
Within animal-welfare-fungible money, very little goes into forecasting, e.g. less than $2M per year.
Tbh—I would probably prefer that more money went into some kinds of forecasting on the margin. For example, I think that people are generally too bullish on clean meat, and Linch/Open Phil’s work investigating the difficulty of clean meat has plausibly resulted in better allocation of millions of dollars because there are, in fact, good alternatives (like cage-free campaigns).
Within longtermist/AI-fungible money, maybe $10M/year goes into forecasting, which seems pretty reasonable to me. But I think to get to $10M you need to include projects that seem very promising to me for reasons quite different from mainstream forecasting infrastructure, e.g. AI 2027, METR.
I think the strongest version of the argument would be attacking AI evals, but I’m unsure whether those are in scope for this post—my impression is that evals that are useful for forecasting capabilities are a pretty great bet relative to other funding opportunities within the AI space.
So the argument actually seems to be “longtermist funding is not as cost-effective as neartermist funding” which is not totally unreasonable, but clearly needs to engage with the long/neartermist worldview (e.g. moral size of the future) as opposed to just engaging with tangible short-term impact indicators.
I’m less convinced than the OP that funders in particular are overrating forecasting—I just don’t see much effort going into forecasting grantmaking compared to ~every other grantmaking area.
My impression is that a lot of forecasting work is funded by organisations that are incentivised to use the money well (e.g. AI companies paying FRI to produce forecasts around safety and capability evaluations for safety planning). I see that others have weighed in on this already, so I’m not planning to elaborate on it further.
I agree with some of the post’s vibes and think it’s pointing at real cultural traits of rationalist communities. Though tbh, I think OP is too bearish on the usefulness of betting/making falsifiable predictions for people in EA spaces. I suspect that OP has seen lots of people getting very distracted by futarchy/Manifold etc. (and I do think this is a risk), but culturally I think EA should be pretty into “betting/making falsifiable predictions” and that cluster of epistemic traits, AND I think forecasting infrastructure has a meaningful effect on this. E.g. in two office spaces (out of three that I’ve spent substantial time in), I think Manifold/prediction markets have very clearly made the communities more forecasting-y, and this has had tangible effects on people’s research/choice of projects—this is probably the most explicit example, though most changes are harder to hyperlink.
Given that you are criticising the epistemics of EAs taking AGI very seriously, I think it’s reasonable to hold this post to a higher epistemic standard than a typical EA forum post. Apologies if this comes across as combative—I spent some time trying to tone it down with Claude and struggled to get something that wasn’t just hedged/weak sauce. I am excited about more discussion of the capabilities of AI systems on the EA forum and would like more people to write up their takes on the current situation.
.....
I think you are applying more rigour to the bullish case than the bearish one. For example, you say:
[Mythos not providing a substantive improvement in cybersec capabilities] is further highlighted by the fact that an independent analysis was able to find many of the same vulnerabilities using much smaller open-source models.
I think this is misleading for a few reasons:
AISLE is not an “independent” entity—their whole business depends on Mythos and frontier models not being as big a deal as harnesses
That analysis does not “find” many of the same vulns—they were presented to the LLMs selectively
They don’t give a false positive rate, so it’s not clear that the LLMs’ classifications have much validity
On the claim that Anthropic talks about risks from their own models primarily to create hype: I find this hard to square with the evidence. Talking about how your B2B product might be extremely dangerous, or publishing lengthy documents critically assessing your own product and admitting to errors that would be difficult to identify independently (e.g. accidentally training against the CoT), is not a common marketing tactic. It feels like your model implies that companies should only release materials optimised for short-term interests, which doesn’t predict the real differences in how AI companies approach releases.
Benchmarks are interpreted uncritically
The benchmark contamination arguments are worth engaging with in principle, but I’m not sure they’re doing much work in practice—I don’t think many people in EA are actually updating heavily on raw benchmark scores right now. METR, arguably EA’s favourite benchmarking org, has been pretty vocal about their own benchmarks being saturated, so I think the community is reasonably aware of these limitations already.
Negative results are ignored
I’m genuinely uncertain what you want Anthropic and other AI companies to do here. Do you think “genuine intelligence” is easy to measure and well-defined? The more concrete concepts being used as proxies—coding ability, economic value generated, uplift—seem defensible on their own terms rather than as misleading substitutes for something more fundamental.
On “fundamental limits of LLMs” more broadly: these arguments have been made confidently by prominent researchers since the advent of LLMs and have not had a great track record. That doesn’t make them wrong, but it’s worth noting.
.....
I think this post would be much stronger if it applied its standards more symmetrically. It would also help to have a more concrete conclusion. The current takeaway is essentially “further research is needed”, which is a claim you can make about most areas of research (so much so that it’s been banned from multiple journals), but I don’t have a great sense of what research would actually convince you that the “AI hype” is reasonable.
Having an AI that doesn’t willingly participate in coups doesn’t imply that you need to specify all of the AI’s values in advance, or that it will be incorrigible in a broad (and x-risk-increasing) sense.
I think that the people working on preventing AI-assisted coups are imagining pretty corrigible AIs (in the sense that Claude right now is very corrigible); the AIs just won’t want to do coups (in a similar sense to Claude not wanting to help with bioweapons research), and this just seems pretty workable.
A separate cluster of threat models that is worth disentangling is creating more surface area for anti-human-user coordination within the economy, particularly if it’s much easier for smart, misaligned AI systems to coordinate with relatively stupid, corrigible AI systems (e.g., Opus 4.7). The arguments for AI <> AI coordination advantage (over AI <> human) are quite intuitive to me, but I don’t think you actually need an asymmetry here to put society in a more vulnerable state than the current one. I don’t have a great sense of how this washes out, but it feels like a crux for evaluating the net benefit of coordination tech.
Similar to how the move from traditional to digital banking probably creates more surface area for exploitation by computer hackers, it’s probably very good to have primitive computers touching nukes rather than more modern ones.
I thought that part of the core thesis was that as we go through the intelligence explosion, coordination tech becomes increasingly valuable (maybe critical). Are you saying that it’s plausible that we’ll get “good enough” coordination tech out of agents that are much less powerful than the frontier during the IE? E.g. coordination tech generally uses Opus 4.7, even in the Opus 6–8 era, where coordination tech seems most (?) valuable, but where we also have much more legitimate concerns about scheming capabilities?
The dual-use concerns you raise are framed around bad human actors: corporations colluding, coup plotters, criminals. But the coordination infrastructure you’re sketching could also create significant attack surfaces for AI systems themselves. If AI delegates are negotiating on behalf of humans, running arbitration, doing confidential monitoring, and profiling preferences, then a misaligned or adversarially manipulated AI layer sitting inside all of that coordination infrastructure seems like it could be quite a powerful lever for influence or control.
Curious if you have thoughts on this class of concerns?
Thanks for sharing this. Did your team make and test simple prototypes for any of these ideas? If not, I’m curious about why, from a research/writing perspective. I would have thought that you could get quite a lot of signal very quickly with Claude Code on the feasibility and difficulty of some of these ideas.
I think the “strong default” framing overstates the case, for a few reasons.
The argument (IIUC) hinges on one actor gaining decisive, uncontested control before anyone else can respond. But that assumption does a lot of work, and I’m not sure it holds:
We currently have dozens of serious actors across multiple adversarial jurisdictions racing simultaneously, which looks more like a setup for messy multipolarity than a clean monopoly
Extreme military advantage hasn’t historically guaranteed political control—the US had overwhelming superiority in Vietnam, Afghanistan and Iraq and still couldn’t convert that into stable governance. Even on fast take-offs, bridging the gap between “ASI achieved internally” and actually running a society requires human cooperation, and sustaining that loyalty is very hard.
The same inference (“extreme capability asymmetry, therefore inevitable authoritarianism”) was made about nuclear weapons. What emerged was contested, ugly and dangerous, but not totalitarian. That ofc doesn’t mean ASI follows the same path, but it’s worth thinking about whether you would have predicted that outcome in advance.
Even within a single ASI-controlling organisation, individuals have interests, and defection, whistleblowing and sabotage are historically common responses to illegitimate power grabs from within institutions. The DARPA director scenario assumes a level of internal cohesion that imo rarely holds in practice
I’d put the more likely default as a messy, contested outcome that preserves more democratic structure than your title implies, even if it falls well short of anything we’d be happy with.
Zooming out slightly, I’m not sure what you are actually imagining ASI looks like here, so maybe I’m talking past you. I suspect that either:
You’re imagining a “god-like” AI which has intellectual and physical capabilities that far exceed the aggregate yearly output and total resources of the current USA.
In which case, even aggressive ASI timelines should be measured in a low number of decades rather than years. (Edit: I should have said 5–15 years here; “low number of decades” makes it sound like 30 years. I still think the general point about democratic societies having time to adapt stands.)
You’re imagining a “country of geniuses in a datacenter” and little more (perhaps you also get a significant number of automated military drones).
In which case, I don’t think there is a strong case for the kind of overwhelming loss of democratic control. The data centres will still rely on their host country for energy, human resources, etc.
I don’t think they are trying to convert the EA community into something else—they are pretty clearly creating separate spaces for their movement/community. [1]
Describing their post as using “applause lights” seems at best uncharitable, and “absolute nonsense” is just rude. There are several well-received posts on the forum around “[a]ugmenting decision-making with meditative (e.g. mindfulness) [practices]”, like this one and this one. It’s fine to dislike their principles, but I think it’s worth making an effort to be encouraging when fellow altruists try to build on the “project” of Effective Altruism.
- ^
e.g. they say “That being said, we’re also aware of the danger of potential zero-sum dynamics between int/a and EA, and would like to avoid them as much as possible. One thing we are afraid of is int/a gravitating towards the “just bitching about EA” attractor state, which is definitely not the vibe we’re going for. Another concern is “taking people away from EA”. We don’t intend to dissuade people from doing impactful work by EA lights, in fact many of us in the movement are doing incredibly canonical EA jobs.” and have run many events themselves under their own banner.
I found this post hard to engage with, and I’m not quite sure why. I think it’s pointing at some important areas, so I’ve tried to write out some of my confusions.
I don’t understand why you believe that these problems won’t be solved by “ASI” or “human-level AI”—presumably, if they are tractable for humans, they’ll be tractable for human-level AIs. Agree that making sure that these systems are used for other problems is important and a lot of that work is “solving the alignment problem”.
I think you might be using terms like AGI and ASI in non-standard ways, e.g. “Approach 4: Research how to steer ASI toward solving non-alignment problems [like philosophy]”. It’s plausible that very powerful AI systems are less good at philosophy than they are at tasks that are cheaper to evaluate—but they’ll almost definitionally be better at philosophy than current humans. I think this is concerning for a bunch of reasons (including doing good alignment research in the run-up to ASI), but I’m not very worried about situations where we succeed at aligning ASI and then can’t get good philosophy research out of it (at least by human standards of good) for capability reasons.
Also, approach 3 (pause at human-level AI) probably does help with misalignment risks relative to the counterfactual of just proceeding to ASI. For reasons like AI control, and having much stronger evidence for our ability to control human-level intelligences than super-intelligences.
I agree with some of the early parts of the post—I definitely feel the community has a lot of researchers and not enough people doing other things. Though I suspect that many of the other things people imagine when reading this post are also not very useful for the non-alignment AI problems you described.
no fieldbuilding programs exclusively dedicated to biosecurity
Minor, but in case you aren’t aware, the Cambridge Biosecurity Hub is a fieldbuilding program exclusively dedicated to biosecurity (they are running the AI x Bio stream at ERA, helped start a bio stream at SPAR, and are running their annual conference in a few weeks in Cambridge UK). I think it’s funded by CG’s Biosec team, but it may be funded by your team (the Biosec team is at least aware of them). People interested in making more stuff happen in this space should consider reaching out to them!
In any case, I’m also excited about more biosecurity fieldbuilding work happening and agree that it’s been extremely neglected for a long time—thanks for writing this!
This is great to see! I’ve enjoyed reading Gergo’s Substack and have really appreciated his work on AIS fieldbuilding—excited to see what you do with more resources!
I found this text particularly useful for working out what the program is.
When
Program Timeline: 1 December 2025 – 1 March 2026 (3 months)
Rolling applications begin 31 October 2025
If you are accepted prior to December 1st, you can get an early start with our content!
Extension options: You have the option to extend the program for yourself for up to two additional months – through to 1 June 2026. (We expect several cohort participants to opt for at least 1 month extensions)
Where
Remote, with all chats and content located on the Supercycle.org community platform
Most program events will happen in the European evenings / North American mornings. Other working groups can reach consensus on their own times.
How much
$750 per month (over 3 months)
I don’t think there’s a consensus on how the average young person should navigate the field
Yeah, that sounds right. I agree that people should have a vibe of “here is a take, it may work for some and not others—we’re still figuring this out” when they are giving career advice (if they aren’t already). Though I think I’d give that advice for most fieldbuilding, including AI safety, so maybe that’s too low a bar.
I’m curious about whether other people who would consider themselves particularly well-informed on AI (or an “AI expert”) found these results surprising. I only skimmed the post, but I asked Claude to generate some questions based on the post for me to predict answers to, and I got a Brier score of 0.073, so I did pretty well (or at least don’t feel worried about being wildly out of touch). I’d guess that most people I work with would also do pretty well.
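(In case it’s useful context: I’m assuming the standard binary Brier score here, i.e. the mean squared error between stated probabilities and outcomes, where lower is better:

$$\text{Brier score} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2$$

with $f_i$ the probability assigned to question $i$ and $o_i \in \{0, 1\}$ the outcome. On that scoring, ~0.073 corresponds roughly to consistently putting ~73% on the outcome that actually occurred.)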
- ^
I didn’t check the answers, but Claude does pretty well at this kind of thing.
- ^
to be clear, these were extremely “gut” level predictions—spent about 10x more time writing this comment than I did on the whole exercise.
- ^
What % of the general public thinks AI will have a positive impact on jobs?
What % of the general public thinks AI will benefit them personally?
What % of Americans are more concerned than excited about increased AI use?
True or false: Most Americans think AI coverage in the media is overhyped/exaggerated
True or false: Younger people (under 30) are less worried about AI than seniors (65+)
What % of Americans say AI will worsen people’s ability to think creatively?
What % of the public has actually used ChatGPT?
I’m not an expert in this space; @Grace B, who I’ve spoken to a bit about this, runs the AIxBio fellowship and probably has much better takes than I do. Fwiw, I think I have a different perspective to the post.
My rough view is:
1. Historically, we have done a bad job at fieldbuilding in biosecurity (for nuanced reasons, but I guess that we made some bad calls).
2. As of a few months ago, we have started to do a much better job at fieldbuilding, e.g. the AIxBio fellowship that you mentioned is ~the first of its kind. The other fellowships you mentioned iiuc aren’t focused on biosecurity.
3. Most people who want to enter the space are already doing sensible things—not that many people are leaving their undergrad degrees to start PPE companies; many people are doing PhDs and figuring out what to do. It seems good to focus resources on PhDs/postdocs etc. or other people with more experience, whilst that’s extremely undertapped.
4. AI safety has done an extremely good job at field building over the past ~4 years. Most of it was the result of the hard work of a bunch of great people fighting hard—rather than being particularly overdetermined by ChatGPT or whatever.
5. There are, in fact, a bunch of great things that one can do in biosecurity at various levels of experience, and outside of sustained 1:1 conversations, it’s really hard for more experienced people to figure out what some specific person should do [1]. My impression is that most users of the EA Forum, based on their current skills, “could” very quickly make useful contributions to the biosecurity space, but will likely get bottlenecked on motivation, strategy, grit etc. [2] I don’t think we should have that expectation of everyone. Still, I speak to people on approximately a weekly basis who I believe could make ambitious contributions, but psych themselves out or have standards that are too low for themselves and idk whether it’s helpful to act more conservatively.
6. I’m not sure where the dishonesty is really coming from; I don’t think there is much in the way of resources/community etc. pushing people to enter the space (yet).
- ^
I’ve occasionally wondered whether someone entrepreneurial should try being a “career strategist” and charge people, say, $1000/hour (paid back over a few years once they start a role they think is particularly useful) to help them figure out how to have an outsized impact. This might look a little like a cross between 80k career coaching/AIM (charity entrepreneurship)/exec coaching, with sustained engagement over a few weeks and a lot of time outside of sessions spent researching and hustling (from both the coach and the user). Part of the reason to have a large charge is (1) you want to attract people taking impact extremely seriously, and (2) this kind of coaching is probably really hard and you’d need someone great. @Nina Friedrich🔸 you come to mind as someone who could do this!
- ^
Standard disclaimer: optimising hard for impact straight away is not the same as optimising hard for impact over the course of your career. Often it is better to build skills and become insanely leveraged before going hard at the most important problems.
In your opinion, how many weeks before the event would get 80% of the value of knowing two years in advance?
In general, I think many people who have the option to join Anthropic could do more altruistically ambitious things, but career decisions should factor in a bunch of information that observers have little access to (e.g. team fit, internal excitement/motivation, exit opportunities from new role …).[1] Joe seems exceptionally thoughtful, altruistic, and earnest, and that makes me feel good about Joe’s move.
I am very excited about posts grappling with career decisions involving AI companies, and would love to see more people write them. Thank you very much for sharing it!
- ^
Others in the EA community seem more excited about AI personality shaping than I am. I wouldn’t be surprised if it turned out to be very important, though that’s an argument that rules in a bunch of random, currently unexplored projects.
Interesting, is sports betting plausibly as bad as tobacco/alcohol in low-income countries?
Like, I think sports betting is plausibly one of the “worst businesses” for the US, comparable to alcohol/tobacco—but my impression is that the EAs that care about tobacco/alcohol don’t care very much about interventions in high-income countries relative to low-income countries.