Henry Howard🔸 comments on The AI people have been right a lot

Henry Howard🔸 18 Apr 2026 1:06 UTC
21 points
3 ∶ 11
(NOTE: Coming at this from a place of: a. ignorance of what the AI Safety community actually does and b. not wanting to take the ego hit of admitting that I have been wrong about my long-held skepticism of AI Safety)
I think it was and is fair to be skeptical of the shift to AI Safety in EA on the basis that it’s not that tractable, and that there’s no clear evidence that the AI Safety movement has had a positive effect on the trajectory of AI.
“But it brought the ideas into the mainstream”
I think the AI Safety community will be tempted to think they’ve normalised in the zeitgeist ideas about superintelligent AIs and the philosophical questions and risks that arise from them, but 2001: A Space Odyssey came out in 1968, Terminator in 1984 and The Matrix in 1999, etc. The ideas of superintelligent AIs and the existential risks of them are diffused through modern culture and it’s possible that The Pope and The UN would have made the same statements about them given the recent progress of LLMs regardless of the AI Safety movement.
Are there many ideas in If Anyone Builds It, Everyone Dies that weren’t broadly covered in Terminator/The Matrix/2001: A Space Odyssey/Dune etc.?
“But the work they’ve done has set us on the right path”
I haven’t seen strong evidence for the direct work of the AI Safety movement reducing existential risks from AI:
- Amanda Askell’s involvement with shaping the character of Claude sounds good. Has it made much difference or is it just putting a nice and brittle mask on the beast?
- AI Safety organisations like MIRI and Redwood Research have been operating for 25 and 5 years respectively. As an outsider I couldn’t point to any particular breakthrough they’ve made in AI alignment. Redwood seems to do some kinda interesting work on measuring rogue behaviour and creating checks. I dunno. Seems like any organisation trying to make a reliable AI product would be heavily incentivised to do this stuff regardless.
- In Australia Good Ancestors has probably contributed in some way to the government’s decision to potentially open an AI Safety Institute here. The statements the government puts out about them seem to mostly emphasise deepfake porn and the threat to people’s jobs rather than existential risks, which makes me think that this decision might have just happened anyway regardless of the AI Safety movement.
- Interpretability research seems far from being able to understand more than a few components at a time. And also the companies making AI would likely have been incentivised to do this work regardless of the AI Safety movement because customers don’t want a black box.
From the outside it seems there’s a good argument that the AI situation would have evolved pretty similarly regardless of EA/AI Safety input.
From that position, it’s easy to believe that if EA had just stuck to Earning To Give and malaria nets and decaging chickens then the impact would have been greater, both directly and because the movement might not have lost as much momentum when AI Safety alienated people.
- Karthik Tadepalli 20 Apr 2026 7:50 UTC
  17 points
  5 ∶ 0
  Parent
  For me personally, even just granting “they were right about the trajectory of AI” is a huge update. I thought AI was a nothingburger, and that the bioanchors report saying AGI would be reached by 2047 was ludicrously optimistic. Now I think I was wrong and the AI safety community was right (even pessimistic!) about AI progress. Whether they have changed all that much about AI risk is a different debate, but even if they had done nothing on that front I would be inclined to agree with Dylan.
- Ben_West🔸 19 Apr 2026 5:57 UTC
  12 points
  3 ∶ 0
  Parent
  Maybe, but “if EA had just stuck to Earning To Give and malaria nets and decaging chickens then the impact would have been greater” doesn’t clearly follow. Malaria nets look a lot worse if we all die in a few years from AI anyway, and cage free pledges have ~0 value if humanity ends before the pledge can be fulfilled.
  - Henry Howard🔸 19 Apr 2026 8:46 UTC
    6 points
    0 ∶ 0
    Parent
    That’s a fair point. At either extreme of outcomes, “ASI kills us all” or “ASI quickly uplifts everyone out of poverty”, almost all decisions/actions we make today are pretty meaningless.
    But if the next few decades fall somewhere between those two extremes, which I think they probably will, the impact of improving people’s lives remains substantial.
    - Ben_West🔸 19 Apr 2026 17:19 UTC
      5 points
      1 ∶ 0
      Parent
      Hmm, but in a success without dignity world making interpretability a bit better, or governments a bit more interested, is relevant, right?
      - Henry Howard🔸 22 Apr 2026 7:58 UTC
        6 points
        1 ∶ 0
        Parent
        Yes but my point is that whether the AI Safety community has moved the dial on interpretability or government interest is unclear and worth being skeptical of
        Ben_West🔸 24 Apr 2026 21:25 UTC
        6 points
        0 ∶ 0
        Parent
        I suspect that I’m still misunderstanding you, but: eg interpretability tools are empirically able to identify misalignment, which feels like a (somewhat simple example of) the thing we want. Neel Nanda’s 80k podcast goes over the state of the field; tldr is roughly that there are pretty meaningful advances but also he’s skeptical that it will be a silver bullet.
        I agree with Ben Stewart that there’s a galaxy-brain argument that these positive impacts are outweighed by accelerating progress, but it seems hard to argue that things like interpretability aren’t making progress on their own terms.
        Ben Stewart 24 Apr 2026 23:10 UTC
        4 points
        0 ∶ 0
        Parent
        I think Henry’s skeptical that the AI safety community made a counterfactual difference in getting interpretability started earlier or growing faster. Not questioning interpretability’s prospects for reducing x-risk.
        Henry Howard🔸 25 Apr 2026 1:25 UTC
        4 points
        0 ∶ 0
        Parent
        Thanks Ben. I actually suggested both in my original comment: both
        (a) that there is market incentive for the companies to do this themselves, so did the AI Safety movement really move the dial on this?,
        and also
        (b) that I’m skeptical of the value of interpretability research (based only on not having seen anything impressive come from it, but I’m very ignorant of the field)
        Ben_West🔸 25 Apr 2026 1:49 UTC
        2 points
        0 ∶ 0
        Parent
        I see, thanks! I’m not sure exactly what you’d consider as evidence here, but e.g. here’s citation count on papers from the past year vs. AI Lab Watch safety rating^[1]
        ^
        Raw data. Note that Anthropic doesn’t use arXiv, which affects their citation counts. This is just coming from a dumb search of Semantic Scholar; I expect a lot of disagreement could be had over the exact criteria for considering something “interpretability” but I expect the Ant/GDM > OAI >> * ordering to be true for almost any definition.
        Henry Howard🔸 25 Apr 2026 1:21 UTC
        2 points
        0 ∶ 0
        Parent
        Not familiar with this paper so sunk to using Claude to give myself a summary. The misalignment “blind audit game” seems a bit sus
        ~~“Three auditing teams played this game under conditions mimicking those of AI developers conducting real pre-deployment alignment audits, including access to model weights and training data”~~
        ~~They had access to the training documents? That doesn’t seem like detecting alignment, that’s just a search through files to find one with malicious instructions~~
        “A fourth team was given affordances that a lab-external auditors would have—namely, only API access to sample from the model—and failed to discover its hidden objective despite it being possible in principle to do so.”
        ~~🤔 when they don’t have the training documents they can’t detect the misalignment?~~
        ~~If they’re claiming that finding a file with malicious instructions is detecting misalignment then this would update me further against AI Safety research.~~
        Edit: Actually it looks like the teams were using some clever auto-encoder method rather than just a search through documents. In any case this seems all pretty artificial. This method might detect misalignment if it is due to a few malicious documents mixed-in with the training data and where the malicious behaviour is easy to detect. This feels like killing cancer cells in a petri dish—easy and doesn’t tell you much.
        Ben_West🔸 25 Apr 2026 2:00 UTC
        2 points
        0 ∶ 0
        Parent
        Table 1 shows the techniques used; the teams which were allowed to use SAEs (an interpretability technique) used them; the one which was prohibited from using them searched the data.
        Also note that “training data” does not mean “instructions”. Section 3 describes their training process.
        Ben Stewart 24 Apr 2026 9:01 UTC
        4 points
        1 ∶ 0
        Parent
        I think there’s a good case for AI safety having a pretty good counterfactual effect on a bunch of productive areas, but obviously that depends on a lot of details and there’s plenty of room for debate.
        I think a stronger line of critique could be that early-mid AI safety efforts/thinking made the frontier race start earlier, go faster, and be more intense (e.g. roles in getting key frontier leaders obsessed, introducing Deepmind cofounders, boosting OpenAI’s founding, etc). I haven’t interrogated that history to know where to come down, but it’s a plausible way that the whole of AI safety has been net-negative. (This claim doesn’t really detract from future impact of AI safety though, if the cat’s out of the bag)
  - Ian Turner 20 Apr 2026 19:24 UTC
    2 points
    1 ∶ 4
    Parent
    Malaria nets only last 3 years anyway, their direct impact does not require the world to last longer than that (although, perhaps you value saving a life less, if you think the world will soon end).
    - Mo Putera 21 Apr 2026 3:29 UTC
      16 points
      7 ∶ 0
      Parent
      The way the benefits calculation cashes out on an individual beneficiary basis essentially requires that they (mostly under-5s) live out full lives and enjoy 40 years of increased income, it isn’t a function of how long the nets last.
  - Alex N. 26 Apr 2026 9:36 UTC
    1 point
    0 ∶ 0
    Parent
    The existence of existential threats does not in itself create a strong argument to redirect the effort. Otherwise EA should have been focusing on nuclear disarmament, climate change, asteroid defence, pandemic prevention etc. from the get go.
- Michael Townsend🔸 19 Apr 2026 19:29 UTC
  9 points
  1 ∶ 0
  Parent
  I guess, as you disclaimed up front might be the case, I don’t think these are the strongest or most informed examples of EA’s impact on AI safety.
  
  In many cases of such impact, one can quibble about many things:
  - Whether that impact was clearly positive, or whether it had some kind of indirectly negative harmful effect, most commonly via speeding up AI development. See Paul Christiano’s reflections on the impact of Reinforcement Learning with Human Feedback as an example.
  - The counterfactuality and persistence of the impact — e.g., like you said for many of these, would this have happened (eventually) anyway?
  - How attributable that was to EA (and unfortunately in some cases, due to EA having a toxic brand in many places, it’s actually best if it is not that attributable to EA).
  - And last: “Does any of that matter? All of EA’s impact, for better or worse, has been its influence on Anthropic.”
  Yet, taken as a whole, I think EA has punched above its weight in many ways with respect to making AI go well. It’s led to:
  - More and better staffed AI safety/security institutes
  - A richer non-profit ecosystem of safety research (like Truthful AI, FAR AI, Redwood Research, etc.)
  - More and better staffed third-party evaluations, auditing, and science (METR, AVERI)
  - Large amounts of field-building that encourages talented people to work on making AI go well (MATS, BlueDot, 80k)
  - A significant amount of policy advocacy and public communications about AI risk.
  - Probably other examples, too.
  A lot of the effort to make this happen relied on EA-motivated people willing to take lower paid or less glamorous jobs.^[1] Some specific organizations, research or policy wins, or public communications would have happened otherwise, but some wouldn’t, and even for those that would have, happening earlier is still better.
  I started out in EA caring about global health, and my first EA job was as a Researcher at GWWC. Even after becoming pretty convinced by AI risk and longtermism, I was still fairly sympathetic to concerns like “AI Safety alienating people”. For instance, I was pretty against 80,000 Hours becoming explicitly focused on longtermism, and also pretty skeptical / worried about its pivot last year into leaning even more into AI. Now, looking at just how fast AI progress is developing, how much there is still to be done to make it go well, and how valuable (I think) EA has been to date, I think I got a lot of that wrong.
  1. ^
    And of course, in some cases, they happened to get pretty well-paid jobs that ended up being fairly glamorous (even if they weren’t in the beginning). I don’t think that undermines the impact much. I don’t really begrudge the quant finance folks who give >50% of their income to charities, even if they’re still pretty rich at the end of the day.
  - Mo Putera 20 Apr 2026 6:47 UTC
    17 points
    4 ∶ 0
    Parent
    I’m not sure this addresses Henry’s critiques? In general, every bullet listed under “I think EA has punched above its weight in many ways with respect to making AI go well” is a proxy somewhere in the middle of the ToC chain while his comment is more end-of-ToC focused as he’s skeptical of the proxies actually being beneficial, and none of these bullets address the counterfactuality he brought up. In particular, and for instance, you mentioned the founding of Redwood Research as an example of EA making AI go well despite Henry explicitly being skeptical of its impact so far:
    AI Safety organisations like MIRI and Redwood Research have been operating for 25 and 5 years respectively. As an outsider I couldn’t point to any particular breakthrough they’ve made in AI alignment. Redwood seems to do some kinda interesting work on measuring rogue behaviour and creating checks. I dunno. Seems like any organisation trying to make a reliable AI product would be heavily incentivised to do this stuff regardless.
    To be clear I’m not taking sides or anything, I’m just disheartened by what I perceive to be a lot of talking past each other between AIS advocates and skeptics on this forum, some of which seem easily preventable, like in this case.
    - Michael Townsend🔸 20 Apr 2026 23:40 UTC
      5 points
      0 ∶ 0
      Parent
      Fair enough. I think I was trying to say something along the lines of “going through any specific example invites a lot of genuinely thorny and difficult questions about counterfactuality/sign of impact/attribution to EA” (and again many of these are hard to discuss on a public forum), but I think zooming out, you can see EA’s fingerprints in various important places. I think this leads to an overall common-sense perspective that EA has helped improve the situation.
      Also, I agree I pointed to work in the middle of the ToC chain, but that seems kind of reasonable to me given that AI is currently not that powerful and not really that scary. AI hasn’t yet been capable of causing a disaster, so it’s not really possible to have prevented one (yet).
      On the specific example: Redwood Research is doing a lot of really valuable safety work. I think pioneering Control has been a fairly useful accomplishment, and I suspect if someone wanted to dig into the details, they’d find that it was fairly counterfactual.
- Sergio Diaz 🔸 21 Apr 2026 16:32 UTC
  3 points
  0 ∶ 0
  Parent
  Even if you’re skeptical about the direct impact of AI safety work on reducing existential risk (a much longer conversation, and one I’m not fully qualified to have), there’s a strong indirect case that the EA and EA-adjacent prioritization of AI in the mid-2010s will end up being hugely important for “traditional”, non-speculative EA causes like global health and animal welfare. Most of Anthropic’s co-founders and many of its early employees were deeply involved in the EA and rationalist communities, and it’s at least plausible that this engagement is what led them to take AI seriously enough to found Anthropic in 2021 or to join early with substantial equity. As Sophie Kim’s post documents, Anthropic’s seven co-founders have pledged to donate 80% of their wealth, which at current valuations could amount to roughly $37.8B combined, nearly ten times what Coefficient Giving has disbursed in its entire history. Including employee equity already in DAFs, the total pool of EA-influenced philanthropic capital could reach nine or ten figures. It’s not unreasonable to assume that a substantial fraction of this is likely to flow into non-AI causes. Many of these donors signed the GWWC pledge before AI was their focus and hold a worldview and values closely aligned with the broader effective altruism community (even outside EA, it isn’t uncommon for wealthy individuals with modest altruistic inclinations to donate significant amounts to global health causes). Needless to say, this is an average estimate and not guaranteed. It’s possible that Anthropic or the entire AI ecosystem collapses and these funds never materialize, but it’s also possible that Anthropic’s returns end up being even larger.
