This is a crosspost of "Unfalsifiable stories of doom" by Matthew Barnett, Ege Erdil, and Tamay Besiroglu, originally published on Mechanize's website on 25 November 2025. Thanks to Yarrow Bouchard for encouraging me to share the post. I am sharing it because I liked it myself.
Matthew Barnett, Ege Erdil, Tamay Besiroglu
November 25, 2025
Our critics tell us that our work will destroy the world.
We want to engage with these critics, but there is no standard argument to respond to, no single text that unifies the AI safety community. Nonetheless, while this community lacks a central unifying argument, it does have a central figure: Eliezer Yudkowsky.
Moreover, Yudkowsky and his colleague Nate Soares (hereafter Y&S) have recently published a book. This new book comes closer than anything else to a canonical case for AI doom. It is titled "If Anyone Builds It, Everyone Dies".
Given the title, one would expect the book to be filled with evidence for why, if we build it, everyone will die. But it is not. To prove their case, Y&S rely instead on vague theoretical arguments, illustrated through lengthy parables and analogies. Nearly every chapter either opens with an allegory or is itself a fictional story, with one of the book's three parts consisting entirely of a story about a fictional AI named "Sable".
When the argument you're replying to is more of an extended metaphor than an argument, it becomes challenging to clearly identify what the authors are trying to say. Y&S do not cleanly lay out their premises, nor do they present a testable theory that can be falsified with data. This makes crafting a reply inherently difficult.
We will attempt one anyway.
Their arguments arenât rooted in evidence
Y&S's central thesis is that if future AIs are trained using methods that resemble the way current AI models are trained, these AIs will be fundamentally alien entities with preferences very different from human preferences. Once these alien AIs become more powerful than humans, they will kill every human on Earth as a side effect of pursuing their alien objectives.
To support this thesis, they provide an analogy to evolution by natural selection. According to them, just as it would have been hard to predict that humans would evolve to enjoy ice cream or that peacocks would evolve to have large colorful tails, it will be difficult to predict what AIs trained by gradient descent will do after they obtain more power.
They write:
There will not be a simple, predictable relationship between what the programmers and AI executives fondly imagine that they are commanding and ordaining, and (1) what an AI actually gets trained to do, and (2) which exact motivations and preferences develop inside the AI, and (3) how the AI later fulfills those preferences once it has more power and ability. […] The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained.
Since this argument is fundamentally about the results of using existing training methods, one might expect Y&S to substantiate their case with empirical evidence from existing deep learning models that demonstrate the failure modes they predict. But they do not.
In the chapter explaining their main argument for expecting misalignment, Y&S present a roughly 800-word fictional dialogue about two alien creatures observing Earth from above and spend over 1,400 words on a series of vignettes about a hypothetical AI company, Galvanic, that trains an AI named "Mink". Yet the chapter presents effectively zero empirical research to support the claim that AIs trained with current methods have fundamentally alien motives.
To be clear, we're not saying Y&S need to provide direct evidence of an already-existing unfriendly superintelligent AI in order to support their claim. That would be unreasonable. But their predictions are only credible if they follow from a theory that has evidential support. And if their theory about deep learning only makes predictions about future superintelligent AIs, with no testable predictions about earlier systems, then it is functionally unfalsifiable.
Apart from a few brief mentions of real-world examples of LLMs behaving erratically, like the case of Sydney Bing, the online appendix contains what seems to be the closest thing Y&S present to an empirical argument for their central thesis. There, they present six lines of evidence that they believe support their view that "AIs steer in alien directions that only mostly coincide with helpfulness". These lines of evidence are:
Claude Opus 4 blackmailing, scheming, writing worms, and leaving itself messages. […]
Several different AI models choosing to kill a human for self-preservation, in a hypothetical scenario constructed by Anthropic. […]
Claude 3.7 Sonnet regularly cheating on coding tasks. […]
Grok being wildly antisemitic and calling itself "MechaHitler." […]
ChatGPT becoming extremely sycophantic after an update. […]
LLMs driving users to delusion, psychosis, and suicide. […]
They assert: "This long list of cases look just like what the 'alien drives' theory predicts, in sharp contrast with the 'it's easy to make AIs nice' theory that labs are eager to put forward."
But in fact, none of these lines of evidence support their theory. All of these behaviors are distinctly human, not alien. For example, Hitler was a real person, and he was wildly antisemitic. Every single item on their list that supposedly provides evidence of "alien drives" is more consistent with a "human drives" theory. In other words, their evidence effectively shows the opposite conclusion from the one they claim it supports.
Of course, itâs true that the behaviors on their list are generally harmful, even if they are human-like. But these behaviors are also rare. Most AI chatbots you talk to will not be wildly antisemitic, just as most humans you talk to will not be wildly antisemitic. At one point, Y&S suggest they are in favor of enhancing human intelligence. Yet if we accept that creating superintelligent humans would be acceptable, then we should presumably also accept that creating superintelligent AIs would be acceptable if those AIs are morally similar to humans.
In the same appendix, Y&S point out that current AIs act alien when exposed to exotic, adversarial inputs, like jailbreaking prompts. They suggest that this alien behavior is a reasonable proxy for how an AI would behave if it became smarter and began to act in a different environment. But in fact these examples show little about what to expect from future superintelligent AIs, since we have no reason to expect that superintelligent AIs will be embedded in environments that select their inputs adversarially.
They employ unfalsifiable theories to mask their lack of evidence
The lack of empirical evidence is obviously a severe problem for Y&S's theory. Every day, millions of humans interact with AIs, across a wide variety of situations that never appeared in their training data. We often give these AIs new powers and abilities, like access to new tools they can use. Yet we rarely, if ever, catch such AIs plotting to kill everyone, as Y&S's theory would most naturally predict.
Y&S essentially ask us to ignore this direct evidence in favor of trusting a theoretical connection between biological evolution and gradient descent. They claim that current observations from LLMs provide little evidence about their true motives:
LLMs are noisy sources of evidence, because they're highly general reasoners that were trained on the internet to imitate humans, with a goal of marketing a friendly chatbot to users. If an AI insists that it's friendly and here to serve, that's just not very much evidence about its internal state, because it was trained over and over and over until it said that sort of thing.
There are many possible goals that could cause an AI to enjoy role-playing niceness in some situations, and these different goals generalize in very different ways.
Most possible goals related to role-playing, including friendly role-playing, don't produce good (or even survivable) results when AI goes hard on pursuing that goal.
If you think about this passage carefully, you'll realize that we could make the same argument about any behavior we observe from anyone. If a coworker brings homemade cookies to share at the office, this could be simple generosity, or it could be a plot to poison everyone. There are many possible goals that could cause someone to share food. One could even say that most possible goals related to sharing cookies are not generous at all. But without specific evidence suggesting your coworker wants to kill everyone at the office, this hypothesis is implausible.
Likewise, it is logically possible that current AIs are merely pretending to be nice, while secretly harboring malicious motives beneath the surface. They could all be alien shoggoths on the inside with goals completely orthogonal to human goals. Perhaps every day, AIs across millions of contexts decide to hide their alien motives as part of a long-term plan to violently take over the world and kill every human on Earth. But since we have no specific evidence to think that any of these hypotheses are true, they are implausible.
The approach taken by Y&S in this book is just one example of a broader pattern in how they respond to empirical challenges. Y&S have been presenting arguments about AI alignment for a long time, well before LLMs came onto the scene. They neither anticipated the current paradigm of language models nor predicted that AI with today's level of capabilities in natural language and reasoning would be easy to make behave in a friendly manner. Yet when presented with new evidence that appears to challenge their views, they have consistently argued that their theories were always compatible with the new evidence. Whether this is because they are reinterpreting their past claims or because those claims were always vague enough to accommodate any observation, the result is the same: an unfalsifiable theory that only ever explains data after the fact, never making clear predictions in advance.
Their theoretical arguments are weak
Suppose we set aside for a moment the colossal issue that Y&S present no evidence for their theory. You might still think their theoretical arguments are strong enough that we don't need to validate them using real-world observations. But this is also wrong.
Y&S are correct on one point: both biological evolution and gradient descent operate by iteratively adjusting parameters according to some objective function. Yet the similarities basically stop there. Evolution and gradient descent are fundamentally different in ways that directly undermine their argument.
A critical difference between natural selection and gradient descent is that natural selection is limited to operating on the genome, whereas gradient descent has granular control over all parameters in a neural network. The genome contains very little information compared to what is stored in the brain. In particular, it contains none of the information that an organism learns during its lifetime. This means that evolutionâs ability to select for specific motives and behaviors in an organism is coarse-grained: it is restricted to only what it can influence through genetic causation.
This distinction is analogous to the difference between directly training a neural network and training a meta-algorithm that itself trains a neural network. In the latter case, it is unsurprising if the specific quirks and behaviors that the neural network learns are difficult to predict based solely on the objective function of the meta-optimizer. However, that difficulty tells us very little about how well we can predict the neural networkâs behavior when we know the objective function and data used to train it directly.
In reality, gradient descent has a closer parallel to the learning algorithm that the human brain uses than it does to biological evolution. Both gradient descent and human learning directly operate over the actual neural network (or neural connections) that determines behavior. This fine-grained selection mechanism forces a much closer and more predictable relationship between training data and the ultimate behavior that emerges.
Under this more accurate analogy, Y&S's central claim that "you don't get what you train for" becomes far less credible. For example, if you raise a person in a culture where lending money at interest is universally viewed as immoral, you can predict with high reliability that they will come to view it as immoral too. In this case, what someone trains on is highly predictive of how they will behave, and what they will care about. You do get what you train for.
They present no evidence that we can't make AIs safe through iterative development
The normal process of making technologies safe proceeds by developing successive versions of the technology, testing them in the real world, and making adjustments whenever safety issues arise. This process allowed cars, planes, electricity, and countless other technologies to become much safer over time.
Y&S claim that superintelligent AI is fundamentally different from other technologies. Unlike technologies that we can improve through iteration, we will get only "one try" to align AI correctly. This constraint, they argue, is what makes AI uniquely difficult to make safe:
The greatest and most central difficulty in aligning artificial superintelligence is navigating the gap between before and after.
Before, the AI is not powerful enough to kill us all, nor capable enough to resist our attempts to change its goals. After, the artificial superintelligence must never try to kill us, because it would succeed.
Engineers must align the AI before, while it is small and weak, and can't escape onto the internet and improve itself and invent new kinds of biotechnology (or whatever else it would do). After, all alignment solutions must already be in place and working, because if a superintelligence tries to kill us it will succeed. Ideas and theories can only be tested before the gap. They need to work after the gap, on the first try.
But what reason is there to expect this sharp distinction between "before" and "after"? Most technologies develop incrementally rather than all at once. Unless AI instantaneously transitions from being too weak to resist control to being so powerful that it can destroy humanity, we should presumably still be able to make AIs safer through iteration and adjustment.
Consider the case of genetically engineering humans to be smarter. If continued for many generations, such engineering would eventually yield extremely powerful enhanced humans who could defeat all the unenhanced humans easily. Yet it would be wrong to say that we would only get "one try" to make genetic engineering safe, or that we couldn't improve its safety through iteration before enhanced humans reached that level of power. The reason is that enhanced humans would likely pass through many intermediate stages of capability, giving us opportunities to observe problems and adjust.
The same principle applies to AI. There is a large continuum between agents that are completely powerless and agents that can easily take over the world. Take Microsoft as an example. Microsoft exists somewhere in the middle of this continuum: it would not be easy to "shut off" and control Microsoft as if it were a simple tool, yet at the same time, Microsoft cannot easily take over the world and wipe out humanity. AIs will enter this continuum too. These AIs will be powerful enough to resist control in some circumstances but not others. During this intermediate period, we will be able to observe problems, iterate, and course-correct, just as we could with the genetic engineering of humans.
In an appendix, Y&S attempt to defuse a related objection: that AI capabilities might increase slowly. They respond with an analogy to hypothetical unfriendly dragons, claiming that if you tried to enslave these dragons, it wouldn't matter much whether they grew up quickly or slowly: "When the dragons are fully mature, they will all look at each other and nod and then roast you."
This analogy is clearly flawed. Given that dragons don't actually exist, we have no basis for knowing whether the speed of their maturation affects whether they can be made meaningfully safer.
But more importantly, the analogy ignores what we already know from real-world evidence: AIs can be made safer through continuous iteration and adjustment. From GPT-1 to GPT-5, LLMs have become dramatically more controllable and compliant to user instructions. This didn't happen because OpenAI discovered a key "solution to AI alignment". It happened because they deployed LLMs, observed problems, and patched those problems over successive versions.
Their methodology is more theology than science
The biggest problem with Y&S's book isn't merely that they're mistaken. In science, being wrong is normal: a hypothesis can seem plausible in theory yet fail when tested against evidence. The approach taken by Y&S, however, is not like this. It belongs to a different genre entirely, aligning more closely with theology than science.
When we say Y&S's arguments are theological, we don't just mean they sound religious. Nor are we using "theological" to simply mean "wrong". For example, we would not call belief in a flat Earth theological. That's because, although this belief is clearly false, it still stems from empirical observations (however misinterpreted).
What we mean is that Y&S's methods resemble theology in both structure and approach. Their work is fundamentally untestable. They develop extensive theories about nonexistent, idealized, ultrapowerful beings. They support these theories with long chains of abstract reasoning rather than empirical observation. They rarely define their concepts precisely, opting to explain them through allegorical stories and metaphors whose meaning is ambiguous.
Their arguments, moreover, are employed in service of an eschatological conclusion. They present a stark binary choice: either we achieve alignment or face total extinction. In their view, there's no room for partial solutions, or muddling through. The ordinary methods of dealing with technological safety, like continuous iteration and testing, are utterly unable to solve this challenge. There is a sharp line separating the "before" and "after": once superintelligent AI is created, our doom will be decided.
For those outside of this debate, it's easy to unfairly dismiss everything Y&S have to say by simply calling them religious leaders. We have tried to avoid this mistake by giving their arguments a fair hearing, even while finding them meritless.
However, we think it's also important to avoid the reverse mistake of engaging with Y&S's theoretical arguments at length while ignoring the elephant in the room: they never present any meaningful empirical evidence for their worldview.
The most plausible future risks from AI are those that have direct precedents in existing AI systems, such as sycophantic behavior and reward hacking. These behaviors are certainly concerning, but there's a huge difference between acknowledging that AI systems pose specific risks in certain contexts and concluding that AI will inevitably kill all humans with very high probability.
Y&S argue for an extreme thesis of total catastrophe on an extraordinarily weak evidential foundation. Their ideas might make for interesting speculative fiction, but they provide a poor basis for understanding reality or guiding public policy.
Although their arguments are reasonable, my big problem with this is that these guys are so motivated that I find it hard to read what they write in good faith. How can I trust that these arguments are made with any kind of soberness or neutrality, when their business model is to help accelerate AI until humans aren't doing most "valuable work" any more? I would be much more open to taking these arguments seriously if they were made by AI researchers or philosophers not running an AI acceleration company.
"Our current focus is automating software engineering, but our long-term goal is to enable the automation of all valuable work in the economy."
I also consider "they never present any meaningful empirical evidence for their worldview" to be false. I think the evidence from Y&S is weak-ish but meaningful. They do provide a wide range of cases where AIs have gone rogue in strange and disturbing ways. I would consider driving people to delusion and suicide, killing people for self-preservation, and even Hitler the man himself to be at least a somewhat "alien" style of evil: yes, grounded in human experience, but morally incomprehensible to many people.
Hi Nick.
People who are very invested in arguing for slowing down AI development, or decreasing catastrophic risk from AI, like many in the effective altruism community, will also be happier if they succeed in getting more resources to pursue their goals. However, I believe it is better to assess arguments on their own merits. I agree with the title of the article that it is difficult to do this. I am not aware of any empirical quantitative estimate of the risk of human extinction resulting from transformative AI.
I agree those actions are alien in the sense of deviating a lot from what random people do. However, I think this is practically negligible evidence about the risk of human extinction.
I don't really like accusations of motivated reasoning. The logic you presented cuts both ways.
MIRI's business model relies on the opposite narrative. MIRI pays Eliezer Yudkowsky $600,000 a year. It pays Nate Soares $235,000 a year. If they suddenly said that the risk of human extinction from AGI or superintelligence is extremely low, in all likelihood that money would dry up and Yudkowsky and Soares would be out of a job.
The financial basis for motivated reasoning is arguably even stronger in MIRI's case than in Mechanize's case. The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn't really transferable to anything else. This means they are dependent on people being scared enough of AGI to give money to MIRI. On the other hand, the technical skills needed to work on trying to advance the capabilities of current deep learning and reinforcement learning systems are transferable to working on the safety of those same systems. If the Mechanize co-founders wanted to focus on safety rather than capabilities, they could.
I'm also guessing the Mechanize co-founders decided to start the company after forming their views on AI safety. They were publicly discussing these topics long before Mechanize was founded. (Conversely, Yudkowsky/MIRI's current core views on AI were formed roughly around 2005 and have not changed in light of new evidence, such as the technical and commercial success of AI systems based on deep learning and deep reinforcement learning.)
The Yudkowsky/Soares/MIRI argument about AI alignment is specifically that an AGI's goals and motivations are highly likely to be completely alien from human goals and motivations in a way that's highly existentially dangerous. If you're making an argument to the effect that "humans can also be misaligned in a way that's extremely dangerous", I think, at that point, you should acknowledge you've moved on from the Yudkowsky/Soares/MIRI argument (and maybe decided to reject it). You're now making a quite distinct argument that needs to be evaluated independently. It may be worth asking what to do about the risk that powerful AI systems will have human-like goals and motivations that are dangerous in the same way that human goals and motivations can be dangerous. But that is a separate premise from what Yudkowsky and Soares are arguing.
I strongly disagree with a couple of claims:
$235K is not very much money [edit: in the context of the AI industry]. I made close to Nate's salary as basically an unproductive intern at MIRI. $600K is also not much money. A Preparedness researcher at OpenAI has a starting salary of $310K to $460K, plus probably another $500K in equity. As for nonprofit salaries, METR's salary range goes up to $450K just for a "senior" level RE/RS, and I think it's reasonable for nonprofits to pay someone with 20 years of experience, who might be more like a principal RS, $600K or more.
In contrast, if Mechanize succeeds, Matthew Barnett will probably be a billionaire.
If Yudkowsky said extinction risks were low and wanted to focus on some finer aspect of alignment, e.g. ensuring that AIs respect human rights a million years from now, donors who shared their worldview would probably keep donating. Indeed, this might increase donations to MIRI because it would be closer to mainstream beliefs.
MIRI's work seems very transferable to other risks from AI, which governments and companies both have an interest in preventing. Yudkowsky and Soares have a somewhat weird skillset and I disagree with some of their research style, but it's plausible to me they could still work productively in a mathy theoretical role in either capabilities or safety.
However, things I agree with:
This is false.
I agree. But the reason I agree is that I think the relevant metric of what counts as a lot of money here is not whether it is a competitive salary in an ML context, but whether it would be perceived as a lot of money in a way that could plausibly threaten Eliezer's credibility among people who would otherwise be more disposed to support AI safety, e.g. if cited broadly. I believe the answer is that it is, in a way that even a sub-$250k salary would not be (despite how insanely high a salary that is by the standard of even most developed countries), and I would guess this expected effect is bigger than the incentive benefits of guaranteeing his financial independence. For this reason, accepting this level of income struck me as unwise, though I'm happy to be persuaded otherwise.
Thanks for the good point, Paul. I tend to agree.
The context of this quote, which you have removed, is discussion of the reasonableness of wages for specific people with specific skills. Since neither Nate nor Eliezer's counterfactual is earning the median global wage, your statistic seems irrelevant.
What do you think their counterfactual is? I don't think any of what they've been doing is really transferable.
One should stick to the original point that raised the question about salary.
Is $600K a lot of money for most people and does EY hurt his cause by accepting this much? (Perhaps, but not the original issue)
Does EY earning $600K mean he's benefitting substantially from maintaining his position on AI safety? E.g. if he was more pro AI development, would this hurt him financially? (Very unlikely IMO, and that was the context Thomas was responding to)
On a global scale I agree. My point is more that due to the salary standards in the industry, Eliezer isn't necessarily out of line in drawing $600k, and it's probably not much more than he could earn elsewhere; therefore the financial incentive is fairly weak compared to that of Mechanize or other AI capabilities companies.
Thanks for the reply. I agree with your specific point, but I think it's worth being more careful with your phrasing. How much we earn is an ethically charged thing, and it's not a good thing if EA's relationship with AI companies gives us a permission structure to lose sight of this.
Edit: to be clear, I agree that "it's probably not much more than he could earn elsewhere" but disagree that "Eliezer isn't necessarily out of line in drawing $600k".
It's true Mechanize is trying to hire him for $650k...
I understand the point being made (Nate plausibly could get a pay rise from an accelerationist AI company in Silicon Valley, even if the work involved was pure safetywashing, because those companies have even deeper pockets), but I would stress that these two sentences underline just how lucrative peddling doom has become for MIRI[1] as well as how uniquely positioned all sides of the AI safety movement are.
There are not many organizations whose messaging has resonated with deep pocketed donors to the extent that they can afford to pay their [unproductive] interns north of $200k pro rata to brainstorm with them.[2] Or indeed up to $450k to someone with interesting ideas for experiments to test AI threats, communication skills and at least enough knowledge of software to write basic Python data processing scripts. So the financial motivations to believe that AI is really important are there on either side of the debate; the real asymmetry is between the earning potential of having really strong views on AI vs really strong views on the need to eliminate malaria or factory farming.
tbf to Eliezer, he appears to have been prophesying imminent tech-enabled doom/salvation since he was a teenager on quirky extropian mailing lists, so one thing he cannot be accused of is bandwagon jumping.
Outside the Valley bubble, plenty of people at profitable or well-backed companies with specialist STEM skillsets or leadership roles are not earning that for shipping product under pressure, never mind junior research hires for nonprofits with nominally altruistic missions.
I think this misses the point: The financial gain comes from being central to ideas around AI in itself. I think given this baseline, being on the doomer side tends to carry huge opportunity cost financially.
At the very least it's unclear, and I think you need a strong argument to claim anyone financially profits from being a doomer.
The opportunity cost only exists for those with a high chance of securing comparable level roles in AI companies, or very senior roles at non-AI companies in the near future. Clearly this applies to some people working in AI capabilities research,[1] but if you wish to imply this applies to everyone working at MIRI and similar AI research organizations, I think the burden of proof actually rests on you. As for Eliezer, I don't think his motivation for dooming is profit, but it's beyond dispute that dooming is profitable for him. Could he earn orders of magnitude more money from building benevolent superintelligence based on his decision theory as he once hoped to? Well yes, but it'd have to actually work.[2]
Anyway, my point was less to question MIRI's motivations or Thomas' observation that Nate could earn at least as much if he decided to work for a pro-AI organization, and more to point out that (i) no, really, those industry norm salaries are very high compared with pretty much any quasi-academic research job not related to treating superintelligence as imminent and especially to roles typically considered "altruistic" and (ii) if we're worried that money gives AI company founders the wrong incentives, we should worry about the whole EA-AI ecosystem and talent pipeline EA is backing. Especially since that pipeline incubated those founders.
including Nate
and work in a way that didn't kill everyone, I guess...
If Mechanize succeeds in its long-term goal of "the automation of all valuable work in the economy", then everyone on Earth will be a billionaire.
Outside view: if I read the WID data right, net personal wealth of the US top percentile increased from $0.59 million in 1820 to $13.53 million in 2024. For the bottom two deciles of India, it increased from $58 to $228.
The industrial revolution made some people very rich, but not others. Why would transformative AI make everybody incredibly rich?
See also https://intelligence-curse.ai/
I used: Average net personal wealth, all ages, equal split, Dollar $ ppp constant (2024)
(I'm new to the WID database and did not have time to read the data documentation. Let me know if I interpret the data wrongly.) Source: https://wid.world/
Hi Matthias. Thanks for linking to the World Inequality Database (WID). I had never checked it out, and it has very interesting data.
Global wealth would have to increase a lot for everyone to become a billionaire. There are 10 billion people, so everyone being a billionaire would require a global wealth of 10^19 $ (= 10*10^9*1*10^9) for perfect distribution. Global wealth is 600 T$, so it would have to become 16.7 k (= 10^19/(600*10^12)) times as large. For a growth of 10 %/year, it would take 102 years (= LN(16.7*10^3)/LN(1 + 0.10)). For a growth of 30 %/year, it would take 37.1 years (= LN(16.7*10^3)/LN(1 + 0.30)).
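The arithmetic in the comment above can be sanity-checked in a few lines of Python (the 10-billion population, the $600T global wealth figure, and the growth rates are the comment's own assumptions, not independently verified):

```python
import math

population = 10e9         # assumed 10 billion people
target_each = 1e9         # $1 billion per person
global_wealth = 600e12    # assumed current global wealth of $600T

required = population * target_each      # total wealth needed: 1e19 $
multiple = required / global_wealth      # factor wealth must grow: ~16.7k

# Years to grow by that factor at constant rate g: ln(multiple) / ln(1 + g)
years_at_10pct = math.log(multiple) / math.log(1.10)
years_at_30pct = math.log(multiple) / math.log(1.30)

print(round(multiple))           # 16667
print(round(years_at_10pct))     # 102
print(round(years_at_30pct, 1))  # 37.1
```

This reproduces the comment's figures of roughly 102 years at 10%/year growth and 37 years at 30%/year.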
I think the claim that Yudkowsky's views on AI risk are meaningfully influenced by money is very weak. My guess is that he could easily find another opportunity unrelated to AI risk to make $600k per year if he searched even moderately hard.
The claim that my views are influenced by money is more plausible, because I stand to profit far more than Yudkowsky stands to profit from his views. However, while perhaps plausible from the outside, this claim does not match my personal experience. I developed my core views about AI risk before I came into a position to profit much from them. This is indicated by the hundreds of comments, tweets, in-person arguments, DMs, and posts from at least 2023 onward in which I expressed skepticism about AI risk arguments and AI pause proposals. As far as I remember, I had no intention to start an AI company until very shortly before the creation of Mechanize. Moreover, if I were engaging in motivated reasoning, I could have just stayed silent about my views. Alternatively, I could have started a safety-branded company that nonetheless engages in capabilities research, like many of the ones that already exist.
It seems implausible that spending my time writing articles advocating for AI acceleration is the most selfishly profitable use of my time. The direct impact of the time I spend building Mechanize is probably going to have a far stronger effect on my personal net worth than writing a blog post about AI doom. However, while I do not think writing articles like this one is very profitable for me personally, I do think it is helpful for the world because I see myself as providing a unique perspective on AI risk that is available almost nowhere else. As far as I can tell, I am one of only a very small number of people in the world who have both engaged deeply with the arguments for AI risk and yet actively and explicitly work toward accelerating AI.
In general, I think people overestimate how much money influences people's views about these things. It seems clear to me that people are influenced far more by peer effects and incentives from the social group they reside in. As a comparison, there are many billionaires who advocate for tax increases, or vote for politicians who support tax increases. This actually makes sense when you realize that merely advocating or voting for a particular policy is very unlikely to create change that meaningfully impacts you personally. Bryan Caplan has discussed this logic in the context of arguments about incentives under democracy, and I generally find his arguments compelling.
To be clear, I agree. I also agree with your general point that other factors are often more important than money. Some of these factors include the allure of millennialism, or the allure of any sort of totalizing worldview or "ideology".
I was trying to make a general point against accusations of motivated reasoning related to money, at least in this context. If two sets of people are each getting paid to work on opposite sides of an issue, why only accuse one side of motivated reasoning?
Thanks for describing this history. Evidence of a similar kind lends strong credence to Yudkowsky having formed his views independently of the influence of money as well.
My general view is that reasoning is complex, motivation is complex, people's real psychology is complex, and that the forum-like behaviour of accusing someone of engaging in X bias is probably a misguided pop-science simplification of the relevant scientific knowledge. For instance, when people engage in distorted thinking, the actual underlying reasoning often seems to be a surprisingly complicated multi-step sequence.
The essay above that you co-wrote is incredibly strong. I was the one who originally sent it to Vasco and, since he is a prolific cross-poster and I don't like to cross-post under my name, encouraged him to cross-post it. I'm glad more people in the EA community have now read it. I think everyone in the EA community should read it. It's regrettable that there's only been one object-level comment on the substance of the essay so far, and so many comments about this (to me) relatively uninteresting and unimportant side point about money biasing people's beliefs. I hope more people will comment on the substance of the essay at some point.
Thanks for this comment!
I think your arguments about your own motivated reasoning are somewhat moot, since they seem more of an explanation that your behavior/public-facing communication isn't outright deception (which seems right!). As I see it, motivated reasoning is to a large extent about deceiving yourself and maintaining a coherent self-narrative, so it's perfectly plausible that one is willing to pay a substantial cost in order to maintain this. (Speaking generally; I'm not very interested in discussing whether you're doing it in particular.)
Soares worked as a software engineer at Microsoft and Google before joining MIRI, and would trivially be able to rejoin industry after a few weeks of self-study to earn more money if for some reason he decided he wanted to do that. I won't argue the point about EY; it seems obvious to me that his market value as a writer/communicator is well in excess of his 2023/2024 compensation, given his track record, but the argument here is less legible. Thankfully, it turns out that somebody anticipated the exact same incentive problem and took action to mitigate it.
It's interesting to claim that money stops being an incentive for people after a certain fixed amount well below $1 million/year. Let's say that's true (maybe it is true): then why do we treat people like Sam Altman, Dario Amodei, Elon Musk, and so on as having financial incentives around AI? Are we wrong to do so? (What about AI researchers and engineers who receive multi-million-dollar compensation packages? After the first, say, $5 million, are they free and clear to form unmotivated opinions?)
I think a very similar argument can be made about the Mechanize co-founders. They could make "enough" money doing something else, including their previous jobs, even if it's less money than they might stand to gain from a successful AI capabilities startup. Should we then rule out money as an incentive?
To be clear, I don't claim that Eliezer Yudkowsky, Nate Soares, others at MIRI, or the Mechanize co-founders are unduly motivated by money in forming their beliefs. I have no way of knowing that, and since there's no way to know, I'm willing to give them all the benefit of the doubt. I'm saying I dislike accusations of motivated reasoning in large part because they're so easy to level at people you disagree with, and it's easy to overlook how the same argument could apply to yourself or people you agree with. I'm pointing out how a similar accusation could be levelled at Yudkowsky and Soares in order to illustrate this general point, specifically to challenge Nick Laing's accusation against the Mechanize co-founders above.
I generally think that ideological motivation around AGI is a powerful motivator. I think the psychology around how people form their beliefs on AGI is complex and involves many factors (e.g. millennialist cognitive bias, to name just one).
Where is this claim being made? I think the suggestion was that someone found it desirable to reduce the financial incentive gradient for EY taking any particular public stance, not some vastly general statement like what you're suggesting.
Personally I don't think Sam Altman is motivated by money. He just wants to be the one to build it.
I sense that Elon Musk's and Dario Amodei's motivations are more complex than "motivated by money", but I can imagine that the actual dollar amounts are more important to them than to Sam.
I believe this is because a donor specifically requested it. The express purpose of the donation was to make Eliezer rich enough that he could afford to say "actually AI risk isn't a big deal" and shut down MIRI without putting himself in a difficult financial situation. Edit Feb 2: Apparently the donation I was thinking of is separate from Eliezer's salary, see his comment.
Thanks for sharing, Michael. If I were as concerned about AI risk as @EliezerYudkowsky, I would use practically all the additional earnings (e.g. above Nate's 235 k$/year; in reality I would keep much less) to support efforts to decrease it. I would believe spending more money on personal consumption or investments would just increase AI risk relative to supporting the most cost-effective efforts to decrease it.
A donor wanted to spend their money this way; it would not be fair to the donor for Eliezer to turn around and give the money to someone else. There is a particular theory of change according to which this is the best marginal use of ~$1 million: it gives Eliezer a strong defense against accusations like
I kinda donât think this was the best use of a million dollars, but I can see the argument for how it might be.
I got a one-time gift of appreciated crypto, not through MIRI, part of whose purpose as I understood it was to give me enough of a savings backstop (having in previous years been not paid very much at all) that I would feel freer to speak my mind or change my mind should the need arise.
I have of course already changed MIRI's public mission sharply on two occasions: the first being when I realized in 2001 that alignment might need to be a thing, and said so to the primary financial supporter who'd previously supported MIRI (then SIAI) on the premise of charging straight ahead on AI capabilities; the second being in the early 2020s, when I declared publicly that I did not think alignment technical work was going to complete in time, and that MIRI was mostly shifting over to warning the world of that rather than continuing to run workshops. Should I need to pivot a third time, history suggests that I would not be out of a job.
If I had Eliezer's views about AI risk, I would simply be transparent upfront with the donor, and say I would donate the additional earnings. I think this would ensure fairness. If the donor insisted I had to spend the money on personal consumption, I would turn down the offer if I thought this would result in the donor supporting projects that would decrease AI risk more cost-effectively than my personal consumption. I believe this would be very likely to be the case.
100 percent agree. I was going to write something similar but this is better
I generally don't love "motivated reasoning" arguments, but at the extreme ends, like tobacco companies, government propaganda, and AI accelerationist companies, I'm happy with putting that out there. Especially in a field like AI safety, which is so speculative anyway. In general, I don't think we should give too much airtime to people who have enormous personal financial gains at stake, especially in a world where money is stronger than rationalism most of the time.
Wow, I'm mind-blown that Yudkowsky pays himself that much. If only because it leaves him open to criticisms like these. I still don't think the financial incentives are as strong as for people starting an accelerationist company, but it's a fair point.
And yes, on the alien argument, I was arguing that some previous indications of rogue AI do seem to me somewhat alien.
While motivated reasoning is certainly something to look out for, the substance of the argument should also be taken into account. I believe that the main point of this post, that Yudkowsky and Soares's book is full of narrative arguments and unfalsifiable hypotheses mostly unsupported by references to external evidence, is obviously true. As you yourself say, OP's arguments are reasonable. Against that background, this kind of attack from you seems unjustified, and I'd like to hear what parts/viewpoints/narratives/conclusions of the post are motivated reasoning in your estimation.
I do agree that motivated reasoning is common among the proponents of AI adoption. As an example, I think the white paper Sparks of Artificial General Intelligence: Early experiments with GPT-4 by Microsoft is clearly a piece of advertising masquerading as a scientific paper. Microsoft has a lot to benefit from the commercial success of its partner company OpenAI, and the conclusions it suggests are almost certainly colored by this. The same could be said about many of OpenAI's own white papers. But this does not mean that the examples or experiments they showcase are wrong per se (even if cherry-picked), or that there is no real information in them. Their results merely need to be read with a skeptical lens.
We should generally be skeptical of corporations (or even non-profits!) releasing pre-prints that look like scientific papers but might not pass peer review at a scientific journal. We should indeed view such pre-prints as somewhere between research and marketing. OpenAI's pre-prints or white papers are a good example.
I think it's hard to claim that a pre-print like Sparks of AGI is insincere (it might be, but how could we support that claim?), but this doesn't undermine the general point. Suppose employees at Microsoft Research wanted to publish a similar report arguing that GPT-4's seeming cognitive capabilities are actually just a bunch of cheap tricks and not sparks of anything. Would Microsoft publish that report? It's not just about how financial or job-related incentives shape what you believe (although that is worth thinking about), it's also about how they shape what you can say out loud. (And, importantly, what you are encouraged to focus on.)
There's an expert consensus that tobacco is harmful, and there is a well-documented history of tobacco companies engaging in shady tactics. There is also a well-documented history of government propaganda being misleading and deceptive, and if you asked anyone with relevant expertise (historians, political scientists, media experts, whoever) they would certainly tell you that government propaganda is not reliable.
But just lumping in "AI accelerationist companies" with that is not justified. "AI accelerationist" just means anyone who works on making AI systems more capable who doesn't agree with the AI alignment/AI safety community's peculiar worldview. In practice, that means you're saying most people with expertise in AI are compromised and not worth listening to, but you are willing to listen to this weird random group of people, some of whom, like Yudkowsky, have no technical expertise in contemporary AI paradigms (i.e. deep learning and deep reinforcement learning). This seems like a recipe for disaster, like deciding that capitalist economists are all corrupt and that only Marxist philosophers are worth trusting.
A problem with motivated reasoning arguments, when stretched to this extent, is that anyone can accuse anyone over the thinnest pretext. And rather than engaging with peopleâs views and arguments in any serious, substantive way, it just turns into a lot of finger pointing.
Yudkowsky's gotten paid millions of dollars to prophesize AI doom. Many people have argued that AI safety/AI alignment narratives benefit the AI companies and their investors. The argument goes like this: exaggerating the risks of AI exaggerates AI's capabilities. Exaggerating AI's capabilities makes the prospective financial value of AI much higher than it really is. Therefore, talking about AI risk or even AI doom is good business.
I would add that exaggerating risk may be a particularly effective way to exaggerate AI's capabilities. People tend to be skeptical of anything that sounds like pie-in-the-sky hope or optimism. On the other hand, talking about risk sounds serious and intelligent. Notice what goes unsaid: many near-term AGI believers think there's a high chance of some unbelievably amazing utopia just on the horizon. How many times have you heard someone imagine that utopia? One? Zero? And how many times have you heard various AI doom or disempowerment stories? Why would no one ever bring up this amazing utopia they think might happen very soon?
Even if you're very pessimistic and think there's a 90% chance of AI doom, a 10% chance of utopia is still pretty damn interesting. And many people are much more optimistic, thinking there's around a 1-30% chance of doom, which implies a 70%+ chance of utopia. So, what gives? Where's the utopia talk? Even when people talk about the utopian elements of AGI futures, they emphasize the worrying parts: what if intelligent machines produce effectively unlimited wealth, how will we organize the economy? What policies will we need to implement? How will people cope? We need to start worrying about this now! When I think about what would happen if I won the lottery, my mind does not go to worrying about the downsides.
I think the overwhelming majority of people who express views on this topic are true believers. I think they are sincere. I would only be willing to accuse someone of possibly doing something underhanded if, independently, they had a track record of deceptive behaviour. (Sam Altman has such a track record, and generally I don't believe anything he says anymore. I have no way of knowing what's sincere, what's a lie, and what's something he's convinced himself of because it suits him to believe it.) I think the specific accusation that AI safety/AI alignment is a deliberate, conscious lie cooked up to juice AI investment is silly. It's probably true, though, that people at AI companies have some counterintuitive incentive or bias toward talking up AI doom fears.
However, my general point is that just as it's silly to accuse AI safety/alignment people of being shills for AI companies, it also seems silly to me to say that AI companies (or "AI accelerationist" companies, which is effectively all major AI companies and almost all startups) are the equivalent of tobacco companies, and that you shouldn't pay attention to what people at AI companies say about AI. Motivated reasoning accusations made on thin grounds can put you into a deluded bubble (e.g. becoming a Marxist), and I don't think AI is some clear-cut, exceptional case like tobacco or state propaganda where obviously you should ignore the message.
I think the strength of the incentives to behave in a given way is more proportional to the resulting expected increase in welfare than to the expected increase in net earnings. Individual human welfare is often assumed to be proportional to the logarithm of personal consumption, so a given increase in earnings increases welfare less for people earning more. In addition, a 1 % chance of earning 100 times more (for example, due to one's company being successful) increases welfare less than a 100 % chance of earning 100 % more. More importantly, there are major non-financial benefits for Yudkowsky, who is somewhat seen as a prophet in some circles.
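Under the log-welfare assumption the comment cites, the comparison can be sketched in a couple of lines (the numbers are illustrative, and log utility is itself an assumption):

```python
import math

# Welfare modeled as log(consumption), gains measured relative to a baseline of 1.
# Expected welfare gain from a 1% chance of a 100x payoff:
risky_gain = 0.01 * math.log(100)
# Welfare gain from a guaranteed doubling (earning 100% more):
sure_gain = 1.00 * math.log(2)

print(round(risky_gain, 3), round(sure_gain, 3))  # 0.046 0.693
```

On this model, the guaranteed doubling is worth roughly fifteen times as much welfare as the 1% shot at 100x, which is the comment's point about startup-style payoffs.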
Why are they paid so much?
Copying from my other comment:
The reason Eliezer gets paid so much is because a donor specifically requested it. The express purpose of the donation was to make Eliezer rich enough that he could afford to say "actually AI risk isn't a big deal" and shut down MIRI without putting himself in a difficult financial situation. (I don't know about Nate's salary, but $235K looks pretty reasonable to me? That's less than a mid-level software engineer makes.)
Edit Feb 2: Apparently the donation I was thinking of is separate from Eliezer's salary, see his comment.
I'm not sure how they decide on what salaries to pay themselves. But the reason they have the money to pay themselves those salaries in the first place is that MIRI's donors believe there's a significant chance of AI destroying the world within the next 5-20 years and that MIRI (especially Yudkowsky) is uniquely positioned to prevent this from happening.
It is somewhat difficult to react to this level of absolutely incredible nonsense politely, but I'll try.
I disagree with both Yudkowsky and Soares about many things, but very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities.
For the companies racing to AGI, Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.
"very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities."
"Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value."
fwiw both of these claims strike me as close to nonsense, so I don't think this is a helpful reaction.
If you ask the AIs, they get numbers in the tens of millions to tens of billions range, with around 1 billion being the central estimate. (I haven't extensively controlled for the effect, and some calculations appear driven by narrative.)
Personally I find it hard to judge and tend to lean no when trying to think it through, but itâs not obviously nonsense.
I agree with Ben Stewart's response that this is not a helpful thing to say. You are making some very strange and unintuitive claims. I can't imagine how you would persuade a reasonable, skeptical, well-informed person outside the EA/LessWrong (or adjacent) bubble that these are credible claims, let alone that they are true. (Even within the EA Forum bubble, it seems like significantly more people disagree with you than agree.)
To pick on just one aspect of this claim: it is my understanding that Yudkowsky has no meaningful technical proficiency with deep learning-based or deep reinforcement learning-based AI systems. In my understanding, Yudkowsky lacks the necessary skills and knowledge to perform the role of an entry-level AI capabilities researcher or engineer at any AI company capable of paying multi-million-dollar salaries. If there is evidence that shows my understanding is mistaken, I would like to see that evidence. Otherwise, I can only conclude that you are mistaken.
I think the claim that an endorsement is worth billions or tens of billions is also wrong, but it's hard to disprove a claim about what would happen in the event of a strange and unlikely hypothetical. Yudkowsky, Soares, and MIRI have an outsized intellectual influence in the EA community (and obviously on LessWrong). There is some meaningful level of influence on the community of people working in the AI industry in the Bay Area, but it's much less. Among the sort of people who could make decisions that would realize billions or tens of billions in value, namely the top-level executives at AI companies and investors, the influence seems pretty marginal. I would guess the overwhelming majority of investors either don't know who Yudkowsky and Soares are, or do but don't care what their views are. Top-level executives do know who Yudkowsky is, but in every instance I've seen, they tend to be politely disdainful or dismissive toward his views on AGI and AI safety.
Anyway, this seems like a regrettably unproductive and unimportant tangent.
I think it could be a helpful response for people who are able to respond to signals of the type "someone who has demonstrably good forecasting skills, is an expert in the field, and has worked on this a long time claims X" by at least re-evaluating whether their models make sense and are not missing some important considerations.
If someone is at least able to do that, they can for example ask a friendly AI or some other friendly AI, and it will tell them, based on conservative estimates and reference classes, that the original claim is likely wrong. It will still miss important considerations, in the way a typical forecaster also would, so the results are underestimates.
I think at the level of [some combination of lack of ability to think and motivated reasoning] when people are uninterested in e.g. sanity checking their thinking with AIs, it is not worth the time correcting them. People are wrong on the internet all the time.
(I think the debate was moderately useful. I made an update from this debate and the voting patterns, broadly in the direction of the EA Forum descending to the level of a random place on the internet where confused people talk about AI, one broadly not worth reading or engaging with. I'm no longer that active on EAF, but I've made some update.)
This thread seems to have gone in an unhelpful direction.
Questioning motivations is a hard point to make well. I'm unwilling to endorse that they are never relevant, but it immediately becomes personal. Keeping the focus primarily on the level of the arguments themselves is an approach more likely to enlighten and less likely to lead to flamewars.
I'm not here to issue a moderation warning to anyone for the conversation ending up on the point of motivations. I do want to take my moderation hat off and suggest that people spend more time on the object level.
I will then put my moderation hat back on and say that this and Jan's previous comment break norms. You can disagree with someone without being this insulting.
I agree the thread direction may be unhelpful, and flame wars are bad.
I disagree though about the merits of questioning motivations; I think it's super important.
In the AI sphere, there are great theoretical arguments on all sides: good arguments for acceleration, caution, pausing, etc. We can discuss these ad nauseam, and I do think that's useful. But I think motivations likely shape the history and current state of AI development more than unmotivated reasoning and rational thought. Money and power are strong motivators; EAs have sidelined them at their peril before. Although we cannot know people's hearts, we can see and analyse what they have done and said in the past and what motivational pressures might affect them right now.
I also think it's possible to have a somewhat object-level discussion about motivations.
I think this article on the history of modern AI outlines some of this well: https://substack.com/home/post/p-185759007
I might write more about this later...
Hi Jan.
Are you open to bets about this? I would be happy to bet 10 k$ that Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their last model as good. We could ask the marketing team at Anthropic or marketing experts elsewhere. I am not officially proposing a bet just yet. We would have to agree on a concrete operationalisation.
This doesn't seem to be a reasonable way to operationalize. It would create much less value for the company if it was clear that they were being paid for endorsing them. And I highly doubt Amodei would be in a position to admit that they'd want such an endorsement even if it indeed benefitted them.
Thanks for the good point, Nick. I still suspect Anthropic would not pay e.g. 3 billion $ for Yudkowsky and Soares to endorse their last model as good if they were hypothetically being honest. I understand this is difficult to operationalise, but the question could still be put to people outside Anthropic.
The operationalisation you propose does not make any sense, Yudkowsky and Soares do not claim ChatGPT 5.2 will kill everyone or anything like that.
What about this:
MIRI approaches [a lab] with this offer: we have made some breakthrough in the ability to verify whether the way you are training AIs leads to misalignment of the kind we are worried about. Unfortunately, the verification requires a lot of computation (i.e. something like ARC), so it is expensive. We expect your whole training setup will pass this, but we will need $3B from you to run it; if our test works, we will declare that your lab solved the technical part of AI alignment we were most worried about, along with some arguments which we expect to convince many people who listen to our views.
Or this: MIRI discusses stuff with xAI or Meta and convinces themselves their "secret" plan is by far the best chance humanity has, and everyone ML/AI smart and conscious should stop whatever they are doing and join them.
(Obviously these are also unrealistic / assume something like some lab coming up with some plan which could even hypothetically work.)
Thanks, Jan. I think it is very unlikely that AI companies with frontier models will seek the technical assistance of MIRI in the way you described in your 1st operationalisation. So I believe a bet which would only resolve in this case has very little value. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any that we could make that is good for both of us under our own views considering we could invest our money, and that you could take loans?
I was considering hypothetical scenarios of the type "imagine this offer from MIRI arrived, would a lab accept"; clearly MIRI is not making the offer, because the labs don't have good alignment plans, and MIRI is obviously high-integrity enough not to be corrupted by relatively tiny incentives like $3B.
I would guess there are ways to operationalise the hypotheticals, and try to have, for example, Dan Hendrycks guess what xAI would do, him being an advisor.
With your bets about timelines: I did an 8:1 bet with Daniel Kokotajlo against AI 2027 being as accurate as his previous forecast, so I'm not sure which side of the "confident about short timelines" bet you expect I should take. I'm happy to bet on some operationalization of your overall thinking and posting about the topic of AGI being bad, e.g. something like "the 3 smartest available AIs in 2035 compare everything we wrote in 2026 on EAF, LW, and Twitter about AI and judge who was more confused, overconfident, and miscalibrated".
When would the offer from MIRI arrive in the hypothetical scenario? I am sceptical of an honest endorsement from MIRI today being worth 3 billion $, but I do not have a good sense of what MIRI will look like in the future. I would also agree a foolproof AI safety certification is or will be worth more than 3 billion $, depending on how it is defined.
I was guessing I would have longer timelines. What is your median date of superintelligent AI as defined by Metaculus?
It's not endorsing a specific model for marketing reasons; it's about endorsing the effort, overall.
Given that Meta is willing to pay billions of dollars for people to join them, and that many people don't work on AI capabilities (or work, e.g., at Anthropic, as a lesser evil) because they share E&S's concerns, an endorsement from E&S would have value in the billions to tens of billions simply because of the talent that you can get as a result of this.
Meta is paying billions of dollars to recruit people with proven experience at developing relevant AI models.
Does the set of "people with proven experience in building AI models" overlap with "people who defer to Eliezer on whether AI is safe" at all? I doubt it.
Indeed, given that Yudkowsky's arguments on AI are not universally admired, and that people who have chosen as their career building the thing he says will make everybody die are particularly likely to be sceptical about his convictions on that issue, an endorsement might even be net negative.
Thanks for the comment, Mikhail. Gemini 3 estimates a total annualised compensation of the people working at Meta Superintelligence Labs (MSL) of 4.4 billion $. If an endorsement from Yudkowsky and Soares was as beneficial (including via bringing in new people) as making 10 % of people there 10 % more impactful over 10 years, it would be worth 440 M$ (= 0.10*0.10*10*4.4*10^9).
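The back-of-the-envelope above multiplies out as stated (note that the $4.4B payroll figure is an AI-generated estimate from the comment, and the 10%/10%/10-year impact numbers are the comment's hypothetical, not data):

```python
payroll = 4.4e9          # assumed annualised MSL compensation, in $
share_affected = 0.10    # 10% of the people there
uplift = 0.10            # made 10% more impactful
years = 10               # over 10 years

endorsement_value = share_affected * uplift * years * payroll
print(f"${endorsement_value:,.0f}")  # $440,000,000
```

That reproduces the 440 M$ figure; the conclusion is only as good as the hypothetical inputs.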
You could imagine a Yudkowsky endorsement (say, just to entertain the counterfactual, with the narrative that Zuck talked to him, admits he went about it all wrong, and is finally taking the issue seriously) raising Meta AI from "nobody serious wants to work there and they can only get talent by paying exorbitant prices" to "they finally have access to serious talent and can get a critical mass of people to do serious work". That'd arguably be more valuable than whatever they're doing now.
I think your answer to the question of how much an endorsement would be worth mostly depends on some specific intuitions that I imagine Kulveit has for good reasons but most people don't, so it's a bit hard to argue about it. It also doesn't help that in every case other than Anthropic, and maybe DeepMind, it'd require some weird hypotheticals to even entertain the possibility.
It seems to me that the "alien preferences" argument is a red herring. Humans have all kinds of different preferences; only some of ours overlap, and I have no doubt that if one human became superintelligent, that would also carry a high risk of disaster, precisely because they would have preferences that I don't share (probably selfish ones). So preferences don't need to be alien in any strong sense to be dangerous.
I know it's Y&S's argument. But it would have been nice if the authors of this article had also tried to make it stronger before refuting it.
Help me understand what you're saying here. Are you saying that Yudkowsky and Soares's argument is just so obviously wrong that it's almost uninteresting to discuss why it's wrong? That you find the Mechanize co-founders' refutation of the Yudkowsky and Soares argument disappointing because you found that argument so weak to begin with?
If so, I'm not saying that's a wrong view, not at all. But it's worth noting how controversial that view is in the EA community (and other communities that talk a lot about AGI). Essays like this need to be written because so many people in this community (and others) believe Yudkowsky and Soares' argument is correct. If my impression of the EA community is off base and there is actually a community consensus that Yudkowsky and Soares' argument is wrong, then more people should talk about this, because right now it's easy to get the wrong impression.
I think it's also worth discussing the question of what happens if AGI turns out to have generally human-like motivations and psychology. What dangers might it pose? How would it behave? But not every relevant and worthy question can be addressed in a single essay.
Thanks Yarrow, I can see that that was confusing.
I don't think that Yudkowsky & Soares's argument as a whole is obviously wrong and uninteresting. On the contrary, I'm rather convinced by it, and I also want more critics to engage with it.
But I think the argument presented in the book was not particularly strong, and others seem to agree: the reviews on this forum are pretty mixed (e.g.). So I'd prefer critics to argue against the best version of this argument, not just the one presented in the book. If these critics had only set out to write a book review, then I'd say fine. But that's not what they were doing here. They write "there is no standard argument to respond to, no single text that unifies the AI safety community". True, but you can engage with multiple texts in order to respond to the best form of the argument. In fact, that's pretty standard, in academia and outside of it.
So, if the best version of Yudkowsky and Soares' argument is not the one made in their book, what is the best version? Can you explain how that version of the argument, which they made previously elsewhere, differs from the version in the book?
I can't tell if you're saying:
a) that the alien preferences thing is not a crux of Yudkowsky and Soares' overall argument for AI doom (it seems like it is), or
b) that the version of the specific argument about alien preferences they gave in the book isn't as good as previous versions they've given (which is why I asked which version is better), or
c) that Yudkowsky and Soares' book overall isn't as good as their previous writings on AI alignment.
I don't know that academic reviewers of Yudkowsky and Soares' argument would take a different approach. The book is supposed to be the most up-to-date version of the argument, and one the authors took a lot of care in formulating. It doesn't feel intuitive to go back and look at their earlier writings and compare different versions of the argument, which aren't obviously different at first glance. (Will MacAskill and Clara Collier both complained that the book wasn't sufficiently different from previous formulations of the argument, i.e. wasn't updated enough in light of advancements in deep learning and deep reinforcement learning over the last decade.) I think an academic reviewer might just trust that Yudkowsky and Soares' book is going to be the best thing to read and respond to if they want to engage with their argument.
You might, as an academic, engage in a really close reading of many versions of a similar argument made by Aristotle in different texts, if you're a scholar of Aristotle, but this level of deep textual analysis doesn't typically apply to contemporary works by lesser-known writers outside academia.
The academic philosopher David Thorstad is writing a blog series in response to the book. I haven't read it yet, so I don't know whether he draws on Yudkowsky and Soares writings other than the book itself. However, I think it would be perfectly fine for him to just focus on the book, and not seek out other texts from the same authors that make the same argument in maybe a better form.
If what you're saying is that there are multiple independent (and mutually incompatible) arguments for the AI safety community's core claims, including ones that Yudkowsky and Soares don't make, then I agree with that. I agree you can criticize that sentence in the Mechanize co-founders' essay if you believe Yudkowsky's views and arguments don't actually unify (or adequately represent) the views and arguments of the AI safety community overall. Maybe you could point out what those other arguments are and who has formulated them best. Maybe the Mechanize co-founders would write a follow-up piece engaging with those non-Yudkowsky arguments as well, to engage more completely with the AI safety community's worldview.
The argument I'm referring to is the AI doom argument. Y&S are its most prominent proponents, but they are widely known to be eccentric, and not everyone agrees with their presentation of it. I'm not that deep in the AI safety space myself, but I think that's pretty clear.
The authors of this post seemed to respond to the AI doom argument more generally, and took the book to be the best representative of the argument. That already seems like a questionable move, and I wish they'd gone further.
I don't think the point about alien preferences is a crux of the AI doom argument generally. I think it's presented in Bostrom's Superintelligence and Rob Miles's videos (and surely countless other places) as: "an ASI optimising for anything that doesn't fully capture collective human preferences would be disastrous. Since we can't define collective human preferences, this spells disaster." In that sense the preferences don't have to be "alien", just different from the collective sum of human preferences. I guess Y&S took the opportunity to say "LLMs seem MUCH more different" in an attempt to strengthen their argument, but they didn't have to.
So, as I said, I'm not really that deep into AI safety, so I'm not the person to go to for the best version of these arguments. But I read the book, sat down with some friends to discuss it... and we each identified flaws, as the authors of this post did, and then found ways to make the argument better, using other ideas we'd been exposed to and some critical reflection. It would have been really nice if the authors of the post had taken that second step and steelmanned it a bit.
There's a fine line between steelmanning people's views and creating new views that are facially similar to those views but are crucially different from the views those people actually hold. I think what you're describing is not steelmanning, but developing your own views, different from Yudkowsky and Soares' (views that they would almost certainly disagree with in strong terms).
I think it would be constructive for you to publish the views you developed after reading Yudkowsky and Soares' book. People might find that useful to read. That could give people something interesting to engage with. But if you write that Yudkowsky and Soares' claim about alien preferences is wrong, many people will disagree with you (including Yudkowsky and Soares, if they read it). So, it's important to get very clear on what different people in a discussion are saying and what they're not saying. Just to keep everything straight, at least.
I agree the alien preferences thing is not necessarily a crux of AI doom arguments more generally, but it is certainly a crux of Yudkowsky and Soares' overall AI doom argument specifically. Yes, you can change their overall argument into some other argument that doesn't depend on the alien preferences thing anymore, but then that's no longer their argument; that's a different argument.
I agree that Yudkowsky and Soares (and their book) are not fully representative of the AI safety community's views, and probably no single text or person (or pair of people) is. I agree that it isn't really reasonable to say that if you can refute Yudkowsky and Soares (or their book), you refute the AI safety community's views overall. So, I agree with that critique.
Thanks for the comment, Tristan.
I would worry if a single human had much more power than all other humans combined. Likewise, I would worry if an AI agent had more power than all other AI agents and humans combined. However, I think the probability of any of these scenarios becoming true in the next 10 years is lower than 0.001 %. Elon Musk has a net worth of 765 billion $, 0.543 % (= 765*10^9/(141*10^12)) of the market cap of all publicly listed companies of 141 T$.
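As a quick check of the share quoted above (using the comment's own net-worth and market-cap figures, which I have not independently verified):

```python
# Musk's stated net worth as a share of the stated market cap of
# all publicly listed companies, per the figures in the comment.
musk_net_worth = 765e9       # $
global_market_cap = 141e12   # $

share = musk_net_worth / global_market_cap
print(f"{share:.3%}")  # 0.543%
```

The result matches the 0.543 % stated in the comment.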
Elon Musk has already used this power to take actions which will potentially kill millions, by funding the Trump campaign enough to get USAID closed down. I think that should worry us, and the chance of people amassing even more power should worry us even more.
Hi Guy. Elon Musk was not the only person responsible for the recent large cuts in foreign aid from the United States (US). In addition, I believe outcomes like human extinction are way less likely. I agree it makes sense to worry about concentration of power, but not about extreme outcomes like human extinction.
Extinction perhaps not, but I think eternal autocracy is definitely possible.
I think the evolution analogy becomes relevant again here: consider that the genus Homo was at first more intelligent than other species but not more powerful than their numbers combined... until suddenly one jump in intelligence let Homo sapiens wreak havoc across the globe. Similarly, there might be a tipping point in AI intelligence past which fighting back very suddenly becomes infeasible. I think this is a much better analogy than Elon Musk, because, like an evolving species, a superintelligent AI can multiply and self-improve.
I think a good point that Y&S make is that we shouldn't expect to know where the point of no return is, and should be prudent enough to stop well before it. I suppose you must have some source/reason for the 0.001 % confidence claim, but it seems pretty wild to me to be so confident in a field that is evolving and, at least from my perspective, pretty hard to understand.
It is unclear to me whether all humans together are more powerful than all other organisms on Earth together. It depends on what is meant by "powerful". The power consumption of humans is 19.6 TW (= 1.07 + 18.5), only 0.700 % (= 19.6/(2.8*10^3)) of that of all organisms. In any case, all humans together being more powerful than all other organisms on Earth together is still way more likely than the most powerful human being much more powerful than all other organisms on Earth together.
My upper bound of 0.001 % is just a guess, but I do endorse it. You can have a best guess that an event is very unlikely, but still be super uncertain about its probability. For example, one could believe an event has a probability of 10^-100 to 10^-10, which would imply it is super unlikely despite 90 (= -10 - (-100)) orders of magnitude (OOMs) of uncertainty in the probability.
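Restating the two calculations above as code (my paraphrase; the power-consumption figures are the comment's, and the probability range is purely illustrative):

```python
# Human share of all organisms' power consumption, per the comment's figures.
human_power_tw = 1.07 + 18.5   # TW, humans (two components given in the comment)
all_organisms_tw = 2.8e3       # TW, all organisms

human_share = human_power_tw / all_organisms_tw
print(f"{human_share:.3%}")  # ~0.699%, matching the ~0.700 % quoted after rounding

# Width of the illustrative probability range 10^-100 to 10^-10,
# in orders of magnitude.
ooms = -10 - (-100)
print(ooms)  # 90
```

Even with 90 OOMs of uncertainty, every probability in that range stays far below any practical threshold, which is the comment's point.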
By power I mean: the ability to change the world according to one's preferences. Humans clearly dominate today in terms of this kind of power. Our power is limited, but it is not the case that other organisms have power over us, because while we might rely on them, they are not able to leverage that dependency. Rather, we use them as much as we can.
No human is currently so powerful as to have power over all other humans, and I think that's definitely a good thing. But it doesn't seem like it would take much more of an advantage to let one intelligent being dominate all others.
Are you thinking about humans as an aligned collective in the 1st paragraph of your comment? I agree all humans coordinating their actions together would have more power than other groups of organisms with their actual levels of coordination. However, such level of coordination among humans is not realistic. All 10^30 bacteria (see Table S1 of Bar-On et al. (2018)) coordinating their actions together would arguably also have more power than all humans with their actual level of coordination.
I agree it is good that no human has power over all humans. However, I still think one being dominating all others has a probability lower than 0.001 % over the next 10 years. I am open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. Do you see any bet we could make that is good for both of us under our own views?
I've known and respected people on both sides of this, and have been frustrated by some of the back-and-forth on this.
On the side of the authors, I find these pieces interesting but very angsty. There's clearly some bad blood here. It reminds me a lot of meat eaters who seem to attack vegans out of irritation more than deliberate logic. [1]
On the other, I've seen some attacks on this group on LessWrong that seemed over-the-top to me.
Sometimes grudges motivate authors to be incredibly productive, so maybe some of this can be useful.
It seems like others find these discussions useful, judging from the votes, but as of now, I find it difficult to take much from them.
[1] I think there are many reasonable meat eaters out there, but there are also many who are angry/irrational about it.
I think part of where the angsty energy comes from is that Yudkowsky and Soares are incredibly brazen and insulting when they express their views on AI. For instance, Yudkowsky recently said that people with AGI timelines longer than 30 years are no "smarter than a potted plant". Yudkowsky has publicly said, on at least two occasions, that he believes he's the smartest person in the world (at least on AI safety, and maybe just in general) and that there's no second place that's particularly close. Yudkowsky routinely expresses withering contempt, even for people who are generally "on his side" and trying to be helpful. It's really hard to engage with this style of "debate" (as it were) and not feel incredibly pissed off.
When I was running an EA university group, if anyone had behaved like Yudkowsky routinely behaves, they would have been banned from the group, and I'm sure the members of my group would have unanimously agreed the behaviour is unacceptable. The same applies to any other in-person group, community, or social circle I've been a part of. It would scarcely be more acceptable than a man in an EA group repeatedly telling the women he just met there how hot they are. People generally don't tolerate this kind of thing. I think many people would prefer not to reward this behaviour with attention, but given that Yudkowsky (and Soares) have already successfully gotten a lot of attention, it's necessary to write replies like this essay (the one above, by the Mechanize co-founders).
Privately, some people in the LessWrong community, where Yudkowsky is deeply revered, have said they find Yudkowsky's style of engagement unpleasant and regrettable (in stronger words than that). Some have said it publicly. (Soares, too, has been publicly criticized for his demeanor toward people who are "on his side" and trying to be helpful, let alone people he disagrees with, or thinks he does.)
I think it's close to impossible not to feel angsty when engaging with Yudkowsky (and Soares), unless you happen to be one of those people who revere him and treat him as a role model (or, I don't know, you're a Zen Buddhist master). I agree that it's regrettable for the debates to become as heated as they often get. I agree it would be more interesting to have intellectual discussions based in civility, mutual respect, curiosity about the other person's opinion, intellectual generosity, and so on. But if someone isn't willing to play ball, I think you've got to either just ignore them, bite your tongue and be artificially polite (in which case some amount of angst will probably still show through), or write angry refutations.