Executive summary: The post argues that Anthropic, despite its safety-focused branding and EA-aligned culture, is currently untrustworthy: its leadership has broken or quietly walked back key safety-related commitments, misled stakeholders, lobbied against strong regulation, and adopted governance and investment structures that the author thinks are unlikely to hold up under real pressure. Employees and potential joiners should therefore treat it more like a normal frontier AI lab racing to advance capabilities than a mission-first safety organization.
Key points:
- The author claims Anthropic leadership repeatedly gave early investors, staff, and others the impression that it would not push the AI capabilities frontier and would only release “second-best” models, but later released models like Claude 3 Opus and subsequent systems that Anthropic itself described as frontier-advancing, without clearly acknowledging a policy change.
- The post argues that Anthropic’s own writings (e.g. “Core Views on AI Safety”) committed it to act as if we might be in pessimistic alignment scenarios and to “sound the alarm” or push for pauses if evidence pointed that way, yet Anthropic leaders have publicly expressed strong optimism about controllability and the author sees no clear operationalization of how the lab would ever decide to halt scaling.
- The author claims Anthropic’s governance, including the Long-Term Benefit Trust and board, is weak, investor-influenced, and opaque, with at least one LTBT-appointed director lacking visible x-risk focus, and suggests that practical decision-making is driven more by fundraising and competitiveness pressures than by formal safety guardrails.
- The post reports that Anthropic used concealed non-disparagement and non-disclosure clauses in severance agreements, that it backed off only after public criticism of OpenAI’s similar practice, and that a cofounder’s public statement about those agreements’ ambiguity was “a straightforward lie”; it cites ex-employees who say the gag clauses were explicit and that at least one attempt to push back on them was rejected.
- The author details Anthropic’s lobbying efforts on EU processes, California’s SB-1047, and New York’s RAISE Act, arguing that Anthropic systematically sought to weaken or kill strong safety regulation (e.g. opposing pre-harm enforcement, mandatory SSPs, independent agencies, whistleblower protections, and KYC tied to Amazon’s interests) while maintaining a public image of supporting robust oversight; the author also accuses Jack Clark of making a clearly false claim about RAISE harming small companies.
- The post claims Anthropic quietly weakened its Responsible Scaling Policy over time (e.g. removing commitments to plan for pauses, to define ASL-N+1 before training ASL-N models, and to maintain strong insider-threat protections at ASL-3) without forthright public acknowledgment. It concludes that Anthropic’s real mission, as reflected in its corporate documents and behavior, is to develop advanced AI for commercial and strategic reasons rather than to reliably reduce existential risk, so staff and prospective employees should reconsider contributing to its capabilities work or demand much stronger governance.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.