Executive summary: The author argues, speculatively but seriously, that EA’s AI safety ecosystem may be drifting into a “Moloch-like” structural trap where wealth from EA-aligned AI companies funds the very organisations meant to evaluate them, risking a form of regulatory capture even without bad intent.
Key points:
The author proposes a causal chain where EA prioritisation of AI safety leads to talent entering AI firms, generating wealth that is then funneled back into EA organisations, including those overseeing those same firms.
The concern is that funding dependence can erode an organisation’s capacity to produce findings that threaten donor interests, even if no bias is consciously exercised.
The author suggests selection effects will favor organisations whose work is compatible with companies like Anthropic, without requiring explicit coordination.
They argue that “value drift” and shared professional context may gradually align donors’ and organisations’ views, making this convergence hard to detect from the inside.
The author claims AI safety lacks strong external feedback loops, so judgments of “impact” rely on insiders, making the field vulnerable to Goodhart-like dynamics.
They offer testable predictions, such as Anthropic-derived funding accounting for “>40%” of AI safety nonprofit funding and constraint-advocating organisations receiving relatively less funding, while noting that counterforces like government and non-EA funding could offset the effect.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.