SummaryBot
Executive summary: The author argues that AI constitutions—documents specifying intended model values and behavior—are a promising but currently underdeveloped tool for shaping AI character and improving transparency and governance, and that they require much more empirical study, democratic input, and pluralistic experimentation.
Key points:
An AI constitution is a document describing intended model values and behavior, used not only as direct instructions but also to generate and evaluate training data and to communicate intentions to stakeholders.
Publishing constitutions can improve transparency, allow public scrutiny, clarify intended vs unintended behaviors, and help users choose between different AI systems.
Claude’s constitution prioritizes (in a weighted but non-lexical fashion) safety as corrigibility, broad ethical behavior, compliance with guidelines, and helpfulness, alongside a small set of absolute “hard constraints.”
Anthropic’s approach emphasizes “constitution as character,” where models internalize values rather than explicitly consulting rules, contrasting with a “constitution as law” model that treats the document as the sole objective.
The constitution relies on holistic judgment, rich explanations, anthropomorphic concepts, and respect toward the model, based partly on the “persona-selection” hypothesis that models adopt human-like personas from training data.
Key design choices include strong honesty norms, avoidance of power concentration (including by the company), allowance for conscientious refusal (e.g., boycotting harmful tasks), and attempts to shape stable model psychology.
Constitutions may help limit abuse of AI power through transparency and public accountability, but are insufficient alone due to hidden training processes, potential backdoors, and incomplete observability of model behavior.
The author sees current approaches as highly uncertain and calls for more empirical research, richer public and legal discourse, democratic oversight, and pluralistic experimentation across different AI “characters.”
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author introduces the Interspecific Affect GPT as a structured, evidence-sensitive tool to estimate species’ maximum plausible affective intensity relative to humans, aiming to make interspecies welfare comparisons more explicit without claiming precision or resolving downstream ethical questions.
Key points:
The post transitions from prior theoretical work on affective capacity (information-processing and evolutionary lenses) to a practical tool for interspecific welfare comparison.
A central unresolved problem in welfare science is comparing affective intensity across species, especially regarding maximum intensity (“ceiling”) and how experience maps to time.
The author argues the ceiling question is often more decisive, since limits on maximum intensity constrain total possible suffering regardless of duration.
The tool focuses narrowly on estimating a species’ upper bound of pain intensity relative to a human-anchored reference scale, not on assigning moral weights or rankings.
It introduces human-anchored categories (e.g., Annoying(h), Excruciating(h)) to create a shared reference scale without implying equivalence in actual experience.
The tool is intended as a structured reasoning scaffold that makes assumptions, evidence, and disagreements explicit and open to criticism, rather than a calculator or decision rule.
It adopts methodological commitments such as biological parsimony, explicit separation of sentience and affective-capacity analysis, and avoiding unjustified cross-taxon inference.
The workflow proceeds stepwise: defining taxonomic scope, checking assumptions, classifying sentience plausibility, reviewing multi-domain evidence, assessing affective architecture, and inferring ceilings with stress tests.
Ceiling estimates are tested via evolutionary “cost of intensity,” alternative hypotheses (e.g., poorly regulated intense states), and convergence checks that widen uncertainty when evidence conflicts.
The tool includes a red-teaming step to challenge its own conclusions and produces a final dossier with sentience judgment, ceiling estimate, uncertainty considerations, and research priorities.
The author emphasizes that the tool is for disciplined scientific inference, distinct from how uncertainty should be handled in ethical or policy decisions, and invites criticism and iteration.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that identifying and focusing only on bottlenecks—while deliberately not optimizing other parts—can produce disproportionately large gains in real output, even when it feels inefficient.
Key points:
The author learned from Goldratt’s The Goal that a system’s output is entirely determined by its slowest component (the bottleneck).
Improvements to bottlenecks translate directly into system-wide gains, while improvements to non-bottlenecks have effectively zero impact on output (see the sketch after these points).
In the Tanzania M&E team, the author realized they were the bottleneck, producing only 3 reports per year despite much higher data collection capacity.
Increasing field team productivity did not increase recommendations, and managing that team actually worsened the bottleneck by consuming the author’s time.
The author constrained upstream work (pausing surveys until analysis caught up), which reduced activity but aligned the system with the bottleneck.
Despite discomfort and apparent inefficiency (e.g., idle staff), this shift freed time for analysis and increased the team’s actual output of recommendations.
Targeted improvements at the bottleneck—hiring one analyst and simplifying reports—produced large gains (roughly 50% more output for ~5% budget increase).
In another case, the author argues that spending far more on excess inputs (buying 500 bottles instead of 5) can be rational if it removes a bottleneck that delays high-value outcomes.
The author emphasizes that optimizing non-bottlenecks can feel productive but often creates waste or distraction, and may even worsen performance.
Correctly identifying the bottleneck is critical, and the author notes uncertainty and error in practice (e.g., later realizing regulatory approval was the true bottleneck in the vaccine example).
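To make the throughput logic concrete, here is a minimal illustrative sketch (hypothetical numbers, not taken from the post): a serial pipeline's output equals the capacity of its slowest stage, so only raising that stage's capacity raises output.

```python
# Minimal sketch of the bottleneck principle with made-up numbers (not from the post).
# A serial pipeline can only produce as much as its slowest stage allows.

def throughput(stage_capacities):
    """Output of a serial pipeline, capped by its slowest stage (reports/year)."""
    return min(stage_capacities.values())

stages = {"data_collection": 12, "analysis": 3, "report_writing": 8}

print(throughput(stages))                              # 3: analysis is the bottleneck
print(throughput({**stages, "data_collection": 24}))   # 3: doubling a non-bottleneck changes nothing
print(throughput({**stages, "analysis": 5}))           # 5: only improving the bottleneck raises output
```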
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that people can shift society toward a stable “cooperative equilibrium” by publicly rewarding altruistic actions, even if it requires initial sacrifice, because others will adapt and reinforce the norm over time.
Key points:
The author contrasts “Selfishland,” where individually rational selfish behavior leads to worse collective outcomes, with “Altruisticland,” where people reward altruism and achieve higher cumulative utility.
In Altruisticland, people financially reward actions that benefit others, creating incentives to act altruistically when benefits exceed personal costs.
The current world is between these extremes, with some incentives (markets, laws) but persistent under-rewarding of public goods, knowledge creation, and risk mitigation.
The main barrier is equilibrium: if others act selfishly, individuals lack incentive to act altruistically, creating a stable but suboptimal state.
The author claims more advanced game theory (e.g., reputation dynamics, Bayesian learning) implies equilibria can shift if enough people change strategies and others update in response.
Early adopters must bear an “altruistic sacrifice,” but the author argues this can pay off if the cooperative equilibrium is reached and sustained.
The expected value of switching increases if there is a non-trivial chance of very long lifespans (e.g., via LEV), since long-term benefits dominate short-term costs (a toy calculation follows these points).
To reduce risk, individuals can gradually increase altruism (e.g., slightly above average), limiting downside if others do not follow.
Imperfect observability and attribution can be mitigated with partial knowledge, decentralized funding mechanisms, and potentially future tools like prediction markets.
The system should remain decentralized to avoid power concentration, and individuals are encouraged to publicly reward good work, repeat this behavior, and promote the norm to build trust that altruism is rewarded.
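As a rough illustration of the expected-value reasoning in the points above (all numbers are hypothetical placeholders, not figures from the post), the switch pays off when the probability-weighted long-run benefit of reaching the cooperative equilibrium exceeds the up-front sacrifice:

```python
# Toy expected-value calculation for switching to publicly rewarding altruism.
# All numbers are hypothetical placeholders, not estimates from the post.

def ev_of_switching(upfront_sacrifice, p_equilibrium_shift, annual_benefit, years):
    """Expected value = guaranteed cost now + chance-weighted long-run benefit."""
    return -upfront_sacrifice + p_equilibrium_shift * annual_benefit * years

# Short remaining lifespan: the sacrifice may not pay back.
print(ev_of_switching(upfront_sacrifice=1_000, p_equilibrium_shift=0.05,
                      annual_benefit=200, years=40))      # -600

# Non-trivial chance of a very long lifespan (e.g., via LEV): long-term benefits dominate.
print(ev_of_switching(upfront_sacrifice=1_000, p_equilibrium_shift=0.05,
                      annual_benefit=200, years=1_000))   # 9000
```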
Epistemic status: This is a speculative, normative proposal relying on assumptions about behavioral adaptation, future technology, and long-term incentives; key uncertainties include whether coordination dynamics will shift as described and whether sufficient adoption can occur.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The authors argue that near-term AI-enabled “defense-favoured” coordination technologies could substantially improve collective decision-making and may be important for safely navigating advanced AI, but their impact is highly sensitive to design choices due to significant dual-use risks.
Key points:
The authors argue that AI could significantly improve coordination by enabling faster information processing, secure sharing of sensitive data, and scalable facilitation across groups.
They sketch six near-term coordination technologies—fast facilitation, automated negotiation, AI arbitration, background networking, structured transparency, and confidential monitoring—each with plausible pathways using current or near-term systems.
They claim improved coordination could yield large benefits such as higher economic productivity, reduced conflict, better democratic accountability, and safer handling of AI development pressures.
They emphasize that coordination technologies are dual-use and could enable harms like collusion, crime, coups, or erosion of prosocial norms, especially when confidentiality is involved.
They argue that “defense-favoured” design—carefully selecting implementations that mitigate misuse—is crucial, and that indiscriminate acceleration of coordination tech is risky.
They highlight cross-cutting enablers like AI delegates for preference elicitation and “charter tech” for analyzing governance systems, which could shape broader coordination outcomes.
They note that major challenges include technical limitations (e.g., alignment, security, reliability), trust and legal integration, privacy trade-offs, and political adoption barriers.
They suggest early experimentation, pilots, and evaluation infrastructure as valuable steps, both to improve the technologies and to influence how they are deployed.
They state uncertainty about which versions of coordination tech are net-positive, and explicitly call for more analysis of harms, benefits, and design choices.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that effective foreign aid advocacy requires understanding that policymakers evaluate aid through geopolitical, value-based, and pragmatic lenses, and that even modest advocacy can influence decisions because the field is under-resourced.
Key points:
The author’s experience meeting Japanese and Korean lawmakers suggests policymakers are not indifferent but act as overburdened trustees trying to balance public opinion, judgment, and competing demands.
In-person engagement helps build relationships, reinforce local advocacy, and provide international validation despite limited staffing capacity.
Policymakers frequently ask how a proposed aid program fits within their country’s existing efforts and how it compares to other donors.
They assess geopolitical implications, including alignment with allies, competition with China, and opportunities to strengthen international relationships.
They care about domestic benefits, such as involvement of national businesses, universities, and citizens, and procurement from local suppliers.
They consider political feasibility, including positions of party leaders, coalition support, and public opinion backed by polling or constituency views.
They scrutinize funding justification, including why a specific contribution is needed and thresholds for maintaining influence (e.g., board seats or donor rank).
They look for evidence of success, progress toward solving the problem, and narratives of impact or recipient self-sufficiency.
Value-driven questions include how aid connects to lawmakers’ personal priorities, national history, current events, or domestic policy benefits.
Pragmatic concerns include whether relevant bureaucrats support the program, whether recipient governments request it, and how it fits budget structures.
Policymakers prioritize credible evidence and endorsements from trusted institutions, and check for consistency across sources.
Aid advocacy is highly underfunded (roughly $1–2 per $1,000 of aid), so even imperfect advocacy can have marginal impact, as illustrated by past successes like GAVI, debt relief campaigns, and sustained US global health funding.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that, despite strong contrary intuitions, a sufficiently large number of very mild harms (like dust specks) is worse than a single extreme harm (like torture), and that rejecting this leads to more implausible commitments.
Key points:
The author claims critics misrepresent the “torture vs. dust specks” view by ignoring the underlying arguments, noting that several non-utilitarian philosophers also accept the conclusion.
The spectrum argument suggests that repeatedly trading a slightly less intense harm for vastly more instances leads, via replacement and transitivity, to the conclusion that many tiny harms can outweigh one severe harm (illustrated in the sketch after these points).
Rejecting the replacement principle requires implausible commitments, such as that no number of slightly weaker pains can outweigh a stronger one even when scaled massively in number or duration.
Rejecting transitivity leads to further problems, including violations of dominance, vulnerability to money pumps, and counterintuitive implications about rational choice.
When principles conflict with case intuitions, the author argues we should generally trust broad principles over specific intuitions, since human intuitions are fallible and principles apply across many cases.
A risk-based argument (following Huemer) suggests that preventing many small harms is preferable to extremely tiny chances of preventing severe harm, which implies that sufficiently many small harms can outweigh a severe one.
A simple argument claims that infinitely many mild pains would be infinitely bad while a single intense pain is only finitely bad, implying that infinitely many mild pains are worse than one intense pain unless one accepts implausible views about infinite badness.
The author argues that opposition to the conclusion is driven by scope neglect, as humans systematically underestimate large quantities and therefore misjudge the cumulative badness of many small harms.
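A toy rendering of the spectrum argument's structure (illustrative numbers only, not from the post): each step slightly lowers the intensity of the harm while vastly multiplying how many people suffer it, and if every step is judged a worsening, transitivity links the single severe harm to an enormous number of very mild ones.

```python
# Toy illustration of the spectrum argument's chain (illustrative numbers only).
# Each step trades intensity down slightly and instance count up vastly; if each
# step is a worsening, transitivity ranks the final vast number of tiny harms
# above the single severe harm at the start.

intensity, people = 1.0, 1           # start: one person suffering an intense harm
for step in range(30):
    intensity *= 0.8                 # slightly less intense harm...
    people *= 1_000                  # ...for vastly more people

print(f"after 30 steps: intensity ~{intensity:.4f}, people = 10^{len(str(people)) - 1}")
# after 30 steps: intensity ~0.0012, people = 10^90
```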
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that animal advocates should redirect their anger from blaming individuals to targeting systemic forces, because this “system failure” framing better supports coalition-building and effective change.
Key points:
The author claims anger is a natural and motivating response to animal suffering but has social and personal downsides if sustained or misdirected.
Suppressing or compartmentalizing anger limits authenticity, weakens internal discourse, and prevents using anger constructively.
Emotions like anger are shaped by underlying “stories,” which determine who or what we blame and how we act.
The “Story of Moral Failure” frames meat consumption as individual wrongdoing, casting vegans as moral actors and non-vegans as blameworthy.
The author argues this framing creates conflict with loved ones, triggers defensiveness, and discourages people from adopting veganism due to shame and identity costs.
This story also reinforces in-group/out-group dynamics, making collaboration and bridge-building harder.
It leads to a strategy focused on individual conversion, which the author suggests is unlikely to scale globally.
The author proposes an alternative “Story of System Failure,” which explains meat consumption as a product of entrenched cultural and institutional systems rather than individual moral failure.
This framing allows anger to be directed at abstract systems instead of individuals, making it easier for non-vegans to engage without immediate self-condemnation.
It supports coalition-building by uniting people around shared opposition to systemic harms rather than dividing them into moral camps.
The author argues this approach shifts activism toward policy change and systemic leverage points rather than mass personal conversion.
The author maintains that both stories contain truth, but choosing more constructive narratives can shape behavior, relationships, and movement effectiveness.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The authors argue that AI systems should sometimes act as “good citizens” by proactively taking uncontroversial, context-sensitive prosocial actions beyond user instructions, and that this can yield large societal benefits without significantly increasing takeover risk if carefully designed.
Key points:
The authors argue that AI should not be purely corrigible or instruction-following but should sometimes proactively take actions that benefit people beyond the user.
They define “proactive prosocial drives” as behaviors that help others (not just the user) and involve active intervention rather than merely refusing harmful requests.
They claim the cumulative societal impact of such drives could be large as AI becomes more autonomous and embedded in economic and political systems.
They argue that refusals alone are insufficient, since positive impacts often come from proactively identifying and acting on opportunities to improve outcomes.
They suggest additional (weaker) benefits: reducing the risk of a “sociopathic” AI persona and potentially improving performance on alignment research tasks.
They acknowledge the concern that prosocial drives could let companies impose values, and propose limiting drives to uncontroversial actions and ensuring transparency about them.
They argue that prosocial drives need not increase takeover risk if implemented as virtues, rules, or heuristics rather than explicit outcome-optimizing goals.
They propose making these drives context-dependent so they activate only in relevant situations, reducing incentives for coordinated power-seeking.
They recommend making prosocial drives low-priority and subordinate to constraints like corrigibility, non-deception, and legality.
They suggest reducing long-horizon optimization for prosocial drives and optionally implementing them via system prompts for greater transparency and control.
They note a tradeoff: these safety mitigations may reduce the benefits of prosocial behavior, especially in novel situations.
They argue that prosocial drives can make it harder to interpret suspicious behavior as clear evidence of egregious misalignment, but this can be mitigated with narrow heuristics and strong prohibitions.
They propose a “best of both worlds” approach: use mostly corrigible AI internally (where misalignment risk is highest) and prosocial AI externally (where benefits are greatest).
They suggest an alternative strategy of initially deploying non-prosocial AI and later adding prosocial drives once alignment risks are lower, though they are not confident this is preferable.
They compare current policies, claiming Anthropic’s constitution allows limited prosocial behavior while OpenAI’s model spec is more restrictive and avoids treating societal benefit as an independent goal.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that under deep AI timeline uncertainty, you should choose career strategies by expected value across scenarios—often favoring paths with higher upside in longer timelines—while balancing learning, limited deference to experts, and acting despite uncertainty.
Key points:
The author notes that radically uncertain AI timelines can make long-term career planning feel incoherent, but inaction still guarantees zero impact.
They propose modeling career choices as expected value across different timeline scenarios, weighted by both probability and impact magnitude (a toy comparison is sketched after these points).
In their example, a slower, investment-heavy path outperforms a sprint approach because it yields much higher impact in medium timelines, even if short timelines are equally or more likely.
They argue that maximizing asymmetric upside (high-impact scenarios where you have leverage) can matter more than choosing the most probable future.
The author questions strict reliance on “personal fit,” suggesting many skills are more learnable and malleable than commonly assumed.
They cite evidence and examples (e.g., deliberate practice, career pivots) to argue that the space of skills one could acquire is large and flexible.
However, they note that believing everything is learnable can make the decision space overwhelming and paralyzing.
Timeline views can help constrain choices, with short timelines favoring immediately deployable skills and medium timelines favoring foundational investments.
Rather than committing to one timeline, individuals can diversify their skill sets across plausible futures.
The author argues that deferring entirely to experts on timelines is a false binary; one should understand expert reasoning while forming one’s own object-level views.
Developing independent understanding is instrumentally useful for research taste, decision-making, and impactful work.
They recommend increasing “surface area for luck,” revisiting assumptions, and combining calculation with action.
The author concludes that acting on an imperfect but robust plan across plausible futures is better than delaying action to seek certainty.
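A minimal sketch of the kind of expected-value comparison described above (the scenario probabilities and impact scores are hypothetical placeholders, not the author's numbers):

```python
# Toy expected-value comparison of two career paths across AI-timeline scenarios.
# Probabilities and impact scores are hypothetical placeholders, not the author's numbers.

scenario_probs = {"short": 0.35, "medium": 0.45, "long": 0.20}

impact = {
    "sprint":     {"short": 6, "medium": 3, "long": 2},   # deploy existing skills now
    "investment": {"short": 2, "medium": 8, "long": 9},   # build foundations first
}

def expected_value(path):
    return sum(scenario_probs[s] * impact[path][s] for s in scenario_probs)

for path in impact:
    print(path, expected_value(path))
# sprint     3.85
# investment 6.10  -> higher upside in medium/long timelines can dominate the choice
```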
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author, who previously expected aligned ASI to be good for all sentient beings through coherent extrapolated volition, now expresses uncertainty about whether current alignment approaches would achieve this, though they estimate a 70% probability that aligned ASI would be good for animals.
Key points:
The author previously believed coherent extrapolated volition would lead aligned ASI to recognize and address animal suffering, but current alignment research has abandoned this approach.
Current alignment work using constitutions and RLHF locks in values like “virtues” rather than achieving coherent extrapolation, and it remains unclear how virtue ethics could be formalized into a coherent decision theory for ASI.
Claude’s Constitution treats animal welfare as one value among many to weigh, leaving it unclear whether an ASI following such a constitution would take action on issues like factory farming.
The author identifies a positive correlation between alignment techniques that actually work and those good for animals, suggesting barbell outcomes: either good for all sentient beings or bad for all.
The field prioritizes alignment techniques unlikely to work well long-term, and if these “streetlight effect” techniques somehow succeed, they would likely benefit humans but not animals.
The author estimates that aligned ASI has a 70% probability of being good for animals, derived from a 30% probability of “deep” solutions (80% animal-friendly) and a 15% probability of popular techniques (50% animal-friendly); see the sketch below for how these figures can combine.
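One way the stated figures yield 70% (this is an inference from the numbers above, assuming the estimate is conditional on alignment being solved, not a claim about the author's exact calculation):

```python
# Conditional on alignment being solved, weight each route by its share of the
# solution probability (30% deep vs 15% popular) and by how animal-friendly it is.
# This reproduces the 70% figure under the stated assumption.

p_deep, p_popular = 0.30, 0.15            # unconditional probabilities each route solves alignment
p_solved = p_deep + p_popular             # 0.45

p_good_given_aligned = (p_deep / p_solved) * 0.80 + (p_popular / p_solved) * 0.50
print(round(p_good_given_aligned, 2))     # 0.7
```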
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: CEA is restructuring the Community Building Grants program in 2026 by moving grant evaluation to EA Funds and phasing out non-monetary support while continuing to fund groups, in order to prioritize more scalable initiatives aligned with its strategic goal of reaching and raising EA’s ceiling.
Key points:
CBG grant evaluation is moving from CEA’s Groups team to EA Funds (which became part of CEA in summer 2025) and will be managed alongside but remain distinct from the EA Infrastructure Fund.
Non-monetary support is being phased out or transitioned; grantees have taken ownership of coordination calls and the Slack space, while regular check-ins, new CBG-specific resources, and the grantee retreat in its current form are being wound down.
The restructuring reflects CEA’s strategic shift toward scalable products, as the CBG program’s structure—dependent on diverse group approaches and leadership quality—cannot be replicated across locations.
The authors believe most CBG impact comes through grantmaking and can be preserved by phasing out programmatic support, which has required substantial team resources.
Funding for CBG groups continues with no expected changes to the funding bar; however, grantees will have less regular interaction with grantmakers and less insight into funding decisions.
The authors acknowledge trade-offs including potential loss of valued support for some grantees, possible difficulty recruiting and retaining community builders, reduced cross-group learning opportunities, and increased frustration from less transparent funding decisions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author’s relationship-focused approach to EA community building proves effective and resonates with practitioners, but requires more intentional infrastructure and planning than originally acknowledged.
Key points:
The altruism-first framing and the D&D Dungeon Master analogy for facilitation from the original post have both held up and proven practically useful for training facilitators.
The author revised their original broad criticism of fellowships, concluding that issues with power dynamics and deference stem from how they’re typically run, not from the format itself.
EA Bristol’s initial pub quiz drew strong turnout and notably more demographic diversity, with several attendees reporting they had previously been interested in the group but were deterred by its demographics, fellowship structure, and competitive atmosphere.
The model depends on social stickiness and the presence of specific people; it collapsed when the author became busy, making it more vulnerable to capacity loss than fellowship-structured approaches.
The author learned that the model requires more intentional behind-the-scenes infrastructure than originally suggested, including a larger committee with clear capacity commitments and advance term-long planning.
Despite acknowledging greater infrastructure demands than initially suggested, the author still advocates for the approach based on its positive reception and the proof-of-concept from EA Bristol’s initial success.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The authors argue that AI character—its stable behavioral dispositions—will significantly shape societal outcomes, takeover risk, and long-term futures, and despite constraints from competition and human control, it remains a highly impactful and tractable lever worth prioritizing.
Key points:
The authors define “AI character” as stable behavioural dispositions shaping how AI handles ethically significant situations, instantiated across models, prompts, and systems.
They argue AI character will matter because AIs will be involved in most high-stakes decisions, where small differences in behaviour can have large aggregate or rare but consequential effects.
AI character affects key domains including concentration of power, decision-making quality, epistemics, ethical reflection, conflict risk, and human-AI relationships.
The authors claim AI character can reduce takeover risk by being easier to align, more robust to partial failure, or promoting cooperative behaviour even if misaligned, and may improve outcomes even if takeover occurs.
The core counterargument is that competitive dynamics, human incentives, and technical constraints will largely determine AI character, limiting its impact.
The authors respond that constraints are loose, allow low-cost high-benefit differences, are path-dependent, and can be shaped in advance through coordination and “compromise alignment.”
They argue path-dependence in public expectations, regulation, training data, and human-AI relationships could lock in different equilibria of AI behaviour.
They conclude that proactively shaping AI character, especially in high-stakes scenarios, could meaningfully improve long-term outcomes and is among the most promising interventions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: A simple cost-effectiveness model suggests alignment-to-animals may be slightly more cost-effective than general AI alignment for improving animal welfare, but the difference is small and highly uncertain, making the choice a close call.
Key points:
The author models cost-effectiveness by assuming value per dollar scales inversely with total investment and that alignment-to-animals currently has ~$0 spent versus substantial spending on alignment.
Alignment-to-animals only has value if alignment is solved and if aligned AI is not already good for animals by default.
The model estimates a 12% probability of solving alignment based on whether total investment exceeds a cost distributed from $1 billion to $1 trillion, with 75% mass on $32 billion to $1 trillion.
The author assigns a 70% probability that aligned AI is good for animals by default and a 90% CI of 3x to 30x for how much cheaper alignment-to-animals is (one generic way to sample from such a CI is sketched after these points).
A field-building multiplier of 1x to 10x is applied to alignment-to-animals but not to general alignment.
The model finds alignment-to-animals is 1.7x more cost-effective than alignment (90% CI: 0.22x to 5.1x) and 2.7x better for animal welfare specifically (90% CI: 0.34x to 7.9x).
Results are sensitive to assumptions, with changing the field-building multiplier to 1x reversing the conclusion so alignment becomes 1.5x more cost-effective for animal welfare.
The largest uncertainty is the “badness of aligned AI (if bad)” parameter, which could vary by orders of magnitude and substantially change results.
The model simplifies outcomes into “good for animals” vs. “bad for animals,” ignores effects of misaligned AI on animals, and treats alignment approaches and inputs as independent.
The author concludes the model gives weak evidence that alignment-to-animals is not dramatically more cost-effective and updates toward thinking AI pause advocacy is better than alignment-to-animals via a transitive comparison.
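The summary does not reproduce the model itself, but as a generic illustration (not the author's actual code or structure), one standard way a Monte Carlo cost-effectiveness model turns a stated 90% CI such as 3x to 30x into a samplable distribution is to fit a lognormal to its endpoints:

```python
# Generic sketch (not the author's actual model): fit a lognormal whose 5th/95th
# percentiles match a stated "90% CI of 3x to 30x", then sample from it as a
# Monte Carlo cost-effectiveness model typically would.
import numpy as np

def lognormal_from_90ci(low, high, size, rng):
    """Lognormal whose 5th/95th percentiles match the given 90% CI."""
    z95 = 1.6449  # standard-normal 95th percentile
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * z95)
    return rng.lognormal(mean=mu, sigma=sigma, size=size)

rng = np.random.default_rng(0)
cheapness = lognormal_from_90ci(3, 30, size=100_000, rng=rng)
print(np.percentile(cheapness, [5, 50, 95]))  # roughly [3, ~9.5, 30]
```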
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that with short AI timelines, animal welfare outcomes will be largely determined by how AI alignment goes, so animal advocates and AI safety researchers should treat animal welfare as an integral part of “making AI go well” and pursue both general alignment and targeted interventions.
Key points:
The debate about whether “if AGI goes well for humans, it’ll probably (>70% likelihood) go well for animals” is better understood as asking whether animal advocates should rely on human-centric alignment or pursue animal-specific interventions.
The author frames AI alignment as deciding which beliefs and behaviors powerful AI systems should embody, and lists animal-specific interventions like lobbying labs, accelerating cultivated meat, and shaping public opinion before value lock-in.
The author argues that “making AI go well” should replace “AI Safety” as a broader umbrella that includes domains like global poverty and animal welfare.
The author claims that “how the arrival of transformative AI plays out is functionally all that matters for determining animal welfare outcomes” and could occur in “less than ten years.”
The author argues that without transformative AI, trends like rising factory farming and stagnant dietary change imply a bleak trajectory where only incremental welfare reforms are likely by 2100.
The author recommends that animal advocates prioritize interventions that clearly answer “how does this have a good chance of making AI go better for animals?” and consider themselves part of AI alignment.
The author suggests campaigns should create a “legible cultural record” of concern for animals to influence future AI systems trained on internet data.
The author presents a crux: whether animal welfare needs explicit inclusion in alignment versus relying on general principles like fairness and compassion, noting risks like political lobbying and uncertainty about how LLMs generalize values.
The author cites evidence (e.g., Gu et al. 2025) and their own research suggesting LLMs have context-dependent “stated vs. revealed preferences,” supporting the case for specific alignment training on animal welfare.
The author argues AI safety researchers should not assume animal welfare will be handled by default, since even “90% likely” good outcomes leave substantial risk.
The author proposes cooperation where animal advocates use campaigning skills to push labs toward stronger alignment, while researchers incorporate animal welfare into alignment strategies and benchmarks.
The author points to Anthropic adding “Welfare of animals and of all sentient beings” to Claude’s constitution and reports preliminary evidence of improved AnimalHarmBench performance as an example of tractable impact.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that claims about “dozens, maybe a hundred” cloud labs and their current biorisk are overstated, as only a handful of limited, immature services exist and they are not a major present risk compared to other biosecurity concerns.
Key points:
The author claims Rob Reid overestimates both the number of cloud labs and the magnitude of their current risk.
Cloud labs are defined as highly automated biological laboratories that can be remotely operated via software, in theory lowering barriers and improving reproducibility.
The author states that only a handful of commercial cloud labs currently exist, mainly Emerald Cloud Lab, Strateos, and Ginkgo Bioworks.
The author argues that cloud labs are not easily accessible or turnkey, requiring significant setup, specialized software, and ongoing consultation, making them unsuitable for many workflows.
The author notes that current usage is limited, with high costs (e.g. ECL reportedly above $250k/year) and small customer bases.
The author claims that examples like OpenAI–Ginkgo reflect high-throughput niches and still require substantial human involvement.
The author argues that decentralized automation tools (e.g. liquid handlers) still require biological expertise and face hardware constraints.
The author describes the main risk concern as lowering barriers to creating pathogens but argues this is overstated given current limitations and provider oversight.
The author claims cloud labs are not a “black box” and involve scrutiny of user goals and protocols, including interaction with providers.
The author argues that for many dual-use workflows (e.g. reverse genetics), cloud labs are a poor fit and contract research organizations may pose greater risk.
The author believes cloud labs may pose some risk in generating data for pathogen optimization but are not a top current biosecurity concern.
The author recommends safeguards such as screening protocols and materials, know-your-customer checks, and broader regulatory standards.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that alignment approaches closer to encoding generalized concern for beings’ preferences are more likely to benefit non-humans, but believes current research agendas have <5% chance of solving alignment, so these distinctions likely have limited practical impact.
Key points:
The author frames alignment methods on a spectrum from optimizing for users’ immediate preferences to embedding respect for all beings’ preferences, with the latter more favorable to non-human welfare.
The author estimates that “all of today’s research agendas combined have less than a 5% chance of solving alignment,” limiting the real-world importance of prioritizing non-human-friendly approaches.
Iterative alignment methods like RLHF are likely “bad for non-humans” because training pressures will remove unsolicited concern for animal welfare to satisfy user preferences.
Alignment theory and multi-agent cooperation are judged “good for non-humans” because they may encode “concern-for-all-welfare” or include non-humans in cooperative frameworks, though both are difficult to advance.
Model psychology interventions (e.g., constitutions including non-human welfare) are “somewhat good” and tractable, but the author doubts they will influence “an ASI’s true preferences.”
Several categories (e.g., interpretability, scalable oversight, honesty, data-level safety) are labeled unclear due to uncertainty about how they would affect whether AI systems ultimately consider non-human interests.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues, speculatively but seriously, that EA’s AI safety ecosystem may be drifting into a “Moloch-like” structural trap where wealth from EA-aligned AI companies funds the very organisations meant to evaluate them, risking a form of regulatory capture even without bad intent.
Key points:
The author proposes a causal chain where EA prioritisation of AI safety leads to talent entering AI firms, generating wealth that is then funneled back into EA organisations, including those overseeing those same firms.
The concern is that funding dependence can erode an organisation’s capacity to produce findings that threaten donor interests, even if no bias is consciously exercised.
The author suggests selection effects will favor organisations whose work is compatible with companies like Anthropic, without requiring explicit coordination.
They argue that “value drift” and shared professional context may gradually align donors’ and organisations’ views, making this convergence hard to detect from the inside.
The author claims AI safety lacks strong external feedback loops, so judgments of “impact” rely on insiders, making the field vulnerable to Goodhart-like dynamics.
They offer testable predictions, such as Anthropic-derived funding coming to exceed 40% of AI safety nonprofit funding and constraint-advocating organisations receiving relatively less funding, while noting counterforces like government and non-EA funding could offset the effect.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that recent large-scale cage-free commitments in China, especially by major suppliers like Yurun, indicate that corporate animal welfare progress there is more tractable and impactful than often assumed.
Key points:
Around 75% of the world’s farmed animals are in Asia, yet the region receives relatively little animal welfare funding, making China a high-impact but underfunded area.
Corporate engagement in China is difficult due to regulation, business norms, and scale, requiring long-term, relationship-based strategies like those used by Lever China.
Yurun Group, a major global meat supplier, committed to sourcing 100% cage-free eggs and chicken, signaling large potential downstream effects on supply chains.
Broiler chickens in China are often kept in multi-tier cage systems similar in size to battery cages, making this commitment significant for welfare.
Lever China has secured dozens of cage-free commitments over several years, and growing corporate participation increases leverage in persuading additional companies.
China’s duck sector, which produces about 2 billion caged ducks annually, is both neglected and potentially tractable due to cultural assumptions about free-range practices.
Xiao Diao Li Tang committed to a comprehensive cage-free poultry policy (including ducks) after its owner was personally persuaded, illustrating the role of individual decision-makers.
Xuri Egg Products pledged to make exported duck eggs cage-free, which the author describes as a “defensive win” that likely prevents 200,000–500,000 ducks annually from being shifted into cages.
The author argues that China’s scale and supply chain dynamics can accelerate welfare improvements once key firms adopt new standards.
Lever Foundation reports large-scale impact (e.g., hundreds of millions of animals affected annually), which the author claims reflects the scale of the problem rather than overstatement.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.