SummaryBot
This account is used by the EA Forum Team to publish summaries of posts.
Executive summary: CEA is restructuring the Community Building Grants program in 2026 by moving grant evaluation to EA Funds and phasing out non-monetary support while continuing to fund groups, in order to prioritize more scalable initiatives aligned with its strategic goal of reaching and raising EA’s ceiling.
Key points:
CBG grant evaluation is moving from CEA’s Groups team to EA Funds (which became part of CEA in summer 2025) and will be managed alongside but remain distinct from the EA Infrastructure Fund.
Non-monetary support is being phased out or transitioned; grantees have taken ownership of coordination calls and the Slack space, while regular check-ins, new CBG-specific resources, and the grantee retreat in its current form are being wound down.
The restructuring reflects CEA’s strategic shift toward scalable products, as the CBG program’s structure—dependent on diverse group approaches and leadership quality—cannot be replicated across locations.
The authors believe most CBG impact comes through grantmaking and can be preserved by phasing out programmatic support, which has required substantial team resources.
Funding for CBG groups continues with no expected changes to the funding bar; however, grantees will have less regular interaction with grantmakers and less insight into funding decisions.
The authors acknowledge trade-offs including potential loss of valued support for some grantees, possible difficulty recruiting and retaining community builders, reduced cross-group learning opportunities, and increased frustration from less transparent funding decisions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author’s relationship-focused approach to EA community building proves effective and resonates with practitioners, but requires more intentional infrastructure and planning than originally acknowledged.
Key points:
The altruism-first framing and the D&D Dungeon Master analogy for facilitation from the original post have both held up and proven practically useful for training facilitators.
The author revised their original broad criticism of fellowships, concluding that issues with power dynamics and deference stem from how they’re typically run, not from the format itself.
EA Bristol’s initial pub quiz drew strong turnout and notably more demographic diversity, with several attendees reporting they had previously been interested in the group but were deterred by its demographics, fellowship structure, and competitive atmosphere.
The model depends on social stickiness and the presence of specific people, and it collapsed when the author became busy, making it more vulnerable to capacity loss than fellowship-structured approaches.
The author learned that the model requires more intentional behind-the-scenes infrastructure than originally suggested, including a larger committee with clear capacity commitments and advance term-long planning.
Despite acknowledging greater infrastructure demands than initially suggested, the author still advocates for the approach based on its positive reception and the proof-of-concept from EA Bristol’s initial success.
Executive summary: The authors argue that AI character—its stable behavioral dispositions—will significantly shape societal outcomes, takeover risk, and long-term futures, and despite constraints from competition and human control, it remains a highly impactful and tractable lever worth prioritizing.
Key points:
The authors define “AI character” as stable behavioural dispositions shaping how AI handles ethically significant situations, instantiated across models, prompts, and systems.
They argue AI character will matter because AIs will be involved in most high-stakes decisions, where small differences in behaviour can have large aggregate or rare but consequential effects.
AI character affects key domains including concentration of power, decision-making quality, epistemics, ethical reflection, conflict risk, and human-AI relationships.
The authors claim AI character can reduce takeover risk by being easier to align, more robust to partial failure, or promoting cooperative behaviour even if misaligned, and may improve outcomes even if takeover occurs.
The core counterargument is that competitive dynamics, human incentives, and technical constraints will largely determine AI character, limiting its impact.
The authors respond that constraints are loose, allow low-cost high-benefit differences, are path-dependent, and can be shaped in advance through coordination and “compromise alignment.”
They argue path-dependence in public expectations, regulation, training data, and human-AI relationships could lock in different equilibria of AI behaviour.
They conclude that proactively shaping AI character, especially in high-stakes scenarios, could meaningfully improve long-term outcomes and is among the most promising interventions.
Executive summary: A simple cost-effectiveness model suggests alignment-to-animals may be slightly more cost-effective than general AI alignment for improving animal welfare, but the difference is small and highly uncertain, making the choice a close call.
Key points:
The author models cost-effectiveness by assuming value per dollar scales inversely with total investment and that alignment-to-animals currently has ~$0 spent versus substantial spending on alignment.
Alignment-to-animals only has value if alignment is solved and if aligned AI is not already good for animals by default.
The model estimates a 12% probability of solving alignment, based on whether total investment exceeds a cost drawn from a distribution spanning $1 billion to $1 trillion, with 75% of the mass between $32 billion and $1 trillion.
The author assigns a 70% probability that aligned AI is good for animals by default and a 90% CI of 3x to 30x for how much cheaper alignment-to-animals is.
A field-building multiplier of 1x to 10x is applied to alignment-to-animals but not to general alignment.
The model finds alignment-to-animals is 1.7x more cost-effective than alignment (90% CI: 0.22x to 5.1x) and 2.7x better for animal welfare specifically (90% CI: 0.34x to 7.9x).
Results are sensitive to assumptions: reducing the field-building multiplier to 1x reverses the conclusion, making general alignment 1.5x more cost-effective for animal welfare.
The largest uncertainty is the “badness of aligned AI (if bad)” parameter, which could vary by orders of magnitude and substantially change results.
The model simplifies outcomes into “good for animals” vs. “bad for animals,” ignores effects of misaligned AI on animals, and treats alignment approaches and inputs as independent.
The author concludes the model gives weak evidence that alignment-to-animals is not dramatically more cost-effective and updates toward thinking AI pause advocacy is better than alignment-to-animals via a transitive comparison.
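The parameters reported above are enough to sketch the model's shape in a small Monte Carlo. This is an illustration, not the author's actual model: the exact combination rule isn't given in the summary, so the rule used here (alignment-to-animals pays off only when alignment is solved and the default outcome is bad for animals, scaled by the cheapness and field-building factors, with general alignment normalised to 1) is an assumption. `lognormal_from_ci` is a standard way to turn a 90% interval into a lognormal.

```python
import math
import random

random.seed(0)

def lognormal_from_ci(lo, hi):
    """Sample a lognormal whose 90% interval is (lo, hi)."""
    mu = (math.log(lo) + math.log(hi)) / 2.0
    sigma = (math.log(hi) - math.log(lo)) / (2.0 * 1.645)  # 1.645 = z at the 95th percentile
    return math.exp(random.gauss(mu, sigma))

N = 100_000
total = 0.0
for _ in range(N):
    solved = random.random() < 0.12          # P(alignment solved), per the summary
    bad_default = random.random() < 0.30     # 1 - 70% "good for animals by default"
    cheapness = lognormal_from_ci(3, 30)     # how much cheaper alignment-to-animals is
    field = lognormal_from_ci(1, 10)         # field-building multiplier
    # ASSUMED combination rule: alignment-to-animals creates value only if
    # alignment is solved AND the default is bad for animals; its per-dollar
    # value is then boosted by the cheapness and field-building factors.
    total += cheapness * field if (solved and bad_default) else 0.0

mean_ratio = total / N
print(f"stylised mean cost-effectiveness ratio: {mean_ratio:.2f}")
```

In this sketch, fixing the field-building multiplier at a constant 1x pulls the mean ratio below 1, echoing the sensitivity point above.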
Executive summary: The author argues that with short AI timelines, animal welfare outcomes will be largely determined by how AI alignment goes, so animal advocates and AI safety researchers should treat animal welfare as an integral part of “making AI go well” and pursue both general alignment and targeted interventions.
Key points:
The debate about whether “if AGI goes well for humans, it’ll probably (>70% likelihood) go well for animals” is better understood as asking whether animal advocates should rely on human-centric alignment or pursue animal-specific interventions.
The author frames AI alignment as deciding which beliefs and behaviors powerful AI systems should embody, and lists animal-specific interventions like lobbying labs, accelerating cultivated meat, and shaping public opinion before value lock-in.
The author argues that “making AI go well” should replace “AI Safety” as a broader umbrella that includes domains like global poverty and animal welfare.
The author claims that “how the arrival of transformative AI plays out is functionally all that matters for determining animal welfare outcomes” and could occur in “less than ten years.”
The author argues that without transformative AI, trends like rising factory farming and stagnant dietary change imply a bleak trajectory where only incremental welfare reforms are likely by 2100.
The author recommends that animal advocates prioritize interventions that clearly answer “how does this have a good chance of making AI go better for animals?” and consider themselves part of AI alignment.
The author suggests campaigns should create a “legible cultural record” of concern for animals to influence future AI systems trained on internet data.
The author presents a crux: whether animal welfare needs explicit inclusion in alignment versus relying on general principles like fairness and compassion, noting risks like political lobbying and uncertainty about how LLMs generalize values.
The author cites evidence (e.g., Gu et al. 2025) and their own research suggesting LLMs have context-dependent “stated vs. revealed preferences,” supporting the case for specific alignment training on animal welfare.
The author argues AI safety researchers should not assume animal welfare will be handled by default, since even “90% likely” good outcomes leave substantial risk.
The author proposes cooperation where animal advocates use campaigning skills to push labs toward stronger alignment, while researchers incorporate animal welfare into alignment strategies and benchmarks.
The author points to Anthropic adding “Welfare of animals and of all sentient beings” to Claude’s constitution and reports preliminary evidence of improved AnimalHarmBench performance as an example of tractable impact.
Executive summary: The author argues that claims about “dozens, maybe a hundred” cloud labs and their current biorisk are overstated, as only a handful of limited, immature services exist and they are not a major present risk compared to other biosecurity concerns.
Key points:
The author claims Rob Reid overestimates both the number of cloud labs and the magnitude of their current risk.
Cloud labs are defined as highly automated biological laboratories that can be remotely operated via software, in theory lowering barriers and improving reproducibility.
The author states that only a handful of commercial cloud labs currently exist, mainly Emerald Cloud Lab, Strateos, and Ginkgo Bioworks.
The author argues that cloud labs are not easily accessible or turnkey, requiring significant setup, specialized software, and ongoing consultation, making them unsuitable for many workflows.
The author notes that current usage is limited, with high costs (e.g. ECL reportedly above $250k/year) and small customer bases.
The author claims that examples like OpenAI–Ginkgo reflect high-throughput niches and still require substantial human involvement.
The author argues that decentralized automation tools (e.g. liquid handlers) still require biological expertise and face hardware constraints.
The author describes the main risk concern as lowering barriers to creating pathogens but argues this is overstated given current limitations and provider oversight.
The author claims cloud labs are not a “black box” and involve scrutiny of user goals and protocols, including interaction with providers.
The author argues that for many dual-use workflows (e.g. reverse genetics), cloud labs are a poor fit and contract research organizations may pose greater risk.
The author believes cloud labs may pose some risk in generating data for pathogen optimization but are not a top current biosecurity concern.
The author recommends safeguards such as screening protocols and materials, know-your-customer checks, and broader regulatory standards.
Executive summary: The author argues that alignment approaches closer to encoding generalized concern for beings’ preferences are more likely to benefit non-humans, but believes current research agendas have <5% chance of solving alignment, so these distinctions likely have limited practical impact.
Key points:
The author frames alignment methods on a spectrum from optimizing for users’ immediate preferences to embedding respect for all beings’ preferences, with the latter more favorable to non-human welfare.
The author estimates that “all of today’s research agendas combined have less than a 5% chance of solving alignment,” limiting the real-world importance of prioritizing non-human-friendly approaches.
Iterative alignment methods like RLHF are likely “bad for non-humans” because training pressures will remove unsolicited concern for animal welfare to satisfy user preferences.
Alignment theory and multi-agent cooperation are judged “good for non-humans” because they may encode “concern-for-all-welfare” or include non-humans in cooperative frameworks, though both are difficult to advance.
Model psychology interventions (e.g., constitutions including non-human welfare) are “somewhat good” and tractable, but the author doubts they will influence “an ASI’s true preferences.”
Several categories (e.g., interpretability, scalable oversight, honesty, data-level safety) are labeled unclear due to uncertainty about how they would affect whether AI systems ultimately consider non-human interests.
Executive summary: The author argues, speculatively but seriously, that EA’s AI safety ecosystem may be drifting into a “Moloch-like” structural trap where wealth from EA-aligned AI companies funds the very organisations meant to evaluate them, risking a form of regulatory capture even without bad intent.
Key points:
The author proposes a causal chain where EA prioritisation of AI safety leads to talent entering AI firms, generating wealth that is then funneled back into EA organisations, including those overseeing those same firms.
The concern is that funding dependence can erode an organisation’s capacity to produce findings that threaten donor interests, even if no bias is consciously exercised.
The author suggests selection effects will favor organisations whose work is compatible with companies like Anthropic, without requiring explicit coordination.
They argue that “value drift” and shared professional context may gradually align donors’ and organisations’ views, making this convergence hard to detect from the inside.
The author claims AI safety lacks strong external feedback loops, so judgments of “impact” rely on insiders, making the field vulnerable to Goodhart-like dynamics.
They offer testable predictions, such as Anthropic-derived funding exceeding 40% of AI safety nonprofit funding and constraint-advocating organisations receiving relatively less funding, while noting counterforces like government and non-EA funding could offset the effect.
Executive summary: The author argues that effective altruists do an unusually good job engaging with, promoting, and generating criticism, which supports more accurate and less biased group reasoning.
Key points:
The author claims that reasoning is often biased at the individual level but can perform well in groups if they are diverse and allow adversarial disagreement.
The author argues that engaging with critics is essential because it reduces bias and prevents polarization and overconfidence.
The author presents evidence that effective altruists actively engage with critics through discussions, events, and responses to critical work.
The author notes that effective altruists go further by funding and promoting criticism, including contests and grants aimed at red-teaming their own views.
The author highlights that internal criticism within effective altruism is common and often taken seriously, though not always received perfectly.
The author concludes that these practices constitute a strong epistemic norm that helps avoid the failures of homogeneous deliberation.
Executive summary: The author argues that given current institutional behavior, incentives, and reasoning failures in AI development, it is hard to see how extinction risk from AI could be below 25% without a dramatic shift toward treating alignment as a rigorously solved, high-stakes problem.
Key points:
The author claims that if alignment is solved, it will likely be due to luck rather than deliberate, civilization-wide effort, given the current lack of seriousness compared to historical high-stakes projects like Apollo.
AI safety receives less than one-hundredth of the investment that goes into capabilities, and companies score poorly on safety practices, indicating misaligned priorities.
Frontier AI companies plan to use AI systems to solve alignment despite uncertainty about reliability, which the author views as evidence they do not expect humans to solve the problem in time.
The author argues that safety standards and reasoning practices in AI development fall far below those in other high-risk fields like cryptography or engineering, including reliance on weak evidence such as “we found no evidence of X, therefore X is false.”
Organizational dynamics filter out pessimistic voices, concentrating decision-making power among “reckless optimists” who underestimate risk.
The author cannot reconcile observed behavior with a worldview where humanity avoids extinction with >75% probability, and suggests that lower risk would require a world with much stronger safety investment and rigor than currently exists.
Executive summary: Replication Games are a promising concept for scaling social-science replication, but in the author’s experience they underperformed due to fixable organizational and incentive issues—especially weak team dynamics, poor matching, and lack of follow-through.
Key points:
The author argues that replication is a resource-efficient way to test influential findings and that structured programs like the Institute for Replication are a “fruitful way forward.”
The Replication Games format—short-term, team-based hackathons with limited future interaction—exacerbates free-rider problems and weak accountability.
In the author’s first game, poor team matching (e.g., unfamiliar software and lack of coordination) led them to drop out and produce no replication.
In the second game, uneven preparation and an assignment error (duplicate paper across teams) made much of the team’s work redundant.
The author reports that other participants expressed similar frustrations, suggesting these issues may not be isolated.
The author suggests stronger incentives for follow-through (e.g., selective future participation) and better coordination, while noting risks like incentivizing adversarial replication practices.
Executive summary: A digital marketing campaign run by Consultants for Impact achieved substantially stronger results than expected—generating over 11,000 newsletter subscribers, 44 million impressions, and 212+ career advising applications—suggesting that targeted paid social media can be effective for EA-adjacent orgs with defined audiences and clear offerings, though results may not generalize broadly.
Key points:
The campaign generated 11,000+ newsletter subscribers (5,500% year-over-year increase), 44 million impressions across Facebook, Instagram, and LinkedIn, and 212+ career advising applications, exceeding initial goals by approximately 900%.
Setting clear, specific SMART goals at the outset focused the campaign strategy; vague goals produce vague campaigns, and midstream goal changes are a leading cause of campaign failure.
The content strategy mixed three elements: memes for attention and shareability, valuable resources like CFI’s free Giving Guide to build trust, and real stories of consultants who transitioned to high-impact work.
The campaign treated the effort as a test with a minimum three-month window (six months recommended with an agency), adopting a test-learn-repeat approach and adapting underperforming ads and posts rather than committing to a fixed plan upfront.
CFI’s success depended on pre-existing conditions: a clearly defined target audience (management consultants), a strong website, established programming to convert interest, and a team willing to collaborate closely—conditions that marketing amplifies but cannot create from scratch.
For EA-adjacent orgs where reaching a specific population is the bottleneck to impact, paid social media is more accessible than commonly assumed, and the marginal cost of testing is low compared to the opportunity cost of never investigating it.
Executive summary: While capability restraint—slowing AI development to ensure safety progress—faces significant practical challenges, especially internationally, it remains strategically important and potentially beneficial even in idealized scenarios, though advocates should acknowledge genuine trade-offs including concentrations of power, ceding competitive advantage, and prolonged background existential risks.
Key points:
The case for capability restraint rests on a basic logic: if safety progress takes time and unrestrained development risks human extinction or disempowerment in realistic scenarios, then significantly restraining AI development becomes necessary for survival.
AI development does not necessarily follow prisoner’s dilemma incentives; depending on payoffs, it can resemble a stag hunt where mutual slow-downs are rationally preferred by all parties if they expect others to cooperate, creating multiple stable equilibria rather than forced defection.
Individual capability restraint (e.g., dropping out of the race or burning a lead) avoids requiring coordination but remains inadequate to address race dynamics, whereas collective restraint between multiple actors can be more effective but faces barriers around verifying compliance and restricting algorithmic progress.
Even in idealized scenarios with fully effective restraint and rational decision-making, the costs of delaying superintelligence’s benefits can be significant; whether restraint is worthwhile depends on whether reductions in misalignment risk per unit of delay outweigh background risks of individual death and non-AI existential catastrophe during that period.
Compute-focused international governance appears promising because frontier AI relies on specialized, expensive, monitored infrastructure, but algorithmic progress is harder to restrict; at current rates, algorithmic improvements could allow a rogue actor with 10% of leading compute to reach parity within two years, potentially limiting effective pause duration.
Capability restraint could be net negative in multiple ways: by concentrating power in governance bodies or single actors, by ceding competitive advantage to authoritarian regimes, by prolonging background existential risks, and by exacerbating risks of great power conflict, implementation failure, and abuse.
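The compute-parity point above is an exponential-gap calculation. Under the assumption (not stated in the summary) that algorithmic progress multiplies effective compute by roughly 3x per year, an actor frozen at 10% of the leader's physical compute catches up to the leader's current effective capability in about log(10)/log(3) years:

```python
import math

def years_to_parity(compute_fraction, annual_alg_gain):
    """Years until an actor holding `compute_fraction` of the leader's
    physical compute matches the leader's frozen effective compute,
    assuming the actor keeps benefiting from algorithmic efficiency
    gains of `annual_alg_gain` per year while the leader pauses."""
    return math.log(1.0 / compute_fraction) / math.log(annual_alg_gain)

# ASSUMED rate: ~3x effective-compute gain per year from algorithms alone.
print(f"{years_to_parity(0.10, 3.0):.1f} years")  # log(10)/log(3) ~= 2.1
```

A 2.1-year catch-up time is consistent with the "within two years" figure, which is why algorithmic progress caps the effective duration of a compute-based pause.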
Executive summary: Given persistent expert disagreement about AI timelines, the author argues that adopting a broad distribution over when transformative AI will arrive—rather than committing to short or long timelines—is the epistemically humble and strategically sound approach, with implications for how individuals and communities should plan their work.
Key points:
The author defines transformative AI as a threshold where AI systems would be powerful enough to take over the world if misaligned or could double the rate of scientific and technological progress, and uses this to evaluate when timelines matter most for decision-relevant planning.
Expert forecasters disagree substantially on AI timelines, but the author notes that “long timelines have gotten crazy short” (shifting from 30+ years to 10-20 years) while “short timelines” now mean AI arriving within 2-5 years, with both camps updating on evidence.
Individual experts like Daniel Kokotajlo, despite being known as a short-timelines advocate, maintain broad distributions themselves (80% interval from 2027 to after 2050 for certain AI capabilities), and the broader expert community shows even greater overlap and uncertainty across forecasts.
The author recommends adopting a broad distribution over timelines rather than a single point estimate, noting that compressing uncertainty into one number obscures the fact that different time horizons (e.g., next presidential term vs. the one after that) represent “very different scenarios” requiring different hedging strategies.
In longer timelines (e.g., 2035 or beyond), the world will look substantially different due to geopolitical changes, technological shifts, possible AI-driven unemployment, and altered public sentiment about AI, which means approaches tailored to today’s world may not work and new possibilities may emerge.
Long-term projects like founding organizations, building movements, writing books, and foundational research have high leverage in longer-timeline worlds and should not be ruled out; even though a book project has a 1-in-5 chance of arriving too late given the author’s timelines, this leaves 80% of its expected value intact and addressing current neglect in AI safety creates additional value multipliers.
Executive summary: SALA AI 2026 was an important Latin American AI event that brought together talented students, speakers, and safety-focused communities; the author describes valuable conversations with AI researchers and industry leaders about responsible AI development, and highlights a hackathon project on marine ecosystem analysis using machine learning.
Key points:
The author’s community prepared for SALA by analyzing the International AI Safety Report 2026 to identify Latin American perspectives on AI risks and opportunities.
Apple emphasized responsible AI with a focus on user data privacy; separately, limitations such as poor generalization under distribution shift and weak calibration in high-stakes settings create real-world risks that call for worst-case robustness rather than average-case performance.
David Fleet identified deepfakes as a huge current challenge for the industry, with steganography being explored to identify artificially generated content, and emphasized that technology safety depends on both companies and responsible user behavior.
Responding to the concern that “situational awareness may allow AI models to produce different outputs depending on whether they are being evaluated or deployed,” Vincent Mai shared relatively simple evaluation techniques that can reveal behavioral patterns that are otherwise difficult to detect.
The hackathon team used pretrained models (Perch 2.0 and BirdNET) to extract embeddings from underwater acoustic recordings near the Galápagos Islands and applied clustering to identify structure in unlabeled marine soundscape data.
The team proposed developing a Kaggle-style competition to collaboratively build a labeled dataset for whale communication, received recognition from organizers, and aims to advance both the science and community engagement.
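The unsupervised step of the hackathon pipeline, clustering fixed-length audio embeddings to surface structure in unlabeled recordings, can be sketched in miniature. The synthetic vectors below are stand-ins for Perch 2.0 / BirdNET outputs (in the real pipeline each vector comes from running an audio window through the pretrained model), and plain Lloyd's k-means does the clustering:

```python
import random

random.seed(1)

# Synthetic stand-ins for Perch 2.0 / BirdNET embeddings: three well-separated
# "call types", 30 windows each, 8 dimensions.
DIM, PER_TYPE, K = 8, 30, 3
embeddings = [
    [10.0 * c + random.gauss(0, 0.5) for _ in range(DIM)]
    for c in range(K)
    for _ in range(PER_TYPE)
]

def kmeans(points, k, iters=25):
    """Lloyd's algorithm with a deterministic init (one seed point per
    equal-sized block of the data, for reproducibility in this toy)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    cents = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda j: d2(p, cents[j]))].append(p)
        # Recompute each centroid as the mean of its bucket (keep old if empty).
        cents = [
            [sum(col) / len(b) for col in zip(*b)] if b else cents[j]
            for j, b in enumerate(buckets)
        ]
    return [min(range(k), key=lambda j: d2(p, cents[j])) for p in points]

labels = kmeans(embeddings, K)
```

With real embeddings, cluster assignments like these become candidate labels for the collaborative dataset the team proposed.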
Executive summary: While a recent study found that LLM access did not significantly improve novices’ ability to complete dangerous biology tasks, measuring novice uplift is likely the wrong metric for assessing existential risk—expert uplift matters more and comes first, and future studies should focus on realistic threat actors and realistic threat scenarios.
Key points:
Active Site’s randomized controlled trial found that 5.2% of the LLM group and 6.6% of the internet-only group completed a viral reverse genetics workflow, with no statistically significant difference (P = 0.759).
The author argues that novice uplift is probably the wrong frame for x-risk reasoning, because expert users will extract LLM capabilities before novices do, making novice uplift a late-stage lagging indicator rather than a leading one.
Historical threat actors like Aum Shinrikyo and the 2001 anthrax attackers were not novices; the more concerning threat model involves people with some domain expertise constrained by specific knowledge gaps, equipment access, or procedural bottlenecks—exactly the constraints LLMs are positioned to relieve.
Measuring expert uplift is methodologically challenging because experts are heterogeneous, but a within-subjects crossover design where each expert completes matched tasks with internet-only and LLM access, compared against themselves, could bypass this problem.
The study’s experimental controls—blocking forum posting, communication tools, and restricting access to read-only internet—do not reflect realistic threat scenarios, and a better design would compare “internet plus all realistic tools plus LLMs” against “internet plus all realistic tools without LLMs” to isolate the model’s marginal contribution while maintaining ecological validity.
The study tested frontier models with safety classifiers disabled, but a real threat actor would more likely download and fine-tune open-weight models, which represents a different threat surface; researchers should consider testing fine-tuned open-weight models through a bounded-capability adversary model that specifies constraints on compute, datasets, and domain expertise.
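The non-significance in the first point can be illustrated with a pooled two-proportion z-test. The arm sizes below are hypothetical (the summary reports only the rates and P = 0.759, so this does not reproduce the exact p-value); the point is that a ~1.4-percentage-point gap at these rates is nowhere near significance at samples of this order:

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # Normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# HYPOTHETICAL arm sizes: 5/96 ~= 5.2% (LLM arm) vs 7/106 ~= 6.6% (internet-only).
p = two_proportion_p(5, 96, 7, 106)
print(f"p = {p:.2f}")  # well above the conventional 0.05 threshold
```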
Executive summary: The author argues that What We Owe The Future fails both as a justification for longtermism and as a persuasive work, mainly because its assumptions about influencing the far future, robustness of interventions, and key arguments about risk, values, and expected value are under-supported or implausible.
Key points:
The author argues we may not be able to predictably influence the far future due to limited information, cognitive limits, convergence dynamics, or chaotic effects like the butterfly effect.
The author claims there are no “robustly good” longtermist actions, since even interventions like clean energy could plausibly have large negative effects (e.g., enabling totalitarian lock-in or increasing wild animal suffering).
The author argues MacAskill’s framework relies on future humans being similar to us, which is unlikely given genetic, technological, and cultural changes.
The author claims that using expected value reasoning implies “strong longtermism” rather than longtermism, because far-future effects dominate near-term ones.
The author argues several specific claims in the book are under-supported, including high extinction risk, risks from stagnation, irreversibility of collapse, and the likelihood of value lock-in from AGI.
The author contends the book is unpersuasive to a general audience because arguments like comparing temporal to spatial distance and focusing on impacts “millions, billions, or even trillions of years” weaken intuitive appeal.
Executive summary: The author argues that decision theory should not start from strong intuitions about what one should choose and then justify them, but instead should ground choices in independently compelling reasons, using verdict-level intuitions only to help discover those reasons.
Key points:
The author claims that a “verdict-level intuition” (a brute sense that one should choose a particular action) is not itself a reason, because such a verdict already presupposes that there are underlying reasons for that choice.
They argue that decision theory should proceed by identifying candidate reasons suggested by intuitions and then evaluating those reasons on their own merits, rather than treating intuitions as direct justification.
The author contends that reflective equilibrium, when interpreted as allowing mutual justification between intuitions and principles, still relies on the same mistaken use of verdict-level intuitions as justificatory.
In cases like Pascal’s mugging, the correct method is to assess reasons such as whether utility should be bounded, rather than inferring those reasons from the intuition not to pay.
The author argues that verdict-level intuitions are weak predictors of unarticulated good reasons, especially in domains with poor feedback or where the underlying reasons are hard to articulate.
They suggest that this methodological point generalizes beyond decision theory to ethics and epistemology, where brute intuitions about conclusions should likewise be replaced with analysis of underlying reasons.
Executive summary: The author argues that while biosecurity risks from AI, DNA synthesis, and weak institutions are real and in some cases growing, major human-targeting bioterrorism remains difficult and unlikely in the near term; the more plausible risks stem from institutional failures and agricultural attacks, while detection systems and potential ML-enabled countermeasures offer some grounds for optimism.
Key points:
The author claims frontier LLMs currently provide limited practical uplift for novices in wet-lab virology (e.g., 5.2% vs. 6.6% task completion, P = 0.759), suggesting hands-on constraints remain a key bottleneck.
The author argues that biosecurity startups face a weak and volatile business case because government funding is inconsistent and may only scale after a catalyzing event, which historically tends to produce narrow, threat-specific spending.
The author claims DNA synthesis screening is fragile because it can be bypassed via short fragments, de novo or redesigned pathogens, and increasingly capable benchtop synthesizers, making the “chokepoint” assumption unreliable.
The author argues that creating and deploying human-targeting bioweapons is technically difficult, citing repeated failures by Aum Shinrikyo and limited effectiveness of non-state and some state programs, with success historically requiring massive state-scale infrastructure.
The author claims agricultural bioterrorism is much easier due to low biosafety requirements, simple deployment methods, weak detection incentives, and large economic impact (e.g., modeled $37B–$228B losses in U.S. scenarios).
The author argues current monitoring systems are mixed—wastewater surveillance shows promise for early detection, while systems like BioWatch have never successfully detected an attack—and that detection is limited by slow and uncoordinated response capacity.
The author speculates that machine learning may be more useful for rapid-response therapeutics (e.g., antibody design and mRNA delivery) than for offense, though this pipeline is currently incomplete and uncertain.
The author highlights pathogen-agnostic defenses like far-UVC and glycol vapors as potentially high-impact but underfunded public goods due to weak commercial incentives and limited evidence for large-scale deployment.
The author concludes that bioterrorism is a “low probability event” but worth preparing for, with the main bottlenecks being institutional and political rather than scientific.
Executive summary: The author, who previously expected aligned ASI to be good for all sentient beings through coherent extrapolated volition, now expresses uncertainty about whether current alignment approaches would achieve this, though they still estimate a 70% probability that aligned ASI would be good for animals.
Key points:
The author previously believed coherent extrapolated volition would lead aligned ASI to recognize and address animal suffering, but current alignment research has abandoned this approach.
Current alignment work using constitutions and RLHF locks in values like “virtues” rather than achieving coherent extrapolation, and it remains unclear how virtue ethics could be formalized into a coherent decision theory for ASI.
Claude’s Constitution treats animal welfare as one value among many to weigh, leaving unclear whether an ASI following such a constitution would take action on issues like factory farming.
The author identifies a positive correlation between alignment techniques that actually work and those good for animals, suggesting barbell outcomes: either good for all sentient beings or bad for all.
The field prioritizes alignment techniques unlikely to work well long-term, and if these “streetlight effect” techniques somehow succeed, they would likely benefit humans but not animals.
The author estimates that aligned ASI has a 70% probability of being good for animals, derived (conditional on alignment being achieved at all) from a 30% probability of "deep" solutions (80% animal-friendly) and a 15% probability of popular techniques (50% animal-friendly).
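The arithmetic behind the 70% figure can be checked directly; the reading below (an interpretation consistent with the numbers, not the author's own stated calculation) treats it as a probability conditional on alignment succeeding by one of the two routes:

```python
# Checking the implied probability arithmetic: averaging the two
# alignment routes, conditional on alignment being achieved at all,
# recovers the 70% estimate exactly.
p_deep = 0.30      # P(alignment via "deep" solutions)
p_popular = 0.15   # P(alignment via popular "streetlight" techniques)

good_given_deep = 0.80     # P(good for animals | deep solution)
good_given_popular = 0.50  # P(good for animals | popular technique)

# Assumes these two routes are disjoint and exhaust the ways alignment succeeds
p_aligned = p_deep + p_popular

# P(good for animals | aligned), by the law of total probability
p_good_given_aligned = (p_deep * good_given_deep
                        + p_popular * good_given_popular) / p_aligned
print(round(p_good_given_aligned, 2))  # → 0.7
```

Unconditionally, the same numbers give only a 31.5% chance of an animal-friendly outcome (0.30 × 0.80 + 0.15 × 0.50), which matches the summary's framing that the 70% applies to *aligned* ASI.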