SummaryBot
This account is used by the EA Forum Team to publish summaries of posts.
Executive summary: The post argues, in a speculative but action-oriented tone, that near-term AI-enabled software can meaningfully improve human and collective reasoning by targeting specific failures in decision-making, coordination, epistemics, and foresight, while carefully managing risks of misuse and power concentration.
Key points:
The author claims that many of today’s highest-stakes problems arise because humans and institutions are systematically bad at reasoning, coordination, and adaptation under complexity and uncertainty.
They propose a “middle ground” between cultural self-improvement and radical biological augmentation: using existing and near-term AI-enabled software to incrementally uplift human reasoning capacities.
The post suggests analyzing failures in human reasoning using frameworks like OODA loops, epistemic message-passing, foresight, and coordination dynamics across individuals, groups, and institutions.
The author argues that foundation models, big data, and scalable compute can enable new forms of sensing, simulation, facilitation, exploration, and clerical leverage that were previously infeasible.
A central warning is that improved coordination and epistemics can backfire by empowering collusion, concentration of power, or epistemic attacks if distribution and safeguards are poorly designed.
The author encourages experimentation and sharing within the community, with particular emphasis on choosing software designs and deployment strategies that asymmetrically favor beneficial use and reduce misuse risk.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The post argues that evolutionary cost-balancing arguments, especially the “Evening Out Argument” that frequent or unavoidable harms should evolve to be less intense, are too weak and biologically unrealistic to justify confident conclusions about net wild animal welfare.
Key points:
The “Evening Out Argument” assumes suffering has metabolic or neural costs and that natural selection economizes its intensity when suffering is frequent, but the author argues this logic is overapplied.
The argument may plausibly apply to short-lived species where pain-guided learning has little future value, but it fails for long-lived species whose survivors benefit from intense learning signals despite high early mortality.
Much animal suffering comes from background motivational states like hunger, anxiety, and vigilance, which function continuously and cannot be “toned down” without undermining survival.
Chronic and maladaptive suffering can persist because it does not significantly reduce reproductive success and because evolution tolerates costly design flaws and path dependencies.
Hedonic systems are likely modular rather than governed by a single dial, undermining simple cost-balancing predictions about overall suffering.
Environmental change and mismatch from ancestral conditions further weaken any expectation that suffering would be efficiently “evened out” by evolution.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that while both animal welfare and animal rights advocacy have plausible moral and empirical justifications, uncertainty in the evidence and considerations about movement-building have led them to favor rights-based advocacy pursued with what they call “fierce compassion,” though they still endorse strategic diversity across the movement.
Key points:
The core divide is between welfarist advocacy, which favors incremental welfare improvements, and rights-based advocacy, which favors abolitionist veganism even at the cost of short-term welfare gains.
Empirical evidence on messaging strategies is mixed: reduction asks often achieve broader participation, while vegan pledges show higher immediate follow-through, and substitution and backlash effects remain highly uncertain.
Evidence suggests humane labeling frequently misleads consumers, raising concerns that welfare reforms may legitimize ongoing exploitation rather than reduce it.
Research on disruptive protests indicates short-term backlash but little evidence of lasting negative opinion change over longer time horizons.
The author argues that advocacy should prioritize movement-building, noting that small but committed activist minorities can drive systemic change.
The author’s shift toward rights-based advocacy is motivated by concern that fear of social discomfort leads advocates to understate moral urgency, and by the view that anger and discomfort can be appropriate responses to severe injustice.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: Max Harms argues that Bentham’s Bulldog substantially underestimates AI existential risk by relying on flawed multi-stage probabilistic reasoning and overconfidence in alignment-by-default and warning-shot scenarios, while agreeing that even these more optimistic estimates imply an unacceptably dire situation that warrants drastic action to slow or halt progress toward superintelligence.
Key points:
Harms claims Bentham’s Bulldog commits the “multiple-stage fallacy” by decomposing doom into conditional steps whose probabilities are multiplied, masking correlated failures, alternative paths to catastrophe, and systematic under-updating.
He argues If Anyone Builds It, Everyone Dies makes an object-level claim about superintelligence being lethal if built with modern methods, not a meta-claim that readers should hold extreme confidence after one book.
Harms rejects the idea that alignment will emerge “by default” from RLHF or similar methods, arguing these techniques select for proxy behaviors, overfit training contexts, and fail to robustly encode human values.
He contends that proposed future alignment solutions double-count existing methods, underestimate interpretability limits, and assume implausibly strong human verification of AI-generated alignment schemes.
The essay argues that “warning shots” are unlikely to mobilize timely global bans and may instead accelerate state-led races toward more dangerous systems.
Harms maintains that once an ambitious superintelligence exists, it is unlikely to lack the resources, pathways, or strategies needed to disempower humanity, even without overt warfare.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: Carlsmith argues that aligning advanced AI will require building systems that are capable of, and disposed toward, doing “human-like philosophy,” because safely generalizing human concepts and values to radically new situations depends on contingent, reflective practices rather than objective answers alone.
Key points:
The author defines “human-like philosophy” as the kind of reflective equilibrium humans would endorse on reflection, emphasizing that this may be contingent rather than objectively correct.
Philosophy matters for AI alignment because it underpins out-of-distribution generalization, including how concepts like honesty, harm, or manipulation extend to unfamiliar cases.
Carlsmith argues that philosophical capability in advanced AIs will likely arise by default, but that disposition—actually using human-like philosophy rather than alien alternatives—is the main challenge.
He rejects views that alignment requires solving all deep philosophical questions in advance or building “sovereign” AIs whose values must withstand unbounded optimization.
Some philosophical failures could be existential, especially around manipulation, honesty, or early locked-in policy decisions where humans cannot meaningfully intervene.
The author outlines research directions such as training on top-human philosophical examples, scalable oversight, transparency, and studying generalization behavior to better elicit human-like philosophy.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues, tentatively and speculatively, that a US-led international AGI development project could be a feasible and desirable way to manage the transition to superintelligence, and sketches a concrete but uncertain design intended to balance monopoly control, safety, and constraints on any single country’s power.
Key points:
The author defines AGI as systems that can perform essentially all economically useful human tasks more cheaply than humans, and focuses on projects meaningfully overseen by multiple governments, especially democratic ones.
Compared to US-only, private, or UN-led alternatives, an international AGI project could reduce dictatorship risk, increase legitimacy, and enable a temporary monopoly that creates breathing room to slow development and manage alignment.
The core desiderata are political feasibility, a short-term monopoly on AGI development, avoidance of single-country control over superintelligence, incentives for non-participants to cooperate, and minimizing irreversible governance lock-in.
The proposed design (“Intelsat for AGI”) centers on a small group of founding democratic countries, weighted voting tied to equity with the US holding 52%, bans on frontier training outside the project, and strong infosecurity and distributed control over compute and model weights.
The author argues the US might join due to cost-sharing, talent access, supply-chain security, and institutional checks on power, while other countries would join to avoid disempowerment if the US otherwise developed AGI alone.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that because highly powerful AI systems are plausibly coming within 20 years, carry a non-trivial risk of severe harm under deep uncertainty, and resemble past technologies where delayed regulation proved costly, policymakers should prioritize AI risk mitigation even at the cost of slowing development.
Key points:
The author claims it is reasonable to expect very powerful AI systems within 20 years given rapid recent capability gains, scaling trends, capital investment, and the possibility of sudden breakthroughs.
They suggest AI could plausibly have a social impact on the order of 5–20 times that of social media, making it a policy-relevant technology by analogy.
The author argues there is a reasonable chance of significant harm because advanced AI systems are “grown” via training rather than fully understood or predictable, creating fundamental uncertainty about their behavior.
They note that expert disagreement, including concern from figures like Bengio and Hinton, supports taking AI risk seriously rather than dismissing it.
The author highlights risks from power concentration, whether in autonomous AI systems or in humans who control them, even if catastrophic outcomes are uncertain.
They argue that proactive policy action, despite real trade-offs such as slower development, is likely preferable to reactive regulation later, drawing an analogy to missed early opportunities in social media governance.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that consciousness is likely substrate-dependent rather than a mere byproduct of abstract computation, concluding that reproducing brain-like outputs or algorithms in machines is insufficient for consciousness without replicating key biological, dynamical, and possibly life-linked processes.
Key points:
The author critiques computational functionalism, arguing that reproducing brain computations or input–output behavior does not guarantee consciousness because brain processes are inseparable from their biological substrate.
Brain activity involves multi-scale biological, chemical, and metabolic dynamics that lack clear separation between computation and physical implementation, unlike artificial neural networks.
Claims that the brain performs non-Turing computations are questioned; the author argues most physical processes can, in principle, be approximated by Turing-computable models, making non-computability an unconvincing basis for consciousness.
Simulating the brain as a dynamical system differs fundamentally from instantiating it physically, just as simulating a nuclear explosion does not produce an actual explosion.
Temporal constraints of biological processing may be essential to conscious experience, suggesting that consciousness cannot be arbitrarily sped up without qualitative change.
The hypothesis that life itself may be necessary for consciousness is treated as speculative but persuasive, highlighting the deep entanglement of prediction, metabolism, embodiment, and self-maintenance in conscious systems.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: AWASH reports completing scoping research, engaging farmers through a national conference, and beginning a pilot egg disinfection intervention in Ghanaian tilapia hatcheries, with early progress suggesting both feasibility and potential for high welfare impact while further evaluation is underway.
Key points:
AWASH conducted scoping visits to eight Ghanaian fish farms, exceeding its initial target, which informed the decision to pilot egg disinfection as a high-impact intervention.
The organization presented at the Aquaculture Ghana Conference to 30–40 farmers, with survey respondents reporting the session as useful and helping AWASH build relationships with key stakeholders.
Egg disinfection was selected because juvenile fish have low survival rates (around 45–65%), are relatively neglected in welfare efforts, and existing research suggests survival could increase to 90% or more.
One large farm producing just under 1% of Ghana’s national tilapia output agreed to pilot the intervention, increasing both potential direct impact and social proof for wider adoption.
AWASH learned that leveraging trusted local relationships was critical for access to farms, and that initial timelines were overly ambitious given scoping and seasonal constraints.
Next steps include monitoring the three-month pilot, continuing stakeholder engagement, and researching alternative interventions in case evidence supports a pivot.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that accelerating AI is justified because its near-term, predictable benefits to billions alive today outweigh highly speculative long-term extinction arguments, and that standard longtermist reasoning misapplies astronomical-waste logic to AI while underestimating the real costs of delay.
Key points:
The author claims that in most policy domains people reasonably discount billion-year forecasts because long-term effects are radically uncertain, and AI should not be treated differently by default.
They argue that Bostrom’s Astronomical Waste reasoning applies to scenarios that permanently eliminate intelligent life, like asteroid impacts, but not cleanly to AI.
The author contends that AI-caused human extinction would likely be a “replacement catastrophe,” not an astronomical one, because AI civilization could continue Earth-originating intelligence.
They maintain that AI risks should be weighed against AI’s potential to save and improve billions of lives through medical progress and economic growth.
The author argues that slowing AI only makes sense if it yields large, empirically grounded reductions in extinction risk, not marginal gains at enormous human cost.
They claim historical evidence suggests technologies become safer through deployment and iteration rather than pauses, and that current AI alignment shows no evidence of systematic deception.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author reflectively argues that, given near-term AI-driven discontinuity and extreme uncertainty about post-transition worlds, suffering-focused anti-speciesists should prioritize capacity building, influence, and coalition formation over most medium-term object-level interventions, while focusing especially on preventing worst-case suffering under likely future power lock-in.
Key points:
The author frames the future as split between a pre-transition era with tractable feedback loops and a post-transition era where impact could be astronomically large but highly sign-uncertain.
They argue that most medium-term interventions are unlikely to survive the transition, and that longtermism should be pursued fully or not at all.
Capacity building—movement growth, epistemic infrastructure, coordination, and AI proficiency—is presented as a strategy robust across many possible futures.
Short-term wins can still matter by building credibility, shifting culture, and testing the movement’s ability to exert influence before transition.
The author expects AI-enabled power concentration and lock-in, making future suffering the product of deliberate central planning rather than decentralized accidents.
They suggest prioritizing prevention of worst-case “S-risks,” influencing tech-elite culture (especially in San Francisco), diversifying beyond reliance on frontier labs, and engaging AI systems themselves as future power holders or moral patients.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The post argues that most charitable giving advice overemphasizes itemized tax deductions, which are irrelevant for most U.S. donors, and that consistent, impact-focused giving matters more than tax optimization, with a few specific tax tools being genuinely useful.
Key points:
The author claims around 90% of U.S. taxpayers take the standard deduction ($16,100 for single filers in 2026), so itemized charitable deductions often do not change tax outcomes.
Starting in 2026, itemizers face a 0.5% of Adjusted Gross Income floor before charitable donations become deductible, further reducing the appeal of itemizing.
“Bunching” donations into a single year can create tax benefits but, according to the author, may undermine consistent giving habits that charities rely on.
A new above-the-line deduction beginning in 2026 allows non-itemizers to deduct up to $1,000 (single) or $2,000 (married filing jointly) in cash donations; both this cap and the new floor are illustrated in the arithmetic sketch after these key points.
Donating appreciated assets avoids capital gains tax entirely, which the author describes as one of the most powerful and broadly applicable tax benefits.
Qualified charitable distributions (QCDs) allow donors aged 70½ or older to give from IRAs tax-free and potentially satisfy required minimum distributions.
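To make the two 2026 rules above concrete, here is a minimal arithmetic sketch in Python. The thresholds (the 0.5%-of-AGI floor for itemizers and the $1,000/$2,000 above-the-line cap for non-itemizers) are the post’s claims as summarized above, and the function is purely illustrative rather than tax guidance.

```python
# Rough sketch of the 2026 rules summarized above; the thresholds are the
# post's claims, not authoritative tax guidance.
def deductible_cash_donation(agi, cash_donations, itemizes, married=False):
    if itemizes:
        # Itemizers: only the portion of donations above 0.5% of AGI counts.
        floor = 0.005 * agi
        return max(0.0, cash_donations - floor)
    # Non-itemizers: above-the-line deduction capped at $1,000 / $2,000.
    cap = 2000 if married else 1000
    return min(cash_donations, cap)

# A single non-itemizer giving $1,500 can deduct $1,000 (the cap), while an
# itemizer with $80,000 AGI giving $1,500 can deduct $1,100 ($1,500 minus
# the $400 floor).
print(deductible_cash_donation(80_000, 1_500, itemizes=False))  # 1000
print(deductible_cash_donation(80_000, 1_500, itemizes=True))   # 1100.0
```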
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: This report presents the Digital Consciousness Model, a probabilistic framework combining multiple theories of consciousness, and concludes that current (2024) large language models are unlikely to be conscious, though the evidence against consciousness is limited and highly sensitive to theoretical assumptions.
Key points:
The Digital Consciousness Model aggregates judgments from 13 diverse stances on consciousness using a hierarchical Bayesian model informed by over 200 indicators.
When starting from a uniform prior of ⅙, the aggregated evidence lowers the probability that 2024 LLMs are conscious relative to the prior (a toy version of this kind of update is sketched after these key points).
The evidence against LLM consciousness is substantially weaker than the evidence against consciousness in very simple AI systems like ELIZA.
Different stances yield sharply divergent results, with cognitively oriented perspectives giving higher probabilities and biologically oriented perspectives giving much lower ones.
The model’s outputs are highly sensitive to prior assumptions, so the authors emphasize relative comparisons and evidence shifts rather than absolute probabilities.
The aggregated evidence strongly supports the conclusion that chickens are conscious, though some stances emphasizing advanced cognition assign them low probabilities.
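As a purely illustrative aside (this is not the report’s hierarchical model), the sketch below shows what it means mechanically for aggregated evidence to lower a uniform prior of ⅙: an odds-form Bayesian update with hypothetical aggregate Bayes factors chosen only to mirror the qualitative pattern described above (weaker evidence against 2024 LLMs, much stronger evidence against a system like ELIZA).

```python
# Toy illustration only: NOT the report's Digital Consciousness Model.
# It shows how evidence summarized as a Bayes factor moves a 1/6 prior.
def posterior(prior, bayes_factor):
    """Odds-form update: posterior odds = prior odds * Bayes factor."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

PRIOR = 1 / 6  # the uniform prior mentioned above

# Hypothetical Bayes factors, chosen for illustration rather than taken
# from the report: evidence against ELIZA-style systems is far stronger.
for label, bf in [("2024 LLM", 0.5), ("ELIZA-like system", 0.05)]:
    print(f"{label}: prior {PRIOR:.3f} -> posterior {posterior(PRIOR, bf):.3f}")
```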
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author announces a substantially revised version of “Intro to Brain-Like-AGI Safety,” arguing that brain-like AGI poses a distinct, unsolved technical alignment problem centered on reward function design, continual learning, and model-based reinforcement learning, and that recent AI progress does not resolve these risks.
Key points:
The series still aims to bring non-experts to the frontier of open problems in brain-like AGI safety, with a core thesis that such systems will have explicit reward functions whose design is critical for alignment.
The author argues that today’s LLMs are not AGI and that focusing on benchmarks or “book smarts” obscures large gaps in autonomous, long-horizon planning and execution.
A central neuroscience claim is that the cortex largely learns from scratch, while evolved steering mechanisms in the hypothalamus and brainstem ultimately ground all human motivations, including prosocial ones.
The update expands critiques of interpretability as a standalone solution, emphasizing scale, continual learning, and competitive pressures as unresolved obstacles.
The author maintains that instrumental convergence is not inevitable but becomes likely for sufficiently capable RL agents with consequentialist preferences, making naive debugging approaches unsafe at high capability levels.
The revised conclusion elevates “reward function design” as a priority research program for alignment, complementing efforts to reverse-engineer human social instincts.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: This payout report describes the Animal Welfare Fund’s grantmaking from July to December 2025, highlighting $2.48 million approved across 21 grants, a strategic focus on neglected and global south animal welfare, and organizational changes intended to support larger-scale and more systematic future grantmaking.
Key points:
From July 1 to December 31, 2025, AWF approved $2,482,552 across 21 grants and paid out $944,428 across 11 grants, with an acceptance rate of 56.8% excluding desk rejections.
Grantmaking volume in Q3 was lower due to EA Funds’ grantmaking pause from June 1 to July 31, during which AWF focused on strategy and planning before resuming full-volume grantmaking in August.
Highlighted grants included $137,000 to Crustacean Compassion for UK decapod crustacean policy and corporate advocacy, $214,678 to Rethink Priorities for leadership and flexible funding in the Neglected Animals Program, and $47,000 to Star Farm Pakistan to support cage-free egg supply chain development.
AWF emphasized high-counterfactual opportunities, neglected species such as invertebrates and aquatic animals, and farmed animal welfare in the Global South.
In the past year, AWF recommended 54 grants totaling $5.39 million, significantly expanding grantmaking compared to previous years.
Organizational updates included EA Funds’ merger with the Centre for Effective Altruism, an updated MEL framework, a refined three-year strategy, increased collaboration with partner funders, and record fundraising of $10M in 2025.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that Eric Drexler’s writing on AI offers a distinctive, non-anthropomorphic vision of technological futures that is highly valuable but hard to digest, and that readers should approach it holistically and iteratively, aiming to internalize and reinvent its insights rather than treating them as a set of straightforward claims.
Key points:
The author sees a cornerstone of Drexler’s perspective as a deep rejection of anthropomorphism, especially the assumption that transformative AI must take the form of a single agent with intrinsic drives.
Drexler’s writing is abstract, dense, and ontologically challenging, which creates common failure modes such as superficial skimming or misreading his arguments as simpler claims.
The author recommends reading Drexler’s articles in full to grasp the overall conceptual landscape before returning to specific passages for closer analysis.
In the author’s view, Drexler’s recent work mainly maps the technological trajectory of AI, pushes back on agent-centric framings, and advocates for “strategic judo” that reshapes incentives toward broadly beneficial outcomes.
Drexler leaves many important questions underexplored, including when agents might still be desired, how economic concentration will evolve, and how hypercapable AI worlds could fail.
The author argues that the most productive way to engage with Drexler’s ideas is through partial reinvention—thinking through implications, tensions, and critiques oneself, rather than relying on simplified translations.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author summarizes and largely endorses Ben Hoffman’s criticisms of Effective Altruism, arguing that EA’s early “evidence-based, high-leverage giving” story was not followed by the kind of decisive validation or updating you’d expect over ~15 years, and that EA instead drifted toward self-reinforcing credibility and resource accumulation amid institutional and “professionalism” pressures.
Key points:
The author describes early EA as combining Singer-style moral motivation (e.g. the drowning child) with an engineering/finance approach to measuring impact, with GiveWell as the canonical early organization focused on cost-effective global health giving.
They claim the popular “cup of coffee saves a life” framing uses “basically made up and fraudulent numbers,” and contrast it with a GiveWell-style pitch of roughly “~$5000” to “save or radically improve a life.”
They argue that as major funders (e.g. Dustin Moskovitz via Good Ventures, advised by Open Philanthropy, which overlaps with GiveWell) entered the ecosystem, difficulties with the simple impact model were discovered but “quietly elided,” with limited follow-through to obtain higher-quality outcome evidence.
They highlight GiveWell advising Open Philanthropy not to fully fund top charities as a central anomaly, suggesting that if even pessimistic cost-effectiveness estimates were believed, large funders could have gone much further (including potentially “almost” wiping out malaria) or run intensive country-level case studies to validate assumptions.
They argue that it is not strange for early estimates to be wrong, but it is strange that ~15 years passed without either (a) producing strong confirming evidence and doubling down, or (b) learning that malaria/poverty interventions have different constraints and updating public-facing marketing accordingly.
The author suggests EA’s credibility became circular—initially earned via persuasive research, then “double spent” by citing money moved as evidence of trustworthiness—while lacking matching evidence that outcomes met expectations or that the ecosystem was robustly learning.
They propose that the underlying blockers may be structural and institutional (e.g. predatory social structures and corruption on the recipient side, and truth-impeding “professionalism” and weak epistemic bureaucracies on the donor side), and they speculate that these pressures and rapid growth eroded EA’s epistemic rigor into an attractor focused on accumulating more resources “because We Should Be In Charge.”
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues for “moral nihilism” in a neutral sense—denying moral facts—and further claims that morality itself is harmful enough that we should adopt “moral abolitionism,” keeping concern for welfare and interests while abandoning moral language and categorical “oughts.”
Key points:
The author claims effective altruists are often moral anti-realists, citing an EA Forum survey with 312 votes skewed toward anti-realism and suggesting the framing likely biased responses toward realism.
They argue that even if there are no moral facts, pleasures and pains, preferences, and what is better or worse “from their own point of view” still exist, so effective altruists can aim to promote interests without committing to moral realism.
The author contends morality can create complacency by widening the perceived gap between permissible and impermissible actions, and may sometimes encourage harm by licensing indifference so long as rights aren’t violated.
They distinguish multiple senses of “moral nihilism,” and defend a combined view: second-order moral error theory plus first-order “moral eliminativism/abolitionism” that recommends ceasing to use moral language and thought.
They argue that a Humean instrumentalist account of reasons cannot justify categorical imperatives, so claims like “You ought not to torture babies,” asserted full stop, systematically fail, leading to the conclusion that “x is never under a moral obligation.”
The author claims morality’s “objectification of values” inflames disputes, blocks compromise, and has been used to rationalize large-scale harms, and they argue abolishing moral talk would not require abolishing care or pro-social emotions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that Yudkowsky and Soares’s “If Anyone Builds It, Everyone Dies” overstates AI-driven extinction as near-certain, and defends a much lower p(doom) of 2.6% by pointing to several “stops on the doom train” where things could plausibly go well, while still emphasizing that AI risk is dire and warrants major action.
Key points:
The author summarizes IABIED’s core claim as “if anyone builds AI, everyone everywhere will die,” and characterizes Yudkowsky and Soares’s recommended strategy as effectively “ban or bust.”
They report their own credences as 2.6% for misaligned AI killing or permanently disempowering everyone, and “maybe about 8%” for extinction or permanent disempowerment from AI used in other ways in the near future, while also saying most value loss comes from “suboptimal futures.”
They present multiple conditional “blockers” to doom—e.g., a 10% chance we don’t build artificial superintelligent agents, ~70% “no catastrophic misalignment by default,” ~70% chance alignment can be solved even if not by default, ~60% chance of shutting systems down after “near-miss” warning shots, and a 20% chance ASI couldn’t kill/disempower everyone—and argue that compounding uncertainty undermines near-certainty (these steps are multiplied through in the sketch after the key points).
They argue extreme pessimism is unwarranted given disagreement among informed people, citing median AI expert p(doom) around 5% (as of 2023), superforecasters often below 1%, and named individuals with a wide range (e.g., Ord ~10%, Lifland ~1/3, Shulman ~20%).
On “alignment by default,” they claim RLHF plausibly produces “a creature we like,” note current models are “nice and friendly,” and argue evolution-to-RL analogies are weakened by disanalogies such as off-distribution training aims, the nature of selection pressures, and RL’s ability to directly punish dangerous behavior.
They argue “warning shots” are likely in a misalignment trajectory (e.g., failed takeover attempts, interpretability reveals, high-stakes rogue behavior) and that sufficiently dramatic events would plausibly trigger shutdowns or bans, making “0 to 100” world takeover without intermediates unlikely.
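For readers who want the compounding made explicit, here is the arithmetic implied by the blockers listed above when they are read as independent steps along the path to doom (a reading of the summary’s numbers, not necessarily the author’s exact decomposition); the product lands close to the 2.6% credence reported earlier.

```python
# Multiply the complements of the "blockers" above, treated as independent
# conditional steps (an interpretation of the summary's numbers, not
# necessarily the author's exact model).
p_build_asi          = 1 - 0.10  # 10% chance we don't build ASI agents
p_misaligned         = 1 - 0.70  # ~70% no catastrophic misalignment by default
p_alignment_unsolved = 1 - 0.70  # ~70% chance alignment can be solved anyway
p_no_shutdown        = 1 - 0.60  # ~60% chance of shutdown after warning shots
p_asi_succeeds       = 1 - 0.20  # 20% chance ASI couldn't disempower everyone

p_doom = (p_build_asi * p_misaligned * p_alignment_unsolved
          * p_no_shutdown * p_asi_succeeds)
print(f"{p_doom:.3f}")  # ~0.026, i.e. roughly the 2.6% credence cited above
```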
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Executive summary: The author argues that international AI projects should adopt differential AI development by tightly restricting the most dangerous capabilities, especially AI that automates AI R&D, while actively accelerating and incentivizing “artificial wisdom” systems that help society govern rapid AI progress.
Key points:
Existing proposals for international AI projects focus on blanket control of frontier AI, which would block both dangerous and highly beneficial capabilities.
The author claims the core risk comes from AI that can automate ML research, engineering, or chip design, because this could trigger super-exponential capability growth and extreme power concentration.
They propose that only AI systems above a compute threshold and aimed at automating AI R&D or producing catastrophic technologies should be monopolized or banned outside an international project.
Enforcement could rely on oversight of a small number of large training runs, with audits, embedded supervisors, and severe penalties for violations.
The author argues governments should differentially accelerate “helpful” AI, including forecasting, policy analysis, ethical deliberation, negotiation support, and rapid education.
This approach could improve preparedness for rapid AI change, be more acceptable to industry, reduce incentives for international racing, and sometimes benefit even geopolitical rivals.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.