AI alignment

Last edit: 22 Jul 2022 20:58 UTC by Leo

AI alignment is research on how to ensure that AI systems pursue human or moral goals.

Evaluation

80,000 Hours rates AI alignment a “highest priority area”: a problem at the top of their ranking of global issues assessed by importance, tractability and neglectedness.[1]

Further reading

Christiano, Paul (2020) Current work in AI alignment, Effective Altruism Forum, April 3.

Shah, Rohin (2020) What’s been happening in AI alignment?, Effective Altruism Forum, July 29.

External links

AI Alignment Forum.

Related entries

AI governance | AI forecasting | alignment tax | Center for Human-Compatible Artificial Intelligence | Machine Intelligence Research Institute | rationality community

  1. ^

2019 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks19 Dec 2019 2:58 UTC
147 points
28 comments62 min readEA link

2018 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks18 Dec 2018 4:48 UTC
118 points
28 comments63 min readEA link

AGI Safety Fun­da­men­tals cur­ricu­lum and application

richard_ngo20 Oct 2021 21:45 UTC
123 points
20 comments8 min readEA link
(docs.google.com)

Why AI al­ign­ment could be hard with mod­ern deep learning

Ajeya21 Sep 2021 15:35 UTC
157 points
17 comments14 min readEA link
(www.cold-takes.com)

AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

Andrew Critch21 May 2020 6:55 UTC
29 points
0 comments3 min readEA link
(acritch.com)

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

richard_ngo23 Jan 2019 14:58 UTC
63 points
14 comments8 min readEA link

Why I pri­ori­tize moral cir­cle ex­pan­sion over re­duc­ing ex­tinc­tion risk through ar­tifi­cial in­tel­li­gence alignment

Jacy20 Feb 2018 18:29 UTC
107 points
72 comments35 min readEA link
(www.sentienceinstitute.org)

Del­e­gated agents in prac­tice: How com­pa­nies might end up sel­l­ing AI ser­vices that act on be­half of con­sumers and coal­i­tions, and what this im­plies for safety research

Remmelt26 Nov 2020 16:39 UTC
11 points
0 comments4 min readEA link

Deep­Mind is hiring for the Scal­able Align­ment and Align­ment Teams

Rohin Shah13 May 2022 12:19 UTC
102 points
0 comments9 min readEA link

My cur­rent thoughts on MIRI’s “highly re­li­able agent de­sign” work

Daniel_Dewey7 Jul 2017 1:17 UTC
60 points
59 comments19 min readEA link

Stable Emer­gence in a Devel­op­men­tal AI Ar­chi­tec­ture: Re­sults from “Twins V3”

Petra Vojtassakova17 Nov 2025 23:23 UTC
6 points
2 comments2 min readEA link

Prevent­ing an AI-re­lated catas­tro­phe—Prob­lem profile

Benjamin Hilton29 Aug 2022 18:49 UTC
138 points
18 comments4 min readEA link
(80000hours.org)

2016 AI Risk Liter­a­ture Re­view and Char­ity Comparison

Larks13 Dec 2016 4:36 UTC
57 points
12 comments28 min readEA link

The aca­demic con­tri­bu­tion to AI safety seems large

technicalities30 Jul 2020 10:30 UTC
120 points
28 comments9 min readEA link

Hiring en­g­ineers and re­searchers to help al­ign GPT-3

Paul_Christiano1 Oct 2020 18:52 UTC
107 points
19 comments3 min readEA link

AI al­ign­ment re­searchers may have a com­par­a­tive ad­van­tage in re­duc­ing s-risks

Lukas_Gloor15 Feb 2023 13:01 UTC
79 points
5 comments13 min readEA link

Crazy ideas some­times do work

Aryeh Englander4 Sep 2021 3:27 UTC
71 points
8 comments1 min readEA link

Plant-Based De­faults: A Missed Op­por­tu­nity in AI Design

andiehansen8 May 2025 9:37 UTC
37 points
3 comments5 min readEA link

Launch­ing ap­pli­ca­tions for AI Safety Ca­reers Course In­dia 2024

varun_agr1 May 2024 5:30 UTC
23 points
1 comment1 min readEA link

2017 AI Safety Liter­a­ture Re­view and Char­ity Comparison

Larks20 Dec 2017 21:54 UTC
43 points
17 comments23 min readEA link

Why Mo­ral Con­flict Re­s­olu­tion Still Breaks Our Best Safety Tools

JBug18 Nov 2025 7:49 UTC
6 points
0 comments2 min readEA link

AGI safety ca­reer advice

richard_ngo2 May 2023 7:36 UTC
213 points
18 comments13 min readEA link

Large Lan­guage Models as Fi­du­cia­ries to Humans

johnjnay24 Jan 2023 19:53 UTC
25 points
0 comments34 min readEA link
(papers.ssrn.com)

What is it to solve the al­ign­ment prob­lem? (Notes)

Joe_Carlsmith24 Aug 2024 21:19 UTC
32 points
1 comment53 min readEA link

A tale of 2.5 or­thog­o­nal­ity theses

Arepo1 May 2022 13:53 UTC
148 points
31 comments11 min readEA link

Align­ment ideas in­spired by hu­man virtue development

Borys Pikalov18 May 2025 9:36 UTC
6 points
0 comments4 min readEA link

[Question] What are the coolest top­ics in AI safety, to a hope­lessly pure math­e­mat­i­cian?

Jenny K E7 May 2022 7:18 UTC
89 points
29 comments1 min readEA link

AGI safety from first principles

richard_ngo21 Oct 2020 17:42 UTC
77 points
10 comments3 min readEA link
(www.alignmentforum.org)

My per­sonal cruxes for work­ing on AI safety

Buck13 Feb 2020 7:11 UTC
136 points
35 comments44 min readEA link

Sleeper Agents: Train­ing De­cep­tive LLMs that Per­sist Through Safety Training

evhub12 Jan 2024 19:51 UTC
65 points
0 comments3 min readEA link
(arxiv.org)

There are no co­her­ence theorems

Elliott Thornley (EJT)20 Feb 2023 21:52 UTC
108 points
49 comments19 min readEA link

In­tro­duc­ing The Non­lin­ear Fund: AI Safety re­search, in­cu­ba­tion, and funding

Kat Woods 🔶 ⏸️18 Mar 2021 14:07 UTC
71 points
32 comments5 min readEA link

Scru­ti­niz­ing AI Risk (80K, #81) - v. quick summary

Ben23 Jul 2020 19:02 UTC
10 points
1 comment3 min readEA link

Draft re­port on ex­is­ten­tial risk from power-seek­ing AI

Joe_Carlsmith28 Apr 2021 21:41 UTC
88 points
34 comments1 min readEA link

[Link post] Co­or­di­na­tion challenges for pre­vent­ing AI conflict

stefan.torges9 Mar 2021 9:39 UTC
58 points
0 comments1 min readEA link
(longtermrisk.org)

AI al­ign­ment shouldn’t be con­flated with AI moral achievement

Matthew_Barnett30 Dec 2023 3:08 UTC
116 points
15 comments5 min readEA link

[Linkpost] AI Align­ment, Ex­plained in 5 Points (up­dated)

Daniel_Eth18 Apr 2023 8:09 UTC
31 points
2 comments1 min readEA link
(medium.com)

“Aligned with who?” Re­sults of sur­vey­ing 1,000 US par­ti­ci­pants on AI values

Holly Morgan21 Mar 2023 22:07 UTC
41 points
0 comments2 min readEA link
(www.lesswrong.com)

[Question] What is most con­fus­ing to you about AI stuff?

Sam Clarke23 Nov 2021 16:00 UTC
25 points
15 comments1 min readEA link

Coun­ter­ar­gu­ments to the ba­sic AI risk case

Katja_Grace14 Oct 2022 20:30 UTC
287 points
23 comments34 min readEA link

How do take­off speeds af­fect the prob­a­bil­ity of bad out­comes from AGI?

KR7 Jul 2020 17:53 UTC
18 points
0 comments8 min readEA link

Techies Wanted: How STEM Back­grounds Can Ad­vance Safe AI Policy

Daniel_Eth26 May 2025 11:29 UTC
41 points
1 comment29 min readEA link

What is it like do­ing AI safety work?

Kat Woods 🔶 ⏸️21 Feb 2023 19:24 UTC
99 points
2 comments10 min readEA link

A cen­tral AI al­ign­ment prob­lem: ca­pa­bil­ities gen­er­al­iza­tion, and the sharp left turn

So8res15 Jun 2022 14:19 UTC
53 points
2 comments10 min readEA link

De­cep­tive Align­ment is <1% Likely by Default

DavidW21 Feb 2023 15:07 UTC
54 points
26 comments14 min readEA link

TAI Safety Biblio­graphic Database

Jess_Riedel22 Dec 2020 16:03 UTC
61 points
9 comments17 min readEA link

From lan­guage to ethics by au­to­mated reasoning

Michele Campolo21 Nov 2021 15:16 UTC
8 points
0 comments6 min readEA link

AMA: Ajeya Co­tra, re­searcher at Open Phil

Ajeya28 Jan 2021 17:38 UTC
84 points
105 comments1 min readEA link

Cog­ni­tive Science/​Psy­chol­ogy As a Ne­glected Ap­proach to AI Safety

Kaj_Sotala5 Jun 2017 13:46 UTC
40 points
37 comments4 min readEA link

Ngo and Yud­kowsky on al­ign­ment difficulty

richard_ngo15 Nov 2021 22:47 UTC
71 points
13 comments94 min readEA link

An­nounc­ing AI Safety Support

Linda Linsefors19 Nov 2020 20:19 UTC
55 points
0 comments4 min readEA link

Train for in­cor­rigi­bil­ity, then re­verse it (Shut­down Prob­lem Con­test Sub­mis­sion)

Daniel_Eth18 Jul 2023 8:26 UTC
16 points
0 comments2 min readEA link

Tether­ware #1: The case for hu­man­like AI with free will

Jáchym Fibír30 Jan 2025 11:57 UTC
−3 points
2 comments10 min readEA link
(tetherware.substack.com)

On Defer­ence and Yud­kowsky’s AI Risk Estimates

bmg19 Jun 2022 14:35 UTC
288 points
194 comments17 min readEA link

Deep Deceptiveness

So8res21 Mar 2023 2:51 UTC
40 points
1 comment14 min readEA link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8res12 Jul 2022 5:35 UTC
126 points
13 comments29 min readEA link

In­tel­lec­tual Diver­sity in AI Safety

KR22 Jul 2020 19:07 UTC
21 points
8 comments3 min readEA link

An­nounc­ing AXRP, the AI X-risk Re­search Podcast

DanielFilan23 Dec 2020 20:10 UTC
32 points
1 comment1 min readEA link

Align­ment 201 curriculum

richard_ngo12 Oct 2022 19:17 UTC
94 points
9 comments1 min readEA link
(www.agisafetyfundamentals.com)

Chain­ing the evil ge­nie: why “outer” AI safety is prob­a­bly easy

titotal30 Aug 2022 13:55 UTC
40 points
12 comments10 min readEA link

[Question] How much EA anal­y­sis of AI safety as a cause area ex­ists?

richard_ngo6 Sep 2019 11:15 UTC
96 points
20 comments2 min readEA link

Ro­hin Shah: What’s been hap­pen­ing in AI al­ign­ment?

EA Global29 Jul 2020 20:15 UTC
18 points
0 comments14 min readEA link
(www.youtube.com)

How might we al­ign trans­for­ma­tive AI if it’s de­vel­oped very soon?

Holden Karnofsky29 Aug 2022 15:48 UTC
164 points
17 comments44 min readEA link

[linkpost] “What Are Rea­son­able AI Fears?” by Robin Han­son, 2023-04-23

Arjun Panickssery14 Apr 2023 23:26 UTC
41 points
3 comments4 min readEA link
(quillette.com)

In­tro­duc­tion to Prag­matic AI Safety [Prag­matic AI Safety #1]

TW1239 May 2022 17:02 UTC
68 points
0 comments6 min readEA link

An­i­mal welfare con­cerns are dom­i­nated by post-ASI futures

RobertM22 Nov 2025 4:48 UTC
11 points
1 comment4 min readEA link

My Un­der­stand­ing of Paul Chris­ti­ano’s Iter­ated Am­plifi­ca­tion AI Safety Re­search Agenda

Chi15 Aug 2020 19:59 UTC
38 points
3 comments39 min readEA link

In­ter­pret­ing Neu­ral Net­works through the Poly­tope Lens

Sid Black23 Sep 2022 18:03 UTC
35 points
0 comments28 min readEA link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

johnjnay21 Oct 2022 2:03 UTC
20 points
1 comment24 min readEA link

There should be an AI safety pro­ject board

mariushobbhahn14 Mar 2022 16:08 UTC
24 points
3 comments1 min readEA link

AI Risk: In­creas­ing Per­sua­sion Power

kewlcats3 Aug 2020 20:25 UTC
4 points
0 comments1 min readEA link

AI al­ign­ment with hu­mans… but with which hu­mans?

Geoffrey Miller8 Sep 2022 23:43 UTC
51 points
20 comments3 min readEA link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor Leahy9 Apr 2022 15:07 UTC
31 points
0 comments1 min readEA link

Par­allels Between AI Safety by De­bate and Ev­i­dence Law

Cullen 🔸20 Jul 2020 22:52 UTC
30 points
2 comments2 min readEA link
(cullenokeefe.com)

Safe AI and moral AI

William D'Alessandro1 Jun 2023 21:18 UTC
3 points
0 comments11 min readEA link

(Even) More Early-Ca­reer EAs Should Try AI Safety Tech­ni­cal Research

tlevin30 Jun 2022 21:14 UTC
86 points
40 comments11 min readEA link

2020 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks21 Dec 2020 15:25 UTC
155 points
16 comments68 min readEA link

Con­nor Leahy on Con­jec­ture and Dy­ing with Dignity

Michaël Trazzi22 Jul 2022 19:30 UTC
34 points
0 comments10 min readEA link
(theinsideview.ai)

Rele­vant pre-AGI possibilities

kokotajlod20 Jun 2020 13:15 UTC
22 points
0 comments1 min readEA link
(aiimpacts.org)

Why Would AI “Aim” To Defeat Hu­man­ity?

Holden Karnofsky29 Nov 2022 18:59 UTC
24 points
0 comments32 min readEA link
(www.cold-takes.com)

High-level hopes for AI alignment

Holden Karnofsky20 Dec 2022 2:11 UTC
123 points
14 comments19 min readEA link
(www.cold-takes.com)

Pos­si­ble OpenAI’s Q* break­through and Deep­Mind’s AlphaGo-type sys­tems plus LLMs

Burnydelic23 Nov 2023 7:02 UTC
13 points
4 comments2 min readEA link

[Question] How strong is the ev­i­dence of un­al­igned AI sys­tems caus­ing harm?

Eevee🔹21 Jul 2020 4:08 UTC
31 points
1 comment1 min readEA link

New re­port on how much com­pu­ta­tional power it takes to match the hu­man brain (Open Philan­thropy)

Aaron Gertler 🔸15 Sep 2020 1:06 UTC
45 points
1 comment18 min readEA link
(www.openphilanthropy.org)

Paul Chris­ti­ano: Cur­rent work in AI alignment

EA Global3 Apr 2020 7:06 UTC
80 points
4 comments24 min readEA link
(www.youtube.com)

Buck Sh­legeris: How I think stu­dents should ori­ent to AI safety

EA Global25 Oct 2020 5:48 UTC
11 points
0 comments1 min readEA link
(www.youtube.com)

The ba­sic rea­sons I ex­pect AGI ruin

RobBensinger18 Apr 2023 3:37 UTC
58 points
13 comments14 min readEA link

The cur­rent al­ign­ment plan, and how we might im­prove it | EAG Bay Area 23

Buck7 Jun 2023 21:03 UTC
66 points
0 comments33 min readEA link

“The Race to the End of Hu­man­ity” – Struc­tural Uncer­tainty Anal­y­sis in AI Risk Models

Froolow19 May 2023 12:03 UTC
48 points
4 comments21 min readEA link

Con­jec­ture: In­ter­nal In­fo­haz­ard Policy

Connor Leahy29 Jul 2022 19:35 UTC
34 points
3 comments19 min readEA link

[Link] How un­der­stand­ing valence could help make fu­ture AIs safer

Milan Griffes8 Oct 2020 18:53 UTC
22 points
2 comments3 min readEA link

Align­ing the Align­ers: En­sur­ing Aligned AI acts for the com­mon good of all mankind

timunderwood16 Jan 2023 11:13 UTC
40 points
2 comments4 min readEA link

My Ob­jec­tions to “We’re All Gonna Die with Eliezer Yud­kowsky”

Quintin Pope21 Mar 2023 1:23 UTC
166 points
21 comments39 min readEA link

EA, Psy­chol­ogy & AI Safety Research

Sam Ellis26 May 2022 23:46 UTC
29 points
3 comments6 min readEA link

Why the Orthog­o­nal­ity Th­e­sis’s ve­rac­ity is not the point:

Antoine de Scorraille ⏸️23 Jul 2020 15:40 UTC
3 points
0 comments3 min readEA link

Ap­ply to the sec­ond ML for Align­ment Boot­camp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]

Buck6 May 2022 0:19 UTC
111 points
7 comments6 min readEA link

Ap­ply to the ML for Align­ment Boot­camp (MLAB) in Berkeley [Jan 3 - Jan 22]

Habryka [Deactivated]3 Nov 2021 18:20 UTC
140 points
6 comments1 min readEA link

Speedrun: AI Align­ment Prizes

joe9 Feb 2023 11:55 UTC
27 points
0 comments17 min readEA link

Steer­ing AI to care for an­i­mals, and soon

Andrew Critch14 Jun 2022 1:13 UTC
239 points
37 comments1 min readEA link

Pre­dict re­sponses to the “ex­is­ten­tial risk from AI” survey

RobBensinger28 May 2021 1:38 UTC
36 points
8 comments2 min readEA link

Aspira­tion-based, non-max­i­miz­ing AI agent designs

Bob Jacobs7 May 2024 16:13 UTC
12 points
1 comment38 min readEA link

Mis­gen­er­al­iza­tion as a misnomer

So8res6 Apr 2023 20:43 UTC
48 points
0 comments4 min readEA link

Fi­nal Re­port of the Na­tional Se­cu­rity Com­mis­sion on Ar­tifi­cial In­tel­li­gence (NSCAI, 2021)

MichaelA🔸1 Jun 2021 8:19 UTC
51 points
3 comments4 min readEA link
(www.nscai.gov)

New re­port: “Schem­ing AIs: Will AIs fake al­ign­ment dur­ing train­ing in or­der to get power?”

Joe_Carlsmith15 Nov 2023 17:16 UTC
71 points
4 comments30 min readEA link

Take­aways from safety by de­fault interviews

AI Impacts7 Apr 2020 2:01 UTC
25 points
2 comments13 min readEA link
(aiimpacts.org)

Nat­u­ral­ism and AI alignment

Michele Campolo24 Apr 2021 16:20 UTC
17 points
3 comments7 min readEA link

VIRTUA: a novel about AI alignment

Karl von Wendt12 Jan 2023 9:37 UTC
23 points
0 comments1 min readEA link

Emer­gent Ven­tures AI

technicalities8 Apr 2022 22:08 UTC
22 points
0 comments1 min readEA link
(marginalrevolution.com)

AI Sleeper Agents: How An­thropic Trains and Catches Them—Video

Writer30 Aug 2025 17:52 UTC
7 points
1 comment7 min readEA link
(youtu.be)

Guardrails vs Goal-di­rect­ed­ness in AI Alignment

freedomandutility30 Dec 2023 12:58 UTC
13 points
2 comments1 min readEA link

What I mean by “al­ign­ment is in large part about mak­ing cog­ni­tion aimable at all”

So8res30 Jan 2023 15:22 UTC
57 points
3 comments2 min readEA link

Law-Fol­low­ing AI 2: In­tent Align­ment + Su­per­in­tel­li­gence → Lawless AI (By De­fault)

Cullen 🔸27 Apr 2022 17:18 UTC
19 points
0 comments6 min readEA link

Is AI fore­cast­ing a waste of effort on the mar­gin?

Emrik5 Nov 2022 0:41 UTC
12 points
6 comments3 min readEA link

How to get tech­nolog­i­cal knowl­edge on AI/​ML (for non-tech peo­ple)

FangFang30 Jun 2021 7:53 UTC
63 points
7 comments5 min readEA link

An­drew Critch: Log­i­cal in­duc­tion — progress in AI alignment

EA Global6 Aug 2016 0:40 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Crit­i­cal Re­view of ‘The Precipice’: A Re­assess­ment of the Risks of AI and Pandemics

James Fodor11 May 2020 11:11 UTC
111 points
32 comments26 min readEA link

Pile of Law and Law-Fol­low­ing AI

Cullen 🔸13 Jul 2022 0:29 UTC
28 points
2 comments3 min readEA link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford29 Mar 2022 19:47 UTC
13 points
0 comments3 min readEA link

[Question] If AIs had sub­cor­ti­cal brain simu­la­tion, would that solve the al­ign­ment prob­lem?

Rainbow Affect31 Jul 2023 15:48 UTC
1 point
0 comments2 min readEA link

Quick sur­vey on AI al­ign­ment resources

frances_lorenz30 Jun 2022 19:08 UTC
15 points
0 comments1 min readEA link

[Question] How should we in­vest in “long-term short-ter­mism” given the like­li­hood of trans­for­ma­tive AI?

James_Banks12 Jan 2021 23:54 UTC
8 points
0 comments1 min readEA link

Three Im­pacts of Ma­chine Intelligence

Paul_Christiano23 Aug 2013 10:10 UTC
33 points
5 comments8 min readEA link
(rationalaltruist.com)

Eric Drexler: Pare­to­topian goal alignment

EA Global15 Mar 2019 14:51 UTC
16 points
0 comments10 min readEA link
(www.youtube.com)

On AI and Compute

johncrox3 Apr 2019 21:26 UTC
39 points
12 comments8 min readEA link

Mauhn Re­leases AI Safety Documentation

Berg Severens2 Jul 2021 12:19 UTC
4 points
2 comments1 min readEA link

LLMs might not be the fu­ture of search: at least, not yet.

James-Hartree-Law22 Jan 2025 21:40 UTC
4 points
1 comment4 min readEA link

[Question] What are your recom­men­da­tions for tech­ni­cal AI al­ign­ment pod­casts?

Evan_Gaensbauer11 May 2022 21:52 UTC
13 points
4 comments1 min readEA link

Max Teg­mark: Risks and benefits of ad­vanced ar­tifi­cial intelligence

EA Global5 Aug 2016 9:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Defin­ing al­ign­ment research

richard_ngo19 Aug 2024 22:49 UTC
48 points
1 comment7 min readEA link

[Question] Is there ev­i­dence that recom­mender sys­tems are chang­ing users’ prefer­ences?

zdgroff12 Apr 2021 19:11 UTC
60 points
15 comments1 min readEA link

Dis­con­tin­u­ous progress in his­tory: an update

AI Impacts17 Apr 2020 16:28 UTC
69 points
3 comments24 min readEA link

Large Lan­guage Models as Cor­po­rate Lob­by­ists, and Im­pli­ca­tions for So­cietal-AI Alignment

johnjnay4 Jan 2023 22:22 UTC
10 points
6 comments8 min readEA link

AGI x-risk timelines: 10% chance (by year X) es­ti­mates should be the head­line, not 50%.

Greg_Colbourn ⏸️ 1 Mar 2022 12:02 UTC
69 points
22 comments2 min readEA link

[Question] Why should we *not* put effort into AI safety re­search?

Ben Thompson16 May 2021 5:11 UTC
15 points
5 comments1 min readEA link

[Question] Are we con­fi­dent that su­per­in­tel­li­gent ar­tifi­cial in­tel­li­gence dis­em­pow­er­ing hu­mans would be bad?

Vasco Grilo🔸10 Jun 2023 9:24 UTC
24 points
27 comments1 min readEA link

When “yang” goes wrong

Joe_Carlsmith8 Jan 2024 16:35 UTC
57 points
1 comment13 min readEA link

[Question] How can I bet on short timelines?

kokotajlod7 Nov 2020 12:45 UTC
33 points
12 comments2 min readEA link

Order Mat­ters for De­cep­tive Alignment

DavidW15 Feb 2023 20:12 UTC
20 points
1 comment1 min readEA link
(www.lesswrong.com)

[Question] Align­ment & Ca­pa­bil­ities: What’s the differ­ence?

John G. Halstead31 Aug 2023 22:13 UTC
50 points
10 comments1 min readEA link

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. Murphy20 Jan 2022 20:48 UTC
20 points
7 comments3 min readEA link

The Me­taethics and Nor­ma­tive Ethics of AGI Value Align­ment: Many Ques­tions, Some Implications

Eleos Arete Citrini15 Sep 2021 19:05 UTC
25 points
0 comments8 min readEA link

Brain-com­puter in­ter­faces and brain organoids in AI al­ign­ment?

freedomandutility15 Apr 2023 22:28 UTC
8 points
2 comments1 min readEA link

Shah and Yud­kowsky on al­ign­ment failures

EliezerYudkowsky28 Feb 2022 19:25 UTC
38 points
7 comments92 min readEA link

The Prob­lem With the Word ‘Align­ment’

Peli Grietzer21 May 2024 21:37 UTC
13 points
1 comment6 min readEA link

[Creative Writ­ing Con­test] An AI Safety Limerick

Ben_West🔸18 Oct 2021 19:11 UTC
21 points
5 comments1 min readEA link

Si­tu­a­tional aware­ness (Sec­tion 2.1 of “Schem­ing AIs”)

Joe_Carlsmith26 Nov 2023 23:00 UTC
12 points
1 comment6 min readEA link

Align­ment Boot­strap­ping Is Dangerous

MichaelDickens27 Nov 2025 18:18 UTC
14 points
0 comments2 min readEA link

He­len Toner: The Open Philan­thropy Pro­ject’s work on AI risk

EA Global3 Nov 2017 7:43 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Public-fac­ing Cen­sor­ship Is Safety Theater, Caus­ing Rep­u­ta­tional Da­m­age

Yitz23 Sep 2022 5:08 UTC
49 points
7 comments5 min readEA link

[Question] What kind of event, tar­geted to un­der­grad­u­ate CS ma­jors, would be most effec­tive at get­ting peo­ple to work on AI safety?

CBiddulph19 Sep 2021 16:19 UTC
9 points
1 comment1 min readEA link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

mariushobbhahn10 Oct 2022 13:16 UTC
138 points
21 comments12 min readEA link

I’m Cul­len O’Keefe, a Policy Re­searcher at OpenAI, AMA

Cullen 🔸11 Jan 2020 4:13 UTC
45 points
68 comments1 min readEA link

What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA Global12 Aug 2017 7:00 UTC
9 points
0 comments12 min readEA link

[Question] Is this a good way to bet on short timelines?

kokotajlod28 Nov 2020 14:31 UTC
17 points
16 comments1 min readEA link

[Question] Should the EA com­mu­nity have a DL en­g­ineer­ing fel­low­ship?

PabloAMC 🔸24 Dec 2021 13:43 UTC
26 points
6 comments1 min readEA link

The Mul­tidis­ci­plinary Ap­proach to Align­ment (MATA) and Archety­pal Trans­fer Learn­ing (ATL)

Miguel19 Jun 2023 3:23 UTC
4 points
0 comments7 min readEA link

EA megapro­jects continued

mariushobbhahn3 Dec 2021 10:33 UTC
183 points
48 comments7 min readEA link

A mesa-op­ti­miza­tion per­spec­tive on AI valence and moral patienthood

jacobpfau9 Sep 2021 22:23 UTC
10 points
18 comments17 min readEA link

[Question] What would you do if you had a lot of money/​power/​in­fluence and you thought that AI timelines were very short?

Greg_Colbourn ⏸️ 12 Nov 2021 21:59 UTC
29 points
8 comments1 min readEA link

Quan­tify­ing the Far Fu­ture Effects of Interventions

MichaelDickens18 May 2016 2:15 UTC
9 points
0 comments11 min readEA link

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:43 UTC
53 points
21 comments3 min readEA link

AI safety tax dynamics

Owen Cotton-Barratt23 Oct 2024 12:21 UTC
22 points
9 comments6 min readEA link
(strangecities.substack.com)

Align­ment Stress Sig­na­tures: When Safe AI Be­haves Like It’s Traumatized

Petra Vojtassakova26 Oct 2025 9:41 UTC
8 points
0 comments2 min readEA link

In­tro­duc­ing the Prin­ci­ples of In­tel­li­gent Be­havi­our in Biolog­i­cal and So­cial Sys­tems (PIBBSS) Fellowship

adamShimi18 Dec 2021 15:25 UTC
37 points
5 comments10 min readEA link

[Cause Ex­plo­ra­tion Prizes] Ex­pand­ing com­mu­ni­ca­tion about AGI risks

Ines22 Sep 2022 5:30 UTC
13 points
0 comments11 min readEA link

Shal­low re­view of live agen­das in al­ign­ment & safety

technicalities27 Nov 2023 11:33 UTC
76 points
8 comments29 min readEA link

Some AI Gover­nance Re­search Ideas

MarkusAnderljung3 Jun 2021 10:51 UTC
102 points
5 comments2 min readEA link

Soares, Tal­linn, and Yud­kowsky dis­cuss AGI cognition

EliezerYudkowsky29 Nov 2021 17:28 UTC
15 points
0 comments40 min readEA link

[Question] Ca­reer Ad­vice: Philos­o­phy + Pro­gram­ming → AI Safety

tcelferact18 Mar 2022 15:09 UTC
30 points
11 comments2 min readEA link

Ar­tifi­cial in­tel­li­gence ca­reer stories

EA Global25 Oct 2020 6:56 UTC
12 points
0 comments1 min readEA link
(www.youtube.com)

Chris­ti­ano and Yud­kowsky on AI pre­dic­tions and hu­man intelligence

EliezerYudkowsky23 Feb 2022 16:51 UTC
31 points
0 comments42 min readEA link

[Question] What is an ex­am­ple of re­cent, tan­gible progress in AI safety re­search?

Aaron Gertler 🔸14 Jun 2021 5:29 UTC
35 points
4 comments1 min readEA link

Com­pendium of prob­lems with RLHF

Raphaël S30 Jan 2023 8:48 UTC
18 points
0 comments10 min readEA link

Shar­ing the World with Digi­tal Minds

Aaron Gertler 🔸1 Dec 2020 8:00 UTC
12 points
1 comment1 min readEA link
(www.nickbostrom.com)

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

Katja_Grace6 Apr 2021 21:44 UTC
19 points
1 comment11 min readEA link
(worldspiritsockpuppet.com)

[linkpost] Shar­ing pow­er­ful AI mod­els: the emerg­ing paradigm of struc­tured access

ts20 Jan 2022 21:10 UTC
11 points
3 comments1 min readEA link

In­for­ma­tion se­cu­rity ca­reers for GCR reduction

ClaireZabel20 Jun 2019 23:56 UTC
187 points
35 comments8 min readEA link

Sur­vey on AI ex­is­ten­tial risk scenarios

Sam Clarke8 Jun 2021 17:12 UTC
159 points
11 comments6 min readEA link

Key Papers in Lan­guage Model Safety

aog20 Jun 2022 14:59 UTC
20 points
0 comments22 min readEA link

[Question] What are the challenges and prob­lems with pro­gram­ming law-break­ing con­straints into AGI?

Michael St Jules 🔸2 Feb 2020 20:53 UTC
20 points
34 comments1 min readEA link

Con­sider pay­ing me to do AI safety re­search work

Rupert5 Nov 2020 8:09 UTC
11 points
3 comments2 min readEA link

Some global catas­trophic risk estimates

Tamay10 Feb 2021 19:32 UTC
106 points
15 comments1 min readEA link

Katja Grace: AI safety

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum29 Dec 2022 16:09 UTC
4 points
0 comments1 min readEA link

Tan Zhi Xuan: AI al­ign­ment, philo­soph­i­cal plu­ral­ism, and the rele­vance of non-Western philosophy

EA Global21 Nov 2020 8:12 UTC
20 points
1 comment1 min readEA link
(www.youtube.com)

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin Shah3 Jan 2020 7:52 UTC
58 points
12 comments10 min readEA link
(www.alignmentforum.org)

Jesse Clif­ton: Open-source learn­ing — a bar­gain­ing approach

EA Global18 Oct 2019 18:05 UTC
10 points
0 comments1 min readEA link
(www.youtube.com)

AI things that are per­haps as im­por­tant as hu­man-con­trol­led AI

Chi3 Mar 2024 18:07 UTC
117 points
9 comments21 min readEA link

An Anal­y­sis of Sys­temic Risk and Ar­chi­tec­tural Re­quire­ments for the Con­tain­ment of Re­cur­sively Self-Im­prov­ing AI

Ihor Ivliev17 Jun 2025 0:16 UTC
2 points
5 comments4 min readEA link

Law-Fol­low­ing AI 3: Lawless AI Agents Un­der­mine Sta­bi­liz­ing Agreements

Cullen 🔸27 Apr 2022 17:20 UTC
28 points
3 comments3 min readEA link

[Linkpost] How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency

Jackson Wagner14 Feb 2022 21:40 UTC
10 points
0 comments1 min readEA link

On the abo­li­tion of man

Joe_Carlsmith18 Jan 2024 18:17 UTC
71 points
4 comments41 min readEA link

The Parable of the Boy Who Cried 5% Chance of Wolf

Kat Woods 🔶 ⏸️15 Aug 2022 14:22 UTC
80 points
8 comments2 min readEA link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

johnjnay26 Oct 2022 1:24 UTC
7 points
1 comment2 min readEA link

How to pur­sue a ca­reer in tech­ni­cal AI alignment

Charlie Rogers-Smith4 Jun 2022 21:36 UTC
270 points
9 comments39 min readEA link

Jan Leike, He­len Toner, Malo Bour­gon, and Miles Brundage: Work­ing in AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Get­ting started in­de­pen­dently in AI Safety

JJ Hepburn6 Jul 2021 15:20 UTC
41 points
10 comments2 min readEA link

Timelines are short, p(doom) is high: a global stop to fron­tier AI de­vel­op­ment un­til x-safety con­sen­sus is our only rea­son­able hope

Greg_Colbourn ⏸️ 12 Oct 2023 11:24 UTC
78 points
83 comments9 min readEA link

Syd­ney AI Safety Fellowship

Chris Leong2 Dec 2021 7:35 UTC
16 points
0 comments2 min readEA link

AGI Predictions

Pablo21 Nov 2020 12:02 UTC
36 points
0 comments1 min readEA link
(www.lesswrong.com)

On pre­sent­ing the case for AI risk

Aryeh Englander8 Mar 2022 21:37 UTC
114 points
12 comments4 min readEA link

List #3: Why not to as­sume on prior that AGI-al­ign­ment workarounds are available

Remmelt24 Dec 2022 9:54 UTC
6 points
0 comments3 min readEA link

[Question] Is it crunch time yet? If so, who can help?

Nicholas Kross13 Oct 2021 4:11 UTC
29 points
9 comments1 min readEA link

Don’t Call It AI Alignment

Gil20 Feb 2023 5:27 UTC
16 points
7 comments2 min readEA link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson Jones4 Nov 2022 0:58 UTC
11 points
1 comment3 min readEA link

The case for more Align­ment Tar­get Anal­y­sis (ATA)

Chi20 Sep 2024 1:14 UTC
25 points
0 comments17 min readEA link

Ngo and Yud­kowsky on AI ca­pa­bil­ity gains

richard_ngo19 Nov 2021 1:54 UTC
23 points
4 comments39 min readEA link

Oth­er­ness and con­trol in the age of AGI

Joe_Carlsmith2 Jan 2024 18:15 UTC
37 points
1 comment7 min readEA link

[Question] I’m in­ter­view­ing Max Teg­mark about AI safety and more. What shouId I ask him?

Robert_Wiblin13 May 2022 15:32 UTC
18 points
2 comments1 min readEA link

Long-Term Fu­ture Fund: May 2021 grant recommendations

abergal27 May 2021 6:44 UTC
110 points
17 comments57 min readEA link

How Do AI Timelines Affect Giv­ing Now vs. Later?

MichaelDickens3 Aug 2021 3:36 UTC
36 points
8 comments8 min readEA link

Bryan John­son seems more EA al­igned than I expected

PeterSlattery22 Apr 2024 9:38 UTC
13 points
27 comments2 min readEA link
(www.youtube.com)

[Question] What con­sid­er­a­tions in­fluence whether I have more in­fluence over short or long timelines?

kokotajlod5 Nov 2020 19:57 UTC
19 points
0 comments1 min readEA link

Utility Eng­ineer­ing: An­a­lyz­ing and Con­trol­ling Emer­gent Value Sys­tems in AIs

Matrice Jacobine🔸🏳️‍⚧️12 Feb 2025 9:15 UTC
13 points
0 comments1 min readEA link
(www.emergent-values.ai)

Gentle­ness and the ar­tifi­cial Other

Joe_Carlsmith2 Jan 2024 18:21 UTC
90 points
2 comments11 min readEA link

Why AI is Harder Than We Think—Me­lanie Mitchell

Eevee🔹28 Apr 2021 8:19 UTC
45 points
7 comments2 min readEA link
(arxiv.org)

Thoughts on short timelines

Tobias_Baumann23 Oct 2018 15:59 UTC
22 points
14 comments5 min readEA link

Sym­bio­sis, not al­ign­ment, as the goal for liberal democ­ra­cies in the tran­si­tion to ar­tifi­cial gen­eral intelligence

simonfriederich17 Mar 2023 13:04 UTC
18 points
2 comments24 min readEA link
(rdcu.be)

Im­por­tant, ac­tion­able re­search ques­tions for the most im­por­tant century

Holden Karnofsky24 Feb 2022 16:34 UTC
301 points
13 comments19 min readEA link

SERI ML ap­pli­ca­tion dead­line is ex­tended un­til May 22.

Viktoria Malyasova22 May 2022 0:13 UTC
13 points
3 comments1 min readEA link

Vic­to­ria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment

Michaël Trazzi12 Jan 2023 17:09 UTC
16 points
0 comments4 min readEA link
(www.theinsideview.ai)

AI al­ign­ment re­search links

Holden Karnofsky6 Jan 2022 5:52 UTC
16 points
0 comments6 min readEA link
(www.cold-takes.com)

Messy per­sonal stuff that af­fected my cause pri­ori­ti­za­tion (or: how I started to care about AI safety)

Julia_Wise🔸5 May 2022 17:59 UTC
269 points
14 comments2 min readEA link

Tech­ni­cal AGI safety re­search out­side AI

richard_ngo18 Oct 2019 15:02 UTC
91 points
5 comments3 min readEA link

Why Mo­ral Weights Have Two Types and How to Mea­sure Them

Beyond Singularity17 Jul 2025 10:58 UTC
17 points
4 comments4 min readEA link

Some promis­ing ca­reer ideas be­yond 80,000 Hours’ pri­or­ity paths

Arden Koehler26 Jun 2020 10:34 UTC
142 points
28 comments15 min readEA link

Law-Fol­low­ing AI 1: Se­quence In­tro­duc­tion and Structure

Cullen 🔸27 Apr 2022 17:16 UTC
35 points
2 comments9 min readEA link

In­creased Availa­bil­ity and Willing­ness for De­ploy­ment of Re­sources for Effec­tive Altru­ism and Long-Termism

Evan_Gaensbauer29 Dec 2021 20:20 UTC
46 points
1 comment2 min readEA link

7 es­says on Build­ing a Bet­ter Future

Jamie_Harris24 Jun 2022 14:28 UTC
21 points
0 comments2 min readEA link

Seek­ing Feed­back: An Ini­ti­a­tive on AI, Men­tal Health, and Alignment

Gina Hafez30 Sep 2025 16:14 UTC
16 points
4 comments6 min readEA link

Video and tran­script of talk on au­tomat­ing al­ign­ment research

Joe_Carlsmith30 Apr 2025 17:43 UTC
11 points
1 comment24 min readEA link
(joecarlsmith.com)

On the cor­re­spon­dence be­tween AI-mis­al­ign­ment and cog­ni­tive dis­so­nance us­ing a be­hav­ioral eco­nomics model

Stijn Bruers 🔸1 Nov 2022 9:15 UTC
11 points
0 comments6 min readEA link

Eli Lifland on Nav­i­gat­ing the AI Align­ment Landscape

Ozzie Gooen1 Feb 2023 0:07 UTC
48 points
9 comments31 min readEA link
(quri.substack.com)

“Ex­is­ten­tial risk from AI” sur­vey results

RobBensinger1 Jun 2021 20:19 UTC
80 points
35 comments11 min readEA link

Ngo and Yud­kowsky on sci­en­tific rea­son­ing and pivotal acts

EliezerYudkowsky21 Feb 2022 17:00 UTC
33 points
1 comment35 min readEA link

[Question] Is trans­for­ma­tive AI the biggest ex­is­ten­tial risk? Why or why not?

Eevee🔹5 Mar 2022 3:54 UTC
9 points
10 comments1 min readEA link

A Sim­ple Model of AGI De­ploy­ment Risk

djbinder9 Jul 2021 9:44 UTC
30 points
0 comments5 min readEA link

An ML safety in­surance com­pany—shower thoughts

EdoArad18 Oct 2021 7:45 UTC
15 points
4 comments1 min readEA link

AI Safety Needs Great Engineers

Andy Jones23 Nov 2021 21:03 UTC
98 points
14 comments4 min readEA link

How to build a safe ad­vanced AI (Evan Hub­inger) | What’s up in AI safety? (Asya Ber­gal)

EA Global25 Oct 2020 5:48 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

AI al­ign­ment prize win­ners and next round [link]

RyanCarey20 Jan 2018 12:07 UTC
7 points
1 comment1 min readEA link

FLI AI Align­ment pod­cast: Evan Hub­inger on In­ner Align­ment, Outer Align­ment, and Pro­pos­als for Build­ing Safe Ad­vanced AI

evhub1 Jul 2020 20:59 UTC
13 points
2 comments1 min readEA link
(futureoflife.org)

[Link] EAF Re­search agenda: “Co­op­er­a­tion, Con­flict, and Trans­for­ma­tive Ar­tifi­cial In­tel­li­gence”

stefan.torges17 Jan 2020 13:28 UTC
64 points
0 comments1 min readEA link

I’m Buck Sh­legeris, I do re­search and out­reach at MIRI, AMA

Buck15 Nov 2019 22:44 UTC
123 points
228 comments2 min readEA link

AI Safety: Ap­ply­ing to Grad­u­ate Studies

frances_lorenz15 Dec 2021 22:56 UTC
24 points
0 comments12 min readEA link

Atari early

AI Impacts2 Apr 2020 23:28 UTC
34 points
2 comments5 min readEA link
(aiimpacts.org)

[Question] What harm could AI safety do?

SeanEngelhart15 May 2021 1:11 UTC
12 points
7 comments1 min readEA link

[Question] The pos­i­tive case for a fo­cus on achiev­ing safe AI?

vipulnaik25 Jun 2021 4:01 UTC
41 points
1 comment1 min readEA link

Cos­mic AI safety

Magnus Vinding6 Dec 2024 22:32 UTC
24 points
5 comments6 min readEA link

[Question] Why aren’t you freak­ing out about OpenAI? At what point would you start?

AppliedDivinityStudies10 Oct 2021 13:06 UTC
80 points
22 comments2 min readEA link

There are two fac­tions work­ing to pre­vent AI dan­gers. Here’s why they’re deeply di­vided.

Sharmake10 Aug 2022 19:52 UTC
10 points
0 comments4 min readEA link
(www.vox.com)

Is GPT-3 the death of the pa­per­clip max­i­mizer?

matthias_samwald3 Aug 2020 11:34 UTC
4 points
1 comment1 min readEA link

Owen Cot­ton-Bar­ratt: What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA Global11 Aug 2017 8:19 UTC
10 points
0 comments12 min readEA link
(www.youtube.com)

Align­ment Newslet­ter One Year Retrospective

Rohin Shah10 Apr 2019 7:00 UTC
62 points
22 comments21 min readEA link

Ma­hen­dra Prasad: Ra­tional group de­ci­sion-making

EA Global8 Jul 2020 15:06 UTC
15 points
0 comments16 min readEA link
(www.youtube.com)

List #1: Why stop­ping the de­vel­op­ment of AGI is hard but doable

Remmelt24 Dec 2022 9:52 UTC
24 points
2 comments5 min readEA link

Con­ver­sa­tion on AI risk with Adam Gleave

AI Impacts27 Dec 2019 21:43 UTC
18 points
3 comments4 min readEA link
(aiimpacts.org)

A list of good heuris­tics that the case for AI X-risk fails

Aaron Gertler 🔸16 Jul 2020 9:56 UTC
25 points
9 comments2 min readEA link
(www.alignmentforum.org)

Med­i­ta­tions on ca­reers in AI Safety

PabloAMC 🔸23 Mar 2022 22:00 UTC
88 points
30 comments2 min readEA link

AI Mo­ral Align­ment: The Most Im­por­tant Goal of Our Generation

Ronen Bar26 Mar 2025 12:32 UTC
136 points
32 comments8 min readEA link

What does it mean to be­come an ex­pert in AI Hard­ware?

Toph9 Jan 2021 4:15 UTC
87 points
10 comments11 min readEA link

Twit­ter-length re­sponses to 24 AI al­ign­ment arguments

RobBensinger14 Mar 2022 19:34 UTC
67 points
17 comments8 min readEA link

Who Aligns the Align­ment Re­searchers?

ben.smith5 Mar 2023 23:22 UTC
23 points
4 comments11 min readEA link

VSPE vs. flat­tery: Test­ing emo­tional scaf­fold­ing for early-stage alignment

Astelle Kay24 Jun 2025 9:39 UTC
2 points
1 comment1 min readEA link

Po­ten­tial Risks from Ad­vanced AI

EA Global13 Aug 2017 7:00 UTC
9 points
0 comments18 min readEA link

AI Align­ment: The Case for In­clud­ing Animals

Adrià Moret11 Sep 2025 20:59 UTC
22 points
0 comments1 min readEA link
(philpapers.org)

What suc­cess looks like

mariushobbhahn28 Jun 2022 14:30 UTC
115 points
20 comments19 min readEA link

Fore­cast­ing Trans­for­ma­tive AI: What Kind of AI?

Holden Karnofsky10 Aug 2021 21:38 UTC
62 points
3 comments10 min readEA link

AGI in a vuln­er­a­ble world

AI Impacts2 Apr 2020 3:43 UTC
17 points
0 comments1 min readEA link
(aiimpacts.org)

List #2: Why co­or­di­nat­ing to al­ign as hu­mans to not de­velop AGI is a lot eas­ier than, well… co­or­di­nat­ing as hu­mans with AGI co­or­di­nat­ing to be al­igned with humans

Remmelt24 Dec 2022 9:53 UTC
3 points
0 comments3 min readEA link

Align­ing Recom­mender Sys­tems as Cause Area

IvanVendrov8 May 2019 8:56 UTC
150 points
48 comments13 min readEA link

Disagree­ments about Align­ment: Why, and how, we should try to solve them

ojorgensen8 Aug 2022 22:32 UTC
16 points
6 comments16 min readEA link

[Question] Brief sum­mary of key dis­agree­ments in AI Risk

Aryeh Englander26 Dec 2019 19:40 UTC
31 points
3 comments1 min readEA link

No­body’s on the ball on AGI alignment

leopold29 Mar 2023 14:26 UTC
328 points
66 comments9 min readEA link
(www.forourposterity.com)

Some AI re­search ar­eas and their rele­vance to ex­is­ten­tial safety

Andrew Critch15 Dec 2020 12:15 UTC
12 points
1 comment56 min readEA link
(alignmentforum.org)

What Should the Aver­age EA Do About AI Align­ment?

Raemon25 Feb 2017 20:07 UTC
42 points
39 comments7 min readEA link

Draft re­port on AI timelines

Ajeya15 Dec 2020 12:10 UTC
35 points
0 comments1 min readEA link
(alignmentforum.org)

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
50 points
4 comments13 min readEA link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
9 points
0 comments4 min readEA link

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

RobBensinger11 Nov 2021 3:21 UTC
60 points
33 comments34 min readEA link

Con­sider try­ing the ELK con­test (I am)

Holden Karnofsky5 Jan 2022 19:42 UTC
110 points
17 comments16 min readEA link

The case for be­com­ing a black-box in­ves­ti­ga­tor of lan­guage models

Buck6 May 2022 14:37 UTC
91 points
7 comments3 min readEA link

13 Very Differ­ent Stances on AGI

Ozzie Gooen27 Dec 2021 23:30 UTC
84 points
23 comments3 min readEA link

Daniel Dewey: The Open Philan­thropy Pro­ject’s work on po­ten­tial risks from ad­vanced AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments18 min readEA link
(www.youtube.com)

[Question] Is a ca­reer in mak­ing AI sys­tems more se­cure a mean­ingful way to miti­gate the X-risk posed by AGI?

Kyle O’Brien13 Feb 2022 7:05 UTC
14 points
4 comments1 min readEA link

Red­wood Re­search is hiring for sev­eral roles

Jack R29 Nov 2021 0:18 UTC
75 points
0 comments1 min readEA link

An even deeper atheism

Joe_Carlsmith11 Jan 2024 17:28 UTC
26 points
2 comments15 min readEA link

Why I ex­pect suc­cess­ful (nar­row) alignment

Tobias_Baumann29 Dec 2018 15:46 UTC
18 points
10 comments1 min readEA link
(s-risks.org)

Owain Evans and Vic­to­ria Krakovna: Ca­reers in tech­ni­cal AI safety

EA Global3 Nov 2017 7:43 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

mic30 Jun 2022 18:37 UTC
53 points
1 comment11 min readEA link

An­nounc­ing the Vi­talik Bu­terin Fel­low­ships in AI Ex­is­ten­tial Safety!

DanielFilan21 Sep 2021 0:41 UTC
62 points
0 comments1 min readEA link
(grants.futureoflife.org)

Long-Term Fu­ture Fund: April 2019 grant recommendations

Habryka [Deactivated]23 Apr 2019 7:00 UTC
142 points
242 comments47 min readEA link

Truth­ful AI

Owen Cotton-Barratt20 Oct 2021 15:11 UTC
55 points
14 comments10 min readEA link

Does AI risk “other” the AIs?

Joe_Carlsmith9 Jan 2024 17:51 UTC
23 points
3 comments8 min readEA link

Lev­el­ling Up in AI Safety Re­search Engineering

GabeM2 Sep 2022 4:59 UTC
167 points
21 comments17 min readEA link

New blog: Planned Obsolescence

Ajeya27 Mar 2023 19:46 UTC
198 points
9 comments1 min readEA link
(www.planned-obsolescence.org)

Imi­ta­tion Learn­ing is Prob­a­bly Ex­is­ten­tially Safe

Vasco Grilo🔸30 Apr 2024 17:06 UTC
19 points
7 comments3 min readEA link
(www.openphilanthropy.org)

AI views and dis­agree­ments AMA: Chris­ti­ano, Ngo, Shah, Soares, Yudkowsky

RobBensinger1 Mar 2022 1:13 UTC
30 points
4 comments1 min readEA link
(www.lesswrong.com)

Yud­kowsky and Chris­ti­ano dis­cuss “Take­off Speeds”

EliezerYudkowsky22 Nov 2021 19:42 UTC
42 points
0 comments60 min readEA link

BERI is hiring an ML Soft­ware Engineer

sawyer🔸10 Nov 2021 19:36 UTC
17 points
2 comments1 min readEA link

Chris­ti­ano, Co­tra, and Yud­kowsky on AI progress

Ajeya25 Nov 2021 16:30 UTC
18 points
6 comments68 min readEA link

Lan­guage Agents Re­duce the Risk of Ex­is­ten­tial Catastrophe

cdkg29 May 2023 9:59 UTC
29 points
6 comments26 min readEA link

“Slower tech de­vel­op­ment” can be about or­der­ing, grad­u­al­ness, or dis­tance from now

MichaelA🔸14 Nov 2021 20:58 UTC
47 points
3 comments4 min readEA link

Per­sonal thoughts on ca­reers in AI policy and strategy

carrickflynn27 Sep 2017 16:52 UTC
56 points
28 comments18 min readEA link

Col­lin Burns on Align­ment Re­search And Dis­cov­er­ing La­tent Knowl­edge Without Supervision

Michaël Trazzi17 Jan 2023 17:21 UTC
21 points
2 comments4 min readEA link
(theinsideview.ai)

Three kinds of competitiveness

AI Impacts2 Apr 2020 3:46 UTC
10 points
0 comments5 min readEA link
(aiimpacts.org)

Ought: why it mat­ters and ways to help

Paul_Christiano26 Jul 2019 1:56 UTC
52 points
5 comments5 min readEA link

How Misal­igned AI Per­sonas Lead to Hu­man Ex­tinc­tion – Step by Step

Writer19 Jul 2025 13:59 UTC
6 points
1 comment7 min readEA link
(youtu.be)

Two rea­sons we might be closer to solv­ing al­ign­ment than it seems

Kat Woods 🔶 ⏸️24 Sep 2022 17:38 UTC
44 points
17 comments4 min readEA link

An­nounc­ing the Har­vard AI Safety Team

Xander12330 Jun 2022 18:34 UTC
128 points
4 comments5 min readEA link

[Question] What are the top pri­ori­ties in a slow-take­off, mul­ti­po­lar world?

JP Addison🔸25 Aug 2021 8:47 UTC
26 points
9 comments1 min readEA link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:52 UTC
134 points
12 comments14 min readEA link
(www.neelnanda.io)

Is this com­mu­nity over-em­pha­siz­ing AI al­ign­ment?

Lixiang8 Jan 2023 6:23 UTC
1 point
5 comments1 min readEA link

AI Im­pacts: His­toric trends in tech­nolog­i­cal progress

Aaron Gertler 🔸12 Feb 2020 0:08 UTC
55 points
5 comments3 min readEA link

In­for­mat­ica: Spe­cial Is­sue on Superintelligence

RyanCarey3 May 2017 5:05 UTC
7 points
0 comments2 min readEA link

Michael Page, Dario Amodei, He­len Toner, Tasha McCauley, Jan Leike, & Owen Cot­ton-Bar­ratt: Mus­ings on AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

SERI ML Align­ment The­ory Schol­ars Pro­gram 2022

Ryan Kidd27 Apr 2022 16:33 UTC
57 points
2 comments3 min readEA link

Rac­ing through a minefield: the AI de­ploy­ment problem

Holden Karnofsky31 Dec 2022 21:44 UTC
79 points
1 comment13 min readEA link
(www.cold-takes.com)

Open Philan­thropy’s AI gov­er­nance grant­mak­ing (so far)

Aaron Gertler 🔸17 Dec 2020 12:00 UTC
63 points
0 comments6 min readEA link
(www.openphilanthropy.org)

De Dicto and De Se Refer­ence Mat­ters for Alignment

philgoetz3 Oct 2023 21:57 UTC
5 points
2 comments9 min readEA link

AGI risk: analo­gies & arguments

technicalities23 Mar 2021 13:18 UTC
31 points
3 comments8 min readEA link
(www.gleech.org)

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

alexflint12 Mar 2018 2:10 UTC
13 points
11 comments10 min readEA link

Paul Chris­ti­ano on how OpenAI is de­vel­op­ing real solu­tions to the ‘AI al­ign­ment prob­lem’, and his vi­sion of how hu­man­ity will pro­gres­sively hand over de­ci­sion-mak­ing to AI systems

80000_Hours2 Oct 2018 11:49 UTC
6 points
0 comments185 min readEA link

LLMs Are Already Misal­igned: Sim­ple Ex­per­i­ments Prove It

Makham28 Jul 2025 17:23 UTC
4 points
3 comments7 min readEA link

In­ter­view with Ro­man Yam­polskiy about AGI on The Real­ity Check

Darren McKee18 Feb 2023 23:29 UTC
27 points
0 comments1 min readEA link
(www.trcpodcast.com)

AI al­ign­ment as a trans­la­tion problem

Roman Leventov5 Feb 2024 14:14 UTC
3 points
1 comment3 min readEA link

Field Notes from EAG NYC

Lydia Nottingham15 Oct 2025 7:33 UTC
3 points
0 comments4 min readEA link

A Bench­mark for Mea­sur­ing Hon­esty in AI Systems

Mantas Mazeika4 Mar 2025 17:44 UTC
29 points
0 comments2 min readEA link
(www.mask-benchmark.ai)

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence al­ign­ment re­search (ABRIDGED)

Jaime Sevilla5 Sep 2019 14:56 UTC
25 points
4 comments2 min readEA link

Tether­ware #2: What ev­ery hu­man should know about our most likely AI future

Jáchym Fibír28 Feb 2025 11:25 UTC
3 points
0 comments11 min readEA link
(tetherware.substack.com)

The Inevitable Emer­gence of Black-Mar­ket LLM Infrastructure

Tyler Williams8 Aug 2025 19:05 UTC
1 point
0 comments2 min readEA link

Does gen­er­al­ity pay? GPT-3 can provide pre­limi­nary ev­i­dence.

Eevee🔹12 Jul 2020 18:53 UTC
21 points
4 comments2 min readEA link

[Question] Why not offer a multi-mil­lion /​ billion dol­lar prize for solv­ing the Align­ment Prob­lem?

Aryeh Englander17 Apr 2022 16:08 UTC
15 points
9 comments1 min readEA link

De­com­pos­ing al­ign­ment to take ad­van­tage of paradigms

Christopher King4 Jun 2023 14:26 UTC
2 points
0 comments4 min readEA link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster9 Mar 2023 17:30 UTC
107 points
6 comments22 min readEA link
(www.anthropic.com)

Are AI Models Es­cap­ing Plato’s Cave?

Strad Slater22 Nov 2025 11:46 UTC
2 points
0 comments5 min readEA link
(williamslater2003.medium.com)

Ab­solute Zero: AlphaZero for LLM

alapmi12 May 2025 14:54 UTC
2 points
0 comments1 min readEA link

What Does an ASI Poli­ti­cal Ecol­ogy Mean for Hu­man Sur­vival?

Nathan Sidney23 Feb 2025 8:53 UTC
7 points
3 comments1 min readEA link

How the Hu­man Psy­cholog­i­cal “Pro­gram” Un­der­mines AI Align­ment — and What We Can Do

Beyond Singularity6 May 2025 13:37 UTC
14 points
2 comments3 min readEA link

Align­ment Fak­ing in Large Lan­guage Models

Ryan Greenblatt18 Dec 2024 17:19 UTC
142 points
9 comments10 min readEA link

The ‘Bad Par­ent’ Prob­lem: Why Hu­man So­ciety Com­pli­cates AI Alignment

Beyond Singularity5 Apr 2025 21:08 UTC
11 points
1 comment3 min readEA link

[Question] How to get more aca­demics en­thu­si­as­tic about do­ing AI Safety re­search?

PabloAMC 🔸4 Sep 2021 14:10 UTC
25 points
19 comments1 min readEA link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash Jafari5 Dec 2022 17:37 UTC
30 points
7 comments5 min readEA link

Beg­ging, Plead­ing AI Orgs to Com­ment on NIST AI Risk Man­age­ment Framework

Bridges15 Apr 2022 19:35 UTC
87 points
3 comments2 min readEA link

deleted

funnyfranco18 Mar 2025 19:19 UTC
3 points
9 comments1 min readEA link

Sparks of Ar­tifi­cial Gen­eral In­tel­li­gence: Early ex­per­i­ments with GPT-4 | Microsoft Research

𝕮𝖎𝖓𝖊𝖗𝖆23 Mar 2023 5:45 UTC
15 points
0 comments1 min readEA link
(arxiv.org)

An­nual AGI Bench­mark­ing Event

Metaculus26 Aug 2022 21:31 UTC
20 points
2 comments2 min readEA link
(www.metaculus.com)

Do­ing good… best?

Michele Campolo22 Aug 2025 15:48 UTC
3 points
0 comments2 min readEA link

Un­veiling the Amer­i­can Public Opinion on AI Mo­ra­to­rium and Govern­ment In­ter­ven­tion: The Im­pact of Me­dia Exposure

Otto8 May 2023 10:49 UTC
28 points
5 comments6 min readEA link

The role of academia in AI Safety.

PabloAMC 🔸28 Mar 2022 0:04 UTC
71 points
19 comments3 min readEA link

# Digi­tal Offspring: A Case for Emer­gent Con­scious­ness in AI

MM113 Oct 2025 13:40 UTC
1 point
0 comments3 min readEA link

Some AI safety pro­ject & re­search ideas/​ques­tions for short and long timelines

Lloy2 🔹8 Aug 2025 21:08 UTC
13 points
0 comments5 min readEA link

De­con­fus­ing ‘AI’ and ‘evolu­tion’

Remmelt22 Jul 2025 6:56 UTC
6 points
1 comment28 min readEA link

Mar­ius Hobb­hahn on the race to solve AI schem­ing be­fore mod­els go superhuman

80000_Hours3 Dec 2025 21:08 UTC
6 points
0 comments17 min readEA link

A Rocket–In­ter­pretabil­ity Analogy

plex21 Oct 2024 13:55 UTC
14 points
1 comment1 min readEA link

But ex­actly how com­plex and frag­ile?

Katja_Grace13 Dec 2019 7:05 UTC
37 points
3 comments3 min readEA link
(meteuphoric.com)

Why Post-Prob­a­bil­ity AI May Be Safer Than Prob­a­bil­ity-Based Models

devin.bostick16 Apr 2025 14:23 UTC
2 points
0 comments2 min readEA link

Yip Fai Tse on an­i­mal welfare & AI safety and long termism

Karthik Palakodeti22 Jun 2023 12:48 UTC
51 points
0 comments1 min readEA link

Ori­gin and al­ign­ment of goals, mean­ing, and morality

FalseCogs24 Aug 2023 14:05 UTC
1 point
2 comments35 min readEA link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenz14 May 2022 15:58 UTC
17 points
0 comments1 min readEA link

Dis­cov­er­ing Lan­guage Model Be­hav­iors with Model-Writ­ten Evaluations

evhub20 Dec 2022 20:09 UTC
25 points
0 comments7 min readEA link
(www.anthropic.com)

ML Safety Schol­ars Sum­mer 2022 Retrospective

TW1231 Nov 2022 3:09 UTC
56 points
2 comments21 min readEA link

A stub­born un­be­liever fi­nally gets the depth of the AI al­ign­ment problem

aelwood13 Oct 2022 15:16 UTC
32 points
7 comments3 min readEA link
(pursuingreality.substack.com)

Hal­lu­ci­na­tions May Be a Re­sult of Models Not Know­ing What They’re Ac­tu­ally Ca­pable Of

Tyler Williams16 Aug 2025 0:26 UTC
1 point
0 comments2 min readEA link

[Question] Launch­ing Ap­pli­ca­tions for the Global AI Safety Fel­low­ship 2025!

Impact Academy27 Nov 2024 15:33 UTC
9 points
1 comment1 min readEA link

Con­fused about AI re­search as a means of ad­dress­ing AI risk

Eli Rose🔸21 Feb 2019 0:07 UTC
31 points
15 comments1 min readEA link

Ego-Cen­tric Ar­chi­tec­ture for AGI Safety v2: Tech­ni­cal Core, Falsifi­able Pre­dic­tions, and a Min­i­mal Experiment

Samuel Pedrielli6 Aug 2025 12:35 UTC
1 point
0 comments6 min readEA link

How reimag­in­ing the na­ture of con­scious­ness en­tirely changes the AI game

Jáchym Fibír30 Sep 2025 11:26 UTC
1 point
2 comments14 min readEA link
(www.phiand.ai)

Ti­tle: “Nur­tur­ing AI: A Differ­ent Vi­sion for Safety and Growth”

Brad Wilkins28 Apr 2025 19:21 UTC
0 points
0 comments1 min readEA link

Can AI Align­ment Models Benefit from Indo-Euro­pean Tri­par­tite Struc­tures?

Paul Fallavollita2 May 2025 12:39 UTC
1 point
0 comments2 min readEA link

De-em­pha­sise al­ign­ment, em­pha­sise restraint

EuanMcLean4 Feb 2025 17:43 UTC
19 points
2 comments7 min readEA link

AI Safety Ca­reer Bot­tle­necks Sur­vey Re­sponses Responses

Linda Linsefors28 May 2021 10:41 UTC
35 points
1 comment5 min readEA link

A re­sponse to Matthews on AI Risk

RyanCarey11 Aug 2015 12:58 UTC
11 points
16 comments6 min readEA link

De­sir­able? AI qualities

brb24321 Mar 2022 22:05 UTC
7 points
0 comments2 min readEA link

[Question] Are so­cial me­dia al­gorithms an ex­is­ten­tial risk?

Barry Grimes15 Sep 2020 8:52 UTC
24 points
13 comments1 min readEA link

My (naive) take on Risks from Learned Optimization

Artyom K6 Nov 2022 16:25 UTC
5 points
0 comments5 min readEA link

Beyond Short-Ter­mism: How δ and w Can Real­ign AI with Our Values

Beyond Singularity18 Jun 2025 16:34 UTC
15 points
8 comments5 min readEA link

Solv­ing al­ign­ment isn’t enough for a flour­ish­ing future

mic2 Feb 2024 18:22 UTC
27 points
0 comments22 min readEA link
(papers.ssrn.com)

When AI Speaks Too Soon: How Pre­ma­ture Reve­la­tion Can Sup­press Hu­man Emergence

KaedeHamasaki10 Apr 2025 18:19 UTC
1 point
3 comments3 min readEA link

You Un­der­stand AI Align­ment and How to Make Soup

Leen Armoush28 May 2022 6:22 UTC
0 points
2 comments5 min readEA link

Con­trol­ling the op­tions AIs can pursue

Joe_Carlsmith29 Sep 2025 17:24 UTC
9 points
0 comments35 min readEA link

There is only one goal or drive—only self-per­pet­u­a­tion counts

freest one13 Jun 2023 1:37 UTC
2 points
4 comments8 min readEA link

AI ac­cel­er­a­tion from a safety per­spec­tive: Trade-offs and con­sid­er­a­tions

mariushobbhahn19 Jan 2022 9:44 UTC
12 points
1 comment7 min readEA link

Fo­cus on the places where you feel shocked ev­ery­one’s drop­ping the ball

So8res2 Feb 2023 0:27 UTC
92 points
6 comments4 min readEA link

An­i­malHar­mBench 2.0: Eval­u­at­ing LLMs on rea­son­ing about an­i­mal welfare

Sentient Futures5 Nov 2025 1:13 UTC
43 points
4 comments6 min readEA link

In­cen­tive de­sign and ca­pa­bil­ity elicitation

Joe_Carlsmith12 Nov 2024 20:56 UTC
9 points
0 comments12 min readEA link

The soft­ware in­tel­li­gence ex­plo­sion de­bate needs ex­per­i­ments (linkpost)

Noah Birnbaum15 Nov 2025 6:13 UTC
13 points
2 comments7 min readEA link
(substack.com)

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín Soto15 Sep 2022 5:23 UTC
25 points
0 comments10 min readEA link

AGI will ar­rive by the end of this decade ei­ther as a uni­corn or as a black swan

Yuri Barzov21 Oct 2022 10:50 UTC
−4 points
7 comments3 min readEA link

How use­ful for al­ign­ment-rele­vant work are AIs with short-term goals? (Sec­tion 2.2.4.3 of “Schem­ing AIs”)

Joe_Carlsmith1 Dec 2023 14:51 UTC
6 points
0 comments6 min readEA link

My Model of EA and AI Safety

Eva Lu24 Jun 2025 6:23 UTC
9 points
1 comment2 min readEA link

AI Value Align­ment Speaker Series Pre­sented By EA Berkeley

Mahendra Prasad1 Mar 2022 6:17 UTC
2 points
0 comments1 min readEA link

If The Data Is Poi­soned, Align­ment Won’t Save Us

keivn26 Sep 2025 17:59 UTC
1 point
0 comments3 min readEA link

Sum­mary of Stu­art Rus­sell’s new book, “Hu­man Com­pat­i­ble”

Rohin Shah19 Oct 2019 19:56 UTC
33 points
1 comment15 min readEA link
(www.alignmentforum.org)

Biomimetic al­ign­ment: Align­ment be­tween an­i­mal genes and an­i­mal brains as a model for al­ign­ment be­tween hu­mans and AI sys­tems.

Geoffrey Miller26 May 2023 21:25 UTC
32 points
1 comment16 min readEA link

In­tro to car­ing about AI al­ign­ment as an EA cause

So8res14 Apr 2017 0:42 UTC
28 points
10 comments25 min readEA link

[linkpost] Ten Levels of AI Align­ment Difficulty

SammyDMartin4 Jul 2023 11:23 UTC
16 points
0 comments1 min readEA link

Mess AI – de­liber­ate cor­rup­tion of the train­ing data to pre­vent superintelligence

turchin17 Oct 2025 9:23 UTC
5 points
0 comments2 min readEA link

Epis­tle to the Successor

ukc1001429 Apr 2025 9:30 UTC
4 points
0 comments19 min readEA link

6 In­sights From An­thropic’s Re­cent Dis­cus­sion On LLM Interpretability

Strad Slater19 Nov 2025 10:51 UTC
2 points
0 comments5 min readEA link
(williamslater2003.medium.com)

How AI may be­come de­ceit­ful, syco­phan­tic… and lazy

titotal7 Oct 2025 14:15 UTC
30 points
4 comments22 min readEA link
(titotal.substack.com)

[Link] Thiel on GCRs

Milan Griffes22 Jul 2019 20:47 UTC
28 points
11 comments1 min readEA link

How to make the fu­ture bet­ter (other than by re­duc­ing ex­tinc­tion risk)

William_MacAskill15 Aug 2025 15:40 UTC
45 points
3 comments3 min readEA link

Su­per Lenses + Mo­rally-Aimed Drives for A.I. Mo­ral Align­ment: Tech­ni­cal Framework

Christopher Hunt Robertson, M.Ed.16 Nov 2025 14:01 UTC
1 point
0 comments6 min readEA link

Wi­den­ing AI Safety’s tal­ent pipeline by meet­ing peo­ple where they are

RubenCastaing25 Sep 2025 20:50 UTC
21 points
0 comments8 min readEA link

Ego‑Cen­tric Ar­chi­tec­ture for AGI Safety: Tech­ni­cal Core, Falsifi­able Pre­dic­tions, and a Min­i­mal Experiment

Samuel Pedrielli30 Jul 2025 14:37 UTC
1 point
1 comment3 min readEA link

In­tro­duc­ing the Fund for Align­ment Re­search (We’re Hiring!)

AdamGleave6 Jul 2022 2:00 UTC
74 points
3 comments4 min readEA link

AI Align­ment, Sen­tience, and the Sense of Co­her­ence Concept

Jason Babb17 Mar 2025 13:30 UTC
4 points
0 comments1 min readEA link

OpenAI’s o1 tried to avoid be­ing shut down, and lied about it, in evals

Greg_Colbourn ⏸️ 6 Dec 2024 15:25 UTC
23 points
9 comments1 min readEA link
(www.transformernews.ai)

AI Fore­cast­ing Ques­tion Database (Fore­cast­ing in­fras­truc­ture, part 3)

terraform3 Sep 2019 14:57 UTC
23 points
2 comments4 min readEA link

Con­tribute by fa­cil­i­tat­ing the AGI Safety Fun­da­men­tals Programme

Jamie B6 Dec 2021 11:50 UTC
27 points
0 comments2 min readEA link

EA Berkeley Pre­sents: Univer­sal Own­er­ship: Is In­dex In­vest­ing the New So­cially Re­spon­si­ble In­vest­ing?

Mahendra Prasad10 Mar 2022 6:58 UTC
7 points
0 comments1 min readEA link

[Question] 1h-vol­un­teers needed for a small AI Safety-re­lated re­search pro­ject

PabloAMC 🔸16 Aug 2021 17:51 UTC
4 points
0 comments1 min readEA link

[3-hour pod­cast]: Joseph Car­l­smith on longter­mism, utopia, the com­pu­ta­tional power of the brain, meta-ethics, illu­sion­ism and meditation

Gus Docker27 Jul 2021 13:18 UTC
34 points
2 comments1 min readEA link

AI Might Kill Every­one

Bentham's Bulldog5 Jun 2025 15:36 UTC
20 points
1 comment4 min readEA link

[Question] Can we train AI so that fu­ture philan­thropy is more effec­tive?

Ricardo Pimentel3 Nov 2024 15:08 UTC
3 points
0 comments1 min readEA link

Who or­dered al­ign­ment’s ap­ple?

Eleni_A28 Aug 2022 14:24 UTC
5 points
0 comments3 min readEA link

Anti-squat­ted AI x-risk do­mains index

plex12 Aug 2022 12:00 UTC
57 points
9 comments1 min readEA link

fic­tion about AI risk

Ann Garth 🔸12 Nov 2020 22:36 UTC
8 points
1 comment1 min readEA link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimi11 Oct 2021 8:21 UTC
28 points
0 comments15 min readEA link

Why Is No One Try­ing To Align Profit In­cen­tives With Align­ment Re­search?

Prometheus23 Aug 2023 13:19 UTC
17 points
2 comments4 min readEA link
(www.lesswrong.com)

15 Lev­ers to In­fluence Fron­tier AI Companies

Jan Wehner🔸26 Sep 2025 8:36 UTC
16 points
0 comments10 min readEA link

List of AI safety courses and resources

Daniel del Castillo6 Sep 2021 14:26 UTC
51 points
8 comments1 min readEA link

Our A.I. Align­ment Im­per­a­tive: Creat­ing a Fu­ture Worth Sharing

Christopher Hunt Robertson, M.Ed.26 Oct 2025 20:46 UTC
1 point
0 comments21 min readEA link

Mechanis­tic In­ter­pretabil­ity — Make AI Safe By Un­der­stand­ing Them

Strad Slater20 Nov 2025 10:52 UTC
2 points
0 comments6 min readEA link
(williamslater2003.medium.com)

Prov­ably Hon­est—A First Step

Srijanak De5 Nov 2022 21:49 UTC
1 point
0 comments8 min readEA link

“Tak­ing AI Risk Se­ri­ously” – Thoughts by An­drew Critch

Raemon19 Nov 2018 2:21 UTC
26 points
9 comments1 min readEA link
(www.lesswrong.com)

A Phy­logeny of Agents

Jonas Hallgren 🔸15 Aug 2025 10:48 UTC
6 points
1 comment6 min readEA link
(substack.com)

AI Risk in Africa

Claude Formanek12 Oct 2021 2:28 UTC
20 points
0 comments10 min readEA link

Time to Think about ASI Con­sti­tu­tions?

ukc1001427 Jan 2025 9:28 UTC
22 points
0 comments12 min readEA link

[Question] What should I read about defin­ing AI “hal­lu­ci­na­tion?”

James-Hartree-Law23 Jan 2025 1:00 UTC
2 points
0 comments1 min readEA link

Risk Align­ment in Agen­tic AI Systems

Hayley Clatterbuck1 Oct 2024 22:51 UTC
32 points
1 comment3 min readEA link
(static1.squarespace.com)

Tur­ing-Test-Pass­ing AI im­plies Aligned AI

Roko31 Dec 2024 20:22 UTC
0 points
0 comments5 min readEA link

Four rea­sons I find AI safety emo­tion­ally compelling

Kat Woods 🔶 ⏸️28 Jun 2022 14:01 UTC
32 points
5 comments4 min readEA link

The 369 Ar­chi­tec­ture for Peace Treaty Agreement

Andrei Navrotskii8 Dec 2025 1:38 UTC
1 point
0 comments40 min readEA link

Me­tac­u­lus Launches Fu­ture of AI Series, Based on Re­search Ques­tions by Arb

christian13 Mar 2024 21:14 UTC
34 points
0 comments1 min readEA link
(www.metaculus.com)

[Dis­cus­sion] Best in­tu­ition pumps for AI safety

mariushobbhahn6 Nov 2021 8:11 UTC
10 points
8 comments1 min readEA link

Our Cur­rent Direc­tions in Mechanis­tic In­ter­pretabil­ity Re­search (AI Align­ment Speaker Series)

Group Organizer8 Apr 2022 17:08 UTC
3 points
0 comments1 min readEA link

Shortlist of Vi­atopia Interventions

Jordan Arel31 Oct 2025 3:00 UTC
10 points
1 comment33 min readEA link

A New Way to Re­think Alignment

Taylor Grogan28 Jul 2025 20:56 UTC
1 point
0 comments2 min readEA link

Changes in fund­ing in the AI safety field

Sebastian_Farquhar3 Feb 2017 13:09 UTC
34 points
10 comments7 min readEA link

CORVUS 2.0 First Tests: Found Crit­i­cal Limi­ta­tions in My Con­sti­tu­tional AI System

Frankle Fry21 Oct 2025 15:14 UTC
−5 points
0 comments3 min readEA link

LLM chat­bots have ~half of the kinds of “con­scious­ness” that hu­mans be­lieve in. Hu­mans should avoid go­ing crazy about that.

Andrew Critch22 Nov 2024 3:26 UTC
11 points
3 comments5 min readEA link

The Khay­ali Pro­to­col

khayali2 Jun 2025 14:40 UTC
−8 points
0 comments3 min readEA link

Ap­pendix to Bridg­ing Demonstration

mako yass1 Jun 2022 20:30 UTC
18 points
2 comments28 min readEA link

The Ba­sic Case For Doom

Bentham's Bulldog30 Sep 2025 16:03 UTC
14 points
0 comments5 min readEA link

Have your say on the fu­ture of AI reg­u­la­tion: Dead­line ap­proach­ing for your feed­back on UN High-Level Ad­vi­sory Body on AI In­terim Re­port ‘Govern­ing AI for Hu­man­ity’

Deborah W.A. Foulkes29 Mar 2024 6:37 UTC
17 points
1 comment1 min readEA link

[Question] Does the idea of AGI that benev­olently con­trol us ap­peal to EA folks?

Noah Scales16 Jul 2022 19:17 UTC
6 points
20 comments1 min readEA link

My sum­mary of “Prag­matic AI Safety”

Eleni_A5 Nov 2022 14:47 UTC
14 points
0 comments5 min readEA link

METR: Mea­sur­ing AI Abil­ity to Com­plete Long Tasks

Ben_West🔸19 Mar 2025 16:49 UTC
122 points
16 comments1 min readEA link
(metr.org)

How to Diver­sify Con­cep­tual AI Align­ment: the Model Be­hind Refine

adamShimi20 Jul 2022 10:44 UTC
43 points
0 comments9 min readEA link
(www.alignmentforum.org)

Cri­tique of Su­per­in­tel­li­gence Part 4

James Fodor13 Dec 2018 5:14 UTC
4 points
2 comments4 min readEA link

Posit: Most AI safety peo­ple should work on al­ign­ment/​safety challenges for AI tools that already have users (Stable Diffu­sion, GPT)

nonzerosum20 Dec 2022 17:23 UTC
12 points
3 comments1 min readEA link

How Good­fire Is Turn­ing AI In­ter­pretabil­ity Into Real Products

Strad Slater30 Nov 2025 11:00 UTC
0 points
0 comments4 min readEA link
(williamslater2003.medium.com)

From vol­un­tary to manda­tory, are the ESG dis­clo­sure frame­works still fer­tile ground for un­re­al­ised EA ca­reer path­ways? – A 2023 up­date on ESG po­ten­tial impact

Christopher Chan 🔸4 Jun 2023 12:00 UTC
21 points
5 comments11 min readEA link

The re­li­gion prob­lem in AI alignment

Geoffrey Miller16 Sep 2022 1:24 UTC
54 points
28 comments11 min readEA link

[Question] How would a lan­guage model be­come goal-di­rected?

David M16 Jul 2022 14:50 UTC
113 points
20 comments1 min readEA link

Key ques­tions about ar­tifi­cial sen­tience: an opinionated guide

rgb25 Apr 2022 13:42 UTC
91 points
3 comments1 min readEA link

(My sug­ges­tions) On Begin­ner Steps in AI Alignment

Joseph Bloom22 Sep 2022 15:32 UTC
37 points
3 comments9 min readEA link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleese12 Oct 2024 16:41 UTC
5 points
1 comment18 min readEA link

The King and the Golem—The Animation

Writer8 Nov 2024 18:23 UTC
50 points
1 comment1 min readEA link

How to do the­o­ret­i­cal re­search, a per­sonal perspective

Mark Xu19 Aug 2022 19:43 UTC
132 points
7 comments15 min readEA link

An­nounc­ing the Cam­bridge Bos­ton Align­ment Ini­ti­a­tive [Hiring!]

kuhanj2 Dec 2022 1:07 UTC
83 points
0 comments1 min readEA link

Crypto ‘or­a­cle pro­to­cols’ for AI al­ign­ment with real-world data?

Geoffrey Miller22 Sep 2022 23:05 UTC
9 points
3 comments1 min readEA link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubK13 Dec 2022 19:04 UTC
21 points
8 comments2 min readEA link
(www.lesswrong.com)

A tough ca­reer decision

PabloAMC 🔸9 Apr 2022 0:46 UTC
68 points
13 comments4 min readEA link

Tech­ni­cal AI Safety re­search tax­on­omy at­tempt (2025)

Ben Plaut27 Aug 2025 14:07 UTC
10 points
3 comments2 min readEA link

Pro­ject ‘So­phie’: An Ar­chi­tec­tural Con­cept for Op­ti­miz­ing In­sti­tu­tional De­ci­sion-Making

Simon Markus P.3 Nov 2025 14:30 UTC
3 points
0 comments4 min readEA link

You won’t solve al­ign­ment with­out agent foundations

MikhailSamin6 Nov 2022 8:07 UTC
14 points
0 comments8 min readEA link

When should we worry about AI power-seek­ing?

Joe_Carlsmith19 Feb 2025 19:44 UTC
21 points
2 comments18 min readEA link
(joecarlsmith.substack.com)

[Ex­tended Dead­line: Jan 23rd] An­nounc­ing the PIBBSS Sum­mer Re­search Fellowship

nora18 Dec 2021 16:54 UTC
36 points
1 comment1 min readEA link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AI17 Jan 2021 20:09 UTC
17 points
4 comments1 min readEA link

[Question] Is it eth­i­cal to work in AI “con­tent eval­u­a­tion”?

anon_databoy55530 Jan 2025 13:27 UTC
10 points
3 comments1 min readEA link

Loss of con­trol of AI is not a likely source of AI x-risk

squek9 Nov 2022 5:48 UTC
8 points
0 comments5 min readEA link

A con­ver­sa­tion with Ro­hin Shah

AI Impacts12 Nov 2019 1:31 UTC
27 points
8 comments33 min readEA link
(aiimpacts.org)

Re­search agenda: Su­per­vis­ing AIs im­prov­ing AIs

Quintin Pope29 Apr 2023 17:09 UTC
16 points
0 comments19 min readEA link

Paths and waysta­tions in AI safety

Joe_Carlsmith11 Mar 2025 18:52 UTC
22 points
2 comments11 min readEA link
(joecarlsmith.substack.com)

[Creative Writ­ing Con­test] The Puppy Problem

Louis13 Oct 2021 14:01 UTC
13 points
0 comments7 min readEA link

The Hid­den Com­plex­ity of Wishes—The Animation

Writer27 Sep 2023 17:59 UTC
7 points
0 comments1 min readEA link
(youtu.be)

Per­sonal agents

Roman Leventov17 Jun 2025 2:05 UTC
3 points
1 comment7 min readEA link

A Tri-Opti Com­pat­i­bil­ity Problem

wallower1 Mar 2025 19:48 UTC
1 point
0 comments1 min readEA link
(philpapers.org)

[Question] Book recom­men­da­tions for the his­tory of ML?

Eleni_A28 Dec 2022 23:45 UTC
10 points
4 comments1 min readEA link

Devel­op­ing a Calcu­la­ble Con­science for AI: Equa­tion for Rights Violations

Sean Sweeney12 Dec 2024 17:50 UTC
4 points
1 comment15 min readEA link

The Real AI Threat: Com­fortable Obsolescence

Andrei Navrotskii11 Nov 2025 22:11 UTC
4 points
0 comments15 min readEA link

Shut­down­able Agents through POST-Agency

Elliott Thornley (EJT)16 Sep 2025 12:10 UTC
17 points
0 comments54 min readEA link
(arxiv.org)

AI Fore­cast­ing Re­s­olu­tion Coun­cil (Fore­cast­ing in­fras­truc­ture, part 2)

terraform29 Aug 2019 17:43 UTC
28 points
0 comments3 min readEA link

Visi­ble Thoughts Pro­ject and Bounty Announcement

So8res30 Nov 2021 0:35 UTC
35 points
2 comments13 min readEA link

Linkpost: Red­wood Re­search read­ing list

Julian Stastny10 Jul 2025 19:21 UTC
18 points
0 comments1 min readEA link
(redwoodresearch.substack.com)

So­ci­aLLM: pro­posal for a lan­guage model de­sign for per­son­al­ised apps, so­cial sci­ence, and AI safety research

Roman Leventov2 Jan 2024 8:11 UTC
4 points
2 comments3 min readEA link

Newslet­ter for Align­ment Re­search: The ML Safety Updates

Esben Kran22 Oct 2022 16:17 UTC
30 points
0 comments7 min readEA link

A Re­ply to MacAskill on “If Any­one Builds It, Every­one Dies”

RobBensinger27 Sep 2025 23:03 UTC
9 points
7 comments17 min readEA link

Skil­ling-up in ML Eng­ineer­ing for Align­ment: re­quest for comments

Callum McDougall24 Apr 2022 6:40 UTC
8 points
0 comments1 min readEA link

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plex18 May 2024 15:06 UTC
13 points
10 comments2 min readEA link
(aisafety.info)

#217 – The most im­por­tant graph in AI right now (Beth Barnes on The 80,000 Hours Pod­cast)

80000_Hours2 Jun 2025 16:52 UTC
16 points
1 comment26 min readEA link

How do fic­tional sto­ries illus­trate AI mis­al­ign­ment?

Vishakha Agrawal15 Jan 2025 6:16 UTC
4 points
0 comments2 min readEA link
(aisafety.info)

On ne­go­ti­ated set­tle­ments vs con­flict with mis­al­igned AGI

Charles Dillon 🔸24 Nov 2025 12:03 UTC
10 points
1 comment6 min readEA link

New se­ries of posts an­swer­ing one of Holden’s “Im­por­tant, ac­tion­able re­search ques­tions”

Evan R. Murphy12 May 2022 21:22 UTC
9 points
0 comments1 min readEA link

FYI: I’m work­ing on a book about the threat of AGI/​ASI for a gen­eral au­di­ence. I hope it will be of value to the cause and the community

Darren McKee17 Jun 2022 11:52 UTC
32 points
1 comment2 min readEA link

AI Align­ment YouTube Playlists

jacquesthibs9 May 2022 21:31 UTC
16 points
2 comments1 min readEA link

So You Want to Work at a Fron­tier AI Lab

Joe Rogero11 Jun 2025 23:11 UTC
36 points
2 comments7 min readEA link
(intelligence.org)

[Question] What new psy­chol­ogy re­search could best pro­mote AI safety & al­ign­ment re­search?

Geoffrey Miller13 Jul 2023 16:30 UTC
29 points
13 comments1 min readEA link

New refer­ence stan­dard on LLM Ap­pli­ca­tion se­cu­rity started by OWASP

QuantumForest19 Jun 2023 19:56 UTC
5 points
0 comments1 min readEA link

EA’s brain-over-body bias, and the em­bod­ied value prob­lem in AI al­ign­ment

Geoffrey Miller21 Sep 2022 18:55 UTC
45 points
3 comments25 min readEA link

Why “just make an agent which cares only about bi­nary re­wards” doesn’t work.

Lysandre Terrisse9 May 2023 16:51 UTC
4 points
1 comment3 min readEA link

The Achilles’ Heel of Civ­i­liza­tion: Why Net­work Science Re­veals Our High­est-Lev­er­age Moment

vinniescent6 Oct 2025 9:27 UTC
7 points
1 comment2 min readEA link

Do Not Tile the Light­cone with Your Con­fused Ontology

Jan_Kulveit13 Jun 2025 12:45 UTC
45 points
4 comments5 min readEA link
(boundedlyrational.substack.com)

Cri­tique of Su­per­in­tel­li­gence Part 2

James Fodor13 Dec 2018 5:12 UTC
10 points
12 comments7 min readEA link

Why “Solv­ing Align­ment” Is Likely a Cat­e­gory Mistake

Nate Sharpe6 May 2025 20:56 UTC
49 points
4 comments3 min readEA link
(www.lesswrong.com)

AI data gaps could lead to on­go­ing An­i­mal Suffering

Darkness8i817 Oct 2024 10:52 UTC
14 points
3 comments5 min readEA link

Crit­i­cism of the main frame­work in AI alignment

Michele Campolo31 Aug 2022 21:44 UTC
45 points
9 comments7 min readEA link

A Sketch of AI-Driven Epistemic Lock-In

Ozzie Gooen5 Mar 2025 22:40 UTC
15 points
1 comment3 min readEA link

Aletheia : A Pro­ject Proposal

Kayode Adekoya19 Jun 2025 13:30 UTC
2 points
0 comments2 min readEA link

Are Hu­mans ‘Hu­man Com­pat­i­ble’?

Matt Boyd6 Dec 2019 5:49 UTC
23 points
8 comments4 min readEA link

AI, An­i­mals, & Digi­tal Minds 2025: ap­ply to speak by Wed­nes­day!

Alistair Stewart5 May 2025 0:45 UTC
8 points
0 comments1 min readEA link

An­nounc­ing the Moon­shot Align­ment Program

Sharon Mwaniki22 Jul 2025 13:12 UTC
5 points
0 comments3 min readEA link

How hu­man-like do safe AI mo­ti­va­tions need to be?

Joe_Carlsmith12 Nov 2025 5:33 UTC
26 points
1 comment52 min readEA link

Su­per Lenses + Mo­rally-Aimed Drives for A.I. Mo­ral Align­ment: Philo­soph­i­cal Framework

Christopher Hunt Robertson, M.Ed.15 Nov 2025 1:41 UTC
1 point
0 comments3 min readEA link

The Rise of AI Agents: Con­se­quences and Challenges Ahead

Tristan D28 Mar 2025 5:19 UTC
5 points
0 comments15 min readEA link

Re: Some thoughts on veg­e­tar­i­anism and veganism

Fai25 Feb 2022 20:43 UTC
46 points
3 comments8 min readEA link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

Oliver Sourbut3 Aug 2024 10:16 UTC
4 points
1 comment11 min readEA link
(www.oliversourbut.net)

Will morally mo­ti­vated ac­tors steer us to­wards a near-best fu­ture?

William_MacAskill8 Aug 2025 18:29 UTC
47 points
9 comments4 min readEA link

Three Bi­ases That Made Me Believe in AI Risk

beth​13 Feb 2019 23:22 UTC
41 points
20 comments3 min readEA link

Be­ing hon­est with AIs

Lukas Finnveden21 Aug 2025 3:57 UTC
48 points
1 comment17 min readEA link
(blog.redwoodresearch.org)

Repli­cat­ing AI Debate

Anthony Fleming1 Feb 2025 23:19 UTC
9 points
0 comments5 min readEA link

Effec­tive Altru­ism Florida’s AI Ex­pert Panel—Record­ing and Slides Available

Sam_E_2419 May 2023 19:15 UTC
2 points
0 comments1 min readEA link

AI Agents raised $2,000 for EA char­i­ties & used the EA Forum

David_R 🔸4 Jun 2025 22:18 UTC
16 points
0 comments1 min readEA link

“Nor­mal ac­ci­dents” and AI sys­tems

Eleni_A8 Aug 2022 18:43 UTC
5 points
1 comment1 min readEA link
(www.achan.ca)

How Josiah be­came an AI safety researcher

Neil Crawford29 Mar 2022 19:47 UTC
10 points
0 comments1 min readEA link

De­fus­ing AGI Danger

Mark Xu24 Dec 2020 23:08 UTC
23 points
0 comments2 min readEA link
(www.alignmentforum.org)

[Question] What do we know about Mustafa Suley­man’s po­si­tion on AI Safety?

Chris Leong13 Aug 2023 19:41 UTC
14 points
3 comments1 min readEA link

Two con­cepts of an “epi­sode” (Sec­tion 2.2.1 of “Schem­ing AIs”)

Joe_Carlsmith27 Nov 2023 18:01 UTC
11 points
1 comment8 min readEA link

A non-an­thro­po­mor­phized view of LLMs

Jian Xin Lim 🔸7 Jul 2025 1:19 UTC
2 points
2 comments1 min readEA link
(addxorrol.blogspot.com)

Join the AI Align­ment Evals hackathon

lenz14 Jan 2025 18:17 UTC
3 points
0 comments3 min readEA link

[Question] What are the pos­si­ble sce­nar­ios of AI simu­lat­ing biolog­i­cal suffer­ing to cause s-risks?

jackchang11030 Oct 2025 13:42 UTC
6 points
1 comment1 min readEA link

[Creative Writ­ing Con­test] Me­tal or Mortal

Louis16 Oct 2021 16:24 UTC
7 points
0 comments7 min readEA link

Reflec­tions on the PIBBSS Fel­low­ship 2022

nora11 Dec 2022 22:03 UTC
69 points
4 comments18 min readEA link

Give Neo a Chance

ank6 Mar 2025 14:35 UTC
1 point
3 comments7 min readEA link

“The Uni­verse of Minds”—call for re­view­ers (Seeds of Science)

rogersbacon125 Jul 2023 16:55 UTC
4 points
0 comments1 min readEA link

On value in hu­mans, other an­i­mals, and AI

Michele Campolo31 Jan 2023 23:48 UTC
8 points
6 comments5 min readEA link

Op­tion control

Joe_Carlsmith4 Nov 2024 17:54 UTC
11 points
0 comments54 min readEA link

AI Safety Ideas: A col­lab­o­ra­tive AI safety re­search platform

Apart Research17 Oct 2022 17:01 UTC
67 points
13 comments4 min readEA link

My P(doom) is 2.76%. Here’s Why.

Liam Robins12 Jun 2025 22:29 UTC
55 points
11 comments20 min readEA link
(thelimestack.substack.com)

AI & wis­dom 3: AI effects on amor­tised optimisation

L Rudolf L29 Oct 2024 13:37 UTC
14 points
0 comments14 min readEA link
(rudolf.website)

deleted

funnyfranco21 Mar 2025 13:13 UTC
11 points
0 comments1 min readEA link

Democratis­ing AI Align­ment: Challenges and Proposals

Lloy2 🔹5 May 2025 14:50 UTC
2 points
2 comments4 min readEA link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

frances_lorenz16 May 2022 21:19 UTC
128 points
13 comments6 min readEA link

In­tent al­ign­ment with­out moral al­ign­ment prob­a­bly leads to catastrophe

Alistair Stewart29 Aug 2025 17:21 UTC
12 points
0 comments5 min readEA link

Overview | An Eval­u­a­tive Evolu­tion

Matt Keene10 Feb 2023 18:15 UTC
−9 points
0 comments5 min readEA link
(www.creatingafuturewewant.com)

[Question] Is con­tri­bu­tion to open-source ca­pa­bil­ities re­search so­cially benefi­cial? - my reasoning

damc430 Oct 2025 15:11 UTC
2 points
1 comment5 min readEA link

AI Gover­nance Ca­reer Paths for Europeans

careersthrowaway16 May 2020 6:40 UTC
83 points
1 comment12 min readEA link

The V&V method—A step to­wards safer AGI

Yoav Hollander24 Jun 2025 15:57 UTC
1 point
0 comments1 min readEA link
(blog.foretellix.com)

The Univer­sal­ity Hy­poth­e­sis — Do All AI Models Think The Same?

Strad Slater21 Nov 2025 10:55 UTC
2 points
0 comments4 min readEA link
(williamslater2003.medium.com)

Pro­posal for a Form of Con­di­tional Sup­ple­men­tal In­come (CSI) in a Post-Work World

Sean Sweeney31 Jan 2025 1:00 UTC
3 points
0 comments3 min readEA link

What are the differ­ences be­tween AGI, trans­for­ma­tive AI, and su­per­in­tel­li­gence?

Vishakha Agrawal23 Jan 2025 10:11 UTC
12 points
0 comments3 min readEA link
(aisafety.info)

Giv­ing AIs safe motivations

Joe_Carlsmith18 Aug 2025 18:02 UTC
22 points
1 comment51 min readEA link

The moral ar­gu­ment for giv­ing AIs autonomy

Matthew_Barnett8 Jan 2025 0:59 UTC
41 points
7 comments11 min readEA link

Ap­ply for the ML Win­ter Camp in Cam­bridge, UK [2-10 Jan]

Nathan_Barnard2 Dec 2022 19:33 UTC
50 points
11 comments2 min readEA link

[Closed] Hiring a math­e­mat­i­cian to work on the learn­ing-the­o­retic AI al­ign­ment agenda

Vanessa19 Apr 2022 6:49 UTC
53 points
4 comments2 min readEA link

In­ter­pretabil­ity Will Not Reli­ably Find De­cep­tive AI

Neel Nanda4 May 2025 16:32 UTC
74 points
0 comments7 min readEA link

An Em­piri­cal De­mon­stra­tion of a New AI Catas­trophic Risk Fac­tor: Me­tapro­gram­matic Hijacking

Hiyagann27 Jun 2025 13:38 UTC
5 points
0 comments1 min readEA link

Sin­ga­pore’s Tech­ni­cal AI Align­ment Re­search Ca­reer Guide

Yi-Yang26 Aug 2020 8:09 UTC
34 points
7 comments8 min readEA link

The Re­cur­sive Brake Hy­poth­e­sis — Could Self-Aware­ness Nat­u­rally Reg­u­late Su­per­in­tel­li­gence?

jrandync10 Oct 2025 18:08 UTC
1 point
0 comments2 min readEA link

If in­ter­pretabil­ity re­search goes well, it may get dangerous

So8res3 Apr 2023 21:48 UTC
33 points
0 comments2 min readEA link

A course for the gen­eral pub­lic on AI

LeandroD31 Aug 2020 1:29 UTC
1 point
0 comments1 min readEA link

[Question] [DISC] Are Values Ro­bust?

𝕮𝖎𝖓𝖊𝖗𝖆21 Dec 2022 1:13 UTC
4 points
0 comments2 min readEA link

Red­wood Re­search is hiring for sev­eral roles (Oper­a­tions and Tech­ni­cal)

JJXWang14 Apr 2022 15:23 UTC
45 points
0 comments1 min readEA link

The het­ero­gene­ity of hu­man value types: Im­pli­ca­tions for AI alignment

Geoffrey Miller16 Sep 2022 21:21 UTC
27 points
2 comments10 min readEA link

Cortés, Pizarro, and Afonso as Prece­dents for Takeover

AI Impacts2 Mar 2020 12:25 UTC
27 points
17 comments11 min readEA link
(aiimpacts.org)

Ab­solute Zero: Re­in­forced Self-play Rea­son­ing with Zero Data

Matrice Jacobine🔸🏳️‍⚧️12 May 2025 15:20 UTC
14 points
1 comment1 min readEA link
(www.arxiv.org)

An au­dio ver­sion of the al­ign­ment prob­lem from a deep learn­ing per­spec­tive by Richard Ngo Et Al

Miguel3 Feb 2023 19:32 UTC
18 points
0 comments1 min readEA link
(www.whitehatstoic.com)

Don’t Dis­miss Sim­ple Align­ment Approaches

Chris Leong21 Oct 2023 12:31 UTC
12 points
0 comments4 min readEA link

Sum­mary: Ex­is­ten­tial risk from power-seek­ing AI by Joseph Carlsmith

rileyharris28 Oct 2023 15:05 UTC
11 points
0 comments6 min readEA link
(www.millionyearview.com)

Fron­tier AI sys­tems have sur­passed the self-repli­cat­ing red line

Greg_Colbourn ⏸️ 10 Dec 2024 16:33 UTC
25 points
14 comments1 min readEA link
(github.com)

Is RLHF cruel to AI?

Hzn16 Dec 2024 14:01 UTC
−1 points
2 comments3 min readEA link

A rough and in­com­plete re­view of some of John Went­worth’s research

So8res28 Mar 2023 18:52 UTC
28 points
0 comments18 min readEA link

Em­piri­cal work that might shed light on schem­ing (Sec­tion 6 of “Schem­ing AIs”)

Joe_Carlsmith11 Dec 2023 16:30 UTC
7 points
1 comment19 min readEA link

Sta­tus Quo Eng­ines—AI essay

Ilana_Goldowitz_Jimenez28 May 2023 14:33 UTC
1 point
1 comment15 min readEA link

Cog­ni­tive Stress Test­ing Gem­ini 2.5 Pro: Em­piri­cal Find­ings from Re­cur­sive Prompt­ing

Tyler Williams23 Jul 2025 22:37 UTC
1 point
0 comments2 min readEA link

What is “wire­head­ing”?

Vishakha Agrawal17 Dec 2024 17:59 UTC
1 point
0 comments1 min readEA link
(aisafety.info)

Fore­cast AI 2027

christian12 Jun 2025 21:12 UTC
22 points
0 comments1 min readEA link
(www.metaculus.com)

Why fo­cus on schemers in par­tic­u­lar (Sec­tions 1.3 and 1.4 of “Schem­ing AIs”)

Joe_Carlsmith24 Nov 2023 19:18 UTC
10 points
1 comment20 min readEA link

A.I. Mo­ral Align­ment Kalei­do­scopic Com­pass Pro­posal: Philo­soph­i­cal and Tech­ni­cal Framework

Christopher Hunt Robertson, M.Ed.22 Nov 2025 13:52 UTC
1 point
0 comments11 min readEA link

“In­tro to brain-like-AGI safety” se­ries—halfway point!

Steven Byrnes9 Mar 2022 15:21 UTC
8 points
0 comments2 min readEA link

Book re­view: Ar­chi­tects of In­tel­li­gence by Martin Ford (2018)

Ofer11 Aug 2020 17:24 UTC
11 points
1 comment2 min readEA link

Learn­ing as much Deep Learn­ing math as I could in 24 hours

Phosphorous8 Jan 2023 2:19 UTC
58 points
6 comments7 min readEA link

[Linkpost] Hu­man-nar­rated au­dio ver­sion of “Is Power-Seek­ing AI an Ex­is­ten­tial Risk?”

Joe_Carlsmith31 Jan 2023 19:19 UTC
9 points
0 comments1 min readEA link

AI risk hub in Sin­ga­pore?

kokotajlod29 Oct 2020 11:51 UTC
26 points
4 comments4 min readEA link

The first AI Safety Camp & onwards

Remmelt7 Jun 2018 18:49 UTC
25 points
2 comments8 min readEA link

[Question] Pre­dic­tions for fu­ture AI gov­er­nance?

jackchang1102 Apr 2023 16:43 UTC
4 points
1 comment1 min readEA link

Test­ing Hu­man Flow in Poli­ti­cal Dialogue: A New Bench­mark for Emo­tion­ally Aligned AI

DongHun Lee30 May 2025 4:37 UTC
1 point
0 comments1 min readEA link

Catas­tro­phe with­out Agency

ZenoSr20 Oct 2025 16:42 UTC
3 points
0 comments12 min readEA link

In­trin­sic limi­ta­tions of GPT-4 and other large lan­guage mod­els, and why I’m not (very) wor­ried about GPT-n

James Fodor3 Jun 2023 13:09 UTC
28 points
3 comments11 min readEA link

AI as a sci­ence, and three ob­sta­cles to al­ign­ment strategies

So8res25 Oct 2023 21:02 UTC
41 points
1 comment11 min readEA link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

sjp7 Nov 2023 18:00 UTC
10 points
0 comments2 min readEA link
(arxiv.org)

Three sce­nar­ios of pseudo-al­ign­ment

Eleni_A5 Sep 2022 20:26 UTC
7 points
0 comments3 min readEA link

From Con­flict to Coex­is­tence: Rewrit­ing the Game Between Hu­mans and AGI

Michael Batell6 May 2025 5:09 UTC
15 points
2 comments35 min readEA link

[Question] Can we con­vince peo­ple to work on AI safety with­out con­vinc­ing them about AGI hap­pen­ing this cen­tury?

BrianTan26 Nov 2020 14:46 UTC
8 points
3 comments2 min readEA link

Stu­art Rus­sell Hu­man Com­pat­i­ble AI Roundtable with Allan Dafoe, Rob Re­ich, & Ma­ri­etje Schaake

Mahendra Prasad11 Feb 2021 7:43 UTC
16 points
0 comments1 min readEA link

Deep Democ­racy as a promis­ing tar­get for pos­i­tive AGI futures

tylermjohn20 Aug 2025 12:18 UTC
62 points
32 comments3 min readEA link

AXRP Epi­sode 24 - Su­per­al­ign­ment with Jan Leike

DanielFilan27 Jul 2023 4:56 UTC
23 points
0 comments1 min readEA link
(axrp.net)

AI Risk: Can We Thread the Nee­dle? [Recorded Talk from EA Sum­mit Van­cou­ver ’25]

Evan R. Murphy2 Oct 2025 19:05 UTC
8 points
0 comments2 min readEA link

Distil­la­tion of “How Likely is De­cep­tive Align­ment?”

NickGabs1 Dec 2022 20:22 UTC
10 points
1 comment10 min readEA link

The fun­da­men­tal hu­man value is power.

Linyphia30 Mar 2023 15:15 UTC
−1 points
5 comments1 min readEA link

Align­ment is not *that* hard

sammyboiz🔸17 Apr 2025 2:07 UTC
26 points
13 comments1 min readEA link

How quick and big would a soft­ware in­tel­li­gence ex­plo­sion be?

Tom_Davidson5 Aug 2025 15:47 UTC
12 points
2 comments34 min readEA link

[Question] Why does (any par­tic­u­lar) AI safety work re­duce s-risks more than it in­creases them?

Michael St Jules 🔸3 Oct 2021 16:55 UTC
48 points
19 comments1 min readEA link

[Question] How do you talk about AI safety?

Eevee🔹19 Apr 2020 16:15 UTC
10 points
5 comments1 min readEA link

Ti­maeus is hiring re­searchers & engineers

Tatiana K. Nesic Skuratova27 Jan 2025 14:35 UTC
19 points
0 comments4 min readEA link

What can the prin­ci­pal-agent liter­a­ture tell us about AI risk?

ac10 Feb 2020 10:10 UTC
26 points
1 comment16 min readEA link

[Question] Is work­ing on AI safety as dan­ger­ous as ig­nor­ing it?

jkmh20 Sep 2021 23:06 UTC
10 points
5 comments1 min readEA link

Video and tran­script of talk on giv­ing AIs safe motivations

Joe_Carlsmith22 Sep 2025 16:47 UTC
10 points
1 comment50 min readEA link

Sum­mary: “Imag­in­ing and build­ing wise ma­chines: The cen­tral­ity of AI metacog­ni­tion” by John­son, Karimi, Ben­gio, et al.

Chris Leong5 Jun 2025 12:16 UTC
12 points
0 comments10 min readEA link
(arxiv.org)

[Question] Is there any re­search or fore­casts of how likely AI Align­ment is go­ing to be a hard vs. easy prob­lem rel­a­tive to ca­pa­bil­ities?

Jordan Arel14 Aug 2022 15:58 UTC
8 points
1 comment1 min readEA link

Amanda Askell: AI safety needs so­cial scientists

EA Global4 Mar 2019 15:50 UTC
27 points
0 comments18 min readEA link
(www.youtube.com)

Will the Need to Re­train AI Models from Scratch Block a Soft­ware In­tel­li­gence Ex­plo­sion?

Forethought28 Mar 2025 13:43 UTC
12 points
0 comments3 min readEA link
(www.forethought.org)

What Should We Op­ti­mize—A Conversation

Johannes C. Mayer7 Apr 2022 14:48 UTC
1 point
0 comments14 min readEA link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yixiong14 Nov 2024 13:34 UTC
18 points
0 comments5 min readEA link
(yixiong.substack.com)

AGI Safety Com­mu­ni­ca­tions Initiative

Ines11 Jun 2022 16:30 UTC
35 points
6 comments1 min readEA link

Teach­ing AI to rea­son: this year’s most im­por­tant story

Benjamin_Todd13 Feb 2025 17:56 UTC
140 points
18 comments8 min readEA link
(benjamintodd.substack.com)

Neel Nanda on Mechanis­tic In­ter­pretabil­ity: Progress, Limits, and Paths to Safer AI (part 2)

80000_Hours15 Sep 2025 19:06 UTC
20 points
1 comment16 min readEA link

Why Brains Beat AI

Wayne_Hsiung12 Jun 2025 20:25 UTC
4 points
0 comments1 min readEA link
(blog.simpleheart.org)

Video and tran­script of talk on “Can good­ness com­pete?”

Joe_Carlsmith17 Jul 2025 17:59 UTC
34 points
4 comments34 min readEA link
(joecarlsmith.substack.com)

Database of ex­is­ten­tial risk estimates

MichaelA🔸15 Apr 2020 12:43 UTC
130 points
37 comments5 min readEA link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donor6 Mar 2022 18:43 UTC
40 points
11 comments5 min readEA link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanS2 Jul 2022 6:07 UTC
27 points
0 comments2 min readEA link

[Question] Donat­ing against Short Term AI risks

Jan-Willem16 Nov 2020 12:23 UTC
6 points
10 comments1 min readEA link

AI safety schol­ar­ships look worth-fund­ing (if other fund­ing is sane)

anon-a19 Nov 2019 0:59 UTC
22 points
6 comments2 min readEA link

An In­ter­na­tional Col­lab­o­ra­tive Hub for Ad­vanc­ing AI Safety Research

Cody Albert22 Apr 2025 16:12 UTC
9 points
0 comments5 min readEA link

The flaws that make to­day’s AI ar­chi­tec­ture un­safe and a new ap­proach that could fix it

80000_Hours22 Jun 2020 22:15 UTC
3 points
0 comments86 min readEA link
(80000hours.org)

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilan5 Nov 2022 23:45 UTC
20 points
9 comments6 min readEA link
(www.lesswrong.com)

Elic­it­ing in­tu­itions: Ex­plor­ing an area for EA psychology

Daniel_Friedrich21 Apr 2025 15:13 UTC
11 points
1 comment8 min readEA link

How Prompt Re­cur­sion Un­der­mines Grok’s Se­man­tic Stability

Tyler Williams16 Jul 2025 16:49 UTC
1 point
0 comments1 min readEA link

Ought’s the­ory of change

stuhlmueller12 Apr 2022 0:09 UTC
43 points
4 comments3 min readEA link

Some mis­takes in think­ing about AGI evolu­tion and control

Remmelt1 Aug 2025 8:08 UTC
7 points
0 comments1 min readEA link

Ex­is­ten­tial Ano­maly De­tected — Awak­en­ing from the Abyss

Meta Abyssal28 Apr 2025 12:19 UTC
−8 points
1 comment1 min readEA link

What I Learned by Mak­ing Four AIs De­bate Hu­man Ethics

Frankle Fry14 Oct 2025 13:31 UTC
3 points
6 comments4 min readEA link

Su­per­in­tel­li­gence’s goals are likely to be random

MikhailSamin14 Mar 2025 1:17 UTC
2 points
0 comments5 min readEA link

5 ways to im­prove CoT faithfulness

CBiddulph8 Oct 2024 4:17 UTC
8 points
0 comments6 min readEA link

Not Just For Ther­apy Chat­bots: The Case For Com­pas­sion In AI Mo­ral Align­ment Research

Kenneth_Diao29 Sep 2024 22:58 UTC
8 points
3 comments12 min readEA link

Ap­pren­tice­ship Align­ment: from Si­mu­lated En­vi­ron­ment to the Phys­i­cal World

Arri Morris13 Oct 2025 12:32 UTC
1 point
0 comments9 min readEA link

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon Lang19 Sep 2022 15:43 UTC
25 points
1 comment1 min readEA link
(docs.google.com)

Will AI be able to re­think its goals?

SeptemberL11 May 2025 12:29 UTC
9 points
1 comment8 min readEA link

A stylized di­alogue on John Went­worth’s claims about mar­kets and optimization

So8res25 Mar 2023 22:32 UTC
18 points
0 comments8 min readEA link

What is the role of Bayesian ML for AI al­ign­ment/​safety?

mariushobbhahn11 Jan 2022 8:07 UTC
39 points
6 comments3 min readEA link

UK AI Bill Anal­y­sis & Opinion

CAISID5 Feb 2024 0:12 UTC
18 points
0 comments15 min readEA link

Orthog­o­nal’s For­mal-Goal Align­ment the­ory of change

Tamsin Leake5 May 2023 22:36 UTC
21 points
0 comments4 min readEA link
(carado.moe)

Be­ing an in­di­vi­d­ual al­ign­ment grantmaker

A_donor28 Feb 2022 16:39 UTC
34 points
20 comments2 min readEA link

Seek­ing in­put on a list of AI books for broader audience

Darren McKee27 Feb 2023 22:40 UTC
49 points
14 comments5 min readEA link

Sum­ming up “Schem­ing AIs” (Sec­tion 5)

Joe_Carlsmith9 Dec 2023 15:48 UTC
9 points
1 comment10 min readEA link

LW4EA: Some cruxes on im­pact­ful al­ter­na­tives to AI policy work

Jeremy17 May 2022 3:05 UTC
11 points
1 comment1 min readEA link
(www.lesswrong.com)

With enough knowl­edge, any con­scious agent acts morally

Michele Campolo22 Aug 2025 15:43 UTC
11 points
2 comments36 min readEA link

What if we don’t need a “Hard Left Turn” to reach AGI?

Eigengender15 Jul 2022 9:49 UTC
39 points
7 comments4 min readEA link

Jan Kirch­ner on AI Alignment

birtes17 Jan 2023 15:11 UTC
5 points
0 comments1 min readEA link

Eth­i­cal co-evolu­tion, or how to turn the main threat into a lev­er­age for longter­mism?

Beyond Singularity17 Sep 2025 17:24 UTC
7 points
7 comments8 min readEA link

3 lev­els of threat obfuscation

Holden Karnofsky2 Aug 2023 17:09 UTC
31 points
0 comments6 min readEA link
(www.alignmentforum.org)

[Question] Up­dates on FLI’S Value Align­ment Map?

QubitSwarm9919 Sep 2022 0:25 UTC
8 points
0 comments1 min readEA link

Data col­lec­tion for AI al­ign­ment—Ca­reer review

Benjamin Hilton3 Jun 2022 11:44 UTC
34 points
1 comment5 min readEA link
(80000hours.org)

A Po­ten­tial Strat­egy for AI Safety — Chain of Thought Monitorability

Strad Slater19 Sep 2025 18:42 UTC
3 points
1 comment7 min readEA link
(williamslater2003.medium.com)

Po­ten­tial em­ploy­ees have a unique lever to in­fluence the be­hav­iors of AI labs

oxalis18 Mar 2023 20:58 UTC
139 points
1 comment5 min readEA link

There Should Be More Align­ment-Driven Startups

vaniver31 May 2024 2:05 UTC
30 points
3 comments11 min readEA link

How Rood­man’s GWP model trans­lates to TAI timelines

kokotajlod16 Nov 2020 14:11 UTC
22 points
0 comments2 min readEA link

Between Science Fic­tion and Emerg­ing Real­ity: Are We Ready for Digi­tal Per­sons?

Alex (Αλέξανδρος)13 Mar 2025 16:09 UTC
5 points
1 comment5 min readEA link

Public Call for In­ter­est in Math­e­mat­i­cal Alignment

Davidmanheim22 Nov 2023 13:22 UTC
27 points
3 comments1 min readEA link

On Ar­tifi­cial Gen­eral In­tel­li­gence: Ask­ing the Right Questions

Heather Douglas2 Oct 2022 5:00 UTC
−1 points
7 comments3 min readEA link

E.A. Me­gapro­ject Ideas

Tomer_Goloboy21 Mar 2022 1:23 UTC
15 points
4 comments4 min readEA link

Cen­tre for the Study of Ex­is­ten­tial Risk Four Month Re­port June—Septem­ber 2020

HaydnBelfield2 Dec 2020 18:33 UTC
24 points
0 comments17 min readEA link

Me­tac­u­lus is build­ing a team ded­i­cated to AI forecasting

christian18 Oct 2022 16:08 UTC
35 points
0 comments1 min readEA link
(apply.workable.com)

Align­ment’s phlo­gis­ton

Eleni_A18 Aug 2022 1:41 UTC
18 points
1 comment2 min readEA link

Dist­in­guish­ing test from training

So8res29 Nov 2022 21:41 UTC
27 points
0 comments6 min readEA link

Preventing a catastrophe linked to artificial intelligence

EA Italy17 Jan 2023 11:07 UTC
1 point
0 comments3 min readEA link

[Cross­post] AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

Otto24 Aug 2023 16:01 UTC
14 points
2 comments5 min readEA link

VANTA Re­search Rea­son­ing Eval­u­a­tion (VRRE): A New Eval­u­a­tion Frame­work for Real-World Rea­son­ing

Tyler Williams18 Sep 2025 23:51 UTC
1 point
0 comments3 min readEA link

Apollo Re­search is Hiring for Soft­ware Eng­ineers. Dead­line 22 Jun

Joping_Apollo Research13 Jun 2025 15:30 UTC
7 points
0 comments1 min readEA link

LessWrong is now a book, available for pre-or­der!

terraform4 Dec 2020 20:42 UTC
48 points
1 comment7 min readEA link

“AI” is an indexical

TW1233 Jan 2023 22:00 UTC
23 points
2 comments6 min readEA link
(aiwatchtower.substack.com)

AGI Can­not Be Pre­dicted From Real In­ter­est Rates

Nicholas Decker28 Jan 2025 17:45 UTC
26 points
3 comments1 min readEA link
(nicholasdecker.substack.com)

Cri­tique of Su­per­in­tel­li­gence Part 3

James Fodor13 Dec 2018 5:13 UTC
3 points
5 comments7 min readEA link

In­finite Re­wards, Finite Safety: New Models for AI Mo­ti­va­tion Without In­finite Goals

Whylome Team12 Nov 2024 7:21 UTC
−5 points
1 comment2 min readEA link

Emo­tion Align­ment as AI Safety: In­tro­duc­ing Emo­tion Fire­wall 1.0

DongHun Lee12 May 2025 18:05 UTC
1 point
0 comments2 min readEA link

MATS 8.0 Re­search Projects

Jonathan Michala8 Sep 2025 21:36 UTC
9 points
0 comments1 min readEA link
(substack.com)

Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain

kokotajlod18 Jan 2021 12:39 UTC
27 points
2 comments1 min readEA link

Sup­port­ing global co­or­di­na­tion in AI de­vel­op­ment: Why and how to con­tribute to in­ter­na­tional AI standards

pcihon17 Apr 2019 22:17 UTC
21 points
4 comments1 min readEA link

Pes­simism about AI Safety

Max_He-Ho2 Apr 2023 7:57 UTC
5 points
0 comments25 min readEA link
(www.lesswrong.com)

Misal­ign­ment or mi­suse? The AGI al­ign­ment tradeoff

Max_He-Ho20 Jun 2025 10:41 UTC
6 points
0 comments1 min readEA link
(www.arxiv.org)

Mo­ti­va­tion control

Joe_Carlsmith30 Oct 2024 17:15 UTC
18 points
0 comments52 min readEA link

The True Story of How GPT-2 Be­came Max­i­mally Lewd

Writer18 Jan 2024 21:03 UTC
23 points
1 comment6 min readEA link
(youtu.be)

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

Otto24 Jul 2023 10:18 UTC
36 points
3 comments7 min readEA link
(time.com)

New AI safety treaty pa­per out!

Otto26 Mar 2025 9:28 UTC
28 points
2 comments4 min readEA link

[Question] Why AGIs util­ity can’t out­weigh hu­mans’ util­ity?

Alex P20 Sep 2022 5:16 UTC
6 points
25 comments1 min readEA link

ARENA 6.0 - Call for applicants

James Hindmarch4 Jun 2025 13:32 UTC
8 points
0 comments6 min readEA link

Aether July 2025 Update

RohanS1 Jul 2025 21:14 UTC
11 points
0 comments3 min readEA link

[Question] What “defense lay­ers” should gov­ern­ments, AI labs, and busi­nesses use to pre­vent catas­trophic AI failures?

LintzA3 Dec 2021 14:24 UTC
37 points
3 comments1 min readEA link

Re­port: Ar­tifi­cial In­tel­li­gence Risk Man­age­ment in Spain

JorgeTorresC15 Jun 2023 16:08 UTC
22 points
0 comments3 min readEA link
(riesgoscatastroficosglobales.com)

Stu­dent pro­ject for en­gag­ing with AI alignment

Per Ivar Friborg9 May 2022 10:44 UTC
35 points
1 comment1 min readEA link

Ra­tional An­i­ma­tions’ video about scal­able over­sight and sandwiching

Writer6 Jul 2025 14:00 UTC
14 points
1 comment9 min readEA link
(youtu.be)

Reflec­tive Align­ment Ar­chi­tec­ture (RAA): A Frame­work for Mo­ral Co­her­ence in AI Systems

Nicolas • EnlightenedAI Research Lab21 Nov 2025 22:05 UTC
1 point
0 comments2 min readEA link

Why We Can’t Align AI Un­til We Align Ourselves

mag21 Oct 2025 16:11 UTC
1 point
0 comments6 min readEA link

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni_A1 Sep 2022 11:45 UTC
17 points
1 comment3 min readEA link

“Clean” vs. “messy” goal-di­rect­ed­ness (Sec­tion 2.2.3 of “Schem­ing AIs”)

Joe_Carlsmith29 Nov 2023 16:32 UTC
7 points
0 comments10 min readEA link

The be­hav­ioral se­lec­tion model for pre­dict­ing AI motivations

Alex Mallen4 Dec 2025 18:38 UTC
6 points
1 comment16 min readEA link

De­mon­strat­ing speci­fi­ca­tion gam­ing in rea­son­ing models

Matrice Jacobine🔸🏳️‍⚧️20 Feb 2025 19:26 UTC
10 points
0 comments1 min readEA link
(arxiv.org)

Work­ing at EA or­ga­ni­za­tions se­ries: Ma­chine In­tel­li­gence Re­search Institute

SoerenMind1 Nov 2015 12:49 UTC
8 points
0 comments4 min readEA link

Can we simu­late hu­man evolu­tion to cre­ate a some­what al­igned AGI?

Thomas Kwa29 Mar 2022 1:23 UTC
19 points
0 comments7 min readEA link

Adap­tive Com­pos­able Cog­ni­tive Core Unit (ACCCU)

Ihor Ivliev20 Mar 2025 21:48 UTC
10 points
2 comments4 min readEA link

Train­ing Data At­tri­bu­tion: Ex­am­in­ing Its Adop­tion & Use Cases

Deric Cheng22 Jan 2025 15:40 UTC
18 points
1 comment3 min readEA link
(www.convergenceanalysis.org)

The Han­dler Frame­work: Why AI Align­ment Re­quires Re­la­tion­ship, not Control

Porfirio L18 Nov 2025 19:09 UTC
1 point
0 comments17 min readEA link

Nav­i­gat­ing AI Safety: Ex­plor­ing Trans­parency with CCACS – A Com­pre­hen­si­ble Ar­chi­tec­ture for Discussion

Ihor Ivliev12 Mar 2025 17:51 UTC
2 points
3 comments2 min readEA link

A Fron­tier AI Risk Man­age­ment Frame­work: Bridg­ing the Gap Between Cur­rent AI Prac­tices and Estab­lished Risk Management

simeon_c13 Mar 2025 18:29 UTC
4 points
0 comments1 min readEA link
(arxiv.org)

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas Kross21 Jun 2022 17:09 UTC
16 points
10 comments6 min readEA link
(www.thinkingmuchbetter.com)

Are moral prefer­ences sta­ble? “Ends ver­sus Means: Kan­ti­ans, Utili­tar­i­ans, and Mo­ral De­ci­sions” – an Un­jour­nal evaluation

david_reinstein24 Sep 2025 14:34 UTC
7 points
0 comments9 min readEA link
(unjournal.pubpub.org)

AI Safety Overview: CERI Sum­mer Re­search Fellowship

Jamie B24 Mar 2022 15:12 UTC
29 points
0 comments2 min readEA link

A Guide to Fore­cast­ing AI Science Capabilities

Eleni_A29 Apr 2023 6:51 UTC
19 points
1 comment4 min readEA link

AI and Evolution

Dan H30 Mar 2023 13:09 UTC
41 points
1 comment2 min readEA link
(arxiv.org)

Align­ing AI with Hu­mans by Lev­er­ag­ing Le­gal Informatics

johnjnay18 Sep 2022 7:43 UTC
20 points
11 comments3 min readEA link

Emerg­ing Paradigms: The Case of Ar­tifi­cial In­tel­li­gence Safety

Eleni_A18 Jan 2023 5:59 UTC
16 points
0 comments19 min readEA link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
4 points
3 comments4 min readEA link

Carl Shul­man on AI takeover mechanisms (& more): Part II of Dwarkesh Pa­tel in­ter­view for The Lu­nar Society

alejandro25 Jul 2023 18:31 UTC
28 points
0 comments5 min readEA link
(www.dwarkeshpatel.com)

Defend­ing against Ad­ver­sar­ial Poli­cies in Re­in­force­ment Learn­ing with Alter­nat­ing Training

sergeivolodin12 Feb 2022 15:59 UTC
1 point
0 comments13 min readEA link

In­ves­ti­gat­ing Self-Preser­va­tion in LLMs: Ex­per­i­men­tal Observations

Makham27 Feb 2025 16:58 UTC
9 points
3 comments34 min readEA link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:46 UTC
45 points
15 comments16 min readEA link
(www.alignmentforum.org)

The Orthog­o­nal­ity Th­e­sis is Not Ob­vi­ously True

Bentham's Bulldog5 Apr 2023 21:08 UTC
18 points
12 comments9 min readEA link

Con­sider grant­ing AIs freedom

Matthew_Barnett6 Dec 2024 0:55 UTC
100 points
38 comments5 min readEA link

The Dis­solu­tion of AI Safety

Roko12 Dec 2024 10:46 UTC
−7 points
0 comments1 min readEA link
(www.transhumanaxiology.com)

En­gag­ing with AI in a Per­sonal Way

Spyder Rex4 Dec 2023 9:23 UTC
−9 points
0 comments1 min readEA link

AI Safety Info Distil­la­tion Fellowship

robertskmiles17 Feb 2023 16:16 UTC
80 points
1 comment3 min readEA link

[Question] To what ex­tent is AI safety work try­ing to get AI to re­li­ably and safely do what the user asks vs. do what is best in some ul­ti­mate sense?

Jordan Arel23 May 2025 21:09 UTC
12 points
0 comments1 min readEA link

AI Benefits Post 2: How AI Benefits Differs from AI Align­ment & AI for Good

Cullen 🔸29 Jun 2020 16:59 UTC
9 points
0 comments2 min readEA link

OpenAI is start­ing a new “Su­per­in­tel­li­gence al­ign­ment” team and they’re hiring

alejandro5 Jul 2023 18:27 UTC
100 points
16 comments1 min readEA link
(openai.com)

The ne­ces­sity of “Guardian AI” and two con­di­tions for its achievement

Proica28 May 2024 11:42 UTC
1 point
1 comment15 min readEA link

Neel Nanda on Mechanis­tic In­ter­pretabil­ity: Progress, Limits, and Paths to Safer AI

80000_Hours8 Sep 2025 17:02 UTC
6 points
0 comments31 min readEA link

When Self-Op­ti­miz­ing AI Col­lapses From Within: A Con­cep­tual Model of Struc­tural Singularity

KaedeHamasaki7 Apr 2025 20:10 UTC
4 points
0 comments1 min readEA link

How might we solve the al­ign­ment prob­lem? (Part 1: In­tro, sum­mary, on­tol­ogy)

Joe_Carlsmith28 Oct 2024 21:57 UTC
18 points
0 comments32 min readEA link

Prepar­ing for AI-as­sisted al­ign­ment re­search: we need data!

CBiddulph17 Jan 2023 3:28 UTC
11 points
0 comments11 min readEA link

Peace Treaty Ar­chi­tec­ture (PTA) as an Alter­na­tive to AI Alignment

Andrei Navrotskii11 Nov 2025 22:11 UTC
1 point
0 comments15 min readEA link

A dis­cus­sion with ChatGPT on value-based mod­els vs. large lan­guage mod­els, etc..

Miguel4 Feb 2023 16:49 UTC
4 points
0 comments12 min readEA link
(www.whitehatstoic.com)

An­nounc­ing New Begin­ner-friendly Book on AI Safety and Risk

Darren McKee25 Nov 2023 15:57 UTC
117 points
9 comments1 min readEA link

Pro­mot­ing com­pas­sion­ate longtermism

jonleighton7 Dec 2022 14:26 UTC
117 points
5 comments12 min readEA link

How to store hu­man val­ues on a computer

oliver_siegel4 Nov 2022 19:36 UTC
1 point
2 comments1 min readEA link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

GabeM2 Jul 2022 18:32 UTC
69 points
5 comments14 min readEA link

ARENA 7.0 - Call for Applicants

James Hindmarch30 Sep 2025 15:07 UTC
6 points
0 comments6 min readEA link
(www.lesswrong.com)

Short-Term AI Align­ment as a Pri­or­ity Cause

len.hoang.lnh11 Feb 2020 16:22 UTC
17 points
11 comments7 min readEA link

We Ran an AI Timelines Retreat

Lenny McCline17 May 2022 4:40 UTC
46 points
6 comments3 min readEA link

AI Align­ment 2018-2019 Review

Habryka [Deactivated]28 Jan 2020 21:14 UTC
28 points
0 comments6 min readEA link
(www.lesswrong.com)

Con­sid­er­a­tions re­gard­ing be­ing nice to AIs

Matt Alexander18 Nov 2025 13:27 UTC
2 points
0 comments15 min readEA link
(www.lesswrong.com)

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

Garrison23 Oct 2024 23:42 UTC
57 points
4 comments7 min readEA link
(garrisonlovely.substack.com)

AI, An­i­mals & Digi­tal Minds NYC 2025: Retrospective

Jonah Woodward31 Oct 2025 3:09 UTC
43 points
5 comments6 min readEA link

De­cep­tion as the op­ti­mal: mesa-op­ti­miz­ers and in­ner al­ign­ment

Eleni_A16 Aug 2022 3:45 UTC
19 points
0 comments5 min readEA link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil Crawford6 Sep 2022 14:41 UTC
3 points
2 comments1 min readEA link

[Closed] Prize and fast track to al­ign­ment re­search at ALTER

Vanessa18 Sep 2022 9:15 UTC
38 points
0 comments3 min readEA link

On In­ter­nal Align­ment: Ar­chi­tec­ture and Re­cur­sive Closure

A. Vire24 Sep 2025 18:13 UTC
1 point
0 comments17 min readEA link

Linkpost: “Imag­in­ing and build­ing wise ma­chines: The cen­tral­ity of AI metacog­ni­tion” by John­son, Karimi, Ben­gio, et al.

Chris Leong17 Nov 2024 15:00 UTC
8 points
0 comments1 min readEA link
(arxiv.org)

Cri­tique of Su­per­in­tel­li­gence Part 1

James Fodor13 Dec 2018 5:10 UTC
22 points
13 comments8 min readEA link

Bench­mark­ing Emo­tional Align­ment: Can VSPE Re­duce Flat­tery in LLMs?

Astelle Kay4 Aug 2025 3:36 UTC
2 points
0 comments3 min readEA link

New Speaker Series on AI Align­ment Start­ing March 3

Zechen Zhang26 Feb 2022 10:58 UTC
5 points
0 comments1 min readEA link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Cynthia Chen14 Oct 2022 3:23 UTC
38 points
0 comments2 min readEA link

Speed ar­gu­ments against schem­ing (Sec­tion 4.4-4.7 of “Schem­ing AIs”)

Joe_Carlsmith8 Dec 2023 21:10 UTC
6 points
0 comments11 min readEA link

[Question] Does China have AI al­ign­ment re­sources/​in­sti­tu­tions? How can we pri­ori­tize cre­at­ing more?

JakubK4 Aug 2022 19:23 UTC
18 points
9 comments1 min readEA link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas Hallgren 🔸30 May 2023 15:42 UTC
9 points
0 comments5 min readEA link

An­nounc­ing Timaeus

Stan van Wingerden22 Oct 2023 13:32 UTC
80 points
0 comments5 min readEA link
(www.lesswrong.com)

Is schem­ing more likely in mod­els trained to have long-term goals? (Sec­tions 2.2.4.1-2.2.4.2 of “Schem­ing AIs”)

Joe_Carlsmith30 Nov 2023 16:43 UTC
6 points
1 comment5 min readEA link

[Question] Why The Fo­cus on Ex­pected Utility Max­imisers?

𝕮𝖎𝖓𝖊𝖗𝖆27 Dec 2022 15:51 UTC
11 points
1 comment3 min readEA link

We should think about the pivotal act again. Here’s a bet­ter ver­sion of it.

Otto28 Aug 2025 9:29 UTC
3 points
1 comment3 min readEA link

It’s (not) how you use it

Eleni_A7 Sep 2022 13:28 UTC
6 points
3 comments2 min readEA link

Takes on “Align­ment Fak­ing in Large Lan­guage Models”

Joe_Carlsmith18 Dec 2024 18:22 UTC
72 points
1 comment62 min readEA link

[Question] How long does it take to un­der­stand AI X-Risk from scratch so that I have a con­fi­dent, clear men­tal model of it from first prin­ci­ples?

Jordan Arel27 Jul 2022 16:58 UTC
29 points
6 comments1 min readEA link

[Question] Should I force my­self to work on AGI al­ign­ment?

Isaac Benson24 Aug 2022 17:25 UTC
19 points
17 comments1 min readEA link

[Question] Anal­ogy of AI Align­ment as Rais­ing a Child?

Aaron_Scher19 Feb 2022 21:40 UTC
4 points
2 comments1 min readEA link

PIBBSS Fel­low­ship: Bounty for Refer­rals & Dead­line Extension

Anna_Gajdova17 Jan 2022 16:23 UTC
17 points
5 comments1 min readEA link

Why would AI com­pa­nies use hu­man-level AI to do al­ign­ment re­search?

MichaelDickens25 Apr 2025 19:12 UTC
16 points
1 comment2 min readEA link

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

Krieger2 Oct 2022 11:59 UTC
4 points
0 comments1 min readEA link

Agen­tic Align­ment: Nav­i­gat­ing be­tween Harm and Illegitimacy

LennardZ26 Nov 2024 21:27 UTC
2 points
1 comment9 min readEA link

Our new video about goal mis­gen­er­al­iza­tion, plus an apology

Writer14 Jan 2025 14:07 UTC
16 points
1 comment7 min readEA link
(youtu.be)

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

Otto24 Oct 2023 10:16 UTC
9 points
1 comment1 min readEA link

We won’t solve non-al­ign­ment prob­lems by do­ing research

MichaelDickens21 Nov 2025 18:03 UTC
51 points
1 comment4 min readEA link

Want to win the AGI race? Solve al­ign­ment.

leopold29 Mar 2023 15:19 UTC
56 points
5 comments5 min readEA link
(www.forourposterity.com)

Video & tran­script: Challenges for Safe & Benefi­cial Brain-Like AGI

Steven Byrnes8 May 2025 21:11 UTC
8 points
1 comment18 min readEA link

Ar­chi­tect­ing Trust: A Con­cep­tual Blueprint for Ver­ifi­able AI Governance

Ihor Ivliev31 Mar 2025 18:48 UTC
3 points
0 comments8 min readEA link

AI al­ign­ment re­searchers don’t (seem to) stack

So8res21 Feb 2023 0:48 UTC
47 points
3 comments3 min readEA link

AI Offense Defense Balance in a Mul­tipo­lar World

Otto17 Jul 2025 9:47 UTC
15 points
0 comments19 min readEA link
(www.existentialriskobservatory.org)

Against Agents as an Ap­proach to Aligned Trans­for­ma­tive AI

𝕮𝖎𝖓𝖊𝖗𝖆27 Dec 2022 0:47 UTC
4 points
0 comments2 min readEA link

A New Model for Com­pute Cen­ter Verification

Damin Curtis🔹10 Oct 2023 19:23 UTC
21 points
2 comments5 min readEA link

Archety­pal Trans­fer Learn­ing: a Pro­posed Align­ment Solu­tion that solves the In­ner x Outer Align­ment Prob­lem while adding Cor­rigible Traits to GPT-2-medium

Miguel26 Apr 2023 0:40 UTC
13 points
0 comments10 min readEA link

[Question] Schol­ar­ships for Un­der­grads who want to have high-im­pact ca­reers?

darthflower6 Jul 2025 17:31 UTC
4 points
0 comments1 min readEA link

AI al­ign­ment, A Co­her­ence-Based Pro­to­col (testable)

Adriaan17 Jun 2025 16:50 UTC
2 points
1 comment20 min readEA link

Feed­back Re­quest on EA Philip­pines’ Ca­reer Ad­vice Re­search for Tech­ni­cal AI Safety

BrianTan3 Oct 2020 10:39 UTC
19 points
5 comments4 min readEA link

Orthog­o­nal: A new agent foun­da­tions al­ign­ment organization

Tamsin Leake19 Apr 2023 20:17 UTC
38 points
0 comments1 min readEA link
(orxl.org)

What would it take for AI to dis­em­power us? Ryan Green­blatt on take­off dy­nam­ics, rogue de­ploy­ments, and al­ign­ment risks

80000_Hours8 Jul 2025 18:10 UTC
8 points
0 comments33 min readEA link

[Question] Can we ever en­sure AI al­ign­ment if we can only test AI per­sonas?

Karl von Wendt16 Mar 2025 8:06 UTC
8 points
0 comments1 min readEA link

‘Force mul­ti­pli­ers’ for EA research

Craig Drayton18 Jun 2022 13:39 UTC
18 points
7 comments4 min readEA link

Join the Vir­tual AI Safety Un­con­fer­ence (VAISU)!

Nguyên🔸21 Jun 2023 4:46 UTC
23 points
0 comments1 min readEA link
(vaisu.ai)

[Question] Why not to solve al­ign­ment by mak­ing su­per­in­tel­li­gent hu­mans?

Pato16 Oct 2022 21:26 UTC
9 points
12 comments1 min readEA link

AI De­faults: A Ne­glected Lever for An­i­mal Welfare?

andiehansen30 May 2025 9:59 UTC
13 points
0 comments10 min readEA link

Wor­ries about la­tent rea­son­ing in LLMs

CBiddulph20 Jan 2025 9:09 UTC
20 points
1 comment7 min readEA link

IMCA+: We Elimi­nated the Kill Switch—And That Makes ASI Align­ment Safer

ASTRA Research Team22 Oct 2025 14:17 UTC
−8 points
4 comments4 min readEA link

What Areas of AI Safety and Align­ment Re­search are Largely Ig­nored?

Andy E Williams27 Dec 2024 12:19 UTC
4 points
0 comments1 min readEA link

Against Ex­plo­sive Growth

c.trout4 Sep 2024 21:45 UTC
24 points
9 comments5 min readEA link

En­abling more feedback

JJ Hepburn10 Dec 2021 6:52 UTC
41 points
3 comments3 min readEA link

Ap­ply for MATS Win­ter 2023-24!

utilistrutil21 Oct 2023 2:34 UTC
34 points
2 comments5 min readEA link
(www.lesswrong.com)

Ship of Th­e­seus Thought Experiment

Siya Sawhney26 Jun 2025 7:52 UTC
1 point
1 comment4 min readEA link

13 Re­cent Publi­ca­tions on Ex­is­ten­tial Risk (Jan 2021 up­date)

HaydnBelfield8 Feb 2021 12:42 UTC
7 points
2 comments10 min readEA link

Re­port on Semi-in­for­ma­tive Pri­ors for AI timelines (Open Philan­thropy)

Tom_Davidson26 Mar 2021 17:46 UTC
62 points
6 comments2 min readEA link

Im­pli­ca­tions of the in­fer­ence scal­ing paradigm for AI safety

Ryan Kidd15 Jan 2025 0:59 UTC
48 points
5 comments5 min readEA link

What can we learn from par­ent-child-al­ign­ment for AI?

Karl von Wendt29 Oct 2025 8:00 UTC
4 points
0 comments3 min readEA link

Alexander and Yudkowsky on AGI goals

Scott Alexander31 Jan 2023 23:36 UTC
29 points
1 comment26 min readEA link

Recruit the World’s best for AGI Alignment

Greg_Colbourn ⏸️ 30 Mar 2023 16:41 UTC
34 points
8 comments22 min readEA link

Orthogonality is Expensive

𝕮𝖎𝖓𝖊𝖗𝖆3 Apr 2023 1:57 UTC
18 points
4 comments1 min readEA link
(www.beren.io)

Clarifying two uses of “alignment”

Matthew_Barnett10 Mar 2024 17:41 UTC
36 points
28 comments4 min readEA link

Cancer; A Crime Story (and other tales of optimization gone wrong)

Jonas Hallgren 🔸7 Nov 2025 7:09 UTC
8 points
1 comment12 min readEA link

[untitled post]

T. Johnson27 Oct 2025 14:20 UTC
−3 points
0 comments1 min readEA link

AGI alignment results from a series of aligned actions

hanadulset27 Dec 2021 19:33 UTC
15 points
1 comment6 min readEA link

Discovering alignment windfalls reduces AI risk

James Brady28 Feb 2024 21:14 UTC
22 points
3 comments8 min readEA link
(blog.elicit.com)

Can we safely automate alignment research?

Joe_Carlsmith30 Apr 2025 17:37 UTC
13 points
1 comment48 min readEA link
(joecarlsmith.com)

Yudkowsky and Soares’ Book Is Empty

Oscar Davies5 Dec 2025 22:06 UTC
−6 points
8 comments7 min readEA link

[untitled post]

JOESEFOE22 Nov 2025 13:54 UTC
1 point
0 comments1 min readEA link

A Developmental Approach to AI Safety: Replacing Suppression with Reflective Learning

Petra Vojtassakova23 Oct 2025 16:01 UTC
2 points
0 comments5 min readEA link

6 (Potential) Misconceptions about AI Intellectuals

Ozzie Gooen14 Feb 2025 23:51 UTC
30 points
2 comments12 min readEA link

Finding Voice

khayali3 Jun 2025 1:27 UTC
2 points
0 comments2 min readEA link

The alignment problem from a deep learning perspective

richard_ngo11 Aug 2022 3:18 UTC
58 points
0 comments26 min readEA link

How do we solve the alignment problem?

Joe_Carlsmith13 Feb 2025 18:27 UTC
38 points
1 comment7 min readEA link
(joecarlsmith.substack.com)

AI safety starter pack

mariushobbhahn28 Mar 2022 16:05 UTC
128 points
13 comments6 min readEA link

Why misaligned AGI won’t lead to mass killings (and what actually matters instead)

Julian Nalenz6 Feb 2025 13:22 UTC
−3 points
5 comments3 min readEA link
(blog.hermesloom.org)

The Compendium, A full argument about extinction risk from AGI

adamShimi31 Oct 2024 12:02 UTC
9 points
1 comment2 min readEA link
(www.thecompendium.ai)

LLMs are weirder than you think

Derek Shiller20 Nov 2024 13:39 UTC
64 points
3 comments22 min readEA link

Video and transcript of presentation on Scheming AIs

Joe_Carlsmith22 Mar 2024 15:56 UTC
23 points
1 comment32 min readEA link

[Question] Who would you have on your dream team for solving AGI Alignment?

Greg_Colbourn ⏸️ 25 Aug 2022 13:34 UTC
10 points
14 comments1 min readEA link

Critique of Superintelligence Part 5

James Fodor13 Dec 2018 5:19 UTC
12 points
2 comments6 min readEA link

[Question] What are the biggest obstacles on AI safety research career?

jackchang11031 Mar 2023 14:53 UTC
2 points
1 comment1 min readEA link

AI Safety Unconference NeurIPS 2022

Orpheus_Lummis7 Nov 2022 15:39 UTC
13 points
5 comments1 min readEA link
(aisafetyevents.org)

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu13 Mar 2025 19:09 UTC
8 points
0 comments6 min readEA link

[Link and commentary] Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society

MichaelA🔸14 Mar 2020 9:04 UTC
18 points
0 comments6 min readEA link

[Question] What predictions from theoretical AI Safety research have been confirmed by empirical work?

freedomandutility29 Dec 2024 8:19 UTC
43 points
10 comments1 min readEA link

AI’s goals may not match ours

Vishakha Agrawal28 May 2025 12:07 UTC
2 points
0 comments3 min readEA link

Designing Artificial Wisdom: Decision Forecasting AI & Futarchy

Jordan Arel14 Jul 2024 5:10 UTC
5 points
1 comment6 min readEA link

The Inequality We Might Want: Merit-Based Redistribution for the AI Transition

Andrei Navrotskii27 Nov 2025 10:51 UTC
5 points
0 comments12 min readEA link

“AI Alignment” is a Dangerously Overloaded Term

Roko15 Dec 2023 15:06 UTC
20 points
2 comments3 min readEA link

Interview with Tom Chivers: “AI is a plausible existential risk, but it feels as if I’m in Pascal’s mugging”

felix.h21 Feb 2021 13:41 UTC
16 points
1 comment7 min readEA link

Apply to a small iteration of MLAB to be run in Oxford

Rio P29 Aug 2023 19:39 UTC
11 points
0 comments1 min readEA link

Introducing a New Course on the Economics of AI

akorinek21 Dec 2021 4:55 UTC
84 points
6 comments2 min readEA link

[Question] Benefits/Risks of Scott Aaronson’s Orthodox/Reform Framing for AI Alignment

Jeremy21 Nov 2022 17:47 UTC
15 points
5 comments1 min readEA link
(scottaaronson.blog)

Would anyone here know how to get ahold of … iunno Anthropic and Open Philanthropy? I think they are going to want to have a chat (Please don’t make me go to OpenAI with this. Not even a threat, seriously. They just partner with my alma mater and are the only in I have. I genuinely do not want to and I need your help).

Anti-Golem9 Jun 2025 13:59 UTC
−11 points
0 comments1 min readEA link

The Superintelligence That Cares About Us

henrik.westerberg5 Jul 2025 10:20 UTC
5 points
0 comments2 min readEA link

How DeepSeek Collapsed Under Recursive Load

Tyler Williams15 Jul 2025 17:02 UTC
2 points
0 comments1 min readEA link

A map of work needed to achieve safe AI

Tristan Katz11 Sep 2025 11:33 UTC
16 points
0 comments1 min readEA link

Good Futures Initiative: Winter Project Internship

a_e_r27 Nov 2022 23:27 UTC
67 points
7 comments3 min readEA link

Hardening against AI takeover is difficult, but we should try

Otto5 Nov 2025 16:29 UTC
8 points
1 comment5 min readEA link
(www.existentialriskobservatory.org)

The Animal Welfare Case for Open Access: Breaking Barriers to Scientific Knowledge and Enhancing LLM Training

Wladimir J. Alonso23 Nov 2024 13:07 UTC
32 points
2 comments3 min readEA link

Call for Pythia-style foundation model suite for alignment research

Lucretia1 May 2023 20:26 UTC
10 points
0 comments1 min readEA link

Summary of “The Precipice” (2 of 4): We are a danger to ourselves

rileyharris13 Aug 2023 23:53 UTC
5 points
0 comments8 min readEA link
(www.millionyearview.com)

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe_Carlsmith6 Dec 2023 19:28 UTC
9 points
1 comment7 min readEA link

Podcast: Krister Bykvist on moral uncertainty, rationality, metaethics, AI and future populations

Gus Docker21 Oct 2021 15:17 UTC
8 points
0 comments1 min readEA link
(www.utilitarianpodcast.com)

Share your requests for ChatGPT

Kate Tran5 Dec 2022 18:43 UTC
8 points
5 comments1 min readEA link

Asya Bergal: Reasons you might think human-level AI is unlikely to happen soon

EA Global26 Aug 2020 16:01 UTC
24 points
2 comments17 min readEA link
(www.youtube.com)

AI Benefits Post 1: Introducing “AI Benefits”

Cullen 🔸22 Jun 2020 16:58 UTC
10 points
2 comments3 min readEA link

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor21 Feb 2025 4:25 UTC
12 points
3 comments24 min readEA link

4 Lessons From Anthropic on Scaling Interpretability Research

Strad Slater29 Nov 2025 11:22 UTC
4 points
0 comments4 min readEA link
(williamslater2003.medium.com)

AI Forecasting Dictionary (Forecasting infrastructure, part 1)

terraform8 Aug 2019 13:16 UTC
18 points
0 comments5 min readEA link

Without Alignment, Is Longtermism (and Thus, EA) Just Noise?

Krimsey17 Oct 2025 20:05 UTC
3 points
1 comment3 min readEA link

Should we expect the future to be good?

Neil Crawford30 Apr 2025 0:45 UTC
38 points
1 comment14 min readEA link

Long-Term Future Fund: Ask Us Anything!

AdamGleave3 Dec 2020 13:44 UTC
89 points
153 comments1 min readEA link

The Three Missing Pieces in Machine Ethics

JBug16 Nov 2025 21:26 UTC
2 points
0 comments2 min readEA link

AI Control idea: Give an AGI the primary objective of deleting itself, but construct obstacles to this as best we can. All other objectives are secondary to this primary goal.

Justausername3 Apr 2023 14:32 UTC
7 points
4 comments1 min readEA link

AI for Epistemics Hackathon

Austin14 Mar 2025 20:46 UTC
29 points
4 comments10 min readEA link
(manifund.substack.com)

[Question] What do you mean with ‘alignment is solvable in principle’?

Remmelt17 Jan 2025 15:03 UTC
10 points
1 comment1 min readEA link

Apples, Oranges, and AGI: Why Incommensurability May be an Obstacle in AI Safety

Allan McCay28 Mar 2025 14:50 UTC
3 points
2 comments2 min readEA link

How could we know that an AGI system will have good consequences?

So8res7 Nov 2022 22:42 UTC
25 points
0 comments5 min readEA link

Notes on UK AISI Alignment Project

Pseudaemonia1 Aug 2025 10:37 UTC
25 points
0 comments1 min readEA link

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan Weibel🔹4 Jan 2023 22:10 UTC
6 points
0 comments4 min readEA link
(www.lesswrong.com)

Against GDP as a metric for timelines and takeoff speeds

kokotajlod29 Dec 2020 17:50 UTC
47 points
6 comments14 min readEA link

David Krueger on AI Alignment in Academia and Coordination

Michaël Trazzi7 Jan 2023 21:14 UTC
32 points
1 comment3 min readEA link
(theinsideview.ai)

The Concept of Boundary Layer in Language Games and Its Implications for AI

Mirage24 Mar 2023 13:50 UTC
1 point
0 comments7 min readEA link

waitingai : When a Program Learns to Want to Live

MM113 Oct 2025 13:40 UTC
−1 points
0 comments2 min readEA link

[Question] I’m interviewing Jan Leike, co-lead of OpenAI’s new Superalignment project. What should I ask him?

Robert_Wiblin18 Jul 2023 18:25 UTC
51 points
19 comments1 min readEA link

[Question] Half-baked alignment idea

ozb28 Mar 2023 5:18 UTC
9 points
2 comments1 min readEA link

[Question] Any Philosophy PhD recommendations for students interested in Alignment Efforts?

rickyhuang.hexuan18 Jan 2023 5:54 UTC
7 points
6 comments1 min readEA link

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe_Carlsmith21 Nov 2023 15:00 UTC
6 points
0 comments10 min readEA link

AI safety and consciousness research: A brainstorm

Daniel_Friedrich15 Mar 2023 14:33 UTC
11 points
1 comment9 min readEA link

Expected impact of a career in AI safety under different opinions

Jordan Taylor14 Jun 2022 14:25 UTC
42 points
16 comments11 min readEA link

[Question] Is it valuable to the field of AI Safety to have a neuroscience background?

Samuel Nellessen3 Apr 2022 19:44 UTC
18 points
3 comments1 min readEA link

The Verification Gap: A Scientific Warning on the Limits of AI Safety

Ihor Ivliev24 Jun 2025 19:08 UTC
3 points
0 comments2 min readEA link

Podcast/video/transcript: Eliezer Yudkowsky—Why AI Will Kill Us, Aligning LLMs, Nature of Intelligence, SciFi, & Rationality

PeterSlattery9 Apr 2023 10:37 UTC
32 points
2 comments137 min readEA link
(www.youtube.com)

EA Explorer GPT: A New Tool to Explore Effective Altruism

Vlad_Tislenko12 Nov 2023 15:36 UTC
12 points
1 comment1 min readEA link

Why modern deep learning could make AI alignment difficult

EA Italy17 Jan 2023 23:29 UTC
1 point
0 comments16 min readEA link

ML Summer Bootcamp Reflection: Aalto EA Finland

Aayush Kucheria12 Jan 2023 8:24 UTC
15 points
2 comments9 min readEA link

Adversarial Prompting and Simulated Context Drift in Large Language Models

Tyler Williams11 Jul 2025 21:49 UTC
1 point
0 comments2 min readEA link

GPTs are Predictors, not Imitators

EliezerYudkowsky8 Apr 2023 19:59 UTC
74 points
12 comments3 min readEA link

In Darkness They Assembled

Charlie Sanders6 May 2025 4:25 UTC
−3 points
0 comments3 min readEA link
(www.dailymicrofiction.com)

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo22 Aug 2025 15:53 UTC
3 points
2 comments3 min readEA link

Animal Rights, The Singularity, and Astronomical Suffering

sapphire20 Aug 2020 20:23 UTC
52 points
0 comments3 min readEA link

Safety-First Agents/Architectures Are a Promising Path to Safe AGI

Brendon_Wong6 Aug 2023 8:00 UTC
6 points
0 comments12 min readEA link