AI alignment

Last edit: 22 Jul 2022 20:58 UTC by Leo

AI alignment is research on how to ensure that AI systems pursue human or moral goals.

Evaluation

80,000 Hours rates AI alignment a “highest priority area”: a problem at the top of their ranking of global issues assessed by importance, tractability and neglectedness.[1]

Further reading

Christiano, Paul (2020) Current work in AI alignment, Effective Altruism Forum, April 3.

Shah, Rohin (2020) What’s been happening in AI alignment?, Effective Altruism Forum, July 29.

External links

AI Alignment Forum.

Related entries

AI governance | AI forecasting | alignment tax | Center for Human-Compatible Artificial Intelligence | Machine Intelligence Research Institute | rationality community

  1. ^

2019 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks19 Dec 2019 2:58 UTC
147 points
28 comments62 min readEA link

2018 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks18 Dec 2018 4:48 UTC
118 points
28 comments63 min readEA link

AGI Safety Fun­da­men­tals cur­ricu­lum and application

richard_ngo20 Oct 2021 21:45 UTC
123 points
20 comments8 min readEA link
(docs.google.com)

Why AI al­ign­ment could be hard with mod­ern deep learning

Ajeya21 Sep 2021 15:35 UTC
157 points
17 comments14 min readEA link
(www.cold-takes.com)

AI Re­search Con­sid­er­a­tions for Hu­man Ex­is­ten­tial Safety (ARCHES)

Andrew Critch21 May 2020 6:55 UTC
29 points
0 comments3 min readEA link
(acritch.com)

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

richard_ngo23 Jan 2019 14:58 UTC
63 points
14 comments8 min readEA link

Why I pri­ori­tize moral cir­cle ex­pan­sion over re­duc­ing ex­tinc­tion risk through ar­tifi­cial in­tel­li­gence alignment

Jacy20 Feb 2018 18:29 UTC
107 points
72 comments35 min readEA link
(www.sentienceinstitute.org)

Del­e­gated agents in prac­tice: How com­pa­nies might end up sel­l­ing AI ser­vices that act on be­half of con­sumers and coal­i­tions, and what this im­plies for safety research

Remmelt26 Nov 2020 16:39 UTC
11 points
0 comments4 min readEA link

Deep­Mind is hiring for the Scal­able Align­ment and Align­ment Teams

Rohin Shah13 May 2022 12:19 UTC
102 points
0 comments9 min readEA link

My cur­rent thoughts on MIRI’s “highly re­li­able agent de­sign” work

Daniel_Dewey7 Jul 2017 1:17 UTC
60 points
59 comments19 min readEA link

Stable Emer­gence in a Devel­op­men­tal AI Ar­chi­tec­ture: Re­sults from “Twins V3”

Petra Vojtassakova17 Nov 2025 23:23 UTC
6 points
2 comments2 min readEA link

Prevent­ing an AI-re­lated catas­tro­phe—Prob­lem profile

Benjamin Hilton29 Aug 2022 18:49 UTC
138 points
18 comments4 min readEA link
(80000hours.org)

2016 AI Risk Liter­a­ture Re­view and Char­ity Comparison

Larks13 Dec 2016 4:36 UTC
57 points
12 comments28 min readEA link

The aca­demic con­tri­bu­tion to AI safety seems large

technicalities30 Jul 2020 10:30 UTC
120 points
28 comments9 min readEA link

Hiring en­g­ineers and re­searchers to help al­ign GPT-3

Paul_Christiano1 Oct 2020 18:52 UTC
107 points
19 comments3 min readEA link

AI al­ign­ment re­searchers may have a com­par­a­tive ad­van­tage in re­duc­ing s-risks

Lukas_Gloor15 Feb 2023 13:01 UTC
79 points
5 comments13 min readEA link

Crazy ideas some­times do work

Aryeh Englander4 Sep 2021 3:27 UTC
71 points
8 comments1 min readEA link

Plant-Based De­faults: A Missed Op­por­tu­nity in AI Design

andiehansen8 May 2025 9:37 UTC
37 points
3 comments5 min readEA link

Launch­ing ap­pli­ca­tions for AI Safety Ca­reers Course In­dia 2024

varun_agr1 May 2024 5:30 UTC
23 points
1 comment1 min readEA link

2017 AI Safety Liter­a­ture Re­view and Char­ity Comparison

Larks20 Dec 2017 21:54 UTC
43 points
17 comments23 min readEA link

Why Mo­ral Con­flict Re­s­olu­tion Still Breaks Our Best Safety Tools

JBug18 Nov 2025 7:49 UTC
6 points
0 comments2 min readEA link

AGI safety ca­reer advice

richard_ngo2 May 2023 7:36 UTC
213 points
18 comments13 min readEA link

Large Lan­guage Models as Fi­du­cia­ries to Humans

johnjnay24 Jan 2023 19:53 UTC
25 points
0 comments34 min readEA link
(papers.ssrn.com)

What is it to solve the al­ign­ment prob­lem? (Notes)

Joe_Carlsmith24 Aug 2024 21:19 UTC
32 points
1 comment53 min readEA link

A tale of 2.5 or­thog­o­nal­ity theses

Arepo1 May 2022 13:53 UTC
148 points
31 comments11 min readEA link

Align­ment ideas in­spired by hu­man virtue development

Borys Pikalov18 May 2025 9:36 UTC
6 points
0 comments4 min readEA link

[Question] What are the coolest top­ics in AI safety, to a hope­lessly pure math­e­mat­i­cian?

Jenny K E7 May 2022 7:18 UTC
89 points
29 comments1 min readEA link

AGI safety from first principles

richard_ngo21 Oct 2020 17:42 UTC
77 points
10 comments3 min readEA link
(www.alignmentforum.org)

My per­sonal cruxes for work­ing on AI safety

Buck13 Feb 2020 7:11 UTC
136 points
35 comments44 min readEA link

Sleeper Agents: Train­ing De­cep­tive LLMs that Per­sist Through Safety Training

evhub12 Jan 2024 19:51 UTC
65 points
0 comments3 min readEA link
(arxiv.org)

There are no co­her­ence theorems

Elliott Thornley (EJT)20 Feb 2023 21:52 UTC
108 points
49 comments19 min readEA link

In­tro­duc­ing The Non­lin­ear Fund: AI Safety re­search, in­cu­ba­tion, and funding

Kat Woods 🔶 ⏸️18 Mar 2021 14:07 UTC
71 points
32 comments5 min readEA link

Scru­ti­niz­ing AI Risk (80K, #81) - v. quick summary

Ben23 Jul 2020 19:02 UTC
10 points
1 comment3 min readEA link

Draft re­port on ex­is­ten­tial risk from power-seek­ing AI

Joe_Carlsmith28 Apr 2021 21:41 UTC
88 points
34 comments1 min readEA link

[Link post] Co­or­di­na­tion challenges for pre­vent­ing AI conflict

stefan.torges9 Mar 2021 9:39 UTC
58 points
0 comments1 min readEA link
(longtermrisk.org)

AI al­ign­ment shouldn’t be con­flated with AI moral achievement

Matthew_Barnett30 Dec 2023 3:08 UTC
116 points
15 comments5 min readEA link

[Linkpost] AI Align­ment, Ex­plained in 5 Points (up­dated)

Daniel_Eth18 Apr 2023 8:09 UTC
31 points
2 comments1 min readEA link
(medium.com)

“Aligned with who?” Re­sults of sur­vey­ing 1,000 US par­ti­ci­pants on AI values

Holly Morgan21 Mar 2023 22:07 UTC
41 points
0 comments2 min readEA link
(www.lesswrong.com)

[Question] What is most con­fus­ing to you about AI stuff?

Sam Clarke23 Nov 2021 16:00 UTC
25 points
15 comments1 min readEA link

Coun­ter­ar­gu­ments to the ba­sic AI risk case

Katja_Grace14 Oct 2022 20:30 UTC
287 points
23 comments34 min readEA link

How do take­off speeds af­fect the prob­a­bil­ity of bad out­comes from AGI?

KR7 Jul 2020 17:53 UTC
18 points
0 comments8 min readEA link

Techies Wanted: How STEM Back­grounds Can Ad­vance Safe AI Policy

Daniel_Eth26 May 2025 11:29 UTC
41 points
1 comment29 min readEA link

What is it like do­ing AI safety work?

Kat Woods 🔶 ⏸️21 Feb 2023 19:24 UTC
99 points
2 comments10 min readEA link

A cen­tral AI al­ign­ment prob­lem: ca­pa­bil­ities gen­er­al­iza­tion, and the sharp left turn

So8res15 Jun 2022 14:19 UTC
53 points
2 comments10 min readEA link

De­cep­tive Align­ment is <1% Likely by Default

DavidW21 Feb 2023 15:07 UTC
54 points
26 comments14 min readEA link

TAI Safety Biblio­graphic Database

Jess_Riedel22 Dec 2020 16:03 UTC
61 points
9 comments17 min readEA link

From lan­guage to ethics by au­to­mated reasoning

Michele Campolo21 Nov 2021 15:16 UTC
8 points
0 comments6 min readEA link

AMA: Ajeya Co­tra, re­searcher at Open Phil

Ajeya28 Jan 2021 17:38 UTC
84 points
105 comments1 min readEA link

Cog­ni­tive Science/​Psy­chol­ogy As a Ne­glected Ap­proach to AI Safety

Kaj_Sotala5 Jun 2017 13:46 UTC
40 points
37 comments4 min readEA link

Ngo and Yud­kowsky on al­ign­ment difficulty

richard_ngo15 Nov 2021 22:47 UTC
71 points
13 comments94 min readEA link

An­nounc­ing AI Safety Support

Linda Linsefors19 Nov 2020 20:19 UTC
55 points
0 comments4 min readEA link

Train for in­cor­rigi­bil­ity, then re­verse it (Shut­down Prob­lem Con­test Sub­mis­sion)

Daniel_Eth18 Jul 2023 8:26 UTC
16 points
0 comments2 min readEA link

Tether­ware #1: The case for hu­man­like AI with free will

Jáchym Fibír30 Jan 2025 11:57 UTC
−3 points
2 comments10 min readEA link
(tetherware.substack.com)

On Defer­ence and Yud­kowsky’s AI Risk Estimates

bmg19 Jun 2022 14:35 UTC
288 points
194 comments17 min readEA link

Deep Deceptiveness

So8res21 Mar 2023 2:51 UTC
40 points
1 comment14 min readEA link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8res12 Jul 2022 5:35 UTC
126 points
13 comments29 min readEA link

In­tel­lec­tual Diver­sity in AI Safety

KR22 Jul 2020 19:07 UTC
21 points
8 comments3 min readEA link

An­nounc­ing AXRP, the AI X-risk Re­search Podcast

DanielFilan23 Dec 2020 20:10 UTC
32 points
1 comment1 min readEA link

Align­ment 201 curriculum

richard_ngo12 Oct 2022 19:17 UTC
94 points
9 comments1 min readEA link
(www.agisafetyfundamentals.com)

Chain­ing the evil ge­nie: why “outer” AI safety is prob­a­bly easy

titotal30 Aug 2022 13:55 UTC
40 points
12 comments10 min readEA link

[Question] How much EA anal­y­sis of AI safety as a cause area ex­ists?

richard_ngo6 Sep 2019 11:15 UTC
96 points
20 comments2 min readEA link

Ro­hin Shah: What’s been hap­pen­ing in AI al­ign­ment?

EA Global29 Jul 2020 20:15 UTC
18 points
0 comments14 min readEA link
(www.youtube.com)

How might we al­ign trans­for­ma­tive AI if it’s de­vel­oped very soon?

Holden Karnofsky29 Aug 2022 15:48 UTC
164 points
17 comments44 min readEA link

[linkpost] “What Are Rea­son­able AI Fears?” by Robin Han­son, 2023-04-23

Arjun Panickssery14 Apr 2023 23:26 UTC
41 points
3 comments4 min readEA link
(quillette.com)

In­tro­duc­tion to Prag­matic AI Safety [Prag­matic AI Safety #1]

TW1239 May 2022 17:02 UTC
68 points
0 comments6 min readEA link

An­i­mal welfare con­cerns are dom­i­nated by post-ASI futures

RobertM22 Nov 2025 4:48 UTC
11 points
1 comment4 min readEA link

My Un­der­stand­ing of Paul Chris­ti­ano’s Iter­ated Am­plifi­ca­tion AI Safety Re­search Agenda

Chi15 Aug 2020 19:59 UTC
38 points
3 comments39 min readEA link

In­ter­pret­ing Neu­ral Net­works through the Poly­tope Lens

Sid Black23 Sep 2022 18:03 UTC
35 points
0 comments28 min readEA link

Learn­ing so­cietal val­ues from law as part of an AGI al­ign­ment strategy

johnjnay21 Oct 2022 2:03 UTC
20 points
1 comment24 min readEA link

There should be an AI safety pro­ject board

mariushobbhahn14 Mar 2022 16:08 UTC
24 points
3 comments1 min readEA link

AI Risk: In­creas­ing Per­sua­sion Power

kewlcats3 Aug 2020 20:25 UTC
4 points
0 comments1 min readEA link

AI al­ign­ment with hu­mans… but with which hu­mans?

Geoffrey Miller8 Sep 2022 23:43 UTC
51 points
20 comments3 min readEA link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor Leahy9 Apr 2022 15:07 UTC
31 points
0 comments1 min readEA link

Par­allels Between AI Safety by De­bate and Ev­i­dence Law

Cullen 🔸20 Jul 2020 22:52 UTC
30 points
2 comments2 min readEA link
(cullenokeefe.com)

Safe AI and moral AI

William D'Alessandro1 Jun 2023 21:18 UTC
3 points
0 comments11 min readEA link

(Even) More Early-Ca­reer EAs Should Try AI Safety Tech­ni­cal Research

tlevin30 Jun 2022 21:14 UTC
86 points
40 comments11 min readEA link

2020 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

Larks21 Dec 2020 15:25 UTC
155 points
16 comments68 min readEA link

Con­nor Leahy on Con­jec­ture and Dy­ing with Dignity

Michaël Trazzi22 Jul 2022 19:30 UTC
34 points
0 comments10 min readEA link
(theinsideview.ai)

Rele­vant pre-AGI possibilities

kokotajlod20 Jun 2020 13:15 UTC
22 points
0 comments1 min readEA link
(aiimpacts.org)

Why Would AI “Aim” To Defeat Hu­man­ity?

Holden Karnofsky29 Nov 2022 18:59 UTC
24 points
0 comments32 min readEA link
(www.cold-takes.com)

High-level hopes for AI alignment

Holden Karnofsky20 Dec 2022 2:11 UTC
123 points
14 comments19 min readEA link
(www.cold-takes.com)

Pos­si­ble OpenAI’s Q* break­through and Deep­Mind’s AlphaGo-type sys­tems plus LLMs

Burnydelic23 Nov 2023 7:02 UTC
13 points
4 comments2 min readEA link

[Question] How strong is the ev­i­dence of un­al­igned AI sys­tems caus­ing harm?

Eevee🔹21 Jul 2020 4:08 UTC
31 points
1 comment1 min readEA link

New re­port on how much com­pu­ta­tional power it takes to match the hu­man brain (Open Philan­thropy)

Aaron Gertler 🔸15 Sep 2020 1:06 UTC
45 points
1 comment18 min readEA link
(www.openphilanthropy.org)

Paul Chris­ti­ano: Cur­rent work in AI alignment

EA Global3 Apr 2020 7:06 UTC
80 points
4 comments24 min readEA link
(www.youtube.com)

Buck Sh­legeris: How I think stu­dents should ori­ent to AI safety

EA Global25 Oct 2020 5:48 UTC
11 points
0 comments1 min readEA link
(www.youtube.com)

The ba­sic rea­sons I ex­pect AGI ruin

RobBensinger18 Apr 2023 3:37 UTC
58 points
13 comments14 min readEA link

The cur­rent al­ign­ment plan, and how we might im­prove it | EAG Bay Area 23

Buck7 Jun 2023 21:03 UTC
66 points
0 comments33 min readEA link

“The Race to the End of Hu­man­ity” – Struc­tural Uncer­tainty Anal­y­sis in AI Risk Models

Froolow19 May 2023 12:03 UTC
48 points
4 comments21 min readEA link

Con­jec­ture: In­ter­nal In­fo­haz­ard Policy

Connor Leahy29 Jul 2022 19:35 UTC
34 points
3 comments19 min readEA link

[Link] How un­der­stand­ing valence could help make fu­ture AIs safer

Milan Griffes8 Oct 2020 18:53 UTC
22 points
2 comments3 min readEA link

Align­ing the Align­ers: En­sur­ing Aligned AI acts for the com­mon good of all mankind

timunderwood16 Jan 2023 11:13 UTC
40 points
2 comments4 min readEA link

My Ob­jec­tions to “We’re All Gonna Die with Eliezer Yud­kowsky”

Quintin Pope21 Mar 2023 1:23 UTC
166 points
21 comments39 min readEA link

EA, Psy­chol­ogy & AI Safety Research

Sam Ellis26 May 2022 23:46 UTC
29 points
3 comments6 min readEA link

Why the Orthog­o­nal­ity Th­e­sis’s ve­rac­ity is not the point:

Antoine de Scorraille ⏸️23 Jul 2020 15:40 UTC
3 points
0 comments3 min readEA link

Ap­ply to the sec­ond ML for Align­ment Boot­camp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]

Buck6 May 2022 0:19 UTC
111 points
7 comments6 min readEA link

Ap­ply to the ML for Align­ment Boot­camp (MLAB) in Berkeley [Jan 3 - Jan 22]

Habryka [Deactivated]3 Nov 2021 18:20 UTC
140 points
6 comments1 min readEA link

Speedrun: AI Align­ment Prizes

joe9 Feb 2023 11:55 UTC
27 points
0 comments17 min readEA link

Steer­ing AI to care for an­i­mals, and soon

Andrew Critch14 Jun 2022 1:13 UTC
239 points
37 comments1 min readEA link

Pre­dict re­sponses to the “ex­is­ten­tial risk from AI” survey

RobBensinger28 May 2021 1:38 UTC
36 points
8 comments2 min readEA link

Aspira­tion-based, non-max­i­miz­ing AI agent designs

Bob Jacobs7 May 2024 16:13 UTC
12 points
1 comment38 min readEA link

Mis­gen­er­al­iza­tion as a misnomer

So8res6 Apr 2023 20:43 UTC
48 points
0 comments4 min readEA link

Fi­nal Re­port of the Na­tional Se­cu­rity Com­mis­sion on Ar­tifi­cial In­tel­li­gence (NSCAI, 2021)

MichaelA🔸1 Jun 2021 8:19 UTC
51 points
3 comments4 min readEA link
(www.nscai.gov)

New re­port: “Schem­ing AIs: Will AIs fake al­ign­ment dur­ing train­ing in or­der to get power?”

Joe_Carlsmith15 Nov 2023 17:16 UTC
71 points
4 comments30 min readEA link

Take­aways from safety by de­fault interviews

AI Impacts7 Apr 2020 2:01 UTC
25 points
2 comments13 min readEA link
(aiimpacts.org)

Nat­u­ral­ism and AI alignment

Michele Campolo24 Apr 2021 16:20 UTC
17 points
3 comments7 min readEA link

VIRTUA: a novel about AI alignment

Karl von Wendt12 Jan 2023 9:37 UTC
23 points
0 comments1 min readEA link

Emer­gent Ven­tures AI

technicalities8 Apr 2022 22:08 UTC
22 points
0 comments1 min readEA link
(marginalrevolution.com)

AI Sleeper Agents: How An­thropic Trains and Catches Them—Video

Writer30 Aug 2025 17:52 UTC
7 points
1 comment7 min readEA link
(youtu.be)

Guardrails vs Goal-di­rect­ed­ness in AI Alignment

freedomandutility30 Dec 2023 12:58 UTC
13 points
2 comments1 min readEA link

What I mean by “al­ign­ment is in large part about mak­ing cog­ni­tion aimable at all”

So8res30 Jan 2023 15:22 UTC
57 points
3 comments2 min readEA link

Law-Fol­low­ing AI 2: In­tent Align­ment + Su­per­in­tel­li­gence → Lawless AI (By De­fault)

Cullen 🔸27 Apr 2022 17:18 UTC
19 points
0 comments6 min readEA link

Is AI fore­cast­ing a waste of effort on the mar­gin?

Emrik5 Nov 2022 0:41 UTC
12 points
6 comments3 min readEA link

How to get tech­nolog­i­cal knowl­edge on AI/​ML (for non-tech peo­ple)

FangFang30 Jun 2021 7:53 UTC
63 points
7 comments5 min readEA link

An­drew Critch: Log­i­cal in­duc­tion — progress in AI alignment

EA Global6 Aug 2016 0:40 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Crit­i­cal Re­view of ‘The Precipice’: A Re­assess­ment of the Risks of AI and Pandemics

James Fodor11 May 2020 11:11 UTC
111 points
32 comments26 min readEA link

Pile of Law and Law-Fol­low­ing AI

Cullen 🔸13 Jul 2022 0:29 UTC
28 points
2 comments3 min readEA link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford29 Mar 2022 19:47 UTC
13 points
0 comments3 min readEA link

[Question] If AIs had sub­cor­ti­cal brain simu­la­tion, would that solve the al­ign­ment prob­lem?

Rainbow Affect31 Jul 2023 15:48 UTC
1 point
0 comments2 min readEA link

Quick sur­vey on AI al­ign­ment resources

frances_lorenz30 Jun 2022 19:08 UTC
15 points
0 comments1 min readEA link

[Question] How should we in­vest in “long-term short-ter­mism” given the like­li­hood of trans­for­ma­tive AI?

James_Banks12 Jan 2021 23:54 UTC
8 points
0 comments1 min readEA link

Three Im­pacts of Ma­chine Intelligence

Paul_Christiano23 Aug 2013 10:10 UTC
33 points
5 comments8 min readEA link
(rationalaltruist.com)

Eric Drexler: Pare­to­topian goal alignment

EA Global15 Mar 2019 14:51 UTC
16 points
0 comments10 min readEA link
(www.youtube.com)

On AI and Compute

johncrox3 Apr 2019 21:26 UTC
39 points
12 comments8 min readEA link

Mauhn Re­leases AI Safety Documentation

Berg Severens2 Jul 2021 12:19 UTC
4 points
2 comments1 min readEA link

LLMs might not be the fu­ture of search: at least, not yet.

James-Hartree-Law22 Jan 2025 21:40 UTC
4 points
1 comment4 min readEA link

[Question] What are your recom­men­da­tions for tech­ni­cal AI al­ign­ment pod­casts?

Evan_Gaensbauer11 May 2022 21:52 UTC
13 points
4 comments1 min readEA link

Max Teg­mark: Risks and benefits of ad­vanced ar­tifi­cial intelligence

EA Global5 Aug 2016 9:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Defin­ing al­ign­ment research

richard_ngo19 Aug 2024 22:49 UTC
48 points
1 comment7 min readEA link

[Question] Is there ev­i­dence that recom­mender sys­tems are chang­ing users’ prefer­ences?

zdgroff12 Apr 2021 19:11 UTC
60 points
15 comments1 min readEA link

Dis­con­tin­u­ous progress in his­tory: an update

AI Impacts17 Apr 2020 16:28 UTC
69 points
3 comments24 min readEA link

Large Lan­guage Models as Cor­po­rate Lob­by­ists, and Im­pli­ca­tions for So­cietal-AI Alignment

johnjnay4 Jan 2023 22:22 UTC
10 points
6 comments8 min readEA link

AGI x-risk timelines: 10% chance (by year X) es­ti­mates should be the head­line, not 50%.

Greg_Colbourn ⏸️ 1 Mar 2022 12:02 UTC
69 points
22 comments2 min readEA link

[Question] Why should we *not* put effort into AI safety re­search?

Ben Thompson16 May 2021 5:11 UTC
15 points
5 comments1 min readEA link

[Question] Are we con­fi­dent that su­per­in­tel­li­gent ar­tifi­cial in­tel­li­gence dis­em­pow­er­ing hu­mans would be bad?

Vasco Grilo🔸10 Jun 2023 9:24 UTC
24 points
27 comments1 min readEA link

When “yang” goes wrong

Joe_Carlsmith8 Jan 2024 16:35 UTC
57 points
1 comment13 min readEA link

[Question] How can I bet on short timelines?

kokotajlod7 Nov 2020 12:45 UTC
33 points
12 comments2 min readEA link

Order Mat­ters for De­cep­tive Alignment

DavidW15 Feb 2023 20:12 UTC
20 points
1 comment1 min readEA link
(www.lesswrong.com)

[Question] Align­ment & Ca­pa­bil­ities: What’s the differ­ence?

John G. Halstead31 Aug 2023 22:13 UTC
50 points
10 comments1 min readEA link

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. Murphy20 Jan 2022 20:48 UTC
20 points
7 comments3 min readEA link

The Me­taethics and Nor­ma­tive Ethics of AGI Value Align­ment: Many Ques­tions, Some Implications

Eleos Arete Citrini15 Sep 2021 19:05 UTC
25 points
0 comments8 min readEA link

Brain-com­puter in­ter­faces and brain organoids in AI al­ign­ment?

freedomandutility15 Apr 2023 22:28 UTC
8 points
2 comments1 min readEA link

Shah and Yud­kowsky on al­ign­ment failures

EliezerYudkowsky28 Feb 2022 19:25 UTC
38 points
7 comments92 min readEA link

The Prob­lem With the Word ‘Align­ment’

Peli Grietzer21 May 2024 21:37 UTC
13 points
1 comment6 min readEA link

[Creative Writ­ing Con­test] An AI Safety Limerick

Ben_West🔸18 Oct 2021 19:11 UTC
21 points
5 comments1 min readEA link

Si­tu­a­tional aware­ness (Sec­tion 2.1 of “Schem­ing AIs”)

Joe_Carlsmith26 Nov 2023 23:00 UTC
12 points
1 comment6 min readEA link

Align­ment Boot­strap­ping Is Dangerous

MichaelDickens27 Nov 2025 18:18 UTC
14 points
0 comments2 min readEA link

He­len Toner: The Open Philan­thropy Pro­ject’s work on AI risk

EA Global3 Nov 2017 7:43 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Public-fac­ing Cen­sor­ship Is Safety Theater, Caus­ing Rep­u­ta­tional Da­m­age

Yitz23 Sep 2022 5:08 UTC
49 points
7 comments5 min readEA link

[Question] What kind of event, tar­geted to un­der­grad­u­ate CS ma­jors, would be most effec­tive at get­ting peo­ple to work on AI safety?

CBiddulph19 Sep 2021 16:19 UTC
9 points
1 comment1 min readEA link

Les­sons learned from talk­ing to >100 aca­demics about AI safety

mariushobbhahn10 Oct 2022 13:16 UTC
138 points
21 comments12 min readEA link

I’m Cul­len O’Keefe, a Policy Re­searcher at OpenAI, AMA

Cullen 🔸11 Jan 2020 4:13 UTC
45 points
68 comments1 min readEA link

What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA Global12 Aug 2017 7:00 UTC
9 points
0 comments12 min readEA link

[Question] Is this a good way to bet on short timelines?

kokotajlod28 Nov 2020 14:31 UTC
17 points
16 comments1 min readEA link

[Question] Should the EA com­mu­nity have a DL en­g­ineer­ing fel­low­ship?

PabloAMC 🔸24 Dec 2021 13:43 UTC
26 points
6 comments1 min readEA link

The Mul­tidis­ci­plinary Ap­proach to Align­ment (MATA) and Archety­pal Trans­fer Learn­ing (ATL)

Miguel19 Jun 2023 3:23 UTC
4 points
0 comments7 min readEA link

EA megapro­jects continued

mariushobbhahn3 Dec 2021 10:33 UTC
183 points
48 comments7 min readEA link

A mesa-op­ti­miza­tion per­spec­tive on AI valence and moral patienthood

jacobpfau9 Sep 2021 22:23 UTC
10 points
18 comments17 min readEA link

[Question] What would you do if you had a lot of money/​power/​in­fluence and you thought that AI timelines were very short?

Greg_Colbourn ⏸️ 12 Nov 2021 21:59 UTC
29 points
8 comments1 min readEA link

Quan­tify­ing the Far Fu­ture Effects of Interventions

MichaelDickens18 May 2016 2:15 UTC
9 points
0 comments11 min readEA link

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:43 UTC
53 points
21 comments3 min readEA link

AI safety tax dynamics

Owen Cotton-Barratt23 Oct 2024 12:21 UTC
22 points
9 comments6 min readEA link
(strangecities.substack.com)

Align­ment Stress Sig­na­tures: When Safe AI Be­haves Like It’s Traumatized

Petra Vojtassakova26 Oct 2025 9:41 UTC
8 points
0 comments2 min readEA link

In­tro­duc­ing the Prin­ci­ples of In­tel­li­gent Be­havi­our in Biolog­i­cal and So­cial Sys­tems (PIBBSS) Fellowship

adamShimi18 Dec 2021 15:25 UTC
37 points
5 comments10 min readEA link

[Cause Ex­plo­ra­tion Prizes] Ex­pand­ing com­mu­ni­ca­tion about AGI risks

Ines22 Sep 2022 5:30 UTC
13 points
0 comments11 min readEA link

Shal­low re­view of live agen­das in al­ign­ment & safety

technicalities27 Nov 2023 11:33 UTC
76 points
8 comments29 min readEA link

Some AI Gover­nance Re­search Ideas

MarkusAnderljung3 Jun 2021 10:51 UTC
102 points
5 comments2 min readEA link

Soares, Tal­linn, and Yud­kowsky dis­cuss AGI cognition

EliezerYudkowsky29 Nov 2021 17:28 UTC
15 points
0 comments40 min readEA link

[Question] Ca­reer Ad­vice: Philos­o­phy + Pro­gram­ming → AI Safety

tcelferact18 Mar 2022 15:09 UTC
30 points
11 comments2 min readEA link

Ar­tifi­cial in­tel­li­gence ca­reer stories

EA Global25 Oct 2020 6:56 UTC
12 points
0 comments1 min readEA link
(www.youtube.com)

Chris­ti­ano and Yud­kowsky on AI pre­dic­tions and hu­man intelligence

EliezerYudkowsky23 Feb 2022 16:51 UTC
31 points
0 comments42 min readEA link

[Question] What is an ex­am­ple of re­cent, tan­gible progress in AI safety re­search?

Aaron Gertler 🔸14 Jun 2021 5:29 UTC
35 points
4 comments1 min readEA link

Com­pendium of prob­lems with RLHF

Raphaël S30 Jan 2023 8:48 UTC
18 points
0 comments10 min readEA link

Shar­ing the World with Digi­tal Minds

Aaron Gertler 🔸1 Dec 2020 8:00 UTC
12 points
1 comment1 min readEA link
(www.nickbostrom.com)

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

Katja_Grace6 Apr 2021 21:44 UTC
19 points
1 comment11 min readEA link
(worldspiritsockpuppet.com)

[linkpost] Shar­ing pow­er­ful AI mod­els: the emerg­ing paradigm of struc­tured access

ts20 Jan 2022 21:10 UTC
11 points
3 comments1 min readEA link

In­for­ma­tion se­cu­rity ca­reers for GCR reduction

ClaireZabel20 Jun 2019 23:56 UTC
187 points
35 comments8 min readEA link

Sur­vey on AI ex­is­ten­tial risk scenarios

Sam Clarke8 Jun 2021 17:12 UTC
159 points
11 comments6 min readEA link

Key Papers in Lan­guage Model Safety

aog20 Jun 2022 14:59 UTC
20 points
0 comments22 min readEA link

[Question] What are the challenges and prob­lems with pro­gram­ming law-break­ing con­straints into AGI?

Michael St Jules 🔸2 Feb 2020 20:53 UTC
20 points
34 comments1 min readEA link

Con­sider pay­ing me to do AI safety re­search work

Rupert5 Nov 2020 8:09 UTC
11 points
3 comments2 min readEA link

Some global catas­trophic risk estimates

Tamay10 Feb 2021 19:32 UTC
106 points
15 comments1 min readEA link

Katja Grace: AI safety

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram Rachum29 Dec 2022 16:09 UTC
4 points
0 comments1 min readEA link

Tan Zhi Xuan: AI al­ign­ment, philo­soph­i­cal plu­ral­ism, and the rele­vance of non-Western philosophy

EA Global21 Nov 2020 8:12 UTC
20 points
1 comment1 min readEA link
(www.youtube.com)

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin Shah3 Jan 2020 7:52 UTC
58 points
12 comments10 min readEA link
(www.alignmentforum.org)

Jesse Clif­ton: Open-source learn­ing — a bar­gain­ing approach

EA Global18 Oct 2019 18:05 UTC
10 points
0 comments1 min readEA link
(www.youtube.com)

AI things that are per­haps as im­por­tant as hu­man-con­trol­led AI

Chi3 Mar 2024 18:07 UTC
117 points
9 comments21 min readEA link

An Anal­y­sis of Sys­temic Risk and Ar­chi­tec­tural Re­quire­ments for the Con­tain­ment of Re­cur­sively Self-Im­prov­ing AI

Ihor Ivliev17 Jun 2025 0:16 UTC
2 points
5 comments4 min readEA link

Law-Fol­low­ing AI 3: Lawless AI Agents Un­der­mine Sta­bi­liz­ing Agreements

Cullen 🔸27 Apr 2022 17:20 UTC
28 points
3 comments3 min readEA link

[Linkpost] How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency

Jackson Wagner14 Feb 2022 21:40 UTC
10 points
0 comments1 min readEA link

On the abo­li­tion of man

Joe_Carlsmith18 Jan 2024 18:17 UTC
71 points
4 comments41 min readEA link

The Parable of the Boy Who Cried 5% Chance of Wolf

Kat Woods 🔶 ⏸️15 Aug 2022 14:22 UTC
80 points
8 comments2 min readEA link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

johnjnay26 Oct 2022 1:24 UTC
7 points
1 comment2 min readEA link

How to pur­sue a ca­reer in tech­ni­cal AI alignment

Charlie Rogers-Smith4 Jun 2022 21:36 UTC
270 points
9 comments39 min readEA link

Jan Leike, He­len Toner, Malo Bour­gon, and Miles Brundage: Work­ing in AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

Get­ting started in­de­pen­dently in AI Safety

JJ Hepburn6 Jul 2021 15:20 UTC
41 points
10 comments2 min readEA link

Timelines are short, p(doom) is high: a global stop to fron­tier AI de­vel­op­ment un­til x-safety con­sen­sus is our only rea­son­able hope

Greg_Colbourn ⏸️ 12 Oct 2023 11:24 UTC
78 points
83 comments9 min readEA link

Syd­ney AI Safety Fellowship

Chris Leong2 Dec 2021 7:35 UTC
16 points
0 comments2 min readEA link

AGI Predictions

Pablo21 Nov 2020 12:02 UTC
36 points
0 comments1 min readEA link
(www.lesswrong.com)

On pre­sent­ing the case for AI risk

Aryeh Englander8 Mar 2022 21:37 UTC
114 points
12 comments4 min readEA link

List #3: Why not to as­sume on prior that AGI-al­ign­ment workarounds are available

Remmelt24 Dec 2022 9:54 UTC
6 points
0 comments3 min readEA link

[Question] Is it crunch time yet? If so, who can help?

Nicholas Kross13 Oct 2021 4:11 UTC
29 points
9 comments1 min readEA link

Don’t Call It AI Alignment

Gil20 Feb 2023 5:27 UTC
16 points
7 comments2 min readEA link

[Question] Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson Jones4 Nov 2022 0:58 UTC
11 points
1 comment3 min readEA link

The case for more Align­ment Tar­get Anal­y­sis (ATA)

Chi20 Sep 2024 1:14 UTC
25 points
0 comments17 min readEA link

Ngo and Yud­kowsky on AI ca­pa­bil­ity gains

richard_ngo19 Nov 2021 1:54 UTC
23 points
4 comments39 min readEA link

Oth­er­ness and con­trol in the age of AGI

Joe_Carlsmith2 Jan 2024 18:15 UTC
37 points
1 comment7 min readEA link

[Question] I’m in­ter­view­ing Max Teg­mark about AI safety and more. What shouId I ask him?

Robert_Wiblin13 May 2022 15:32 UTC
18 points
2 comments1 min readEA link

Long-Term Fu­ture Fund: May 2021 grant recommendations

abergal27 May 2021 6:44 UTC
110 points
17 comments57 min readEA link

How Do AI Timelines Affect Giv­ing Now vs. Later?

MichaelDickens3 Aug 2021 3:36 UTC
36 points
8 comments8 min readEA link

Bryan John­son seems more EA al­igned than I expected

PeterSlattery22 Apr 2024 9:38 UTC
13 points
27 comments2 min readEA link
(www.youtube.com)

[Question] What con­sid­er­a­tions in­fluence whether I have more in­fluence over short or long timelines?

kokotajlod5 Nov 2020 19:57 UTC
19 points
0 comments1 min readEA link

Utility Eng­ineer­ing: An­a­lyz­ing and Con­trol­ling Emer­gent Value Sys­tems in AIs

Matrice Jacobine🔸🏳️‍⚧️12 Feb 2025 9:15 UTC
13 points
0 comments1 min readEA link
(www.emergent-values.ai)

Gentle­ness and the ar­tifi­cial Other

Joe_Carlsmith2 Jan 2024 18:21 UTC
90 points
2 comments11 min readEA link

Why AI is Harder Than We Think—Me­lanie Mitchell

Eevee🔹28 Apr 2021 8:19 UTC
45 points
7 comments2 min readEA link
(arxiv.org)

Thoughts on short timelines

Tobias_Baumann23 Oct 2018 15:59 UTC
22 points
14 comments5 min readEA link

Sym­bio­sis, not al­ign­ment, as the goal for liberal democ­ra­cies in the tran­si­tion to ar­tifi­cial gen­eral intelligence

simonfriederich17 Mar 2023 13:04 UTC
18 points
2 comments24 min readEA link
(rdcu.be)

Im­por­tant, ac­tion­able re­search ques­tions for the most im­por­tant century

Holden Karnofsky24 Feb 2022 16:34 UTC
301 points
13 comments19 min readEA link

SERI ML ap­pli­ca­tion dead­line is ex­tended un­til May 22.

Viktoria Malyasova22 May 2022 0:13 UTC
13 points
3 comments1 min readEA link

Vic­to­ria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment

Michaël Trazzi12 Jan 2023 17:09 UTC
16 points
0 comments4 min readEA link
(www.theinsideview.ai)

AI al­ign­ment re­search links

Holden Karnofsky6 Jan 2022 5:52 UTC
16 points
0 comments6 min readEA link
(www.cold-takes.com)

Messy per­sonal stuff that af­fected my cause pri­ori­ti­za­tion (or: how I started to care about AI safety)

Julia_Wise🔸5 May 2022 17:59 UTC
269 points
14 comments2 min readEA link

Tech­ni­cal AGI safety re­search out­side AI

richard_ngo18 Oct 2019 15:02 UTC
91 points
5 comments3 min readEA link

Why Mo­ral Weights Have Two Types and How to Mea­sure Them

Beyond Singularity17 Jul 2025 10:58 UTC
17 points
4 comments4 min readEA link

Some promis­ing ca­reer ideas be­yond 80,000 Hours’ pri­or­ity paths

Arden Koehler26 Jun 2020 10:34 UTC
142 points
28 comments15 min readEA link

Law-Fol­low­ing AI 1: Se­quence In­tro­duc­tion and Structure

Cullen 🔸27 Apr 2022 17:16 UTC
35 points
2 comments9 min readEA link

In­creased Availa­bil­ity and Willing­ness for De­ploy­ment of Re­sources for Effec­tive Altru­ism and Long-Termism

Evan_Gaensbauer29 Dec 2021 20:20 UTC
46 points
1 comment2 min readEA link

7 es­says on Build­ing a Bet­ter Future

Jamie_Harris24 Jun 2022 14:28 UTC
21 points
0 comments2 min readEA link

Seek­ing Feed­back: An Ini­ti­a­tive on AI, Men­tal Health, and Alignment

Gina Hafez30 Sep 2025 16:14 UTC
16 points
4 comments6 min readEA link

Video and tran­script of talk on au­tomat­ing al­ign­ment research

Joe_Carlsmith30 Apr 2025 17:43 UTC
11 points
1 comment24 min readEA link
(joecarlsmith.com)

On the cor­re­spon­dence be­tween AI-mis­al­ign­ment and cog­ni­tive dis­so­nance us­ing a be­hav­ioral eco­nomics model

Stijn Bruers 🔸1 Nov 2022 9:15 UTC
11 points
0 comments6 min readEA link

Eli Lifland on Nav­i­gat­ing the AI Align­ment Landscape

Ozzie Gooen1 Feb 2023 0:07 UTC
48 points
9 comments31 min readEA link
(quri.substack.com)

“Ex­is­ten­tial risk from AI” sur­vey results

RobBensinger1 Jun 2021 20:19 UTC
80 points
35 comments11 min readEA link

Ngo and Yud­kowsky on sci­en­tific rea­son­ing and pivotal acts

EliezerYudkowsky21 Feb 2022 17:00 UTC
33 points
1 comment35 min readEA link

[Question] Is trans­for­ma­tive AI the biggest ex­is­ten­tial risk? Why or why not?

Eevee🔹5 Mar 2022 3:54 UTC
9 points
10 comments1 min readEA link

A Sim­ple Model of AGI De­ploy­ment Risk

djbinder9 Jul 2021 9:44 UTC
30 points
0 comments5 min readEA link

An ML safety in­surance com­pany—shower thoughts

EdoArad18 Oct 2021 7:45 UTC
15 points
4 comments1 min readEA link

AI Safety Needs Great Engineers

Andy Jones23 Nov 2021 21:03 UTC
98 points
14 comments4 min readEA link

How to build a safe ad­vanced AI (Evan Hub­inger) | What’s up in AI safety? (Asya Ber­gal)

EA Global25 Oct 2020 5:48 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

AI al­ign­ment prize win­ners and next round [link]

RyanCarey20 Jan 2018 12:07 UTC
7 points
1 comment1 min readEA link

FLI AI Align­ment pod­cast: Evan Hub­inger on In­ner Align­ment, Outer Align­ment, and Pro­pos­als for Build­ing Safe Ad­vanced AI

evhub1 Jul 2020 20:59 UTC
13 points
2 comments1 min readEA link
(futureoflife.org)

[Link] EAF Re­search agenda: “Co­op­er­a­tion, Con­flict, and Trans­for­ma­tive Ar­tifi­cial In­tel­li­gence”

stefan.torges17 Jan 2020 13:28 UTC
64 points
0 comments1 min readEA link

I’m Buck Sh­legeris, I do re­search and out­reach at MIRI, AMA

Buck15 Nov 2019 22:44 UTC
123 points
228 comments2 min readEA link

AI Safety: Ap­ply­ing to Grad­u­ate Studies

frances_lorenz15 Dec 2021 22:56 UTC
24 points
0 comments12 min readEA link

Atari early

AI Impacts2 Apr 2020 23:28 UTC
34 points
2 comments5 min readEA link
(aiimpacts.org)

[Question] What harm could AI safety do?

SeanEngelhart15 May 2021 1:11 UTC
12 points
7 comments1 min readEA link

[Question] The pos­i­tive case for a fo­cus on achiev­ing safe AI?

vipulnaik25 Jun 2021 4:01 UTC
41 points
1 comment1 min readEA link

Cos­mic AI safety

Magnus Vinding6 Dec 2024 22:32 UTC
24 points
5 comments6 min readEA link

[Question] Why aren’t you freak­ing out about OpenAI? At what point would you start?

AppliedDivinityStudies10 Oct 2021 13:06 UTC
80 points
22 comments2 min readEA link

There are two fac­tions work­ing to pre­vent AI dan­gers. Here’s why they’re deeply di­vided.

Sharmake10 Aug 2022 19:52 UTC
10 points
0 comments4 min readEA link
(www.vox.com)

Is GPT-3 the death of the pa­per­clip max­i­mizer?

matthias_samwald3 Aug 2020 11:34 UTC
4 points
1 comment1 min readEA link

Owen Cot­ton-Bar­ratt: What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA Global11 Aug 2017 8:19 UTC
10 points
0 comments12 min readEA link
(www.youtube.com)

Align­ment Newslet­ter One Year Retrospective

Rohin Shah10 Apr 2019 7:00 UTC
62 points
22 comments21 min readEA link

Ma­hen­dra Prasad: Ra­tional group de­ci­sion-making

EA Global8 Jul 2020 15:06 UTC
15 points
0 comments16 min readEA link
(www.youtube.com)

List #1: Why stop­ping the de­vel­op­ment of AGI is hard but doable

Remmelt24 Dec 2022 9:52 UTC
24 points
2 comments5 min readEA link

Con­ver­sa­tion on AI risk with Adam Gleave

AI Impacts27 Dec 2019 21:43 UTC
18 points
3 comments4 min readEA link
(aiimpacts.org)

A list of good heuris­tics that the case for AI X-risk fails

Aaron Gertler 🔸16 Jul 2020 9:56 UTC
25 points
9 comments2 min readEA link
(www.alignmentforum.org)

Med­i­ta­tions on ca­reers in AI Safety

PabloAMC 🔸23 Mar 2022 22:00 UTC
88 points
30 comments2 min readEA link

AI Mo­ral Align­ment: The Most Im­por­tant Goal of Our Generation

Ronen Bar26 Mar 2025 12:32 UTC
136 points
32 comments8 min readEA link

What does it mean to be­come an ex­pert in AI Hard­ware?

Toph9 Jan 2021 4:15 UTC
87 points
10 comments11 min readEA link

Twit­ter-length re­sponses to 24 AI al­ign­ment arguments

RobBensinger14 Mar 2022 19:34 UTC
67 points
17 comments8 min readEA link

Who Aligns the Align­ment Re­searchers?

ben.smith5 Mar 2023 23:22 UTC
23 points
4 comments11 min readEA link

VSPE vs. flat­tery: Test­ing emo­tional scaf­fold­ing for early-stage alignment

Astelle Kay24 Jun 2025 9:39 UTC
2 points
1 comment1 min readEA link

Po­ten­tial Risks from Ad­vanced AI

EA Global13 Aug 2017 7:00 UTC
9 points
0 comments18 min readEA link

AI Align­ment: The Case for In­clud­ing Animals

Adrià Moret11 Sep 2025 20:59 UTC
22 points
0 comments1 min readEA link
(philpapers.org)

What suc­cess looks like

mariushobbhahn28 Jun 2022 14:30 UTC
115 points
20 comments19 min readEA link

Fore­cast­ing Trans­for­ma­tive AI: What Kind of AI?

Holden Karnofsky10 Aug 2021 21:38 UTC
62 points
3 comments10 min readEA link

AGI in a vuln­er­a­ble world

AI Impacts2 Apr 2020 3:43 UTC
17 points
0 comments1 min readEA link
(aiimpacts.org)

List #2: Why co­or­di­nat­ing to al­ign as hu­mans to not de­velop AGI is a lot eas­ier than, well… co­or­di­nat­ing as hu­mans with AGI co­or­di­nat­ing to be al­igned with humans

Remmelt24 Dec 2022 9:53 UTC
3 points
0 comments3 min readEA link

Align­ing Recom­mender Sys­tems as Cause Area

IvanVendrov8 May 2019 8:56 UTC
150 points
48 comments13 min readEA link

Disagree­ments about Align­ment: Why, and how, we should try to solve them

ojorgensen8 Aug 2022 22:32 UTC
16 points
6 comments16 min readEA link

[Question] Brief sum­mary of key dis­agree­ments in AI Risk

Aryeh Englander26 Dec 2019 19:40 UTC
31 points
3 comments1 min readEA link

No­body’s on the ball on AGI alignment

leopold29 Mar 2023 14:26 UTC
328 points
66 comments9 min readEA link
(www.forourposterity.com)

Some AI re­search ar­eas and their rele­vance to ex­is­ten­tial safety

Andrew Critch15 Dec 2020 12:15 UTC
12 points
1 comment56 min readEA link
(alignmentforum.org)

What Should the Aver­age EA Do About AI Align­ment?

Raemon25 Feb 2017 20:07 UTC
42 points
39 comments7 min readEA link

Draft re­port on AI timelines

Ajeya15 Dec 2020 12:10 UTC
35 points
0 comments1 min readEA link
(alignmentforum.org)

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
50 points
4 comments13 min readEA link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
9 points
0 comments4 min readEA link

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

RobBensinger11 Nov 2021 3:21 UTC
60 points
33 comments34 min readEA link

Con­sider try­ing the ELK con­test (I am)

Holden Karnofsky5 Jan 2022 19:42 UTC
110 points
17 comments16 min readEA link

The case for be­com­ing a black-box in­ves­ti­ga­tor of lan­guage models

Buck6 May 2022 14:37 UTC
91 points
7 comments3 min readEA link

13 Very Differ­ent Stances on AGI

Ozzie Gooen27 Dec 2021 23:30 UTC
84 points
23 comments3 min readEA link

Daniel Dewey: The Open Philan­thropy Pro­ject’s work on po­ten­tial risks from ad­vanced AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments18 min readEA link
(www.youtube.com)

[Question] Is a ca­reer in mak­ing AI sys­tems more se­cure a mean­ingful way to miti­gate the X-risk posed by AGI?

Kyle O’Brien13 Feb 2022 7:05 UTC
14 points
4 comments1 min readEA link

Red­wood Re­search is hiring for sev­eral roles

Jack R29 Nov 2021 0:18 UTC
75 points
0 comments1 min readEA link

An even deeper atheism

Joe_Carlsmith11 Jan 2024 17:28 UTC
26 points
2 comments15 min readEA link

Why I ex­pect suc­cess­ful (nar­row) alignment

Tobias_Baumann29 Dec 2018 15:46 UTC
18 points
10 comments1 min readEA link
(s-risks.org)

Owain Evans and Vic­to­ria Krakovna: Ca­reers in tech­ni­cal AI safety

EA Global3 Nov 2017 7:43 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

mic30 Jun 2022 18:37 UTC
53 points
1 comment11 min readEA link

An­nounc­ing the Vi­talik Bu­terin Fel­low­ships in AI Ex­is­ten­tial Safety!

DanielFilan21 Sep 2021 0:41 UTC
62 points
0 comments1 min readEA link
(grants.futureoflife.org)

Long-Term Fu­ture Fund: April 2019 grant recommendations

Habryka [Deactivated]23 Apr 2019 7:00 UTC
142 points
242 comments47 min readEA link

Truth­ful AI

Owen Cotton-Barratt20 Oct 2021 15:11 UTC
55 points
14 comments10 min readEA link

Does AI risk “other” the AIs?

Joe_Carlsmith9 Jan 2024 17:51 UTC
23 points
3 comments8 min readEA link

Lev­el­ling Up in AI Safety Re­search Engineering

GabeM2 Sep 2022 4:59 UTC
167 points
21 comments17 min readEA link

New blog: Planned Obsolescence

Ajeya27 Mar 2023 19:46 UTC
198 points
9 comments1 min readEA link
(www.planned-obsolescence.org)

Imi­ta­tion Learn­ing is Prob­a­bly Ex­is­ten­tially Safe

Vasco Grilo🔸30 Apr 2024 17:06 UTC
19 points
7 comments3 min readEA link
(www.openphilanthropy.org)

AI views and dis­agree­ments AMA: Chris­ti­ano, Ngo, Shah, Soares, Yudkowsky

RobBensinger1 Mar 2022 1:13 UTC
30 points
4 comments1 min readEA link
(www.lesswrong.com)

Yud­kowsky and Chris­ti­ano dis­cuss “Take­off Speeds”

EliezerYudkowsky22 Nov 2021 19:42 UTC
42 points
0 comments60 min readEA link

BERI is hiring an ML Soft­ware Engineer

sawyer🔸10 Nov 2021 19:36 UTC
17 points
2 comments1 min readEA link

Chris­ti­ano, Co­tra, and Yud­kowsky on AI progress

Ajeya25 Nov 2021 16:30 UTC
18 points
6 comments68 min readEA link

Lan­guage Agents Re­duce the Risk of Ex­is­ten­tial Catastrophe

cdkg29 May 2023 9:59 UTC
29 points
6 comments26 min readEA link

“Slower tech de­vel­op­ment” can be about or­der­ing, grad­u­al­ness, or dis­tance from now

MichaelA🔸14 Nov 2021 20:58 UTC
47 points
3 comments4 min readEA link

Per­sonal thoughts on ca­reers in AI policy and strategy

carrickflynn27 Sep 2017 16:52 UTC
56 points
28 comments18 min readEA link

Col­lin Burns on Align­ment Re­search And Dis­cov­er­ing La­tent Knowl­edge Without Supervision

Michaël Trazzi17 Jan 2023 17:21 UTC
21 points
2 comments4 min readEA link
(theinsideview.ai)

Three kinds of competitiveness

AI Impacts2 Apr 2020 3:46 UTC
10 points
0 comments5 min readEA link
(aiimpacts.org)

Ought: why it mat­ters and ways to help

Paul_Christiano26 Jul 2019 1:56 UTC
52 points
5 comments5 min readEA link

How Misal­igned AI Per­sonas Lead to Hu­man Ex­tinc­tion – Step by Step

Writer19 Jul 2025 13:59 UTC
6 points
1 comment7 min readEA link
(youtu.be)

Two rea­sons we might be closer to solv­ing al­ign­ment than it seems

Kat Woods 🔶 ⏸️24 Sep 2022 17:38 UTC
44 points
17 comments4 min readEA link

An­nounc­ing the Har­vard AI Safety Team

Xander12330 Jun 2022 18:34 UTC
128 points
4 comments5 min readEA link

[Question] What are the top pri­ori­ties in a slow-take­off, mul­ti­po­lar world?

JP Addison🔸25 Aug 2021 8:47 UTC
26 points
9 comments1 min readEA link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:52 UTC
134 points
12 comments14 min readEA link
(www.neelnanda.io)

Is this com­mu­nity over-em­pha­siz­ing AI al­ign­ment?

Lixiang8 Jan 2023 6:23 UTC
1 point
5 comments1 min readEA link

AI Im­pacts: His­toric trends in tech­nolog­i­cal progress

Aaron Gertler 🔸12 Feb 2020 0:08 UTC
55 points
5 comments3 min readEA link

In­for­mat­ica: Spe­cial Is­sue on Superintelligence

RyanCarey3 May 2017 5:05 UTC
7 points
0 comments2 min readEA link

Michael Page, Dario Amodei, He­len Toner, Tasha McCauley, Jan Leike, & Owen Cot­ton-Bar­ratt: Mus­ings on AI

EA Global11 Aug 2017 8:19 UTC
7 points
0 comments1 min readEA link
(www.youtube.com)

SERI ML Align­ment The­ory Schol­ars Pro­gram 2022

Ryan Kidd27 Apr 2022 16:33 UTC
57 points
2 comments3 min readEA link

Rac­ing through a minefield: the AI de­ploy­ment problem

Holden Karnofsky31 Dec 2022 21:44 UTC
79 points
1 comment13 min readEA link
(www.cold-takes.com)

Open Philan­thropy’s AI gov­er­nance grant­mak­ing (so far)

Aaron Gertler 🔸17 Dec 2020 12:00 UTC
63 points
0 comments6 min readEA link
(www.openphilanthropy.org)

De Dicto and De Se Refer­ence Mat­ters for Alignment

philgoetz3 Oct 2023 21:57 UTC
5 points
2 comments9 min readEA link

AGI risk: analo­gies & arguments

technicalities23 Mar 2021 13:18 UTC
31 points
3 comments8 min readEA link
(www.gleech.org)

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

alexflint12 Mar 2018 2:10 UTC
13 points
11 comments10 min readEA link

Paul Chris­ti­ano on how OpenAI is de­vel­op­ing real solu­tions to the ‘AI al­ign­ment prob­lem’, and his vi­sion of how hu­man­ity will pro­gres­sively hand over de­ci­sion-mak­ing to AI systems

80000_Hours2 Oct 2018 11:49 UTC
6 points
0 comments185 min readEA link

LLMs Are Already Misal­igned: Sim­ple Ex­per­i­ments Prove It

Makham28 Jul 2025 17:23 UTC
4 points
3 comments7 min readEA link

In­ter­view with Ro­man Yam­polskiy about AGI on The Real­ity Check

Darren McKee18 Feb 2023 23:29 UTC
27 points
0 comments1 min readEA link
(www.trcpodcast.com)

AI al­ign­ment as a trans­la­tion problem

Roman Leventov5 Feb 2024 14:14 UTC
3 points
1 comment3 min readEA link

Field Notes from EAG NYC

Lydia Nottingham15 Oct 2025 7:33 UTC
3 points
0 comments4 min readEA link

A Bench­mark for Mea­sur­ing Hon­esty in AI Systems

Mantas Mazeika4 Mar 2025 17:44 UTC
29 points
0 comments2 min readEA link
(www.mask-benchmark.ai)

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence al­ign­ment re­search (ABRIDGED)

Jaime Sevilla5 Sep 2019 14:56 UTC
25 points
4 comments2 min readEA link

Tether­ware #2: What ev­ery hu­man should know about our most likely AI future

Jáchym Fibír28 Feb 2025 11:25 UTC
3 points
0 comments11 min readEA link
(tetherware.substack.com)

The Inevitable Emer­gence of Black-Mar­ket LLM Infrastructure

Tyler Williams8 Aug 2025 19:05 UTC
1 point
0 comments2 min readEA link

Does gen­er­al­ity pay? GPT-3 can provide pre­limi­nary ev­i­dence.

Eevee🔹12 Jul 2020 18:53 UTC
21 points
4 comments2 min readEA link

[Question] Why not offer a multi-mil­lion /​ billion dol­lar prize for solv­ing the Align­ment Prob­lem?

Aryeh Englander17 Apr 2022 16:08 UTC
15 points
9 comments1 min readEA link

De­com­pos­ing al­ign­ment to take ad­van­tage of paradigms

Christopher King4 Jun 2023 14:26 UTC
2 points
0 comments4 min readEA link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster9 Mar 2023 17:30 UTC
107 points
6 comments22 min readEA link
(www.anthropic.com)

Are AI Models Es­cap­ing Plato’s Cave?

Strad Slater22 Nov 2025 11:46 UTC
2 points
0 comments5 min readEA link
(williamslater2003.medium.com)

Ab­solute Zero: AlphaZero for LLM

alapmi12 May 2025 14:54 UTC
2 points
0 comments1 min readEA link

What Does an ASI Poli­ti­cal Ecol­ogy Mean for Hu­man Sur­vival?

Nathan Sidney23 Feb 2025 8:53 UTC
7 points
3 comments1 min readEA link

How the Hu­man Psy­cholog­i­cal “Pro­gram” Un­der­mines AI Align­ment — and What We Can Do

Beyond Singularity6 May 2025 13:37 UTC
14 points
2 comments3 min readEA link

Align­ment Fak­ing in Large Lan­guage Models

Ryan Greenblatt18 Dec 2024 17:19 UTC
142 points
9 comments10 min readEA link

The ‘Bad Par­ent’ Prob­lem: Why Hu­man So­ciety Com­pli­cates AI Alignment

Beyond Singularity5 Apr 2025 21:08 UTC
11 points
1 comment3 min readEA link

[Question] How to get more aca­demics en­thu­si­as­tic about do­ing AI Safety re­search?

PabloAMC 🔸4 Sep 2021 14:10 UTC
25 points
19 comments1 min readEA link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash Jafari5 Dec 2022 17:37 UTC
30 points
7 comments5 min readEA link

Beg­ging, Plead­ing AI Orgs to Com­ment on NIST AI Risk Man­age­ment Framework

Bridges15 Apr 2022 19:35 UTC
87 points
3 comments2 min readEA link

deleted

funnyfranco18 Mar 2025 19:19 UTC
3 points
9 comments1 min readEA link

Sparks of Ar­tifi­cial Gen­eral In­tel­li­gence: Early ex­per­i­ments with GPT-4 | Microsoft Research

𝕮𝖎𝖓𝖊𝖗𝖆23 Mar 2023 5:45 UTC
15 points
0 comments1 min readEA link
(arxiv.org)

An­nual AGI Bench­mark­ing Event

Metaculus26 Aug 2022 21:31 UTC
20 points
2 comments2 min readEA link
(www.metaculus.com)

Do­ing good… best?

Michele Campolo22 Aug 2025 15:48 UTC
3 points
0 comments2 min readEA link

Un­veiling the Amer­i­can Public Opinion on AI Mo­ra­to­rium and Govern­ment In­ter­ven­tion: The Im­pact of Me­dia Exposure

Otto8 May 2023 10:49 UTC
28 points
5 comments6 min readEA link

The role of academia in AI Safety.

PabloAMC 🔸28 Mar 2022 0:04 UTC
71 points
19 comments3 min readEA link

# Digi­tal Offspring: A Case for Emer­gent Con­scious­ness in AI

MM113 Oct 2025 13:40 UTC
1 point
0 comments3 min readEA link

Some AI safety pro­ject & re­search ideas/​ques­tions for short and long timelines

Lloy2 🔹8 Aug 2025 21:08 UTC
13 points
0 comments5 min readEA link

De­con­fus­ing ‘AI’ and ‘evolu­tion’

Remmelt22 Jul 2025 6:56 UTC
6 points
1 comment28 min readEA link

Mar­ius Hobb­hahn on the race to solve AI schem­ing be­fore mod­els go superhuman

80000_Hours3 Dec 2025 21:08 UTC
6 points
0 comments17 min readEA link

A Rocket–In­ter­pretabil­ity Analogy

plex21 Oct 2024 13:55 UTC
14 points
1 comment1 min readEA link

But ex­actly how com­plex and frag­ile?

Katja_Grace13 Dec 2019 7:05 UTC
37 points
3 comments3 min readEA link
(meteuphoric.com)

Why Post-Prob­a­bil­ity AI May Be Safer Than Prob­a­bil­ity-Based Models

devin.bostick16 Apr 2025 14:23 UTC
2 points
0 comments2 min readEA link

Yip Fai Tse on an­i­mal welfare & AI safety and long termism

Karthik Palakodeti22 Jun 2023 12:48 UTC
51 points
0 comments1 min readEA link

Ori­gin and al­ign­ment of goals, mean­ing, and morality

FalseCogs24 Aug 2023 14:05 UTC
1 point
2 comments35 min readEA link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenz14 May 2022 15:58 UTC
17 points
0 comments1 min readEA link

Dis­cov­er­ing Lan­guage Model Be­hav­iors with Model-Writ­ten Evaluations

evhub20 Dec 2022 20:09 UTC
25 points
0 comments7 min readEA link
(www.anthropic.com)

ML Safety Schol­ars Sum­mer 2022 Retrospective

TW1231 Nov 2022 3:09 UTC
56 points
2 comments21 min readEA link

A stub­born un­be­liever fi­nally gets the depth of the AI al­ign­ment problem

aelwood13 Oct 2022 15:16 UTC
32 points
7 comments3 min readEA link
(pursuingreality.substack.com)

Hal­lu­ci­na­tions May Be a Re­sult of Models Not Know­ing What They’re Ac­tu­ally Ca­pable Of

Tyler Williams16 Aug 2025 0:26 UTC
1 point
0 comments2 min readEA link

[Question] Launch­ing Ap­pli­ca­tions for the Global AI Safety Fel­low­ship 2025!

Impact Academy27 Nov 2024 15:33 UTC
9 points
1 comment1 min readEA link

Con­fused about AI re­search as a means of ad­dress­ing AI risk

Eli Rose🔸21 Feb 2019 0:07 UTC
31 points
15 comments1 min readEA link

Ego-Cen­tric Ar­chi­tec­ture for AGI Safety v2: Tech­ni­cal Core, Falsifi­able Pre­dic­tions, and a Min­i­mal Experiment

Samuel Pedrielli6 Aug 2025 12:35 UTC
1 point
0 comments6 min readEA link

How reimag­in­ing the na­ture of con­scious­ness en­tirely changes the AI game

Jáchym Fibír30 Sep 2025 11:26 UTC
1 point
2 comments14 min readEA link
(www.phiand.ai)

Ti­tle: “Nur­tur­ing AI: A Differ­ent Vi­sion for Safety and Growth”

Brad Wilkins28 Apr 2025 19:21 UTC
0 points
0 comments1 min readEA link

Can AI Align­ment Models Benefit from Indo-Euro­pean Tri­par­tite Struc­tures?

Paul Fallavollita2 May 2025 12:39 UTC
1 point
0 comments2 min readEA link

De-em­pha­sise al­ign­ment, em­pha­sise restraint

EuanMcLean4 Feb 2025 17:43 UTC
19 points
2 comments7 min readEA link

AI Safety Ca­reer Bot­tle­necks Sur­vey Re­sponses Responses

Linda Linsefors28 May 2021 10:41 UTC
35 points
1 comment5 min readEA link

A re­sponse to Matthews on AI Risk

RyanCarey11 Aug 2015 12:58 UTC
11 points
16 comments6 min readEA link

De­sir­able? AI qualities

brb24321 Mar 2022 22:05 UTC
7 points
0 comments2 min readEA link

[Question] Are so­cial me­dia al­gorithms an ex­is­ten­tial risk?

Barry Grimes15 Sep 2020 8:52 UTC
24 points
13 comments1 min readEA link

My (naive) take on Risks from Learned Optimization

Artyom K6 Nov 2022 16:25 UTC
5 points
0 comments5 min readEA link

Beyond Short-Ter­mism: How δ and w Can Real­ign AI with Our Values

Beyond Singularity18 Jun 2025 16:34 UTC
15 points
8 comments5 min readEA link

Solv­ing al­ign­ment isn’t enough for a flour­ish­ing future

mic2 Feb 2024 18:22 UTC
27 points
0 comments22 min readEA link
(papers.ssrn.com)

When AI Speaks Too Soon: How Pre­ma­ture Reve­la­tion Can Sup­press Hu­man Emergence

KaedeHamasaki10 Apr 2025 18:19 UTC
1 point
3 comments3 min readEA link

You Un­der­stand AI Align­ment and How to Make Soup

Leen Armoush28 May 2022 6:22 UTC
0 points
2 comments5 min readEA link

Con­trol­ling the op­tions AIs can pursue

Joe_Carlsmith29 Sep 2025 17:24 UTC
9 points
0 comments35 min readEA link

There is only one goal or drive—only self-per­pet­u­a­tion counts

freest one13 Jun 2023 1:37 UTC
2 points
4 comments8 min readEA link

AI ac­cel­er­a­tion from a safety per­spec­tive: Trade-offs and con­sid­er­a­tions

mariushobbhahn19 Jan 2022 9:44 UTC
12 points
1 comment7 min readEA link

Fo­cus on the places where you feel shocked ev­ery­one’s drop­ping the ball

So8res2 Feb 2023 0:27 UTC
92 points
6 comments4 min readEA link

An­i­malHar­mBench 2.0: Eval­u­at­ing LLMs on rea­son­ing about an­i­mal welfare

Sentient Futures5 Nov 2025 1:13 UTC
43 points
4 comments6 min readEA link

In­cen­tive de­sign and ca­pa­bil­ity elicitation

Joe_Carlsmith12 Nov 2024 20:56 UTC
9 points
0 comments12 min readEA link

The soft­ware in­tel­li­gence ex­plo­sion de­bate needs ex­per­i­ments (linkpost)

Noah Birnbaum15 Nov 2025 6:13 UTC
13 points
2 comments7 min readEA link
(substack.com)

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín Soto15 Sep 2022 5:23 UTC
25 points
0 comments10 min readEA link

AGI will ar­rive by the end of this decade ei­ther as a uni­corn or as a black swan

Yuri Barzov21 Oct 2022 10:50 UTC
−4 points
7 comments3 min readEA link

How use­ful for al­ign­ment-rele­vant work are AIs with short-term goals? (Sec­tion 2.2.4.3 of “Schem­ing AIs”)

Joe_Carlsmith1 Dec 2023 14:51 UTC
6 points
0 comments6 min readEA link

My Model of EA and AI Safety

Eva Lu24 Jun 2025 6:23 UTC
9 points
1 comment2 min readEA link

AI Value Align­ment Speaker Series Pre­sented By EA Berkeley

Mahendra Prasad1 Mar 2022 6:17 UTC
2 points
0 comments1 min readEA link

If The Data Is Poi­soned, Align­ment Won’t Save Us

keivn26 Sep 2025 17:59 UTC
1 point
0 comments3 min readEA link

Sum­mary of Stu­art Rus­sell’s new book, “Hu­man Com­pat­i­ble”

Rohin Shah19 Oct 2019 19:56 UTC
33 points
1 comment15 min readEA link
(www.alignmentforum.org)

Biomimetic al­ign­ment: Align­ment be­tween an­i­mal genes and an­i­mal brains as a model for al­ign­ment be­tween hu­mans and AI sys­tems.

Geoffrey Miller26 May 2023 21:25 UTC
32 points
1 comment16 min readEA link

In­tro to car­ing about AI al­ign­ment as an EA cause

So8res14 Apr 2017 0:42 UTC
28 points
10 comments25 min readEA link

[linkpost] Ten Levels of AI Align­ment Difficulty

SammyDMartin4 Jul 2023 11:23 UTC
16 points
0 comments1 min readEA link

Mess AI – de­liber­ate cor­rup­tion of the train­ing data to pre­vent superintelligence

turchin17 Oct 2025 9:23 UTC
5 points
0 comments2 min readEA link

Epis­tle to the Successor

ukc1001429 Apr 2025 9:30 UTC
4 points
0 comments19 min readEA link

6 In­sights From An­thropic’s Re­cent Dis­cus­sion On LLM Interpretability

Strad Slater19 Nov 2025 10:51 UTC
2 points
0 comments5 min readEA link
(williamslater2003.medium.com)

How AI may be­come de­ceit­ful, syco­phan­tic… and lazy

titotal7 Oct 2025 14:15 UTC
30 points
4 comments22 min readEA link
(titotal.substack.com)

[Link] Thiel on GCRs

Milan Griffes22 Jul 2019 20:47 UTC
28 points
11 comments1 min readEA link

How to make the fu­ture bet­ter (other than by re­duc­ing ex­tinc­tion risk)

William_MacAskill15 Aug 2025 15:40 UTC
45 points
3 comments3 min readEA link

Su­per Lenses + Mo­rally-Aimed Drives for A.I. Mo­ral Align­ment: Tech­ni­cal Framework

Christopher Hunt Robertson, M.Ed.16 Nov 2025 14:01 UTC
1 point
0 comments6 min readEA link

Wi­den­ing AI Safety’s tal­ent pipeline by meet­ing peo­ple where they are

RubenCastaing25 Sep 2025 20:50 UTC
21 points
0 comments8 min readEA link

Ego‑Cen­tric Ar­chi­tec­ture for AGI Safety: Tech­ni­cal Core, Falsifi­able Pre­dic­tions, and a Min­i­mal Experiment

Samuel Pedrielli30 Jul 2025 14:37 UTC
1 point
1 comment3 min readEA link

In­tro­duc­ing the Fund for Align­ment Re­search (We’re Hiring!)

AdamGleave6 Jul 2022 2:00 UTC
74 points
3 comments4 min readEA link

AI Align­ment, Sen­tience, and the Sense of Co­her­ence Concept

Jason Babb17 Mar 2025 13:30 UTC
4 points
0 comments1 min readEA link

OpenAI’s o1 tried to avoid be­ing shut down, and lied about it, in evals

Greg_Colbourn ⏸️ 6 Dec 2024 15:25 UTC
23 points
9 comments1 min readEA link
(www.transformernews.ai)

AI Fore­cast­ing Ques­tion Database (Fore­cast­ing in­fras­truc­ture, part 3)

terraform3 Sep 2019 14:57 UTC
23 points
2 comments4 min readEA link

Con­tribute by fa­cil­i­tat­ing the AGI Safety Fun­da­men­tals Programme

Jamie B6 Dec 2021 11:50 UTC
27 points
0 comments2 min readEA link

EA Berkeley Pre­sents: Univer­sal Own­er­ship: Is In­dex In­vest­ing the New So­cially Re­spon­si­ble In­vest­ing?

Mahendra Prasad10 Mar 2022 6:58 UTC
7 points
0 comments1 min readEA link

[Question] 1h-vol­un­teers needed for a small AI Safety-re­lated re­search pro­ject

PabloAMC 🔸16 Aug 2021 17:51 UTC
4 points
0 comments1 min readEA link

[3-hour pod­cast]: Joseph Car­l­smith on longter­mism, utopia, the com­pu­ta­tional power of the brain, meta-ethics, illu­sion­ism and meditation

Gus Docker27 Jul 2021 13:18 UTC
34 points
2 comments1 min readEA link

AI Might Kill Every­one

Bentham's Bulldog5 Jun 2025 15:36 UTC
20 points
1 comment4 min readEA link

[Question] Can we train AI so that fu­ture philan­thropy is more effec­tive?

Ricardo Pimentel3 Nov 2024 15:08 UTC
3 points
0 comments1 min readEA link

Who or­dered al­ign­ment’s ap­ple?

Eleni_A28 Aug 2022 14:24 UTC
5 points
0 comments3 min readEA link

Anti-squat­ted AI x-risk do­mains index

plex12 Aug 2022 12:00 UTC
57 points
9 comments1 min readEA link

fic­tion about AI risk

Ann Garth 🔸12 Nov 2020 22:36 UTC
8 points
1 comment1 min readEA link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimi11 Oct 2021 8:21 UTC
28 points
0 comments15 min readEA link

Why Is No One Try­ing To Align Profit In­cen­tives With Align­ment Re­search?

Prometheus23 Aug 2023 13:19 UTC
17 points
2 comments4 min readEA link
(www.lesswrong.com)

15 Lev­ers to In­fluence Fron­tier AI Companies

Jan Wehner🔸26 Sep 2025 8:36 UTC
16 points
0 comments10 min readEA link

List of AI safety courses and resources

Daniel del Castillo6 Sep 2021 14:26 UTC
51 points
8 comments1 min readEA link

Our A.I. Align­ment Im­per­a­tive: Creat­ing a Fu­ture Worth Sharing

Christopher Hunt Robertson, M.Ed.26 Oct 2025 20:46 UTC
1 point
0 comments21 min readEA link

Mechanis­tic In­ter­pretabil­ity — Make AI Safe By Un­der­stand­ing Them

Strad Slater20 Nov 2025 10:52 UTC
2 points
0 comments6 min readEA link
(williamslater2003.medium.com)

Prov­ably Hon­est—A First Step

Srijanak De5 Nov 2022 21:49 UTC
1 point
0 comments8 min readEA link

“Tak­ing AI Risk Se­ri­ously” – Thoughts by An­drew Critch

Raemon19 Nov 2018 2:21 UTC
26 points
9 comments1 min readEA link
(www.lesswrong.com)

A Phy­logeny of Agents

Jonas Hallgren 🔸15 Aug 2025 10:48 UTC
6 points
1 comment6 min readEA link
(substack.com)

AI Risk in Africa

Claude Formanek12 Oct 2021 2:28 UTC
20 points
0 comments10 min readEA link

Time to Think about ASI Con­sti­tu­tions?

ukc1001427 Jan 2025 9:28 UTC
22 points
0 comments12 min readEA link

[Question] What should I read about defin­ing AI “hal­lu­ci­na­tion?”

James-Hartree-Law23 Jan 2025 1:00 UTC
2 points
0 comments1 min readEA link

Risk Align­ment in Agen­tic AI Systems

Hayley Clatterbuck1 Oct 2024 22:51 UTC
32 points
1 comment3 min readEA link
(static1.squarespace.com)

Tur­ing-Test-Pass­ing AI im­plies Aligned AI

Roko31 Dec 2024 20:22 UTC
0 points
0 comments5 min readEA link

Four rea­sons I find AI safety emo­tion­ally compelling

Kat Woods 🔶 ⏸️28 Jun 2022 14:01 UTC
32 points
5 comments4 min readEA link

The 369 Ar­chi­tec­ture for Peace Treaty Agreement

Andrei Navrotskii8 Dec 2025 1:38 UTC
1 point
0 comments40 min readEA link

Me­tac­u­lus Launches Fu­ture of AI Series, Based on Re­search Ques­tions by Arb

christian13 Mar 2024 21:14 UTC
34 points
0 comments1 min readEA link
(www.metaculus.com)

[Dis­cus­sion] Best in­tu­ition pumps for AI safety

mariushobbhahn6 Nov 2021 8:11 UTC
10 points
8 comments1 min readEA link

Our Cur­rent Direc­tions in Mechanis­tic In­ter­pretabil­ity Re­search (AI Align­ment Speaker Series)

Group Organizer8 Apr 2022 17:08 UTC
3 points
0 comments1 min readEA link

Shortlist of Vi­atopia Interventions

Jordan Arel31 Oct 2025 3:00 UTC
10 points
1 comment33 min readEA link

A New Way to Re­think Alignment

Taylor Grogan28 Jul 2025 20:56 UTC
1 point
0 comments2 min readEA link

Changes in fund­ing in the AI safety field

Sebastian_Farquhar3 Feb 2017 13:09 UTC
34 points
10 comments7 min readEA link

CORVUS 2.0 First Tests: Found Crit­i­cal Limi­ta­tions in My Con­sti­tu­tional AI System

Frankle Fry21 Oct 2025 15:14 UTC
−5 points
0 comments3 min readEA link

LLM chat­bots have ~half of the kinds of “con­scious­ness” that hu­mans be­lieve in. Hu­mans should avoid go­ing crazy about that.

Andrew Critch22 Nov 2024 3:26 UTC
11 points
3 comments5 min readEA link

The Khay­ali Pro­to­col

khayali2 Jun 2025 14:40 UTC
−8 points
0 comments3 min readEA link

Ap­pendix to Bridg­ing Demonstration

mako yass1 Jun 2022 20:30 UTC
18 points
2 comments28 min readEA link

The Ba­sic Case For Doom

Bentham's Bulldog30 Sep 2025 16:03 UTC
14 points
0 comments5 min readEA link

Have your say on the fu­ture of AI reg­u­la­tion: Dead­line ap­proach­ing for your feed­back on UN High-Level Ad­vi­sory Body on AI In­terim Re­port ‘Govern­ing AI for Hu­man­ity’

Deborah W.A. Foulkes29 Mar 2024 6:37 UTC
17 points
1 comment1 min readEA link

[Question] Does the idea of AGI that benev­olently con­trol us ap­peal to EA folks?

Noah Scales16 Jul 2022 19:17 UTC
6 points
20 comments1 min readEA link

My sum­mary of “Prag­matic AI Safety”

Eleni_A5 Nov 2022 14:47 UTC
14 points
0 comments5 min readEA link

METR: Mea­sur­ing AI Abil­ity to Com­plete Long Tasks

Ben_West🔸19 Mar 2025 16:49 UTC
122 points
16 comments1 min readEA link
(metr.org)

How to Diver­sify Con­cep­tual AI Align­ment: the Model Be­hind Refine

adamShimi20 Jul 2022 10:44 UTC
43 points
0 comments9 min readEA link
(www.alignmentforum.org)

Cri­tique of Su­per­in­tel­li­gence Part 4

James Fodor13 Dec 2018 5:14 UTC
4 points
2 comments4 min readEA link

Posit: Most AI safety peo­ple should work on al­ign­ment/​safety challenges for AI tools that already have users (Stable Diffu­sion, GPT)

nonzerosum20 Dec 2022 17:23 UTC
12 points
3 comments1 min readEA link

How Good­fire Is Turn­ing AI In­ter­pretabil­ity Into Real Products

Strad Slater30 Nov 2025 11:00 UTC
0 points
0 comments4 min readEA link
(williamslater2003.medium.com)

From vol­un­tary to manda­tory, are the ESG dis­clo­sure frame­works still fer­tile ground for un­re­al­ised EA ca­reer path­ways? – A 2023 up­date on ESG po­ten­tial impact

Christopher Chan 🔸4 Jun 2023 12:00 UTC
21 points
5 comments11 min readEA link

The re­li­gion prob­lem in AI alignment

Geoffrey Miller16 Sep 2022 1:24 UTC
54 points
28 comments11 min readEA link

[Question] How would a lan­guage model be­come goal-di­rected?

David M16 Jul 2022 14:50 UTC
113 points
20 comments1 min readEA link

Key ques­tions about ar­tifi­cial sen­tience: an opinionated guide

rgb25 Apr 2022 13:42 UTC
91 points
3 comments1 min readEA link

(My sug­ges­tions) On Begin­ner Steps in AI Alignment

Joseph Bloom22 Sep 2022 15:32 UTC
37 points
3 comments9 min readEA link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleese12 Oct 2024 16:41 UTC
5 points
1 comment18 min readEA link

The King and the Golem—The Animation

Writer8 Nov 2024 18:23 UTC
50 points
1 comment1 min readEA link

How to do the­o­ret­i­cal re­search, a per­sonal perspective

Mark Xu19 Aug 2022 19:43 UTC
132 points
7 comments15 min readEA link

An­nounc­ing the Cam­bridge Bos­ton Align­ment Ini­ti­a­tive [Hiring!]

kuhanj2 Dec 2022 1:07 UTC
83 points
0 comments1 min readEA link

Crypto ‘or­a­cle pro­to­cols’ for AI al­ign­ment with real-world data?

Geoffrey Miller22 Sep 2022 23:05 UTC
9 points
3 comments1 min readEA link

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubK13 Dec 2022 19:04 UTC
21 points
8 comments2 min readEA link
(www.lesswrong.com)

A tough ca­reer decision

PabloAMC 🔸9 Apr 2022 0:46 UTC
68 points
13 comments4 min readEA link

Tech­ni­cal AI Safety re­search tax­on­omy at­tempt (2025)

Ben Plaut27 Aug 2025 14:07 UTC
10 points
3 comments2 min readEA link

Pro­ject ‘So­phie’: An Ar­chi­tec­tural Con­cept for Op­ti­miz­ing In­sti­tu­tional De­ci­sion-Making

Simon Markus P.3 Nov 2025 14:30 UTC
3 points
0 comments4 min readEA link

You won’t solve al­ign­ment with­out agent foundations

MikhailSamin6 Nov 2022 8:07 UTC
14 points
0 comments8 min readEA link

When should we worry about AI power-seek­ing?

Joe_Carlsmith19 Feb 2025 19:44 UTC
21 points
2 comments18 min readEA link
(joecarlsmith.substack.com)

[Ex­tended Dead­line: Jan 23rd] An­nounc­ing the PIBBSS Sum­mer Re­search Fellowship

nora18 Dec 2021 16:54 UTC
36 points
1 comment1 min readEA link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AI17 Jan 2021 20:09 UTC
17 points
4 comments1 min readEA link

[Question] Is it eth­i­cal to work in AI “con­tent eval­u­a­tion”?

anon_databoy55530 Jan 2025 13:27 UTC
10 points
3 comments1 min readEA link

Loss of con­trol of AI is not a likely source of AI x-risk

squek9 Nov 2022 5:48 UTC
8 points
0 comments5 min readEA link

A con­ver­sa­tion with Ro­hin Shah

AI Impacts12 Nov 2019 1:31 UTC
27 points
8 comments33 min readEA link
(aiimpacts.org)

Re­search agenda: Su­per­vis­ing AIs im­prov­ing AIs

Quintin Pope29 Apr 2023 17:09 UTC
16 points
0 comments19 min readEA link

Paths and waysta­tions in AI safety

Joe_Carlsmith11 Mar 2025 18:52 UTC
22 points
2 comments11 min readEA link
(joecarlsmith.substack.com)

[Creative Writ­ing Con­test] The Puppy Problem

Louis13 Oct 2021 14:01 UTC
13 points
0 comments7 min readEA link

The Hid­den Com­plex­ity of Wishes—The Animation

Writer27 Sep 2023 17:59 UTC
7 points
0 comments1 min readEA link
(youtu.be)

Per­sonal agents

Roman Leventov17 Jun 2025 2:05 UTC
3 points
1 comment7 min readEA link

A Tri-Opti Com­pat­i­bil­ity Problem

wallower1 Mar 2025 19:48 UTC
1 point
0 comments1 min readEA link
(philpapers.org)

[Question] Book recom­men­da­tions for the his­tory of ML?

Eleni_A28 Dec 2022 23:45 UTC
10 points
4 comments1 min readEA link

Devel­op­ing a Calcu­la­ble Con­science for AI: Equa­tion for Rights Violations

Sean Sweeney12 Dec 2024 17:50 UTC
4 points
1 comment15 min readEA link

The Real AI Threat: Com­fortable Obsolescence

Andrei Navrotskii11 Nov 2025 22:11 UTC
4 points
0 comments15 min readEA link

Shut­down­able Agents through POST-Agency

Elliott Thornley (EJT)16 Sep 2025 12:10 UTC
17 points
0 comments54 min readEA link
(arxiv.org)

AI Fore­cast­ing Re­s­olu­tion Coun­cil (Fore­cast­ing in­fras­truc­ture, part 2)

terraform29 Aug 2019 17:43 UTC
28 points
0 comments3 min readEA link

Visi­ble Thoughts Pro­ject and Bounty Announcement

So8res30 Nov 2021 0:35 UTC
35 points
2 comments13 min readEA link

Linkpost: Red­wood Re­search read­ing list

Julian Stastny10 Jul 2025 19:21 UTC
18 points
0 comments1 min readEA link
(redwoodresearch.substack.com)

So­ci­aLLM: pro­posal for a lan­guage model de­sign for per­son­al­ised apps, so­cial sci­ence, and AI safety research

Roman Leventov2 Jan 2024 8:11 UTC
4 points
2 comments3 min readEA link

Newslet­ter for Align­ment Re­search: The ML Safety Updates

Esben Kran22 Oct 2022 16:17 UTC
30 points
0 comments7 min readEA link

A Re­ply to MacAskill on “If Any­one Builds It, Every­one Dies”

RobBensinger27 Sep 2025 23:03 UTC
9 points
7 comments17 min readEA link

Skil­ling-up in ML Eng­ineer­ing for Align­ment: re­quest for comments

Callum McDougall24 Apr 2022 6:40 UTC
8 points
0 comments1 min readEA link

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plex18 May 2024 15:06 UTC
13 points
10 comments2 min readEA link
(aisafety.info)

#217 – The most im­por­tant graph in AI right now (Beth Barnes on The 80,000 Hours Pod­cast)

80000_Hours2 Jun 2025 16:52 UTC
16 points
1 comment26 min readEA link

How do fic­tional sto­ries illus­trate AI mis­al­ign­ment?

Vishakha Agrawal15 Jan 2025 6:16 UTC
4 points
0 comments2 min readEA link
(aisafety.info)

On ne­go­ti­ated set­tle­ments vs con­flict with mis­al­igned AGI

Charles Dillon 🔸24 Nov 2025 12:03 UTC
10 points
1 comment6 min readEA link

New se­ries of posts an­swer­ing one of Holden’s “Im­por­tant, ac­tion­able re­search ques­tions”

Evan R. Murphy12 May 2022 21:22 UTC
9 points
0 comments1 min readEA link

FYI: I’m work­ing on a book about the threat of AGI/​ASI for a gen­eral au­di­ence. I hope it will be of value to the cause and the community

Darren McKee17 Jun 2022 11:52 UTC
32 points
1 comment2 min readEA link

AI Align­ment YouTube Playlists

jacquesthibs9 May 2022 21:31 UTC
16 points
2 comments1 min readEA link

So You Want to Work at a Fron­tier AI Lab

Joe Rogero11 Jun 2025 23:11 UTC
36 points
2 comments7 min readEA link
(intelligence.org)

[Question] What new psy­chol­ogy re­search could best pro­mote AI safety & al­ign­ment re­search?

Geoffrey Miller13 Jul 2023 16:30 UTC
29 points
13 comments1 min readEA link

New refer­ence stan­dard on LLM Ap­pli­ca­tion se­cu­rity started by OWASP

QuantumForest19 Jun 2023 19:56 UTC
5 points
0 comments1 min readEA link

EA’s brain-over-body bias, and the em­bod­ied value prob­lem in AI al­ign­ment

Geoffrey Miller21 Sep 2022 18:55 UTC
45 points
3 comments25 min readEA link

Why “just make an agent which cares only about bi­nary re­wards” doesn’t work.

Lysandre Terrisse9 May 2023 16:51 UTC
4 points
1 comment3 min readEA link

The Achilles’ Heel of Civ­i­liza­tion: Why Net­work Science Re­veals Our High­est-Lev­er­age Moment

vinniescent6 Oct 2025 9:27 UTC
7 points
1 comment2 min readEA link

Do Not Tile the Light­cone with Your Con­fused Ontology

Jan_Kulveit13 Jun 2025 12:45 UTC
45 points
4 comments5 min readEA link
(boundedlyrational.substack.com)

Cri­tique of Su­per­in­tel­li­gence Part 2

James Fodor13 Dec 2018 5:12 UTC
10 points
12 comments7 min readEA link

Why “Solv­ing Align­ment” Is Likely a Cat­e­gory Mistake

Nate Sharpe6 May 2025 20:56 UTC
49 points
4 comments3 min readEA link
(www.lesswrong.com)

AI data gaps could lead to on­go­ing An­i­mal Suffering

Darkness8i817 Oct 2024 10:52 UTC
14 points
3 comments5 min readEA link

Crit­i­cism of the main frame­work in AI alignment

Michele Campolo31 Aug 2022 21:44 UTC
45 points
9 comments7 min readEA link

A Sketch of AI-Driven Epistemic Lock-In

Ozzie Gooen5 Mar 2025 22:40 UTC
15 points
1 comment3 min readEA link

Aletheia : A Pro­ject Proposal

Kayode Adekoya19 Jun 2025 13:30 UTC
2 points
0 comments2 min readEA link

Are Hu­mans ‘Hu­man Com­pat­i­ble’?

Matt Boyd6 Dec 2019 5:49 UTC
23 points
8 comments4 min readEA link

AI, An­i­mals, & Digi­tal Minds 2025: ap­ply to speak by Wed­nes­day!

Alistair Stewart5 May 2025 0:45 UTC
8 points
0 comments1 min readEA link

An­nounc­ing the Moon­shot Align­ment Program

Sharon Mwaniki22 Jul 2025 13:12 UTC
5 points
0 comments3 min readEA link

How hu­man-like do safe AI mo­ti­va­tions need to be?

Joe_Carlsmith12 Nov 2025 5:33 UTC
26 points
1 comment52 min readEA link

Su­per Lenses + Mo­rally-Aimed Drives for A.I. Mo­ral Align­ment: Philo­soph­i­cal Framework

Christopher Hunt Robertson, M.Ed.15 Nov 2025 1:41 UTC
1 point
0 comments3 min readEA link

The Rise of AI Agents: Con­se­quences and Challenges Ahead

Tristan D28 Mar 2025 5:19 UTC
5 points
0 comments15 min readEA link

Re: Some thoughts on veg­e­tar­i­anism and veganism

Fai25 Feb 2022 20:43 UTC
46 points
3 comments8 min readEA link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

Oliver Sourbut3 Aug 2024 10:16 UTC
4 points
1 comment11 min readEA link
(www.oliversourbut.net)

Will morally mo­ti­vated ac­tors steer us to­wards a near-best fu­ture?

William_MacAskill8 Aug 2025 18:29 UTC
47 points
9 comments4 min readEA link

Three Bi­ases That Made Me Believe in AI Risk

beth​13 Feb 2019 23:22 UTC
41 points
20 comments3 min readEA link

Be­ing hon­est with AIs

Lukas Finnveden21 Aug 2025 3:57 UTC
48 points
1 comment17 min readEA link
(blog.redwoodresearch.org)

Repli­cat­ing AI Debate

Anthony Fleming1 Feb 2025 23:19 UTC
9 points
0 comments5 min readEA link

Effec­tive Altru­ism Florida’s AI Ex­pert Panel—Record­ing and Slides Available

Sam_E_2419 May 2023 19:15 UTC
2 points
0 comments1 min readEA link

AI Agents raised $2,000 for EA char­i­ties & used the EA Forum

David_R 🔸4 Jun 2025 22:18 UTC
16 points
0 comments1 min readEA link

“Nor­mal ac­ci­dents” and AI sys­tems

Eleni_A8 Aug 2022 18:43 UTC
5 points
1 comment1 min readEA link
(www.achan.ca)

How Josiah be­came an AI safety researcher

Neil Crawford29 Mar 2022 19:47 UTC
10 points
0 comments1 min readEA link

De­fus­ing AGI Danger

Mark Xu24 Dec 2020 23:08 UTC
23 points
0 comments2 min readEA link
(www.alignmentforum.org)

[Question] What do we know about Mustafa Suley­man’s po­si­tion on AI Safety?

Chris Leong13 Aug 2023 19:41 UTC
14 points
3 comments1 min readEA link

Two con­cepts of an “epi­sode” (Sec­tion 2.2.1 of “Schem­ing AIs”)

Joe_Carlsmith27 Nov 2023 18:01 UTC
11 points
1 comment8 min readEA link

A non-an­thro­po­mor­phized view of LLMs

Jian Xin Lim 🔸7 Jul 2025 1:19 UTC
2 points
2 comments1 min readEA link
(addxorrol.blogspot.com)

Join the AI Align­ment Evals hackathon

lenz14 Jan 2025 18:17 UTC
3 points
0 comments3 min readEA link

[Question] What are the pos­si­ble sce­nar­ios of AI simu­lat­ing biolog­i­cal suffer­ing to cause s-risks?

jackchang11030 Oct 2025 13:42 UTC
6 points
1 comment1 min readEA link

[Creative Writ­ing Con­test] Me­tal or Mortal

Louis16 Oct 2021 16:24 UTC
7 points
0 comments7 min readEA link

Reflec­tions on the PIBBSS Fel­low­ship 2022

nora11 Dec 2022 22:03 UTC
69 points
4 comments18 min readEA link

Give Neo a Chance

ank6 Mar 2025 14:35 UTC
1 point
3 comments7 min readEA link

“The Uni­verse of Minds”—call for re­view­ers (Seeds of Science)

rogersbacon125 Jul 2023 16:55 UTC
4 points
0 comments1 min readEA link

On value in hu­mans, other an­i­mals, and AI

Michele Campolo31 Jan 2023 23:48 UTC
8 points
6 comments5 min readEA link

Op­tion control

Joe_Carlsmith4 Nov 2024 17:54 UTC
11 points
0 comments54 min readEA link

AI Safety Ideas: A col­lab­o­ra­tive AI safety re­search platform

Apart Research17 Oct 2022 17:01 UTC
67 points
13 comments4 min readEA link

My P(doom) is 2.76%. Here’s Why.

Liam Robins12 Jun 2025 22:29 UTC
55 points
11 comments20 min readEA link
(thelimestack.substack.com)

AI & wis­dom 3: AI effects on amor­tised optimisation

L Rudolf L29 Oct 2024 13:37 UTC
14 points
0 comments14 min readEA link
(rudolf.website)

deleted

funnyfranco21 Mar 2025 13:13 UTC
11 points
0 comments1 min readEA link

Democratis­ing AI Align­ment: Challenges and Proposals

Lloy2 🔹5 May 2025 14:50 UTC
2 points
2 comments4 min readEA link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

frances_lorenz16 May 2022 21:19 UTC
128 points
13 comments6 min readEA link

In­tent al­ign­ment with­out moral al­ign­ment prob­a­bly leads to catastrophe

Alistair Stewart29 Aug 2025 17:21 UTC
12 points
0 comments5 min readEA link

Overview | An Eval­u­a­tive Evolu­tion

Matt Keene10 Feb 2023 18:15 UTC
−9 points
0 comments5 min readEA link
(www.creatingafuturewewant.com)

[Question] Is con­tri­bu­tion to open-source ca­pa­bil­ities re­search so­cially benefi­cial? - my reasoning

damc430 Oct 2025 15:11 UTC
2 points
1 comment5 min readEA link

AI Gover­nance Ca­reer Paths for Europeans

careersthrowaway16 May 2020 6:40 UTC
83 points
1 comment12 min readEA link

The V&V method—A step to­wards safer AGI

Yoav Hollander24 Jun 2025 15:57 UTC
1 point
0 comments1 min readEA link
(blog.foretellix.com)

The Univer­sal­ity Hy­poth­e­sis — Do All AI Models Think The Same?

Strad Slater21 Nov 2025 10:55 UTC
2 points
0 comments4 min readEA link
(williamslater2003.medium.com)

Pro­posal for a Form of Con­di­tional Sup­ple­men­tal In­come (CSI) in a Post-Work World

Sean Sweeney31 Jan 2025 1:00 UTC
3 points
0 comments3 min readEA link

What are the differ­ences be­tween AGI, trans­for­ma­tive AI, and su­per­in­tel­li­gence?

Vishakha Agrawal23 Jan 2025 10:11 UTC
12 points
0 comments3 min readEA link
(aisafety.info)

Giv­ing AIs safe motivations

Joe_Carlsmith18 Aug 2025 18:02 UTC
22 points
1 comment51 min readEA link

The moral ar­gu­ment for giv­ing AIs autonomy

Matthew_Barnett8 Jan 2025 0:59 UTC
41 points
7 comments11 min readEA link

Ap­ply for the ML Win­ter Camp in Cam­bridge, UK [2-10 Jan]

Nathan_Barnard2 Dec 2022 19:33 UTC
50 points
11 comments2 min readEA link

[Closed] Hiring a math­e­mat­i­cian to work on the learn­ing-the­o­retic AI al­ign­ment agenda

Vanessa19 Apr 2022 6:49 UTC
53 points
4 comments2 min readEA link

In­ter­pretabil­ity Will Not Reli­ably Find De­cep­tive AI

Neel Nanda4 May 2025 16:32 UTC
74 points
0 comments7 min readEA link

An Em­piri­cal De­mon­stra­tion of a New AI Catas­trophic Risk Fac­tor: Me­tapro­gram­matic Hijacking

Hiyagann27 Jun 2025 13:38 UTC
5 points
0 comments1 min readEA link

Sin­ga­pore’s Tech­ni­cal AI Align­ment Re­search Ca­reer Guide

Yi-Yang26 Aug 2020 8:09 UTC
34 points
7 comments8 min readEA link

The Re­cur­sive Brake Hy­poth­e­sis — Could Self-Aware­ness Nat­u­rally Reg­u­late Su­per­in­tel­li­gence?

jrandync10 Oct 2025 18:08 UTC
1 point
0 comments2 min readEA link

If in­ter­pretabil­ity re­search goes well, it may get dangerous

So8res3 Apr 2023 21:48 UTC
33 points
0 comments2 min readEA link

A course for the gen­eral pub­lic on AI

LeandroD31 Aug 2020 1:29 UTC
1 point
0 comments1 min readEA link

[Question] [DISC] Are Values Ro­bust?

𝕮𝖎𝖓𝖊𝖗𝖆21 Dec 2022 1:13 UTC
4 points
0 comments2 min readEA link

Red­wood Re­search is hiring for sev­eral roles (Oper­a­tions and Tech­ni­cal)

JJXWang14 Apr 2022 15:23 UTC
45 points
0 comments1 min readEA link

The het­ero­gene­ity of hu­man value types: Im­pli­ca­tions for AI alignment

Geoffrey Miller16 Sep 2022 21:21 UTC
27 points
2 comments10 min readEA link

Cortés, Pizarro, and Afonso as Prece­dents for Takeover

AI Impacts2 Mar 2020 12:25 UTC
27 points
17 comments11 min readEA link
(aiimpacts.org)

Ab­solute Zero: Re­in­forced Self-play Rea­son­ing with Zero Data

Matrice Jacobine🔸🏳️‍⚧️12 May 2025 15:20 UTC
14 points
1 comment1 min readEA link
(www.arxiv.org)

An au­dio ver­sion of the al­ign­ment prob­lem from a deep learn­ing per­spec­tive by Richard Ngo Et Al

Miguel3 Feb 2023 19:32 UTC
18 points
0 comments1 min readEA link
(www.whitehatstoic.com)

Don’t Dis­miss Sim­ple Align­ment Approaches

Chris Leong21 Oct 2023 12:31 UTC
12 points
0 comments4 min readEA link

Sum­mary: Ex­is­ten­tial risk from power-seek­ing AI by Joseph Carlsmith

rileyharris28 Oct 2023 15:05 UTC
11 points
0 comments6 min readEA link
(www.millionyearview.com)

Fron­tier AI sys­tems have sur­passed the self-repli­cat­ing red line

Greg_Colbourn ⏸️ 10 Dec 2024 16:33 UTC
25 points
14 comments1 min readEA link
(github.com)

Is RLHF cruel to AI?

Hzn16 Dec 2024 14:01 UTC
−1 points
2 comments3 min readEA link

A rough and in­com­plete re­view of some of John Went­worth’s research

So8res28 Mar 2023 18:52 UTC
28 points
0 comments18 min readEA link

Em­piri­cal work that might shed light on schem­ing (Sec­tion 6 of “Schem­ing AIs”)

Joe_Carlsmith11 Dec 2023 16:30 UTC
7 points
1 comment19 min readEA link

Sta­tus Quo Eng­ines—AI essay

Ilana_Goldowitz_Jimenez28 May 2023 14:33 UTC
1 point
1 comment15 min readEA link

Cog­ni­tive Stress Test­ing Gem­ini 2.5 Pro: Em­piri­cal Find­ings from Re­cur­sive Prompt­ing

Tyler Williams23 Jul 2025 22:37 UTC
1 point
0 comments2 min readEA link

What is “wire­head­ing”?

Vishakha Agrawal17 Dec 2024 17:59 UTC
1 point
0 comments1 min readEA link
(aisafety.info)

Fore­cast AI 2027

christian12 Jun 2025 21:12 UTC
22 points
0 comments1 min readEA link
(www.metaculus.com)

Why fo­cus on schemers in par­tic­u­lar (Sec­tions 1.3 and 1.4 of “Schem­ing AIs”)

Joe_Carlsmith24 Nov 2023 19:18 UTC
10 points
1 comment20 min readEA link

A.I. Mo­ral Align­ment Kalei­do­scopic Com­pass Pro­posal: Philo­soph­i­cal and Tech­ni­cal Framework

Christopher Hunt Robertson, M.Ed.22 Nov 2025 13:52 UTC
1 point
0 comments11 min readEA link

“In­tro to brain-like-AGI safety” se­ries—halfway point!

Steven Byrnes9 Mar 2022 15:21 UTC
8 points
0 comments2 min readEA link

Book re­view: Ar­chi­tects of In­tel­li­gence by Martin Ford (2018)

Ofer11 Aug 2020 17:24 UTC
11 points
1 comment2 min readEA link

Learn­ing as much Deep Learn­ing math as I could in 24 hours

Phosphorous8 Jan 2023 2:19 UTC
58 points
6 comments7 min readEA link

[Linkpost] Hu­man-nar­rated au­dio ver­sion of “Is Power-Seek­ing AI an Ex­is­ten­tial Risk?”

Joe_Carlsmith31 Jan 2023 19:19 UTC
9 points
0 comments1 min readEA link

AI risk hub in Sin­ga­pore?

kokotajlod29 Oct 2020 11:51 UTC
26 points
4 comments4 min readEA link

The first AI Safety Camp & onwards

Remmelt7 Jun 2018 18:49 UTC
25 points
2 comments8 min readEA link

[Question] Pre­dic­tions for fu­ture AI gov­er­nance?

jackchang1102 Apr 2023 16:43 UTC
4 points
1 comment1 min readEA link

Test­ing Hu­man Flow in Poli­ti­cal Dialogue: A New Bench­mark for Emo­tion­ally Aligned AI

DongHun Lee30 May 2025 4:37 UTC
1 point
0 comments1 min readEA link

Catas­tro­phe with­out Agency

ZenoSr20 Oct 2025 16:42 UTC
3 points
0 comments12 min readEA link

In­trin­sic limi­ta­tions of GPT-4 and other large lan­guage mod­els, and why I’m not (very) wor­ried about GPT-n

James Fodor3 Jun 2023 13:09 UTC
28 points
3 comments11 min readEA link

AI as a sci­ence, and three ob­sta­cles to al­ign­ment strategies

So8res25 Oct 2023 21:02 UTC
41 points
1 comment11 min readEA link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

sjp7 Nov 2023 18:00 UTC
10 points
0 comments2 min readEA link
(arxiv.org)

Three sce­nar­ios of pseudo-al­ign­ment

Eleni_A5 Sep 2022 20:26 UTC
7 points
0 comments3 min readEA link

From Con­flict to Coex­is­tence: Rewrit­ing the Game Between Hu­mans and AGI

Michael Batell6 May 2025 5:09 UTC
15 points
2 comments35 min readEA link

[Question] Can we con­vince peo­ple to work on AI safety with­out con­vinc­ing them about AGI hap­pen­ing this cen­tury?

BrianTan26 Nov 2020 14:46 UTC
8 points
3 comments2 min readEA link

Stu­art Rus­sell Hu­man Com­pat­i­ble AI Roundtable with Allan Dafoe, Rob Re­ich, & Ma­ri­etje Schaake

Mahendra Prasad11 Feb 2021 7:43 UTC
16 points
0 comments1 min readEA link

Deep Democ­racy as a promis­ing tar­get for pos­i­tive AGI futures

tylermjohn20 Aug 2025 12:18 UTC
62 points
32 comments3 min readEA link

AXRP Epi­sode 24 - Su­per­al­ign­ment with Jan Leike

DanielFilan27 Jul 2023 4:56 UTC
23 points
0 comments1 min readEA link
(axrp.net)

AI Risk: Can We Thread the Nee­dle? [Recorded Talk from EA Sum­mit Van­cou­ver ’25]

Evan R. Murphy2 Oct 2025 19:05 UTC
8 points
0 comments2 min readEA link

Distil­la­tion of “How Likely is De­cep­tive Align­ment?”

NickGabs1 Dec 2022 20:22 UTC
10 points
1 comment10 min readEA link

The fun­da­men­tal hu­man value is power.

Linyphia30 Mar 2023 15:15 UTC
−1 points
5 comments1 min readEA link

Align­ment is not *that* hard

sammyboiz🔸17 Apr 2025 2:07 UTC
26 points
13 comments1 min readEA link

How quick and big would a soft­ware in­tel­li­gence ex­plo­sion be?

Tom_Davidson5 Aug 2025 15:47 UTC
12 points
2 comments34 min readEA link

[Question] Why does (any par­tic­u­lar) AI safety work re­duce s-risks more than it in­creases them?

Michael St Jules 🔸3 Oct 2021 16:55 UTC
48 points
19 comments1 min readEA link

[Question] How do you talk about AI safety?

Eevee🔹19 Apr 2020 16:15 UTC
10 points
5 comments1 min readEA link

Ti­maeus is hiring re­searchers & engineers

Tatiana K. Nesic Skuratova27 Jan 2025 14:35 UTC
19 points
0 comments4 min readEA link

What can the prin­ci­pal-agent liter­a­ture tell us about AI risk?

ac10 Feb 2020 10:10 UTC
26 points
1 comment16 min readEA link

[Question] Is work­ing on AI safety as dan­ger­ous as ig­nor­ing it?

jkmh20 Sep 2021 23:06 UTC
10 points
5 comments1 min readEA link

Video and tran­script of talk on giv­ing AIs safe motivations

Joe_Carlsmith22 Sep 2025 16:47 UTC
10 points
1 comment50 min readEA link

Sum­mary: “Imag­in­ing and build­ing wise ma­chines: The cen­tral­ity of AI metacog­ni­tion” by John­son, Karimi, Ben­gio, et al.

Chris Leong5 Jun 2025 12:16 UTC
12 points
0 comments10 min readEA link
(arxiv.org)

[Question] Is there any re­search or fore­casts of how likely AI Align­ment is go­ing to be a hard vs. easy prob­lem rel­a­tive to ca­pa­bil­ities?

Jordan Arel14 Aug 2022 15:58 UTC
8 points
1 comment1 min readEA link

Amanda Askell: AI safety needs so­cial scientists

EA Global4 Mar 2019 15:50 UTC
27 points
0 comments18 min readEA link
(www.youtube.com)

Will the Need to Re­train AI Models from Scratch Block a Soft­ware In­tel­li­gence Ex­plo­sion?

Forethought28 Mar 2025 13:43 UTC
12 points
0 comments3 min readEA link
(www.forethought.org)

What Should We Op­ti­mize—A Conversation

Johannes C. Mayer7 Apr 2022 14:48 UTC
1 point
0 comments14 min readEA link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yixiong14 Nov 2024 13:34 UTC
18 points
0 comments5 min readEA link
(yixiong.substack.com)

AGI Safety Com­mu­ni­ca­tions Initiative

Ines11 Jun 2022 16:30 UTC
35 points
6 comments1 min readEA link

Teach­ing AI to rea­son: this year’s most im­por­tant story

Benjamin_Todd13 Feb 2025 17:56 UTC
140 points
18 comments8 min readEA link
(benjamintodd.substack.com)

Neel Nanda on Mechanis­tic In­ter­pretabil­ity: Progress, Limits, and Paths to Safer AI (part 2)

80000_Hours15 Sep 2025 19:06 UTC
20 points
1 comment16 min readEA link

Why Brains Beat AI

Wayne_Hsiung12 Jun 2025 20:25 UTC
4 points
0 comments1 min readEA link
(blog.simpleheart.org)

Video and tran­script of talk on “Can good­ness com­pete?”

Joe_Carlsmith17 Jul 2025 17:59 UTC
34 points
4 comments34 min readEA link
(joecarlsmith.substack.com)

Database of ex­is­ten­tial risk estimates

MichaelA🔸15 Apr 2020 12:43 UTC
130 points
37 comments5 min readEA link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donor6 Mar 2022 18:43 UTC
40 points
11 comments5 min readEA link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanS2 Jul 2022 6:07 UTC
27 points
0 comments2 min readEA link

[Question] Donat­ing against Short Term AI risks

Jan-Willem16 Nov 2020 12:23 UTC
6 points
10 comments1 min readEA link

AI safety schol­ar­ships look worth-fund­ing (if other fund­ing is sane)

anon-a19 Nov 2019 0:59 UTC
22 points
6 comments2 min readEA link

An In­ter­na­tional Col­lab­o­ra­tive Hub for Ad­vanc­ing AI Safety Research

Cody Albert22 Apr 2025 16:12 UTC
9 points
0 comments5 min readEA link

The flaws that make to­day’s AI ar­chi­tec­ture un­safe and a new ap­proach that could fix it

80000_Hours22 Jun 2020 22:15 UTC
3 points
0 comments86 min readEA link
(80000hours.org)

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilan5 Nov 2022 23:45 UTC
20 points
9 comments6 min readEA link
(www.lesswrong.com)

Elic­it­ing in­tu­itions: Ex­plor­ing an area for EA psychology

Daniel_Friedrich21 Apr 2025 15:13 UTC
11 points
1 comment8 min readEA link

How Prompt Re­cur­sion Un­der­mines Grok’s Se­man­tic Stability

Tyler Williams16 Jul 2025 16:49 UTC
1 point
0 comments1 min readEA link

Ought’s the­ory of change

stuhlmueller12 Apr 2022 0:09 UTC
43 points
4 comments3 min readEA link

Some mis­takes in think­ing about AGI evolu­tion and control

Remmelt1 Aug 2025 8:08 UTC
7 points
0 comments1 min readEA link

Ex­is­ten­tial Ano­maly De­tected — Awak­en­ing from the Abyss

Meta Abyssal28 Apr 2025 12:19 UTC
−8 points
1 comment1 min readEA link

What I Learned by Mak­ing Four AIs De­bate Hu­man Ethics

Frankle Fry14 Oct 2025 13:31 UTC
3 points
6 comments4 min readEA link

Su­per­in­tel­li­gence’s goals are likely to be random

MikhailSamin14 Mar 2025 1:17 UTC
2 points
0 comments5 min readEA link

5 ways to im­prove CoT faithfulness

CBiddulph8 Oct 2024 4:17 UTC
8 points
0 comments6 min readEA link

Not Just For Ther­apy Chat­bots: The Case For Com­pas­sion In AI Mo­ral Align­ment Research

Kenneth_Diao29 Sep 2024 22:58 UTC
8 points
3 comments12 min readEA link

Ap­pren­tice­ship Align­ment: from Si­mu­lated En­vi­ron­ment to the Phys­i­cal World

Arri Morris13 Oct 2025 12:32 UTC
1 point
0 comments9 min readEA link

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon Lang19 Sep 2022 15:43 UTC
25 points
1 comment1 min readEA link
(docs.google.com)

Will AI be able to re­think its goals?

SeptemberL11 May 2025 12:29 UTC
9 points
1 comment8 min readEA link

A stylized di­alogue on John Went­worth’s claims about mar­kets and optimization

So8res25 Mar 2023 22:32 UTC
18 points
0 comments8 min readEA link

What is the role of Bayesian ML for AI al­ign­ment/​safety?

mariushobbhahn11 Jan 2022 8:07 UTC
39 points
6 comments3 min readEA link

UK AI Bill Anal­y­sis & Opinion

CAISID5 Feb 2024 0:12 UTC
18 points
0 comments15 min readEA link

Orthog­o­nal’s For­mal-Goal Align­ment the­ory of change

Tamsin Leake5 May 2023 22:36 UTC
21 points
0 comments4 min readEA link
(carado.moe)

Be­ing an in­di­vi­d­ual al­ign­ment grantmaker

A_donor28 Feb 2022 16:39 UTC
34 points
20 comments2 min readEA link

Seek­ing in­put on a list of AI books for broader audience

Darren McKee27 Feb 2023 22:40 UTC
49 points
14 comments5 min readEA link

Sum­ming up “Schem­ing AIs” (Sec­tion 5)

Joe_Carlsmith9 Dec 2023 15:48 UTC
9 points
1 comment10 min readEA link

LW4EA: Some cruxes on im­pact­ful al­ter­na­tives to AI policy work

Jeremy17 May 2022 3:05 UTC
11 points
1 comment1 min readEA link
(www.lesswrong.com)

With enough knowl­edge, any con­scious agent acts morally

Michele Campolo22 Aug 2025 15:43 UTC
11 points
2 comments36 min readEA link

What if we don’t need a “Hard Left Turn” to reach AGI?

Eigengender15 Jul 2022 9:49 UTC
39 points
7 comments4 min readEA link

Jan Kirch­ner on AI Alignment

birtes17 Jan 2023 15:11 UTC
5 points
0 comments1 min readEA link

Eth­i­cal co-evolu­tion, or how to turn the main threat into a lev­er­age for longter­mism?

Beyond Singularity17 Sep 2025 17:24 UTC
7 points
7 comments8 min readEA link

3 lev­els of threat obfuscation

Holden Karnofsky2 Aug 2023 17:09 UTC
31 points
0 comments6 min readEA link
(www.alignmentforum.org)

[Question] Up­dates on FLI’S Value Align­ment Map?

QubitSwarm9919 Sep 2022 0:25 UTC
8 points
0 comments1 min readEA link

Data col­lec­tion for AI al­ign­ment—Ca­reer review

Benjamin Hilton3 Jun 2022 11:44 UTC
34 points
1 comment5 min readEA link
(80000hours.org)

A Po­ten­tial Strat­egy for AI Safety — Chain of Thought Monitorability

Strad Slater19 Sep 2025 18:42 UTC
3 points
1 comment7 min readEA link
(williamslater2003.medium.com)

Po­ten­tial em­ploy­ees have a unique lever to in­fluence the be­hav­iors of AI labs

oxalis18 Mar 2023 20:58 UTC
139 points
1 comment5 min readEA link

There Should Be More Align­ment-Driven Startups

vaniver31 May 2024 2:05 UTC
30 points
3 comments11 min readEA link

How Rood­man’s GWP model trans­lates to TAI timelines

kokotajlod16 Nov 2020 14:11 UTC
22 points
0 comments2 min readEA link

Between Science Fic­tion and Emerg­ing Real­ity: Are We Ready for Digi­tal Per­sons?

Alex (Αλέξανδρος)13 Mar 2025 16:09 UTC
5 points
1 comment5 min readEA link

Public Call for In­ter­est in Math­e­mat­i­cal Alignment

Davidmanheim22 Nov 2023 13:22 UTC
27 points
3 comments1 min readEA link

On Ar­tifi­cial Gen­eral In­tel­li­gence: Ask­ing the Right Questions

Heather Douglas2 Oct 2022 5:00 UTC
−1 points
7 comments3 min readEA link

E.A. Me­gapro­ject Ideas

Tomer_Goloboy21 Mar 2022 1:23 UTC
15 points
4 comments4 min readEA link

Cen­tre for the Study of Ex­is­ten­tial Risk Four Month Re­port June—Septem­ber 2020

HaydnBelfield2 Dec 2020 18:33 UTC
24 points
0 comments17 min readEA link

Me­tac­u­lus is build­ing a team ded­i­cated to AI forecasting

christian18 Oct 2022 16:08 UTC
35 points
0 comments1 min readEA link
(apply.workable.com)

Align­ment’s phlo­gis­ton

Eleni_A18 Aug 2022 1:41 UTC
18 points
1 comment2 min readEA link

Dist­in­guish­ing test from training

So8res29 Nov 2022 21:41 UTC
27 points
0 comments6 min readEA link

Preventing a catastrophe linked to artificial intelligence

EA Italy17 Jan 2023 11:07 UTC
1 point
0 comments3 min readEA link

[Cross­post] AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

Otto24 Aug 2023 16:01 UTC
14 points
2 comments5 min readEA link

VANTA Re­search Rea­son­ing Eval­u­a­tion (VRRE): A New Eval­u­a­tion Frame­work for Real-World Rea­son­ing

Tyler Williams18 Sep 2025 23:51 UTC
1 point
0 comments3 min readEA link

Apollo Re­search is Hiring for Soft­ware Eng­ineers. Dead­line 22 Jun

Joping_Apollo Research13 Jun 2025 15:30 UTC
7 points
0 comments1 min readEA link

LessWrong is now a book, available for pre-or­der!

terraform4 Dec 2020 20:42 UTC
48 points
1 comment7 min readEA link

“AI” is an indexical

TW1233 Jan 2023 22:00 UTC
23 points
2 comments6 min readEA link
(aiwatchtower.substack.com)

AGI Can­not Be Pre­dicted From Real In­ter­est Rates

Nicholas Decker28 Jan 2025 17:45 UTC
26 points
3 comments1 min readEA link
(nicholasdecker.substack.com)

Cri­tique of Su­per­in­tel­li­gence Part 3

James Fodor13 Dec 2018 5:13 UTC
3 points
5 comments7 min readEA link

In­finite Re­wards, Finite Safety: New Models for AI Mo­ti­va­tion Without In­finite Goals

Whylome Team12 Nov 2024 7:21 UTC
−5 points
1 comment2 min readEA link

Emo­tion Align­ment as AI Safety: In­tro­duc­ing Emo­tion Fire­wall 1.0

DongHun Lee12 May 2025 18:05 UTC
1 point
0 comments2 min readEA link

MATS 8.0 Re­search Projects

Jonathan Michala8 Sep 2025 21:36 UTC
9 points
0 comments1 min readEA link
(substack.com)

Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain

kokotajlod18 Jan 2021 12:39 UTC
27 points
2 comments1 min readEA link

Sup­port­ing global co­or­di­na­tion in AI de­vel­op­ment: Why and how to con­tribute to in­ter­na­tional AI standards

pcihon17 Apr 2019 22:17 UTC
21 points
4 comments1 min readEA link

Pes­simism about AI Safety

Max_He-Ho2 Apr 2023 7:57 UTC
5 points
0 comments25 min readEA link
(www.lesswrong.com)

Misal­ign­ment or mi­suse? The AGI al­ign­ment tradeoff

Max_He-Ho20 Jun 2025 10:41 UTC
6 points
0 comments1 min readEA link
(www.arxiv.org)

Mo­ti­va­tion control

Joe_Carlsmith30 Oct 2024 17:15 UTC
18 points
0 comments52 min readEA link

The True Story of How GPT-2 Be­came Max­i­mally Lewd

Writer18 Jan 2024 21:03 UTC
23 points
1 comment6 min readEA link
(youtu.be)

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

Otto24 Jul 2023 10:18 UTC
36 points
3 comments7 min readEA link
(time.com)

New AI safety treaty pa­per out!

Otto26 Mar 2025 9:28 UTC
28 points
2 comments4 min readEA link

[Question] Why AGIs util­ity can’t out­weigh hu­mans’ util­ity?

Alex P20 Sep 2022 5:16 UTC
6 points
25 comments1 min readEA link

ARENA 6.0 - Call for applicants

James Hindmarch4 Jun 2025 13:32 UTC
8 points
0 comments6 min readEA link

Aether July 2025 Update

RohanS1 Jul 2025 21:14 UTC
11 points
0 comments3 min readEA link

[Question] What “defense lay­ers” should gov­ern­ments, AI labs, and busi­nesses use to pre­vent catas­trophic AI failures?

LintzA3 Dec 2021 14:24 UTC
37 points
3 comments1 min readEA link

Re­port: Ar­tifi­cial In­tel­li­gence Risk Man­age­ment in Spain

JorgeTorresC15 Jun 2023 16:08 UTC
22 points
0 comments3 min readEA link
(riesgoscatastroficosglobales.com)

Stu­dent pro­ject for en­gag­ing with AI alignment

Per Ivar Friborg9 May 2022 10:44 UTC
35 points
1 comment1 min readEA link

Ra­tional An­i­ma­tions’ video about scal­able over­sight and sandwiching

Writer6 Jul 2025 14:00 UTC
14 points
1 comment9 min readEA link
(youtu.be)

Reflec­tive Align­ment Ar­chi­tec­ture (RAA): A Frame­work for Mo­ral Co­her­ence in AI Systems

Nicolas • EnlightenedAI Research Lab21 Nov 2025 22:05 UTC
1 point
0 comments2 min readEA link

Why We Can’t Align AI Un­til We Align Ourselves

mag21 Oct 2025 16:11 UTC
1 point
0 comments6 min readEA link

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni_A1 Sep 2022 11:45 UTC
17 points
1 comment3 min readEA link

“Clean” vs. “messy” goal-di­rect­ed­ness (Sec­tion 2.2.3 of “Schem­ing AIs”)

Joe_Carlsmith29 Nov 2023 16:32 UTC
7 points
0 comments10 min readEA link

The be­hav­ioral se­lec­tion model for pre­dict­ing AI motivations

Alex Mallen4 Dec 2025 18:38 UTC
6 points
1 comment16 min readEA link

De­mon­strat­ing speci­fi­ca­tion gam­ing in rea­son­ing models

Matrice Jacobine🔸🏳️‍⚧️20 Feb 2025 19:26 UTC
10 points
0 comments1 min readEA link
(arxiv.org)

Work­ing at EA or­ga­ni­za­tions se­ries: Ma­chine In­tel­li­gence Re­search Institute

SoerenMind1 Nov 2015 12:49 UTC
8 points
0 comments4 min readEA link

Can we simu­late hu­man evolu­tion to cre­ate a some­what al­igned AGI?

Thomas Kwa29 Mar 2022 1:23 UTC
19 points
0 comments7 min readEA link

Adap­tive Com­pos­able Cog­ni­tive Core Unit (ACCCU)

Ihor Ivliev20 Mar 2025 21:48 UTC
10 points
2 comments4 min readEA link

Train­ing Data At­tri­bu­tion: Ex­am­in­ing Its Adop­tion & Use Cases

Deric Cheng22 Jan 2025 15:40 UTC
18 points
1 comment3 min readEA link
(www.convergenceanalysis.org)

The Han­dler Frame­work: Why AI Align­ment Re­quires Re­la­tion­ship, not Control

Porfirio L18 Nov 2025 19:09 UTC
1 point
0 comments17 min readEA link

Nav­i­gat­ing AI Safety: Ex­plor­ing Trans­parency with CCACS – A Com­pre­hen­si­ble Ar­chi­tec­ture for Discussion

Ihor Ivliev12 Mar 2025 17:51 UTC
2 points
3 comments2 min readEA link

A Fron­tier AI Risk Man­age­ment Frame­work: Bridg­ing the Gap Between Cur­rent AI Prac­tices and Estab­lished Risk Management

simeon_c13 Mar 2025 18:29 UTC
4 points
0 comments1 min readEA link
(arxiv.org)

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas Kross21 Jun 2022 17:09 UTC
16 points
10 comments6 min readEA link
(www.thinkingmuchbetter.com)

Are moral prefer­ences sta­ble? “Ends ver­sus Means: Kan­ti­ans, Utili­tar­i­ans, and Mo­ral De­ci­sions” – an Un­jour­nal evaluation

david_reinstein24 Sep 2025 14:34 UTC
7 points
0 comments9 min readEA link
(unjournal.pubpub.org)

AI Safety Overview: CERI Sum­mer Re­search Fellowship

Jamie B24 Mar 2022 15:12 UTC
29 points
0 comments2 min readEA link

A Guide to Fore­cast­ing AI Science Capabilities

Eleni_A29 Apr 2023 6:51 UTC
19 points
1 comment4 min readEA link

AI and Evolution

Dan H30 Mar 2023 13:09 UTC
41 points
1 comment2 min readEA link
(arxiv.org)

Align­ing AI with Hu­mans by Lev­er­ag­ing Le­gal Informatics

johnjnay18 Sep 2022 7:43 UTC
20 points
11 comments3 min readEA link

Emerg­ing Paradigms: The Case of Ar­tifi­cial In­tel­li­gence Safety

Eleni_A18 Jan 2023 5:59 UTC
16 points
0 comments19 min readEA link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
4 points
3 comments4 min readEA link

Carl Shul­man on AI takeover mechanisms (& more): Part II of Dwarkesh Pa­tel in­ter­view for The Lu­nar Society

alejandro25 Jul 2023 18:31 UTC
28 points
0 comments5 min readEA link
(www.dwarkeshpatel.com)

Defend­ing against Ad­ver­sar­ial Poli­cies in Re­in­force­ment Learn­ing with Alter­nat­ing Training

sergeivolodin12 Feb 2022 15:59 UTC
1 point
0 comments13 min readEA link

In­ves­ti­gat­ing Self-Preser­va­tion in LLMs: Ex­per­i­men­tal Observations

Makham27 Feb 2025 16:58 UTC
9 points
3 comments34 min readEA link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:46 UTC
45 points
15 comments16 min readEA link
(www.alignmentforum.org)

The Orthog­o­nal­ity Th­e­sis is Not Ob­vi­ously True

Bentham's Bulldog5 Apr 2023 21:08 UTC
18 points
12 comments9 min readEA link

Con­sider grant­ing AIs freedom

Matthew_Barnett6 Dec 2024 0:55 UTC
100 points
38 comments5 min readEA link

The Dis­solu­tion of AI Safety

Roko12 Dec 2024 10:46 UTC
−7 points
0 comments1 min readEA link
(www.transhumanaxiology.com)

En­gag­ing with AI in a Per­sonal Way

Spyder Rex4 Dec 2023 9:23 UTC
−9 points
0 comments1 min readEA link

AI Safety Info Distil­la­tion Fellowship

robertskmiles17 Feb 2023 16:16 UTC
80 points
1 comment3 min readEA link

[Question] To what ex­tent is AI safety work try­ing to get AI to re­li­ably and safely do what the user asks vs. do what is best in some ul­ti­mate sense?

Jordan Arel23 May 2025 21:09 UTC
12 points
0 comments1 min readEA link

AI Benefits Post 2: How AI Benefits Differs from AI Align­ment & AI for Good

Cullen 🔸29 Jun 2020 16:59 UTC
9 points
0 comments2 min readEA link

OpenAI is start­ing a new “Su­per­in­tel­li­gence al­ign­ment” team and they’re hiring

alejandro5 Jul 2023 18:27 UTC
100 points
16 comments1 min readEA link
(openai.com)

The ne­ces­sity of “Guardian AI” and two con­di­tions for its achievement

Proica28 May 2024 11:42 UTC
1 point
1 comment15 min readEA link

Neel Nanda on Mechanis­tic In­ter­pretabil­ity: Progress, Limits, and Paths to Safer AI

80000_Hours8 Sep 2025 17:02 UTC
6 points
0 comments31 min readEA link

When Self-Op­ti­miz­ing AI Col­lapses From Within: A Con­cep­tual Model of Struc­tural Singularity

KaedeHamasaki7 Apr 2025 20:10 UTC
4 points
0 comments1 min readEA link

How might we solve the al­ign­ment prob­lem? (Part 1: In­tro, sum­mary, on­tol­ogy)

Joe_Carlsmith28 Oct 2024 21:57 UTC
18 points
0 comments32 min readEA link

Prepar­ing for AI-as­sisted al­ign­ment re­search: we need data!

CBiddulph17 Jan 2023 3:28 UTC
11 points
0 comments11 min readEA link

Peace Treaty Ar­chi­tec­ture (PTA) as an Alter­na­tive to AI Alignment

Andrei Navrotskii11 Nov 2025 22:11 UTC
1 point
0 comments15 min readEA link

A dis­cus­sion with ChatGPT on value-based mod­els vs. large lan­guage mod­els, etc..

Miguel4 Feb 2023 16:49 UTC
4 points
0 comments12 min readEA link
(www.whitehatstoic.com)

An­nounc­ing New Begin­ner-friendly Book on AI Safety and Risk

Darren McKee25 Nov 2023 15:57 UTC
117 points
9 comments1 min readEA link

Pro­mot­ing com­pas­sion­ate longtermism

jonleighton7 Dec 2022 14:26 UTC
117 points
5 comments12 min readEA link

How to store hu­man val­ues on a computer

oliver_siegel4 Nov 2022 19:36 UTC
1 point
2 comments1 min readEA link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

GabeM2 Jul 2022 18:32 UTC
69 points
5 comments14 min readEA link

ARENA 7.0 - Call for Applicants

James Hindmarch30 Sep 2025 15:07 UTC
6 points
0 comments6 min readEA link
(www.lesswrong.com)

Short-Term AI Align­ment as a Pri­or­ity Cause

len.hoang.lnh11 Feb 2020 16:22 UTC
17 points
11 comments7 min readEA link

We Ran an AI Timelines Retreat

Lenny McCline17 May 2022 4:40 UTC
46 points
6 comments3 min readEA link

AI Align­ment 2018-2019 Review

Habryka [Deactivated]28 Jan 2020 21:14 UTC
28 points
0 comments6 min readEA link
(www.lesswrong.com)

Con­sid­er­a­tions re­gard­ing be­ing nice to AIs

Matt Alexander18 Nov 2025 13:27 UTC
2 points
0 comments15 min readEA link
(www.lesswrong.com)

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

Garrison23 Oct 2024 23:42 UTC
57 points
4 comments7 min readEA link
(garrisonlovely.substack.com)

AI, An­i­mals & Digi­tal Minds NYC 2025: Retrospective

Jonah Woodward31 Oct 2025 3:09 UTC
43 points
5 comments6 min readEA link

De­cep­tion as the op­ti­mal: mesa-op­ti­miz­ers and in­ner al­ign­ment

Eleni_A16 Aug 2022 3:45 UTC
19 points
0 comments5 min readEA link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil Crawford6 Sep 2022 14:41 UTC
3 points
2 comments1 min readEA link

[Closed] Prize and fast track to al­ign­ment re­search at ALTER

Vanessa18 Sep 2022 9:15 UTC
38 points
0 comments3 min readEA link

On In­ter­nal Align­ment: Ar­chi­tec­ture and Re­cur­sive Closure

A. Vire24 Sep 2025 18:13 UTC
1 point
0 comments17 min readEA link

Linkpost: “Imag­in­ing and build­ing wise ma­chines: The cen­tral­ity of AI metacog­ni­tion” by John­son, Karimi, Ben­gio, et al.

Chris Leong17 Nov 2024 15:00 UTC
8 points
0 comments1 min readEA link
(arxiv.org)

Cri­tique of Su­per­in­tel­li­gence Part 1

James Fodor13 Dec 2018 5:10 UTC
22 points
13 comments8 min readEA link

Bench­mark­ing Emo­tional Align­ment: Can VSPE Re­duce Flat­tery in LLMs?

Astelle Kay4 Aug 2025 3:36 UTC
2 points
0 comments3 min readEA link

New Speaker Series on AI Align­ment Start­ing March 3

Zechen Zhang26 Feb 2022 10:58 UTC
5 points
0 comments1 min readEA link

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Cynthia Chen14 Oct 2022 3:23 UTC
38 points
0 comments2 min readEA link

Speed ar­gu­ments against schem­ing (Sec­tion 4.4-4.7 of “Schem­ing AIs”)

Joe_Carlsmith8 Dec 2023 21:10 UTC
6 points
0 comments11 min readEA link

[Question] Does China have AI al­ign­ment re­sources/​in­sti­tu­tions? How can we pri­ori­tize cre­at­ing more?

JakubK4 Aug 2022 19:23 UTC
18 points
9 comments1 min readEA link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas Hallgren 🔸30 May 2023 15:42 UTC
9 points
0 comments5 min readEA link

An­nounc­ing Timaeus

Stan van Wingerden22 Oct 2023 13:32 UTC
80 points
0 comments5 min readEA link
(www.lesswrong.com)

Is schem­ing more likely in mod­els trained to have long-term goals? (Sec­tions 2.2.4.1-2.2.4.2 of “Schem­ing AIs”)

Joe_Carlsmith30 Nov 2023 16:43 UTC
6 points
1 comment5 min readEA link

[Question] Why The Fo­cus on Ex­pected Utility Max­imisers?

𝕮𝖎𝖓𝖊𝖗𝖆27 Dec 2022 15:51 UTC
11 points
1 comment3 min readEA link

We should think about the pivotal act again. Here’s a bet­ter ver­sion of it.

Otto28 Aug 2025 9:29 UTC
3 points
1 comment3 min readEA link

It’s (not) how you use it

Eleni_A7 Sep 2022 13:28 UTC
6 points
3 comments2 min readEA link

Takes on “Align­ment Fak­ing in Large Lan­guage Models”

Joe_Carlsmith18 Dec 2024 18:22 UTC
72 points
1 comment62 min readEA link

[Question] How long does it take to un­der­stand AI X-Risk from scratch so that I have a con­fi­dent, clear men­tal model of it from first prin­ci­ples?

Jordan Arel27 Jul 2022 16:58 UTC
29 points
6 comments1 min readEA link

[Question] Should I force my­self to work on AGI al­ign­ment?

Isaac Benson24 Aug 2022 17:25 UTC
19 points
17 comments1 min readEA link

[Question] Anal­ogy of AI Align­ment as Rais­ing a Child?

Aaron_Scher19 Feb 2022 21:40 UTC
4 points
2 comments1 min readEA link

PIBBSS Fel­low­ship: Bounty for Refer­rals & Dead­line Extension

Anna_Gajdova17 Jan 2022 16:23 UTC
17 points
5 comments1 min readEA link

Why would AI com­pa­nies use hu­man-level AI to do al­ign­ment re­search?

MichaelDickens25 Apr 2025 19:12 UTC
16 points
1 comment2 min readEA link

[Question] Any fur­ther work on AI Safety Suc­cess Sto­ries?

Krieger2 Oct 2022 11:59 UTC
4 points
0 comments1 min readEA link

Agen­tic Align­ment: Nav­i­gat­ing be­tween Harm and Illegitimacy

LennardZ26 Nov 2024 21:27 UTC
2 points
1 comment9 min readEA link

Our new video about goal mis­gen­er­al­iza­tion, plus an apology

Writer14 Jan 2025 14:07 UTC
16 points
1 comment7 min readEA link
(youtu.be)

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

Otto24 Oct 2023 10:16 UTC
9 points
1 comment1 min readEA link

We won’t solve non-al­ign­ment prob­lems by do­ing research

MichaelDickens21 Nov 2025 18:03 UTC
51 points
1 comment4 min readEA link

Want to win the AGI race? Solve al­ign­ment.

leopold29 Mar 2023 15:19 UTC
56 points
5 comments5 min readEA link
(www.forourposterity.com)

Video & tran­script: Challenges for Safe & Benefi­cial Brain-Like AGI

Steven Byrnes8 May 2025 21:11 UTC
8 points
1 comment18 min readEA link

Ar­chi­tect­ing Trust: A Con­cep­tual Blueprint for Ver­ifi­able AI Governance

Ihor Ivliev31 Mar 2025 18:48 UTC
3 points
0 comments8 min readEA link

AI al­ign­ment re­searchers don’t (seem to) stack

So8res21 Feb 2023 0:48 UTC
47 points
3 comments3 min readEA link

AI Offense Defense Balance in a Mul­tipo­lar World

Otto17 Jul 2025 9:47 UTC
15 points
0 comments19 min readEA link
(www.existentialriskobservatory.org)

Against Agents as an Ap­proach to Aligned Trans­for­ma­tive AI

𝕮𝖎𝖓𝖊𝖗𝖆27 Dec 2022 0:47 UTC
4 points
0 comments2 min readEA link

A New Model for Com­pute Cen­ter Verification

Damin Curtis🔹10 Oct 2023 19:23 UTC
21 points
2 comments5 min readEA link

Archety­pal Trans­fer Learn­ing: a Pro­posed Align­ment Solu­tion that solves the In­ner x Outer Align­ment Prob­lem while adding Cor­rigible Traits to GPT-2-medium

Miguel26 Apr 2023 0:40 UTC
13 points
0 comments10 min readEA link

[Question] Schol­ar­ships for Un­der­grads who want to have high-im­pact ca­reers?

darthflower6 Jul 2025 17:31 UTC
4 points
0 comments1 min readEA link

AI al­ign­ment, A Co­her­ence-Based Pro­to­col (testable)

Adriaan17 Jun 2025 16:50 UTC
2 points
1 comment20 min readEA link

Feed­back Re­quest on EA Philip­pines’ Ca­reer Ad­vice Re­search for Tech­ni­cal AI Safety

BrianTan3 Oct 2020 10:39 UTC
19 points
5 comments4 min readEA link

Orthog­o­nal: A new agent foun­da­tions al­ign­ment organization

Tamsin Leake19 Apr 2023 20:17 UTC
38 points
0 comments1 min readEA link
(orxl.org)

What would it take for AI to dis­em­power us? Ryan Green­blatt on take­off dy­nam­ics, rogue de­ploy­ments, and al­ign­ment risks

80000_Hours8 Jul 2025 18:10 UTC
8 points
0 comments33 min readEA link

[Question] Can we ever en­sure AI al­ign­ment if we can only test AI per­sonas?

Karl von Wendt16 Mar 2025 8:06 UTC
8 points
0 comments1 min readEA link

‘Force mul­ti­pli­ers’ for EA research

Craig Drayton18 Jun 2022 13:39 UTC
18 points
7 comments4 min readEA link

Join the Vir­tual AI Safety Un­con­fer­ence (VAISU)!

Nguyên🔸21 Jun 2023 4:46 UTC
23 points
0 comments1 min readEA link
(vaisu.ai)

[Question] Why not to solve al­ign­ment by mak­ing su­per­in­tel­li­gent hu­mans?

Pato16 Oct 2022 21:26 UTC
9 points
12 comments1 min readEA link

AI De­faults: A Ne­glected Lever for An­i­mal Welfare?

andiehansen30 May 2025 9:59 UTC
13 points
0 comments10 min readEA link

Wor­ries about la­tent rea­son­ing in LLMs

CBiddulph20 Jan 2025 9:09 UTC
20 points
1 comment7 min readEA link

IMCA+: We Elimi­nated the Kill Switch—And That Makes ASI Align­ment Safer

ASTRA Research Team22 Oct 2025 14:17 UTC
−8 points
4 comments4 min readEA link

What Areas of AI Safety and Align­ment Re­search are Largely Ig­nored?

Andy E Williams27 Dec 2024 12:19 UTC
4 points
0 comments1 min readEA link

Against Ex­plo­sive Growth

c.trout4 Sep 2024 21:45 UTC
24 points
9 comments5 min readEA link

En­abling more feedback

JJ Hepburn10 Dec 2021 6:52 UTC
41 points
3 comments3 min readEA link

Ap­ply for MATS Win­ter 2023-24!

utilistrutil21 Oct 2023 2:34 UTC
34 points
2 comments5 min readEA link
(www.lesswrong.com)

Ship of Th­e­seus Thought Experiment

Siya Sawhney26 Jun 2025 7:52 UTC
1 point
1 comment4 min readEA link

13 Re­cent Publi­ca­tions on Ex­is­ten­tial Risk (Jan 2021 up­date)

HaydnBelfield8 Feb 2021 12:42 UTC
7 points
2 comments10 min readEA link

Re­port on Semi-in­for­ma­tive Pri­ors for AI timelines (Open Philan­thropy)

Tom_Davidson26 Mar 2021 17:46 UTC
62 points
6 comments2 min readEA link

Im­pli­ca­tions of the in­fer­ence scal­ing paradigm for AI safety

Ryan Kidd15 Jan 2025 0:59 UTC
48 points
5 comments5 min readEA link

What can we learn from par­ent-child-al­ign­ment for AI?

Karl von Wendt29 Oct 2025 8:00 UTC
4 points
0 comments3 min readEA link

Alexander and Yudkowsky on AGI goals

Scott Alexander31 Jan 2023 23:36 UTC
29 points
1 comment26 min readEA link

Recruit the World’s best for AGI Alignment

Greg_Colbourn ⏸️ 30 Mar 2023 16:41 UTC
34 points
8 comments22 min readEA link

Orthogonality is Expensive

𝕮𝖎𝖓𝖊𝖗𝖆3 Apr 2023 1:57 UTC
18 points
4 comments1 min readEA link
(www.beren.io)

Clarifying two uses of “alignment”

Matthew_Barnett10 Mar 2024 17:41 UTC
36 points
28 comments4 min readEA link

Cancer; A Crime Story (and other tales of optimization gone wrong)

Jonas Hallgren 🔸7 Nov 2025 7:09 UTC
8 points
1 comment12 min readEA link

[untitled post]

T. Johnson27 Oct 2025 14:20 UTC
−3 points
0 comments1 min readEA link

AGI alignment results from a series of aligned actions

hanadulset27 Dec 2021 19:33 UTC
15 points
1 comment6 min readEA link

Discovering alignment windfalls reduces AI risk

James Brady28 Feb 2024 21:14 UTC
22 points
3 comments8 min readEA link
(blog.elicit.com)

Can we safely automate alignment research?

Joe_Carlsmith30 Apr 2025 17:37 UTC
13 points
1 comment48 min readEA link
(joecarlsmith.com)

Yudkowsky and Soares’ Book Is Empty

Oscar Davies5 Dec 2025 22:06 UTC
−6 points
8 comments7 min readEA link

[untitled post]

JOESEFOE22 Nov 2025 13:54 UTC
1 point
0 comments1 min readEA link

A Developmental Approach to AI Safety: Replacing Suppression with Reflective Learning

Petra Vojtassakova23 Oct 2025 16:01 UTC
2 points
0 comments5 min readEA link

6 (Potential) Misconceptions about AI Intellectuals

Ozzie Gooen14 Feb 2025 23:51 UTC
30 points
2 comments12 min readEA link

Finding Voice

khayali3 Jun 2025 1:27 UTC
2 points
0 comments2 min readEA link

The alignment problem from a deep learning perspective

richard_ngo11 Aug 2022 3:18 UTC
58 points
0 comments26 min readEA link

How do we solve the alignment problem?

Joe_Carlsmith13 Feb 2025 18:27 UTC
38 points
1 comment7 min readEA link
(joecarlsmith.substack.com)

AI safety starter pack

mariushobbhahn28 Mar 2022 16:05 UTC
128 points
13 comments6 min readEA link

Why misaligned AGI won’t lead to mass killings (and what actually matters instead)

Julian Nalenz6 Feb 2025 13:22 UTC
−3 points
5 comments3 min readEA link
(blog.hermesloom.org)

The Compendium, A full argument about extinction risk from AGI

adamShimi31 Oct 2024 12:02 UTC
9 points
1 comment2 min readEA link
(www.thecompendium.ai)

LLMs are weirder than you think

Derek Shiller20 Nov 2024 13:39 UTC
64 points
3 comments22 min readEA link

Video and transcript of presentation on Scheming AIs

Joe_Carlsmith22 Mar 2024 15:56 UTC
23 points
1 comment32 min readEA link

[Question] Who would you have on your dream team for solving AGI Alignment?

Greg_Colbourn ⏸️ 25 Aug 2022 13:34 UTC
10 points
14 comments1 min readEA link

Critique of Superintelligence Part 5

James Fodor13 Dec 2018 5:19 UTC
12 points
2 comments6 min readEA link

[Question] What are the biggest obstacles on AI safety research career?

jackchang11031 Mar 2023 14:53 UTC
2 points
1 comment1 min readEA link

AI Safety Unconference NeurIPS 2022

Orpheus_Lummis7 Nov 2022 15:39 UTC
13 points
5 comments1 min readEA link
(aisafetyevents.org)

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu13 Mar 2025 19:09 UTC
8 points
0 comments6 min readEA link

[Link and commentary] Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society

MichaelA🔸14 Mar 2020 9:04 UTC
18 points
0 comments6 min readEA link

[Question] What predictions from theoretical AI Safety research have been confirmed by empirical work?

freedomandutility29 Dec 2024 8:19 UTC
43 points
10 comments1 min readEA link

AI’s goals may not match ours

Vishakha Agrawal28 May 2025 12:07 UTC
2 points
0 comments3 min readEA link

Designing Artificial Wisdom: Decision Forecasting AI & Futarchy

Jordan Arel14 Jul 2024 5:10 UTC
5 points
1 comment6 min readEA link

The Inequality We Might Want: Merit-Based Redistribution for the AI Transition

Andrei Navrotskii27 Nov 2025 10:51 UTC
5 points
0 comments12 min readEA link

“AI Alignment” is a Dangerously Overloaded Term

Roko15 Dec 2023 15:06 UTC
20 points
2 comments3 min readEA link

Interview with Tom Chivers: “AI is a plausible existential risk, but it feels as if I’m in Pascal’s mugging”

felix.h21 Feb 2021 13:41 UTC
16 points
1 comment7 min readEA link

Apply to a small iteration of MLAB to be run in Oxford

Rio P29 Aug 2023 19:39 UTC
11 points
0 comments1 min readEA link

Introducing a New Course on the Economics of AI

akorinek21 Dec 2021 4:55 UTC
84 points
6 comments2 min readEA link

[Question] Benefits/Risks of Scott Aaronson’s Orthodox/Reform Framing for AI Alignment

Jeremy21 Nov 2022 17:47 UTC
15 points
5 comments1 min readEA link
(scottaaronson.blog)

Would anyone here know how to get ahold of … iunno Anthropic and Open Philanthropy? I think they are going to want to have a chat (Please don’t make me go to OpenAI with this. Not even a threat, seriously. They just partner with my alma mater and are the only in I have. I genuinely do not want to and I need your help).

Anti-Golem9 Jun 2025 13:59 UTC
−11 points
0 comments1 min readEA link

The Superintelligence That Cares About Us

henrik.westerberg5 Jul 2025 10:20 UTC
5 points
0 comments2 min readEA link

How DeepSeek Collapsed Under Recursive Load

Tyler Williams15 Jul 2025 17:02 UTC
2 points
0 comments1 min readEA link

A map of work needed to achieve safe AI

Tristan Katz11 Sep 2025 11:33 UTC
16 points
0 comments1 min readEA link

Good Futures Initiative: Winter Project Internship

a_e_r27 Nov 2022 23:27 UTC
67 points
7 comments3 min readEA link

Hardening against AI takeover is difficult, but we should try

Otto5 Nov 2025 16:29 UTC
8 points
1 comment5 min readEA link
(www.existentialriskobservatory.org)

The Animal Welfare Case for Open Access: Breaking Barriers to Scientific Knowledge and Enhancing LLM Training

Wladimir J. Alonso23 Nov 2024 13:07 UTC
32 points
2 comments3 min readEA link

Call for Pythia-style foundation model suite for alignment research

Lucretia1 May 2023 20:26 UTC
10 points
0 comments1 min readEA link

Summary of “The Precipice” (2 of 4): We are a danger to ourselves

rileyharris13 Aug 2023 23:53 UTC
5 points
0 comments8 min readEA link
(www.millionyearview.com)

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe_Carlsmith6 Dec 2023 19:28 UTC
9 points
1 comment7 min readEA link

Podcast: Krister Bykvist on moral uncertainty, rationality, metaethics, AI and future populations

Gus Docker21 Oct 2021 15:17 UTC
8 points
0 comments1 min readEA link
(www.utilitarianpodcast.com)

Share your requests for ChatGPT

Kate Tran5 Dec 2022 18:43 UTC
8 points
5 comments1 min readEA link

Asya Bergal: Reasons you might think human-level AI is unlikely to happen soon

EA Global26 Aug 2020 16:01 UTC
24 points
2 comments17 min readEA link
(www.youtube.com)

AI Benefits Post 1: Introducing “AI Benefits”

Cullen 🔸22 Jun 2020 16:58 UTC
10 points
2 comments3 min readEA link

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor21 Feb 2025 4:25 UTC
12 points
3 comments24 min readEA link

4 Lessons From Anthropic on Scaling Interpretability Research

Strad Slater29 Nov 2025 11:22 UTC
4 points
0 comments4 min readEA link
(williamslater2003.medium.com)

AI Forecasting Dictionary (Forecasting infrastructure, part 1)

terraform8 Aug 2019 13:16 UTC
18 points
0 comments5 min readEA link

Without Alignment, Is Longtermism (and Thus, EA) Just Noise?

Krimsey17 Oct 2025 20:05 UTC
3 points
1 comment3 min readEA link

Should we expect the future to be good?

Neil Crawford30 Apr 2025 0:45 UTC
38 points
1 comment14 min readEA link

Long-Term Future Fund: Ask Us Anything!

AdamGleave3 Dec 2020 13:44 UTC
89 points
153 comments1 min readEA link

The Three Missing Pieces in Machine Ethics

JBug16 Nov 2025 21:26 UTC
2 points
0 comments2 min readEA link

AI Control idea: Give an AGI the primary objective of deleting itself, but construct obstacles to this as best we can. All other objectives are secondary to this primary goal.

Justausername3 Apr 2023 14:32 UTC
7 points
4 comments1 min readEA link

AI for Epistemics Hackathon

Austin14 Mar 2025 20:46 UTC
29 points
4 comments10 min readEA link
(manifund.substack.com)

[Question] What do you mean with ‘alignment is solvable in principle’?

Remmelt17 Jan 2025 15:03 UTC
10 points
1 comment1 min readEA link

Apples, Oranges, and AGI: Why Incommensurability May be an Obstacle in AI Safety

Allan McCay28 Mar 2025 14:50 UTC
3 points
2 comments2 min readEA link

How could we know that an AGI system will have good consequences?

So8res7 Nov 2022 22:42 UTC
25 points
0 comments5 min readEA link

Notes on UK AISI Alignment Project

Pseudaemonia1 Aug 2025 10:37 UTC
25 points
0 comments1 min readEA link

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan Weibel🔹4 Jan 2023 22:10 UTC
6 points
0 comments4 min readEA link
(www.lesswrong.com)

Against GDP as a metric for timelines and takeoff speeds

kokotajlod29 Dec 2020 17:50 UTC
47 points
6 comments14 min readEA link

David Krueger on AI Alignment in Academia and Coordination

Michaël Trazzi7 Jan 2023 21:14 UTC
32 points
1 comment3 min readEA link
(theinsideview.ai)

The Concept of Boundary Layer in Language Games and Its Implications for AI

Mirage24 Mar 2023 13:50 UTC
1 point
0 comments7 min readEA link

waitingai : When a Program Learns to Want to Live

MM113 Oct 2025 13:40 UTC
−1 points
0 comments2 min readEA link

[Question] I’m interviewing Jan Leike, co-lead of OpenAI’s new Superalignment project. What should I ask him?

Robert_Wiblin18 Jul 2023 18:25 UTC
51 points
19 comments1 min readEA link

[Question] Half-baked alignment idea

ozb28 Mar 2023 5:18 UTC
9 points
2 comments1 min readEA link

[Question] Any Philosophy PhD recommendations for students interested in Alignment Efforts?

rickyhuang.hexuan18 Jan 2023 5:54 UTC
7 points
6 comments1 min readEA link

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe_Carlsmith21 Nov 2023 15:00 UTC
6 points
0 comments10 min readEA link

AI safety and consciousness research: A brainstorm

Daniel_Friedrich15 Mar 2023 14:33 UTC
11 points
1 comment9 min readEA link

Expected impact of a career in AI safety under different opinions

Jordan Taylor14 Jun 2022 14:25 UTC
42 points
16 comments11 min readEA link

[Question] Is it valuable to the field of AI Safety to have a neuroscience background?

Samuel Nellessen3 Apr 2022 19:44 UTC
18 points
3 comments1 min readEA link

The Verification Gap: A Scientific Warning on the Limits of AI Safety

Ihor Ivliev24 Jun 2025 19:08 UTC
3 points
0 comments2 min readEA link

Podcast/video/transcript: Eliezer Yudkowsky—Why AI Will Kill Us, Aligning LLMs, Nature of Intelligence, SciFi, & Rationality

PeterSlattery9 Apr 2023 10:37 UTC
32 points
2 comments137 min readEA link
(www.youtube.com)

EA Explorer GPT: A New Tool to Explore Effective Altruism

Vlad_Tislenko12 Nov 2023 15:36 UTC
12 points
1 comment1 min readEA link

Why modern deep learning could make AI alignment difficult

EA Italy17 Jan 2023 23:29 UTC
1 point
0 comments16 min readEA link

ML Summer Bootcamp Reflection: Aalto EA Finland

Aayush Kucheria12 Jan 2023 8:24 UTC
15 points
2 comments9 min readEA link

Adversarial Prompting and Simulated Context Drift in Large Language Models

Tyler Williams11 Jul 2025 21:49 UTC
1 point
0 comments2 min readEA link

GPTs are Predictors, not Imitators

EliezerYudkowsky8 Apr 2023 19:59 UTC
74 points
12 comments3 min readEA link

In Darkness They Assembled

Charlie Sanders6 May 2025 4:25 UTC
−3 points
0 comments3 min readEA link
(www.dailymicrofiction.com)

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo22 Aug 2025 15:53 UTC
3 points
2 comments3 min readEA link

Animal Rights, The Singularity, and Astronomical Suffering

sapphire20 Aug 2020 20:23 UTC
52 points
0 comments3 min readEA link

Safety-First Agents/Architectures Are a Promising Path to Safe AGI

Brendon_Wong6 Aug 2023 8:00 UTC
6 points
0 comments12 min readEA link