
AI alignment

Last edit: Jul 22, 2022, 8:58 PM by Leo

AI alignment is research on how to ensure that AI systems pursue human or moral goals.

Evaluation

80,000 Hours rates AI alignment a “highest priority area”: a problem at the top of their ranking of global issues assessed by importance, tractability and neglectedness.[1]

Further reading

Christiano, Paul (2020) Current work in AI alignment, Effective Altruism Forum, April 3.

Shah, Rohin (2020) What’s been happening in AI alignment?, Effective Altruism Forum, July 29.

External links

AI Alignment Forum.

Related entries

AI governance | AI forecasting | alignment tax | Center for Human-Compatible Artificial Intelligence | Machine Intelligence Research Institute | rationality community

  1. ^

2019 AI Alignment Literature Review and Charity Comparison

Larks · Dec 19, 2019, 2:58 AM
147 points
28 comments · 62 min read · EA link

Ben Garfinkel: How sure are we about this AI stuff?

bgarfinkel · Feb 9, 2019, 7:17 PM
128 points
20 comments · 18 min read · EA link

2018 AI Alignment Literature Review and Charity Comparison

Larks · Dec 18, 2018, 4:48 AM
118 points
28 comments · 63 min read · EA link

AGI Safety Fundamentals curriculum and application

richard_ngo · Oct 20, 2021, 9:45 PM
123 points
20 comments · 8 min read · EA link
(docs.google.com)

AI Research Considerations for Human Existential Safety (ARCHES)

Andrew Critch · May 21, 2020, 6:55 AM
29 points
0 comments · 3 min read · EA link
(acritch.com)

Why AI alignment could be hard with modern deep learning

Ajeya · Sep 21, 2021, 3:35 PM
153 points
17 comments · 14 min read · EA link
(www.cold-takes.com)

Disentangling arguments for the importance of AI safety

richard_ngo · Jan 23, 2019, 2:58 PM
63 points
14 comments · 8 min read · EA link

Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research

Remmelt · Nov 26, 2020, 4:39 PM
11 points
0 comments · 4 min read · EA link

Why I prioritize moral circle expansion over reducing extinction risk through artificial intelligence alignment

Jacy · Feb 20, 2018, 6:29 PM
107 points
72 comments · 35 min read · EA link
(www.sentienceinstitute.org)

My current thoughts on MIRI’s “highly reliable agent design” work

Daniel_Dewey · Jul 7, 2017, 1:17 AM
60 points
59 comments · 19 min read · EA link

DeepMind is hiring for the Scalable Alignment and Alignment Teams

Rohin Shah · May 13, 2022, 12:19 PM
102 points
0 comments · 9 min read · EA link

Hiring engineers and researchers to help align GPT-3

Paul_Christiano · Oct 1, 2020, 6:52 PM
107 points
19 comments · 3 min read · EA link

AI alignment researchers may have a comparative advantage in reducing s-risks

Lukas_Gloor · Feb 15, 2023, 1:01 PM
79 points
5 comments · 13 min read · EA link

2017 AI Safety Literature Review and Charity Comparison

Larks · Dec 20, 2017, 9:54 PM
43 points
17 comments · 23 min read · EA link

The academic contribution to AI safety seems large

technicalities · Jul 30, 2020, 10:30 AM
117 points
28 comments · 9 min read · EA link

Crazy ideas sometimes do work

Aryeh Englander · Sep 4, 2021, 3:27 AM
71 points
8 comments · 1 min read · EA link

Preventing an AI-related catastrophe—Problem profile

Benjamin Hilton · Aug 29, 2022, 6:49 PM
138 points
18 comments · 4 min read · EA link
(80000hours.org)

2016 AI Risk Literature Review and Charity Comparison

Larks · Dec 13, 2016, 4:36 AM
57 points
12 comments · 28 min read · EA link

Launching applications for AI Safety Careers Course India 2024

varun_agr · May 1, 2024, 5:30 AM
23 points
1 comment · 1 min read · EA link

A tale of 2.5 orthogonality theses

Arepo · May 1, 2022, 1:53 PM
141 points
31 comments · 11 min read · EA link

My personal cruxes for working on AI safety

Buck · Feb 13, 2020, 7:11 AM
136 points
35 comments · 44 min read · EA link

AMA: Ajeya Cotra, researcher at Open Phil

Ajeya · Jan 28, 2021, 5:38 PM
84 points
105 comments · 1 min read · EA link

From language to ethics by automated reasoning

Michele Campolo · Nov 21, 2021, 3:16 PM
8 points
0 comments · 6 min read · EA link

Deceptive Alignment is <1% Likely by Default

DavidW · Feb 21, 2023, 3:07 PM
54 points
26 comments · 14 min read · EA link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth · Apr 18, 2023, 8:09 AM
31 points
2 comments · 1 min read · EA link
(medium.com)

Scrutinizing AI Risk (80K, #81) - v. quick summary

Ben · Jul 23, 2020, 7:02 PM
10 points
1 comment · 3 min read · EA link

TAI Safety Bibliographic Database

Jess_Riedel · Dec 22, 2020, 4:03 PM
61 points
9 comments · 17 min read · EA link

Ngo and Yudkowsky on alignment difficulty

richard_ngo · Nov 15, 2021, 10:47 PM
71 points
13 comments · 94 min read · EA link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír · Jan 30, 2025, 11:57 AM
−1 points
2 comments · 10 min read · EA link
(tetherware.substack.com)

Train for incorrigibility, then reverse it (Shutdown Problem Contest Submission)

Daniel_Eth · Jul 18, 2023, 8:26 AM
16 points
0 comments · 2 min read · EA link

Draft report on existential risk from power-seeking AI

Joe_Carlsmith · Apr 28, 2021, 9:41 PM
88 points
34 comments · 1 min read · EA link

AI alignment shouldn’t be conflated with AI moral achievement

Matthew_Barnett · Dec 30, 2023, 3:08 AM
114 points
15 comments · 5 min read · EA link

AGI safety from first principles

richard_ngo · Oct 21, 2020, 5:42 PM
77 points
10 comments · 3 min read · EA link
(www.alignmentforum.org)

[Question] What are the coolest topics in AI safety, to a hopelessly pure mathematician?

Jenny K E · May 7, 2022, 7:18 AM
89 points
29 comments · 1 min read · EA link

Counterarguments to the basic AI risk case

Katja_Grace · Oct 14, 2022, 8:30 PM
284 points
23 comments · 34 min read · EA link

[Question] What is most con­fus­ing to you about AI stuff?

Sam ClarkeNov 23, 2021, 4:00 PM
25 points
15 comments1 min readEA link

“Aligned with who?” Re­sults of sur­vey­ing 1,000 US par­ti­ci­pants on AI values

Holly MorganMar 21, 2023, 10:07 PM
41 points
0 comments2 min readEA link
(www.lesswrong.com)

What is it to solve the al­ign­ment prob­lem? (Notes)

Joe_CarlsmithAug 24, 2024, 9:19 PM
32 points
1 comment1 min readEA link

Cog­ni­tive Science/​Psy­chol­ogy As a Ne­glected Ap­proach to AI Safety

Kaj_SotalaJun 5, 2017, 1:46 PM
40 points
37 comments4 min readEA link

[Link post] Co­or­di­na­tion challenges for pre­vent­ing AI conflict

stefan.torgesMar 9, 2021, 9:39 AM
58 points
0 comments1 min readEA link
(longtermrisk.org)

There are no co­her­ence theorems

EJTFeb 20, 2023, 9:52 PM
107 points
49 comments19 min readEA link

What is it like do­ing AI safety work?

Kat WoodsFeb 21, 2023, 7:24 PM
99 points
2 comments10 min readEA link

How do take­off speeds af­fect the prob­a­bil­ity of bad out­comes from AGI?

KRJul 7, 2020, 5:53 PM
18 points
0 comments8 min readEA link

In­tro­duc­ing The Non­lin­ear Fund: AI Safety re­search, in­cu­ba­tion, and funding

Kat WoodsMar 18, 2021, 2:07 PM
71 points
32 comments5 min readEA link

AGI safety ca­reer advice

richard_ngoMay 2, 2023, 7:36 AM
211 points
20 comments1 min readEA link

An­nounc­ing AI Safety Support

Linda LinseforsNov 19, 2020, 8:19 PM
55 points
0 comments4 min readEA link

Large Lan­guage Models as Fi­du­cia­ries to Humans

johnjnayJan 24, 2023, 7:53 PM
25 points
0 comments34 min readEA link
(papers.ssrn.com)

A cen­tral AI al­ign­ment prob­lem: ca­pa­bil­ities gen­er­al­iza­tion, and the sharp left turn

So8resJun 15, 2022, 2:19 PM
53 points
2 comments10 min readEA link

Sleeper Agents: Train­ing De­cep­tive LLMs that Per­sist Through Safety Training

evhubJan 12, 2024, 7:51 PM
65 points
0 comments1 min readEA link
(arxiv.org)

Align­ment 201 curriculum

richard_ngoOct 12, 2022, 7:17 PM
94 points
9 comments1 min readEA link

The ba­sic rea­sons I ex­pect AGI ruin

RobBensingerApr 18, 2023, 3:37 AM
58 points
13 comments1 min readEA link

How might we al­ign trans­for­ma­tive AI if it’s de­vel­oped very soon?

Holden KarnofskyAug 29, 2022, 3:48 PM
163 points
17 comments44 min readEA link

My Un­der­stand­ing of Paul Chris­ti­ano’s Iter­ated Am­plifi­ca­tion AI Safety Re­search Agenda

ChiAug 15, 2020, 7:59 PM
38 points
3 comments39 min readEA link

AGI mis­al­ign­ment x-risk may be lower due to an over­looked goal speci­fi­ca­tion technology

johnjnayOct 21, 2022, 2:03 AM
20 points
1 comment1 min readEA link

Why the Orthog­o­nal­ity Th­e­sis’s ve­rac­ity is not the point:

Antoine de Scorraille ⏸️Jul 23, 2020, 3:40 PM
3 points
0 comments3 min readEA link

The cur­rent al­ign­ment plan, and how we might im­prove it | EAG Bay Area 23

BuckJun 7, 2023, 9:03 PM
66 points
0 comments33 min readEA link

There should be an AI safety pro­ject board

mariushobbhahnMar 14, 2022, 4:08 PM
24 points
3 comments1 min readEA link

[linkpost] “What Are Rea­son­able AI Fears?” by Robin Han­son, 2023-04-23

Arjun PanicksseryApr 14, 2023, 11:26 PM
41 points
3 comments4 min readEA link
(quillette.com)

AI Risk: In­creas­ing Per­sua­sion Power

kewlcatsAug 3, 2020, 8:25 PM
4 points
0 comments1 min readEA link

Deep Deceptiveness

So8resMar 21, 2023, 2:51 AM
40 points
1 comment1 min readEA link

AI al­ign­ment with hu­mans… but with which hu­mans?

Geoffrey MillerSep 8, 2022, 11:43 PM
51 points
20 comments3 min readEA link

We Are Con­jec­ture, A New Align­ment Re­search Startup

Connor LeahyApr 9, 2022, 3:07 PM
31 points
0 comments1 min readEA link

Rele­vant pre-AGI possibilities

kokotajlodJun 20, 2020, 1:15 PM
22 points
0 comments1 min readEA link
(aiimpacts.org)

(Even) More Early-Ca­reer EAs Should Try AI Safety Tech­ni­cal Research

tlevinJun 30, 2022, 9:14 PM
86 points
40 comments11 min readEA link

Chain­ing the evil ge­nie: why “outer” AI safety is prob­a­bly easy

titotalAug 30, 2022, 1:55 PM
40 points
12 comments10 min readEA link

Buck Sh­legeris: How I think stu­dents should ori­ent to AI safety

EA GlobalOct 25, 2020, 5:48 AM
11 points
0 comments1 min readEA link
(www.youtube.com)

Ap­ply to the ML for Align­ment Boot­camp (MLAB) in Berkeley [Jan 3 - Jan 22]

Habryka [Deactivated]Nov 3, 2021, 6:20 PM
140 points
6 comments1 min readEA link

Con­jec­ture: In­ter­nal In­fo­haz­ard Policy

Connor LeahyJul 29, 2022, 7:35 PM
34 points
3 comments19 min readEA link

[Link] How un­der­stand­ing valence could help make fu­ture AIs safer

Milan GriffesOct 8, 2020, 6:53 PM
22 points
2 comments3 min readEA link

Safe AI and moral AI

William D'AlessandroJun 1, 2023, 9:18 PM
3 points
0 comments11 min readEA link

On Defer­ence and Yud­kowsky’s AI Risk Estimates

bgarfinkelJun 19, 2022, 2:35 PM
285 points
194 comments17 min readEA link

Pos­si­ble OpenAI’s Q* break­through and Deep­Mind’s AlphaGo-type sys­tems plus LLMs

BurnydelicNov 23, 2023, 7:02 AM
13 points
4 comments2 min readEA link

In­tel­lec­tual Diver­sity in AI Safety

KRJul 22, 2020, 7:07 PM
21 points
8 comments3 min readEA link

On how var­i­ous plans miss the hard bits of the al­ign­ment challenge

So8resJul 12, 2022, 5:35 AM
126 points
13 comments29 min readEA link

Con­nor Leahy on Con­jec­ture and Dy­ing with Dignity

Michaël TrazziJul 22, 2022, 7:30 PM
34 points
0 comments10 min readEA link
(theinsideview.ai)

2020 AI Align­ment Liter­a­ture Re­view and Char­ity Comparison

LarksDec 21, 2020, 3:25 PM
155 points
16 comments68 min readEA link

In­tro­duc­tion to Prag­matic AI Safety [Prag­matic AI Safety #1]

TW123May 9, 2022, 5:02 PM
68 points
0 comments6 min readEA link

In­ter­pret­ing Neu­ral Net­works through the Poly­tope Lens

Sid BlackSep 23, 2022, 6:03 PM
35 points
0 comments1 min readEA link

My Ob­jec­tions to “We’re All Gonna Die with Eliezer Yud­kowsky”

Quintin PopeMar 21, 2023, 1:23 AM
166 points
21 comments39 min readEA link

Par­allels Between AI Safety by De­bate and Ev­i­dence Law

Cullen 🔸Jul 20, 2020, 10:52 PM
30 points
2 comments2 min readEA link
(cullenokeefe.com)

“The Race to the End of Hu­man­ity” – Struc­tural Uncer­tainty Anal­y­sis in AI Risk Models

FroolowMay 19, 2023, 12:03 PM
48 points
4 comments21 min readEA link

Speedrun: AI Align­ment Prizes

joeFeb 9, 2023, 11:55 AM
27 points
0 comments18 min readEA link

EA, Psy­chol­ogy & AI Safety Research

Sam EllisMay 26, 2022, 11:46 PM
28 points
3 comments6 min readEA link

Align­ing the Align­ers: En­sur­ing Aligned AI acts for the com­mon good of all mankind

timunderwoodJan 16, 2023, 11:13 AM
40 points
2 comments4 min readEA link

High-level hopes for AI alignment

Holden KarnofskyDec 20, 2022, 2:11 AM
123 points
14 comments19 min readEA link
(www.cold-takes.com)

An­nounc­ing AXRP, the AI X-risk Re­search Podcast

DanielFilanDec 23, 2020, 8:10 PM
32 points
1 comment1 min readEA link

[Question] How much EA anal­y­sis of AI safety as a cause area ex­ists?

richard_ngoSep 6, 2019, 11:15 AM
94 points
20 comments2 min readEA link

Why Would AI “Aim” To Defeat Hu­man­ity?

Holden KarnofskyNov 29, 2022, 6:59 PM
24 points
0 comments32 min readEA link
(www.cold-takes.com)

Paul Chris­ti­ano: Cur­rent work in AI alignment

EA GlobalApr 3, 2020, 7:06 AM
80 points
3 comments24 min readEA link
(www.youtube.com)

Ap­ply to the sec­ond ML for Align­ment Boot­camp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]

BuckMay 6, 2022, 12:19 AM
111 points
7 comments6 min readEA link

Ro­hin Shah: What’s been hap­pen­ing in AI al­ign­ment?

EA GlobalJul 29, 2020, 8:15 PM
18 points
0 comments14 min readEA link
(www.youtube.com)

[Question] How strong is the ev­i­dence of un­al­igned AI sys­tems caus­ing harm?

Eevee🔹Jul 21, 2020, 4:08 AM
31 points
1 comment1 min readEA link

New re­port on how much com­pu­ta­tional power it takes to match the hu­man brain (Open Philan­thropy)

Aaron Gertler 🔸Sep 15, 2020, 1:06 AM
45 points
1 comment18 min readEA link
(www.openphilanthropy.org)

Paul Chris­ti­ano on how OpenAI is de­vel­op­ing real solu­tions to the ‘AI al­ign­ment prob­lem’, and his vi­sion of how hu­man­ity will pro­gres­sively hand over de­ci­sion-mak­ing to AI systems

80000_HoursOct 2, 2018, 11:49 AM
6 points
0 comments185 min readEA link

Op­por­tu­ni­ties for in­di­vi­d­ual donors in AI safety

alexflintMar 12, 2018, 2:10 AM
13 points
11 comments10 min readEA link

AGI risk: analo­gies & arguments

technicalitiesMar 23, 2021, 1:18 PM
31 points
3 comments8 min readEA link
(www.gleech.org)

Open Philan­thropy’s AI gov­er­nance grant­mak­ing (so far)

Aaron Gertler 🔸Dec 17, 2020, 12:00 PM
63 points
0 comments6 min readEA link
(www.openphilanthropy.org)

SERI ML Align­ment The­ory Schol­ars Pro­gram 2022

Ryan KiddApr 27, 2022, 4:33 PM
57 points
2 comments3 min readEA link

Michael Page, Dario Amodei, He­len Toner, Tasha McCauley, Jan Leike, & Owen Cot­ton-Bar­ratt: Mus­ings on AI

EA GlobalAug 11, 2017, 8:19 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

In­for­mat­ica: Spe­cial Is­sue on Superintelligence

RyanCareyMay 3, 2017, 5:05 AM
7 points
0 comments2 min readEA link

AI Im­pacts: His­toric trends in tech­nolog­i­cal progress

Aaron Gertler 🔸Feb 12, 2020, 12:08 AM
55 points
5 comments3 min readEA link

How I Formed My Own Views About AI Safety

Neel NandaFeb 27, 2022, 6:52 PM
134 points
12 comments14 min readEA link
(www.neelnanda.io)

[Question] What are the top pri­ori­ties in a slow-take­off, mul­ti­po­lar world?

JP Addison🔸Aug 25, 2021, 8:47 AM
26 points
9 comments1 min readEA link

Ought: why it mat­ters and ways to help

Paul_ChristianoJul 26, 2019, 1:56 AM
52 points
5 comments5 min readEA link

An­nounc­ing the Har­vard AI Safety Team

Xander123Jun 30, 2022, 6:34 PM
128 points
4 comments5 min readEA link

Two rea­sons we might be closer to solv­ing al­ign­ment than it seems

Kat WoodsSep 24, 2022, 5:38 PM
44 points
17 comments4 min readEA link

Three kinds of competitiveness

AI ImpactsApr 2, 2020, 3:46 AM
10 points
0 comments5 min readEA link
(aiimpacts.org)

Per­sonal thoughts on ca­reers in AI policy and strategy

carrickflynnSep 27, 2017, 4:52 PM
56 points
28 comments18 min readEA link

Chris­ti­ano, Co­tra, and Yud­kowsky on AI progress

AjeyaNov 25, 2021, 4:30 PM
18 points
6 comments68 min readEA link

Yud­kowsky and Chris­ti­ano dis­cuss “Take­off Speeds”

EliezerYudkowskyNov 22, 2021, 7:42 PM
42 points
0 comments60 min readEA link

“Slower tech de­vel­op­ment” can be about or­der­ing, grad­u­al­ness, or dis­tance from now

MichaelA🔸Nov 14, 2021, 8:58 PM
47 points
3 comments4 min readEA link

$500 bounty for al­ign­ment con­test ideas

AkashJun 30, 2022, 1:55 AM
18 points
1 comment2 min readEA link

BERI is hiring an ML Soft­ware Engineer

sawyer🔸Nov 10, 2021, 7:36 PM
17 points
2 comments1 min readEA link

AI views and dis­agree­ments AMA: Chris­ti­ano, Ngo, Shah, Soares, Yudkowsky

RobBensingerMar 1, 2022, 1:13 AM
30 points
4 comments1 min readEA link
(www.lesswrong.com)

Truth­ful AI

Owen Cotton-BarrattOct 20, 2021, 3:11 PM
55 points
14 comments10 min readEA link

Long-Term Fu­ture Fund: April 2019 grant recommendations

Habryka [Deactivated]Apr 23, 2019, 7:00 AM
142 points
242 comments47 min readEA link

An­nounc­ing the Vi­talik Bu­terin Fel­low­ships in AI Ex­is­ten­tial Safety!

DanielFilanSep 21, 2021, 12:41 AM
62 points
0 comments1 min readEA link
(grants.futureoflife.org)

AI safety uni­ver­sity groups: a promis­ing op­por­tu­nity to re­duce ex­is­ten­tial risk

micJun 30, 2022, 6:37 PM
53 points
1 comment11 min readEA link

Con­sider try­ing the ELK con­test (I am)

Holden KarnofskyJan 5, 2022, 7:42 PM
110 points
17 comments16 min readEA link

[Question] Brief sum­mary of key dis­agree­ments in AI Risk

Aryeh EnglanderDec 26, 2019, 7:40 PM
31 points
3 comments1 min readEA link

AGI in a vuln­er­a­ble world

AI ImpactsApr 2, 2020, 3:43 AM
17 points
0 comments1 min readEA link
(aiimpacts.org)

What suc­cess looks like

mariushobbhahnJun 28, 2022, 2:30 PM
112 points
20 comments19 min readEA link

Po­ten­tial Risks from Ad­vanced AI

EA GlobalAug 13, 2017, 7:00 AM
9 points
0 comments18 min readEA link

Twit­ter-length re­sponses to 24 AI al­ign­ment arguments

RobBensingerMar 14, 2022, 7:34 PM
67 points
17 comments8 min readEA link

Align­ment Newslet­ter One Year Retrospective

Rohin ShahApr 10, 2019, 7:00 AM
62 points
22 comments21 min readEA link

A list of good heuris­tics that the case for AI X-risk fails

Aaron Gertler 🔸Jul 16, 2020, 9:56 AM
25 points
9 comments2 min readEA link
(www.alignmentforum.org)

Owen Cot­ton-Bar­ratt: What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA GlobalAug 11, 2017, 8:19 AM
10 points
0 comments12 min readEA link
(www.youtube.com)

Is GPT-3 the death of the pa­per­clip max­i­mizer?

matthias_samwaldAug 3, 2020, 11:34 AM
4 points
1 comment1 min readEA link

[Question] The pos­i­tive case for a fo­cus on achiev­ing safe AI?

vipulnaikJun 25, 2021, 4:01 AM
41 points
1 comment1 min readEA link

[Link] EAF Re­search agenda: “Co­op­er­a­tion, Con­flict, and Trans­for­ma­tive Ar­tifi­cial In­tel­li­gence”

stefan.torgesJan 17, 2020, 1:28 PM
64 points
0 comments1 min readEA link

AI Safety Needs Great Engineers

Andy JonesNov 23, 2021, 9:03 PM
98 points
13 comments4 min readEA link

An ML safety in­surance com­pany—shower thoughts

EdoAradOct 18, 2021, 7:45 AM
15 points
4 comments1 min readEA link

On the cor­re­spon­dence be­tween AI-mis­al­ign­ment and cog­ni­tive dis­so­nance us­ing a be­hav­ioral eco­nomics model

Stijn Bruers 🔸Nov 1, 2022, 9:15 AM
11 points
0 comments6 min readEA link

In­creased Availa­bil­ity and Willing­ness for De­ploy­ment of Re­sources for Effec­tive Altru­ism and Long-Termism

Evan_GaensbauerDec 29, 2021, 8:20 PM
46 points
1 comment2 min readEA link

“Ex­is­ten­tial risk from AI” sur­vey results

RobBensingerJun 1, 2021, 8:19 PM
80 points
35 comments11 min readEA link

7 es­says on Build­ing a Bet­ter Future

Jamie_HarrisJun 24, 2022, 2:28 PM
21 points
0 comments2 min readEA link

Some promis­ing ca­reer ideas be­yond 80,000 Hours’ pri­or­ity paths

Arden KoehlerJun 26, 2020, 10:34 AM
142 points
28 comments15 min readEA link

Tech­ni­cal AGI safety re­search out­side AI

richard_ngoOct 18, 2019, 3:02 PM
91 points
5 comments3 min readEA link

SERI ML ap­pli­ca­tion dead­line is ex­tended un­til May 22.

Viktoria MalyasovaMay 22, 2022, 12:13 AM
13 points
3 comments1 min readEA link

Messy per­sonal stuff that af­fected my cause pri­ori­ti­za­tion (or: how I started to care about AI safety)

Julia_Wise🔸May 5, 2022, 5:59 PM
265 points
14 comments2 min readEA link

Im­por­tant, ac­tion­able re­search ques­tions for the most im­por­tant century

Holden KarnofskyFeb 24, 2022, 4:34 PM
298 points
13 comments19 min readEA link

List #3: Why not to as­sume on prior that AGI-al­ign­ment workarounds are available

RemmeltDec 24, 2022, 9:54 AM
6 points
0 comments1 min readEA link

Ngo and Yud­kowsky on AI ca­pa­bil­ity gains

richard_ngoNov 19, 2021, 1:54 AM
23 points
4 comments39 min readEA link

[Question] Is it crunch time yet? If so, who can help?

Nicholas / Heather KrossOct 13, 2021, 4:11 AM
29 points
9 comments1 min readEA link

Syd­ney AI Safety Fellowship

Chris LeongDec 2, 2021, 7:35 AM
16 points
0 comments2 min readEA link

Jan Leike, He­len Toner, Malo Bour­gon, and Miles Brundage: Work­ing in AI

EA GlobalAug 11, 2017, 8:19 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

The Parable of the Boy Who Cried 5% Chance of Wolf

Kat WoodsAug 15, 2022, 2:22 PM
80 points
8 comments2 min readEA link

[Linkpost] How To Get Into In­de­pen­dent Re­search On Align­ment/​Agency

Jackson WagnerFeb 14, 2022, 9:40 PM
10 points
0 comments1 min readEA link

Law-Fol­low­ing AI 3: Lawless AI Agents Un­der­mine Sta­bi­liz­ing Agreements

Cullen 🔸Apr 27, 2022, 5:20 PM
28 points
3 comments3 min readEA link

[Question] What are the challenges and prob­lems with pro­gram­ming law-break­ing con­straints into AGI?

MichaelStJulesFeb 2, 2020, 8:53 PM
20 points
34 comments1 min readEA link

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin ShahJan 3, 2020, 7:52 AM
58 points
12 comments10 min readEA link
(www.alignmentforum.org)

Con­sider pay­ing me to do AI safety re­search work

RupertNov 5, 2020, 8:09 AM
11 points
3 comments2 min readEA link

In­for­ma­tion se­cu­rity ca­reers for GCR reduction

ClaireZabelJun 20, 2019, 11:56 PM
187 points
35 comments8 min readEA link

Shar­ing the World with Digi­tal Minds

Aaron Gertler 🔸Dec 1, 2020, 8:00 AM
12 points
1 comment1 min readEA link
(www.nickbostrom.com)

[Question] What is an ex­am­ple of re­cent, tan­gible progress in AI safety re­search?

Aaron Gertler 🔸Jun 14, 2021, 5:29 AM
35 points
4 comments1 min readEA link

Chris­ti­ano and Yud­kowsky on AI pre­dic­tions and hu­man intelligence

EliezerYudkowskyFeb 23, 2022, 4:51 PM
31 points
0 comments42 min readEA link

Quan­tify­ing the Far Fu­ture Effects of Interventions

MichaelDickensMay 18, 2016, 2:15 AM
8 points
0 comments11 min readEA link

[Question] What would you do if you had a lot of money/​power/​in­fluence and you thought that AI timelines were very short?

Greg_Colbourn ⏸️ Nov 12, 2021, 9:59 PM
29 points
8 comments1 min readEA link

A mesa-op­ti­miza­tion per­spec­tive on AI valence and moral patienthood

jacobpfauSep 9, 2021, 10:23 PM
10 points
18 comments17 min readEA link

EA megapro­jects continued

mariushobbhahnDec 3, 2021, 10:33 AM
183 points
48 comments7 min readEA link

What does (and doesn’t) AI mean for effec­tive al­tru­ism?

EA GlobalAug 12, 2017, 7:00 AM
9 points
0 comments12 min readEA link

Public-fac­ing Cen­sor­ship Is Safety Theater, Caus­ing Rep­u­ta­tional Da­m­age

YitzSep 23, 2022, 5:08 AM
49 points
7 comments1 min readEA link

I’m Cul­len O’Keefe, a Policy Re­searcher at OpenAI, AMA

Cullen 🔸Jan 11, 2020, 4:13 AM
45 points
68 comments1 min readEA link

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. MurphyJan 20, 2022, 8:48 PM
20 points
7 comments3 min readEA link

Shah and Yud­kowsky on al­ign­ment failures

EliezerYudkowskyFeb 28, 2022, 7:25 PM
38 points
7 comments92 min readEA link

[Question] Why should we *not* put effort into AI safety re­search?

Ben ThompsonMay 16, 2021, 5:11 AM
15 points
5 comments1 min readEA link

An overview of some promis­ing work by ju­nior al­ign­ment researchers

AkashDec 26, 2022, 5:23 PM
10 points
0 comments1 min readEA link

An­drew Critch: Log­i­cal in­duc­tion — progress in AI alignment

EA GlobalAug 6, 2016, 12:40 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

Is AI fore­cast­ing a waste of effort on the mar­gin?

EmrikNov 5, 2022, 12:41 AM
12 points
6 comments3 min readEA link

Law-Fol­low­ing AI 2: In­tent Align­ment + Su­per­in­tel­li­gence → Lawless AI (By De­fault)

Cullen 🔸Apr 27, 2022, 5:18 PM
19 points
0 comments6 min readEA link

Fi­nal Re­port of the Na­tional Se­cu­rity Com­mis­sion on Ar­tifi­cial In­tel­li­gence (NSCAI, 2021)

MichaelA🔸Jun 1, 2021, 8:19 AM
51 points
3 comments4 min readEA link
(www.nscai.gov)

AMA or dis­cuss my 80K pod­cast epi­sode: Ben Garfinkel, FHI researcher

bgarfinkelJul 13, 2020, 4:17 PM
87 points
140 comments1 min readEA link

Pre­dict re­sponses to the “ex­is­ten­tial risk from AI” survey

RobBensingerMay 28, 2021, 1:38 AM
36 points
8 comments2 min readEA link

Steer­ing AI to care for an­i­mals, and soon

Andrew CritchJun 14, 2022, 1:13 AM
230 points
37 comments1 min readEA link

Owain Evans and Vic­to­ria Krakovna: Ca­reers in tech­ni­cal AI safety

EA GlobalNov 3, 2017, 7:43 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

[Question] Is a ca­reer in mak­ing AI sys­tems more se­cure a mean­ingful way to miti­gate the X-risk posed by AGI?

Kyle O’BrienFeb 13, 2022, 7:05 AM
14 points
4 comments1 min readEA link

Why I ex­pect suc­cess­ful (nar­row) alignment

Tobias_BaumannDec 29, 2018, 3:46 PM
18 points
10 comments1 min readEA link
(s-risks.org)

Red­wood Re­search is hiring for sev­eral roles

Jack RNov 29, 2021, 12:18 AM
75 points
0 comments1 min readEA link

Daniel Dewey: The Open Philan­thropy Pro­ject’s work on po­ten­tial risks from ad­vanced AI

EA GlobalAug 11, 2017, 8:19 AM
7 points
0 comments18 min readEA link
(www.youtube.com)

13 Very Differ­ent Stances on AGI

Ozzie GooenDec 27, 2021, 11:30 PM
84 points
23 comments3 min readEA link

The case for be­com­ing a black-box in­ves­ti­ga­tor of lan­guage models

BuckMay 6, 2022, 2:37 PM
90 points
7 comments3 min readEA link

Dis­cus­sion with Eliezer Yud­kowsky on AGI interventions

RobBensingerNov 11, 2021, 3:21 AM
60 points
33 comments34 min readEA link

Draft re­port on AI timelines

AjeyaDec 15, 2020, 12:10 PM
35 points
0 comments1 min readEA link
(alignmentforum.org)

Some AI re­search ar­eas and their rele­vance to ex­is­ten­tial safety

Andrew CritchDec 15, 2020, 12:15 PM
12 points
1 comment56 min readEA link
(alignmentforum.org)

Align­ing Recom­mender Sys­tems as Cause Area

IvanVendrovMay 8, 2019, 8:56 AM
150 points
48 comments13 min readEA link

List #2: Why co­or­di­nat­ing to al­ign as hu­mans to not de­velop AGI is a lot eas­ier than, well… co­or­di­nat­ing as hu­mans with AGI co­or­di­nat­ing to be al­igned with humans

RemmeltDec 24, 2022, 9:53 AM
3 points
0 comments1 min readEA link

Fore­cast­ing Trans­for­ma­tive AI: What Kind of AI?

Holden KarnofskyAug 10, 2021, 9:38 PM
62 points
3 comments10 min readEA link

What Should the Aver­age EA Do About AI Align­ment?

RaemonFeb 25, 2017, 8:07 PM
42 points
39 comments7 min readEA link

Disagree­ments about Align­ment: Why, and how, we should try to solve them

ojorgensenAug 8, 2022, 10:32 PM
16 points
6 comments16 min readEA link

List #1: Why stop­ping the de­vel­op­ment of AGI is hard but doable

RemmeltDec 24, 2022, 9:52 AM
24 points
2 comments1 min readEA link

Ma­hen­dra Prasad: Ra­tional group de­ci­sion-making

EA GlobalJul 8, 2020, 3:06 PM
15 points
0 comments16 min readEA link
(www.youtube.com)

We should ex­pect to worry more about spec­u­la­tive risks

bgarfinkelMay 29, 2022, 9:08 PM
120 points
14 comments3 min readEA link

What does it mean to be­come an ex­pert in AI Hard­ware?

TophJan 9, 2021, 4:15 AM
87 points
10 comments11 min readEA link

Med­i­ta­tions on ca­reers in AI Safety

PabloAMC 🔸Mar 23, 2022, 10:00 PM
88 points
30 comments2 min readEA link

Con­ver­sa­tion on AI risk with Adam Gleave

AI ImpactsDec 27, 2019, 9:43 PM
18 points
3 comments4 min readEA link
(aiimpacts.org)

There are two fac­tions work­ing to pre­vent AI dan­gers. Here’s why they’re deeply di­vided.

SharmakeAug 10, 2022, 7:52 PM
10 points
0 comments4 min readEA link
(www.vox.com)

Atari early

AI ImpactsApr 2, 2020, 11:28 PM
34 points
2 comments5 min readEA link
(aiimpacts.org)

I’m Buck Sh­legeris, I do re­search and out­reach at MIRI, AMA

BuckNov 15, 2019, 10:44 PM
123 points
228 comments2 min readEA link

[Question] Why aren’t you freak­ing out about OpenAI? At what point would you start?

AppliedDivinityStudiesOct 10, 2021, 1:06 PM
80 points
22 comments2 min readEA link

[Question] What harm could AI safety do?

SeanEngelhartMay 15, 2021, 1:11 AM
12 points
7 comments1 min readEA link

AI Safety: Ap­ply­ing to Grad­u­ate Studies

frances_lorenzDec 15, 2021, 10:56 PM
23 points
0 comments12 min readEA link

FLI AI Align­ment pod­cast: Evan Hub­inger on In­ner Align­ment, Outer Align­ment, and Pro­pos­als for Build­ing Safe Ad­vanced AI

evhubJul 1, 2020, 8:59 PM
13 points
2 comments1 min readEA link
(futureoflife.org)

How to build a safe ad­vanced AI (Evan Hub­inger) | What’s up in AI safety? (Asya Ber­gal)

EA GlobalOct 25, 2020, 5:48 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

AI al­ign­ment prize win­ners and next round [link]

RyanCareyJan 20, 2018, 12:07 PM
7 points
1 comment1 min readEA link

A Sim­ple Model of AGI De­ploy­ment Risk

djbinderJul 9, 2021, 9:44 AM
30 points
0 comments5 min readEA link

[Question] Is trans­for­ma­tive AI the biggest ex­is­ten­tial risk? Why or why not?

Eevee🔹Mar 5, 2022, 3:54 AM
9 points
10 comments1 min readEA link

Ngo and Yud­kowsky on sci­en­tific rea­son­ing and pivotal acts

EliezerYudkowskyFeb 21, 2022, 5:00 PM
33 points
1 comment35 min readEA link

Law-Fol­low­ing AI 1: Se­quence In­tro­duc­tion and Structure

Cullen 🔸Apr 27, 2022, 5:16 PM
35 points
2 comments9 min readEA link

Thoughts on short timelines

Tobias_BaumannOct 23, 2018, 3:59 PM
22 points
14 comments5 min readEA link

Four ques­tions I ask AI safety researchers

AkashJul 17, 2022, 5:25 PM
30 points
3 comments1 min readEA link

Why AI is Harder Than We Think—Me­lanie Mitchell

Eevee🔹Apr 28, 2021, 8:19 AM
45 points
7 comments2 min readEA link
(arxiv.org)

[Question] What con­sid­er­a­tions in­fluence whether I have more in­fluence over short or long timelines?

kokotajlodNov 5, 2020, 7:57 PM
18 points
0 comments1 min readEA link

AI Safety field-build­ing pro­jects I’d like to see

AkashSep 11, 2022, 11:45 PM
31 points
4 comments6 min readEA link
(www.lesswrong.com)

How Do AI Timelines Affect Giv­ing Now vs. Later?

MichaelDickensAug 3, 2021, 3:36 AM
36 points
8 comments8 min readEA link

Long-Term Fu­ture Fund: May 2021 grant recommendations

abergalMay 27, 2021, 6:44 AM
110 points
17 comments57 min readEA link

[Question] I’m in­ter­view­ing Max Teg­mark about AI safety and more. What shouId I ask him?

Robert_WiblinMay 13, 2022, 3:32 PM
18 points
2 comments1 min readEA link

Are al­ign­ment re­searchers de­vot­ing enough time to im­prov­ing their re­search ca­pac­ity?

Carson JonesNov 4, 2022, 12:58 AM
11 points
1 comment1 min readEA link

On pre­sent­ing the case for AI risk

Aryeh EnglanderMar 8, 2022, 9:37 PM
114 points
12 comments4 min readEA link

AGI Predictions

PabloNov 21, 2020, 12:02 PM
36 points
0 comments1 min readEA link
(www.lesswrong.com)

Get­ting started in­de­pen­dently in AI Safety

JJ HepburnJul 6, 2021, 3:20 PM
41 points
10 comments2 min readEA link

How to pur­sue a ca­reer in tech­ni­cal AI alignment

Charlie Rogers-SmithJun 4, 2022, 9:36 PM
265 points
9 comments39 min readEA link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

johnjnayOct 26, 2022, 1:24 AM
7 points
1 comment1 min readEA link

Jesse Clif­ton: Open-source learn­ing — a bar­gain­ing approach

EA GlobalOct 18, 2019, 6:05 PM
10 points
0 comments1 min readEA link
(www.youtube.com)

Tan Zhi Xuan: AI al­ign­ment, philo­soph­i­cal plu­ral­ism, and the rele­vance of non-Western philosophy

EA GlobalNov 21, 2020, 8:12 AM
19 points
1 comment1 min readEA link
(www.youtube.com)

Sur­vey on AI ex­is­ten­tial risk scenarios

Sam ClarkeJun 8, 2021, 5:12 PM
154 points
11 comments6 min readEA link

Katja Grace: AI safety

EA GlobalAug 11, 2017, 8:19 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

Some global catas­trophic risk estimates

TamayFeb 10, 2021, 7:32 PM
106 points
15 comments1 min readEA link

Key Papers in Lan­guage Model Safety

aogJun 20, 2022, 2:59 PM
20 points
0 comments22 min readEA link

[linkpost] Shar­ing pow­er­ful AI mod­els: the emerg­ing paradigm of struc­tured access

tsJan 20, 2022, 9:10 PM
11 points
3 comments1 min readEA link

Ar­tifi­cial in­tel­li­gence ca­reer stories

EA GlobalOct 25, 2020, 6:56 AM
12 points
0 comments1 min readEA link
(www.youtube.com)

[Question] Ca­reer Ad­vice: Philos­o­phy + Pro­gram­ming → AI Safety

tcelferactMar 18, 2022, 3:09 PM
30 points
11 comments2 min readEA link

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

Katja_GraceApr 6, 2021, 9:44 PM
19 points
1 comment11 min readEA link
(worldspiritsockpuppet.com)

Soares, Tal­linn, and Yud­kowsky dis­cuss AGI cognition

EliezerYudkowskyNov 29, 2021, 5:28 PM
15 points
0 comments40 min readEA link

Some AI Gover­nance Re­search Ideas

MarkusAnderljungJun 3, 2021, 10:51 AM
102 points
5 comments2 min readEA link

[Cause Ex­plo­ra­tion Prizes] Ex­pand­ing com­mu­ni­ca­tion about AGI risks

InesSep 22, 2022, 5:30 AM
13 points
0 comments11 min readEA link

In­tro­duc­ing the Prin­ci­ples of In­tel­li­gent Be­havi­our in Biolog­i­cal and So­cial Sys­tems (PIBBSS) Fellowship

adamShimiDec 18, 2021, 3:25 PM
37 points
5 comments10 min readEA link

What does it mean for an AGI to be ‘safe’?

So8resOct 7, 2022, 4:43 AM
53 points
21 comments1 min readEA link

[Question] Should the EA com­mu­nity have a DL en­g­ineer­ing fel­low­ship?

PabloAMC 🔸Dec 24, 2021, 1:43 PM
26 points
6 comments1 min readEA link

[Question] Is this a good way to bet on short timelines?

kokotajlodNov 28, 2020, 2:31 PM
17 points
16 comments1 min readEA link

He­len Toner: The Open Philan­thropy Pro­ject’s work on AI risk

EA GlobalNov 3, 2017, 7:43 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

Les­sons learned from talk­ing to >100 aca­demics about AI safety

mariushobbhahnOct 10, 2022, 1:16 PM
138 points
21 comments1 min readEA link

[Question] What kind of event, tar­geted to un­der­grad­u­ate CS ma­jors, would be most effec­tive at get­ting peo­ple to work on AI safety?

CBiddulphSep 19, 2021, 4:19 PM
9 points
1 comment1 min readEA link

[Creative Writ­ing Con­test] An AI Safety Limerick

Ben_West🔸Oct 18, 2021, 7:11 PM
21 points
5 comments1 min readEA link

The Me­taethics and Nor­ma­tive Ethics of AGI Value Align­ment: Many Ques­tions, Some Implications

Eleos Arete CitriniSep 15, 2021, 7:05 PM
25 points
0 comments8 min readEA link

[Question] How can I bet on short timelines?

kokotajlodNov 7, 2020, 12:45 PM
33 points
12 comments2 min readEA link

AGI x-risk timelines: 10% chance (by year X) es­ti­mates should be the head­line, not 50%.

Greg_Colbourn ⏸️ Mar 1, 2022, 12:02 PM
69 points
22 comments2 min readEA link

Dis­con­tin­u­ous progress in his­tory: an update

AI ImpactsApr 17, 2020, 4:28 PM
69 points
3 comments24 min readEA link

[Question] Is there ev­i­dence that recom­mender sys­tems are chang­ing users’ prefer­ences?

zdgroffApr 12, 2021, 7:11 PM
60 points
15 comments1 min readEA link

Max Teg­mark: Risks and benefits of ad­vanced ar­tifi­cial intelligence

EA GlobalAug 5, 2016, 9:19 AM
7 points
0 comments1 min readEA link
(www.youtube.com)

[Question] What are your recom­men­da­tions for tech­ni­cal AI al­ign­ment pod­casts?

Evan_GaensbauerMay 11, 2022, 9:52 PM
13 points
4 comments1 min readEA link

Mauhn Re­leases AI Safety Documentation

Berg SeverensJul 2, 2021, 12:19 PM
4 points
2 comments1 min readEA link

On AI and Compute

johncroxApr 3, 2019, 9:26 PM
39 points
12 comments8 min readEA link

Eric Drexler: Pare­to­topian goal alignment

EA GlobalMar 15, 2019, 2:51 PM
16 points
0 comments10 min readEA link
(www.youtube.com)

[Question] How should we in­vest in “long-term short-ter­mism” given the like­li­hood of trans­for­ma­tive AI?

James_BanksJan 12, 2021, 11:54 PM
8 points
0 comments1 min readEA link

Quick sur­vey on AI al­ign­ment resources

frances_lorenzJun 30, 2022, 7:08 PM
15 points
0 comments1 min readEA link

Three Im­pacts of Ma­chine Intelligence

Paul_ChristianoAug 23, 2013, 10:10 AM
33 points
5 comments8 min readEA link
(rationalaltruist.com)

Crit­i­cal Re­view of ‘The Precipice’: A Re­assess­ment of the Risks of AI and Pandemics

James FodorMay 11, 2020, 11:11 AM
111 points
32 comments26 min readEA link

Pile of Law and Law-Fol­low­ing AI

Cullen 🔸Jul 13, 2022, 12:29 AM
28 points
2 comments3 min readEA link

13 back­ground claims about EA

AkashSep 7, 2022, 3:54 AM
70 points
16 comments3 min readEA link

How to get tech­nolog­i­cal knowl­edge on AI/​ML (for non-tech peo­ple)

FangFangJun 30, 2021, 7:53 AM
62 points
7 comments5 min readEA link

Emer­gent Ven­tures AI

technicalitiesApr 8, 2022, 10:08 PM
22 points
0 comments1 min readEA link
(marginalrevolution.com)

Nat­u­ral­ism and AI alignment

Michele CampoloApr 24, 2021, 4:20 PM
17 points
3 comments7 min readEA link

Take­aways from safety by de­fault interviews

AI ImpactsApr 7, 2020, 2:01 AM
25 points
2 comments13 min readEA link
(aiimpacts.org)

12 ca­reer ad­vis­ing ques­tions that may (or may not) be helpful for peo­ple in­ter­ested in al­ign­ment research

AkashDec 12, 2022, 10:36 PM
14 points
0 comments1 min readEA link

CFP for Re­bel­lion and Di­sobe­di­ence in AI workshop

Ram RachumDec 29, 2022, 4:09 PM
4 points
0 comments1 min readEA link

Ways to buy time

AkashNov 12, 2022, 7:31 PM
47 points
1 comment1 min readEA link

AI al­ign­ment re­search links

Holden KarnofskyJan 6, 2022, 5:52 AM
16 points
0 comments6 min readEA link
(www.cold-takes.com)

Large Lan­guage Models as Cor­po­rate Lob­by­ists, and Im­pli­ca­tions for So­cietal-AI Alignment

johnjnayJan 4, 2023, 10:22 PM
10 points
6 comments8 min readEA link

[Linkpost] Jan Leike on three kinds of al­ign­ment taxes

AkashJan 6, 2023, 11:57 PM
29 points
0 comments1 min readEA link

Is this com­mu­nity over-em­pha­siz­ing AI al­ign­ment?

LixiangJan 8, 2023, 6:23 AM
1 point
5 comments1 min readEA link

Lev­el­ling Up in AI Safety Re­search Engineering

GabeMSep 2, 2022, 4:59 AM
165 points
21 comments17 min readEA link

VIRTUA: a novel about AI alignment

Karl von WendtJan 12, 2023, 9:37 AM
23 points
0 comments1 min readEA link

Vic­to­ria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment

Michaël TrazziJan 12, 2023, 5:09 PM
16 points
0 comments1 min readEA link

Col­lin Burns on Align­ment Re­search And Dis­cov­er­ing La­tent Knowl­edge Without Supervision

Michaël TrazziJan 17, 2023, 5:21 PM
21 points
3 comments1 min readEA link

Com­pendium of prob­lems with RLHF

Raphaël SJan 30, 2023, 8:48 AM
18 points
0 comments1 min readEA link

Align­ment is mostly about mak­ing cog­ni­tion aimable at all

So8resJan 30, 2023, 3:22 PM
57 points
3 comments1 min readEA link

Eli Lifland on Nav­i­gat­ing the AI Align­ment Landscape

Ozzie GooenFeb 1, 2023, 12:07 AM
48 points
9 comments31 min readEA link
(quri.substack.com)

Qual­ities that al­ign­ment men­tors value in ju­nior researchers

AkashFeb 14, 2023, 11:27 PM
31 points
1 comment1 min readEA link

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_EthFeb 11, 2023, 2:56 AM
50 points
4 comments13 min readEA link

Order Mat­ters for De­cep­tive Alignment

DavidWFeb 15, 2023, 8:12 PM
20 points
1 comment1 min readEA link
(www.lesswrong.com)

Don’t Call It AI Alignment

GilFeb 20, 2023, 5:27 AM
16 points
7 comments2 min readEA link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil CrawfordMar 29, 2022, 7:47 PM
13 points
0 comments3 min readEA link

Who Aligns the Align­ment Re­searchers?

ben.smithMar 5, 2023, 11:22 PM
23 points
4 comments1 min readEA link

[Question] Align­ment & Ca­pa­bil­ities: What’s the differ­ence?

John G. HalsteadAug 31, 2023, 10:13 PM
50 points
10 comments1 min readEA link

De Dicto and De Se Refer­ence Mat­ters for Alignment

philgoetzOct 3, 2023, 9:57 PM
5 points
2 comments9 min readEA link

Si­tu­a­tional aware­ness (Sec­tion 2.1 of “Schem­ing AIs”)

Joe_CarlsmithNov 26, 2023, 11:00 PM
12 points
1 comment1 min readEA link

[Question] If AIs had sub­cor­ti­cal brain simu­la­tion, would that solve the al­ign­ment prob­lem?

Rainbow AffectJul 31, 2023, 3:48 PM
1 point
0 comments2 min readEA link

Sym­bio­sis, not al­ign­ment, as the goal for liberal democ­ra­cies in the tran­si­tion to ar­tifi­cial gen­eral intelligence

simonfriederichMar 17, 2023, 1:04 PM
18 points
2 comments24 min readEA link
(rdcu.be)

Shal­low re­view of live agen­das in al­ign­ment & safety

technicalitiesNov 27, 2023, 11:33 AM
76 points
8 comments29 min readEA link

New blog: Planned Obsolescence

AjeyaMar 27, 2023, 7:46 PM
198 points
9 comments1 min readEA link
(www.planned-obsolescence.org)

Mis­gen­er­al­iza­tion as a misnomer

So8resApr 6, 2023, 8:43 PM
48 points
0 comments1 min readEA link

Brain-com­puter in­ter­faces and brain organoids in AI al­ign­ment?

freedomandutilityApr 15, 2023, 10:28 PM
8 points
2 comments1 min readEA link

No­body’s on the ball on AGI alignment

leopoldMar 29, 2023, 2:26 PM
327 points
65 comments9 min readEA link
(www.forourposterity.com)

Rac­ing through a minefield: the AI de­ploy­ment problem

Holden KarnofskyDec 31, 2022, 9:44 PM
79 points
1 comment13 min readEA link
(www.cold-takes.com)

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda LinseforsSep 27, 2023, 9:27 PM
9 points
0 comments1 min readEA link

Lan­guage Agents Re­duce the Risk of Ex­is­ten­tial Catastrophe

cdkgMay 29, 2023, 9:59 AM
29 points
6 comments26 min readEA link

[Question] Are we con­fi­dent that su­per­in­tel­li­gent ar­tifi­cial in­tel­li­gence dis­em­pow­er­ing hu­mans would be bad?

Vasco Grilo🔸Jun 10, 2023, 9:24 AM
24 points
27 comments1 min readEA link

New re­port: “Schem­ing AIs: Will AIs fake al­ign­ment dur­ing train­ing in or­der to get power?”

Joe_CarlsmithNov 15, 2023, 5:16 PM
71 points
4 comments1 min readEA link

The Mul­tidis­ci­plinary Ap­proach to Align­ment (MATA) and Archety­pal Trans­fer Learn­ing (ATL)

MiguelJun 19, 2023, 3:23 AM
4 points
0 comments7 min readEA link

The case for more Align­ment Tar­get Anal­y­sis (ATA)

ChiSep 20, 2024, 1:14 AM
21 points
0 comments1 min readEA link

AI things that are per­haps as im­por­tant as hu­man-con­trol­led AI

ChiMar 3, 2024, 6:07 PM
113 points
9 comments21 min readEA link

Guardrails vs Goal-di­rect­ed­ness in AI Alignment

freedomandutilityDec 30, 2023, 12:58 PM
13 points
2 comments1 min readEA link

Timelines are short, p(doom) is high: a global stop to fron­tier AI de­vel­op­ment un­til x-safety con­sen­sus is our only rea­son­able hope

Greg_Colbourn ⏸️ Oct 12, 2023, 11:24 AM
73 points
85 comments9 min readEA link

Defin­ing al­ign­ment research

richard_ngoAug 19, 2024, 10:49 PM
48 points
1 comment1 min readEA link

Gentle­ness and the ar­tifi­cial Other

Joe_CarlsmithJan 2, 2024, 6:21 PM
90 points
2 comments1 min readEA link

Imi­ta­tion Learn­ing is Prob­a­bly Ex­is­ten­tially Safe

Vasco Grilo🔸Apr 30, 2024, 5:06 PM
19 points
7 comments3 min readEA link
(www.openphilanthropy.org)

Oth­er­ness and con­trol in the age of AGI

Joe_CarlsmithJan 2, 2024, 6:15 PM
37 points
1 comment1 min readEA link

When “yang” goes wrong

Joe_CarlsmithJan 8, 2024, 4:35 PM
57 points
1 comment1 min readEA link

Does AI risk “other” the AIs?

Joe_CarlsmithJan 9, 2024, 5:51 PM
23 points
3 comments1 min readEA link

An even deeper atheism

Joe_CarlsmithJan 11, 2024, 5:28 PM
26 points
2 comments1 min readEA link

Bryan John­son seems more EA al­igned than I expected

PeterSlatteryApr 22, 2024, 9:38 AM
13 points
27 comments2 min readEA link
(www.youtube.com)

Aspira­tion-based, non-max­i­miz­ing AI agent designs

Bob JacobsMay 7, 2024, 4:13 PM
12 points
1 comment38 min readEA link

The Prob­lem With the Word ‘Align­ment’

Peli GrietzerMay 21, 2024, 9:37 PM
13 points
1 comment6 min readEA link

On the abo­li­tion of man

Joe_CarlsmithJan 18, 2024, 6:17 PM
71 points
4 comments1 min readEA link

Utility Eng­ineer­ing: An­a­lyz­ing and Con­trol­ling Emer­gent Value Sys­tems in AIs

Matrice JacobineFeb 12, 2025, 9:15 AM
13 points
0 comments1 min readEA link
(www.emergent-values.ai)

AI safety tax dynamics

Owen Cotton-BarrattOct 23, 2024, 12:21 PM
21 points
9 comments6 min readEA link
(strangecities.substack.com)

LLMs might not be the fu­ture of search: at least, not yet.

James-Hartree-LawJan 22, 2025, 9:40 PM
4 points
1 comment4 min readEA link

Cos­mic AI safety

Magnus VindingDec 6, 2024, 10:32 PM
23 points
5 comments6 min readEA link

So­ci­aLLM: pro­posal for a lan­guage model de­sign for per­son­al­ised apps, so­cial sci­ence, and AI safety research

Roman LeventovJan 2, 2024, 8:11 AM
4 points
2 comments1 min readEA link

[Question] Can we con­vince peo­ple to work on AI safety with­out con­vinc­ing them about AGI hap­pen­ing this cen­tury?

BrianTanNov 26, 2020, 2:46 PM
8 points
3 comments2 min readEA link

From Con­flict to Coex­is­tence: Rewrit­ing the Game Between Hu­mans and AGI

Michael BatellMar 4, 2025, 2:10 PM
12 points
2 comments19 min readEA link

Have your say on the fu­ture of AI reg­u­la­tion: Dead­line ap­proach­ing for your feed­back on UN High-Level Ad­vi­sory Body on AI In­terim Re­port ‘Govern­ing AI for Hu­man­ity’

Deborah W.A. FoulkesMar 29, 2024, 6:37 AM
17 points
1 comment1 min readEA link

AI Benefits Post 1: In­tro­duc­ing “AI Benefits”

Cullen 🔸Jun 22, 2020, 4:58 PM
10 points
2 comments3 min readEA link

Give Neo a Chance

ankMar 6, 2025, 2:35 PM
1 point
3 comments7 min readEA link

Cortés, Pizarro, and Afonso as Prece­dents for Takeover

AI ImpactsMar 2, 2020, 12:25 PM
27 points
17 comments11 min readEA link
(aiimpacts.org)

Red­wood Re­search is hiring for sev­eral roles (Oper­a­tions and Tech­ni­cal)

JJXWangApr 14, 2022, 3:23 PM
45 points
0 comments1 min readEA link

[DISC] Are Values Ro­bust?

𝕮𝖎𝖓𝖊𝖗𝖆Dec 21, 2022, 1:13 AM
4 points
0 comments1 min readEA link

AI Fore­cast­ing Dic­tionary (Fore­cast­ing in­fras­truc­ture, part 1)

terraformAug 8, 2019, 1:16 PM
18 points
0 comments5 min readEA link

[Question] Is it eth­i­cal to work in AI “con­tent eval­u­a­tion”?

anon_databoy555Jan 30, 2025, 1:27 PM
10 points
3 comments1 min readEA link

Sin­ga­pore’s Tech­ni­cal AI Align­ment Re­search Ca­reer Guide

Yi-YangAug 26, 2020, 8:09 AM
34 points
7 comments8 min readEA link

[Closed] Hiring a math­e­mat­i­cian to work on the learn­ing-the­o­retic AI al­ign­ment agenda

VanessaApr 19, 2022, 6:49 AM
53 points
4 comments2 min readEA link

Pro­posal for a Form of Con­di­tional Sup­ple­men­tal In­come (CSI) in a Post-Work World

Sean SweeneyJan 31, 2025, 1:00 AM
3 points
0 comments3 min readEA link

How might we solve the al­ign­ment prob­lem? (Part 1: In­tro, sum­mary, on­tol­ogy)

Joe_CarlsmithOct 28, 2024, 9:57 PM
18 points
0 comments1 min readEA link

AI Safety Ideas: A col­lab­o­ra­tive AI safety re­search platform

Apart ResearchOct 17, 2022, 5:01 PM
67 points
13 comments4 min readEA link

[Question] Is it valuable to the field of AI Safety to have a neu­ro­science back­ground?

Samuel NellessenApr 3, 2022, 7:44 PM
18 points
3 comments1 min readEA link

Long-Term Fu­ture Fund: Ask Us Any­thing!

AdamGleaveDec 3, 2020, 1:44 PM
89 points
153 comments1 min readEA link

[Question] How do you talk about AI safety?

Eevee🔹Apr 19, 2020, 4:15 PM
10 points
5 comments1 min readEA link

Pod­cast: Tam­era Lan­ham on AI risk, threat mod­els, al­ign­ment pro­pos­als, ex­ter­nal­ized rea­son­ing over­sight, and work­ing at Anthropic

AkashDec 20, 2022, 9:39 PM
14 points
1 comment1 min readEA link

Repli­cat­ing AI Debate

Anthony FlemingFeb 1, 2025, 11:19 PM
9 points
0 comments5 min readEA link

Linkpost: “Imag­in­ing and build­ing wise ma­chines: The cen­tral­ity of AI metacog­ni­tion” by John­son, Karimi, Ben­gio, et al.

Chris LeongNov 17, 2024, 3:00 PM
8 points
0 comments1 min readEA link
(arxiv.org)

[Question] Is there any re­search or fore­casts of how likely AI Align­ment is go­ing to be a hard vs. easy prob­lem rel­a­tive to ca­pa­bil­ities?

Jordan ArelAug 14, 2022, 3:58 PM
8 points
1 comment1 min readEA link

Fron­tier AI sys­tems have sur­passed the self-repli­cat­ing red line

Greg_Colbourn ⏸️ Dec 10, 2024, 4:33 PM
25 points
14 comments1 min readEA link
(github.com)

LLMs are weirder than you think

Derek ShillerNov 20, 2024, 1:39 PM
61 points
3 comments22 min readEA link

An­nounc­ing AI Align­ment Awards: $100k re­search con­tests about goal mis­gen­er­al­iza­tion & corrigibility

AkashNov 22, 2022, 10:19 PM
60 points
1 comment1 min readEA link

LLM chat­bots have ~half of the kinds of “con­scious­ness” that hu­mans be­lieve in. Hu­mans should avoid go­ing crazy about that.

Andrew CritchNov 22, 2024, 3:26 AM
11 points
3 comments1 min readEA link

Reflec­tions on the PIBBSS Fel­low­ship 2022

noraDec 11, 2022, 10:03 PM
69 points
4 comments18 min readEA link

“Clean” vs. “messy” goal-di­rect­ed­ness (Sec­tion 2.2.3 of “Schem­ing AIs”)

Joe_CarlsmithNov 29, 2023, 4:32 PM
7 points
0 comments1 min readEA link

The An­i­mal Welfare Case for Open Ac­cess: Break­ing Bar­ri­ers to Scien­tific Knowl­edge and En­hanc­ing LLM Training

Wladimir J. AlonsoNov 23, 2024, 1:07 PM
32 points
2 comments3 min readEA link

“Nor­mal ac­ci­dents” and AI sys­tems

Eleni_AAug 8, 2022, 6:43 PM
5 points
1 comment1 min readEA link
(www.achan.ca)

Three Bi­ases That Made Me Believe in AI Risk

beth​Feb 13, 2019, 11:22 PM
41 points
20 comments3 min readEA link

How Josiah be­came an AI safety researcher

Neil CrawfordMar 29, 2022, 7:47 PM
10 points
0 comments1 min readEA link

Paths and waysta­tions in AI safety

Joe_CarlsmithMar 11, 2025, 6:52 PM
22 points
2 comments1 min readEA link
(joecarlsmith.substack.com)

Be­ing an in­di­vi­d­ual al­ign­ment grantmaker

A_donorFeb 28, 2022, 4:39 PM
34 points
20 comments2 min readEA link

Re: Some thoughts on veg­e­tar­i­anism and veganism

FaiFeb 25, 2022, 8:43 PM
46 points
3 comments8 min readEA link

How could we know that an AGI sys­tem will have good con­se­quences?

So8resNov 7, 2022, 10:42 PM
25 points
0 comments1 min readEA link

Are Hu­mans ‘Hu­man Com­pat­i­ble’?

Matt BoydDec 6, 2019, 5:49 AM
23 points
8 comments4 min readEA link

Crit­i­cism of the main frame­work in AI alignment

Michele CampoloAug 31, 2022, 9:44 PM
42 points
4 comments7 min readEA link

What is the role of Bayesian ML for AI al­ign­ment/​safety?

mariushobbhahnJan 11, 2022, 8:07 AM
39 points
6 comments3 min readEA link

How Rood­man’s GWP model trans­lates to TAI timelines

kokotajlodNov 16, 2020, 2:11 PM
22 points
0 comments2 min readEA link

EA’s brain-over-body bias, and the em­bod­ied value prob­lem in AI al­ign­ment

Geoffrey MillerSep 21, 2022, 6:55 PM
45 points
3 comments25 min readEA link

New se­ries of posts an­swer­ing one of Holden’s “Im­por­tant, ac­tion­able re­search ques­tions”

Evan R. MurphyMay 12, 2022, 9:22 PM
9 points
0 comments1 min readEA link

FYI: I’m work­ing on a book about the threat of AGI/​ASI for a gen­eral au­di­ence. I hope it will be of value to the cause and the community

Darren McKeeJun 17, 2022, 11:52 AM
32 points
1 comment2 min readEA link

Risk Align­ment in Agen­tic AI Systems

Hayley ClatterbuckOct 1, 2024, 10:51 PM
31 points
1 comment3 min readEA link
(static1.squarespace.com)

Against GDP as a met­ric for timelines and take­off speeds

kokotajlodDec 29, 2020, 5:50 PM
47 points
6 comments14 min readEA link

Align­ment’s phlo­gis­ton

Eleni_AAug 18, 2022, 1:41 AM
18 points
1 comment2 min readEA link

AI Align­ment YouTube Playlists

jacquesthibsMay 9, 2022, 9:31 PM
16 points
2 comments1 min readEA link

On Ar­tifi­cial Gen­eral In­tel­li­gence: Ask­ing the Right Questions

Heather DouglasOct 2, 2022, 5:00 AM
−1 points
7 comments3 min readEA link

[Question] Book recom­men­da­tions for the his­tory of ML?

Eleni_ADec 28, 2022, 11:45 PM
10 points
4 comments1 min readEA link

Wor­ri­some mi­s­un­der­stand­ing of the core is­sues with AI transition

Roman LeventovJan 18, 2024, 10:05 AM
4 points
3 comments1 min readEA link

AI Fore­cast­ing Re­s­olu­tion Coun­cil (Fore­cast­ing in­fras­truc­ture, part 2)

terraformAug 29, 2019, 5:43 PM
28 points
0 comments3 min readEA link

Dist­in­guish­ing test from training

So8resNov 29, 2022, 9:41 PM
27 points
0 comments1 min readEA link

Is RLHF cruel to AI?

HznDec 16, 2024, 2:01 PM
−1 points
2 comments3 min readEA link

De­sign­ing Ar­tifi­cial Wis­dom: De­ci­sion Fore­cast­ing AI & Futarchy

Jordan ArelJul 14, 2024, 5:10 AM
5 points
1 comment6 min readEA link

You won’t solve al­ign­ment with­out agent foundations

MikhailSaminNov 6, 2022, 8:07 AM
14 points
0 comments1 min readEA link

Nav­i­gat­ing AI Safety: Ex­plor­ing Trans­parency with CCACS – A Com­pre­hen­si­ble Ar­chi­tec­ture for Discussion

Ihor IvlievMar 12, 2025, 5:51 PM
2 points
0 comments2 min readEA link

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plexMay 18, 2024, 3:06 PM
13 points
10 comments1 min readEA link
(aisafety.info)

[Question] Best in­tro­duc­tory overviews of AGI safety?

JakubKDec 13, 2022, 7:04 PM
21 points
8 comments2 min readEA link
(www.lesswrong.com)

Crypto ‘or­a­cle pro­to­cols’ for AI al­ign­ment with real-world data?

Geoffrey MillerSep 22, 2022, 11:05 PM
9 points
3 comments1 min readEA link

Key ques­tions about ar­tifi­cial sen­tience: an opinionated guide

rgbApr 25, 2022, 1:42 PM
91 points
3 comments1 min readEA link

Agen­tic Align­ment: Nav­i­gat­ing be­tween Harm and Illegitimacy

LennardZNov 26, 2024, 9:27 PM
2 points
1 comment9 min readEA link

My sum­mary of “Prag­matic AI Safety”

Eleni_ANov 5, 2022, 2:47 PM
14 points
0 comments5 min readEA link

(My sug­ges­tions) On Begin­ner Steps in AI Alignment

Joseph BloomSep 22, 2022, 3:32 PM
37 points
3 comments9 min readEA link

The re­li­gion prob­lem in AI alignment

Geoffrey MillerSep 16, 2022, 1:24 AM
54 points
28 comments11 min readEA link

Posit: Most AI safety peo­ple should work on al­ign­ment/​safety challenges for AI tools that already have users (Stable Diffu­sion, GPT)

nonzerosumDec 20, 2022, 5:23 PM
12 points
3 comments1 min readEA link

How to Diver­sify Con­cep­tual AI Align­ment: the Model Be­hind Refine

adamShimiJul 20, 2022, 10:44 AM
43 points
0 comments9 min readEA link
(www.alignmentforum.org)

[Question] Does the idea of AGI that benev­olently con­trol us ap­peal to EA folks?

Noah ScalesJul 16, 2022, 7:17 PM
6 points
20 comments1 min readEA link

Ap­pendix to Bridg­ing Demonstration

mako yassJun 1, 2022, 8:30 PM
18 points
2 comments28 min readEA link

The ne­ces­sity of “Guardian AI” and two con­di­tions for its achievement

ProicaMay 28, 2024, 11:42 AM
1 point
1 comment15 min readEA link

[Dis­cus­sion] Best in­tu­ition pumps for AI safety

mariushobbhahnNov 6, 2021, 8:11 AM
10 points
8 comments1 min readEA link

LessWrong is now a book, available for pre-or­der!

terraformDec 4, 2020, 8:42 PM
48 points
1 comment7 min readEA link

Birds, Brains, Planes, and AI: Against Ap­peals to the Com­plex­ity/​Mys­te­ri­ous­ness/​Effi­ciency of the Brain

kokotajlodJan 18, 2021, 12:39 PM
27 points
2 comments1 min readEA link

Our Cur­rent Direc­tions in Mechanis­tic In­ter­pretabil­ity Re­search (AI Align­ment Speaker Series)

Group OrganizerApr 8, 2022, 5:08 PM
3 points
0 comments1 min readEA link

[Question] Why AGIs util­ity can’t out­weigh hu­mans’ util­ity?

Alex PSep 20, 2022, 5:16 AM
6 points
25 comments1 min readEA link

AI Risk in Africa

Claude FormanekOct 12, 2021, 2:28 AM
18 points
0 comments10 min readEA link

Ge­offrey Hin­ton on the Past, Pre­sent, and Fu­ture of AI

Stephen McAleeseOct 12, 2024, 4:41 PM
5 points
1 comment1 min readEA link

Prov­ably Hon­est—A First Step

Srijanak DeNov 5, 2022, 9:49 PM
1 point
0 comments1 min readEA link

[Question] Launch­ing Ap­pli­ca­tions for the Global AI Safety Fel­low­ship 2025!

Impact AcademyNov 27, 2024, 3:33 PM
9 points
1 comment1 min readEA link

List of AI safety courses and resources

Daniel del CastilloSep 6, 2021, 2:26 PM
51 points
8 comments1 min readEA link

Anti-squat­ted AI x-risk do­mains index

plexAug 12, 2022, 12:00 PM
56 points
9 comments1 min readEA link

There Should Be More Align­ment-Driven Startups

vaniverMay 31, 2024, 2:05 AM
27 points
3 comments1 min readEA link

[Question] 1h-vol­un­teers needed for a small AI Safety-re­lated re­search pro­ject

PabloAMC 🔸Aug 16, 2021, 5:51 PM
4 points
0 comments1 min readEA link

Align­ment is hard. Com­mu­ni­cat­ing that, might be harder

Eleni_ASep 1, 2022, 11:45 AM
17 points
1 comment3 min readEA link

A Quick List of Some Prob­lems in AI Align­ment As A Field

Nicholas / Heather KrossJun 21, 2022, 5:09 PM
16 points
10 comments6 min readEA link
(www.thinkingmuchbetter.com)

5 ways to im­prove CoT faithfulness

CBiddulphOct 8, 2024, 4:17 AM
8 points
0 comments1 min readEA link

In­tro­duc­ing the Fund for Align­ment Re­search (We’re Hiring!)

AdamGleaveJul 6, 2022, 2:00 AM
74 points
3 comments4 min readEA link

On Solv­ing Prob­lems Be­fore They Ap­pear: The Weird Episte­molo­gies of Alignment

adamShimiOct 11, 2021, 8:21 AM
28 points
0 comments15 min readEA link

fic­tion about AI risk

Ann Garth 🔸Nov 12, 2020, 10:36 PM
8 points
1 comment1 min readEA link

AI Safety Overview: CERI Sum­mer Re­search Fellowship

Jamie BMar 24, 2022, 3:12 PM
29 points
0 comments2 min readEA link

Who or­dered al­ign­ment’s ap­ple?

Eleni_AAug 28, 2022, 2:24 PM
5 points
0 comments3 min readEA link

The Tree of Life: Stan­ford AI Align­ment The­ory of Change

GabeMJul 2, 2022, 6:32 PM
69 points
5 comments14 min readEA link

[3-hour pod­cast]: Joseph Car­l­smith on longter­mism, utopia, the com­pu­ta­tional power of the brain, meta-ethics, illu­sion­ism and meditation

Gus DockerJul 27, 2021, 1:18 PM
34 points
2 comments1 min readEA link

EA Berkeley Pre­sents: Univer­sal Own­er­ship: Is In­dex In­vest­ing the New So­cially Re­spon­si­ble In­vest­ing?

Mahendra PrasadMar 10, 2022, 6:58 AM
7 points
0 comments1 min readEA link

Con­tribute by fa­cil­i­tat­ing the AGI Safety Fun­da­men­tals Programme

Jamie BDec 6, 2021, 11:50 AM
27 points
0 comments2 min readEA link

AI Fore­cast­ing Ques­tion Database (Fore­cast­ing in­fras­truc­ture, part 3)

terraformSep 3, 2019, 2:57 PM
23 points
2 comments4 min readEA link

Adap­tive Com­pos­able Cog­ni­tive Core Unit (ACCCU)

Ihor IvlievMar 20, 2025, 9:48 PM
10 points
1 comment4 min readEA link

A Five-Year Plan to En­sure AGI Benefits All Animals

Sam TuckerDec 17, 2024, 2:29 AM
30 points
2 comments15 min readEA link

We Ran an AI Timelines Retreat

Lenny McClineMay 17, 2022, 4:40 AM
46 points
6 comments3 min readEA link

What is “wire­head­ing”?

Vishakha AgrawalDec 17, 2024, 5:59 PM
1 point
0 comments1 min readEA link
(aisafety.info)

Align­ment Fak­ing in Large Lan­guage Models

Ryan GreenblattDec 18, 2024, 5:19 PM
142 points
9 comments1 min readEA link

Takes on “Align­ment Fak­ing in Large Lan­guage Models”

Joe_CarlsmithDec 18, 2024, 6:22 PM
72 points
1 comment1 min readEA link

Why mis­al­igned AGI won’t lead to mass kil­lings (and what ac­tu­ally mat­ters in­stead)

Julian NalenzFeb 6, 2025, 1:22 PM
−3 points
5 comments3 min readEA link
(blog.hermesloom.org)

In­tro to car­ing about AI al­ign­ment as an EA cause

So8resApr 14, 2017, 12:42 AM
28 points
10 comments25 min readEA link

[Link] Thiel on GCRs

Milan GriffesJul 22, 2019, 8:47 PM
28 points
11 comments1 min readEA link

Short-Term AI Align­ment as a Pri­or­ity Cause

len.hoang.lnhFeb 11, 2020, 4:22 PM
17 points
11 comments7 min readEA link

AI data gaps could lead to on­go­ing An­i­mal Suffering

Darkness8i8Oct 17, 2024, 10:52 AM
13 points
3 comments5 min readEA link

A Rocket–In­ter­pretabil­ity Analogy

plexOct 21, 2024, 1:55 PM
13 points
1 comment1 min readEA link

AI Value Align­ment Speaker Series Pre­sented By EA Berkeley

Mahendra PrasadMar 1, 2022, 6:17 AM
2 points
0 comments1 min readEA link

Gen­eral ad­vice for tran­si­tion­ing into The­o­ret­i­cal AI Safety

Martín SotoSep 15, 2022, 5:23 AM
25 points
0 comments10 min readEA link

AI ac­cel­er­a­tion from a safety per­spec­tive: Trade-offs and con­sid­er­a­tions

mariushobbhahnJan 19, 2022, 9:44 AM
12 points
1 comment7 min readEA link

Miles Brundage re­signed from OpenAI, and his AGI readi­ness team was disbanded

GarrisonOct 23, 2024, 11:42 PM
57 points
4 comments7 min readEA link
(garrisonlovely.substack.com)

AI, Greed, and the Death of Over­sight: When In­sti­tu­tions Ig­nore Their Own Limits

funnyfrancoMar 21, 2025, 1:13 PM
8 points
0 comments26 min readEA link

Con­sider grant­ing AIs freedom

Matthew_BarnettDec 6, 2024, 12:55 AM
80 points
22 comments5 min readEA link

De­sir­able? AI qualities

brb243Mar 21, 2022, 10:05 PM
7 points
0 comments2 min readEA link

How do we solve the al­ign­ment prob­lem?

Joe_CarlsmithFeb 13, 2025, 6:27 PM
28 points
1 comment1 min readEA link
(joecarlsmith.substack.com)

Teach­ing AI to rea­son: this year’s most im­por­tant story

Benjamin_ToddFeb 13, 2025, 5:56 PM
140 points
18 comments8 min readEA link
(benjamintodd.substack.com)

Con­fused about AI re­search as a means of ad­dress­ing AI risk

Eli RoseFeb 21, 2019, 12:07 AM
31 points
15 comments1 min readEA link

What Areas of AI Safety and Align­ment Re­search are Largely Ig­nored?

Andy E WilliamsDec 27, 2024, 12:19 PM
4 points
0 comments1 min readEA link

[Link post] Promis­ing Paths to Align­ment—Con­nor Leahy | Talk

frances_lorenzMay 14, 2022, 3:58 PM
17 points
0 comments1 min readEA link

[Question] What pre­dic­tions from the­o­ret­i­cal AI Safety re­search have been con­firmed by em­piri­cal work?

freedomandutilityDec 29, 2024, 8:19 AM
43 points
10 comments1 min readEA link

A stub­born un­be­liever fi­nally gets the depth of the AI al­ign­ment problem

aelwoodOct 13, 2022, 3:16 PM
32 points
7 comments1 min readEA link

Dis­cov­er­ing Lan­guage Model Be­hav­iors with Model-Writ­ten Evaluations

evhubDec 20, 2022, 8:09 PM
25 points
0 comments1 min readEA link

But ex­actly how com­plex and frag­ile?

Katja_GraceDec 13, 2019, 7:05 AM
37 points
3 comments3 min readEA link
(meteuphoric.com)

Beg­ging, Plead­ing AI Orgs to Com­ment on NIST AI Risk Man­age­ment Framework

BridgesApr 15, 2022, 7:35 PM
87 points
3 comments2 min readEA link

[Question] How to get more aca­demics en­thu­si­as­tic about do­ing AI Safety re­search?

PabloAMC 🔸Sep 4, 2021, 2:10 PM
25 points
19 comments1 min readEA link

Does gen­er­al­ity pay? GPT-3 can provide pre­limi­nary ev­i­dence.

Eevee🔹Jul 12, 2020, 6:53 PM
21 points
4 comments2 min readEA link

Im­pli­ca­tions of Quan­tum Com­put­ing for Ar­tifi­cial In­tel­li­gence al­ign­ment re­search (ABRIDGED)

Jaime SevillaSep 5, 2019, 2:56 PM
25 points
4 comments2 min readEA link

Tur­ing-Test-Pass­ing AI im­plies Aligned AI

RokoDec 31, 2024, 8:22 PM
0 points
0 comments5 min readEA link

6 (Po­ten­tial) Mis­con­cep­tions about AI Intellectuals

Ozzie GooenFeb 14, 2025, 11:51 PM
30 points
2 comments12 min readEA link

My thoughts on OpenAI’s al­ign­ment plan

AkashDec 30, 2022, 7:34 PM
16 points
0 comments1 min readEA link

AI & wis­dom 3: AI effects on amor­tised optimisation

L Rudolf LOct 29, 2024, 1:37 PM
14 points
0 comments1 min readEA link
(rudolf.website)

“AI” is an indexical

TW123Jan 3, 2023, 10:00 PM
23 points
2 comments1 min readEA link

Solv­ing al­ign­ment isn’t enough for a flour­ish­ing future

micFeb 2, 2024, 6:22 PM
27 points
0 comments22 min readEA link
(papers.ssrn.com)

ChatGPT un­der­stands, but largely does not gen­er­ate Span­glish (and other code-mixed) text

Milan Weibel🔹Jan 4, 2023, 10:10 PM
6 points
0 comments4 min readEA link
(www.lesswrong.com)

When should we worry about AI power-seek­ing?

Joe_CarlsmithFeb 19, 2025, 7:44 PM
21 points
2 comments1 min readEA link
(joecarlsmith.substack.com)

The moral ar­gu­ment for giv­ing AIs autonomy

Matthew_BarnettJan 8, 2025, 12:59 AM
33 points
7 comments11 min readEA link

De­mon­strat­ing speci­fi­ca­tion gam­ing in rea­son­ing models

Matrice JacobineFeb 20, 2025, 7:26 PM
9 points
0 comments1 min readEA link
(arxiv.org)

Join the AI Align­ment Evals hackathon

lenzJan 14, 2025, 6:17 PM
3 points
0 comments3 min readEA link

Learn­ing as much Deep Learn­ing math as I could in 24 hours

PhosphorousJan 8, 2023, 2:19 AM
58 points
6 comments7 min readEA link

David Krueger on AI Align­ment in Academia and Coordination

Michaël TrazziJan 7, 2023, 9:14 PM
32 points
1 comment3 min readEA link
(theinsideview.ai)

ML Sum­mer Boot­camp Reflec­tion: Aalto EA Finland

Aayush KucheriaJan 12, 2023, 8:24 AM
15 points
2 comments9 min readEA link

Our new video about goal mis­gen­er­al­iza­tion, plus an apology

WriterJan 14, 2025, 2:07 PM
16 points
1 comment1 min readEA link
(youtu.be)

Im­pli­ca­tions of the in­fer­ence scal­ing paradigm for AI safety

Ryan KiddJan 15, 2025, 12:59 AM
46 points
5 comments1 min readEA link

Between Science Fic­tion and Emerg­ing Real­ity: Are We Ready for Digi­tal Per­sons?

Alex (Αλέξανδρος)Mar 13, 2025, 4:09 PM
3 points
1 comment5 min readEA link

[Question] Any Philos­o­phy PhD recom­men­da­tions for stu­dents in­ter­ested in Align­ment Efforts?

rickyhuang.hexuanJan 18, 2023, 5:54 AM
7 points
6 comments1 min readEA link

De­cep­tion as the op­ti­mal: mesa-op­ti­miz­ers and in­ner al­ign­ment

Eleni_AAug 16, 2022, 3:45 AM
19 points
0 comments5 min readEA link

Prepar­ing for AI-as­sisted al­ign­ment re­search: we need data!

CBiddulphJan 17, 2023, 3:28 AM
11 points
0 comments11 min readEA link

Jan Kirch­ner on AI Alignment

birtesJan 17, 2023, 3:11 PM
5 points
0 comments1 min readEA link

How do fic­tional sto­ries illus­trate AI mis­al­ign­ment?

Vishakha AgrawalJan 15, 2025, 6:16 AM
4 points
0 comments2 min readEA link
(aisafety.info)

Emerg­ing Paradigms: The Case of Ar­tifi­cial In­tel­li­gence Safety

Eleni_AJan 18, 2023, 5:59 AM
16 points
0 comments19 min readEA link

UK AI Bill Anal­y­sis & Opinion

CAISIDFeb 5, 2024, 12:12 AM
18 points
0 comments15 min readEA link

[Question] How can we se­cure more re­search po­si­tions at our uni­ver­si­ties for x-risk re­searchers?

Neil CrawfordSep 6, 2022, 2:41 PM
3 points
2 comments1 min readEA link

11 heuris­tics for choos­ing (al­ign­ment) re­search projects

AkashJan 27, 2023, 12:36 AM
30 points
1 comment1 min readEA link

Mo­ti­va­tion control

Joe_CarlsmithOct 30, 2024, 5:15 PM
18 points
0 comments1 min readEA link

A Fron­tier AI Risk Man­age­ment Frame­work: Bridg­ing the Gap Between Cur­rent AI Prac­tices and Estab­lished Risk Management

simeon_cMar 13, 2025, 6:29 PM
6 points
0 comments1 min readEA link
(arxiv.org)

Share your re­quests for ChatGPT

Kate TranDec 5, 2022, 6:43 PM
8 points
5 comments1 min readEA link

Bench­mark Perfor­mance is a Poor Mea­sure of Gen­er­al­is­able AI Rea­son­ing Capabilities

James FodorFeb 21, 2025, 4:25 AM
12 points
3 comments24 min readEA link

On value in hu­mans, other an­i­mals, and AI

Michele CampoloJan 31, 2023, 11:48 PM
7 points
6 comments5 min readEA link

[Linkpost] Hu­man-nar­rated au­dio ver­sion of “Is Power-Seek­ing AI an Ex­is­ten­tial Risk?”

Joe_CarlsmithJan 31, 2023, 7:19 PM
9 points
0 comments1 min readEA link

Alexan­der and Yud­kowsky on AGI goals

Scott AlexanderJan 31, 2023, 11:36 PM
29 points
1 comment1 min readEA link

Fo­cus on the places where you feel shocked ev­ery­one’s drop­ping the ball

So8resFeb 2, 2023, 12:27 AM
92 points
6 comments1 min readEA link

An au­dio ver­sion of the al­ign­ment prob­lem from a deep learn­ing per­spec­tive by Richard Ngo Et Al

MiguelFeb 3, 2023, 7:32 PM
18 points
0 comments1 min readEA link
(www.whitehatstoic.com)

A dis­cus­sion with ChatGPT on value-based mod­els vs. large lan­guage mod­els, etc..

MiguelFeb 4, 2023, 4:49 PM
4 points
0 comments12 min readEA link
(www.whitehatstoic.com)

The Com­pendium, A full ar­gu­ment about ex­tinc­tion risk from AGI

adamShimiOct 31, 2024, 12:02 PM
9 points
1 comment2 min readEA link
(www.thecompendium.ai)

Re­duc­ing LLM de­cep­tion at scale with self-other over­lap fine-tuning

Marc CarauleanuMar 13, 2025, 7:09 PM
8 points
0 comments1 min readEA link

In­ter­view with Ro­man Yam­polskiy about AGI on The Real­ity Check

Darren McKeeFeb 18, 2023, 11:29 PM
27 points
0 comments1 min readEA link
(www.trcpodcast.com)

It’s (not) how you use it

Eleni_ASep 7, 2022, 1:28 PM
6 points
3 comments2 min readEA link

Ex­pected im­pact of a ca­reer in AI safety un­der differ­ent opinions

Jordan TaylorJun 14, 2022, 2:25 PM
42 points
16 comments11 min readEA link

[Question] What do you mean with ‘al­ign­ment is solv­able in prin­ci­ple’?

RemmeltJan 17, 2025, 3:03 PM
10 points
1 comment1 min readEA link

Overview | An Eval­u­a­tive Evolu­tion

Matt KeeneFeb 10, 2023, 6:15 PM
−9 points
0 comments5 min readEA link
(www.creatingafuturewewant.com)

The Vi­talik Bu­terin Fel­low­ship in AI Ex­is­ten­tial Safety is open for ap­pli­ca­tions!

Cynthia ChenOct 14, 2022, 3:23 AM
38 points
0 comments2 min readEA link

Wor­ries about la­tent rea­son­ing in LLMs

CBiddulphJan 20, 2025, 9:09 AM
20 points
1 comment1 min readEA link

AI Safety Info Distil­la­tion Fellowship

robertskmilesFeb 17, 2023, 4:16 PM
80 points
1 comment1 min readEA link

What Does an ASI Poli­ti­cal Ecol­ogy Mean for Hu­man Sur­vival?

Nathan SidneyFeb 23, 2025, 8:53 AM
7 points
3 comments1 min readEA link

Why The Fo­cus on Ex­pected Utility Max­imisers?

𝕮𝖎𝖓𝖊𝖗𝖆Dec 27, 2022, 3:51 PM
11 points
1 comment1 min readEA link

AI al­ign­ment re­searchers don’t (seem to) stack

So8resFeb 21, 2023, 12:48 AM
47 points
3 comments1 min readEA link

Seek­ing in­put on a list of AI books for broader audience

Darren McKeeFeb 27, 2023, 10:40 PM
49 points
14 comments5 min readEA link

Train­ing Data At­tri­bu­tion: Ex­am­in­ing Its Adop­tion & Use Cases

Deric ChengJan 22, 2025, 3:40 PM
18 points
1 comment3 min readEA link
(www.convergenceanalysis.org)

AI al­ign­ment as a trans­la­tion problem

Roman LeventovFeb 5, 2024, 2:14 PM
3 points
1 comment1 min readEA link

Op­tion control

Joe_CarlsmithNov 4, 2024, 5:54 PM
11 points
0 comments1 min readEA link

An­thropic: Core Views on AI Safety: When, Why, What, and How

jonmenasterMar 9, 2023, 5:30 PM
107 points
6 comments22 min readEA link
(www.anthropic.com)

Ques­tions about Con­jec­ture’s CoEm proposal

AkashMar 9, 2023, 7:32 PM
19 points
0 comments1 min readEA link

Su­per­in­tel­li­gence’s goals are likely to be random

MikhailSaminMar 14, 2025, 1:17 AM
2 points
0 comments1 min readEA link

AI for Epistemics Hackathon

AustinMar 14, 2025, 8:46 PM
29 points
4 comments1 min readEA link
(manifund.substack.com)

AI Align­ment, Sen­tience, and the Sense of Co­her­ence Concept

Jason BabbMar 17, 2025, 1:30 PM
4 points
0 comments1 min readEA link

Ap­ply to a small iter­a­tion of MLAB to be run in Oxford

Rio PAug 29, 2023, 7:39 PM
11 points
0 comments1 min readEA link

[Question] Can we ever en­sure AI al­ign­ment if we can only test AI per­sonas?

Karl von WendtMar 16, 2025, 8:06 AM
8 points
0 comments1 min readEA link

AI safety and con­scious­ness re­search: A brainstorm

Daniel_FriedrichMar 15, 2023, 2:33 PM
11 points
1 comment9 min readEA link

[Question] Can we train AI so that fu­ture philan­thropy is more effec­tive?

Ricardo PimentelNov 3, 2024, 3:08 PM
3 points
0 comments1 min readEA link

[Question] Should I force my­self to work on AGI al­ign­ment?

Isaac BensonAug 24, 2022, 5:25 PM
19 points
17 comments1 min readEA link

Why fo­cus on schemers in par­tic­u­lar (Sec­tions 1.3 and 1.4 of “Schem­ing AIs”)

Joe_CarlsmithNov 24, 2023, 7:18 PM
10 points
1 comment1 min readEA link

OpenAI’s o1 tried to avoid be­ing shut down, and lied about it, in evals

Greg_Colbourn ⏸️ Dec 6, 2024, 3:25 PM
23 points
9 comments1 min readEA link
(www.transformernews.ai)

Ap­ply for MATS Win­ter 2023-24!

utilistrutilOct 21, 2023, 2:34 AM
34 points
2 comments5 min readEA link
(www.lesswrong.com)

Po­ten­tial em­ploy­ees have a unique lever to in­fluence the be­hav­iors of AI labs

oxalisMar 18, 2023, 8:58 PM
139 points
1 comment5 min readEA link

Don’t Dis­miss Sim­ple Align­ment Approaches

Chris LeongOct 21, 2023, 12:31 PM
12 points
0 comments1 min readEA link

[linkpost] Ten Levels of AI Align­ment Difficulty

SammyDMartinJul 4, 2023, 11:23 AM
16 points
0 comments1 min readEA link

An­nounc­ing Timaeus

Stan van WingerdenOct 22, 2023, 1:32 PM
79 points
0 comments5 min readEA link
(www.lesswrong.com)

An­nounc­ing New Begin­ner-friendly Book on AI Safety and Risk

Darren McKeeNov 25, 2023, 3:57 PM
114 points
9 comments1 min readEA link

Is schem­ing more likely in mod­els trained to have long-term goals? (Sec­tions 2.2.4.1-2.2.4.2 of “Schem­ing AIs”)

Joe_CarlsmithNov 30, 2023, 4:43 PM
6 points
1 comment1 min readEA link

An­nounc­ing #AISum­mitTalks fea­tur­ing Pro­fes­sor Stu­art Rus­sell and many others

OttoOct 24, 2023, 10:16 AM
9 points
1 comment1 min readEA link

PIBBSS Fel­low­ship: Bounty for Refer­rals & Dead­line Extension

Anna_GajdovaJan 17, 2022, 4:23 PM
17 points
7 comments1 min readEA link

OpenAI is start­ing a new “Su­per­in­tel­li­gence al­ign­ment” team and they’re hiring

alejandroJul 5, 2023, 6:27 PM
100 points
16 comments1 min readEA link
(openai.com)

Va­ri­eties of fake al­ign­ment (Sec­tion 1.1 of “Schem­ing AIs”)

Joe_CarlsmithNov 21, 2023, 3:00 PM
6 points
0 comments1 min readEA link

The Dis­solu­tion of AI Safety

RokoDec 12, 2024, 10:46 AM
−7 points
0 comments1 min readEA link
(www.transhumanaxiology.com)

Con­sider try­ing Vivek Heb­bar’s al­ign­ment exercises

AkashOct 24, 2022, 7:46 PM
16 points
0 comments1 min readEA link

[Question] What new psy­chol­ogy re­search could best pro­mote AI safety & al­ign­ment re­search?

Geoffrey MillerJul 13, 2023, 4:30 PM
29 points
13 comments1 min readEA link

Against Agents as an Ap­proach to Aligned Trans­for­ma­tive AI

𝕮𝖎𝖓𝖊𝖗𝖆Dec 27, 2022, 12:47 AM
4 points
0 comments1 min readEA link

Win­ners of AI Align­ment Awards Re­search Contest

AkashJul 13, 2023, 4:14 PM
50 points
1 comment1 min readEA link

“The Uni­verse of Minds”—call for re­view­ers (Seeds of Science)

rogersbacon1Jul 25, 2023, 4:55 PM
4 points
0 comments1 min readEA link

AXRP Epi­sode 24 - Su­per­al­ign­ment with Jan Leike

DanielFilanJul 27, 2023, 4:56 AM
23 points
0 comments1 min readEA link
(axrp.net)

[Question] I’m in­ter­view­ing Jan Leike, co-lead of OpenAI’s new Su­per­al­ign­ment pro­ject. What should I ask him?

Robert_WiblinJul 18, 2023, 6:25 PM
51 points
19 comments1 min readEA link

[Cross­post] An AI Pause Is Hu­man­ity’s Best Bet For Prevent­ing Ex­tinc­tion (TIME)

OttoJul 24, 2023, 10:18 AM
36 points
3 comments7 min readEA link
(time.com)

An­i­mal Rights, The Sin­gu­lar­ity, and Astro­nom­i­cal Suffering

sapphireAug 20, 2020, 8:23 PM
51 points
0 comments3 min readEA link

Carl Shul­man on AI takeover mechanisms (& more): Part II of Dwarkesh Pa­tel in­ter­view for The Lu­nar Society

alejandroJul 25, 2023, 6:31 PM
28 points
0 comments5 min readEA link
(www.dwarkeshpatel.com)

What are the differ­ences be­tween AGI, trans­for­ma­tive AI, and su­per­in­tel­li­gence?

Vishakha AgrawalJan 23, 2025, 10:11 AM
12 points
0 comments3 min readEA link
(aisafety.info)

[Question] What should I read about defin­ing AI “hal­lu­ci­na­tion?”

James-Hartree-LawJan 23, 2025, 1:00 AM
2 points
0 comments1 min readEA link

AGI al­ign­ment re­sults from a se­ries of al­igned ac­tions

hanadulsetDec 27, 2021, 7:33 PM
15 points
1 comment6 min readEA link

A Tri-Opti Com­pat­i­bil­ity Problem

wallowerMar 1, 2025, 7:48 PM
1 point
0 comments1 min readEA link
(philpapers.org)

3 lev­els of threat obfuscation

Holden KarnofskyAug 2, 2023, 5:09 PM
31 points
0 comments6 min readEA link
(www.alignmentforum.org)

The Con­cept of Boundary Layer in Lan­guage Games and Its Im­pli­ca­tions for AI

MirageMar 24, 2023, 1:50 PM
1 point
0 comments7 min readEA link

Sparks of Ar­tifi­cial Gen­eral In­tel­li­gence: Early ex­per­i­ments with GPT-4 | Microsoft Research

𝕮𝖎𝖓𝖊𝖗𝖆Mar 23, 2023, 5:45 AM
15 points
0 comments1 min readEA link

A stylized di­alogue on John Went­worth’s claims about mar­kets and optimization

So8resMar 25, 2023, 10:32 PM
18 points
0 comments1 min readEA link

Time to Think about ASI Con­sti­tu­tions?

ukc10014Jan 27, 2025, 9:28 AM
20 points
0 comments12 min readEA link

[Question] Half-baked al­ign­ment idea

ozbMar 28, 2023, 5:18 AM
9 points
2 comments1 min readEA link

A rough and in­com­plete re­view of some of John Went­worth’s research

So8resMar 28, 2023, 6:52 PM
27 points
0 comments1 min readEA link

The al­ign­ment prob­lem from a deep learn­ing perspective

richard_ngoAug 11, 2022, 3:18 AM
58 points
0 comments26 min readEA link

Want to win the AGI race? Solve al­ign­ment.

leopoldMar 29, 2023, 3:19 PM
56 points
6 comments5 min readEA link
(www.forourposterity.com)

The fun­da­men­tal hu­man value is power.

LinyphiaMar 30, 2023, 3:15 PM
−1 points
5 comments1 min readEA link

Re­cruit the World’s best for AGI Alignment

Greg_Colbourn ⏸️ Mar 30, 2023, 4:41 PM
34 points
8 comments22 min readEA link

AI and Evolution

Dan HMar 30, 2023, 1:09 PM
41 points
1 comment2 min readEA link
(arxiv.org)

[Question] What are the biggest ob­sta­cles on AI safety re­search ca­reer?

jackchang110Mar 31, 2023, 2:53 PM
2 points
1 comment1 min readEA link

Pes­simism about AI Safety

Max_He-HoApr 2, 2023, 7:57 AM
5 points
0 comments25 min readEA link
(www.lesswrong.com)

Two con­cepts of an “epi­sode” (Sec­tion 2.2.1 of “Schem­ing AIs”)

Joe_CarlsmithNov 27, 2023, 6:01 PM
11 points
1 comment1 min readEA link

GPTs are Pre­dic­tors, not Imitators

EliezerYudkowskyApr 8, 2023, 7:59 PM
74 points
12 comments1 min readEA link

[Question] Pre­dic­tions for fu­ture AI gov­er­nance?

jackchang110Apr 2, 2023, 4:43 PM
4 points
1 comment1 min readEA link

Orthog­o­nal­ity is Expensive

𝕮𝖎𝖓𝖊𝖗𝖆Apr 3, 2023, 1:57 AM
18 points
4 comments1 min readEA link

If in­ter­pretabil­ity re­search goes well, it may get dangerous

So8resApr 3, 2023, 9:48 PM
33 points
0 comments1 min readEA link

The King and the Golem—The Animation

WriterNov 8, 2024, 6:23 PM
50 points
1 comment1 min readEA link

The Orthog­o­nal­ity Th­e­sis is Not Ob­vi­ously True

Bentham's BulldogApr 5, 2023, 9:08 PM
18 points
12 comments9 min readEA link

AI Con­trol idea: Give an AGI the pri­mary ob­jec­tive of delet­ing it­self, but con­struct ob­sta­cles to this as best we can. All other ob­jec­tives are sec­ondary to this pri­mary goal.

JustausernameApr 3, 2023, 2:32 PM
7 points
4 comments1 min readEA link

AI as a sci­ence, and three ob­sta­cles to al­ign­ment strategies

So8resOct 25, 2023, 9:02 PM
41 points
1 comment1 min readEA link

EA Ex­plorer GPT: A New Tool to Ex­plore Effec­tive Altruism

Vlad_TislenkoNov 12, 2023, 3:36 PM
12 points
1 comment1 min readEA link

Pod­cast/​video/​tran­script: Eliezer Yud­kowsky—Why AI Will Kill Us, Align­ing LLMs, Na­ture of In­tel­li­gence, SciFi, & Rationality

PeterSlatteryApr 9, 2023, 10:37 AM
32 points
2 comments137 min readEA link
(www.youtube.com)

A New Model for Com­pute Cen­ter Verification

Damin Curtis🔹Oct 10, 2023, 7:23 PM
21 points
2 comments5 min readEA link

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

soroushjpNov 7, 2023, 6:00 PM
10 points
0 comments2 min readEA link
(arxiv.org)

Devel­op­ing a Calcu­la­ble Con­science for AI: Equa­tion for Rights Violations

Sean SweeneyDec 12, 2024, 5:50 PM
4 points
1 comment15 min readEA link

Perché il deep learn­ing mod­erno potrebbe ren­dere diffi­cile l’al­linea­mento delle IA

EA ItalyJan 17, 2023, 11:29 PM
1 point
0 comments16 min readEA link

In­ves­ti­gat­ing Self-Preser­va­tion in LLMs: Ex­per­i­men­tal Observations

MakhamFeb 27, 2025, 4:58 PM
9 points
3 comments34 min readEA link

AI safety starter pack

mariushobbhahnMar 28, 2022, 4:05 PM
126 points
13 comments6 min readEA link

Prevenire una catas­trofe legata all’in­tel­li­genza artificiale

EA ItalyJan 17, 2023, 11:07 AM
1 point
0 comments3 min readEA link

[Question] Who would you have on your dream team for solv­ing AGI Align­ment?

Greg_Colbourn ⏸️ Aug 25, 2022, 1:34 PM
10 points
14 comments1 min readEA link

Ap­ply for the ML Win­ter Camp in Cam­bridge, UK [2-10 Jan]

Nathan_BarnardDec 2, 2022, 7:33 PM
50 points
11 comments2 min readEA link

AI Gover­nance Ca­reer Paths for Europeans

careersthrowawayMay 16, 2020, 6:40 AM
83 points
1 comment12 min readEA link

Deep­Mind’s gen­er­al­ist AI, Gato: A non-tech­ni­cal explainer

frances_lorenzMay 16, 2022, 9:19 PM
128 points
13 comments6 min readEA link

Ti­maeus is hiring re­searchers & engineers

Tatiana K. Nesic SkuratovaJan 27, 2025, 2:35 PM
19 points
0 comments4 min readEA link

Not Just For Ther­apy Chat­bots: The Case For Com­pas­sion In AI Mo­ral Align­ment Research

Kenneth_DiaoSep 29, 2024, 10:58 PM
8 points
3 comments12 min readEA link

[Creative Writ­ing Con­test] Me­tal or Mortal

LouisOct 16, 2021, 4:24 PM
7 points
0 comments7 min readEA link

De­fus­ing AGI Danger

Mark XuDec 24, 2020, 11:08 PM
23 points
0 comments2 min readEA link
(www.alignmentforum.org)

Orthog­o­nal: A new agent foun­da­tions al­ign­ment organization

Tamsin LeakeApr 19, 2023, 8:17 PM
38 points
0 comments1 min readEA link

The het­ero­gene­ity of hu­man value types: Im­pli­ca­tions for AI alignment

Geoffrey MillerSep 16, 2022, 9:21 PM
27 points
2 comments10 min readEA link

Safety-First Agents/​Ar­chi­tec­tures Are a Promis­ing Path to Safe AGI

Brendon_WongAug 6, 2023, 8:00 AM
6 points
0 comments12 min readEA link

AGI Can­not Be Pre­dicted From Real In­ter­est Rates

Nicholas DeckerJan 28, 2025, 5:45 PM
24 points
3 comments1 min readEA link
(nicholasdecker.substack.com)

Cri­tique of Su­per­in­tel­li­gence Part 2

James FodorDec 13, 2018, 5:12 AM
10 points
12 comments7 min readEA link

Cri­tique of Su­per­in­tel­li­gence Part 5

James FodorDec 13, 2018, 5:19 AM
12 points
2 comments6 min readEA link

A course for the gen­eral pub­lic on AI

LeandroDAug 31, 2020, 1:29 AM
1 point
0 comments1 min readEA link

Skil­ling-up in ML Eng­ineer­ing for Align­ment: re­quest for comments

TheMcDouglasApr 24, 2022, 6:40 AM
8 points
0 comments1 min readEA link

Tether­ware #2: What ev­ery hu­man should know about our most likely AI future

Jáchym FibírFeb 28, 2025, 11:25 AM
3 points
0 comments11 min readEA link
(tetherware.substack.com)

Newslet­ter for Align­ment Re­search: The ML Safety Updates

Esben KranOct 22, 2022, 4:17 PM
30 points
0 comments7 min readEA link

Visi­ble Thoughts Pro­ject and Bounty Announcement

So8resNov 30, 2021, 12:35 AM
35 points
2 comments13 min readEA link

[Creative Writ­ing Con­test] The Puppy Problem

LouisOct 13, 2021, 2:01 PM
13 points
0 comments7 min readEA link

AI Safety Un­con­fer­ence NeurIPS 2022

Orpheus_LummisNov 7, 2022, 3:39 PM
13 points
5 comments1 min readEA link
(aisafetyevents.org)

A con­ver­sa­tion with Ro­hin Shah

AI ImpactsNov 12, 2019, 1:31 AM
27 points
8 comments33 min readEA link
(aiimpacts.org)

Loss of con­trol of AI is not a likely source of AI x-risk

squekNov 9, 2022, 5:48 AM
8 points
0 comments1 min readEA link

Euro­pean Master’s Pro­grams in Ma­chine Learn­ing, Ar­tifi­cial In­tel­li­gence, and re­lated fields

Master Programs ML/AIJan 17, 2021, 8:09 PM
17 points
4 comments1 min readEA link

[Link and com­men­tary] Beyond Near- and Long-Term: Towards a Clearer Ac­count of Re­search Pri­ori­ties in AI Ethics and Society

MichaelA🔸Mar 14, 2020, 9:04 AM
18 points
0 comments6 min readEA link

[Ex­tended Dead­line: Jan 23rd] An­nounc­ing the PIBBSS Sum­mer Re­search Fellowship

noraDec 18, 2021, 4:54 PM
36 points
1 comment1 min readEA link

A tough ca­reer decision

PabloAMC 🔸Apr 9, 2022, 12:46 AM
68 points
13 comments4 min readEA link

An­nounc­ing the Cam­bridge Bos­ton Align­ment Ini­ti­a­tive [Hiring!]

kuhanjDec 2, 2022, 1:07 AM
83 points
0 comments1 min readEA link

Cri­tique of Su­per­in­tel­li­gence Part 4

James FodorDec 13, 2018, 5:14 AM
4 points
2 comments4 min readEA link

Archety­pal Trans­fer Learn­ing: a Pro­posed Align­ment Solu­tion that solves the In­ner x Outer Align­ment Prob­lem while adding Cor­rigible Traits to GPT-2-medium

MiguelApr 26, 2023, 12:40 AM
13 points
0 comments10 min readEA link

How to do the­o­ret­i­cal re­search, a per­sonal perspective

Mark XuAug 19, 2022, 7:43 PM
132 points
7 comments15 min readEA link

[Question] How would a lan­guage model be­come goal-di­rected?

David MJul 16, 2022, 2:50 PM
113 points
20 comments1 min readEA link

Re­search agenda: Su­per­vis­ing AIs im­prov­ing AIs

Quintin PopeApr 29, 2023, 5:09 PM
16 points
0 comments1 min readEA link

A Guide to Fore­cast­ing AI Science Capabilities

Eleni_AApr 29, 2023, 6:51 AM
19 points
1 comment4 min readEA link

Changes in fund­ing in the AI safety field

Sebastian_FarquharFeb 3, 2017, 1:09 PM
34 points
10 comments7 min readEA link

The first AI Safety Camp & onwards

RemmeltJun 7, 2018, 6:49 PM
25 points
2 comments8 min readEA link

“Tak­ing AI Risk Se­ri­ously” – Thoughts by An­drew Critch

RaemonNov 19, 2018, 2:21 AM
26 points
9 comments1 min readEA link
(www.lesswrong.com)

How use­ful for al­ign­ment-rele­vant work are AIs with short-term goals? (Sec­tion 2.2.4.3 of “Schem­ing AIs”)

Joe_CarlsmithDec 1, 2023, 2:51 PM
6 points
0 comments1 min readEA link

Four rea­sons I find AI safety emo­tion­ally compelling

Kat WoodsJun 28, 2022, 2:01 PM
32 points
5 comments4 min readEA link

Call for Pythia-style foun­da­tion model suite for al­ign­ment research

LucretiaMay 1, 2023, 8:26 PM
10 points
0 comments1 min readEA link

Sum­mary of Stu­art Rus­sell’s new book, “Hu­man Com­pat­i­ble”

Rohin ShahOct 19, 2019, 7:56 PM
33 points
1 comment15 min readEA link
(www.alignmentforum.org)

AGI will ar­rive by the end of this decade ei­ther as a uni­corn or as a black swan

Yuri BarzovOct 21, 2022, 10:50 AM
−4 points
7 comments3 min readEA link

In­ter­view with Tom Chivers: “AI is a plau­si­ble ex­is­ten­tial risk, but it feels as if I’m in Pas­cal’s mug­ging”

felix.hFeb 21, 2021, 1:41 PM
16 points
1 comment7 min readEA link

7 traps that (we think) new al­ign­ment re­searchers of­ten fall into

AkashSep 27, 2022, 11:13 PM
73 points
8 comments1 min readEA link

“In­tro to brain-like-AGI safety” se­ries—halfway point!

Steven ByrnesMar 9, 2022, 3:21 PM
8 points
0 comments2 min readEA link

You Un­der­stand AI Align­ment and How to Make Soup

Leen ArmoushMay 28, 2022, 6:22 AM
0 points
2 comments5 min readEA link

The Hid­den Com­plex­ity of Wishes—The Animation

WriterSep 27, 2023, 5:59 PM
7 points
0 comments1 min readEA link
(youtu.be)

My (naive) take on Risks from Learned Optimization

Artyom KNov 6, 2022, 4:25 PM
5 points
0 comments1 min readEA link

A re­sponse to Matthews on AI Risk

RyanCareyAug 11, 2015, 12:58 PM
11 points
16 comments6 min readEA link

[Question] Are so­cial me­dia al­gorithms an ex­is­ten­tial risk?

Barry GrimesSep 15, 2020, 8:52 AM
24 points
13 comments1 min readEA link

AI Safety Ca­reer Bot­tle­necks Sur­vey Re­sponses Responses

Linda LinseforsMay 28, 2021, 10:41 AM
35 points
1 comment5 min readEA link

ML Safety Schol­ars Sum­mer 2022 Retrospective

TW123Nov 1, 2022, 3:09 AM
56 points
2 comments21 min readEA link

In­tro­duc­ing a New Course on the Eco­nomics of AI

akorinekDec 21, 2021, 4:55 AM
84 points
6 comments2 min readEA link

The role of academia in AI Safety.

PabloAMC 🔸Mar 28, 2022, 12:04 AM
71 points
19 comments3 min readEA link

Orthog­o­nal’s For­mal-Goal Align­ment the­ory of change

Tamsin LeakeMay 5, 2023, 10:36 PM
21 points
0 comments1 min readEA link

An­nual AGI Bench­mark­ing Event

MetaculusAug 26, 2022, 9:31 PM
20 points
2 comments2 min readEA link
(www.metaculus.com)

Book re­view: Ar­chi­tects of In­tel­li­gence by Martin Ford (2018)

OferAug 11, 2020, 5:24 PM
11 points
1 comment2 min readEA link

Col­lege tech­ni­cal AI safety hackathon ret­ro­spec­tive—Ge­or­gia Tech

yixiongNov 14, 2024, 1:34 PM
18 points
0 comments5 min readEA link
(yixiong.substack.com)

Why “just make an agent which cares only about bi­nary re­wards” doesn’t work.

Lysandre TerrisseMay 9, 2023, 4:51 PM
4 points
1 comment3 min readEA link

Anal­y­sis of AI Safety sur­veys for field-build­ing insights

Ash JafariDec 5, 2022, 5:37 PM
30 points
7 comments5 min readEA link

Re­sources that (I think) new al­ign­ment re­searchers should know about

AkashOct 28, 2022, 10:13 PM
20 points
2 comments1 min readEA link

[Question] Why not offer a multi-mil­lion /​ billion dol­lar prize for solv­ing the Align­ment Prob­lem?

Aryeh EnglanderApr 17, 2022, 4:08 PM
15 points
9 comments1 min readEA link

Un­veiling the Amer­i­can Public Opinion on AI Mo­ra­to­rium and Govern­ment In­ter­ven­tion: The Im­pact of Me­dia Exposure

OttoMay 8, 2023, 10:49 AM
28 points
5 comments6 min readEA link

AI risk hub in Sin­ga­pore?

kokotajlodOct 29, 2020, 11:51 AM
24 points
3 comments4 min readEA link

Public Call for In­ter­est in Math­e­mat­i­cal Alignment

DavidmanheimNov 22, 2023, 1:22 PM
27 points
3 comments1 min readEA link

13 Re­cent Publi­ca­tions on Ex­is­ten­tial Risk (Jan 2021 up­date)

HaydnBelfieldFeb 8, 2021, 12:42 PM
7 points
2 comments10 min readEA link

Why Is No One Try­ing To Align Profit In­cen­tives With Align­ment Re­search?

PrometheusAug 23, 2023, 1:19 PM
17 points
2 comments4 min readEA link
(www.lesswrong.com)

[Question] Why not to solve al­ign­ment by mak­ing su­per­in­tel­li­gent hu­mans?

PatoOct 16, 2022, 9:26 PM
9 points
12 comments1 min readEA link

Co­op­er­a­tion and Align­ment in Del­e­ga­tion Games: You Need Both!

Oliver SourbutAug 3, 2024, 10:16 AM
4 points
1 comment1 min readEA link
(www.oliversourbut.net)

Three sce­nar­ios of pseudo-al­ign­ment

Eleni_ASep 5, 2022, 8:26 PM
7 points
0 comments3 min readEA link

Re­port on Semi-in­for­ma­tive Pri­ors for AI timelines (Open Philan­thropy)

Tom_DavidsonMar 26, 2021, 5:46 PM
62 points
6 comments2 min readEA link

En­abling more feedback

JJ HepburnDec 10, 2021, 6:52 AM
41 points
3 comments3 min readEA link

Ori­gin and al­ign­ment of goals, mean­ing, and morality

FalseCogsAug 24, 2023, 2:05 PM
1 point
2 comments35 min readEA link

[Cross­post] AI Reg­u­la­tion May Be More Im­por­tant Than AI Align­ment For Ex­is­ten­tial Safety

OttoAug 24, 2023, 4:01 PM
14 points
2 comments5 min readEA link

‘Force mul­ti­pli­ers’ for EA research

Craig DraytonJun 18, 2022, 1:39 PM
18 points
7 comments4 min readEA link

[Question] Benefits/​Risks of Scott Aaron­son’s Ortho­dox/​Re­form Fram­ing for AI Alignment

JeremyNov 21, 2022, 5:47 PM
15 points
5 comments1 min readEA link
(scottaaronson.blog)

Effec­tive Altru­ism Florida’s AI Ex­pert Panel—Record­ing and Slides Available

Sam_E_24May 19, 2023, 7:15 PM
2 points
0 comments1 min readEA link

Feed­back Re­quest on EA Philip­pines’ Ca­reer Ad­vice Re­search for Tech­ni­cal AI Safety

BrianTanOct 3, 2020, 10:39 AM
19 points
5 comments4 min readEA link

Em­piri­cal work that might shed light on schem­ing (Sec­tion 6 of “Schem­ing AIs”)

Joe_CarlsmithDec 11, 2023, 4:30 PM
7 points
1 comment1 min readEA link

Any fur­ther work on AI Safety Suc­cess Sto­ries?

KriegerOct 2, 2022, 11:59 AM
4 points
0 comments1 min readEA link

Biomimetic al­ign­ment: Align­ment be­tween an­i­mal genes and an­i­mal brains as a model for al­ign­ment be­tween hu­mans and AI sys­tems.

Geoffrey MillerMay 26, 2023, 9:25 PM
32 points
1 comment16 min readEA link

In­finite Re­wards, Finite Safety: New Models for AI Mo­ti­va­tion Without In­finite Goals

Whylome TeamNov 12, 2024, 7:21 AM
−5 points
1 comment2 min readEA link

[Question] Anal­ogy of AI Align­ment as Rais­ing a Child?

Aaron_ScherFeb 19, 2022, 9:40 PM
4 points
2 comments1 min readEA link

Sum­mary: Ex­is­ten­tial risk from power-seek­ing AI by Joseph Carlsmith

rileyharrisOct 28, 2023, 3:05 PM
11 points
0 comments6 min readEA link
(www.millionyearview.com)

[Question] How long does it take to un­der­stand AI X-Risk from scratch so that I have a con­fi­dent, clear men­tal model of it from first prin­ci­ples?

Jordan ArelJul 27, 2022, 4:58 PM
29 points
6 comments1 min readEA link

Sta­tus Quo Eng­ines—AI essay

Ilana_Goldowitz_JimenezMay 28, 2023, 2:33 PM
1 point
0 comments15 min readEA link

Ad­vice for new al­ign­ment peo­ple: Info Max

Jonas HallgrenMay 30, 2023, 3:42 PM
9 points
0 comments1 min readEA link

Ab­strac­tion is Big­ger than Nat­u­ral Abstraction

Nicholas / Heather KrossMay 31, 2023, 12:00 AM
2 points
0 comments1 min readEA link

Good Fu­tures Ini­ti­a­tive: Win­ter Pro­ject In­tern­ship

a_e_rNov 27, 2022, 11:27 PM
67 points
7 comments3 min readEA link

In­trin­sic limi­ta­tions of GPT-4 and other large lan­guage mod­els, and why I’m not (very) wor­ried about GPT-n

James FodorJun 3, 2023, 1:09 PM
28 points
3 comments11 min readEA link

[Question] Does China have AI al­ign­ment re­sources/​in­sti­tu­tions? How can we pri­ori­tize cre­at­ing more?

JakubKAug 4, 2022, 7:23 PM
18 points
9 comments1 min readEA link

De­com­pos­ing al­ign­ment to take ad­van­tage of paradigms

Christopher KingJun 4, 2023, 2:26 PM
2 points
0 comments4 min readEA link

New Speaker Series on AI Align­ment Start­ing March 3

Zechen ZhangFeb 26, 2022, 10:58 AM
5 points
0 comments1 min readEA link

Cri­tique of Su­per­in­tel­li­gence Part 1

James FodorDec 13, 2018, 5:10 AM
22 points
13 comments8 min readEA link

[Closed] Prize and fast track to al­ign­ment re­search at ALTER

VanessaSep 18, 2022, 9:15 AM
38 points
0 comments3 min readEA link

AI Align­ment 2018-2019 Review

Habryka [Deactivated]Jan 28, 2020, 9:14 PM
28 points
0 comments6 min readEA link
(www.lesswrong.com)

Distil­la­tion of “How Likely is De­cep­tive Align­ment?”

NickGabsDec 1, 2022, 8:22 PM
10 points
1 comment10 min readEA link

From vol­un­tary to manda­tory, are the ESG dis­clo­sure frame­works still fer­tile ground for un­re­al­ised EA ca­reer path­ways? – A 2023 up­date on ESG po­ten­tial impact

Christopher ChanJun 4, 2023, 12:00 PM
21 points
5 comments11 min readEA link

Pod­cast: Krister Bykvist on moral un­cer­tainty, ra­tio­nal­ity, metaethics, AI and fu­ture pop­u­la­tions

Gus DockerOct 21, 2021, 3:17 PM
8 points
0 comments1 min readEA link
(www.utilitarianpodcast.com)

In­cen­tive de­sign and ca­pa­bil­ity elicitation

Joe_CarlsmithNov 12, 2024, 8:56 PM
9 points
0 comments1 min readEA link

[Question] Is work­ing on AI safety as dan­ger­ous as ig­nor­ing it?

jkmhSep 20, 2021, 11:06 PM
10 points
5 comments1 min readEA link

En­gag­ing with AI in a Per­sonal Way

Spyder RexDec 4, 2023, 9:23 AM
−9 points
0 comments1 min readEA link

How to store hu­man val­ues on a computer

oliver_siegelNov 4, 2022, 7:36 PM
1 point
2 comments1 min readEA link

Pro­mot­ing com­pas­sion­ate longtermism

jonleightonDec 7, 2022, 2:26 PM
117 points
5 comments12 min readEA link

Sum­ming up “Schem­ing AIs” (Sec­tion 5)

Joe_CarlsmithDec 9, 2023, 3:48 PM
9 points
1 comment1 min readEA link

AI Benefits Post 2: How AI Benefits Differs from AI Align­ment & AI for Good

Cullen 🔸Jun 29, 2020, 4:59 PM
9 points
0 comments2 min readEA link

Speed ar­gu­ments against schem­ing (Sec­tion 4.4-4.7 of “Schem­ing AIs”)

Joe_CarlsmithDec 8, 2023, 9:10 PM
6 points
0 comments1 min readEA link

Defend­ing against Ad­ver­sar­ial Poli­cies in Re­in­force­ment Learn­ing with Alter­nat­ing Training

sergeivolodinFeb 12, 2022, 3:59 PM
1 point
0 comments13 min readEA link

Align­ing AI with Hu­mans by Lev­er­ag­ing Le­gal Informatics

johnjnaySep 18, 2022, 7:43 AM
20 points
11 comments3 min readEA link

There is only one goal or drive—only self-per­pet­u­a­tion counts

freest oneJun 13, 2023, 1:37 AM
2 points
4 comments8 min readEA link

Re­port: Ar­tifi­cial In­tel­li­gence Risk Man­age­ment in Spain

JorgeTorresCJun 15, 2023, 4:08 PM
22 points
0 comments3 min readEA link
(riesgoscatastroficosglobales.com)

De-em­pha­sise al­ign­ment, em­pha­sise restraint

EuanMcLeanFeb 4, 2025, 5:43 PM
19 points
2 comments7 min readEA link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel NandaDec 15, 2021, 11:46 PM
45 points
15 comments16 min readEA link
(www.alignmentforum.org)

New refer­ence stan­dard on LLM Ap­pli­ca­tion se­cu­rity started by OWASP

QuantumForestJun 19, 2023, 7:56 PM
5 points
0 comments1 min readEA link

AGI Mo­ral­ity and Why It Is Un­likely to Emerge as a Fea­ture of Superintelligence

funnyfrancoMar 18, 2025, 7:19 PM
3 points
9 comments18 min readEA link

The count­ing ar­gu­ment for schem­ing (Sec­tions 4.1 and 4.2 of “Schem­ing AIs”)

Joe_CarlsmithDec 6, 2023, 7:28 PM
9 points
1 comment1 min readEA link

Can we simu­late hu­man evolu­tion to cre­ate a some­what al­igned AGI?

Thomas KwaMar 29, 2022, 1:23 AM
19 points
0 comments7 min readEA link

Work­ing at EA or­ga­ni­za­tions se­ries: Ma­chine In­tel­li­gence Re­search Institute

SoerenMindNov 1, 2015, 12:49 PM
8 points
0 comments4 min readEA link

Join the Vir­tual AI Safety Un­con­fer­ence (VAISU)!

NguyênJun 21, 2023, 4:46 AM
23 points
0 comments1 min readEA link
(vaisu.ai)

[Question] Why does (any par­tic­u­lar) AI safety work re­duce s-risks more than it in­creases them?

MichaelStJulesOct 3, 2021, 4:55 PM
48 points
19 comments1 min readEA link

Stu­dent pro­ject for en­gag­ing with AI alignment

Per Ivar FriborgMay 9, 2022, 10:44 AM
35 points
1 comment1 min readEA link

[Question] What “defense lay­ers” should gov­ern­ments, AI labs, and busi­nesses use to pre­vent catas­trophic AI failures?

LintzADec 3, 2021, 2:24 PM
37 points
3 comments1 min readEA link

Yip Fai Tse on an­i­mal welfare & AI safety and long termism

Karthik PalakodetiJun 22, 2023, 12:48 PM
47 points
0 comments1 min readEA link

[Question] What do we know about Mustafa Suley­man’s po­si­tion on AI Safety?

Chris LeongAug 13, 2023, 7:41 PM
14 points
3 comments1 min readEA link

Sum­mary of “The Precipice” (2 of 4): We are a dan­ger to ourselves

rileyharrisAug 13, 2023, 11:53 PM
5 points
0 comments8 min readEA link
(www.millionyearview.com)

Sup­port­ing global co­or­di­na­tion in AI de­vel­op­ment: Why and how to con­tribute to in­ter­na­tional AI standards

pcihonApr 17, 2019, 10:17 PM
21 points
4 comments1 min readEA link

Cri­tique of Su­per­in­tel­li­gence Part 3

James FodorDec 13, 2018, 5:13 AM
3 points
5 comments7 min readEA link

The MASK Bench­mark: Disen­tan­gling Hon­esty From Ac­cu­racy in AI Systems

Mantas MazeikaMar 4, 2025, 5:44 PM
22 points
0 comments2 min readEA link
(www.mask-benchmark.ai)

Me­tac­u­lus is build­ing a team ded­i­cated to AI forecasting

christianOct 18, 2022, 4:08 PM
35 points
0 comments1 min readEA link
(apply.workable.com)

Cen­tre for the Study of Ex­is­ten­tial Risk Four Month Re­port June—Septem­ber 2020

HaydnBelfieldDec 2, 2020, 6:33 PM
24 points
0 comments17 min readEA link

E.A. Me­gapro­ject Ideas

Tomer_GoloboyMar 21, 2022, 1:23 AM
15 points
4 comments4 min readEA link

Data col­lec­tion for AI al­ign­ment—Ca­reer review

Benjamin HiltonJun 3, 2022, 11:44 AM
34 points
1 comment5 min readEA link
(80000hours.org)

LW4EA: Some cruxes on im­pact­ful al­ter­na­tives to AI policy work

JeremyMay 17, 2022, 3:05 AM
11 points
1 comment1 min readEA link
(www.lesswrong.com)

“AI Align­ment” is a Danger­ously Over­loaded Term

RokoDec 15, 2023, 3:06 PM
20 points
2 comments3 min readEA link

Dis­cov­er­ing al­ign­ment wind­falls re­duces AI risk

James BradyFeb 28, 2024, 9:14 PM
22 points
3 comments8 min readEA link
(blog.elicit.com)

[Question] Up­dates on FLI’S Value Align­ment Map?

QubitSwarm99Sep 19, 2022, 12:25 AM
8 points
0 comments1 min readEA link

What if we don’t need a “Hard Left Turn” to reach AGI?

EigengenderJul 15, 2022, 9:49 AM
39 points
7 comments4 min readEA link

The True Story of How GPT-2 Be­came Max­i­mally Lewd

WriterJan 18, 2024, 9:03 PM
23 points
1 comment1 min readEA link
(youtu.be)

Sum­maries: Align­ment Fun­da­men­tals Curriculum

Leon LangSep 19, 2022, 3:43 PM
25 points
1 comment1 min readEA link
(docs.google.com)

A Sketch of AI-Driven Epistemic Lock-In

Ozzie GooenMar 5, 2025, 10:40 PM
13 points
1 comment3 min readEA link

Ought’s the­ory of change

stuhlmuellerApr 12, 2022, 12:09 AM
43 points
4 comments3 min readEA link

AI safety schol­ar­ships look worth-fund­ing (if other fund­ing is sane)

anon-aNov 19, 2019, 12:59 AM
22 points
6 comments2 min readEA link

Clar­ify­ing two uses of “al­ign­ment”

Matthew_BarnettMar 10, 2024, 5:41 PM
36 points
28 comments4 min readEA link

Fol­low along with Columbia EA’s Ad­vanced AI Safety Fel­low­ship!

RohanSJul 2, 2022, 6:07 AM
27 points
0 comments2 min readEA link

Take­aways from a sur­vey on AI al­ign­ment resources

DanielFilanNov 5, 2022, 11:45 PM
20 points
9 comments6 min readEA link
(www.lesswrong.com)

Me­tac­u­lus Launches Fu­ture of AI Series, Based on Re­search Ques­tions by Arb

christianMar 13, 2024, 9:14 PM
34 points
0 comments1 min readEA link
(www.metaculus.com)

The flaws that make to­day’s AI ar­chi­tec­ture un­safe and a new ap­proach that could fix it

80000_HoursJun 22, 2020, 10:15 PM
3 points
0 comments86 min readEA link
(80000hours.org)

[Question] Donat­ing against Short Term AI risks

Jan-WillemNov 16, 2020, 12:23 PM
6 points
10 comments1 min readEA link

Pre­serv­ing and con­tin­u­ing al­ign­ment re­search through a se­vere global catastrophe

A_donorMar 6, 2022, 6:43 PM
40 points
11 comments5 min readEA link

Database of ex­is­ten­tial risk estimates

MichaelA🔸Apr 15, 2020, 12:43 PM
130 points
37 comments5 min readEA link

Against Ex­plo­sive Growth

c.troutSep 4, 2024, 9:45 PM
24 points
9 comments1 min readEA link

AGI Safety Com­mu­ni­ca­tions Initiative

InesJun 11, 2022, 4:30 PM
35 points
6 comments1 min readEA link

What Should We Op­ti­mize—A Conversation

Johannes C. MayerApr 7, 2022, 2:48 PM
1 point
0 comments14 min readEA link

What can the prin­ci­pal-agent liter­a­ture tell us about AI risk?

acFeb 10, 2020, 10:10 AM
26 points
1 comment16 min readEA link

What if do­ing the most good = benev­olent AI takeover and hu­man ex­tinc­tion?

Jordan ArelMar 22, 2024, 7:56 PM
2 points
4 comments3 min readEA link

Amanda Askell: AI safety needs so­cial scientists

EA GlobalMar 4, 2019, 3:50 PM
27 points
0 comments18 min readEA link
(www.youtube.com)

Video and tran­script of pre­sen­ta­tion on Schem­ing AIs

Joe_CarlsmithMar 22, 2024, 3:56 PM
23 points
1 comment1 min readEA link

Asya Ber­gal: Rea­sons you might think hu­man-level AI is un­likely to hap­pen soon

EA GlobalAug 26, 2020, 4:01 PM
24 points
2 comments17 min readEA link
(www.youtube.com)

METR: Mea­sur­ing AI Abil­ity to Com­plete Long Tasks

Ben_West🔸Mar 19, 2025, 4:49 PM
106 points
17 comments1 min readEA link
(metr.org)

Stu­art Rus­sell Hu­man Com­pat­i­ble AI Roundtable with Allan Dafoe, Rob Re­ich, & Ma­ri­etje Schaake

Mahendra PrasadFeb 11, 2021, 7:43 AM
16 points
0 comments1 min readEA link