AI safety

AI safety is the study of ways to reduce risks posed by artificial intelligence.

Interventions that aim to reduce these risks can be split into technical AI safety, which develops ways to make AI systems safer, and AI governance, which shapes the institutions and policies under which AI is developed and deployed.

Further reading on why AI might be an existential risk

Hilton, Benjamin (2023) Preventing an AI-related catastrophe, 80,000 Hours, March 2023.

Cotra, Ajeya (2022) Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover, Effective Altruism Forum, July 18.

Carlsmith, Joseph (2022) Is Power-Seeking AI an Existential Risk?, arXiv, June 16.

Yudkowsky, Eliezer (2022) AGI Ruin: A List of Lethalities, LessWrong, June 5.

Ngo, Richard et al. (2023) The alignment problem from a deep learning perspective, arXiv, February 23.

Arguments against AI safety

Work on AI safety and AI risk is sometimes dismissed as a response to a Pascal’s mugging[1], the implication being that the risks are vanishingly small and that, for any stated level of ignorable risk, the payoffs could be exaggerated to force the cause to remain a top priority. One response is that, in a survey of 700 machine learning researchers, the median answer to the question about “the probability that the long-run effect of advanced AI on humanity will be ‘extremely bad (e.g., human extinction)’” was 5%, with 48% of respondents giving 10% or higher.[2] Probabilities of this magnitude are too high (by at least five orders of magnitude) to be considered Pascalian.

Further reading on arguments against AI safety

Grace, Katja (2022) Counterarguments to the basic AI x-risk case, Effective Altruism Forum, October 14.

Garfinkel, Ben (2020) Scrutinising classic AI risk arguments, 80,000 Hours Podcast, July 9.

AI safety as a career

80,000 Hours’ medium-depth investigation rates technical AI safety research a “priority path”: among the most promising career opportunities the organization has identified so far.[3][4] Richard Ngo and Holden Karnofsky also offer advice for those interested in working on AI safety.[5][6]

Further reading

Gates, Vael (2022) Resources I send to AI researchers about AI safety, Effective Altruism Forum, June 13.

Krakovna, Victoria (2017) Introductory resources on AI safety research, Victoria Krakovna’s Blog, October 19.

Ngo, Richard (2019) Disentangling arguments for the importance of AI safety, Effective Altruism Forum, January 21.

Rice, Issa; Naik, Vipul (2024) Timeline of AI safety, Timelines Wiki.

Related entries

AI alignment | AI governance | AI forecasting | AI takeoff | AI race | Economics of artificial intelligence | AI interpretability | AI risk | cooperative AI | building the field of AI safety

  1. ^

    https://twitter.com/amasad/status/1632121317146361856 (a tweet by the CEO of Replit, a coding organisation involved in ML tools)

  2. ^
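
    Grace, Katja et al. (2022) 2022 Expert Survey on Progress in AI, AI Impacts.
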
  3. ^

    Todd, Benjamin (2023) The highest impact career paths our research has identified so far, 80,000 Hours, May 12.

  4. ^

    Hilton, Benjamin (2023) AI safety technical research, 80,000 Hours, June 19.

  5. ^

    Ngo, Richard (2023) AGI safety career advice, Effective Altruism Forum, May 2.

  6. ^

    Karnofsky, Holden (2023) Jobs that can help with the most important century, Effective Altruism Forum, February 12.

High-level hopes for AI alignment
Holden Karnofsky · 20 Dec 2022 2:11 UTC · 123 points · 14 comments · 19 min read · EA link (www.cold-takes.com)

Resources I send to AI researchers about AI safety
Vael Gates · 11 Jan 2023 1:24 UTC · 43 points · 0 comments · 1 min read · EA link

Announcing the Winners of the 2023 Open Philanthropy AI Worldviews Contest
Jason Schukraft · 30 Sep 2023 3:51 UTC · 74 points · 30 comments · 2 min read · EA link

AI safety needs to scale, and here’s how you can do it
Esben Kran · 2 Feb 2024 7:17 UTC · 32 points · 2 comments · 5 min read · EA link (apartresearch.com)

Chilean AIS Hackathon Retrospective
Agustín Covarrubias 🔸 · 9 May 2023 1:34 UTC · 67 points · 0 comments · 5 min read · EA link

Katja Grace: Let’s think about slowing down AI
peterhartree · 23 Dec 2022 0:57 UTC · 84 points · 6 comments · 2 min read · EA link (worldspiritsockpuppet.substack.com)

Fill out this census of everyone interested in reducing catastrophic AI risks
Alex HT · 18 May 2024 15:53 UTC · 105 points · 1 comment · 1 min read · EA link

FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 29 Mar 2023 4:04 UTC · 220 points · 38 comments · 1 min read · EA link

Announcing AI Safety Bulgaria
Aleksandar N. Angelov · 3 Mar 2024 17:53 UTC · 15 points · 0 comments · 1 min read · EA link

Metaculus Launches Future of AI Series, Based on Research Questions by Arb
christian · 13 Mar 2024 21:14 UTC · 34 points · 0 comments · 1 min read · EA link (www.metaculus.com)

Announcing the European Network for AI Safety (ENAIS)
Esben Kran · 22 Mar 2023 17:57 UTC · 124 points · 3 comments · 3 min read · EA link

AI Safety Europe Retreat 2023 Retrospective
Magdalena Wache · 14 Apr 2023 9:05 UTC · 41 points · 10 comments · 1 min read · EA link

Launching applications for AI Safety Careers Course India 2024
varun_agr · 1 May 2024 5:30 UTC · 23 points · 1 comment · 1 min read · EA link

The Shutdown Problem: Incomplete Preferences as a Solution
EJT · 23 Feb 2024 16:01 UTC · 26 points · 0 comments · 1 min read · EA link

Predictable updating about AI risk
Joe_Carlsmith · 8 May 2023 22:05 UTC · 130 points · 12 comments · 36 min read · EA link

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 3 Dec 2024 21:57 UTC · 89 points · 26 comments · 9 min read · EA link

Consider granting AIs freedom
Matthew_Barnett · 6 Dec 2024 0:55 UTC · 80 points · 22 comments · 5 min read · EA link

Please vote for PauseAI US in the Donation Election!
Holly Elmore ⏸️ 🔸 · 22 Nov 2024 4:12 UTC · 21 points · 3 comments · 2 min read · EA link

Donation recommendations for xrisk + ai safety
vincentweisser · 6 Feb 2023 21:25 UTC · 17 points · 11 comments · 1 min read · EA link

The Choice Transition
Owen Cotton-Barratt · 18 Nov 2024 12:32 UTC · 42 points · 1 comment · 15 min read · EA link (strangecities.substack.com)

Symbiosis, not alignment, as the goal for liberal democracies in the transition to artificial general intelligence
simonfriederich · 17 Mar 2023 13:04 UTC · 18 points · 2 comments · 24 min read · EA link (rdcu.be)

Vael Gates: Risks from Highly-Capable AI (March 2023)
Vael Gates · 1 Apr 2023 20:54 UTC · 31 points · 4 comments · 1 min read · EA link (docs.google.com)

“Near Midnight in Suicide City”
Greg_Colbourn · 6 Dec 2024 19:54 UTC · 5 points · 0 comments · 1 min read · EA link (www.youtube.com)

Four mindset disagreements behind existential risk disagreements in ML
RobBensinger · 11 Apr 2023 4:53 UTC · 61 points · 2 comments · 9 min read · EA link

Long list of AI questions
NunoSempere · 6 Dec 2023 11:12 UTC · 124 points · 14 comments · 86 min read · EA link

Funding case: AI Safety Camp 10
Remmelt · 12 Dec 2023 9:05 UTC · 45 points · 13 comments · 5 min read · EA link (manifund.org)

Cosmic AI safety
Magnus Vinding · 6 Dec 2024 22:32 UTC · 22 points · 5 comments · 6 min read · EA link

Navigating the New Reality in DC: An EIP Primer
IanDavidMoss · 20 Dec 2024 16:59 UTC · 20 points · 1 comment · 13 min read · EA link (effectiveinstitutionsproject.substack.com)

AI for Animals 2025 Conference—Get Early Bird Tickets Now
Constance Li · 20 Nov 2024 0:53 UTC · 47 points · 0 comments · 1 min read · EA link

My cover story in Jacobin on AI capitalism and the x-risk debates
Garrison · 12 Feb 2024 23:34 UTC · 154 points · 10 comments · 6 min read · EA link (jacobin.com)

AISN #45: Center for AI Safety 2024 Year in Review
Center for AI Safety · 19 Dec 2024 18:14 UTC · 11 points · 0 comments · 4 min read · EA link (newsletter.safe.ai)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
Owen Cotton-Barratt · 29 Oct 2024 0:02 UTC · 37 points · 2 comments · 30 min read · EA link (blog.aiimpacts.org)

AI alignment researchers may have a comparative advantage in reducing s-risks
Lukas_Gloor · 15 Feb 2023 13:01 UTC · 79 points · 5 comments · 13 min read · EA link

Where I Am Donating in 2024
MichaelDickens · 19 Nov 2024 0:09 UTC · 179 points · 73 comments · 46 min read · EA link

MIRI’s 2024 End-of-Year Update
RobBensinger · 3 Dec 2024 4:33 UTC · 32 points · 7 comments · 1 min read · EA link

We are not alone: many communities want to stop Big Tech from scaling unsafe AI
Remmelt · 22 Sep 2023 17:38 UTC · 28 points · 30 comments · 4 min read · EA link

Preventing an AI-related catastrophe—Problem profile
Benjamin Hilton · 29 Aug 2022 18:49 UTC · 138 points · 18 comments · 4 min read · EA link (80000hours.org)

Here’s how The Midas Project could use additional funding.
Tyler Johnston · 17 Nov 2024 22:15 UTC · 20 points · 0 comments · 2 min read · EA link

Deceptive Alignment is <1% Likely by Default
DavidW · 21 Feb 2023 15:07 UTC · 54 points · 26 comments · 14 min read · EA link

How AI Takeover Might Happen in Two Years
Joshc · 7 Feb 2025 23:51 UTC · 21 points · 2 comments · 29 min read · EA link (x.com)

A case for donating to AI risk reduction (including if you work in AI)
tlevin · 2 Dec 2024 19:05 UTC · 118 points · 5 comments · 3 min read · EA link

Announcing the Q1 2025 Long-Term Future Fund grant round
Linch · 20 Dec 2024 2:17 UTC · 42 points · 5 comments · 2 min read · EA link

[Question] Seeking suggested readings & videos for a new course on ‘AI and Psychology’
Geoffrey Miller · 20 May 2024 17:45 UTC · 32 points · 7 comments · 1 min read · EA link

Against Aschenbrenner: How ‘Situational Awareness’ constructs a narrative that undermines safety and threatens humanity
Gideon Futerman · 15 Jul 2024 16:21 UTC · 240 points · 22 comments · 21 min read · EA link

Evolution provides no evidence for the sharp left turn
Quintin Pope · 11 Apr 2023 18:48 UTC · 43 points · 2 comments · 1 min read · EA link

Why Simulator AIs want to be Active Inference AIs
Jan_Kulveit · 11 Apr 2023 9:06 UTC · 22 points · 0 comments · 8 min read · EA link (www.lesswrong.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.
Linch · 20 May 2024 23:50 UTC · 46 points · 8 comments · 1 min read · EA link (variety.com)

To the Bat Mobile!! My Mid-Career Transition into AI Safety
Moneer · 7 Nov 2024 15:59 UTC · 12 points · 0 comments · 3 min read · EA link

How I failed to form views on AI safety
Ada-Maaria Hyvärinen · 17 Apr 2022 11:05 UTC · 213 points · 72 comments · 40 min read · EA link

The “low-hanging fruits” of AI safety
Julian Nalenz · 19 Dec 2024 13:38 UTC · −1 points · 0 comments · 6 min read · EA link (blog.hermesloom.org)

INTELLECT-1 Release: The First Globally Trained 10B Parameter Model
Matrice Jacobine · 29 Nov 2024 23:03 UTC · 2 points · 1 comment · 1 min read · EA link (www.primeintellect.ai)

1-year update on impactRIO, the first AI Safety group in Brazil
João Lucas Duim · 28 Jun 2024 10:59 UTC · 56 points · 2 comments · 10 min read · EA link

Two important recent AI Talks- Gebru and Lazar
Gideon Futerman · 6 Mar 2023 1:30 UTC · −7 points · 5 comments · 1 min read · EA link

But why would the AI kill us?
So8res · 17 Apr 2023 19:38 UTC · 45 points · 3 comments · 1 min read · EA link

Anti-‘FOOM’ (stop trying to make your cute pet name the thing)
david_reinstein · 14 Apr 2023 16:05 UTC · 41 points · 17 comments · 2 min read · EA link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns
simeon_c · 22 Apr 2023 13:49 UTC · 21 points · 17 comments · 1 min read · EA link

[Question] Can we train AI so that future philanthropy is more effective?
Ricardo Pimentel · 3 Nov 2024 15:08 UTC · 3 points · 0 comments · 1 min read · EA link

Proposing the Conditional AI Safety Treaty (linkpost TIME)
Otto · 15 Nov 2024 13:56 UTC · 12 points · 6 comments · 3 min read · EA link (time.com)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub · 12 Jan 2024 19:51 UTC · 65 points · 0 comments · 1 min read · EA link (arxiv.org)

AI Risk US Presidental Candidate
Simon Berens · 11 Apr 2023 20:18 UTC · 12 points · 8 comments · 1 min read · EA link

2021 AI Alignment Literature Review and Charity Comparison
Larks · 23 Dec 2021 14:06 UTC · 176 points · 18 comments · 73 min read · EA link

AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)
gergo · 17 Dec 2024 14:08 UTC · 9 points · 0 comments · 2 min read · EA link

Executive Director for AIS Brussels—Expression of interest
gergo · 19 Dec 2024 9:15 UTC · 28 points · 0 comments · 4 min read · EA link

Agentic Alignment: Navigating between Harm and Illegitimacy
LennardZ · 26 Nov 2024 21:27 UTC · 2 points · 1 comment · 9 min read · EA link

AI Safety Camp 10
Robert Kralisch · 26 Oct 2024 11:36 UTC · 15 points · 0 comments · 18 min read · EA link (www.lesswrong.com)

Preventing AI Misuse: State of the Art Research and its Flaws
Madhav Malhotra · 23 Apr 2023 10:50 UTC · 24 points · 2 comments · 11 min read · EA link

AI Can Help Animal Advocacy More Than It Can Help Industrial Farming
Wladimir J. Alonso · 26 Nov 2024 9:55 UTC · 21 points · 10 comments · 4 min read · EA link

[Linkpost] AI Alignment, Explained in 5 Points (updated)
Daniel_Eth · 18 Apr 2023 8:09 UTC · 31 points · 2 comments · 1 min read · EA link (medium.com)

Brain-computer interfaces and brain organoids in AI alignment?
freedomandutility · 15 Apr 2023 22:28 UTC · 8 points · 2 comments · 1 min read · EA link

Manifund x AI Worldviews
Austin · 31 Mar 2023 15:32 UTC · 32 points · 2 comments · 2 min read · EA link (manifund.org)

Cyborg Periods: There will be multiple AI transitions
Jan_Kulveit · 22 Feb 2023 16:09 UTC · 68 points · 1 comment · 1 min read · EA link

Trendlines in AIxBio evals
ljusten · 31 Oct 2024 0:09 UTC · 39 points · 2 comments · 11 min read · EA link (www.lennijusten.com)

Merger of DeepMind and Google Brain
Greg_Colbourn · 20 Apr 2023 20:16 UTC · 11 points · 12 comments · 1 min read · EA link (blog.google)

[SEE NEW EDITS] No, *You* Need to Write Clearer
Nicholas / Heather Kross · 29 Apr 2023 5:04 UTC · 71 points · 8 comments · 1 min read · EA link (www.thinkingmuchbetter.com)

Deconfusing Pauses: Long Term Moratorium vs Slowing AI
Gideon Futerman · 4 Aug 2024 11:32 UTC · 17 points · 3 comments · 5 min read · EA link

Drexler’s Nanosystems is now available online
MikhailSamin · 1 Jun 2024 14:41 UTC · 32 points · 4 comments · 1 min read · EA link (nanosyste.ms)

Filling the Void: A Comprehensive Database for AI Risks Materials
J.A.M. · 28 May 2024 16:03 UTC · 10 points · 1 comment · 4 min read · EA link

[Question] Why hasn’t there been any significant AI protest
sammyboiz · 17 May 2024 2:59 UTC · 21 points · 14 comments · 1 min read · EA link

My lab’s small AI safety agenda
Jobst Heitzig (vodle.it) · 18 Jun 2023 12:29 UTC · 59 points · 26 comments · 3 min read · EA link

Videos on the world’s most pressing problems, by 80,000 Hours
Bella · 21 Mar 2024 20:18 UTC · 63 points · 5 comments · 2 min read · EA link

Public Weights?
Jeff Kaufman 🔸 · 2 Nov 2023 2:51 UTC · 20 points · 7 comments · 1 min read · EA link

Shaping Policies for Ethical AI Development in Africa
Kuiyaki · 16 May 2024 14:15 UTC · 3 points · 0 comments · 1 min read · EA link

Coordination by common knowledge to prevent uncontrollable AI
Karl von Wendt · 14 May 2023 13:37 UTC · 14 points · 0 comments · 1 min read · EA link

Project ideas: Backup plans & Cooperative AI
Lukas Finnveden · 4 Jan 2024 7:26 UTC · 25 points · 2 comments · 13 min read · EA link (lukasfinnveden.substack.com)

Critiques of prominent AI safety labs: Redwood Research
Omega · 31 Mar 2023 8:58 UTC · 338 points · 91 comments · 20 min read · EA link

AI Safety Impact Markets: Your Charity Evaluator for AI Safety
Dawn Drescher · 1 Oct 2023 10:47 UTC · 28 points · 4 comments · 6 min read · EA link (impactmarkets.substack.com)

Partial value takeover without world takeover
Katja_Grace · 18 Apr 2024 3:00 UTC · 24 points · 2 comments · 1 min read · EA link

Arkose: Organizational Updates & Ways to Get Involved
Arkose · 1 Aug 2024 13:03 UTC · 28 points · 1 comment · 1 min read · EA link

AISafety.info “How can I help?” FAQ
StevenKaas · 5 Jun 2023 22:09 UTC · 48 points · 1 comment · 1 min read · EA link

Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?
Karl von Wendt · 25 Jun 2023 16:59 UTC · 80 points · 24 comments · 1 min read · EA link

AISC 2024 - Project Summaries
Nicky Pochinkov · 27 Nov 2023 22:35 UTC · 13 points · 1 comment · 18 min read · EA link

Announcing ForecastBench, a new benchmark for AI and human forecasting abilities
Forecasting Research Institute · 1 Oct 2024 12:31 UTC · 20 points · 1 comment · 3 min read · EA link (arxiv.org)

Announcing the AI Fables Writing Contest!
Daystar Eld · 12 Jul 2023 3:04 UTC · 76 points · 52 comments · 3 min read · EA link

Project ideas: Epistemics
Lukas Finnveden · 4 Jan 2024 7:26 UTC · 43 points · 1 comment · 17 min read · EA link (lukasfinnveden.substack.com)

AGI safety career advice
richard_ngo · 2 May 2023 7:36 UTC · 211 points · 20 comments · 1 min read · EA link

Operationalizing timelines
Zach Stein-Perlman · 10 Mar 2023 17:30 UTC · 30 points · 2 comments · 1 min read · EA link

Claude Doesn’t Want to Die
Garrison · 5 Mar 2024 6:00 UTC · 22 points · 14 comments · 10 min read · EA link (garrisonlovely.substack.com)

Should AI X-Risk Worriers Short the Market?
postlibertarian · 4 Nov 2024 16:16 UTC · 14 points · 1 comment · 6 min read · EA link

AI alignment, human alignment, oh my
MilesW · 31 Oct 2024 3:23 UTC · −12 points · 0 comments · 2 min read · EA link

NIST Seeks Comments On “Safety Considerations for Chemical and/or Biological AI Models”
Dylan Richardson · 26 Oct 2024 18:28 UTC · 15 points · 0 comments · 1 min read · EA link (www.federalregister.gov)

The Compendium, A full argument about extinction risk from AGI
adamShimi · 31 Oct 2024 12:02 UTC · 9 points · 1 comment · 2 min read · EA link (www.thecompendium.ai)

Request to AGI organizations: Share your views on pausing AI progress
Akash · 11 Apr 2023 17:30 UTC · 85 points · 1 comment · 1 min read · EA link

My favorite AI governance research this year so far
Zach Stein-Perlman · 23 Jul 2023 22:00 UTC · 81 points · 4 comments · 7 min read · EA link (blog.aiimpacts.org)

Which incentives should be used to encourage compliance with UK AI legislation?
jcw · 18 Nov 2024 18:13 UTC · 12 points · 0 comments · 12 min read · EA link

Interactive AI Governance Map
Hamish McDoodles · 12 Mar 2024 10:02 UTC · 66 points · 8 comments · 1 min read · EA link

Please wonder about the hard parts of the alignment problem
MikhailSamin · 11 Jul 2023 17:02 UTC · 8 points · 0 comments · 1 min read · EA link

Future Matters #8: Bing Chat, AI labs on safety, and pausing Future Matters
Pablo · 21 Mar 2023 14:50 UTC · 81 points · 5 comments · 24 min read · EA link

AI strategy given the need for good reflection
Owen Cotton-Barratt · 18 Mar 2024 0:48 UTC · 40 points · 1 comment · 5 min read · EA link

EU policymakers reach an agreement on the AI Act
tlevin · 15 Dec 2023 6:03 UTC · 109 points · 13 comments · 1 min read · EA link

Success without dignity: a nearcasting story of avoiding catastrophe by luck
Holden Karnofsky · 15 Mar 2023 20:17 UTC · 113 points · 3 comments · 1 min read · EA link

[Question] What is the current most representative EA AI x-risk argument?
Matthew_Barnett · 15 Dec 2023 22:04 UTC · 117 points · 50 comments · 3 min read · EA link

AI safety starter pack
mariushobbhahn · 28 Mar 2022 16:05 UTC · 126 points · 13 comments · 6 min read · EA link

AGI Catastrophe and Takeover: Some Reference Class-Based Priors
zdgroff · 24 May 2023 19:14 UTC · 103 points · 10 comments · 6 min read · EA link

When “human-level” is the wrong threshold for AI
Ben Millwood🔸 · 22 Jun 2024 14:34 UTC · 38 points · 3 comments · 7 min read · EA link

Partial Transcript of Recent Senate Hearing Discussing AI X-Risk
Daniel_Eth · 27 Jul 2023 9:16 UTC · 150 points · 2 comments · 22 min read · EA link (medium.com)

Counting arguments provide no evidence for AI doom
Nora Belrose · 27 Feb 2024 23:03 UTC · 84 points · 15 comments · 1 min read · EA link

Ten arguments that AI is an existential risk
Katja_Grace · 14 Aug 2024 21:51 UTC · 30 points · 0 comments · 7 min read · EA link

In favour of exploring nagging doubts about x-risk
Owen Cotton-Barratt · 25 Jun 2024 23:52 UTC · 89 points · 15 comments · 2 min read · EA link

Announcing the CLR Foundations Course and CLR S-Risk Seminars
James Faville · 19 Nov 2024 1:18 UTC · 52 points · 2 comments · 3 min read · EA link

New Business Wars podcast season on Sam Altman and OpenAI
Eevee🔹 · 2 Apr 2024 6:22 UTC · 10 points · 0 comments · 1 min read · EA link (wondery.com)

Why some people disagree with the CAIS statement on AI
David_Moss · 15 Aug 2023 13:39 UTC · 144 points · 15 comments · 16 min read · EA link

AI-nuclear integration: evidence of automation bias from humans and LLMs [research summary]
Tao · 27 Apr 2024 21:59 UTC · 17 points · 2 comments · 12 min read · EA link

Sentience Institute 2021 End of Year Summary
Ali · 26 Nov 2021 14:40 UTC · 66 points · 5 comments · 6 min read · EA link (www.sentienceinstitute.org)

Breakthrough in AI agents? (On Devin—The Zvi, linkpost)
SiebeRozendal · 20 Mar 2024 9:43 UTC · 16 points · 9 comments · 1 min read · EA link (thezvi.substack.com)

Large Language Models as Fiduciaries to Humans
johnjnay · 24 Jan 2023 19:53 UTC · 25 points · 0 comments · 34 min read · EA link (papers.ssrn.com)

The market plausibly expects AI software to create trillions of dollars of value by 2027
Benjamin_Todd · 6 May 2024 5:16 UTC · 88 points · 19 comments · 1 min read · EA link (benjamintodd.substack.com)

Details on how an IAEA-style AI regulator would function?
freedomandutility · 3 Jun 2023 12:03 UTC · 12 points · 5 comments · 1 min read · EA link

Funding AI Safety political advocacy in the US: Individual donors and small donations may be especially helpful
Holly Elmore ⏸️ 🔸 · 14 Nov 2023 23:14 UTC · 64 points · 8 comments · 1 min read · EA link

Joining the Carnegie Endowment for International Peace
Holden Karnofsky · 29 Apr 2024 15:45 UTC · 228 points · 14 comments · 2 min read · EA link

Mentorship in AGI Safety (MAGIS)
Joe Rogero · 23 May 2024 18:34 UTC · 11 points · 1 comment · 2 min read · EA link

Jan Leike: “I’m excited to join @AnthropicAI to continue the superalignment mission!”
defun 🔸 · 28 May 2024 18:08 UTC · 35 points · 11 comments · 1 min read · EA link (x.com)

How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us
Sean Sweeney · 23 May 2024 19:44 UTC · 1 point · 1 comment · 10 min read · EA link

Hooray for stepping out of the limelight
So8res · 1 Apr 2023 2:45 UTC · 103 points · 0 comments · 1 min read · EA link

AI Winter Season at EA Hotel
CEEALAR · 25 Sep 2024 13:36 UTC · 57 points · 2 comments · 1 min read · EA link

AI stocks could crash. And that could have implications for AI safety
Benjamin_Todd · 9 May 2024 7:23 UTC · 173 points · 41 comments · 4 min read · EA link (benjamintodd.substack.com)

Disrupting malicious uses of AI by state-affiliated threat actors
Agustín Covarrubias 🔸 · 14 Feb 2024 21:28 UTC · 22 points · 1 comment · 1 min read · EA link (openai.com)

[Question] What’s the best way to get a sense of the day-to-day activities of different researchers/research directions? (AI Governance)
Luise · 27 May 2024 12:48 UTC · 15 points · 1 comment · 1 min read · EA link

[Linkpost] Given Extinction Worries, Why Don’t AI Researchers Quit? Well, Several Reasons
Daniel_Eth · 6 Jun 2023 7:31 UTC · 25 points · 6 comments · 1 min read · EA link (medium.com)

AI doing philosophy = AI generating hands?
Wei Dai · 15 Jan 2024 9:04 UTC · 67 points · 6 comments · 3 min read · EA link

Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 11 Oct 2024 21:39 UTC · 66 points · 0 comments · 1 min read · EA link (darioamodei.com)

How to Address EA Dilemmas – What is Missing from EA Values?
alexis schoenlaub · 13 Oct 2024 9:33 UTC · 6 points · 4 comments · 6 min read · EA link

Corporate campaigns work: a key learning for AI Safety
Jamie_Harris · 17 Aug 2023 21:35 UTC · 72 points · 12 comments · 6 min read · EA link

The Tech Industry is the Biggest Blocker to Meaningful AI Safety Regulations
Garrison · 16 Aug 2024 19:37 UTC · 139 points · 8 comments · 8 min read · EA link (garrisonlovely.substack.com)

Slim overview of work one could do to make AI go better (and a grab-bag of other career considerations)
Chi · 20 Mar 2024 23:17 UTC · 34 points · 1 comment · 3 min read · EA link

Among the A.I. Doomsayers—The New Yorker
Agustín Covarrubias 🔸 · 11 Mar 2024 21:12 UTC · 66 points · 0 comments · 1 min read · EA link (www.newyorker.com)

OpenAI introduces function calling for GPT-4
mic · 20 Jun 2023 1:58 UTC · 26 points · 0 comments · 1 min read · EA link

Problem-solving tasks in Graph Theory for language models
Bruno López Orozco · 1 Oct 2024 12:36 UTC · 21 points · 1 comment · 9 min read · EA link

NYT: Google will ‘recalibrate’ the risk of releasing AI due to competition with OpenAI
Michael Huang · 22 Jan 2023 2:13 UTC · 173 points · 8 comments · 1 min read · EA link (www.nytimes.com)

Train for incorrigibility, then reverse it (Shutdown Problem Contest Submission)
Daniel_Eth · 18 Jul 2023 8:26 UTC · 16 points · 0 comments · 2 min read · EA link

Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure
ThomasCederborg · 3 Oct 2024 0:05 UTC · 6 points · 1 comment · 12 min read · EA link

2/3 Aussie & NZ AI Safety folk often or sometimes feel lonely or disconnected (and 16 other barriers to impact)
yanni kyriacos · 1 Aug 2024 1:14 UTC · 19 points · 11 comments · 8 min read · EA link

A short conversation I had with Google Gemini on the dangers of unregulated LLM API use, while mildly drunk in an airport.
EvanMcCormick · 17 Dec 2024 12:25 UTC · 1 point · 0 comments · 8 min read · EA link

Analogy Bank for AI Safety
utilistrutil · 29 Jan 2024 2:35 UTC · 14 points · 5 comments · 1 min read · EA link

Apply to the Cavendish Labs Fellowship (by 4/15)
Derik K · 3 Apr 2023 23:06 UTC · 35 points · 2 comments · 1 min read · EA link

[MLSN #8]: Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming
TW123 · 20 Feb 2023 16:06 UTC · 25 points · 0 comments · 4 min read · EA link (newsletter.mlsafety.org)

2024: a year of consolidation for ORCG
JorgeTorresC · 18 Dec 2024 17:47 UTC · 33 points · 0 comments · 7 min read · EA link (www.orcg.info)

Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans)
titotal · 17 May 2023 11:58 UTC · 43 points · 3 comments · 15 min read · EA link

Sam Altman returning as OpenAI CEO “in principle”
Fermi–Dirac Distribution · 22 Nov 2023 6:15 UTC · 55 points · 37 comments · 1 min read · EA link

Data Taxation: A Proposal for Slowing Down AGI Progress
Per Ivar Friborg · 11 Apr 2023 17:27 UTC · 42 points · 6 comments · 12 min read · EA link

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort · 31 Aug 2024 16:15 UTC · 3 points · 1 comment · 7 min read · EA link

[Linkpost] 538 Politics Podcast on AI risk & politics
jackva · 11 Apr 2023 17:03 UTC · 64 points · 5 comments · 1 min read · EA link (fivethirtyeight.com)

How to help crucial AI safety legislation pass with 10 minutes of effort
ThomasW · 11 Sep 2024 19:14 UTC · 258 points · 33 comments · 3 min read · EA link

Non-alignment project ideas for making transformative AI go well
Lukas Finnveden · 4 Jan 2024 7:23 UTC · 66 points · 1 comment · 3 min read · EA link (lukasfinnveden.substack.com)

Please don’t criticize EAs who “sell out” to OpenAI and Anthropic
Eevee🔹 · 5 Mar 2023 21:17 UTC · −4 points · 21 comments · 2 min read · EA link

Timelines are short, p(doom) is high: a global stop to frontier AI development until x-safety consensus is our only reasonable hope
Greg_Colbourn · 12 Oct 2023 11:24 UTC · 73 points · 85 comments · 9 min read · EA link

AI Safety Action Plan—A report commissioned by the US State Department
Agustín Covarrubias 🔸 · 11 Mar 2024 22:13 UTC · 25 points · 1 comment · 1 min read · EA link (www.gladstone.ai)

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda
Marc Carauleanu · 18 Dec 2023 21:13 UTC · 21 points · 0 comments · 12 min read · EA link

A freshman year during the AI midgame: my approach to the next year
Buck · 14 Apr 2023 0:38 UTC · 179 points · 30 comments · 7 min read · EA link

The Leeroy Jenkins principle: How faulty AI could guarantee “warning shots”
titotal · 14 Jan 2024 15:03 UTC · 54 points · 2 comments · 21 min read · EA link (titotal.substack.com)

AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop
Will Aldred · 16 May 2023 8:53 UTC · 41 points · 0 comments · 8 min read · EA link

Some Things I Heard about AI Governance at EAG
utilistrutil · 28 Feb 2023 21:27 UTC · 35 points · 5 comments · 6 min read · EA link

[Question] Concrete, existing examples of high-impact risks from AI?
freedomandutility · 15 Apr 2023 22:19 UTC · 9 points · 1 comment · 1 min read · EA link

Bounty for Evidence on Some of Palisade Research’s Beliefs
bwr · 23 Sep 2024 20:05 UTC · 5 points · 0 comments · 1 min read · EA link

“Aligned with who?” Results of surveying 1,000 US participants on AI values
Holly Morgan · 21 Mar 2023 22:07 UTC · 41 points · 0 comments · 2 min read · EA link (www.lesswrong.com)

Current UK government levers on AI development
rosehadshar · 10 Apr 2023 13:16 UTC · 82 points · 3 comments · 4 min read · EA link

FLI report: Policymaking in the Pause
Zach Stein-Perlman · 15 Apr 2023 17:01 UTC · 29 points · 4 comments · 1 min read · EA link

Whether you should do a PhD doesn’t depend much on timelines.
alex lawsen · 22 Mar 2023 12:25 UTC · 67 points · 7 comments · 4 min read · EA link

List of AI safety newsletters and other resources
Lizka · 1 May 2023 17:24 UTC · 49 points · 5 comments · 4 min read · EA link

[Question] Should people get neuroscience phD to work in AI safety field?
jackchang110 · 7 Mar 2023 16:21 UTC · 9 points · 11 comments · 1 min read · EA link

Discussion about AI Safety funding (FB transcript)
Akash · 30 Apr 2023 19:05 UTC · 104 points · 10 comments · 6 min read · EA link

Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)
Akash · 25 Apr 2023 18:49 UTC · 35 points · 1 comment · 1 min read · EA link

A Roundtable for Safe AI (RSAI)?
Lara_TH · 9 Mar 2023 12:11 UTC · 9 points · 0 comments · 4 min read · EA link

[Linkpost] Scott Alexander reacts to OpenAI’s latest post
Akash · 11 Mar 2023 22:24 UTC · 105 points · 4 comments · 1 min read · EA link

Exploring Metaculus’s AI Track Record
Peter Scoblic · 1 May 2023 21:02 UTC · 52 points · 5 comments · 5 min read · EA link

Research agenda: Supervising AIs improving AIs
Quintin Pope · 29 Apr 2023 17:09 UTC · 16 points · 0 comments · 1 min read · EA link

Misalignment Museum opens in San Francisco: ‘Sorry for killing most of humanity’
Michael Huang · 4 Mar 2023 7:09 UTC · 99 points · 6 comments · 1 min read · EA link (www.misalignmentmuseum.com)

Orthogonal: A new agent foundations alignment organization
Tamsin Leake · 19 Apr 2023 20:17 UTC · 38 points · 0 comments · 1 min read · EA link

AI Progress: The Game Show
Alex Arnett · 21 Apr 2023 16:47 UTC · 3 points · 0 comments · 2 min read · EA link

World and Mind in Artificial Intelligence: arguments against the AI pause
Arturo Macias · 18 Apr 2023 14:35 UTC · 6 points · 3 comments · 5 min read · EA link

Risk of AI deceleration.
Micah Zoltu · 18 Apr 2023 11:19 UTC · 9 points · 14 comments · 3 min read · EA link

The basic reasons I expect AGI ruin
RobBensinger · 18 Apr 2023 3:37 UTC · 58 points · 13 comments · 1 min read · EA link

“Risk Awareness Moments” (Rams): A concept for thinking about AI governance interventions
oeg · 14 Apr 2023 17:40 UTC · 53 points · 0 comments · 9 min read · EA link

[linkpost] “What Are Reasonable AI Fears?” by Robin Hanson, 2023-04-23
Arjun Panickssery · 14 Apr 2023 23:26 UTC · 41 points · 3 comments · 4 min read · EA link (quillette.com)

[Question] Who is testing AI Safety public outreach messaging?
yanni kyriacos · 15 Apr 2023 0:53 UTC · 20 points · 2 comments · 1 min read · EA link

AGI Safety Needs People With All Skillsets!
Severin · 25 Jul 2022 13:30 UTC · 33 points · 7 comments · 2 min read · EA link

UK Government announces £100 million in funding for Foundation Model Taskforce.
Jordan Pieters 🔸 · 25 Apr 2023 11:29 UTC · 10 points · 1 comment · 1 min read · EA link (www.gov.uk)

How we could stumble into AI catastrophe
Holden Karnofsky · 16 Jan 2023 14:52 UTC · 83 points · 0 comments · 31 min read · EA link (www.cold-takes.com)

Sentinel minutes for week #52/2024
NunoSempere · 30 Dec 2024 18:25 UTC · 61 points · 0 comments · 6 min read · EA link (blog.sentinel-team.org)

Help us find pain points in AI safety
Esben Kran · 12 Apr 2022 18:43 UTC · 31 points · 4 comments · 9 min read · EA link

Navigating the Open-Source AI Landscape: Data, Funding, and Safety
AndreFerretti · 12 Apr 2023 10:30 UTC · 23 points · 3 comments · 10 min read · EA link

Measuring artificial intelligence on human benchmarks is naive
Ward A · 11 Apr 2023 11:28 UTC · 3 points · 2 comments · 1 min read · EA link

Existential risk x Crypto: An unconference at Zuzalu
Yesh · 11 Apr 2023 13:31 UTC · 6 points · 0 comments · 1 min read · EA link

How major governments can help with the most important century
Holden Karnofsky · 24 Feb 2023 19:37 UTC · 56 points · 4 comments · 4 min read · EA link (www.cold-takes.com)

How can OSINT be used for the enforcement of the EU AI Act?
Kristina · 7 Jun 2024 11:07 UTC · 8 points · 1 comment · 1 min read · EA link

How to pursue a career in technical AI alignment
Charlie Rogers-Smith · 4 Jun 2022 21:36 UTC · 265 points · 9 comments · 39 min read · EA link

Is fear productive when communicating AI x-risk? [Study results]
Johanna Roniger · 22 Jan 2024 5:38 UTC · 78 points · 10 comments · 5 min read · EA link

Gaia Network: An Illustrated Primer
Roman Leventov · 26 Jan 2024 11:55 UTC · 4 points · 4 comments · 15 min read · EA link

Standard policy frameworks for AI governance
Nathan_Barnard · 30 Jan 2024 18:14 UTC · 26 points · 2 comments · 3 min read · EA link

Mapping How Alliances, Acquisitions, and Antitrust are Shaping the Frontier AI Industry
t6aguirre · 3 Jun 2024 9:43 UTC · 24 points · 1 comment · 2 min read · EA link

AI Safety Arguments: An Interactive Guide
Lukas Trötzmüller🔸 · 1 Feb 2023 19:21 UTC · 32 points · 5 comments · 3 min read · EA link

This might be the last AI Safety Camp
Remmelt · 24 Jan 2024 9:29 UTC · 87 points · 32 comments · 1 min read · EA link

AI and Work: Summarising a New Literature Review
cpeppiatt · 15 Jul 2024 10:27 UTC · 13 points · 0 comments · 2 min read · EA link (arxiv.org)

[US] NTIA: AI Accountability Policy Request for Comment
Kyle J. Lucchese · 13 Apr 2023 16:12 UTC · 47 points · 4 comments · 1 min read · EA link (ntia.gov)

[Question] Do you worry about totalitarian regimes using AI Alignment technology to create AGI that subscribe to their values?
diodio_yang · 28 Feb 2023 18:12 UTC · 25 points · 12 comments · 2 min read · EA link

What does Bing Chat tell us about AI risk?
Holden Karnofsky · 28 Feb 2023 18:47 UTC · 99 points · 8 comments · 2 min read · EA link (www.cold-takes.com)

Prospects for AI safety agreements between countries
oeg · 14 Apr 2023 17:41 UTC · 104 points · 3 comments · 22 min read · EA link

ChatGPT not so clever or not so artificial as hyped to be?
Haris Shekeris · 2 Mar 2023 6:16 UTC · −7 points · 2 comments · 1 min read · EA link

Overview of introductory resources in AI Governance
Lucie Philippon 🔸 · 27 May 2024 16:22 UTC · 26 points · 1 comment · 6 min read · EA link (www.lesswrong.com)

Pay to get AI safety info from behind NDA wall?
louisbarclay · 5 Jun 2024 10:19 UTC · 2 points · 2 comments · 1 min read · EA link

14+ AI Safety Advisors You Can Speak to – New AISafety.com Resource
Bryce Robertson · 21 Jan 2025 17:34 UTC · 18 points · 2 comments · 1 min read · EA link

What can we do now to prepare for AI sentience, in order to protect them from the global scale of human sadism?
rime · 18 Apr 2023 9:58 UTC · 44 points · 0 comments · 2 min read · EA link

New Artificial Intelligence quiz: can you beat ChatGPT?
AndreFerretti · 3 Mar 2023 15:46 UTC · 29 points · 3 comments · 1 min read · EA link

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media
Oliver Z · 18 Apr 2023 18:36 UTC · 56 points · 1 comment · 4 min read · EA link (newsletter.safe.ai)

There are no coherence theorems
EJT · 20 Feb 2023 21:52 UTC · 107 points · 49 comments · 19 min read · EA link

What AI companies can do today to help with the most important century
Holden Karnofsky · 20 Feb 2023 17:40 UTC · 104 points · 8 comments · 11 min read · EA link (www.cold-takes.com)

Reasons to have hope
Jordan Pieters 🔸 · 20 Apr 2023 10:19 UTC · 53 points · 4 comments · 1 min read · EA link

[Closed] MIT FutureTech are hiring for a Head of Operations role
PeterSlattery · 2 Oct 2024 16:51 UTC · 8 points · 0 comments · 4 min read · EA link

AGI rising: why we are in a new era of acute risk and increasing public awareness, and what to do now
Greg_Colbourn · 2 May 2023 10:17 UTC · 68 points · 35 comments · 13 min read · EA link

A great talk for AI noobs (according to an AI noob)
Dov · 23 Apr 2023 5:32 UTC · 8 points · 0 comments · 1 min read · EA link (www.youtube.com)

PhD Position: AI Interpretability in Berlin, Germany
Martian Moonshine · 22 Apr 2023 18:57 UTC · 24 points · 0 comments · 1 min read · EA link (stephanw.net)

Paper Summary: The Effectiveness of AI Existential Risk Communication to the American and Dutch Public
Otto · 9 Mar 2023 10:40 UTC · 97 points · 11 comments · 4 min read · EA link

Draghi’s report signal a less safety-focused European Union on AI
t6aguirre · 9 Sep 2024 18:39 UTC · 17 points · 3 comments · 1 min read · EA link

[Question] Predictions for future AI governance?
jackchang110 · 2 Apr 2023 16:43 UTC · 4 points · 1 comment · 1 min read · EA link

Designing Artificial Wisdom: Decision Forecasting AI & Futarchy
Jordan Arel · 14 Jul 2024 5:10 UTC · 5 points · 1 comment · 6 min read · EA link

World’s first major law for artificial intelligence gets final EU green light
Dane Valerie · 24 May 2024 14:57 UTC · 3 points · 1 comment · 2 min read · EA link (www.cnbc.com)

Pausing AI Developments Isn’t Enough. We Need to Shut it All Down
EliezerYudkowsky · 9 Apr 2023 15:53 UTC · 50 points · 3 comments · 1 min read · EA link

Pillars to Convergence
Phlobton · 1 Apr 2023 13:04 UTC · 1 point · 0 comments · 8 min read · EA link

Pessimism about AI Safety
Max_He-Ho · 2 Apr 2023 7:57 UTC · 5 points · 0 comments · 25 min read · EA link (www.lesswrong.com)

Updates from Campaign for AI Safety
Jolyn Khoo · 27 Sep 2023 2:44 UTC · 16 points · 0 comments · 2 min read · EA link (www.campaignforaisafety.org)

What’s new at FAR AI
AdamGleave · 4 Dec 2023 21:18 UTC · 68 points · 0 comments · 1 min read · EA link (far.ai)

Mitigating extreme AI risks amid rapid progress [Linkpost]
Akash · 21 May 2024 20:04 UTC · 36 points · 1 comment · 1 min read · EA link

Episode: Austin vs Linch on OpenAI
Austin · 25 May 2024 16:15 UTC · 22 points · 2 comments · 44 min read · EA link (manifund.substack.com)

Chaining the evil genie: why “outer” AI safety is probably easy
titotal · 30 Aug 2022 13:55 UTC · 40 points · 12 comments · 10 min read · EA link

The two-tiered society
Roman Leventov · 13 May 2024 7:53 UTC · 14 points · 5 comments · 1 min read · EA link

Worrisome misunderstanding of the core issues with AI transition
Roman Leventov · 18 Jan 2024 10:05 UTC · 4 points · 3 comments · 1 min read · EA link

Apply to the Cambridge ML for Alignment Bootcamp (CaMLAB) [26 March − 8 April]
hannah · 9 Feb 2023 16:32 UTC · 62 points · 1 comment · 5 min read · EA link

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon · 4 May 2024 12:06 UTC · 52 points · 2 comments · 19 min read · EA link

In DC, a new wave of AI lobbyists gains the upper hand
Chris Leong · 13 May 2024 7:31 UTC · 97 points · 7 comments · 1 min read · EA link (www.politico.com)

MIT FutureTech are hiring for an Operations and Project Management role.
PeterSlattery · 17 May 2024 1:29 UTC · 12 points · 0 comments · 3 min read · EA link

Weekly newsletter for AI safety events and training programs
Bryce Robertson · 3 May 2024 0:37 UTC · 15 points · 0 comments · 1 min read · EA link (www.lesswrong.com)

Open-Source AI: A Regulatory Review
Elliot Mckernon · 29 Apr 2024 10:10 UTC · 14 points · 1 comment · 8 min read · EA link

Technology is Power: Raising Awareness Of Technological Risks
Marc Wong · 9 Feb 2023 15:13 UTC · 3 points · 0 comments · 2 min read · EA link

Cybersecurity of Frontier AI Models: A Regulatory Review
Deric Cheng · 25 Apr 2024 14:51 UTC · 9 points · 1 comment · 8 min read · EA link

Speedrun: AI Alignment Prizes
joe · 9 Feb 2023 11:55 UTC · 27 points · 0 comments · 18 min read · EA link

Research Summary: Forecasting with Large Language Models
Damien Laird · 2 Apr 2023 10:52 UTC · 4 points · 0 comments · 7 min read · EA link (damienlaird.substack.com)

DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 17 May 2024 17:30 UTC · 23 points · 0 comments · 1 min read · EA link (deepmind.google)

List of projects that seem impactful for AI Governance
JaimeRV · 14 Jan 2024 16:52 UTC · 35 points · 2 comments · 13 min read · EA link

How evals might (or might not) prevent catastrophic risks from AI
Akash · 7 Feb 2023 20:16 UTC · 28 points · 0 comments · 1 min read · EA link

Dangerous capability tests should be harder
Luca Righetti 🔸 · 20 Aug 2024 16:11 UTC · 23 points · 1 comment · 5 min read · EA link (www.planned-obsolescence.org)

The new UK government’s stance on AI safety
Elliot Mckernon · 31 Jul 2024 15:23 UTC · 19 points · 0 comments · 1 min read · EA link

Discussing AI-Human Collaboration Through Fiction: The Story of Laika and GPT-∞
Laika · 27 Jul 2023 6:04 UTC · 1 point · 0 comments · 1 min read · EA link

Conscious AI & Public Perception: Four futures
nicoleta-k · 3 Jul 2024 23:06 UTC · 12 points · 1 comment · 16 min read · EA link

Poster Session on AI Safety
Neil Crawford · 12 Nov 2022 3:50 UTC · 8 points · 0 comments · 4 min read · EA link

‘The AI Dilemma: Growth vs Existential Risk’: An Extension for EAs and a Summary for Non-economists
TomHoulden · 21 Apr 2024 16:28 UTC · 65 points · 1 comment · 16 min read · EA link

Introduction to Pragmatic AI Safety [Pragmatic AI Safety #1]
TW123 · 9 May 2022 17:02 UTC · 68 points · 0 comments · 6 min read · EA link

My Feedback to the UN Advisory Body on AI
Heramb Podar · 4 Apr 2024 23:39 UTC · 7 points · 1 comment · 4 min read · EA link

Report: Evaluating an AI Chip Registration Policy
Deric Cheng · 12 Apr 2024 4:40 UTC · 15 points · 0 comments · 5 min read · EA link (www.convergenceanalysis.org)

Apply to Aether—Independent LLM Agent Safety Research Group
RohanS · 21 Aug 2024 9:40 UTC · 47 points · 13 comments · 8 min read · EA link

An even deeper atheism
Joe_Carlsmith · 11 Jan 2024 17:28 UTC · 26 points · 2 comments · 1 min read · EA link

Reza Negarestani’s Intelligence & Spirit
ukc10014 · 27 Jun 2024 18:17 UTC · 7 points · 1 comment · 4 min read · EA link

What do XPT forecasts tell us about AI risk?
Forecasting Research Institute · 19 Jul 2023 7:43 UTC · 97 points · 21 comments · 14 min read · EA link

MIRI 2024 Mission and Strategy Update
Malo · 5 Jan 2024 1:10 UTC · 154 points · 38 comments · 1 min read · EA link

Counterarguments to the basic AI risk case
Katja_Grace · 14 Oct 2022 20:30 UTC · 284 points · 23 comments · 34 min read · EA link

Is effective altruism really to blame for the OpenAI debacle?
Garrison · 23 Nov 2023 0:44 UTC · 13 points · 0 comments · 1 min read · EA link (garrisonlovely.substack.com)

An AI crash is our best bet for restricting AI
Remmelt · 11 Oct 2024 2:12 UTC · 20 points · 3 comments · 1 min read · EA link

UNGA Resolution on AI: 5 Key Takeaways Looking to Future Policy
Heramb Podar · 24 Mar 2024 12:03 UTC · 17 points · 1 comment · 3 min read · EA link

Towards evidence gap-maps for AI safety
dEAsign · 25 Jul 2023 8:13 UTC · 6 points · 1 comment · 2 min read · EA link

Risk-averse Batch Active Inverse Reward Design
Panagiotis Liampas · 7 Oct 2023 8:56 UTC · 11 points · 0 comments · 15 min read · EA link

Help the UN design global governance structures for AI
Joanna (Asia) Wiaterek · 12 Jan 2024 8:44 UTC · 72 points · 2 comments · 1 min read · EA link

UK AI Bill Analysis & Opinion
CAISID · 5 Feb 2024 0:12 UTC · 18 points · 0 comments · 15 min read · EA link

I am unable to get any AI safety related fellowships or internships.
Aavishkar · 11 Mar 2024 5:00 UTC · 5 points · 6 comments · 1 min read · EA link

Claude 3.5 Sonnet
Zach Stein-Perlman · 20 Jun 2024 18:00 UTC · 31 points · 0 comments · 1 min read · EA link (www.anthropic.com)

AI Incident Reporting: A Regulatory Review
Deric Cheng · 11 Mar 2024 21:02 UTC · 10 points · 1 comment · 6 min read · EA link

Assessment of AI safety agendas: think about the downside risk
Roman Leventov · 19 Dec 2023 9:02 UTC · 6 points · 0 comments · 1 min read · EA link

“The Universe of Minds”—call for reviewers (Seeds of Science)
rogersbacon1 · 25 Jul 2023 16:55 UTC · 4 points · 0 comments · 1 min read · EA link

Claude 3 claims it’s conscious, doesn’t want to die or be modified
MikhailSamin · 4 Mar 2024 23:05 UTC · 8 points · 3 comments · 1 min read · EA link

Literature review of Transformative Artificial Intelligence timelines
Jaime Sevilla · 27 Jan 2023 20:36 UTC · 148 points · 10 comments · 1 min read · EA link

OpenAI’s Superalignment team has opened Fast Grants
Yadav · 16 Dec 2023 15:41 UTC · 31 points · 2 comments · 1 min read · EA link (openai.com)

Bringing about animal-inclusive AI
Max Taylor · 18 Dec 2023 11:49 UTC · 121 points · 9 comments · 16 min read · EA link

Join the AI Evaluation Tasks Bounty Hackathon
Esben Kran · 18 Mar 2024 8:15 UTC · 20 points · 0 comments · 4 min read · EA link

[Question] Why haven’t we been destroyed by a power-seeking AGI from elsewhere in the universe?
Jadon Schmitt · 22 Jul 2023 7:21 UTC · 35 points · 14 comments · 1 min read · EA link

AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2
Center for AI Safety · 19 Jul 2023 1:40 UTC · 5 points · 0 comments · 6 min read · EA link (newsletter.safe.ai)

An Introduction to Critiques of prominent AI safety organizations
Omega · 19 Jul 2023 6:53 UTC · 87 points · 2 comments · 5 min read · EA link

(Even) More Early-Career EAs Should Try AI Safety Technical Research
tlevin · 30 Jun 2022 21:14 UTC · 86 points · 40 comments · 11 min read · EA link

AI-Relevant Regulation: Insurance in Safety-Critical Industries
SWK · 22 Jul 2023 17:52 UTC · 5 points · 0 comments · 6 min read · EA link

AI Policy Insights from the AIMS Survey
Janet Pauketat · 22 Feb 2024 19:17 UTC · 10 points · 1 comment · 18 min read · EA link (www.sentienceinstitute.org)

Apply to MATS 7.0!
Ryan Kidd · 21 Sep 2024 0:23 UTC · 27 points · 0 comments · 1 min read · EA link

AI Risk and Survivorship Bias—How Andreessen and LeCun got it wrong
stepanlos · 14 Jul 2023 17:10 UTC · 5 points · 1 comment · 6 min read · EA link

A fictional AI law laced w/ alignment theory
Miguel · 17 Jul 2023 3:26 UTC · 3 points · 0 comments · 2 min read · EA link

Help us seed AI Safety Brussels
gergo · 7 Aug 2024 6:17 UTC · 50 points · 2 comments · 3 min read · EA link

An economist’s perspective on AI safety
David Stinson · 7 Jun 2024 7:55 UTC · 7 points · 1 comment · 9 min read · EA link

Cambridge AI Safety Hub is looking for full- or part-time organisers
hannah · 15 Jul 2023 14:31 UTC · 12 points · 0 comments · 1 min read · EA link

Advocating for Public Ownership of Future AGI: Preserving Humanity’s Collective Heritage
George_A (Digital Intelligence Rights Initiative) · 14 Jul 2023 16:01 UTC · −10 points · 2 comments · 4 min read · EA link

Updates from Campaign for AI Safety
Jolyn Khoo · 19 Jul 2023 8:15 UTC · 5 points · 0 comments · 2 min read · EA link (www.campaignforaisafety.org)

Non-trivial Fellowship Project: Towards a Unified Dangerous Capabilities Benchmark
Jord · 4 Mar 2024 9:24 UTC · 2 points · 1 comment · 9 min read · EA link

Current paths to impact in EU AI Policy (Feb ’24)
JOMG_Monnet · 12 Feb 2024 15:57 UTC · 47 points · 0 comments · 5 min read · EA link

[Question] How independent is the research coming out of OpenAI’s preparedness team?
Earthling · 10 Feb 2024 16:59 UTC · 18 points · 0 comments · 1 min read · EA link

[Linkpost] A Narrow Path—How to Secure our Future
MathiasKB🔸 · 2 Oct 2024 22:50 UTC · 63 points · 0 comments · 1 min read · EA link (www.narrowpath.co)

An argument for accelerating international AI governance research (part 1)
MattThinks · 16 Aug 2023 5:40 UTC · 9 points · 0 comments · 3 min read · EA link

Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
Garrison · 10 Feb 2024 19:52 UTC · 286 points · 20 comments · 3 min read · EA link (garrisonlovely.substack.com)

Modelling large-scale cyber attacks from advanced AI systems with Advanced Persistent Threats
Iyngkarran Kumar · 2 Oct 2023 9:54 UTC · 28 points · 2 comments · 30 min read · EA link

Thoughts on the AI Safety Summit company policy requests and responses
So8res · 31 Oct 2023 23:54 UTC · 42 points · 3 comments · 1 min read · EA link

(How) Is technical AI Safety research being evaluated?
JohnSnow · 11 Jul 2023 9:37 UTC · 27 points · 1 comment · 1 min read · EA link

Beginner’s guide to reducing s-risks [link-post]
Center on Long-Term Risk · 17 Oct 2023 0:51 UTC · 129 points · 3 comments · 3 min read · EA link (longtermrisk.org)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil · 12 Feb 2024 17:11 UTC · 99 points · 6 comments · 5 min read · EA link

AI-Relevant Regulation: IAEA
SWK · 15 Jul 2023 18:20 UTC · 10 points · 0 comments · 5 min read · EA link

Paradigms and Theory Choice in AI: Adaptivity, Economy and Control
particlemania · 28 Aug 2023 22:44 UTC · 3 points · 0 comments · 16 min read · EA link

[Question] What am I missing re. open-source LLM’s?
another-anon-do-gooder · 4 Dec 2023 4:48 UTC · 1 point · 2 comments · 1 min read · EA link

AI-Relevant Regulation: CERN
SWK · 15 Jul 2023 18:40 UTC · 12 points · 0 comments · 6 min read · EA link

Deep Deceptiveness
So8res · 21 Mar 2023 2:51 UTC · 40 points · 1 comment · 1 min read · EA link

AI Wellbeing
Simon · 11 Jul 2023 0:34 UTC · 11 points · 0 comments · 9 min read · EA link

A simple way of exploiting AI’s coming economic impact may be highly-impactful
kuira · 16 Jul 2023 10:30 UTC · 5 points · 0 comments · 2 min read · EA link (www.lesswrong.com)

Updates from Campaign for AI Safety
Jolyn Khoo · 31 Oct 2023 5:46 UTC · 14 points · 1 comment · 2 min read · EA link (www.campaignforaisafety.org)

Ask AI companies about what they are doing for AI safety?
mic · 8 Mar 2022 21:54 UTC · 44 points · 1 comment · 2 min read · EA link

Assessing the Dangerousness of Malevolent Actors in AGI Governance: A Preliminary Exploration
Callum Hinchcliffe · 14 Oct 2023 21:18 UTC · 28 points · 4 comments · 9 min read · EA link

UK Foundation Model Task Force—Expression of Interest
ojorgensen · 18 Jun 2023 9:40 UTC · 111 points · 3 comments · 1 min read · EA link (twitter.com)

Announcing Athena—Women in AI Alignment Research
Claire Short · 7 Nov 2023 22:02 UTC · 180 points · 28 comments · 3 min read · EA link

Should you work at a leading AI lab? (including in non-safety roles)
Benjamin Hilton · 25 Jul 2023 16:28 UTC · 38 points · 13 comments · 12 min read · EA link

[Question] Could someone help me understand why it’s so difficult to solve the alignment problem?
Jadon Schmitt · 22 Jul 2023 4:39 UTC · 35 points · 21 comments · 1 min read · EA link

My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”
Quintin Pope · 21 Mar 2023 1:23 UTC · 166 points · 21 comments · 39 min read · EA link

[Question] Know a grad student studying AI’s economic impacts?
Madhav Malhotra · 5 Jul 2023 0:07 UTC · 7 points · 0 comments · 1 min read · EA link

News: Spanish AI image outcry + US AI workforce “regulation”
Benevolent_Rain · 26 Sep 2023 7:43 UTC · 9 points · 0 comments · 1 min read · EA link

Australians are concerned about AI risks and expect strong government action
Alexander Saeri · 8 Mar 2024 6:39 UTC · 38 points · 12 comments · 5 min read · EA link (aigovernance.org.au)

Biological superintelligence: a solution to AI safety
Yarrow · 4 Dec 2023 13:09 UTC · 0 points · 6 comments · 1 min read · EA link

Dr Altman or: How I Learned to Stop Worrying and Love the Killer AI
Barak Gila · 11 Mar 2024 5:01 UTC · −7 points · 0 comments · 2 min read · EA link

The Multidisciplinary Approach to Alignment (MATA) and Archetypal Transfer Learning (ATL)
Miguel · 19 Jun 2023 3:23 UTC · 4 points · 0 comments · 7 min read · EA link

We Should Talk About This More. Epistemic World Collapse as Imminent Safety Risk of Generative AI.
Jörg Weiß · 16 Nov 2023 8:34 UTC · 4 points · 0 comments · 29 min read · EA link

Potential employees have a unique lever to influence the behaviors of AI labs
oxalis · 18 Mar 2023 20:58 UTC · 139 points · 1 comment · 5 min read · EA link

Neuronpedia—AI Safety Game
johnnylin · 16 Oct 2023 9:35 UTC · 9 points · 2 comments · 4 min read · EA link (neuronpedia.org)

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman · 4 Dec 2023 7:41 UTC · 4 points · 0 comments · 16 min read · EA link (arxiv.org)

Aligning the Aligners: Ensuring Aligned AI acts for the common good of all mankind
timunderwood · 16 Jan 2023 11:13 UTC · 40 points · 2 comments · 4 min read · EA link

LLMs won’t lead to AGI—Francois Chollet
tobycrisford 🔸 · 11 Jun 2024 20:19 UTC · 37 points · 23 comments · 1 min read · EA link (www.youtube.com)

If you are too stressed, walk away from the front lines
Neil Warren · 12 Jun 2023 21:01 UTC · 7 points · 2 comments · 4 min read · EA link

[Question] Why is learning economics, psychology, sociology important for preventing AI risks?
jackchang110 · 3 Nov 2023 21:48 UTC · 3 points · 0 comments · 1 min read · EA link

Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 25 Nov 2023 15:57 UTC · 114 points · 9 comments · 1 min read · EA link

Podcast: Interview series featuring Dr. Peter Park
Jacob-Haimes · 26 Mar 2024 0:35 UTC · 1 point · 0 comments · 2 min read · EA link (into-ai-safety.github.io)

All Tech is Human <-> EA
tae 🔸 · 3 Dec 2023 21:01 UTC · 29 points · 0 comments · 2 min read · EA link

There is only one goal or drive—only self-perpetuation counts
freest one · 13 Jun 2023 1:37 UTC · 2 points · 4 comments · 8 min read · EA link

Critiques of prominent AI safety labs: Conjecture
Omega · 12 Jun 2023 5:52 UTC · 150 points · 83 comments · 32 min read · EA link

[Question] How does AI progress affect other EA cause areas?
Luis Mota Freitas · 9 Jun 2023 12:43 UTC · 95 points · 13 comments · 1 min read · EA link

What can superintelligent ANI tell us about superintelligent AGI?
Ted Sanders · 12 Jun 2023 6:32 UTC · 81 points · 20 comments · 5 min read · EA link

The Bar for Contributing to AI Safety is Lower than You Think
Chris Leong · 17 Aug 2024 10:52 UTC · 14 points · 5 comments · 2 min read · EA link

Have your say on the future of AI regulation: Deadline approaching for your feedback on UN High-Level Advisory Body on AI Interim Report ‘Governing AI for Humanity’
Deborah W.A. Foulkes · 29 Mar 2024 6:37 UTC · 17 points · 1 comment · 1 min read · EA link

Raising the voices that actually count
Kim Holder · 13 Jun 2023 19:21 UTC · 2 points · 3 comments · 2 min read · EA link

Fixing Insider Threats in the AI Supply Chain
Madhav Malhotra · 7 Oct 2023 10:49 UTC · 9 points · 2 comments · 5 min read · EA link

The AI Endgame: A counterfactual to AI alignment by an AI Safety newcomer
Andreas P · 1 Dec 2023 5:49 UTC · 2 points · 5 comments · 3 min read · EA link

A summary of current work in AI governance
constructive · 17 Jun 2023 16:58 UTC · 87 points · 4 comments · 11 min read · EA link

Observations on the funding landscape of EA and AI safety
Vilhelm Skoglund · 2 Oct 2023 9:45 UTC · 136 points · 12 comments · 15 min read · EA link

The current alignment plan, and how we might improve it | EAG Bay Area 23
Buck · 7 Jun 2023 21:03 UTC · 66 points · 0 comments · 33 min read · EA link

The Risks of AI-Generated Content on the EA Forum
WobblyPanda · 24 Jun 2023 5:33 UTC · −1 points · 0 comments · 1 min read · EA link

Epoch is hiring a Product and Data Visualization Designer
merilalama · 25 Nov 2023 0:14 UTC · 21 points · 0 comments · 4 min read · EA link (careers.rethinkpriorities.org)

Muddling Along Is More Likely Than Dystopia
Jeffrey Heninger · 21 Oct 2023 9:30 UTC · 87 points · 3 comments · 8 min read · EA link (blog.aiimpacts.org)

Does AI risk “other” the AIs?
Joe_Carlsmith · 9 Jan 2024 17:51 UTC · 23 points · 3 comments · 1 min read · EA link

Cooperative AI: Three things that confused me as a beginner (and my current understanding)
C Tilli · 16 Apr 2024 7:06 UTC · 56 points · 10 comments · 6 min read · EA link

The Game of Dominance
Karl von Wendt · 27 Aug 2023 11:23 UTC · 5 points · 0 comments · 6 min read · EA link

OpenAI board received letter warning of powerful AI
JordanStone · 23 Nov 2023 0:16 UTC · 26 points · 2 comments · 1 min read · EA link (www.reuters.com)

AI companies are not on track to secure model weights
Jeffrey Ladish · 18 Jul 2024 15:13 UTC · 73 points · 3 comments · 19 min read · EA link

Automated Parliaments — A Solution to Decision Uncertainty and Misalignment in Language Models
Shak Ragoler · 2 Oct 2023 9:47 UTC · 8 points · 0 comments · 17 min read · EA link

Catastrophic Risks from Unsafe AI: Navigating a Tightrope Scenario (Ben Garfinkel, EAG London 2023)
Alexander Saeri · 2 Jun 2023 9:59 UTC · 19 points · 1 comment · 10 min read · EA link

A compute-based framework for thinking about the future of AI
Matthew_Barnett · 31 May 2023 22:00 UTC · 96 points · 36 comments · 19 min read · EA link

Safe AI and moral AI
William D'Alessandro · 1 Jun 2023 21:18 UTC · 3 points · 0 comments · 11 min read · EA link

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI
Center for AI Safety · 30 May 2023 11:44 UTC · 16 points · 3 comments · 6 min read · EA link (newsletter.safe.ai)

Digital people could make AI safer
GMcGowan · 10 Jun 2022 15:29 UTC · 24 points · 15 comments · 4 min read · EA link (www.mindlessalgorithm.com)

Explorers in a virtual country: Navigating the knowledge landscape of large language models
Alexander Saeri · 28 Mar 2023 21:32 UTC · 17 points · 1 comment · 6 min read · EA link

ChatGPT: towards AI subjectivity
KrisDAmato · 1 May 2024 10:13 UTC · 3 points · 0 comments · 1 min read · EA link (link.springer.com)

Primitive Global Discourse Framework, Constitutional AI using legal frameworks, and Monoculture—A loss of control over the role of AGI in society
broptross · 1 Jun 2023 5:12 UTC · 2 points · 0 comments · 12 min read · EA link

Key takeaways from our EA and alignment research surveys
Cameron Berg · 4 May 2024 15:51 UTC · 64 points · 21 comments · 21 min read · EA link

Without a trajectory change, the development of AGI is likely to go badly
Max H · 30 May 2023 0:21 UTC · 1 point · 0 comments · 13 min read · EA link

Boomerang—protocol to dissolve some commitment races
Filip Sondej · 30 May 2023 16:24 UTC · 20 points · 0 comments · 8 min read · EA link (www.lesswrong.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data
Center for AI Safety · 16 May 2024 14:26 UTC · 14 points · 0 comments · 6 min read · EA link (newsletter.safe.ai)

We are fighting a shared battle (a call for a different approach to AI Strategy)
Gideon Futerman · 16 Mar 2023 14:37 UTC · 59 points · 11 comments · 15 min read · EA link

AI, Cybersecurity, and Malware: A Shallow Report [General]
Madhav Malhotra · 31 Mar 2023 12:01 UTC · 5 points · 0 comments · 8 min read · EA link

My Proven AI Safety Explanation (as a computing student)
Mica White · 6 Feb 2024 3:58 UTC · 8 points · 4 comments · 6 min read · EA link

AI, Cybersecurity, and Malware: A Shallow Report [Technical]
Madhav Malhotra · 31 Mar 2023 12:03 UTC · 4 points · 0 comments · 9 min read · EA link

AGI development role-playing game
rekahalasz · 11 Dec 2023 10:22 UTC · 4 points · 0 comments · 1 min read · EA link

Status Quo Engines—AI essay
Ilana_Goldowitz_Jimenez · 28 May 2023 14:33 UTC · 1 point · 0 comments · 15 min read · EA link

I designed an AI safety course (for a philosophy department)
Eleni_A · 23 Sep 2023 21:56 UTC · 27 points · 3 comments · 2 min read · EA link

Possible OpenAI’s Q* breakthrough and DeepMind’s AlphaGo-type systems plus LLMs
Burnydelic · 23 Nov 2023 7:02 UTC · 13 points · 4 comments · 2 min read · EA link

It’s not obvious that getting dangerous AI later is better
Aaron_Scher · 23 Sep 2023 5:35 UTC · 23 points · 9 comments · 16 min read · EA link

Announcing: Mechanism Design for AI Safety—Reading Group
Rubi J. Hudson · 9 Aug 2022 4:25 UTC · 36 points · 1 comment · 4 min read · EA link

What to think when a language model tells you it’s sentient
rgb · 20 Feb 2023 2:59 UTC · 112 points · 18 comments · 6 min read · EA link

[Question] Would an Anthropic/OpenAI merger be good for AI safety?
M · 22 Nov 2023 20:21 UTC · 6 points · 1 comment · 1 min read · EA link

AGI misalignment x-risk may be lower due to an overlooked goal specification technology
johnjnay · 21 Oct 2022 2:03 UTC · 20 points · 1 comment · 1 min read · EA link

Why Would AI “Aim” To Defeat Humanity?
Holden Karnofsky · 29 Nov 2022 18:59 UTC · 24 points · 0 comments · 32 min read · EA link (www.cold-takes.com)

[Linkpost] Beware the Squirrel by Verity Harding
Earthling · 3 Sep 2023 21:04 UTC · 1 point · 1 comment · 2 min read · EA link (samf.substack.com)

[Linkpost] Longtermists Are Pushing a New Cold War With China
Radical Empath Ismam · 27 May 2023 6:53 UTC · 37 points · 16 comments · 1 min read · EA link (jacobin.com)

Announcing Human-aligned AI Summer School
Jan_Kulveit · 22 May 2024 8:55 UTC · 33 points · 0 comments · 1 min read · EA link (humanaligned.ai)

October 2022 AI Risk Community Survey Results
Froolow · 24 May 2023 10:37 UTC · 19 points · 0 comments · 7 min read · EA link

A Viral License for AI Safety
IvanVendrov · 5 Jun 2021 2:00 UTC · 30 points · 6 comments · 5 min read · EA link

AI Safety & Risk Dinner w/ Entrepreneur First CEO & ARIA Chair, Matt Clifford in New York
SimonPastor · 28 Nov 2023 19:45 UTC · 2 points · 0 comments · 1 min read · EA link

[CFP] NeurIPS workshop: AI meets Moral Philosophy and Moral Psychology
jaredlcm · 4 Sep 2023 6:21 UTC · 10 points · 1 comment · 4 min read · EA link

What we’re missing: the case for structural risks from AI
Justin Olive · 9 Nov 2023 5:52 UTC · 31 points · 3 comments · 6 min read · EA link

MATS Summer 2023 Retrospective
utilistrutil · 2 Dec 2023 0:12 UTC · 28 points · 3 comments · 1 min read · EA link

[Linkpost] OpenAI leaders call for regulation of “superintelligence” to reduce existential risk.
Lowe Lundin · 25 May 2023 14:14 UTC · 5 points · 0 comments · 1 min read · EA link

You Can’t Prove Aliens Aren’t On Their Way To Destroy The Earth (A Comprehensive Takedown Of The Doomer View Of AI)
Murphy · 7 Apr 2023 13:37 UTC
−31 points
7 comments9 min readEA link

Diminish­ing Re­turns in Ma­chine Learn­ing Part 1: Hard­ware Devel­op­ment and the Phys­i­cal Frontier

Brian Chau27 May 2023 12:39 UTC
16 points
3 comments12 min readEA link
(www.fromthenew.world)

In­trin­sic limi­ta­tions of GPT-4 and other large lan­guage mod­els, and why I’m not (very) wor­ried about GPT-n

James Fodor3 Jun 2023 13:09 UTC
28 points
3 comments11 min readEA link

Bi­den-Har­ris Ad­minis­tra­tion An­nounces First-Ever Con­sor­tium Ded­i­cated to AI Safety

ben.smith9 Feb 2024 6:40 UTC
15 points
1 comment1 min readEA link
(www.nist.gov)

The case for more am­bi­tious lan­guage model evals

Jozdien30 Jan 2024 9:24 UTC
7 points
0 comments5 min readEA link

Trans­for­ma­tive AI and Com­pute—Read­ing List

Frederik Berg4 Sep 2023 6:21 UTC
24 points
0 comments1 min readEA link
(docs.google.com)

AI Safety Camp 2024

Linda Linsefors18 Nov 2023 10:37 UTC
21 points
1 comment1 min readEA link
(aisafety.camp)

Unions for AI safety?

dEAsign24 Sep 2023 0:13 UTC
7 points
12 comments2 min readEA link

Five ne­glected work ar­eas that could re­duce AI risk

Aaron_Scher24 Sep 2023 2:09 UTC
22 points
0 comments9 min readEA link

AI Align­ment in The New Yorker

Eleni_A17 May 2023 21:19 UTC
23 points
0 comments1 min readEA link
(www.newyorker.com)

GovAI: Towards best prac­tices in AGI safety and gov­er­nance: A sur­vey of ex­pert opinion

Zach Stein-Perlman15 May 2023 1:42 UTC
68 points
3 comments1 min readEA link

Up­dates from Cam­paign for AI Safety

Jolyn Khoo30 Aug 2023 5:36 UTC
7 points
0 comments2 min readEA link
(www.campaignforaisafety.org)

“The Race to the End of Hu­man­ity” – Struc­tural Uncer­tainty Anal­y­sis in AI Risk Models

Froolow19 May 2023 12:03 UTC
48 points
4 comments21 min readEA link

AI safety and con­scious­ness re­search: A brainstorm

Daniel_Friedrich15 Mar 2023 14:33 UTC
11 points
1 comment9 min readEA link

A note of cau­tion on be­liev­ing things on a gut level

Nathan_Barnard9 May 2023 12:20 UTC
41 points
5 comments2 min readEA link

[Question] Would a su­per-in­tel­li­gent AI nec­es­sar­ily sup­port its own ex­is­tence?

Porque?25 Jun 2023 10:39 UTC
8 points
2 comments2 min readEA link

You don’t need to be a ge­nius to be in AI safety research

Claire Short10 May 2023 22:23 UTC
28 points
4 comments6 min readEA link

Align­ment, Goals, & The Gut-Head Gap: A Re­view of Ngo. et al

Violet Hour11 May 2023 17:16 UTC
26 points
0 comments13 min readEA link

Sum­mary of Si­tu­a­tional Aware­ness—The Decade Ahead

OscarD🔸8 Jun 2024 11:29 UTC
143 points
5 comments18 min readEA link

Aim for con­di­tional pauses

AnonResearcherMajorAILab25 Sep 2023 1:05 UTC
100 points
42 comments12 min readEA link

“Pivotal Act” In­ten­tions: Nega­tive Con­se­quences and Fal­la­cious Arguments

Andrew Critch19 Apr 2022 20:24 UTC
80 points
10 comments7 min readEA link

How quickly AI could trans­form the world (Tom David­son on The 80,000 Hours Pod­cast)

80000_Hours8 May 2023 13:23 UTC
82 points
3 comments17 min readEA link

AI policy & gov­er­nance in Aus­tralia: notes from an ini­tial discussion

Alexander Saeri15 May 2023 0:00 UTC
31 points
1 comment3 min readEA link

De­com­pos­ing al­ign­ment to take ad­van­tage of paradigms

Christopher King4 Jun 2023 14:26 UTC
2 points
0 comments4 min readEA link

[Question] Is work­ing on AI to help democ­racy a good idea?

WillPearson17 Feb 2024 23:15 UTC
5 points
3 comments1 min readEA link

Risk Align­ment in Agen­tic AI Systems

Hayley Clatterbuck1 Oct 2024 22:51 UTC
31 points
1 comment3 min readEA link
(static1.squarespace.com)

Peter Eck­er­sley (1979-2022)

technicalities3 Sep 2022 10:45 UTC
497 points
9 comments1 min readEA link

How MATS ad­dresses “mass move­ment build­ing” concerns

Ryan Kidd4 May 2023 0:55 UTC
79 points
4 comments1 min readEA link

We’re all in this together

Tamsin Leake5 Dec 2023 13:57 UTC
15 points
1 comment1 min readEA link
(carado.moe)

Four ques­tions I ask AI safety researchers

Akash17 Jul 2022 17:25 UTC
30 points
3 comments1 min readEA link

Giv­ing away copies of Un­con­trol­lable by Dar­ren McKee

Greg_Colbourn14 Dec 2023 17:00 UTC
39 points
2 comments1 min readEA link

[Link Post: New York Times] White House Un­veils Ini­ti­a­tives to Re­duce Risks of A.I.

Rockwell4 May 2023 14:04 UTC
50 points
1 comment2 min readEA link

AI welfare vs. AI rights

Matthew_Barnett4 Feb 2025 18:28 UTC
33 points
20 comments3 min readEA link

AI gov­er­nance tal­ent pro­files I’d like to see ap­ply for OP funding

JulianHazell19 Dec 2023 12:34 UTC
118 points
4 comments3 min readEA link
(www.openphilanthropy.org)

AI Views Snapshots

RobBensinger13 Dec 2023 0:45 UTC
25 points
0 comments1 min readEA link

Owain Evans on LLMs, Truth­ful AI, AI Com­po­si­tion, and More

Ozzie Gooen2 May 2023 1:20 UTC
21 points
0 comments1 min readEA link
(quri.substack.com)

Yud­kowsky on AGI risk on the Ban­kless podcast

RobBensinger13 Mar 2023 0:42 UTC
54 points
2 comments75 min readEA link

P(doom|AGI) is high: why the de­fault out­come of AGI is doom

Greg_Colbourn2 May 2023 10:40 UTC
13 points
28 comments3 min readEA link

How CISA can Sup­port the Se­cu­rity of Large AI Models Against Theft [Grad School As­sign­ment]

Marcel D3 May 2023 15:36 UTC
7 points
0 comments13 min readEA link

My cur­rent take on ex­is­ten­tial AI risk [FB post]

Aryeh Englander1 May 2023 16:22 UTC
10 points
0 comments3 min readEA link

Planes are still decades away from dis­plac­ing most bird jobs

guzey25 Nov 2022 16:49 UTC
27 points
2 comments1 min readEA link

Apoca­lypse in­surance, and the hardline liber­tar­ian take on AI risk

So8res28 Nov 2023 2:09 UTC
21 points
0 comments1 min readEA link

AI safety logo de­sign con­test, due end of May (ex­tended)

Adrian Cipriani28 Apr 2023 2:53 UTC
13 points
23 comments2 min readEA link

New open let­ter on AI — “In­clude Con­scious­ness Re­search”

Jamie_Harris28 Apr 2023 7:50 UTC
55 points
1 comment3 min readEA link
(amcs-community.org)

A Guide to Fore­cast­ing AI Science Capabilities

Eleni_A29 Apr 2023 6:51 UTC
19 points
1 comment4 min readEA link

Briefly how I’ve up­dated since ChatGPT

rime25 Apr 2023 19:39 UTC
29 points
8 comments2 min readEA link
(www.lesswrong.com)

An­nounc­ing the Open Philan­thropy AI Wor­ld­views Contest

Jason Schukraft10 Mar 2023 2:33 UTC
137 points
33 comments3 min readEA link
(www.openphilanthropy.org)

Emerg­ing Tech­nolo­gies: More to explore

EA Handbook1 Jan 2021 11:06 UTC
4 points
0 comments2 min readEA link

AI Rights for Hu­man Safety

Matthew_Barnett3 Aug 2024 0:47 UTC
54 points
1 comment1 min readEA link
(papers.ssrn.com)

A Bare­bones Guide to Mechanis­tic In­ter­pretabil­ity Prerequisites

Neel Nanda29 Nov 2022 18:43 UTC
54 points
1 comment3 min readEA link
(neelnanda.io)

Max Teg­mark’s new Time ar­ti­cle on how we’re in a Don’t Look Up sce­nario [Linkpost]

Jonas Hallgren25 Apr 2023 15:47 UTC
41 points
0 comments1 min readEA link

The AI in­dus­try turns against its fa­vorite philosophy

Jonathan Yan22 Nov 2023 0:11 UTC
14 points
2 comments1 min readEA link
(www.semafor.com)

Archety­pal Trans­fer Learn­ing: a Pro­posed Align­ment Solu­tion that solves the In­ner x Outer Align­ment Prob­lem while adding Cor­rigible Traits to GPT-2-medium

Miguel26 Apr 2023 0:40 UTC
13 points
0 comments10 min readEA link

[Linkpost] ‘The God­father of A.I.’ Leaves Google and Warns of Danger Ahead

imp4rtial 🔸1 May 2023 19:54 UTC
43 points
3 comments3 min readEA link
(www.nytimes.com)

A Wind­fall Clause for CEO could worsen AI race dynamics

Larks9 Mar 2023 18:02 UTC
69 points
12 comments7 min readEA link

Ques­tions about Con­jec­ture’s CoEm proposal

Akash9 Mar 2023 19:32 UTC
19 points
0 comments1 min readEA link

AI Safety in a World of Vuln­er­a­ble Ma­chine Learn­ing Systems

AdamGleave8 Mar 2023 2:40 UTC
20 points
0 comments1 min readEA link

Two con­trast­ing mod­els of “in­tel­li­gence” and fu­ture growth

Magnus Vinding24 Nov 2022 11:54 UTC
74 points
32 comments22 min readEA link

Stu­dent com­pe­ti­tion for draft­ing a treaty on mora­to­rium of large-scale AI ca­pa­bil­ities R&D

Nayanika24 Apr 2023 13:15 UTC
36 points
4 comments2 min readEA link

“Who Will You Be After ChatGPT Takes Your Job?”

Stephen Thomas21 Apr 2023 21:31 UTC
23 points
4 comments2 min readEA link
(www.wired.com)

Be­fore Alt­man’s Ouster, OpenAI’s Board Was Di­vided and Feuding

Jonathan Yan22 Nov 2023 1:01 UTC
25 points
1 comment1 min readEA link
(www.nytimes.com)

Is the time crunch for AI Safety Move­ment Build­ing now?

Chris Leong8 Jun 2022 12:19 UTC
14 points
10 comments3 min readEA link

Who Aligns the Align­ment Re­searchers?

ben.smith5 Mar 2023 23:22 UTC
23 points
4 comments1 min readEA link

“Can We Sur­vive Tech­nol­ogy?” by John von Neumann

Eli Rose13 Mar 2023 2:26 UTC
51 points
0 comments1 min readEA link
(geosci.uchicago.edu)

Power laws in Speedrun­ning and Ma­chine Learning

Jaime Sevilla24 Apr 2023 10:06 UTC
48 points
0 comments1 min readEA link

Paper­clip Club (AI Safety Meetup)

Luke Thorburn20 Apr 2023 16:04 UTC
2 points
0 comments1 min readEA link

How bad a fu­ture do ML re­searchers ex­pect?

Katja_Grace13 Mar 2023 5:47 UTC
165 points
20 comments1 min readEA link

Play Re­grantor: Move up to $250,000 to Your Top High-Im­pact Pro­jects!

Dawn Drescher17 May 2023 16:51 UTC
58 points
2 comments2 min readEA link
(impactmarkets.substack.com)

Deep­Mind and Google Brain are merg­ing [Linkpost]

Akash20 Apr 2023 18:47 UTC
32 points
1 comment1 min readEA link

[Question] If your AGI x-risk es­ti­mates are low, what sce­nar­ios make up the bulk of your ex­pec­ta­tions for an OK out­come?

Greg_Colbourn21 Apr 2023 11:15 UTC
62 points
55 comments1 min readEA link

12 ten­ta­tive ideas for US AI policy (Luke Muehlhauser)

Lizka19 Apr 2023 21:05 UTC
117 points
12 comments4 min readEA link
(www.openphilanthropy.org)

Quick takes on “AI is easy to con­trol”

So8res2 Dec 2023 22:33 UTC
−12 points
4 comments1 min readEA link

Com­ments on OpenAI’s “Plan­ning for AGI and be­yond”

So8res3 Mar 2023 23:01 UTC
115 points
7 comments1 min readEA link

In­tro­duc­ing the new Ries­gos Catas­trófi­cos Globales team

Jaime Sevilla3 Mar 2023 23:04 UTC
74 points
3 comments5 min readEA link
(riesgoscatastroficosglobales.com)

Pivotal Re­search is Hiring Re­search Managers

Tobias Häberli25 Sep 2024 19:11 UTC
8 points
0 comments3 min readEA link

[Video] - How does the EU AI Act Work?

Yadav11 Sep 2024 14:16 UTC
10 points
0 comments5 min readEA link

Notes on risk compensation

trammell12 May 2024 18:40 UTC
136 points
14 comments21 min readEA link

De­com­pos­ing Agency — ca­pa­bil­ities with­out desires

Owen Cotton-Barratt11 Jul 2024 9:38 UTC
37 points
2 comments12 min readEA link
(strangecities.substack.com)

Pod­cast with Yoshua Ben­gio on Why AI Labs are “Play­ing Dice with Hu­man­ity’s Fu­ture”

Garrison10 May 2024 17:23 UTC
29 points
3 comments2 min readEA link
(garrisonlovely.substack.com)

Brand­ing AI Safety Groups: A Field Guide

Agustín Covarrubias 🔸13 May 2024 17:17 UTC
44 points
6 comments1 min readEA link

GDP per cap­ita in 2050

Hauke Hillebrandt6 May 2024 15:14 UTC
130 points
11 comments16 min readEA link
(hauke.substack.com)

Safety tax functions

Owen Cotton-Barratt20 Oct 2024 14:13 UTC
23 points
1 comment6 min readEA link
(strangecities.substack.com)

Epoch AI is Hiring an Oper­a­tions Associate

merilalama3 May 2024 0:16 UTC
5 points
1 comment3 min readEA link
(careers.rethinkpriorities.org)

Biorisk is an Un­helpful Anal­ogy for AI Risk

Davidmanheim6 May 2024 6:18 UTC
22 points
4 comments3 min readEA link

Up­dates on the EA catas­trophic risk land­scape

Benjamin_Todd6 May 2024 4:52 UTC
194 points
46 comments2 min readEA link

ML4Good is seek­ing part­ner or­gani­sa­tions, in­di­vi­d­ual or­ganisers and TAs

Nia13 May 2024 13:43 UTC
22 points
0 comments3 min readEA link

The In­ten­tional Stance, LLMs Edition

Eleni_A1 May 2024 15:22 UTC
8 points
2 comments8 min readEA link

Les­sons from the FDA for AI

Remmelt2 Aug 2024 0:52 UTC
6 points
2 comments1 min readEA link
(ainowinstitute.org)

Risks I am Con­cerned About

HappyBunny29 Apr 2024 23:41 UTC
1 point
1 comment1 min readEA link

AISN #38: Supreme Court De­ci­sion Could Limit Fed­eral Abil­ity to Reg­u­late AI Plus, “Cir­cuit Break­ers” for AI sys­tems, and up­dates on China’s AI industry

Center for AI Safety9 Jul 2024 19:29 UTC
8 points
0 comments5 min readEA link
(newsletter.safe.ai)

Aspira­tion-based, non-max­i­miz­ing AI agent designs

Bob Jacobs 🔸7 May 2024 16:13 UTC
12 points
1 comment38 min readEA link

AI Safety is Some­times a Model Property

Cullen 🔸2 May 2024 15:38 UTC
18 points
1 comment1 min readEA link
(open.substack.com)

Re­lease of UN’s draft re­lated to the gov­er­nance of AI (a sum­mary of the Si­mon In­sti­tute’s re­sponse)

SebastianSchmidt27 Apr 2024 18:27 UTC
22 points
0 comments1 min readEA link

AISC9 has ended and there will be an AISC10

Linda Linsefors29 Apr 2024 10:53 UTC
36 points
0 comments1 min readEA link

AI Safety Newslet­ter #42: New­som Ve­toes SB 1047 Plus, OpenAI’s o1, and AI Gover­nance Summary

Center for AI Safety1 Oct 2024 20:33 UTC
10 points
0 comments6 min readEA link
(newsletter.safe.ai)

List #2: Why co­or­di­nat­ing to al­ign as hu­mans to not de­velop AGI is a lot eas­ier than, well… co­or­di­nat­ing as hu­mans with AGI co­or­di­nat­ing to be al­igned with humans

Remmelt24 Dec 2022 9:53 UTC
3 points
0 comments1 min readEA link

AI Gover­nance & Strat­egy: Pri­ori­ties, tal­ent gaps, & opportunities

Akash3 Mar 2023 18:09 UTC
21 points
0 comments1 min readEA link

Re­sults of an in­for­mal sur­vey on AI grantmaking

Scott Alexander21 Aug 2024 13:19 UTC
127 points
28 comments1 min readEA link

Scal­ing of AI train­ing runs will slow down af­ter GPT-5

Maxime_Riche26 Apr 2024 16:06 UTC
10 points
2 comments3 min readEA link

In­tro­duc­ing Align­ment Stress-Test­ing at Anthropic

evhub12 Jan 2024 23:51 UTC
80 points
0 comments1 min readEA link

Is AI fore­cast­ing a waste of effort on the mar­gin?

Emrik5 Nov 2022 0:41 UTC
10 points
6 comments3 min readEA link

Staged release

Zach Stein-Perlman20 Apr 2024 1:00 UTC
16 points
0 comments1 min readEA link

80,000 hours should re­move OpenAI from the Job Board (and similar EA orgs should do similarly)

Raemon3 Jul 2024 20:34 UTC
263 points
79 comments3 min readEA link

Fron­tier AI sys­tems have sur­passed the self-repli­cat­ing red line

Greg_Colbourn10 Dec 2024 16:33 UTC
25 points
14 comments1 min readEA link
(github.com)

Law-Fol­low­ing AI 4: Don’t Rely on Vi­car­i­ous Liability

Cullen 🔸2 Aug 2022 23:23 UTC
13 points
0 comments3 min readEA link

[Video] Why SB-1047 de­serves a fairer debate

Yadav20 Aug 2024 10:38 UTC
15 points
1 comment7 min readEA link

Es­say com­pe­ti­tion on the Au­toma­tion of Wis­dom and Philos­o­phy — $25k in prizes

Owen Cotton-Barratt16 Apr 2024 10:08 UTC
80 points
15 comments8 min readEA link
(blog.aiimpacts.org)

A Gen­tle In­tro­duc­tion to Risk Frame­works Beyond Forecasting

pending_survival11 Apr 2024 9:15 UTC
81 points
4 comments27 min readEA link

CEA seeks co-founder for AI safety group sup­port spin-off

Agustín Covarrubias 🔸8 Apr 2024 15:42 UTC
62 points
0 comments4 min readEA link

Imi­ta­tion Learn­ing is Prob­a­bly Ex­is­ten­tially Safe

Vasco Grilo🔸30 Apr 2024 17:06 UTC
19 points
7 comments3 min readEA link
(www.openphilanthropy.org)

The ar­gu­ment for near-term hu­man dis­em­pow­er­ment through AI

Chris Leong16 Apr 2024 3:07 UTC
31 points
12 comments1 min readEA link
(link.springer.com)

Women in AI Safety Lon­don Meetup

Nia1 Aug 2024 9:48 UTC
2 points
0 comments1 min readEA link

[Question] If AI is in a bub­ble and the bub­ble bursts, what would you do?

Remmelt19 Aug 2024 10:56 UTC
28 points
6 comments1 min readEA link

What suc­cess looks like

mariushobbhahn28 Jun 2022 14:30 UTC
112 points
20 comments19 min readEA link

List #1: Why stop­ping the de­vel­op­ment of AGI is hard but doable

Remmelt24 Dec 2022 9:52 UTC
24 points
2 comments1 min readEA link

Want to work on US emerg­ing tech policy? Con­sider the Hori­zon Fel­low­ship.

ES30 Jul 2024 11:46 UTC
32 points
0 comments1 min readEA link

Scal­ing Laws and Likely Limits to AI

Davidmanheim18 Aug 2024 17:19 UTC
19 points
0 comments3 min readEA link

De­cod­ing Repub­li­can AI Policy: In­sights from 10 Key Ar­ti­cles from Mid-2024

anonymous00718 Aug 2024 9:48 UTC
5 points
0 comments6 min readEA link

[Question] Suggested read­ings & videos for a new col­lege course on ‘Psy­chol­ogy and AI’?

Geoffrey Miller11 Jan 2024 22:26 UTC
12 points
3 comments1 min readEA link

Com­mu­nity Build­ing for Grad­u­ate Stu­dents: A Tar­geted Approach

Neil Crawford29 Mar 2022 19:47 UTC
13 points
0 comments3 min readEA link

Cog­ni­tive as­sets and defen­sive acceleration

JulianHazell3 Apr 2024 14:55 UTC
13 points
3 comments4 min readEA link
(muddyclothes.substack.com)

Ap­ply to the 2024 PIBBSS Sum­mer Re­search Fellowship

nora12 Jan 2024 4:06 UTC
37 points
1 comment1 min readEA link

New Me­tac­u­lus Space for AI and X-Risk Re­lated Questions

David Mathers🔸6 Sep 2024 11:37 UTC
16 points
0 comments1 min readEA link

How do AI welfare and AI safety in­ter­act?

Lucius Caviola1 Jul 2024 10:39 UTC
77 points
21 comments7 min readEA link
(outpaced.substack.com)

Bryan John­son seems more EA al­igned than I expected

PeterSlattery22 Apr 2024 9:38 UTC
13 points
27 comments2 min readEA link
(www.youtube.com)

Reflec­tions on my first year of AI safety research

Jay Bailey8 Jan 2024 7:49 UTC
63 points
2 comments12 min readEA link

2023: news on AI safety, an­i­mal welfare, global health, and more

Lizka5 Jan 2024 21:57 UTC
54 points
1 comment12 min readEA link

Sur­vey on in­ter­me­di­ate goals in AI governance

MichaelA🔸17 Mar 2023 12:44 UTC
155 points
4 comments1 min readEA link

A new­comer’s guide to the tech­ni­cal AI safety field

zeshen4 Nov 2022 14:29 UTC
16 points
0 comments1 min readEA link

Against most, but not all, AI risk analogies

Matthew_Barnett14 Jan 2024 19:13 UTC
43 points
9 comments1 min readEA link

[Question] What is MIRI cur­rently do­ing?

Roko14 Dec 2024 2:55 UTC
9 points
2 comments1 min readEA link

Pri­ori­tis­ing be­tween ex­tinc­tion risks: Ev­i­dence Quality

freedomandutility30 Dec 2023 12:25 UTC
11 points
0 comments2 min readEA link

Pro­ject ideas: Gover­nance dur­ing ex­plo­sive tech­nolog­i­cal growth

Lukas Finnveden4 Jan 2024 7:25 UTC
33 points
1 comment16 min readEA link
(lukasfinnveden.substack.com)

AI, cen­tral­iza­tion, and the One Ring

Owen Cotton-Barratt13 Sep 2024 13:56 UTC
18 points
0 comments8 min readEA link
(strangecities.substack.com)

An Ar­gu­ment for Fo­cus­ing on Mak­ing AI go Well

Chris Leong28 Dec 2023 13:25 UTC
13 points
4 comments3 min readEA link

Eric Sch­midt’s blueprint for US tech­nol­ogy strategy

OscarD🔸15 Oct 2024 19:54 UTC
29 points
4 comments9 min readEA link

Pro­ject ideas: Sen­tience and rights of digi­tal minds

Lukas Finnveden4 Jan 2024 7:26 UTC
33 points
1 comment20 min readEA link
(lukasfinnveden.substack.com)

Po­si­tions at MITFutureTech

PeterSlattery19 Dec 2023 20:28 UTC
21 points
1 comment4 min readEA link

En­hanc­ing biose­cu­rity with lan­guage mod­els: defin­ing re­search directions

mic26 Mar 2024 12:30 UTC
11 points
1 comment13 min readEA link
(papers.ssrn.com)

The Fu­ture of Work: How Can Poli­cy­mak­ers Pre­pare for AI’s Im­pact on La­bor Mar­kets?

DavidConrad24 Jun 2024 21:43 UTC
4 points
1 comment3 min readEA link
(www.lesswrong.com)

[Question] Best giv­ing mul­ti­plier for X-risk/​AI safety?

SiebeRozendal27 Dec 2023 10:51 UTC
7 points
0 comments1 min readEA link

Talk: AI safety field­build­ing at MATS

Ryan Kidd23 Jun 2024 23:06 UTC
14 points
1 comment1 min readEA link

More peo­ple get­ting into AI safety should do a PhD

AdamGleave14 Mar 2024 22:14 UTC
50 points
4 comments1 min readEA link
(gleave.me)

[Question] Who should we give books on AI X-risk to?

yanni18 Dec 2023 23:57 UTC
13 points
1 comment1 min readEA link

AI gov­er­nance and strat­egy: a list of re­search agen­das and work that could be done.

Nathan_Barnard12 Mar 2024 11:21 UTC
33 points
4 comments17 min readEA link

Disen­tan­gling ar­gu­ments for the im­por­tance of AI safety

richard_ngo23 Jan 2019 14:58 UTC
63 points
14 comments8 min readEA link

Ret­ro­spec­tive on the 2022 Con­jec­ture AI Discussions

Andrea_Miotti24 Feb 2023 22:41 UTC
12 points
1 comment1 min readEA link

Nav­i­gat­ing Risks from Ad­vanced Ar­tifi­cial In­tel­li­gence: A Guide for Philan­thropists [Founders Pledge]

Tom Barnes🔸21 Jun 2024 9:48 UTC
101 points
7 comments1 min readEA link
(www.founderspledge.com)

On the fu­ture of lan­guage models

Owen Cotton-Barratt20 Dec 2023 16:58 UTC
125 points
3 comments36 min readEA link

“Ar­tifi­cial Gen­eral In­tel­li­gence”: an ex­tremely brief FAQ

Steven Byrnes11 Mar 2024 17:49 UTC
12 points
0 comments1 min readEA link

Chris­ti­ano (ARC) and GA (Con­jec­ture) Dis­cuss Align­ment Cruxes

Andrea_Miotti24 Feb 2023 23:03 UTC
16 points
1 comment1 min readEA link

De­con­struct­ing Bostrom’s Clas­sic Ar­gu­ment for AI Doom

Nora Belrose11 Mar 2024 6:03 UTC
25 points
0 comments1 min readEA link
(www.youtube.com)

Case stud­ies on so­cial-welfare-based stan­dards in var­i­ous industries

Holden Karnofsky20 Jun 2024 13:33 UTC
73 points
2 comments1 min readEA link

Fif­teen Law­suits against OpenAI

Remmelt9 Mar 2024 12:22 UTC
55 points
5 comments1 min readEA link

[Question] What should the EA/​AI safety com­mu­nity change, in re­sponse to Sam Alt­man’s re­vealed pri­ori­ties?

SiebeRozendal8 Mar 2024 12:35 UTC
30 points
16 comments1 min readEA link

Chain­ing Retroac­tive Fun­ders to Bor­row Against Un­likely Utopias

Dawn Drescher19 Apr 2022 18:25 UTC
24 points
4 comments9 min readEA link
(impactmarkets.substack.com)

AI, An­i­mals, and Digi­tal Minds 2024 - Retrospective

Constance Li19 Jun 2024 14:56 UTC
80 points
8 comments8 min readEA link

The last era of hu­man mistakes

Owen Cotton-Barratt24 Jul 2024 9:56 UTC
23 points
4 comments7 min readEA link
(strangecities.substack.com)

[Question] Any tips on ap­ply­ing for EA fund­ing?

Eevee🔹22 Sep 2024 5:11 UTC
18 points
4 comments1 min readEA link

AI Safety Newslet­ter #37: US Launches An­titrust In­ves­ti­ga­tions Plus, re­cent crit­i­cisms of OpenAI and An­thropic, and a sum­mary of Si­tu­a­tional Awareness

Center for AI Safety18 Jun 2024 18:08 UTC
15 points
0 comments5 min readEA link
(newsletter.safe.ai)

Pal­isade is hiring: Exec As­sis­tant, Con­tent Lead, Ops Lead, and Policy Lead

Charlie Rogers-Smith9 Oct 2024 0:04 UTC
15 points
2 comments1 min readEA link

[Question] Has An­thropic already made the ex­ter­nally leg­ible com­mit­ments that it planned to make?

Ofer12 Mar 2024 13:45 UTC
21 points
3 comments1 min readEA link

AI things that are per­haps as im­por­tant as hu­man-con­trol­led AI

Chi3 Mar 2024 18:07 UTC
113 points
9 comments21 min readEA link

Tak­ing a leave of ab­sence from Open Philan­thropy to work on AI safety

Holden Karnofsky23 Feb 2023 19:05 UTC
420 points
31 comments2 min readEA link

[Question] Why won’t nan­otech kill us all?

Yarrow16 Dec 2023 23:27 UTC
19 points
5 comments1 min readEA link

My ar­ti­cle in The Na­tion — Cal­ifor­nia’s AI Safety Bill Is a Mask-Off Mo­ment for the Industry

Garrison15 Aug 2024 19:25 UTC
134 points
0 comments1 min readEA link
(www.thenation.com)

Video and tran­script of pre­sen­ta­tion on Oth­er­ness and con­trol in the age of AGI

Joe_Carlsmith8 Oct 2024 22:30 UTC
18 points
1 comment1 min readEA link

Offer­ing AI safety sup­port calls for ML professionals

Vael Gates15 Feb 2024 23:48 UTC
52 points
1 comment1 min readEA link

In­cu­bat­ing AI x-risk pro­jects: some per­sonal reflections

Ben Snodin19 Dec 2023 17:03 UTC
84 points
10 comments9 min readEA link

List #3: Why not to as­sume on prior that AGI-al­ign­ment workarounds are available

Remmelt24 Dec 2022 9:54 UTC
6 points
0 comments1 min readEA link

The case for more Align­ment Tar­get Anal­y­sis (ATA)

Chi20 Sep 2024 1:14 UTC
21 points
0 comments1 min readEA link

Can the AI af­ford to wait?

Ben Millwood🔸20 Mar 2024 19:45 UTC
48 points
11 comments7 min readEA link

A tale of 2.5 or­thog­o­nal­ity theses

Arepo1 May 2022 13:53 UTC
140 points
31 comments11 min readEA link

On the Dwarkesh/​Chol­let Pod­cast, and the cruxes of scal­ing to AGI

JWS 🔸15 Jun 2024 20:24 UTC
72 points
49 comments17 min readEA link

[Question] Dan Hendrycks and EA

Caruso3 Aug 2024 13:49 UTC
−1 points
6 comments1 min readEA link

Thoughts on “The Offense-Defense Balance Rarely Changes”

Cullen 🔸12 Feb 2024 3:26 UTC
42 points
4 comments5 min readEA link

The benefits and risks of op­ti­mism (about AI safety)

Karl von Wendt3 Dec 2023 12:45 UTC
3 points
5 comments1 min readEA link

Ar­tifi­cial In­tel­li­gence, Con­scious Machines, and An­i­mals: Broad­en­ing AI Ethics

Group Organizer21 Sep 2023 20:58 UTC
4 points
0 comments1 min readEA link

FLI pod­cast se­ries, “Imag­ine A World”, about as­pira­tional fu­tures with AGI

Jackson Wagner13 Oct 2023 16:03 UTC
18 points
0 comments4 min readEA link

Does schem­ing lead to ad­e­quate fu­ture em­pow­er­ment? (Sec­tion 2.3.1.2 of “Schem­ing AIs”)

Joe_Carlsmith3 Dec 2023 18:32 UTC
6 points
1 comment1 min readEA link

Think­ing-in-limits about TAI from the de­mand per­spec­tive. De­mand sat­u­ra­tion, re­source wars, new debt.

Ivan Madan7 Nov 2023 22:44 UTC
2 points
0 comments4 min readEA link

An­nounc­ing The Mi­das Pro­ject — and our first cam­paign (which you can help with!)

Tyler Johnston13 Jun 2024 18:41 UTC
98 points
15 comments4 min readEA link

An­nounc­ing the Lon­don Ini­ti­a­tive for Safe AI (LISA)

JamesFox5 Feb 2024 10:36 UTC
65 points
3 comments9 min readEA link

RP’s AI Gover­nance & Strat­egy team—June 2023 in­terim overview

MichaelA🔸22 Jun 2023 13:45 UTC
68 points
1 comment7 min readEA link

Up­com­ing speaker se­ries on emerg­ing tech, na­tional se­cu­rity & US policy careers

kuhanj21 Jun 2023 4:49 UTC
42 points
0 comments2 min readEA link

[Question] How good/​bad is the new Bing AI for the world?

Nathan Young17 Feb 2023 16:31 UTC
21 points
14 comments1 min readEA link

A Friendly Face (Another Failure Story)

Karl von Wendt20 Jun 2023 10:31 UTC
22 points
8 comments1 min readEA link

The Hub­inger lec­tures on AGI safety: an in­tro­duc­tory lec­ture series

evhub22 Jun 2023 0:59 UTC
44 points
0 comments1 min readEA link

ML4G Ger­many—AI Align­ment Camp

Evander H. 🔸19 Jun 2023 7:24 UTC
17 points
1 comment1 min readEA link

An­nounc­ing FAR Labs, an AI safety cowork­ing space

ghabs2 Oct 2023 20:15 UTC
63 points
0 comments1 min readEA link
(www.lesswrong.com)

Up­dat­ing Drexler’s CAIS model

Matthew_Barnett17 Jun 2023 1:57 UTC
59 points
0 comments1 min readEA link

Ra­tional An­i­ma­tions is look­ing for an AI Safety scriptwriter, a lead com­mu­nity man­ager, and other roles.

Writer16 Jun 2023 9:41 UTC
40 points
4 comments1 min readEA link

[Question] What would it look like for AIS to no longer be ne­glected?

Rockwell16 Jun 2023 15:59 UTC
100 points
15 comments1 min readEA link

Si­mu­lat­ing Shut­down Code Ac­ti­va­tions in an AI Virus Lab

Miguel20 Jun 2023 5:27 UTC
4 points
0 comments6 min readEA link

ai-plans.com De­cem­ber Cri­tique-a-Thon

Kabir_Kumar4 Dec 2023 9:27 UTC
1 point
0 comments2 min readEA link

Safety isn’t safety with­out a so­cial model (or: dis­pel­ling the myth of per se tech­ni­cal safety)

Andrew Critch14 Jun 2024 0:16 UTC
95 points
3 comments1 min readEA link

The “tech­nol­ogy” bucket error

Holly Elmore ⏸️ 🔸21 Sep 2023 0:59 UTC
33 points
10 comments4 min readEA link
(open.substack.com)

Hy­po­thet­i­cal grants that the Long-Term Fu­ture Fund nar­rowly rejected

calebp15 Nov 2023 19:39 UTC
95 points
12 comments6 min readEA link

Global Pause AI Protest 10/​21

Holly Elmore ⏸️ 🔸14 Oct 2023 3:17 UTC
22 points
0 comments1 min readEA link

M&A in AI

Hauke Hillebrandt30 Oct 2023 17:43 UTC
9 points
1 comment6 min readEA link

An­nounc­ing the Vi­talik Bu­terin Fel­low­ships in AI Ex­is­ten­tial Safety!

DanielFilan21 Sep 2021 0:41 UTC
62 points
0 comments1 min readEA link
(grants.futureoflife.org)

[Question] Pros and cons of set­ting up a com­pany to do in­de­pen­dent AIS re­search?

Eevee🔹13 Aug 2024 0:11 UTC
15 points
0 comments1 min readEA link

Brief thoughts on Data, Re­port­ing, and Re­sponse for AI Risk Mitigation

Davidmanheim15 Jun 2023 7:53 UTC
18 points
3 comments8 min readEA link

Some tal­ent needs in AI governance

Sam Clarke13 Jun 2023 13:53 UTC
133 points
10 comments8 min readEA link

ARC is hiring the­o­ret­i­cal researchers

Jacob_Hilton12 Jun 2023 19:11 UTC
78 points
0 comments4 min readEA link
(www.lesswrong.com)

Ap­ti­tudes for AI gov­er­nance work

Sam Clarke13 Jun 2023 13:54 UTC
68 points
0 comments7 min readEA link

Us­ing Con­sen­sus Mechanisms as an ap­proach to Alignment

Prometheus11 Jun 2023 13:24 UTC
14 points
0 comments1 min readEA link

Mesa-Op­ti­miza­tion: Ex­plain it like I’m 10 Edition

brook26 Aug 2023 23:06 UTC
7 points
0 comments6 min readEA link
(www.lesswrong.com)

12 ca­reer ad­vis­ing ques­tions that may (or may not) be helpful for peo­ple in­ter­ested in al­ign­ment research

Akash12 Dec 2022 22:36 UTC
14 points
0 comments1 min readEA link

UN Sec­re­tary-Gen­eral recog­nises ex­is­ten­tial threat from AI

Greg_Colbourn15 Jun 2023 17:03 UTC
58 points
1 comment1 min readEA link

Care­less talk on US-China AI com­pe­ti­tion? (and crit­i­cism of CAIS cov­er­age)

Oliver Sourbut20 Sep 2023 12:46 UTC
52 points
19 comments1 min readEA link
(www.oliversourbut.net)

UK gov­ern­ment to host first global sum­mit on AI Safety

DavidNash8 Jun 2023 13:24 UTC
78 points
1 comment5 min readEA link
(www.gov.uk)

[Question] Are we con­fi­dent that su­per­in­tel­li­gent ar­tifi­cial in­tel­li­gence dis­em­pow­er­ing hu­mans would be bad?

Vasco Grilo🔸10 Jun 2023 9:24 UTC
24 points
27 comments1 min readEA link

AI take­off and nu­clear war

Owen Cotton-Barratt11 Jun 2024 19:33 UTC
72 points
5 comments11 min readEA link
(strangecities.substack.com)

An­nounc­ing the In­tro­duc­tion to ML Safety Course

TW1236 Aug 2022 2:50 UTC
136 points
4 comments7 min readEA link

Be­ware pop­u­lar dis­cus­sions of AI “sen­tience”

David Mathers🔸8 Jun 2023 8:57 UTC
42 points
6 comments9 min readEA link

New re­port: “Schem­ing AIs: Will AIs fake al­ign­ment dur­ing train­ing in or­der to get power?”

Joe_Carlsmith15 Nov 2023 17:16 UTC
71 points
4 comments1 min readEA link

Protest against Meta’s ir­re­versible pro­lifer­a­tion (Sept 29, San Fran­cisco)

Holly Elmore ⏸️ 🔸19 Sep 2023 23:40 UTC
114 points
32 comments1 min readEA link

AI Safety Newslet­ter #41: The Next Gen­er­a­tion of Com­pute Scale Plus, Rank­ing Models by Sus­cep­ti­bil­ity to Jailbreak­ing, and Ma­chine Ethics

Center for AI Safety11 Sep 2024 19:11 UTC
12 points
0 comments5 min readEA link
(newsletter.safe.ai)

RSPs are pauses done right

evhub14 Oct 2023 4:06 UTC
93 points
7 comments1 min readEA link

Trans­for­ma­tive AGI by 2043 is <1% likely

Ted Sanders6 Jun 2023 15:51 UTC
98 points
92 comments5 min readEA link
(arxiv.org)

Ap­pli­ca­tions are now open for In­tro to ML Safety Spring 2023

Joshc4 Nov 2022 22:45 UTC
49 points
1 comment2 min readEA link

Which ML skills are use­ful for find­ing a new AIS re­search agenda?

Yonatan Cale9 Feb 2023 13:09 UTC
7 points
3 comments1 min readEA link

Cri­tiques of non-ex­is­tent AI safety labs: Yours

Anneal16 Jun 2023 6:50 UTC
117 points
12 comments3 min readEA link

AI Safety Newslet­ter #39: Im­pli­ca­tions of a Trump Ad­minis­tra­tion for AI Policy Plus, Safety Engineering

Center for AI Safety29 Jul 2024 17:48 UTC
6 points
0 comments6 min readEA link
(newsletter.safe.ai)

In­tro­duc­ing Kairos: a new AI safety field­build­ing or­ga­ni­za­tion (the new home for SPAR and FSP)

Agustín Covarrubias 🔸25 Oct 2024 21:59 UTC
71 points
2 comments2 min readEA link

Some thoughts on “AI could defeat all of us com­bined”

Milan Griffes2 Jun 2023 15:03 UTC
23 points
0 comments4 min readEA link

AI Safety Hub Ser­bia Offi­cial Opening

Dušan D. Nešić (Dushan)28 Oct 2023 17:10 UTC
26 points
3 comments1 min readEA link
(forum.effectivealtruism.org)

Ac­tion: Help ex­pand fund­ing for AI Safety by co­or­di­nat­ing on NSF response

Evan R. Murphy20 Jan 2022 20:48 UTC
20 points
7 comments3 min readEA link

An­nounce­ment: You can now listen to the “AI Safety Fun­da­men­tals” courses

peterhartree9 Jun 2023 16:32 UTC
101 points
8 comments1 min readEA link

Will scal­ing work?

Vasco Grilo🔸4 Feb 2024 9:29 UTC
19 points
1 comment12 min readEA link
(www.dwarkeshpatel.com)

In­tro­duc­ing Fu­ture Mat­ters – a strat­egy consultancy

KyleGracey30 Sep 2023 2:06 UTC
59 points
2 comments5 min readEA link

State­ment on AI Ex­tinc­tion—Signed by AGI Labs, Top Aca­demics, and Many Other Notable Figures

Center for AI Safety30 May 2023 9:06 UTC
427 points
28 comments1 min readEA link
(www.safe.ai)

AXRP: Store, Pa­treon, Video

DanielFilan7 Feb 2023 5:12 UTC
7 points
0 comments1 min readEA link

The bul­ls­eye frame­work: My case against AI doom

titotal30 May 2023 11:52 UTC
71 points
15 comments17 min readEA link

A moral back­lash against AI will prob­a­bly slow down AGI development

Geoffrey Miller31 May 2023 21:31 UTC
142 points
22 comments14 min readEA link

Why and When In­ter­pretabil­ity Work is Dangerous

Nicholas / Heather Kross28 May 2023 0:27 UTC
6 points
0 comments1 min readEA link

Cal­ifor­ni­ans, tell your reps to vote yes on SB 1047!

Holly Elmore ⏸️ 🔸12 Aug 2024 19:49 UTC
106 points
6 comments1 min readEA link

List of Masters Pro­grams in Tech Policy, Public Policy and Se­cu­rity (Europe)

sberg29 May 2023 10:23 UTC
47 points
0 comments3 min readEA link

Biomimetic al­ign­ment: Align­ment be­tween an­i­mal genes and an­i­mal brains as a model for al­ign­ment be­tween hu­mans and AI sys­tems.

Geoffrey Miller26 May 2023 21:25 UTC
32 points
1 comment16 min readEA link

Seek­ing (Paid) Case Stud­ies on Standards

Holden Karnofsky26 May 2023 17:58 UTC
99 points
14 comments1 min readEA link

[Job Ad] SERI MATS is hiring for our sum­mer program

annashive26 May 2023 4:51 UTC
8 points
1 comment7 min readEA link

On the cor­re­spon­dence be­tween AI-mis­al­ign­ment and cog­ni­tive dis­so­nance us­ing a be­hav­ioral eco­nomics model

Stijn1 Nov 2022 9:15 UTC
11 points
0 comments6 min readEA link

[Linkpost] OpenAI is award­ing ten 100k grants for build­ing pro­to­types of a demo­cratic pro­cess for steer­ing AI

pseudonym26 May 2023 12:49 UTC
36 points
2 comments1 min readEA link
(openai.com)

[Linkpost] “Gover­nance of su­per­in­tel­li­gence” by OpenAI

Daniel_Eth22 May 2023 20:15 UTC
51 points
6 comments2 min readEA link
(openai.com)

Box in­ver­sion revisited

Jan_Kulveit7 Nov 2023 11:09 UTC
13 points
1 comment1 min readEA link

[Question] AI strat­egy ca­reer pipeline

Zach Stein-Perlman22 May 2023 0:00 UTC
72 points
23 comments1 min readEA link

Bandgaps, Brains, and Bioweapons: The limi­ta­tions of com­pu­ta­tional sci­ence and what it means for AGI

titotal26 May 2023 15:57 UTC
59 points
0 comments18 min readEA link

Please, some­one make a dataset of sup­posed cases of “tech panic”

Marcel D7 Nov 2023 2:49 UTC
4 points
2 comments2 min readEA link

Google in­vests $300mn in ar­tifi­cial in­tel­li­gence start-up An­thropic | FT

𝕮𝖎𝖓𝖊𝖗𝖆3 Feb 2023 19:43 UTC
155 points
5 comments1 min readEA link
(www.ft.com)

A Study of AI Science Models

Eleni_A13 May 2023 19:14 UTC
12 points
4 comments24 min readEA link

Yann LeCun on AGI and AI Safety

Chris Leong8 Aug 2023 23:43 UTC
23 points
4 comments1 min readEA link
(drive.google.com)

“Di­a­mon­doid bac­te­ria” nanobots: deadly threat or dead-end? A nan­otech in­ves­ti­ga­tion

titotal29 Sep 2023 14:01 UTC
102 points
33 comments20 min readEA link
(titotal.substack.com)

Pod­cast (+tran­script): Nathan Barnard on how US fi­nan­cial reg­u­la­tion can in­form AI governance

Aaron Bergman8 Aug 2023 21:46 UTC
12 points
0 comments23 min readEA link
(www.aaronbergman.net)

A re­cent write-up of the case for AI (ex­is­ten­tial) risk

Timsey18 May 2023 13:07 UTC
17 points
0 comments19 min readEA link

Will AI Avoid Ex­ploita­tion? (Adam Bales)

Global Priorities Institute13 Dec 2023 11:37 UTC
38 points
0 comments2 min readEA link

Stu­art J. Rus­sell on “should we press pause on AI?”

Kaleem18 Sep 2023 13:19 UTC
32 points
3 comments1 min readEA link
(podcasts.apple.com)

Some quotes from Tues­day’s Se­nate hear­ing on AI

Daniel_Eth17 May 2023 12:13 UTC
105 points
7 comments4 min readEA link

Cul­ture and Pro­gram­ming Ret­ro­spec­tive: ERA Fel­low­ship 2023

Gideon Futerman28 Sep 2023 16:45 UTC
16 points
0 comments10 min readEA link

Trends in the dol­lar train­ing cost of ma­chine learn­ing systems

Ben Cottier1 Feb 2023 14:48 UTC
63 points
3 comments1 min readEA link

The state of AI in differ­ent coun­tries — an overview

Lizka14 Sep 2023 10:37 UTC
68 points
6 comments13 min readEA link
(aisafetyfundamentals.com)

SPAR seeks ad­vi­sors and stu­dents for AI safety pro­jects (Se­cond Wave)

mic14 Sep 2023 23:09 UTC
14 points
0 comments1 min readEA link

AI safety field-build­ing sur­vey: Ta­lent needs, in­fras­truc­ture needs, and re­la­tion­ship to EA

michel27 Oct 2023 21:08 UTC
67 points
3 comments9 min readEA link

[Question] Ask­ing for on­line calls on AI s-risks discussions

jackchang11014 May 2023 13:58 UTC
26 points
3 comments1 min readEA link

What does it mean for an AGI to be ‘safe’?

So8res7 Oct 2022 4:43 UTC
53 points
21 comments1 min readEA link

Law & AI Din­ner—EAG Bos­ton 2023

Alfredo Parra 🔸12 Oct 2023 8:32 UTC
8 points
0 comments1 min readEA link

How “AGI” could end up be­ing many differ­ent spe­cial­ized AI’s stitched together

titotal8 May 2023 12:32 UTC
31 points
2 comments9 min readEA link

Ap­ply to lead a pro­ject dur­ing the next vir­tual AI Safety Camp

Linda Linsefors13 Sep 2023 13:29 UTC
16 points
0 comments1 min readEA link
(aisafety.camp)

Ag­gre­gat­ing Utilities for Cor­rigible AI [Feed­back Draft]

Dan H12 May 2023 20:57 UTC
12 points
0 comments1 min readEA link

How much do mar­kets value Open AI?

Ben_West🔸14 May 2023 19:28 UTC
39 points
13 comments4 min readEA link

All AGI Safety ques­tions wel­come (es­pe­cially ba­sic ones) [May 2023]

StevenKaas8 May 2023 22:30 UTC
19 points
11 comments1 min readEA link

Re­minder: AI Wor­ld­views Con­test Closes May 31

Jason Schukraft8 May 2023 17:40 UTC
20 points
0 comments1 min readEA link

ARC Evals: Re­spon­si­ble Scal­ing Policies

Zach Stein-Perlman28 Sep 2023 4:30 UTC
16 points
1 comment1 min readEA link
(evals.alignment.org)

An Anal­ogy for Un­der­stand­ing Transformers

TheMcDouglas13 May 2023 12:20 UTC
7 points
0 comments1 min readEA link

Un­veiling the Amer­i­can Public Opinion on AI Mo­ra­to­rium and Govern­ment In­ter­ven­tion: The Im­pact of Me­dia Exposure

Otto8 May 2023 10:49 UTC
28 points
5 comments6 min readEA link

Sam Alt­man /​ Open AI Dis­cus­sion Thread

John Salter20 Nov 2023 9:21 UTC
40 points
36 comments1 min readEA link

My model of how differ­ent AI risks fit together

Stephen Clare31 Jan 2024 17:09 UTC
63 points
4 comments7 min readEA link
(unfoldingatlas.substack.com)

OpenAI’s new Pre­pared­ness team is hiring

leopold26 Oct 2023 20:41 UTC
85 points
13 comments1 min readEA link

Pro­jects I would like to see (pos­si­bly at AI Safety Camp)

Linda Linsefors27 Sep 2023 21:27 UTC
9 points
0 comments1 min readEA link

AI Safety Seems Hard to Measure

Holden Karnofsky11 Dec 2022 1:31 UTC
90 points
4 comments14 min readEA link
(www.cold-takes.com)

New re­port on the state of AI safety in China

Geoffrey Miller27 Oct 2023 20:20 UTC
22 points
0 comments3 min readEA link
(concordia-consulting.com)

The Parable of the Boy Who Cried 5% Chance of Wolf

Kat Woods15 Aug 2022 14:22 UTC
80 points
8 comments2 min readEA link

Re­grant up to $600,000 to AI safety pro­jects with GiveWiki

Dawn Drescher28 Oct 2023 19:56 UTC
22 points
0 comments3 min readEA link

AI risk/​re­ward: A sim­ple model

Nathan Young4 May 2023 19:12 UTC
37 points
5 comments7 min readEA link

[Question] Ask­ing for on­line re­sources why AI now is near AGI

jackchang11018 May 2023 0:04 UTC
6 points
4 comments1 min readEA link

Many AI gov­er­nance pro­pos­als have a trade­off be­tween use­ful­ness and feasibility

Akash3 Feb 2023 18:49 UTC
22 points
0 comments1 min readEA link

Thread: Reflec­tions on the AGI Safety Fun­da­men­tals course?

Clifford18 May 2023 13:11 UTC
27 points
7 comments1 min readEA link

Are there enough op­por­tu­ni­ties for AI safety spe­cial­ists?

mhint19913 May 2023 21:18 UTC
8 points
2 comments3 min readEA link

Re­think Pri­ori­ties is hiring a Com­pute Gover­nance Re­searcher or Re­search Assistant

MichaelA🔸7 Jun 2023 13:22 UTC
36 points
2 comments8 min readEA link
(careers.rethinkpriorities.org)

Un-un­plug­ga­bil­ity—can’t we just un­plug it?

Oliver Sourbut15 May 2023 13:23 UTC
15 points
0 comments1 min readEA link
(www.oliversourbut.net)

Order Mat­ters for De­cep­tive Alignment

DavidW15 Feb 2023 20:12 UTC
20 points
1 comment1 min readEA link
(www.lesswrong.com)

I don’t want to talk about ai

Kirsten22 May 2023 21:19 UTC
7 points
0 comments1 min readEA link
(ealifestyles.substack.com)

Quick sur­vey on AI al­ign­ment resources

frances_lorenz30 Jun 2022 19:08 UTC
15 points
0 comments1 min readEA link

The Po­lar­ity Prob­lem [Draft]

Dan H23 May 2023 21:05 UTC
11 points
0 comments1 min readEA link

[Link post] Michael Niel­sen’s “Notes on Ex­is­ten­tial Risk from Ar­tifi­cial Su­per­in­tel­li­gence”

Joel Becker19 Sep 2023 13:31 UTC
38 points
1 comment6 min readEA link
(michaelnotebook.com)

[Question] How to hedge in­vest­ment port­fo­lio against AI risk?

Timothy_Liptrot31 Jan 2023 8:04 UTC
8 points
0 comments1 min readEA link

The Retroac­tive Fund­ing Land­scape: In­no­va­tions for Donors and Grantmakers

Dawn Drescher29 Sep 2023 17:39 UTC
17 points
2 comments19 min readEA link
(impactmarkets.substack.com)

My AI Align­ment Re­search Agenda and Threat Model, right now (May 2023)

Nicholas / Heather Kross28 May 2023 3:23 UTC
6 points
0 comments1 min readEA link

Cal­ling for Stu­dent Sub­mis­sions: AI Safety Distil­la­tion Contest

a_e_r23 Apr 2022 20:24 UTC
102 points
28 comments3 min readEA link

EA, Psy­chol­ogy & AI Safety Research

Sam Ellis26 May 2022 23:46 UTC
28 points
3 comments6 min readEA link

Align­ment is mostly about mak­ing cog­ni­tion aimable at all

So8res30 Jan 2023 15:22 UTC
57 points
3 comments1 min readEA link

Re­think Pri­ori­ties’ 2023 Sum­mary, 2024 Strat­egy, and Fund­ing Gaps

kierangreig🔸15 Nov 2023 20:56 UTC
86 points
7 comments3 min readEA link

Fram­ing AI strategy

Zach Stein-Perlman7 Feb 2023 20:03 UTC
16 points
0 comments1 min readEA link
(www.lesswrong.com)

Com­pendium of prob­lems with RLHF

Raphaël S30 Jan 2023 8:48 UTC
18 points
0 comments1 min readEA link

Value frag­ility and AI takeover

Joe_Carlsmith5 Aug 2024 21:28 UTC
38 points
3 comments1 min readEA link

Tech­nolog­i­cal de­vel­op­ments that could in­crease risks from nu­clear weapons: A shal­low review

MichaelA🔸9 Feb 2023 15:41 UTC
79 points
3 comments5 min readEA link
(bit.ly)

Up­date on cause area fo­cus work­ing group

Bastian_Stern10 Aug 2023 1:21 UTC
140 points
18 comments5 min readEA link

In­tent al­ign­ment should not be the goal for AGI x-risk reduction

johnjnay26 Oct 2022 1:24 UTC
7 points
1 comment1 min readEA link

Jobs that can help with the most im­por­tant century

Holden Karnofsky12 Feb 2023 18:19 UTC
57 points
2 comments32 min readEA link
(www.cold-takes.com)

A note of cau­tion about re­cent AI risk coverage

Sean_o_h7 Jun 2023 17:05 UTC
283 points
29 comments3 min readEA link

Qual­ities that al­ign­ment men­tors value in ju­nior researchers

Akash14 Feb 2023 23:27 UTC
31 points
1 comment1 min readEA link

An Ex­er­cise to Build In­tu­itions on AGI Risk

Lauro Langosco8 Jun 2023 11:20 UTC
4 points
0 comments8 min readEA link
(www.alignmentforum.org)

Have your say on the Aus­tralian Govern­ment’s AI Policy [Bris­bane]

Michael Noetel 🔸9 Jun 2023 0:15 UTC
6 points
0 comments1 min readEA link

Fo­cus­ing your im­pact on short vs long TAI timelines

kuhanj30 Sep 2023 19:23 UTC
44 points
0 comments10 min readEA link

Ideas for AI labs: Read­ing list

Zach Stein-Perlman24 Apr 2023 19:00 UTC
28 points
2 comments1 min readEA link

Join AISafety.info’s Distil­la­tion Hackathon (Oct 6-9th)

leillustrations🔸1 Oct 2023 18:42 UTC
27 points
2 comments2 min readEA link
(www.lesswrong.com)

<$750k grants for Gen­eral Pur­pose AI As­surance/​Safety Research

Phosphorous13 Jun 2023 4:51 UTC
37 points
0 comments1 min readEA link
(cset.georgetown.edu)

Will re­leas­ing the weights of large lan­guage mod­els grant wide­spread ac­cess to pan­demic agents?

Jeff Kaufman 🔸30 Oct 2023 17:42 UTC
56 points
18 comments1 min readEA link
(arxiv.org)

The Im­por­tance of AI Align­ment, ex­plained in 5 points

Daniel_Eth11 Feb 2023 2:56 UTC
50 points
4 comments13 min readEA link

11 heuris­tics for choos­ing (al­ign­ment) re­search projects

Akash27 Jan 2023 0:36 UTC
30 points
1 comment1 min readEA link

What is it like do­ing AI safety work?

Kat Woods21 Feb 2023 19:24 UTC
99 points
2 comments10 min readEA link

Linkpost: Dwarkesh Pa­tel in­ter­view­ing Carl Shulman

Stefan_Schubert14 Jun 2023 15:30 UTC
110 points
5 comments1 min readEA link
(podcastaddict.com)

4 ways to think about de­moc­ra­tiz­ing AI [GovAI Linkpost]

Akash13 Feb 2023 18:06 UTC
35 points
0 comments1 min readEA link

What Does a Marginal Grant at LTFF Look Like? Fund­ing Pri­ori­ties and Grant­mak­ing Thresh­olds at the Long-Term Fu­ture Fund

Linch10 Aug 2023 20:11 UTC
175 points
22 comments8 min readEA link

‘AI Emer­gency Eject Cri­te­ria’ Survey

tcelferact19 Apr 2023 21:55 UTC
5 points
3 comments1 min readEA link

AI Risk Man­age­ment Frame­work | NIST

𝕮𝖎𝖓𝖊𝖗𝖆26 Jan 2023 15:27 UTC
50 points
0 comments1 min readEA link

5 Rea­sons Why Govern­ments/​Mili­taries Already Want AI for In­for­ma­tion Warfare

trevor112 Nov 2023 18:24 UTC
5 points
0 comments1 min readEA link

AI policy ideas: Read­ing list

Zach Stein-Perlman17 Apr 2023 19:00 UTC
60 points
3 comments1 min readEA link

Ge­offrey Miller on Cross-Cul­tural Un­der­stand­ing Between China and Western Coun­tries as a Ne­glected Con­sid­er­a­tion in AI Alignment

Evan_Gaensbauer17 Apr 2023 3:26 UTC
25 points
2 comments4 min readEA link

2023 Align­ment Re­search Up­dates from FAR AI

AdamGleave4 Dec 2023 22:32 UTC
14 points
0 comments1 min readEA link
(far.ai)

Vir­tual AI Safety Un­con­fer­ence (VAISU)

Nguyên20 Jun 2023 9:47 UTC
14 points
0 comments1 min readEA link

AI Takeover Sce­nario with Scaled LLMs

simeon_c16 Apr 2023 23:28 UTC
29 points
1 comment1 min readEA link

Or­ga­niz­ing a de­bate with ex­perts and MPs to raise AI xrisk aware­ness: a pos­si­ble blueprint

Otto19 Apr 2023 10:50 UTC
75 points
1 comment4 min readEA link

Next steps af­ter AGISF at UMich

JakubK25 Jan 2023 20:57 UTC
18 points
1 comment1 min readEA link

[Question] What harm could AI safety do?

SeanEngelhart15 May 2021 1:11 UTC
12 points
7 comments1 min readEA link

AGI in sight: our look at the game board

Andrea_Miotti18 Feb 2023 22:17 UTC
25 points
18 comments1 min readEA link

Ex­cerpts from “Do­ing EA Bet­ter” on x-risk methodology

Eevee🔹26 Jan 2023 1:04 UTC
22 points
5 comments6 min readEA link
(forum.effectivealtruism.org)

[Linkpost] The A.I. Dilemma—March 9, 2023, with Tris­tan Har­ris and Aza Raskin

PeterSlattery14 Apr 2023 8:00 UTC
38 points
3 comments41 min readEA link
(youtu.be)

Spread­ing mes­sages to help with the most im­por­tant century

Holden Karnofsky25 Jan 2023 20:35 UTC
128 points
21 comments18 min readEA link
(www.cold-takes.com)

Nav­i­gat­ing AI Risks (NAIR) #1: Slow­ing Down AI

simeon_c14 Apr 2023 14:35 UTC
12 points
1 comment1 min readEA link

An­nounc­ing Epoch’s dash­board of key trends and figures in Ma­chine Learning

Jaime Sevilla13 Apr 2023 7:33 UTC
127 points
4 comments1 min readEA link

Don’t Call It AI Alignment

Gil20 Feb 2023 5:27 UTC
16 points
7 comments2 min readEA link

[Question] Ev­i­dence to pri­ori­tize or work­ing on AI as the most im­pact­ful thing?

Vaipan22 Sep 2023 8:43 UTC
9 points
6 comments1 min readEA link

AIs ac­cel­er­at­ing AI research

Ajeya12 Apr 2023 11:41 UTC
84 points
7 comments4 min readEA link

[MLSN #9] Ver­ify­ing large train­ing runs, se­cu­rity risks from LLM ac­cess to APIs, why nat­u­ral se­lec­tion may fa­vor AIs over humans

TW12311 Apr 2023 16:05 UTC
18 points
0 comments6 min readEA link
(newsletter.mlsafety.org)

kpurens’s Quick takes

kpurens11 Apr 2023 14:10 UTC
9 points
2 comments2 min readEA link

Why peo­ple want to work on AI safety (but don’t)

Emily Grundy24 Jan 2023 6:41 UTC
70 points
10 comments7 min readEA link

AI Safety Newslet­ter #1 [CAIS Linkpost]

Akash10 Apr 2023 20:18 UTC
38 points
0 comments1 min readEA link

CEEALAR: 2024 Update

CEEALAR19 Jul 2024 11:14 UTC
116 points
7 comments4 min readEA link

An EA used deceptive messaging to advance her project; we need mechanisms to avoid deontologically dubious plans

MikhailSamin13 Feb 2024 23:11 UTC
22 points
39 comments5 min readEA link

Metaculus’ predictions are much better than low-information priors

Vasco Grilo🔸11 Apr 2023 8:36 UTC
53 points
0 comments6 min readEA link

Survey on the acceleration risks of our new RFPs to study LLM capabilities

Ajeya10 Nov 2023 23:59 UTC
38 points
1 comment8 min readEA link

Apply for mentorship in AI Safety field-building

Akash17 Sep 2022 19:03 UTC
21 points
0 comments1 min readEA link

Cruxes on US lead for some domestic AI regulation

Zach Stein-Perlman10 Sep 2023 18:00 UTC
20 points
6 comments2 min readEA link

[Question] Which stocks or ETFs should you invest in to take advantage of a possible AGI explosion, and why?

Eevee🔹10 Apr 2023 17:55 UTC
19 points
16 comments1 min readEA link

Humans are not prepared to operate outside their moral training distribution

Prometheus10 Apr 2023 21:44 UTC
12 points
0 comments1 min readEA link

Applications open: Support for talent working on independent learning, research or entrepreneurial projects focused on reducing global catastrophic risks

CEEALAR9 Feb 2024 13:04 UTC
63 points
1 comment2 min readEA link

My highly personal skepticism braindump on existential risk from artificial intelligence.

NunoSempere23 Jan 2023 20:08 UTC
435 points
116 comments14 min readEA link
(nunosempere.com)

[Question] Why might AI be a x-risk? Succinct explanations please

Sanjay4 Apr 2023 12:46 UTC
20 points
9 comments1 min readEA link

Misgeneralization as a misnomer

So8res6 Apr 2023 20:43 UTC
48 points
0 comments1 min readEA link

Beren’s “Deconfusing Direct vs Amortised Optimisation”

𝕮𝖎𝖓𝖊𝖗𝖆7 Apr 2023 8:57 UTC
9 points
0 comments1 min readEA link

[Question] Imagine AGI killed us all in three years. What would have been our biggest mistakes?

yanni kyriacos7 Apr 2023 0:06 UTC
17 points
6 comments1 min readEA link

Recursive Middle Manager Hell

Raemon17 Jan 2023 19:02 UTC
73 points
3 comments1 min readEA link

Is it time for a pause?

Kelsey Piper6 Apr 2023 11:48 UTC
103 points
6 comments5 min readEA link

Orthogonality is Expensive

𝕮𝖎𝖓𝖊𝖗𝖆3 Apr 2023 1:57 UTC
18 points
4 comments1 min readEA link

EA Infosec: skill up in or make a transition to infosec via this book club

Jason Clinton5 Mar 2023 21:02 UTC
170 points
16 comments2 min readEA link

Saying ‘AI safety research is a Pascal’s Mugging’ isn’t a strong response

Robert_Wiblin15 Dec 2015 13:48 UTC
15 points
16 comments2 min readEA link

An ‘AGI Emergency Eject Criteria’ consensus could be really useful.

tcelferact7 Apr 2023 16:21 UTC
27 points
3 comments1 min readEA link

GPTs are Predictors, not Imitators

EliezerYudkowsky8 Apr 2023 19:59 UTC
74 points
12 comments1 min readEA link

OpenAI o1

Zach Stein-Perlman12 Sep 2024 18:54 UTC
38 points
0 comments1 min readEA link

Investigating an insurance-for-AI startup

L Rudolf L21 Sep 2024 15:29 UTC
40 points
1 comment1 min readEA link
(www.strataoftheworld.com)

Race to the Top: Benchmarks for AI Safety

isaduan4 Dec 2022 22:50 UTC
51 points
8 comments1 min readEA link

AISafety.world is a map of the AIS ecosystem

Hamish McDoodles6 Apr 2023 11:47 UTC
190 points
8 comments1 min readEA link

We might get lucky with AGI warning shots. Let’s be ready!

tcelferact31 Mar 2023 21:37 UTC
22 points
2 comments1 min readEA link

Distinctions when Discussing Utility Functions

Ozzie Gooen8 Mar 2024 18:43 UTC
15 points
5 comments8 min readEA link

[Question] Can we evaluate the “tool versus agent” AGI prediction?

Ben_West🔸8 Apr 2023 18:35 UTC
63 points
7 comments1 min readEA link

New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development

Akash5 Apr 2023 1:26 UTC
143 points
34 comments1 min readEA link

Widening Overton Window—Open Thread

Prometheus31 Mar 2023 10:06 UTC
12 points
5 comments1 min readEA link
(www.lesswrong.com)

Deference on AI timelines: survey results

Sam Clarke30 Mar 2023 23:03 UTC
68 points
3 comments2 min readEA link

Recruit the World’s best for AGI Alignment

Greg_Colbourn30 Mar 2023 16:41 UTC
34 points
8 comments22 min readEA link

Nuclear brinksmanship is not a good AI x-risk strategy

titotal30 Mar 2023 22:07 UTC
19 points
8 comments5 min readEA link

AI and Evolution

Dan H30 Mar 2023 13:09 UTC
41 points
1 comment2 min readEA link
(arxiv.org)

How LDT helps reduce the AI arms race

Tamsin Leake10 Dec 2023 16:21 UTC
8 points
1 comment1 min readEA link
(carado.moe)

Nobody’s on the ball on AGI alignment

leopold29 Mar 2023 14:26 UTC
327 points
65 comments9 min readEA link
(www.forourposterity.com)

[Draft] The humble cosmologist’s P(doom) paradox

titotal16 Mar 2024 11:13 UTC
38 points
6 comments10 min readEA link

[TIME magazine] DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution (Perrigo, 2023)

Will Aldred20 Jan 2023 20:37 UTC
93 points
0 comments1 min readEA link
(time.com)

Want to win the AGI race? Solve alignment.

leopold29 Mar 2023 15:19 UTC
56 points
6 comments5 min readEA link
(www.forourposterity.com)

“Dangers of AI and the End of Human Civilization” Yudkowsky on Lex Fridman

𝕮𝖎𝖓𝖊𝖗𝖆30 Mar 2023 15:44 UTC
28 points
0 comments1 min readEA link

A rough and incomplete review of some of John Wentworth’s research

So8res28 Mar 2023 18:52 UTC
27 points
0 comments1 min readEA link

DeepMind: Evaluating Frontier Models for Dangerous Capabilities

Zach Stein-Perlman21 Mar 2024 23:00 UTC
28 points
0 comments1 min readEA link
(arxiv.org)

Semi-conductor / AI stocks discussion.

sapphire25 Nov 2022 23:35 UTC
10 points
3 comments1 min readEA link

What would a compute monitoring plan look like? [Linkpost]

Akash26 Mar 2023 19:33 UTC
61 points
1 comment1 min readEA link

A stylized dialogue on John Wentworth’s claims about markets and optimization

So8res25 Mar 2023 22:32 UTC
18 points
0 comments1 min readEA link

[Question] Please help me sense-check my assumptions about the needs of the AI Safety community and related career plans

PeterSlattery27 Mar 2023 8:11 UTC
23 points
27 comments2 min readEA link

Successif: Join our AI program to help mitigate the catastrophic risks of AI

ClaireB25 Oct 2023 16:51 UTC
15 points
0 comments5 min readEA link

My attempt at explaining the case for AI risk in a straightforward way

JulianHazell25 Mar 2023 16:32 UTC
25 points
7 comments18 min readEA link
(muddyclothes.substack.com)

[Question] AI+bio cannot be half of AI catastrophe risk, right?

Benevolent_Rain10 Oct 2023 3:17 UTC
23 points
11 comments2 min readEA link

13 Very Different Stances on AGI

Ozzie Gooen27 Dec 2021 23:30 UTC
84 points
23 comments3 min readEA link

AI alignment shouldn’t be conflated with AI moral achievement

Matthew_Barnett30 Dec 2023 3:08 UTC
114 points
15 comments5 min readEA link

Guardrails vs Goal-directedness in AI Alignment

freedomandutility30 Dec 2023 12:58 UTC
13 points
2 comments1 min readEA link

Civil disobedience opportunity—a way to help reduce chance of hard takeoff from recursive self improvement of code

JonCefalu25 Mar 2023 22:37 UTC
−5 points
0 comments1 min readEA link
(codegencodepoisoningcontest.cargo.site)

EA Wins 2023

Shakeel Hashim31 Dec 2023 14:07 UTC
357 points
9 comments3 min readEA link

Truth and Advantage: Response to a draft of “AI safety seems hard to measure”

So8res22 Mar 2023 3:36 UTC
11 points
0 comments1 min readEA link

Ideas for improving epistemics in AI safety outreach

mic21 Aug 2023 19:56 UTC
31 points
0 comments3 min readEA link
(www.lesswrong.com)

[Question] What is AI Safety’s line of retreat?

Remmelt28 Jul 2024 5:43 UTC
4 points
2 comments1 min readEA link

Collin Burns on Alignment Research And Discovering Latent Knowledge Without Supervision

Michaël Trazzi17 Jan 2023 17:21 UTC
21 points
3 comments1 min readEA link

[Linkpost] Prospect Magazine—How to save humanity from extinction

jackva26 Sep 2023 19:16 UTC
32 points
2 comments1 min readEA link
(www.prospectmagazine.co.uk)

Measuring AI-Driven Risk with Stock Prices (Susana Campos-Martins)

Global Priorities Institute12 Dec 2024 14:22 UTC
10 points
1 comment4 min readEA link
(globalprioritiesinstitute.org)

[Question] Will AI Worldview Prize Funding Be Replaced?

Jordan Arel13 Nov 2022 17:10 UTC
26 points
4 comments1 min readEA link

Apply to CEEALAR to do AGI moratorium work

Greg_Colbourn26 Jul 2023 21:24 UTC
62 points
0 comments1 min readEA link

Shallow review of live agendas in alignment & safety

technicalities27 Nov 2023 11:33 UTC
76 points
8 comments29 min readEA link

Metaculus Predicts Weak AGI in 2 Years and AGI in 10

Chris Leong24 Mar 2023 19:43 UTC
27 points
12 comments1 min readEA link

Announcing the ITAM AI Futures Fellowship

AmAristizabal28 Jul 2023 16:44 UTC
43 points
3 comments2 min readEA link

Paul Christiano on Dwarkesh Podcast

ESRogs3 Nov 2023 22:13 UTC
5 points
0 comments1 min readEA link
(www.dwarkeshpatel.com)

Announcing the Pivotal Research Fellowship – Apply Now!

Tobias Häberli3 Apr 2024 17:30 UTC
51 points
5 comments2 min readEA link

The Navigation Fund launched + is hiring a program officer to lead the distribution of $20M annually for AI safety! Full-time, fully remote, pay starts at $200k

vincentweisser3 Nov 2023 21:53 UTC
120 points
3 comments1 min readEA link

Announcing Epoch’s newly expanded Parameters, Compute and Data Trends in Machine Learning database

Robi Rahman25 Oct 2023 3:03 UTC
38 points
1 comment1 min readEA link
(epochai.org)

The Top AI Safety Bets for 2023: GiveWiki’s Latest Recommendations

Dawn Drescher11 Nov 2023 9:04 UTC
11 points
4 comments8 min readEA link

AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years

basil.halperin10 Jan 2023 16:05 UTC
342 points
177 comments26 min readEA link

Call for Papers on Global AI Governance from the UN

Chris Leong20 Aug 2023 8:56 UTC
36 points
1 comment1 min readEA link
(www.linkedin.com)

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe_Carlsmith25 Nov 2023 17:51 UTC
14 points
1 comment1 min readEA link

Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

mic19 Aug 2023 2:32 UTC
18 points
1 comment6 min readEA link
(www.lesswrong.com)

Beware safety-washing

Lizka13 Jan 2023 10:39 UTC
143 points
7 comments4 min readEA link

Defining alignment research

richard_ngo19 Aug 2024 22:49 UTC
48 points
1 comment1 min readEA link

Longtermism Fund: August 2023 Grants Report

Michael Townsend🔸20 Aug 2023 5:34 UTC
81 points
3 comments5 min readEA link

[Question] Game theory work on AI alignment with diverse AI systems, human individuals, & human groups?

Geoffrey Miller2 Mar 2023 16:50 UTC
22 points
2 comments1 min readEA link

How ARENA course material gets made

TheMcDouglas2 Jul 2024 7:27 UTC
12 points
0 comments1 min readEA link

Silicon Valley’s Rabbit Hole Problem

Mandelbrot8 Oct 2023 12:25 UTC
34 points
44 comments11 min readEA link
(medium.com)

[Question] What is the counterfactual value of different AI Safety professionals?

PabloAMC 🔸3 Jul 2024 14:38 UTC
6 points
2 comments1 min readEA link

The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate

Hauke Hillebrandt29 Oct 2023 8:38 UTC
12 points
0 comments1 min readEA link

AI Timelines: The Proposed Arguments and Where the “Experts” Stand

EA Japan17 Aug 2023 14:59 UTC
2 points
0 comments1 min readEA link

Victoria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment

Michaël Trazzi12 Jan 2023 17:09 UTC
16 points
0 comments1 min readEA link

What is autonomy, and how does it lead to greater risk from AI?

Davidmanheim1 Aug 2023 8:06 UTC
10 points
0 comments6 min readEA link
(www.lesswrong.com)

Linkpost: 7 A.I. Companies Agree to Safeguards After Pressure From the White House

MHR🔸21 Jul 2023 13:23 UTC
61 points
4 comments1 min readEA link
(www.nytimes.com)

VIRTUA: a novel about AI alignment

Karl von Wendt12 Jan 2023 9:37 UTC
23 points
0 comments1 min readEA link

Situational awareness (Section 2.1 of “Scheming AIs”)

Joe_Carlsmith26 Nov 2023 23:00 UTC
12 points
1 comment1 min readEA link

The Overton Window widens: Examples of AI risk in the media

Akash23 Mar 2023 17:10 UTC
112 points
11 comments1 min readEA link

Transcript: NBC Nightly News: AI ‘race to recklessness’ w/ Tristan Harris, Aza Raskin

WilliamKiely23 Mar 2023 3:45 UTC
47 points
1 comment1 min readEA link

Introducing the new Riesgos Catastróficos Globales team

Jaime Sevilla3 Mar 2023 23:04 UTC
74 points
3 comments5 min readEA link
(riesgoscatastroficosglobales.com)

Excerpts from “Majority Leader Schumer Delivers Remarks To Launch SAFE Innovation Framework For Artificial Intelligence At CSIS”

Chris Leong21 Jul 2023 23:15 UTC
19 points
0 comments1 min readEA link
(www.democrats.senate.gov)

AGI Takeoff dynamics—Intelligence vs Quantity explosion

EdoArad26 Jul 2023 9:20 UTC
14 points
0 comments2 min readEA link
(github.com)

Cost-effectiveness of professional field-building programs for AI safety research

Center for AI Safety10 Jul 2023 17:26 UTC
38 points
2 comments18 min readEA link

The US-China Relationship and Catastrophic Risk (EAG Boston transcript)

EA Global9 Jul 2024 13:50 UTC
30 points
1 comment19 min readEA link

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

Center for AI Safety21 Aug 2024 18:10 UTC
17 points
0 comments6 min readEA link
(newsletter.safe.ai)

US Congress introduces CREATE AI Act for establishing National AI Research Resource

Daniel_Eth28 Jul 2023 23:27 UTC
9 points
1 comment1 min readEA link
(eshoo.house.gov)

Cost-effectiveness of student programs for AI safety research

Center for AI Safety10 Jul 2023 17:23 UTC
53 points
7 comments15 min readEA link

White House publishes framework for Nucleic Acid Screening

Agustín Covarrubias 🔸30 Apr 2024 0:44 UTC
30 points
1 comment1 min readEA link
(www.whitehouse.gov)

Theories of Change for Track II Diplomacy [Founders Pledge]

christian.r9 Jul 2024 13:31 UTC
20 points
2 comments33 min readEA link

[Question] Strongest real-world examples supporting AI risk claims?

rosehadshar5 Sep 2023 15:11 UTC
52 points
9 comments1 min readEA link

We Did AGISF’s 8-week Course in 3 Days. Here’s How it Went

ag400024 Jul 2022 16:46 UTC
26 points
7 comments6 min readEA link

Have your say on the Australian Government’s AI Policy

Nathan Sherburn17 Jul 2023 11:02 UTC
3 points
1 comment1 min readEA link

[Question] What is the easiest/funnest way to build up a comprehensive understanding of AI and AI Safety?

Jordan Arel30 Apr 2024 18:39 UTC
14 points
0 comments1 min readEA link

Thoughts on yesterday’s UN Security Council meeting on AI

Greg_Colbourn19 Jul 2023 16:46 UTC
31 points
2 comments1 min readEA link

Modeling the impact of AI safety field-building programs

Center for AI Safety10 Jul 2023 17:22 UTC
83 points
0 comments7 min readEA link

Debate series: should we push for a pause on the development of AI?

Ben_West🔸8 Sep 2023 16:29 UTC
252 points
58 comments1 min readEA link

Some reasons to start a project to stop harmful AI

Remmelt22 Aug 2024 16:23 UTC
5 points
0 comments1 min readEA link

Five Years of Rethink Priorities: Impact, Future Plans, Funding Needs (July 2023)

Rethink Priorities18 Jul 2023 15:59 UTC
110 points
3 comments16 min readEA link

We need non-cybersecurity people [too]

Jarrah5 May 2024 0:11 UTC
32 points
0 comments2 min readEA link

How I Formed My Own Views About AI Safety

Neel Nanda27 Feb 2022 18:52 UTC
134 points
12 comments14 min readEA link
(www.neelnanda.io)

Levelling Up in AI Safety Research Engineering

GabeM2 Sep 2022 4:59 UTC
165 points
21 comments17 min readEA link

Reading list on AI agents and associated policy

Peter Wildeford9 Aug 2024 17:40 UTC
79 points
2 comments1 min readEA link

Is this community over-emphasizing AI alignment?

Lixiang8 Jan 2023 6:23 UTC
1 point
5 comments1 min readEA link

New DeepMind report on institutions for global AI governance

finm14 Jul 2023 16:05 UTC
10 points
0 comments1 min readEA link
(www.deepmind.com)

Announcing the Existential InfoSec Forum

calebp7 Jul 2023 21:08 UTC
90 points
1 comment2 min readEA link

[Linkpost] Jan Leike on three kinds of alignment taxes

Akash6 Jan 2023 23:57 UTC
29 points