Joe_Carlsmith

Karma: 3,441

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 30, 2023, 4:43 PM
6 points
1 comment · EA link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe_Carlsmith · Nov 29, 2023, 4:32 PM
7 points
0 comments · EA link

Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 28, 2023, 1:49 PM
8 points
0 comments · EA link

Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 27, 2023, 6:01 PM
11 points
1 comment · EA link

Situational awareness (Section 2.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 26, 2023, 11:00 PM
12 points
1 comment · EA link

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe_Carlsmith · Nov 25, 2023, 5:51 PM
14 points
1 comment · EA link

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)

Joe_Carlsmith · Nov 24, 2023, 7:18 PM
10 points
1 comment · EA link

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 22, 2023, 3:24 PM
6 points
0 comments · EA link

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 21, 2023, 3:00 PM
6 points
0 comments · EA link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe_Carlsmith · Nov 15, 2023, 5:16 PM
71 points
4 comments · EA link

Superforecasting the premises in “Is power-seeking AI an existential risk?”

Joe_Carlsmith · Oct 18, 2023, 8:33 PM
114 points
3 comments · EA link

In memory of Louise Glück

Joe_Carlsmith · Oct 15, 2023, 3:10 AM
22 points
2 comments · 8 min read · EA link

The “no sandbagging on checkable tasks” hypothesis

Joe_Carlsmith · Jul 31, 2023, 11:13 PM
10 points
0 comments · 9 min read · EA link

Predictable updating about AI risk

Joe_Carlsmith · May 8, 2023, 10:05 PM
134 points
12 comments · 36 min read · EA link

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe_Carlsmith · Mar 22, 2023, 6:06 PM
49 points
1 comment · 1 min read · EA link

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe_Carlsmith · Feb 21, 2023, 5:16 PM
64 points
0 comments · 1 min read · EA link

Seeing more whole

Joe_Carlsmith · Feb 17, 2023, 5:14 AM
122 points
9 comments · 26 min read · EA link

Why should ethical anti-realists do ethics?

Joe_Carlsmith · Feb 16, 2023, 4:27 PM
118 points
10 comments · 27 min read · EA link

[Linkpost] Human-narrated audio version of “Is Power-Seeking AI an Existential Risk?”

Joe_Carlsmith · Jan 31, 2023, 7:19 PM
9 points
0 comments · 1 min read · EA link

On sincerity

Joe_Carlsmith · Dec 23, 2022, 5:14 PM
46 points
3 comments · 42 min read · EA link