Joe_Carlsmith

Karma: 2,399

Senior research analyst at Open Philanthropy. Doctorate in philosophy at the University of Oxford. Opinions my own.

Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)

Joe_Carlsmith, 28 Nov 2023 13:49 UTC
6 points
0 comments, 1 min read

Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)

Joe_Carlsmith, 27 Nov 2023 18:01 UTC
11 points
1 comment, 1 min read

Situational awareness (Section 2.1 of “Scheming AIs”)

Joe_Carlsmith, 26 Nov 2023 23:00 UTC
6 points
1 comment, 1 min read

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe_Carlsmith, 25 Nov 2023 17:51 UTC
14 points
1 comment, 1 min read

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)

Joe_Carlsmith, 24 Nov 2023 19:18 UTC
10 points
1 comment, 1 min read

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)

Joe_Carlsmith, 22 Nov 2023 15:24 UTC
6 points
0 comments, 1 min read

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe_Carlsmith, 21 Nov 2023 15:00 UTC
6 points
0 comments, 1 min read

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe_Carlsmith, 15 Nov 2023 17:16 UTC
61 points
3 comments, 1 min read

Superforecasting the premises in “Is power-seeking AI an existential risk?”

Joe_Carlsmith, 18 Oct 2023 20:33 UTC
106 points
3 comments, 1 min read

In memory of Louise Glück

Joe_Carlsmith, 15 Oct 2023 3:10 UTC
22 points
2 comments, 8 min read

The “no sandbagging on checkable tasks” hypothesis

Joe_Carlsmith, 31 Jul 2023 23:13 UTC
10 points
0 comments, 9 min read

Predictable updating about AI risk

Joe_Carlsmith, 8 May 2023 22:05 UTC
129 points
12 comments, 36 min read

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe_Carlsmith, 22 Mar 2023 18:06 UTC
49 points
1 comment, 1 min read

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe_Carlsmith, 21 Feb 2023 17:16 UTC
64 points
0 comments, 1 min read

Seeing more whole

Joe_Carlsmith, 17 Feb 2023 5:14 UTC
126 points
8 comments, 26 min read

Why should ethical anti-realists do ethics?

Joe_Carlsmith, 16 Feb 2023 16:27 UTC
118 points
10 comments, 27 min read