
Joe_Carlsmith

Karma: 3,538

Senior advisor at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Summing up “Scheming AIs” (Section 5)

Joe_Carlsmith · Dec 9, 2023, 3:48 PM

9 points

1 comment · 10 min read · EA link

Speed arguments against scheming (Sections 4.4-4.7 of “Scheming AIs”)

Joe_Carlsmith · Dec 8, 2023, 9:10 PM

6 points

0 comments · 11 min read · EA link

Simplicity arguments for scheming (Section 4.3 of “Scheming AIs”)

Joe_Carlsmith · Dec 7, 2023, 3:05 PM

6 points

1 comment · 14 min read · EA link

The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)

Joe_Carlsmith · Dec 6, 2023, 7:28 PM

9 points

1 comment · 7 min read · EA link

Arguments for/against scheming that focus on the path SGD takes (Section 3 of “Scheming AIs”)

Joe_Carlsmith · Dec 5, 2023, 6:48 PM

7 points

1 comment · 20 min read · EA link

Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)

Joe_Carlsmith · Dec 4, 2023, 6:44 PM

12 points

1 comment · 16 min read · EA link

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”)

Joe_Carlsmith · Dec 3, 2023, 6:32 PM

6 points

1 comment · 15 min read · EA link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)

Joe_Carlsmith · Dec 2, 2023, 3:20 PM

6 points

1 comment · 12 min read · EA link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe_Carlsmith · Dec 1, 2023, 2:51 PM

6 points

0 comments · 6 min read · EA link

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 30, 2023, 4:43 PM

6 points

1 comment · 5 min read · EA link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe_Carlsmith · Nov 29, 2023, 4:32 PM

7 points

0 comments · 10 min read · EA link

Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 28, 2023, 1:49 PM

8 points

0 comments · 13 min read · EA link

Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 27, 2023, 6:01 PM

11 points

1 comment · 8 min read · EA link

Situational awareness (Section 2.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 26, 2023, 11:00 PM

12 points

1 comment · 6 min read · EA link

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe_Carlsmith · Nov 25, 2023, 5:51 PM

14 points

1 comment · 5 min read · EA link

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)

Joe_Carlsmith · Nov 24, 2023, 7:18 PM

10 points

1 comment · 20 min read · EA link

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)

Joe_Carlsmith · Nov 22, 2023, 3:24 PM

6 points

0 comments · 6 min read · EA link

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe_Carlsmith · Nov 21, 2023, 3:00 PM

6 points

0 comments · 10 min read · EA link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe_Carlsmith · Nov 15, 2023, 5:16 PM

71 points

4 comments · 30 min read · EA link