Joe_Carlsmith

Karma: 3,736

Former senior advisor at Open Philanthropy. Doctorate in philosophy at the University of Oxford. Opinions my own.

How human-like do safe AI motivations need to be?

Joe_Carlsmith · 12 Nov 2025 5:33 UTC
26 points
1 comment · 52 min read · EA link

Leaving Open Philanthropy, going to Anthropic

Joe_Carlsmith · 3 Nov 2025 17:41 UTC
141 points
14 comments · 18 min read · EA link

Controlling the options AIs can pursue

Joe_Carlsmith · 29 Sep 2025 17:24 UTC
9 points
0 comments · 35 min read · EA link

Video and transcript of talk on giving AIs safe motivations

Joe_Carlsmith · 22 Sep 2025 16:47 UTC
10 points
1 comment · 50 min read · EA link

Giving AIs safe motivations

Joe_Carlsmith · 18 Aug 2025 18:02 UTC
22 points
1 comment · 51 min read · EA link

Video and transcript of talk on “Can goodness compete?”

Joe_Carlsmith · 17 Jul 2025 17:59 UTC
34 points
4 comments · 34 min read · EA link
(joecarlsmith.substack.com)

Video and transcript of talk on AI welfare

Joe_Carlsmith · 22 May 2025 16:15 UTC
22 points
1 comment · 28 min read · EA link
(joecarlsmith.substack.com)

The stakes of AI moral status

Joe_Carlsmith · 21 May 2025 18:20 UTC
54 points
9 comments · 14 min read · EA link
(joecarlsmith.substack.com)

Video and transcript of talk on automating alignment research

Joe_Carlsmith · 30 Apr 2025 17:43 UTC
11 points
1 comment · 24 min read · EA link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe_Carlsmith · 30 Apr 2025 17:37 UTC
13 points
1 comment · 48 min read · EA link
(joecarlsmith.com)

AI for AI safety

Joe_Carlsmith · 14 Mar 2025 15:00 UTC
34 points
1 comment · 17 min read · EA link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe_Carlsmith · 11 Mar 2025 18:52 UTC
22 points
2 comments · 11 min read · EA link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe_Carlsmith · 19 Feb 2025 19:44 UTC
21 points
2 comments · 18 min read · EA link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe_Carlsmith · 13 Feb 2025 18:42 UTC
25 points
1 comment · 19 min read · EA link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe_Carlsmith · 13 Feb 2025 18:27 UTC
38 points
1 comment · 7 min read · EA link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe_Carlsmith · 28 Jan 2025 20:05 UTC
78 points
3 comments · 38 min read · EA link

Takes on “Alignment Faking in Large Language Models”

Joe_Carlsmith · 18 Dec 2024 18:22 UTC
72 points
1 comment · 62 min read · EA link

Incentive design and capability elicitation

Joe_Carlsmith · 12 Nov 2024 20:56 UTC
9 points
0 comments · 12 min read · EA link

Option control

Joe_Carlsmith · 4 Nov 2024 17:54 UTC
11 points
0 comments · 54 min read · EA link