Joe_Carlsmith

Karma: 3,515

Senior advisor at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Video and transcript of talk on AI welfare

Joe_Carlsmith · 22 May 2025 16:15 UTC
22 points
1 comment · 28 min read · EA link
(joecarlsmith.substack.com)

The stakes of AI moral status

Joe_Carlsmith · 21 May 2025 18:20 UTC
54 points
9 comments · 14 min read · EA link
(joecarlsmith.substack.com)

Video and transcript of talk on automating alignment research

Joe_Carlsmith · 30 Apr 2025 17:43 UTC
11 points
1 comment · 24 min read · EA link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe_Carlsmith · 30 Apr 2025 17:37 UTC
13 points
1 comment · 48 min read · EA link
(joecarlsmith.com)

AI for AI safety

Joe_Carlsmith · 14 Mar 2025 15:00 UTC
34 points
1 comment · 17 min read · EA link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe_Carlsmith · 11 Mar 2025 18:52 UTC
22 points
2 comments · 11 min read · EA link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe_Carlsmith · 19 Feb 2025 19:44 UTC
21 points
2 comments · 18 min read · EA link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe_Carlsmith · 13 Feb 2025 18:42 UTC
25 points
1 comment · 19 min read · EA link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe_Carlsmith · 13 Feb 2025 18:27 UTC
28 points
1 comment · 6 min read · EA link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe_Carlsmith · 28 Jan 2025 20:05 UTC
75 points
3 comments · 38 min read · EA link

Takes on “Alignment Faking in Large Language Models”

Joe_Carlsmith · 18 Dec 2024 18:22 UTC
72 points
1 comment · 62 min read · EA link

Incentive design and capability elicitation

Joe_Carlsmith · 12 Nov 2024 20:56 UTC
9 points
0 comments · 12 min read · EA link

Option control

Joe_Carlsmith · 4 Nov 2024 17:54 UTC
11 points
0 comments · 54 min read · EA link

Motivation control

Joe_Carlsmith · 30 Oct 2024 17:15 UTC
18 points
0 comments · 52 min read · EA link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe_Carlsmith · 28 Oct 2024 21:57 UTC
18 points
0 comments · 32 min read · EA link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe_Carlsmith · 8 Oct 2024 22:30 UTC
18 points
1 comment · 27 min read · EA link

What is it to solve the alignment problem? (Notes)

Joe_Carlsmith · 24 Aug 2024 21:19 UTC
32 points
1 comment · 53 min read · EA link

Value fragility and AI takeover

Joe_Carlsmith · 5 Aug 2024 21:28 UTC
38 points
3 comments · 30 min read · EA link

A framework for thinking about AI power-seeking

Joe_Carlsmith · 24 Jul 2024 22:41 UTC
44 points
11 comments · 16 min read · EA link