An alignment tax (sometimes called a safety tax) is the additional cost of making an AI system aligned, relative to the cost of building an otherwise comparable unaligned system.
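One illustrative way to make this concrete (a sketch introduced here for exposition, not a standard formalization) is to write the tax as the difference in cost between the aligned and the unaligned system, where "cost" may cover money, compute, development time, or forgone performance at a given capability level:

\[ \text{alignment tax} \;=\; C_{\text{aligned}} - C_{\text{unaligned}} \]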
Approaches to the alignment tax
Paul Christiano distinguishes two main approaches for dealing with the alignment tax.[1][2] One approach seeks to find ways to pay the tax, such as persuading individual actors to pay it or facilitating coordination of the sort that would allow groups to pay it. The other approach tries to reduce the tax, by differentially advancing existing alignable algorithms or by making existing algorithms more alignable.
Further reading
Askell, Amanda et al. (2021) A general language assistant as a laboratory for alignment, arXiv:2112.00861 [cs.CL].
Xu, Mark & Carl Shulman (2021) Rogue AGI embodies valuable intellectual property, LessWrong, June 3.
Yudkowsky, Eliezer (2017) Aligning an AGI adds significant development time, Arbital, February 22.
Related entries
AI alignment | AI governance | AI forecasting | differential progress
1. Christiano, Paul (2020) Current work in AI alignment, Effective Altruism Global, April 3.
2. For a summary, see Rohin Shah (2020) A framework for thinking about how to make AI go well, LessWrong, April 15.