Alignment tax
Here I’m more interested in the Wiki entry than the tag, though the tag is probably also useful. I primarily want a good go-to link that is solely focused on this topic and gives a clear definition and maybe some discussion.
This is probably an even better fit for LW or the Alignment Forum, but they don’t seem to have it. We could make a version here anyway, and then either we or someone from those sites could copy it over.
Here are some posts that have relevant content, from a very quick search:
https://www.effectivealtruism.org/articles/paul-christiano-current-work-in-ai-alignment/#:~:text=I%20like%20this%20notion%20of,%5Bthe%20systems%5D%20to%20do. (this seems to be the standard go-to at the moment, but it’s a long post in which this is only one part, and an encyclopedic rather than conversational style of writing could be helpful)
Also on the Forum: https://forum.effectivealtruism.org/posts/63stBTw3WAW6k45dY/paul-christiano-current-work-in-ai-alignment
https://www.lesswrong.com/posts/9et86yPRk6RinJNt3/an-95-a-framework-for-thinking-about-how-to-make-ai-go-well
https://www.lesswrong.com/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety#Alignment_tax_and_alignable_algorithms
https://www.lesswrong.com/posts/oBpebs5j5ngs3EXr5/a-summary-of-anthropic-s-first-paper-3#Alignment_Tax
https://www.lesswrong.com/posts/yhb5BNksWcESezp7p/poll-which-variables-are-most-strategically-relevant
https://www.lesswrong.com/posts/dktT3BiinsBZLw96h/linkpost-a-general-language-assistant-as-a-laboratory-for
https://forum.effectivealtruism.org/posts/Ayu5im98u8FeMWoBZ/my-personal-cruxes-for-working-on-ai-safety
Related entries:
differential progress
AI alignment
AI forecasting
AI governance
Maybe Corporate governance, if that entry is created
The term “safety tax” should probably also be mentioned
Here’s the entry. I was only able to read the transcript of Paul’s talk and Rohin’s summary of it, so feel free to add anything you think is missing.
Thanks, Michael. This is a good idea; I will create the entry.
(I just noticed you left other comments to which I didn’t respond; I’ll do so shortly.)