Red team: Certain types of prosaic AI alignment (e.g., arguably InstructGPT) promote the illusion of safety without genuinely reducing existential risk from AI, or are capabilities research disguised as safety research. (A claim I've heard from EleutherAI, rather indelicately phrased, and one I'd like to see investigated.)