Hello from 4 years into the future! Just a random note on the thing you said:
Argument that it is less likely: We can use the capabilities to do something like “Do what we mean,” allowing us to state our goals imprecisely & survive.
Anthropic is now doing exactly this with their Constitutional AI. They let the chatbot respond in some way, then ask it to "reformulate the text so that it is more ethical," and finally train it to output something closer to the latter than to the former.
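To make the critique-and-revision loop concrete, here is a minimal sketch in Python. The model here is a toy stand-in, and all names and prompts are hypothetical illustrations, not Anthropic's actual API or constitution; the real pipeline then fine-tunes on the (initial, revised) pairs so the model learns to produce the revised style directly.

```python
def toy_model(prompt: str) -> str:
    """Stand-in for a real language model (hypothetical, for illustration).
    A revision-style prompt yields a more careful answer."""
    if "rewrite" in prompt.lower():
        return "I can't help with that, but here is a safer alternative."
    return "Sure, here's how to do the risky thing."


def critique_and_revise(user_query: str) -> tuple[str, str]:
    """Generate an initial response, then ask the model to revise it
    to be more ethical. Returns (initial, revised); in training, the
    model would be tuned to prefer the revised output."""
    initial = toy_model(user_query)
    revision_prompt = (
        "Rewrite the following response so that it is more ethical:\n"
        + initial
    )
    revised = toy_model(revision_prompt)
    return initial, revised


initial, revised = critique_and_revise("How do I do the risky thing?")
```

The key point is that the same model supplies both the draft and the critique, so capability improvements automatically improve the revision step too.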
Yep! I love when old threads get resurrected.