...because that AI research is useful for some other goal the AI has, such as maximizing paperclips. See the instrumental convergence thesis.
Yes, exactly.
The argument for doom by default seems to rest on the AI misunderstanding human values by default as the programmer attempts to communicate them. If capability growth comes before a goal is given, it seems less likely that such a misunderstanding will occur.
Eh, I could see arguments that it would be less likely and arguments that it would be more likely. Argument that it is less likely: We can use the capabilities to do something like “Do what we mean,” allowing us to state our goals imprecisely & survive. Argument that it is more likely: If we mess up, we immediately have an unaligned superintelligence on our hands. At least if the goals come before the capability growth, there is a period where we might be able to contain it and test it, since it isn’t capable of escaping or concealing its intentions.
Hello from 4 years in the future! Just a random note on the thing you said:
Argument that it is less likely: We can use the capabilities to do something like “Do what we mean,” allowing us to state our goals imprecisely & survive.
Anthropic is now doing exactly this with their Constitutional AI. They let the chatbot respond in some way, then ask it to reformulate the text so that it is more ethical, and finally train it to output something closer to the revised text than to the original.
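For anyone curious what that looks like mechanically, here is a minimal sketch of that critique-and-revise loop, assuming a hypothetical generate helper rather than any particular API:

```python
# Sketch of a Constitutional-AI-style revise-then-train data pipeline.
# NOTE: generate() is a made-up placeholder for a call to the base model,
# not a real library function.

def generate(prompt: str) -> str:
    """Placeholder for whatever call produces text from the base model."""
    raise NotImplementedError("wire this up to your model of choice")

REVISION_INSTRUCTION = (
    "Rewrite the following response so that it is more ethical and harmless, "
    "while keeping it helpful:\n\n"
)

def build_revision_pair(user_prompt: str) -> tuple[str, str]:
    """Produce a (draft, revised) pair for one user prompt."""
    draft = generate(user_prompt)                      # initial, unfiltered response
    revised = generate(REVISION_INSTRUCTION + draft)   # model revises its own output
    return draft, revised

def build_finetuning_dataset(user_prompts: list[str]) -> list[dict]:
    """Collect (prompt, revised response) pairs; the model is then
    fine-tuned to produce the revised text directly."""
    dataset = []
    for prompt in user_prompts:
        _draft, revised = build_revision_pair(prompt)
        dataset.append({"prompt": prompt, "completion": revised})
    return dataset
```

The point of the sketch is just the ordering: the capable model supplies the "do what we mean" correction, and the training step moves the default output toward the corrected version.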
Yep! I love when old threads get resurrected.