I think that the phrase [“unaligned” AI] is too vague for a lot of safety research work.
I prefer keywords like:
- scheming
- naive
- deceptive
- overconfident
- uncooperative
I’m happy that the phrase “scheming” seems to have become popular recently; that’s an issue that seems fairly specific to me. I have a much easier time imagining preventing an AI from successfully (intentionally) scheming than I do preventing it from being “unaligned.”
Hmm, I would argue that an AI which, when asked, causes human extinction is not aligned, even if it did exactly what it was told.
Yeah, I think I’d classify that as a different thing. I see alignment typically as a “mistake” issue, rather than as a “misuse” issue. I think others here often use the phrase similarly.