if we don’t know how to control alignment, there’s no reason to think there won’t someday be significantly non-aligned ones, and we should plan for that contingency.
I at least approximately agree with that statement.
I think there’d still be some reasons to think there won’t someday be significantly non-aligned AIs. For example, a general argument like: “People really really want to not get killed or subjugated or deprived of things they care about, and typically also want that for other people to some extent, so they’ll work hard to prevent things that would cause those bad things. And they’ve often (though not always) succeeded in the past.”
(Some discussions of this sort of argument can be found in the section on “Should we expect people to handle AI safety and governance issues adequately without longtermist intervention?” in Crucial questions.)
But I don’t think those arguments make significantly non-aligned AIs implausible, let alone impossible. (Those are both vague words. I could maybe operationalise that as something like a 0.1-50% chance remaining.) And I think that that’s all that’s required (on this front) in order for the rest of your ideas in this post to be relevant.
In any case, both that quoted statement of yours and my tweaked version of it seem very different from the claim “if we don’t currently know how to align/control AIs, it’s inevitable there’ll eventually be significantly non-aligned AIs someday”?
Yes, I agree that there’s a difference.
I wrote up a longer reply to your first comment (the one marked “Answer”), but then I looked up your AI safety doc and realized that I had better read through the readings in that first.