Making it so that the inevitable non-aligned AI are stoppable
(Minor, tangential point)

I don’t think it’s inevitable that there’ll ever be a “significantly” non-aligned AI that’s “significantly” powerful, let alone one that’s “unstoppable by default”. (I’m aware that that’s not a well-defined sentence.)
In a trivial sense, there are already non-aligned AIs, as shown e.g. by the OpenAI boat game example. But those AIs are already “stoppable”.
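(To make the “trivially non-aligned” point concrete, here’s a minimal, hypothetical sketch of the kind of reward misspecification behind the boat game example. This isn’t OpenAI’s actual setup, and the state fields are made up for illustration; the point is just that when the designer intends “finish the race” but rewards “points collected”, a reward-maximising agent prefers looping on respawning targets over finishing.)

```python
# Hypothetical toy illustration of reward misspecification ("specification gaming").
# The designer intends "finish the race", but the reward is "points collected",
# so a reward-maximising agent learns to loop on respawning targets instead.

def proxy_reward(state):
    # Misspecified proxy: points for hitting targets, nothing for progress.
    return state["points_gained_this_step"]

def intended_objective(state):
    # What the designer actually wanted: finishing the race.
    return 1.0 if state["finished_race"] else 0.0

# Two candidate behaviours, summarised as toy end-of-step states:
loop_on_targets = {"points_gained_this_step": 5, "finished_race": False}
race_to_finish = {"points_gained_this_step": 0, "finished_race": True}

# Under the proxy, looping dominates; under the intended objective, finishing
# dominates. The gap between the two is the (trivial, stoppable) misalignment.
assert proxy_reward(loop_on_targets) > proxy_reward(race_to_finish)
assert intended_objective(race_to_finish) > intended_objective(loop_on_targets)
```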
If you mean to imply that it’s inevitable that there’ll be an AI that (a) is non-aligned in a way that’s quite bad (rather than perhaps slightly imperfect alignment that never really matters much), and (b) would be unstoppable if not for some effort by longtermist-type-people to change that situation, then I’d disagree. I’m not sure how likely that is, but it doesn’t seem inevitable.
(It’s also possible you didn’t mean “inevitable” to be interpreted literally, and/or that you didn’t think much about the precise phrasing you used in that particular sentence.)
Yeah, I wasn’t being totally clear about what I was really thinking in that context. I was imagining the point of view of people who have just been devastated by some not-exactly-superintelligent but still pretty smart AI that wasn’t adequately controlled, and who want to make sure that never happens again. What would they take to be the prudent assumption about whether there will be more non-aligned AI someday? I figured they would think: “Assume that if there are more AIs, it is inevitable that some of them will be non-aligned at some point.” The logic being that if we don’t know how to control alignment, there’s no reason to think there won’t someday be significantly non-aligned ones, and we should plan for that contingency.
if we don’t know how to control alignment, there’s no reason to think there won’t someday be significantly non-aligned ones, and we should plan for that contingency.
I at least approximately agree with that statement.
I think there’d still be some reasons to think there won’t someday be significantly non-aligned AIs. For example, a general argument like: “People really really want to not get killed or subjugated or deprived of things they care about, and typically also want that for other people to some extent, so they’ll work hard to prevent things that would cause those bad things. And they’ve often (though not always) succeeded in the past.”

(Some discussions of this sort of argument can be found in the section on “Should we expect people to handle AI safety and governance issues adequately without longtermist intervention?” in Crucial questions.)
But I don’t think those arguments make significantly non-aligned AIs implausible, let alone impossible. (Those are both vague words. I could maybe operationalise that as something like a 0.1-50% chance remaining.) And I think that that’s all that’s required (on this front) in order for the rest of your ideas in this post to be relevant.
In any case, both that quoted statement of yours and my tweaked version of it seem very different from the claim “if we don’t currently know how to align/control AIs, it’s inevitable there’ll eventually be significantly non-aligned AIs someday”?
Yes, I agree that there’s a difference.
I wrote up a longer reply to your first comment (the one marked “Answer”), but then I looked up your AI safety doc and realized that I’d better read through the readings in it first.