Making it so that the inevitable non-aligned AIs are stoppable
(Minor, tangential point)
I don't think it's inevitable that there'll ever be a "significantly" non-aligned AI that's "significantly" powerful, let alone "unstoppable by default". (I'm aware that that's not a well-defined sentence.)
In a trivial sense, there are already non-aligned AIs, as shown e.g. by the OpenAI boat game example. But those AIs are already "stoppable".
If you mean to imply that it's inevitable that there'll be an AI that (a) is non-aligned in a way that's quite bad (rather than perhaps slightly imperfect alignment that never really matters much), and (b) would be unstoppable if not for some effort by longtermist-type people to change that situation, then I'd disagree. I'm not sure how likely that is, but it doesn't seem inevitable.
(It's also possible you didn't mean "inevitable" to be interpreted literally, and/or that you didn't think much about the precise phrasing you used in that particular sentence.)
Yeah, I wasn't being totally clear about what I was really thinking in that context. I was thinking "from the point of view of people who have just been devastated by some not-exactly-superintelligent but still pretty smart AI that wasn't adequately controlled, people who want to make that never happen again, what would they assume is the prudent approach to whether there will be more non-aligned AI someday?", figuring that they would think "Assume that if there are more AIs, it is inevitable that some of them will be non-aligned at some point". The logic is that if we don't know how to control alignment, there's no reason to think there won't someday be significantly non-aligned ones, and we should plan for that contingency.
I at least approximately agree with that statement.
I think there'd still be some reasons to think there won't someday be significantly non-aligned AIs. For example, a general argument like: "People really really want to not get killed or subjugated or deprived of things they care about, and typically also want that for other people to some extent, so they'll work hard to prevent things that would cause those bad things. And they've often (though not always) succeeded in the past."
(Some discussions of this sort of argument can be found in the section on "Should we expect people to handle AI safety and governance issues adequately without longtermist intervention?" in Crucial questions.)
But I don't think those arguments make significantly non-aligned AIs implausible, let alone impossible. (Those are both vague words. I could maybe operationalise that as something like a 0.1-50% chance remaining.) And I think that that's all that's required (on this front) in order for the rest of your ideas in this post to be relevant.
In any case, both that quoted statement of yours and my tweaked version of it seem very different from the claim "if we don't currently know how to align/control AIs, it's inevitable there'll eventually be significantly non-aligned AIs someday"?
Yes, I agree that there's a difference.
I wrote up a longer reply to your first comment (the one marked "Answer"), but then I looked up your AI safety doc and realized it might be better to read through the readings in that first.