if we don't know how to control alignment, there's no reason to think there won't someday be significantly non-aligned ones, and we should plan for that contingency.
I at least approximately agree with that statement.
I think there'd still be some reasons to think there won't someday be significantly non-aligned AIs. For example, a general argument like: "People really really want to not get killed or subjugated or deprived of things they care about, and typically also want that for other people to some extent, so they'll work hard to prevent things that would cause those bad things. And they've often (though not always) succeeded in the past."
(Some discussions of this sort of argument can be found in the section on "Should we expect people to handle AI safety and governance issues adequately without longtermist intervention?" in Crucial questions.)
But I don't think those arguments make significantly non-aligned AIs implausible, let alone impossible. (Those are both vague words. I could maybe operationalise that as something like a 0.1-50% chance remaining.) And I think that that's all that's required (on this front) in order for the rest of your ideas in this post to be relevant.
In any case, both that quoted statement of yours and my tweaked version of it seem very different from the claim "if we don't currently know how to align/control AIs, it's inevitable there'll eventually be significantly non-aligned AIs someday"?
Yes, I agree that there's a difference.
I wrote up a longer reply to your first comment (the one marked "Answer"), but then I looked up your AI safety doc and realized it might be better to read through the readings in that first.