I think some of these problems apply to the term “AI safety” as well. Say we built an AGI that was 100% guaranteed not to conquer humanity as a whole, but still killed thousands of people every year. Would we call that AI “Safe”?
One possible way to resolve this:
“misalignment” refers to the AI not doing what it’s meant to do.
“AI safety” refers to ensuring that the consequences of misalignment are not majorly harmful.
“AI X-risk safety” refers to ensuring that AI does not destroy or conquer humanity.
Each is an easier subset of the problem above it.
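Read as guarantees, that nesting can be written out explicitly (my notation, not part of the original comment): let $A$ be the set of systems that always do what they’re meant to, $S$ the set whose misalignment never causes major harm, and $X$ the set that never destroy or conquer humanity. Then

$$A \subseteq S \subseteq X,$$

so achieving the harder property automatically gives the easier ones below it.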
“AI safety” refers to ensuring that the consequences of misalignment are not majorly harmful
That’s saying that AI safety is about protective mechanisms and that alignment is about preventative mechanisms. I haven’t heard the distinction drawn that way before, and it’s an unusual way to draw it.
Context:
Preventative barrier: prevent the hazardous event from being initiated (decrease probability(event)).
Protective barrier: minimize the consequences of the hazardous event (decrease impact(event)).
For broader videos on safety-engineering distinctions applied to AI safety, see [1], [2].
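To make the barrier distinction concrete, here is a minimal sketch (my own toy model, not taken from the linked videos) that treats expected risk as probability × impact: a preventative barrier shrinks the probability term, while a protective barrier shrinks the impact term. All names and numbers are illustrative assumptions.

```python
# Toy risk model: expected_risk = probability(event) * impact(event).
# A preventative barrier lowers probability(event); a protective
# barrier lowers impact(event). Numbers are illustrative only.

def expected_risk(p_event: float, impact: float) -> float:
    """Expected harm from a hazardous event."""
    return p_event * impact

def apply_preventative_barrier(p_event: float, effectiveness: float) -> float:
    """Decrease the probability that the hazardous event is initiated."""
    return p_event * (1.0 - effectiveness)

def apply_protective_barrier(impact: float, effectiveness: float) -> float:
    """Decrease the consequences once the hazardous event has occurred."""
    return impact * (1.0 - effectiveness)

p, harm = 0.10, 1000.0
baseline  = expected_risk(p, harm)                                   # 100.0
prevented = expected_risk(apply_preventative_barrier(p, 0.9), harm)  # 10.0
protected = expected_risk(p, apply_protective_barrier(harm, 0.9))    # 10.0

# Both barriers cut expected risk equally here, but only the protective
# barrier still helps after the event has already been initiated.
print(baseline, prevented, protected)
```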
So part of what I’m saying is that the definitions of “AI alignment” and “AI safety” are confusing, and people use them to refer to different things in a way that can mislead. For example, if you declare that your AI is “safe” while it’s killing people daily (because you were referring to extinction), people will rightly feel misled and angry.
Similarly, for “misalignment”: an image generator giving you an image of a hand with the wrong number of fingers is misaligned in the sense that you care about the correct number of fingers and it doesn’t know this. But it doesn’t cause real harm the way a malfunction in a healthcare diagnostic system would.
Your point that safety should aim to prevent as well as protect is a good one. I think “AI safety” should refer to both.