I think this is a crux. GPT-4 is only safe because it is weak. It is very far from being 100% aligned (see e.g. this boast from OpenAI, which is far from reassuring (“29% more often”), or the many, many jailbreaks), and 100% alignment is what will be needed for us to survive in the limit of superintelligence!
You go on to talk about robustness (to misuse) and how jailbreaks are a separate issue, but whilst that distinction may be important from the perspective of ML research (or AI capabilities research), the bottom line, ultimately, for all of us, is existential safety (x-safety).
I’ve folded all of the ways things could go wrong in terms of x-safety into my concept of alignment here[1]. Solving misuse (e.g. jailbreaks) is very much part of this! If we don’t, then in the limit of superintelligence all it takes is one bad actor directing their (to them “aligned”, by your definition) AI toward wiping out humanity, and we’re all dead (and yes, there are people who would press such a button if they had access to one).
Perhaps this broader concept of alignment would just be better referred to as x-safety.