Unless I’m misunderstanding something important (which is very possible!) I think Bengio’s risk model is missing some key steps.
In particular, if I understand the core argument correctly, it goes like this:
1. (Individually) Human-level AI is possible.
2. At the point where individual AIs reach human-level intelligence, they will collectively be superhuman in ability, due to various intrinsic advantages of being digital.
3. It’s possible to build such AIs with autonomous goals that are catastrophically or existentially detrimental to humanity. (Bengio calls them “rogue AIs”)
4. Some people may choose to actually build rogue AIs.
5. Thus, (some chance of) doom.
As stated, I think this argument is unconvincing, because for superhuman rogue AIs to be catastrophic for humanity, they need to be catastrophic not just for 2023_Humanity but for humanity even after it has the assistance of superhuman or near-superhuman AIs of its own.
If I was trying to argue for Bengio’s position, I would probably go down one (or more) of the following paths:
1. Alignment being very hard/practically impossible: If alignment is very hard and nobody can reliably build a superhuman AI that’s sufficiently aligned that we trust it to stop rogue AIs, then a rogue AI can cause a catastrophe unimpeded.
   Note that this is not just an argument for the possibility of rogue AIs, but an argument against the feasibility of non-rogue AIs.
2. Offense-defense imbalance: Perhaps it’s easier in practice to create rogue AIs to destroy the world than to create non-rogue AIs to prevent the world’s destruction.
3. Vulnerable world: Perhaps it’s simply much easier to destroy the world than to prevent its destruction.
   Toy example: Suppose AIs with a collective intelligence of 200 IQ are enough to destroy the world, but AIs with a collective intelligence of 300 IQ are needed to prevent the world’s destruction. Then the “bad guys” will have a large head start on the “good guys.” (A rough sketch of this timing gap follows the list.)
4. Asymmetric carefulness: Perhaps humanity will not create non-rogue AIs because most people are too careful about the risks. E.g., maybe the top AI labs agree not to develop AI beyond capabilities level X without alignment level Y, or something similar is enshrined in law. Suppose, further, that in this world normal companies mostly follow the law while at least one group building rogue AIs does not.
   In a sense, you can view this as a more general case of (1). In this story, we don’t need AI alignment to actually be very hard, just for humanity to believe it is.
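To make the toy example slightly more concrete, here is a minimal sketch of the “head start” it implies. The 200/300 thresholds come from the toy example above; the assumption that collective AI capability grows at a fixed exponential rate (and the specific 50% annual figure) is mine, purely for illustration.

```python
# Minimal sketch of the toy example above. Assumption (mine, not from the
# original comment): collective AI capability grows exponentially at a fixed
# annual rate. Under that assumption, the "head start" is the time between
# capability crossing the destruction threshold (200) and the defense
# threshold (300).
import math

def head_start_years(destroy_level=200.0, defend_level=300.0, annual_growth=0.5):
    """Years between capability reaching `destroy_level` and `defend_level`,
    if capability multiplies by (1 + annual_growth) each year."""
    return math.log(defend_level / destroy_level) / math.log(1 + annual_growth)

# With the hypothetical 50% annual growth rate, the gap between 200 and 300
# is about one year -- a window in which rogue AIs could act before any
# defensive AIs of comparable strength exist.
print(f"Head start: {head_start_years():.2f} years")
```

The point is only that any gap between the two thresholds translates into a window of vulnerability; how long that window lasts depends entirely on the assumed growth rate.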
It’s possible Bengio already believes (1) or (2), or something else similar, and just thought it was obvious enough not to be worth noting. But in my conversations with AI risk skeptics, at least among those who think AGI itself is possible or likely, the most common objection is to ask why rogue AIs would be able to overpower not just humans but other AIs as well.
That’s a good point! Joe Carlsmith makes a similar step-by-step argument (https://arxiv.org/abs/2206.13353), but includes a specific step about whether the existence of rogue AI would lead to catastrophic harm. It would have been nice to see that step in Bengio’s argument.
“for superhuman rogue AIs to be catastrophic for humanity, they need to be catastrophic not just for 2023_Humanity but for humanity even after it has the assistance of superhuman or near-superhuman AIs of its own.”
This is a very interesting argument, and definitely worthy of discussion. I realise you have only sketched your argument here, so I won’t try to poke holes in it.
Briefly, I see two objections that need to be addressed:
1. One fear is that rogue AIs may well be unleashed on 2023_Humanity, or a version very close to it, because of the exponential capability growth we could see if we create an AI that is able to develop better AI itself. In short, being catastrophic for 2023_Humanity may be enough.
2. The challenge of developing aligned superhuman AIs that would defend us against rogue AIs while posing no threat themselves is not trivial, and I’m not sure how many major labs are working on that right now, or whether they can even write a clear problem statement for what such an AI system should be.
From first principles, the concern is that this AI would necessarily be more limited (it needs to be aligned and safe) than a potential rogue AI, so why should we believe we could develop such an AI faster and enable it to stay ahead of potential rogue AIs?
Far from disagreeing with your comment, I’m just thinking about how it would work and what tangible steps need to be taken to create the kind of well-aligned AIs which could protect humanity.