“for superhuman rogue AIs to be catastrophic for humanity, they need to not only be catastrophic for 2023_Humanity but also for humanity even after we also have the assistance of superhuman or near-superhuman AIs.”
This is a very interesting argument, and definitely worth discussing. I realise you have only sketched it here, so I won’t try to poke holes in it in detail. Briefly, though, I see two objections that would need to be addressed:
1. One fear is that rogue AIs may be unleashed on 2023_Humanity, or something very close to it, because of the rapid (potentially exponential) capability growth we could see if we create an AI that can itself develop better AIs. In that fast-takeoff scenario, defensive near-superhuman AIs may not exist yet, so being catastrophic for 2023_Humanity alone may be enough.
2. The challenge of developing aligned superhuman AIs that would defend us against rogue AIs while posing no threat themselves is far from trivial. I’m not sure how many major labs are working on this right now, or whether they could even write a clear problem statement for what such a system should do. From first principles, the concern is that a defensive AI would necessarily be more constrained (it must remain aligned and safe) than a potential rogue AI, so why should we believe we could develop it faster and keep it ahead of potential rogue AIs?
I’m not disagreeing with your comment; I’m just thinking through how this would work in practice, and what tangible steps would need to be taken to create the kind of well-aligned AIs that could protect humanity.