Here are two quotes you might disagree with. If true, they seem like they would make us slightly more skeptical of x-risk from AI, though they would not counter the entire argument.
Richards argues that lack of generality will make recursive self-improvement more difficult:
I’m less concerned about the singularity because if I have an AI system that’s really good at coding, I’m not convinced that it’s going to be good at other things. And so it’s not the case that if it produces a new AI system, that’s even better at coding, that that new system is now going to be better at other things. And that you get this runaway train of the singularity.
Instead, what I can imagine is that you have an AI that’s really good at writing code, it generates other AI that might be good at other things. And if it generates another AI that’s really good at code, that new one is just going to be that: an AI that’s good at writing code. And maybe we can… So to some extent, we can keep getting better and better and better at producing AI systems with the help of AI systems. But a runaway train of a singularity is not something that concerns me...
The problem with that argument is that the claim is that the smarter version of itself is going to be just smarter across the board. Right? And so that’s where I get off the train. I’m like, “No, no, no, no. It’s going to be better at say programming or better at protein folding or better at causal reasoning. That doesn’t mean it’s going to be better at everything.”
He also argues that lack of generality will make deception more difficult:
One of the other key things for the singularity argument that I don’t buy, is that you would have an AI that then also knows how to avoid people’s potential control over it. Right? Because again, I think you’d have to create an AI that specializes in that. Or alternatively, if you’ve got the master AI that programs other AIs, it would somehow also have to have some knowledge of how to manipulate people and avoid their powers over it. Again, if it’s really good at programming, I don’t think it’s going to be able to be particularly good at manipulating people.
These arguments at least indicate that generality is a risk factor for AI x-risk. Forecasting whether superintelligent systems will be general or narrow seems more difficult but not impossible. Language models have already shown strong potential for both writing code and persuasion, which is a point in favor of generality. Ditto for Gato’s success across multiple domains (EDIT: Or is it? See below). More outside-view arguments about the benefits or costs of using one model for many different tasks seem mixed and don’t sway my opinion much. Curious to hear other considerations.
Very glad to see this interview and the broader series. Engaging with more ML researchers seems like a good way to popularize AI safety and learn something in the process.
I still feel mostly in agreement with those quotes (though less so than the ones in the original post).
On the first, I mostly agree that if you make an AI that’s better at coding, it will be better at coding but not necessarily anything else. The one part I disagree with is that this means “no singularity”: I don’t think this really affects the argument for a singularity, which according to me is primarily about the more ideas → more output → more “people” → more ideas positive feedback loop. I also don’t think the singularity argument or recursive self-improvement argument is that important for AI risk, as long as you believe that AI systems will become significantly more capable than humanity (see also here).
On the second, it seems very plausible that your first coding AIs are not very good at manipulating people. But they don’t necessarily need to manipulate people; a coding AI could hack into other servers that are not being monitored as heavily and run copies of itself there; those copies could then spend time learning and planning their next moves. (This requires some knowledge / understanding of humans, like that they would not like it if you achieved your goals, and that they are monitoring your server, but it doesn’t seem to require anywhere near human-level understanding of how to manipulate humans.)
Thanks for the quotes and the positive feedback on the interview/series!
Re Gato: we also mention it as a reason why training across multiple domains does not increase performance in narrow domains, so there is also evidence against generality (in the sense of generality being useful). From the transcript:
“And there’s been some funny work that shows that it can even transfer to some out-of-domain stuff a bit, but there hasn’t been any convincing demonstration that it transfers to anything you want. And in fact, I think that the recent paper… The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks.”