The part I disagree with is: “Therefore, it is not very difficult to design a useful goal function that raises subjugation difficulty above the capability level of the AGI, simply by adding arbitrarily many constraints.”
Firstly, simply stacking lots of constraints will likely be less effective than using a smaller number of strategically chosen constraints (and, as Mauricio mentioned, stacking as many constraints as possible very likely makes your AI uncompetitive).
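To make the distinction concrete, here's a minimal Python sketch of the kind of scheme under discussion, with constraints folded into the goal function as weighted penalty terms. This is my framing, not anything from the original post; every name, state field, and weight below is an illustrative assumption.

```python
# Minimal sketch (illustrative, not from the OP): constraints enter a goal
# function as weighted penalty terms subtracted from a base utility.

def constrained_goal(base_utility, constraints, weights):
    """Build a goal function: base utility minus weighted constraint penalties."""
    def goal(state):
        penalty = sum(w * c(state) for c, w in zip(constraints, weights))
        return base_utility(state) - penalty
    return goal

# Toy example: maximize paperclips, with two targeted, heavily weighted
# constraints rather than many arbitrary ones. All fields are hypothetical.
paperclips = lambda s: s["paperclips"]
humans_harmed = lambda s: s["humans_harmed"]
resources_seized = lambda s: s["resources_seized"]

goal = constrained_goal(
    paperclips,
    constraints=[humans_harmed, resources_seized],
    weights=[1e6, 1e3],  # a few well-aimed penalties, weighted by severity
)
print(goal({"paperclips": 10, "humans_harmed": 0, "resources_seized": 2}))
# -> -1990.0: seizing resources already swamps the paperclip payoff
```

The design point the sketch gestures at: a handful of well-aimed, heavily weighted penalties can dominate the objective, whereas piling on arbitrary low-weight constraints mostly just taxes performance.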
Secondly, assuming this technique works: given the stakes, it seems worthwhile for people to spend a lot of time inventing the best such schemes. And it feels like this is a topic that could generate endless debate, with some people devising ever more elaborate constraint schemes and others finding convoluted scenarios that slip through the loopholes.
I think this is great! This is what we do when we write laws for humans, and what we do when we hunt for flaws in software. I think applying it to AI development would not be particularly burdensome, and, as you point out, the constraints can be refined down to a few well-aimed ones that drastically decrease the odds of harm.
I guess my main complaint was that this “scenario-loophole” game seems to be treated as a rebuttal of some kind, instead of as a highly useful and necessary part of AI development.