Mismatched goals is the problem. The logic of instrumental convergence applies to any goal, not just maximization goals.
Dear Seth, thank you again for your opinion. I agree that many instrumental goals such as power would also be helpful for final goals that are not of the type “maximize this or that”. But I have yet to see a formal argument showing that they would be just as likely to emerge in a non-maximizing agent as in a maximizer.
Regarding your other claim, I cannot agree that “mismatched goals is the problem”. First of all, why do you think there is just a single problem, “the” problem? And second, is it helpful to call something a “problem” when it is an unchangeable fact of life? As long as there is more than one human potentially affected by an AI system’s actions, and these humans’ goals are not matched with each other (which they usually aren’t), no AI system can have goals matched to all the humans it affects, unless you want to claim that “having matched goals” is not a transitive relation. So I am quite convinced that the fact that AI systems will have mismatched goals is not a problem we can solve but a fact we have to deal with.
I agree with you that humans have mismatched goals among ourselves, so some amount of goal mismatch is just a fact we have to deal with. I think the ideal is that we get an AGI that makes its goal the overlap in human goals; see [Empowerment is (almost) All We Need](https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need) and others on preference maximization.
I also agree with your intuition that having a non-maximizer improves the odds of an AGI not seeking power or doing other dangerous things. But I think we need to go far beyond that intuition; we don’t want to play the odds with the future of humanity. To that end, I have more thoughts on where this will and won’t happen.
I’m saying that “the problem” is actually mismatched goals, not optimization/maximization per se. In more depth, and hopefully more usefully: I think unbounded goals are the problem with optimization (not the only problem, but a very big one).
If an AGI had a bounded goal like “make one billion paperclips”, it wouldn’t be nearly as dangerous. It might still decide to eliminate humanity to make the odds of getting to a billion as good as possible (I can’t remember where I saw this important point; I think maybe Nate Soares made it), but it might instead decide that its best move is just making some improvements to the paperclip business, in which case it wouldn’t cause problems.
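To put that worry slightly more formally (a toy framing of my own, not a quote from anyone): if the agent interprets the bounded goal as “make the probability of reaching one billion as high as possible”, it is right back to an optimization over policies,

$$\pi^* = \arg\max_{\pi} \; \Pr\!\left(N_{\text{paperclips}} \ge 10^9 \,\middle|\, \pi\right),$$

and since extra power and resources generically push that probability upward, the usual instrumental pressures survive even though the goal itself is bounded.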
So we’re converging...
One final comment on your argument about odds: in our algorithms, specifying an allowable aspiration includes specifying a desired success probability that is sufficiently below 100%. This is exactly to prevent fulfilling the aspiration from turning into an optimization problem through the back door.
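To illustrate what I mean with a deliberately oversimplified toy sketch (the names and numbers are made up; this is not our actual code): instead of returning whichever policy has the highest estimated success probability, the agent may pick any policy whose estimate falls inside a stated aspiration interval whose upper end is strictly below 100%, so nothing gets maximized.

```python
import random

def pick_policy(candidates, p_lo=0.90, p_hi=0.95):
    """Toy aspiration-style selection (illustrative only, not the actual algorithms).

    `candidates` maps policy names to estimated success probabilities.
    Instead of maximizing, accept any policy whose estimate falls inside the
    aspiration interval [p_lo, p_hi], which is deliberately bounded away from 100%.
    """
    feasible = [name for name, p in candidates.items() if p_lo <= p <= p_hi]
    if not feasible:
        raise ValueError("no policy meets the aspiration; widen the interval")
    # Any feasible policy is acceptable; pick one at random instead of ranking,
    # so the selection step never turns back into an optimization problem.
    return random.choice(feasible)

# Made-up estimates for three candidate policies:
estimates = {"improve_factory": 0.92, "expand_aggressively": 0.999, "do_nothing": 0.40}
print(pick_policy(estimates))  # prints "improve_factory", the only feasible option here
```

The point of the upper bound is simply that “as close to certainty as possible” is never asked for, so there is no argmax left for instrumental pressures to latch onto.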