I think “partly-utilitarian AI,” in the standard sense of the phrase, would produce orders of magnitude less utility than a system optimizing for utility, simply because optimization for [anything else] probably comes apart from optimization for utility at high levels of capability.
What about the following counterexample? Suppose a powerful agent optimizes for a mixed objective, which leads to it optimizing ~half of the accessible universe for utilitarianism, the other ~half for some other scope-sensitive value, and a few planets for modal scope-insensitive human values. Then, even at high levels of capability, this universe will be ~half as good by utilitarian lights as a universe that’s fully optimized for utility, even though the optimizer wasn’t just optimizing for utility. (If you doubt whether there exist utility functions that would lead to roughly these outcomes, I’m happy to make arguments for that assumption.)
(I also don’t yet find your linked argument convincing. It argues that “If a future does not involve optimizing for the good, value is almost certainly near-zero.” I agree, but imo it’s quite a leap from that to the claim that [If a future does not only involve optimizing for the good, value is almost certainly near-zero].)
If we stipulate that “partly-utilitarian AI” produces a decent fraction of the utility of a utilitarian AI, I think such a system is extremely unlikely to exist.
This seems like a possible crux, but I don’t fully understand what you’re saying here. Could you rephrase?
[Added] Pasting this from my reply to Josh:
(Maybe I could have put more emphasis on what kind of AI I have in mind. As my original comment mentioned, I’m talking about “a sufficiently strong version of ‘partly-utilitarian.’” So an AI that’s just slightly utilitarian wouldn’t count. More concretely, I have in mind something like: an agent that operates via a moral parliament in which utilitarianism has > 10% of representation.)
Sure, this would presumably be ~half as utility-producing as a utilitarian AI (unless something weird happened like value being nonlinear in resources, but maybe in that case the AI could flip a coin and do 100-0 or 0-100 instead of 50-50). And maybe this could come about as a result of trade/coordination. But it feels unlikely to me. In particular, while “moral parliament” isn’t fully specified, in my internal version of moral parliament, I would not expect a superintelligence to have much moral uncertainty, and certainly would not expect it to translate that moral uncertainty into using significant resources to optimize for different things. (Note that the original moral parliament post thought you should end up acting as if you’re certain in whatever policy wins a majority of delegates—not use 10% of resources on 10% of delegates.) And if we do anything except optimize for utility (or disutility) with some resources, I think those resources produce about zero utility.
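(To make the coin-flip point concrete, here is a minimal toy sketch. The value function is assumed purely for illustration and isn’t taken from the discussion: it just shows that if a faction’s value is convex (superlinear) in resources, a fair 100-0 / 0-100 coin flip beats a guaranteed 50-50 split in expectation.)

```python
# Toy sketch only: convex_value is an assumed, illustrative value function,
# not a claim about real value systems. It shows why an agent whose value is
# superlinear in resources would prefer a coin flip over a guaranteed split.

def convex_value(resources: float) -> float:
    # Assumed superlinear value of controlling a given fraction of resources.
    return resources ** 2

def guaranteed_split(value_fn, share: float = 0.5) -> float:
    # The faction is guaranteed a fixed share of the resources.
    return value_fn(share)

def coin_flip(value_fn) -> float:
    # The faction gets everything with probability 1/2, nothing otherwise.
    return 0.5 * value_fn(1.0) + 0.5 * value_fn(0.0)

if __name__ == "__main__":
    print("guaranteed 50-50:", guaranteed_split(convex_value))  # 0.25
    print("coin flip:       ", coin_flip(convex_value))         # 0.5
    # With concave (diminishing-returns) value the ordering reverses,
    # and with linear value the two options are equivalent in expectation.
```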
Ah good point, I was thinking of a moral parliament where representation is based on value pluralism rather than moral uncertainty, but I think you’re still right that a moral parliament approach (as originally conceived) wouldn’t produce the outcome I had in mind.
Still, is it that hard for some approach to produce a compromise? (Is the worry that [creating a powerful optimizer that uses significant resources to optimize for different things] is technically hard even if alignment has been solved? Edited to add: My intuition is this isn’t hard conditional on alignment being solved, since e.g. then you could just align the AI to an adequately pluralistic human or set of humans, or maybe directly reward this sort of pluralism in training, but I haven’t thought about it much.)
(A lot of my optimism comes from my assumption that ~all (popularity-weighted) value systems which compete with utilitarianism are at least somewhat scope-insensitive, which makes them easy to mostly satisfy with a small fraction of available resources. Are there any prominent value systems other than utilitarianism that are fully scope-sensitive?)
I agree that utilitarianism’s scope sensitivity means a compromise with less scope-sensitive systems could be highly utilitarian. And this may be very important. (But this seems far from certain to me: if Alice and Bob have equal power and decide to merge systems, and Alice is highly scope-sensitive and Bob isn’t, it seems likely Bob will still demand half of resources/etc., under certain reasonable assumptions. On the other hand, such agents may be able to make more sophisticated trades that provide extra security to the scope-insensitive and extra expected resources to the scope-sensitive.)
Regardless, I think scenarios where (1) a single agent controls ~all influence without needing to compromise or (2) multiple agents converge to the same final goals and so merge without compromise are more likely than scenarios where (3) agents with significant influence and diverse preferences compromise.
(So my answer to your second paragraph is “no, but”)
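(A minimal bargaining sketch of the Alice/Bob point above. The utility functions and the use of the Nash bargaining rule are assumptions for illustration, not claims about how such agents would actually negotiate: under the Nash rule the scope-insensitive party is cheap to satisfy, while an equal-resource norm gives up roughly half the utilitarian value.)

```python
# Toy sketch only: alice_utility, bob_utility, and the Nash bargaining rule
# are illustrative assumptions, not anything from the thread.
import math

def alice_utility(share: float) -> float:
    # Assumed scope-sensitive utility: roughly linear in resources.
    return share

def bob_utility(share: float) -> float:
    # Assumed scope-insensitive utility: nearly saturated once Bob controls
    # a few percent of available resources.
    return 1 - math.exp(-share / 0.02)

def nash_bargaining_share(steps: int = 10_000) -> float:
    # Grid-search Alice's share s to maximize the Nash product
    # U_A(s) * U_B(1 - s), with a disagreement point of zero for both.
    best_s, best_product = 0.0, -1.0
    for i in range(steps + 1):
        s = i / steps
        product = alice_utility(s) * bob_utility(1 - s)
        if product > best_product:
            best_s, best_product = s, product
    return best_s

if __name__ == "__main__":
    s = nash_bargaining_share()
    print(f"Nash bargaining:     Alice gets ~{s:.0%} of resources")  # ~92%
    print("Equal-resource norm: Alice gets 50% of resources")
    # Which bargaining norm the agents end up using does most of the work here.
```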
Good points re: negotiations potentially going poorly for Alice (added: and the potential for good compromise), and also about how I may be underestimating the probability of human values converging.
I still think scenario (1) is not so likely, because:
Any advanced AI will initially be created by a team, in which there will be pressures for at least intra-team compromise (and very possibly also external pressures).
More speculatively: maybe acausal trade will enable & incentivize compromise even if each world is unipolar (assuming there isn’t much convergence across worlds).
Sure. And I would buy that we should be generally uncertain. But note:
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence). And regardless, that AI’s successors might have less diverse goals.
Setting aside potential compromise outcomes of acausal trade, what’s decision-relevant now is what future systems that might engage in acausal trade would value, and I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade. But I’m of course extremely uncertain and not sure exactly how this matters.
Also, I’m currently exhausted and tend to adopt soldier mindset when exhausted, so what you’re saying is probably more convincing than I’m currently appreciating...
[noticing my excessive soldier mindset at least somewhat, I added a sentence at the end of the first paragraph of my previous comment]
No worries, I was probably doing something similar.
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence)
Could you say a bit more about where you’re coming from here? (My initial intuition would be: assuming alignment ends up being based on some sort of (amplified) human feedback, doesn’t the AI developer get a lot of choice, through its control over who gives the human feedback and how feedback is aggregated (if there are multiple feedback-givers)?)
I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade
Ah sorry, to clarify, what I had in mind was mostly that (fully) non-utilitarian systems, by trading with (fully) utilitarian systems, would provide much utilitarian value. (Although on second thought, that doesn’t clearly raise the value of partly utilitarian systems more than it raises the value of fully utilitarian systems. Maybe that’s what you were suggesting?)
I should learn more, and an employees-have-power view is shared by the one person in industry I’ve spoken with about this. But I think it’s less the “team” and more either leadership or whoever deploys the system that gets to choose what values the system’s deployment promotes. I also don’t expect alignment-with-human-values to look at all like amplification-of-asking-humans-about-their-values. Maybe you’re thinking of other kinds of human feedback, but then I don’t think it’s relevant to the AI’s values.
Acausal trade: I need to think about this sometime when I can do so carefully. In particular, I think we need to be careful about ‘providing value’ relative to the baseline of an empty universe vs [a non-utilitarian AI that trades with utilitarian AIs]. (It also might be the case that less scope-sensitive systems won’t be as excited about acausal trade?) For now, I don’t have a position and I’m confused about the decision-relevant upshot.
Thanks for pushing back!
I’d be happy to discuss this on a call sometime.
I’m thinking of ~IDA with a non-adversarial (e.g. truthful) model, but could easily be mistaken. Curious what you’re expecting?
Fair, I’m also confused.
Sure! I’ll follow up.