Yep, I didn’t initially understand you. That’s a great point!
This means the framework I presented in this post is wrong. I agree now with your statement:
"the EV of partly utilitarian AI is higher than that of fully utilitarian AI."
I think the framework in this post can be modified to incorporate this, and the conclusions remain similar. The quantity that dominates the utility calculation is now the expected representation of utilitarianism in the AGI's values.
The two handles become:
(1) The probability of misalignment.
(2) The expected representation of utilitarianism in the moral parliament, conditional on alignment.
The conclusion of the post, then, should be something like “interventions that increase (2) might be underrated” instead of “interventions that increase the probability of fully utilitarian AGI are underrated.”
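To make the role of (2) concrete, here's a back-of-the-envelope sketch; the linearity assumption and every number are my own illustrative choices, not anything from the post:

```python
# Hypothetical sketch (all numbers made up) of how the two handles combine,
# assuming value scales roughly linearly with utilitarian representation.

p_aligned = 0.5             # handle (1): 1 - P(misalignment)
repr_given_aligned = 0.2    # handle (2): E[utilitarian representation | alignment]
v_full = 1.0                # value of a fully utilitarian AGI, normalized to 1

# Under the linearity assumption, expected value is dominated by this product:
ev = p_aligned * repr_given_aligned * v_full
print(ev)  # 0.1
```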
On second thought, there's another potential wrinkle re: the representation of utilitarianism in the AI's values. Here are two ways it could be defined:
In some sort of moral parliament, what % of representatives are utilitarian?
How good are outcomes relative to what would be optimal by utilitarian lights?
Arguably the latter definition is the more morally relevant one. The former is related to it, but perhaps not linearly. (E.g., if the non-utilitarians in a parliament are all scope-insensitive, maybe utilitarianism only needs 5% representation to get > 50% of what it wants. If that's the case, then it may make sense to be risk-averse with respect to expected representation, e.g., to maximize the chances that some sort of compromise happens at all; the sketch below illustrates this.)
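To put numbers on that, here's a toy sketch; the saturating curve f and all the figures are assumptions I've made up for illustration:

```python
# Hypothetical illustration of the nonlinearity / risk-aversion point.
# f maps the utilitarian share of the parliament to the fraction of the
# utilitarian optimum achieved. It saturates quickly: a small foothold already
# secures much of what utilitarianism wants (e.g., because the other
# representatives are scope-insensitive).

def f(representation: float) -> float:
    """Toy saturating curve: fraction of the utilitarian optimum achieved."""
    return representation / (representation + 0.05)

# Two scenarios with the same *expected* representation (5%):
certain_compromise = f(0.05)             # guaranteed 5% seat share
gamble = 0.5 * f(0.10) + 0.5 * f(0.0)    # coin flip: 10% or nothing

print(round(certain_compromise, 2))  # 0.5
print(round(gamble, 2))              # 0.33 -- worse, despite the same
                                     # expected representation
```

Under a saturating curve like this, a guaranteed small compromise beats a gamble with the same expected representation, which is the sense in which risk aversion over representation can pay off.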
Thanks! From the other comment thread, now I’m less confident in the moral parliament per se being a great framework, but I’d guess something along those lines should work out.