Good points re: negotiations potentially going poorly for Alice (added: and the potential for good compromise), and also about how I may be underestimating the probability of human values converging.
I still think scenario (1) is not so likely, because:
Any advanced AI will initially be created by a team, in which there will be pressures for at least intra-team compromise (and very possibly also external pressures).
More speculatively: maybe acausal trade will enable & incentivize compromise even if each world is unipolar (assuming there isn’t much convergence across worlds).
Sure. And I would buy that we should be generally uncertain. But note:
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence). And regardless, that AI’s successors might have less diverse goals.
Setting aside potential compromise outcomes of acausal trade, what’s decision-relevant now is what future systems that might engage in acausal trade would value, and I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade. But I’m of course extremely uncertain and not sure exactly how this matters.
Also, I’m currently exhausted and tend to adopt soldier mindset when exhausted, so what you’re saying is probably more convincing than I’m currently appreciating...
[noticing my excessive soldier mindset at least somewhat, I added a sentence at the end of the first paragraph of my previous comment]
No worries, I was probably doing something similar.
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence)
Could you say a bit more about where you’re coming from here? (My initial intuition would be: assuming alignment ends up being based on some sort of (amplified) human feedback, doesn’t the AI developer get a lot of choice, through its control over who gives the human feedback and how feedback is aggregated (if there are multiple feedback-givers)?)
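To make the “how feedback is aggregated” part concrete, here’s a toy sketch (purely hypothetical labeler pools and numbers, not anyone’s actual training setup): if preference labels from several pools are combined by a weighted vote, the developer’s choice of who labels and how their labels are weighted can flip which output the aggregate prefers, and that aggregate is what any feedback-based training would then try to reproduce.

```python
# Toy illustration (hypothetical pools and numbers): the developer's choice of
# who gives feedback and how it's weighted determines the aggregate preference.
from collections import Counter

# Preference labels over the same (output_a, output_b) comparisons,
# from three made-up labeler pools.
labels = {
    "pool_1": ["a", "a", "b"],
    "pool_2": ["b", "b", "b"],
    "pool_3": ["a", "b", "a"],
}

def aggregate(labels, weights):
    """Weighted vote over preference labels; returns the winning option."""
    tally = Counter()
    for pool, votes in labels.items():
        for vote in votes:
            tally[vote] += weights[pool]
    return tally.most_common(1)[0][0]

print(aggregate(labels, {"pool_1": 1.0, "pool_2": 1.0, "pool_3": 1.0}))  # -> "b"
print(aggregate(labels, {"pool_1": 1.0, "pool_2": 1.0, "pool_3": 5.0}))  # -> "a" (re-weighting flips it)
```

Obviously a cartoon, but that’s the sense in which I’d expect the developer to have a lot of choice.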
I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade
Ah sorry, to clarify, what I had in mind was mostly that (fully) non-utilitarian systems, by trading with (fully) utilitarian systems, would provide much utilitarian value. (Although on second thought, that doesn’t clearly raise the value of partly utilitarian systems more than it raises the value of fully utilitarian systems. Maybe that’s what you were suggesting?)
I should learn more, and an employees-have-power view is shared by the one person in industry I’ve spoken with about this. But I think it’s less the “team” and more either leadership or whoever deploys the system that gets to choose what values the system’s deployment promotes. I also don’t expect alignment-with-human-values to look at all like amplification-of-asking-humans-about-their-values. Maybe you’re thinking of other kinds of human feedback, but then I don’t think it’s relevant to the AI’s values.
Acausal trade: I need to think about this sometime when I can do so carefully. In particular, I think we need to be careful about ‘providing value’ relative to the baseline of an empty universe vs [a non-utilitarian AI that trades with utilitarian AIs]. (It also might be the case that less scope-sensitive systems won’t be as excited about acausal trade?) For now, I don’t have a position and I’m confused about the decision-relevant upshot.
I’d be happy to discuss this on a call sometime.
I’m thinking of ~IDA with a non-adversarial (e.g. truthful) model, but could easily be mistaken. Curious what you’re expecting?
Fair, I’m also confused.
Sure! I’ll follow up.