No worries, I was probably doing something similar.
I don’t expect a team that designs advanced AI to also choose what it optimizes for (and I think this is more clear if we replace “what it optimizes for” with “how it’s deployed,” which seems reasonable pre-superintelligence)
Could you say a bit more about where you’re coming from here? (My initial intuition would be: assuming alignment ends up being based on some sort of (amplified) human feedback, doesn’t the AI developer get a lot of choice, through its control over who gives the human feedback and how feedback is aggregated (if there are multiple feedback-givers)?)
I instinctively doubt “partly-utilitarian” systems provide much of the expected value from acausal trade
Ah sorry, to clarify, what I had in mind was mostly that (fully) non-utilitarian systems, by trading with (fully) utilitarian systems, would provide much utilitarian value. (Although on second thought, that doesn’t clearly raise the value of partly utilitarian systems more than it raises the value of fully utilitarian systems. Maybe that’s what you were suggesting?)
I should learn more, and an employees-have-power view is shared by the one person in industry I’ve spoken with about this. But I think it’s less the “team” and more either leadership or whoever deploys the system that gets to choose what values the system’s deployment promotes. I also don’t expect alignment-with-human-values to look at all like amplification-of-asking-humans-about-their-values. Maybe you’re thinking of other kinds of human feedback, but then I don’t think it’s relevant to the AI’s values.
Acausal trade: I need to think about this sometime when I can do so carefully. In particular, I think we need to be careful about ‘providing value’ relative to the baseline of an empty universe vs [a non-utilitarian AI that trades with utilitarian AIs]. (It also might be the case that less scope-sensitive systems won’t be as excited about acausal trade?) For now, I don’t have a position and I’m confused about the decision-relevant upshot.
I’d be happy to discuss this on a call sometime.
I’m thinking of ~IDA with a non-adversarial (e.g. truthful) model, but could easily be mistaken. Curious what you’re expecting?
Fair, I’m also confused.
Sure! I’ll follow up.