I’m not sure I follow. [...] I assume all ethical views prefer status quo to extinction or totalitarianism
I wonder if we might be using “net negative” differently? By “net negative” I mean “worse than non-existence,” not “worse than status quo.” So even though we may prefer a stable status quo to imminent extinction, we might still think the latter leaves us at roughly net zero (i.e. not net negative, or at least not significantly net negative).
I also suspect that, under many ethical views, some forms of totalitarianism would be better than non-existence (i.e. not net negative). For example, a totalitarian world in which freedoms/individuality are extremely limited—but most people are mostly happy, and extreme suffering is very rare—seems at least a little better than non-existence, by the lights of many views about value.
(A lot of what I’m saying here rests on the assumption that, according to very scope-sensitive views of value, a technologically unsophisticated future would be approximately 0 on a scale where −100 is “worst possible future,” 0 is “non-existence,” and 100 is “best possible future,” because humanity would miss out on the vast majority of time and space in which we could create (dis)value. That’s why, for a technologically unsophisticated future to be better than the average technologically sophisticated future, the latter has to be net negative.)
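To spell out the arithmetic behind that last step (this is just a restatement of the parenthetical above, writing $V$ for value on that −100 to 100 scale and $\mathbb{E}$ for the average):

$$V(\text{unsophisticated}) \approx 0 \;\;\Longrightarrow\;\; \Big(V(\text{unsophisticated}) > \mathbb{E}\big[V(\text{sophisticated})\big] \iff \mathbb{E}\big[V(\text{sophisticated})\big] < 0\Big),$$

i.e. the average technologically sophisticated future being net negative is exactly what it takes for the unsophisticated one to come out ahead.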
Oh I agree, I feel like superintelligence cannot be trusted, at least the kind that’s capable of global power-grabs. [...] I think it’s largely because humans don’t want consistent things, and cannot possibly want consistent things, short of neurosurgery.
I’m not sure if you mean that humans’ preferences (a) are consistent at any given time but inconsistent over time, or (b) are inconsistent even if we hold time constant. I’d have different responses to the two.
Re: (a), I think this would require some way to aggregate preferences, analogous to aggregating different individuals’ preferences but applied to the same individual at different times; admittedly that seems tricky, but not hopeless?
Re: (b), I agree that alignment to inconsistent preferences is impossible. (I also doubt humans can be aligned to other humans’ inconsistent preferences. If someone prefers apples to pears and also prefers pears to apples, to take a simple example of an inconsistent preference, I can’t try to do what they want when they ask for a fruit, since there isn’t a consistent thing that they want me to do.) Still, I don’t know; I don’t feel that my preferences are that incoherent, and I think I’d be pretty happy with an AI that just tries to do what I want it to do, to whatever extent I have consistent wants.
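As a toy illustration of that fruit example (a hypothetical sketch; the preference pairs and the helper function below are invented for illustration, not anything from the discussion):

```python
# Toy model: strict preferences stated as ordered pairs ("X is preferred to Y").
stated_preferences = {
    ("apples", "pears"),  # prefers apples to pears...
    ("pears", "apples"),  # ...and also pears to apples
}
options = {"apples", "pears"}

def satisfiable_choices(options, prefers):
    """Options that nothing else is strictly preferred to, per the stated pairs."""
    return {
        x for x in options
        if not any((y, x) in prefers for y in options if y != x)
    }

print(satisfiable_choices(options, stated_preferences))
# -> set(): with both pairs stated, every option has something preferred to it,
#    so there is no choice that counts as "doing what they want."
```

Drop either pair and exactly one option survives, which is the sense in which an assistant can only act on whatever consistent subset of someone’s wants actually exists.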