Maybe a question instead of an answer, but what longtermist questions does this seem like a crux for?
If AIs are unaligned with human values, that seems very bad already.
If they are aligned, then surely our future selves can figure this out?
Again, this could be a very dumb question, but without knowing that, it doesn't seem surprising how little attention is paid to AI sentience.
I think this is a great question. My answers:
1. I think some plausible alignment schemes could involve causing suffering to the AIs. It seems pretty bad to inflict huge amounts of suffering on AIs, both because it's unethical and because it seems potentially inadvisable to make AIs justifiably mad at us.
2. If unaligned AIs are morally valuable, then it's less bad to get overthrown by them, and perhaps we should be aiming to produce successors who we're happier to be overthrown by. See here for discussion. (Obviously plan A is to align the AIs, but it seems good to know how important it is to succeed at this, and making unaligned but valuable successors seems like a not-totally-crazy plan B.)
I'm curious to what extent the value of the "happiness-to-be-overthrown-by" (H2BOB) variable for the unaligned AI that overthrew us would be predictive of the H2BOB value of future generations / evolutions of AI. Specifically, it seems at least plausible that the nature and rate of unaligned AI evolution could be so broad and fast that knowing the nature and H2BOB of the first AGI would tell us essentially nothing about prospects for AI welfare in the long run.
I like this answer and will read the link in bullet 2. I'm very interested in further reading in bullet 1 as well.
Hi Buck,
"If unaligned AIs are morally valuable, then it's less bad to get overthrown by them."
Are you confident that being overthrown by AIs is bad? I am quite uncertain. For example, maybe most people would say that humans overpowering other animals was good overall.
Reframing your question as an answer: there isn't much work on AI sentience because we can probably solve it later without much loss, and work on AI sentience trades off with work on other AI stuff (mostly because many of the people who could work on AI sentience could also work on other AI stuff), and we can't save other AI stuff for later.
"If they are aligned, then surely our future selves can figure this out?"
I think it's entirely plausible we just don't care to figure it out, especially if we have some kind of singleton scenario where the entity in control decides to optimize human/personal welfare at the expense of other sentient beings. Just consider how humans currently treat animals, and now imagine that there is no opportunity for lobbying for AI welfare; we're just locked into place.
Ultimately, I am very uncertain, but I would not say that solving AI alignment/control will "surely" lead to a good future.
Scenario 1: Alignment goes well. In this scenario, I agree that our future AI-assisted selves can figure things out, and that pre-alignment AI sentience work will have been wasted effort.
Scenario 2: Alignment goes poorly. While I don't technically disagree with your statement, "If AIs are unaligned with human values, that seems very bad already," I do think it misleads by lumping all kinds of misaligned AI outcomes together as "very bad," when in reality this category spans many orders of magnitude of badness.[1] If we do lose control of the future at some point, it seems worthwhile to me to try, before then, to steer away from some of the worse outcomes (e.g., astronomical "byproduct" suffering of digital minds, which is likely easier to avoid if we better understand AI sentience).
[1] From the roughly neutral outcome of paperclip maximization to the extremely bad outcome of optimized suffering.