I think this is a great question. My answers:
I think some plausible alignment schemes could involve causing suffering to the AIs. It seems pretty bad to inflict huge amounts of suffering on AIs, both because it’s unethical and because it seems potentially inadvisable to make AIs justifiably mad at us.
If unaligned AIs are morally valuable, then it’s less bad to get overthrown by them, and perhaps we should be aiming to produce successors that we’d be happier to be overthrown by. See here for discussion. (Obviously plan A is to align the AIs, but it seems good to know how important it is to succeed at this, and making unaligned but valuable successors seems like a not-totally-crazy plan B.)
I’m curious to what extent the “happiness-to-be-overthrown-by” (H2BOB) value of the unaligned AI that overthrew us would predict the H2BOB value of future generations / evolutions of AI. Specifically, it seems at least plausible that unaligned AI evolution could be so broad and fast that knowing the nature and H2BOB of the first AGI would tell us essentially nothing about the prospects for AI welfare in the long run.
I like this answer and will read the link in bullet 2. I’m very interested in further reading on bullet 1 as well.
Hi Buck,
“If unaligned AIs are morally valuable, then it’s less bad to get overthrown by them”
Are you confident that being overthrown by AIs is bad? I am quite uncertain. For example, maybe most people would say that humans overpowering other animals was good overall.