For example, it could hypothetically turn out, just as a brute empirical fact, that the most effective way of aligning AIs is to treat them terribly in some way, e.g. by brainwashing them or subjecting them to painful stimuli.
Yes, agree. (For this and other reasons, I’m supportive of projects like NYU MEP.)
I also agree that there are no strong reasons to think that technological progress improves people’s morality.
As you write, my main reason for worrying more about agential s-risks is that the greater the technological power of agents, the more their intrinsic preferences shape what the universe will look like. To put it differently, actors whose terminal goals place positive value on suffering (e.g., due to sadism, retributivism, or other weird fanatical beliefs) would deliberately aim to arrange matter so that it contains more suffering. This seems extremely worrisome if they have access to advanced technology.
Altruists would also have a much harder time trading with such actors, whereas purely selfish actors (who don’t place positive value on suffering) could plausibly engage in mutually beneficial trades: for example, they use slightly less efficient AI training/alignment methods that contain much less suffering, and altruists give them some of their resources in return.
But at the very least, incidental s-risks seem plausibly quite bad in expectation regardless.
Yeah, despite what I have written above, I probably worry more about incidental s-risks than the average s-risk reducer.