A5b. The argument that people may wish to directly optimize for positive utility, while nobody actively optimizes for negative utility, is to my mind some (and actually quite strong) evidence that total or negative-leaning *hedonic* utilitarians should focus more on avoiding extinction and ensuring positive outcomes than on avoiding negative outcomes.
I’ve argued against this point here (although I don’t think my objection is very strong). Basically, we (or whoever) could be mistaken about which of our AI tools are sentient or otherwise matter morally, and end up putting them, inadvertently or without concern for them, in conditions in which they suffer, like factory-farmed animals. If sentient tools are adapted to specific conditions (e.g. evolved), a random change in conditions is more likely to be detrimental than beneficial.
Also, individuals who are indifferent to or unaware of negative utility (generally or in certain things) may threaten you with creating a lot of negative utility to get what they want. EAF is doing research on this now.
> If sentient tools are adapted to specific conditions (e.g. evolved), a random change in conditions is more likely to be detrimental than beneficial.
I don’t think it’s obvious that this is in expectation negative. I’m not at all confident that negative valence is easier to induce than positive valence today (though I think it’s probably true), but even conditional upon that being true, I think it’s a weird quirk of biology that negative valence may be more common than positive valence in evolved animals. Naively I would guess that the experiences of tool AI (which we may wrongly believe not to be sentient, or are otherwise callous towards) are in expectation zero. However, this may be enough for hedonic utilitarians with a moderate negative lean (3-10x, say) to believe that suffering overrides happiness in those cases.
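To spell out how a negative lean flips the sign here (a toy calculation; the symmetric-magnitude assumption is mine, purely for illustration): suppose an incidental experience is equally likely to be positive or negative with the same magnitude $m$. A classical utilitarian then values it at zero in expectation, while someone who weights suffering by a factor $w$ values it at

$$\frac{1}{2}m - \frac{1}{2}wm = \frac{1-w}{2}m < 0 \quad \text{for } w > 1,$$

so even a moderate 3-10x negative lean is enough to make the expected value of such experiences negative.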
I want to make a weaker claim, however, which is that per unit of {experience, resource consumed}, I’d expect intentional, optimized experience to be multiple orders of magnitude greater than incidental suffering or happiness (or other relevant moral goods).
If this is true, then to believe that the *total* expected unintentional suffering (or happiness) of tool AIs exceeds that of intentional experiences of happiness (or suffering), you need to believe that the sheer amount of resources devoted to these tools is several orders of magnitude greater than the resources devoted to optimized experiences.
This seems possible but not exceedingly likely.
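To make the arithmetic behind this explicit (the symbols are mine, just for illustration): write $u$ for the morally relevant experience per unit of resource in incidental use, $ku$ for the per-unit intensity of deliberately optimized experience, and $R_{\text{tool}}$, $R_{\text{opt}}$ for the resources devoted to each. Then

$$R_{\text{tool}} \cdot u > R_{\text{opt}} \cdot ku \iff \frac{R_{\text{tool}}}{R_{\text{opt}}} > k,$$

so if optimized experience is, say, $10^3$ times as intense per unit of resource, incidental tool experiences dominate in total only if tools consume over a thousand times as many resources as deliberately optimized experience does.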
If I were a negative utilitarian, I might think really hard about trying to prevent agents from deliberately optimizing for suffering (which naively I would guess to be pretty unlikely, but not vanishingly so).
> Also, individuals who are indifferent to or unaware of negative utility (generally or in certain things) may threaten you with creating a lot of negative utility to get what they want. EAF is doing research on this now.
Yeah that’s a good example. I’m glad someone’s working on this!
> I don’t think it’s obvious that this is in expectation negative. I’m not at all confident that negative valence is easier to induce than positive valence today (though I think it’s probably true), but even conditional upon that being true, I think it’s a weird quirk of biology that negative valence may be more common than positive valence in evolved animals. Naively I would guess that the experiences of tool AI (which we may wrongly believe not to be sentient, or are otherwise callous towards) are in expectation zero. However, this may be enough for hedonic utilitarians with a moderate negative lean (3-10x, say) to believe that suffering overrides happiness in those cases.
It might be 0 in expectation to a classical utilitarian in the conditions for which they are adapted, but I expect it to go negative if the tools are initially developed through evolution (or some other optimization algorithm for design) and RL (for learning and individual behaviour optimization), and then used in different conditions. Think of “sweet spots”: if you raise temperatures, that leads to more deaths by hyperthermia, but if you decrease temperatures, more deaths by hypothermia. Furry animals have been selected to have the right amount of fur for the temperatures they’re exposed to, and sentient tools may be similarly adapted. I think optimization algorithms will tend towards local maxima like this (although by local maxima here, I mean with respect to conditions, while the optimization algorithm is optimizing genes; I don’t have a rigorous proof connecting the two).
On the other hand, environmental conditions that are good to change in one direction and bad in the other should cancel out in expectation under a random change (with a uniform prior), and conditions that lead to improvement in both directions don’t seem stable (or maybe I just can’t think of any), so they seem less likely than conditions that are bad to change in either direction. That is, is there any kind of condition such that a change in either direction is positive, e.g. such that both increasing and decreasing the temperature would be good?
This is also a (weak) theoretical argument that wild animal welfare is negative on average, because environmental conditions are constantly changing.
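To make the local-optimum point concrete, here’s a toy simulation (the quadratic welfare curve and the perturbation range are stand-ins I’ve picked purely for illustration):

```python
import random

# Toy model: welfare as a function of one environmental condition (say, temperature),
# peaked at the value the sentient tool was adapted to. Any single-peaked
# ("sweet spot") welfare curve gives the same qualitative result.
def welfare(condition: float, adapted_to: float = 20.0) -> float:
    return 10.0 - (condition - adapted_to) ** 2

adapted = 20.0
baseline = welfare(adapted)

# Apply a random, symmetric change to the condition and average the resulting welfare.
samples = [welfare(adapted + random.uniform(-5.0, 5.0)) for _ in range(100_000)]
mean_after_change = sum(samples) / len(samples)

print(f"welfare in adapted conditions:          {baseline:.2f}")
print(f"expected welfare after a random change: {mean_after_change:.2f}")
# The second number is lower: at a local maximum, a random change in conditions is
# detrimental in expectation, even though any purely monotone effect
# ("good one way, bad the other") would cancel out under a symmetric change.
```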
Fair enough on the rest.