Utility and integrity coming apart, and in particular deception for gain, is one of the central concerns of AI safety. Shouldn't we similarly be worried about human consequentialists at the extremes?
It is somewhat disanalogous, though, because:
We don't expect one small group of humans to have so much power without needing to cooperate with others, as might be the case for an AGI taking over. Furthermore, the FTX/Alameda leaders had goals that were fairly aligned with a much larger community (the EA community), whose work they've just made harder.
Humans, including consequentialists, tend to inherently value integrity. However, this could actually be a bias that consequentialists should seek to abandon, if we think integrity and utility come apart at the extremes and that we should go for the extremes.
EDIT: Humans are more cognitively limited than AGIs, so we're less likely than AGIs to identify net positive deceptive acts, and more likely to settle on net negative ones.
EDIT: On the other hand, maybe we shouldn’t trust utilitarians with AGIs aligned with their own values, either.