Because utility and integrity are wholly independent variables, there is no reason for us to assume a priori that they will always correlate perfectly. So if we wish to believe that integrity and expected value correlated for SBF, then we must show it. We must actually do the math.
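To make “do the math” concrete: the comparison being demanded can be written as a simple expected value inequality, where $p$, $U_{\text{win}}$, $U_{\text{lose}}$, and $U_{\text{honest}}$ are placeholders to be estimated rather than numbers anyone has supplied:

$$p \, U_{\text{win}} + (1 - p) \, U_{\text{lose}} \;\gtrless\; U_{\text{honest}}$$

The claim that integrity and expected value correlated for SBF is the claim that, with realistic values plugged in, the left-hand side comes out lower.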
This feels a bit unfair when people have argued (i) that utility and integrity will correlate strongly in practical cases (why use “perfectly” as your bar?), and (ii) that they will do so in ways that will be easy to underestimate if you just “do the math”.
You might think they’re mistaken, but some of the arguments do specifically talk about why the “assume zero correlation and do the math” approach works poorly, so if you disagree it’d be nice if you addressed that directly.
Utility and integrity coming apart, and in particular deception for gain, is one of the central concerns of AI safety. Shouldn’t we similarly be worried, at the extremes, about human consequentialists?
It is somewhat disanalogous, though, because:

1. We don’t expect one small group of humans to hold so much power without needing to cooperate with others, as might be the case for an AGI taking over. Furthermore, the FTX/Alameda leaders had goals that were fairly aligned with a much larger community (the EA community), whose work they’ve just made harder.
2. Humans, including consequentialists, tend to inherently value integrity. However, this could actually be a bias that consequentialists should seek to abandon, if we think integrity and utility come apart at the extremes and we should go for the extremes.
3. (EDIT) Humans are more cognitively limited than AGIs, and are less likely than an AGI to identify net positive deceptive acts, and more likely to mistake net negative ones for net positive.
EDIT: On the other hand, maybe we shouldn’t trust utilitarians with AGIs aligned with their own values, either.
Assuming zero correlation between two variables is standard practice, because for any given pair of variables, it is very likely that they do not correlate. Anyone who wants to disagree must crunch the numbers and disprove it. That’s just how math works.
And if we want to treat ethics like math, then we need to actually do some math. We can’t have our cake and eat it too.
I’m not sure how literally you mean “disprove”, but on its face, “assume nothing is related to anything until you have proven otherwise” is a reasoning procedure that will never recommend any action in the real world, because we never get that kind of certainty. When humans try to achieve results in the real world, heuristics, informal arguments, and looking at what seems to have worked ok in the past are unavoidable.
I am talking about math. In math, we can at least demonstrate things for certain (and prove things for certain, too, though that is admittedly not what I am talking about).
But the point is that we should at least be able to bust out our calculators and crunch the numbers. We might not know if these numbers apply to the real world. That’s fine. But at least we have the numbers. And that counts for something.
For example, we can estimate roughly how much wealth SBF was gambling. We can give that a range. We can also estimate how much risk he was taking on. We can give that a range too. Then we can calculate whether the risk he took on had net positive expected value.
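As an illustration of what that calculation could look like, here is a minimal Python sketch. Every numeric range below (wealth at stake, probability of ruin, upside multiple) is a made-up placeholder, not an estimate of FTX’s actual position, and the payoff structure is an assumption:

```python
import random

# All ranges below are hypothetical placeholders, not estimates of FTX's finances.
WEALTH_AT_STAKE = (5e9, 15e9)  # dollars being gambled
P_RUIN = (0.05, 0.50)          # probability the gamble blows up
UPSIDE_MULT = (1.5, 5.0)       # payoff multiple if the gamble succeeds

def expected_value(wealth, p_ruin, upside_mult):
    """EV of the gamble relative to standing pat: lose the stake with
    probability p_ruin, otherwise multiply it by upside_mult."""
    gain = (1 - p_ruin) * wealth * (upside_mult - 1)  # expected gain over baseline
    loss = p_ruin * wealth                            # expected loss on ruin
    return gain - loss

# Sample scenarios uniformly from the assumed ranges.
trials = [
    expected_value(
        random.uniform(*WEALTH_AT_STAKE),
        random.uniform(*P_RUIN),
        random.uniform(*UPSIDE_MULT),
    )
    for _ in range(100_000)
]
share_positive = sum(ev > 0 for ev in trials) / len(trials)
print(f"Share of sampled scenarios with positive EV: {share_positive:.1%}")
```

Note that this treats dollars as utility; with diminishing returns to philanthropic spending, the same gamble would look worse in utilitarian terms.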
It’s possible that it has positive expected value only above a certain level of risk, or whatever. Perhaps we do not know whether he faced this risk. That is fine. But we can still, at any rate, see under what circumstances SBF would have been rational, acting on utilitarian grounds, to do what he did.
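Continuing the same placeholder sketch: with the payoff structure assumed above, the break-even circumstances can be solved in closed form, as a risk threshold set by the upside multiple:

```python
def breakeven_ruin_probability(upside_mult):
    """Largest ruin probability at which the placeholder gamble above is
    still EV-positive: (1 - p) * (m - 1) - p > 0  <=>  p < (m - 1) / m."""
    return (upside_mult - 1) / upside_mult

for m in (1.5, 2.0, 3.0, 5.0):
    p_star = breakeven_ruin_probability(m)
    print(f"Upside x{m}: EV-positive iff P(ruin) < {p_star:.0%}")
```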
If these circumstances sound like they do or could describe the circumstances that SBF was in earlier this week, then that should give us reason to pause.