One justification might be that in an online setting, where you have to learn which options are best from past observations, the naive “follow the leader” approach—deterministically picking whichever action has performed best so far—is easily exploited by an adversary.
This problem resolves itself if you make actions more likely when they’ve performed well, but regularize a little to smooth things out. The most common regularizer is entropy, and then, as described on the “Softmax demystified” page, you basically end up recovering softmax (this is the well-known “multiplicative weight updates” algorithm).
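As a rough sketch of that recovery, here is the exponential-weights (“Hedge”) variant of multiplicative weight updates in the full-information setting, where each round you play an action with probability given by a softmax over its cumulative reward so far. The function names and the learning rate `eta` are illustrative choices, not anything from the source:

```python
import math
import random

def softmax(scores, eta):
    # Softmax with inverse temperature eta; subtract the max for
    # numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(eta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multiplicative_weights(reward_fns, n_rounds, eta=0.5):
    """Play each action with probability proportional to
    exp(eta * its cumulative reward so far)."""
    n = len(reward_fns)
    cumulative = [0.0] * n
    total_reward = 0.0
    for t in range(n_rounds):
        probs = softmax(cumulative, eta)
        # Sample an action from the softmax distribution (the
        # randomization is what prevents adversarial exploitation).
        action = random.choices(range(n), weights=probs)[0]
        # Full-information setting: rewards for all actions are revealed.
        rewards = [f(t) for f in reward_fns]
        total_reward += rewards[action]
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return total_reward
```

Note that pure “follow the leader” is the `eta → ∞` limit (all probability mass on the current best action), which is exactly the deterministic, exploitable behavior; a finite `eta` is the entropy-regularized smoothing.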
Unlike poverty and disease, many of the harms of the criminal justice system are due to intentional cruelty. People are raped, beaten, and tortured every day in America’s jails and prisons. There are smaller cruelties, too, like prohibiting detainees from seeing visitors in order to extort more money out of their families.
To most people, seeing people doing intentional evil (and even getting rich off it) seems viscerally worse than harm due to natural causes.
I think that, from a ruthless expected-utility perspective, this is probably correct in the abstract: all else equal, murder is worse than an equivalently painful accidental death. However, I doubt that taking it into account (even being very generous about things like “illegible corrosion of the social fabric”) would meaningfully change your conclusions about $/QALY in this case, because all else is not equal.
But I think the distinction is probably worth making, as it’s a major difference between criminal justice reform and the two baselines for comparison.