The above-mentioned papers by Mogensen and Thorstad are critical of the maximality rule for being too permissive, but here’s a half-baked attempt to improve it:
Suppose you have a social welfare function $U$ and want to compare two options, $A$ and $B$. Suppose further that for each of $A$ and $B$ you have a set of probability distributions of size $n$ over the outcome $X$, $\mathcal{P}_A$ and $\mathcal{P}_B$ respectively. Then $A \succsim B$ ($A$ is at least as good as $B$) if (and only if) there is a bijection $f : \mathcal{P}_A \to \mathcal{P}_B$ such that
$$\mathbb{E}_{X \sim P}[U(X)] \ge \mathbb{E}_{X \sim f(P)}[U(X)] \quad \text{for all } P \in \mathcal{P}_A, \tag{1}$$
and furthermore, $A \succ B$ ($A$ is strictly better than $B$) if the above inequality is strict for some $P \in \mathcal{P}_A$.
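To make the comparison concrete, here’s a minimal Python sketch, assuming each option is summarised by the list of expected utilities $\mathbb{E}_{X \sim P}[U(X)]$ computed under each distribution in its set; the function names and the brute-force search over bijections are illustrative, not from any existing library.

```python
from itertools import permutations

# Illustrative sketch only: each option is represented by the list of expected
# utilities E[U(X)] computed under each of the n distributions in its set.

def weakly_better(eu_a, eu_b):
    """A >= B: some bijection f pairs each distribution for A with one for B
    so that the expected utility for A is at least that for B, as in (1)."""
    assert len(eu_a) == len(eu_b)
    return any(all(a >= b for a, b in zip(eu_a, perm))
               for perm in permutations(eu_b))

def strictly_better(eu_a, eu_b):
    """A > B: some such bijection also makes the inequality strict somewhere."""
    assert len(eu_a) == len(eu_b)
    return any(all(a >= b for a, b in zip(eu_a, perm)) and
               any(a > b for a, b in zip(eu_a, perm))
               for perm in permutations(eu_b))

# Three distributions per option, summarised by their expected utilities.
print(weakly_better([5.0, 1.0, 3.0], [4.0, 0.5, 3.0]))    # True
print(strictly_better([5.0, 1.0, 3.0], [5.0, 1.0, 3.0]))  # False: an option is not strictly better than itself
```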
This amounts to pairing up asymmetric/complex cluelessness arguments. Suppose you think helping an elderly person cross the street might have some important effect on the far future (you have some $P \in \mathcal{P}_A$), but that not doing so could also have a similar far-future effect (according to some $P' \in \mathcal{P}_B$), while its short-term consequences are worse. If, under some pairing of distributions/arguments $f : \mathcal{P}_A \to \mathcal{P}_B$, helping the elderly person always looks at least as good and under one pair $(P, f(P))$ looks strictly better, then you should do it. Pairing distributions like this in some sense forces us to give equal weight to $P$ and $f(P)$, and maybe this goes too far and assumes away too much of our cluelessness or deep uncertainty.
The maximality rule as described in Maximal Cluelessness effectively assumes a pairing is already given to you, by instead using a single set of distributions $\mathcal{P}$, each of which can be conditioned on taking action $A$ or $B$. We’d omit $f$, and the expression replacing (1) above would be
$$\mathbb{E}_{X \sim P \mid A}[U(X)] \ge \mathbb{E}_{X \sim P \mid B}[U(X)] \quad \text{for all } P \in \mathcal{P}.$$
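As a sketch of this version, suppose each shared distribution is summarised by the pair of expected utilities it assigns conditional on taking $A$ and on taking $B$ (the names and numbers below are illustrative):

```python
# Illustrative sketch of the maximality-rule comparison: a single shared set of
# distributions, each summarised by the pair (E[U | A], E[U | B]) it assigns.

def maximality_weakly_better(credences):
    """A >= B iff E[U | A] >= E[U | B] under every distribution in the set."""
    return all(eu_given_a >= eu_given_b for eu_given_a, eu_given_b in credences)

# A beats B under two distributions but loses under the third, so neither
# A >= B nor B >= A holds: the two options are left incomparable.
credences = [(2.0, 1.0), (0.5, 0.4), (1.0, 3.0)]
print(maximality_weakly_better(credences))                        # False
print(maximality_weakly_better([(b, a) for a, b in credences]))   # False
```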
I’m not sure what to do when the two options have different numbers of distributions, or infinitely many. Maybe the function $f$ should be assumed given, as a preferred mapping between distributions, and we could relax the surjectivity, the total domain, the injectivity, and even the fact that it’s a function, e.g. we compare pairs $(P, P') \in R$ for some relation (subset) $R \subseteq \mathcal{P}_A \times \mathcal{P}_B$. But assuming we already have such a function or relation seems to assume away too much of our deep uncertainty.
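For illustration, the relation-based relaxation might look like the following, where `relation` is a hypothetical set of index pairs saying which distribution for $A$ gets compared against which distribution for $B$:

```python
# Illustrative sketch of the relation-based relaxation: pairs (i, j) in
# `relation` relate the i-th distribution for A to the j-th distribution for B;
# unrelated distributions impose no constraint on the comparison.

def relation_weakly_better(eu_a, eu_b, relation):
    """A >= B iff the expected utility for A dominates that for B on every related pair."""
    return all(eu_a[i] >= eu_b[j] for i, j in relation)

print(relation_weakly_better([5.0, 1.0], [4.0, 2.0], {(0, 0), (1, 1)}))  # False
print(relation_weakly_better([5.0, 1.0], [4.0, 2.0], {(0, 0), (0, 1)}))  # True
```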
One plausibly useful first step is to sort $\mathcal{P}_A$ and $\mathcal{P}_B$ according to the expected value of $U$ under each of their probability distributions. Should the mapping or relation preserve the min and the max? How should we deal with everything in between? I suspect any proposal will seem arbitrary.
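For what it’s worth, the rank-by-rank pairing that this sorting suggests does preserve the min and the max, and (if I’m not mistaken) a bijection satisfying (1) exists exactly when the sorted comparison below succeeds; the code is a sketch with illustrative names:

```python
# Illustrative sketch of the rank-by-rank pairing: sort both lists of expected
# utilities and compare them position by position. This matches min with min
# and max with max; the intermediate matches are where the arbitrariness lies.

def sorted_pairing_weakly_better(eu_a, eu_b):
    return all(a >= b for a, b in zip(sorted(eu_a), sorted(eu_b)))

print(sorted_pairing_weakly_better([3.0, 5.0, 1.0], [0.5, 3.0, 4.0]))  # True
```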
Perhaps we can assume slightly more structure on the set $\mathcal{P}_A$ for each option $A$ by assuming multiple probability distributions over $\mathcal{P}_A$ itself, and go up a level (and we could repeat this). Basically, I want to give probability ranges for the expected value of the action $A$, and then compare the possible expected values of these expected values. However, if we just multiply our higher-order probability distributions through the lower-order ones, this collapses back to the original scenario.
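A small numerical sketch of that collapse, with made-up numbers: putting second-order weights on two first-order distributions and taking the expected value of the expected values gives exactly the expected value under the single mixture distribution.

```python
import numpy as np

utilities = np.array([0.0, 1.0, 10.0])   # U(X) over three possible outcomes
p1 = np.array([0.8, 0.1, 0.1])           # first-order distribution 1 over outcomes
p2 = np.array([0.2, 0.7, 0.1])           # first-order distribution 2 over outcomes
weights = np.array([0.3, 0.7])           # second-order probabilities over {p1, p2}

# "Expected value of the expected values" ...
eu_per_distribution = np.array([p1 @ utilities, p2 @ utilities])
two_level = weights @ eu_per_distribution

# ... equals the expected value under the single mixture distribution.
mixture = weights[0] * p1 + weights[1] * p2
one_level = mixture @ utilities

print(two_level, one_level)  # both 1.52 (up to floating point)
```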
EDIT: I think this approach isn’t very promising.