I think cluster thinking and the use of sensitivity analysis are approaches to decision making under deep uncertainty, i.e. when it’s difficult to commit to a particular joint probability distribution or to a particular weighting of considerations. Robust decision making is another. The maximality rule is another: given some set of plausible (empirical or ethical) worldviews/models over which we can’t commit to quantifying our uncertainty, if A is worse in expectation than B under some plausible worldview/model, and not better than B in expectation under any of them, we say A < B, and we should rule out A.
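As a minimal sketch of how this rule plays out with finitely many worldviews (the function and its inputs are hypothetical, and assume each worldview already yields an expected value for each option):

```python
# A minimal sketch of the maximality rule over a finite set of worldviews.
# eu_a[i] and eu_b[i] are the (hypothetical) expected values of options A and B
# under worldview i.

def maximality_rules_out_a(eu_a, eu_b):
    """A is ruled out iff it's worse than B under some plausible worldview
    and not better than B under any of them."""
    worse_somewhere = any(a < b for a, b in zip(eu_a, eu_b))
    better_nowhere = all(a <= b for a, b in zip(eu_a, eu_b))
    return worse_somewhere and better_nowhere

print(maximality_rules_out_a([1.0, 5.0], [2.0, 5.0]))  # True: A loses once, never wins
print(maximality_rules_out_a([1.0, 6.0], [2.0, 5.0]))  # False: each wins somewhere, so both stay permissible
```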
It seems like EAs should be more familiar with the field of decision making under deep uncertainty. (Thanks to this post by weeatquince for pointing this out.)
See also:
Deep Uncertainty by Walker, Lempert and Kwakkel for a short review.
Decision Making under Deep Uncertainty: From Theory to Practice for a comprehensive text.
Heuristics for clueless agents: how to get away with ignoring what matters most in ordinary decision-making by David Thorstad and Andreas Mogensen
Many Weak Arguments vs. One Relatively Strong Argument and Robustness of Cost-Effectiveness Estimates and Philanthropy by Jonah Sinick
Why I’m skeptical about unproven causes (and you should be too) by Peter Hurford (LW, blog)
The Optimizer’s Curse & Wrong-Way Reductions by Chris Smith (blog)
EDIT: I think this approach isn’t very promising.
The above-mentioned papers by Mogensen and Thorstad are critical of the maximality rule for being too permissive, but here’s a half-baked attempt to improve it:
Suppose you have a social welfare function $U$ and want to compare two options, $A$ and $B$. Suppose further that you have two sets of probability distributions, each of size $n$, for the outcome $X$ of $A$ and of $B$: $\mathcal{P}_A$ and $\mathcal{P}_B$. Then $A \succsim B$ ($A$ is at least as good as $B$) if (and only if) there is a bijection $f : \mathcal{P}_A \to \mathcal{P}_B$ such that
$$\mathbb{E}_{X \sim P}[U(X)] \ge \mathbb{E}_{X \sim f(P)}[U(X)] \quad \text{for all } P \in \mathcal{P}_A, \tag{1}$$
and furthermore, $A \succ B$ ($A$ is strictly better than $B$) if the above inequality is strict for some $P \in \mathcal{P}_A$.
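For finitely many distributions (equal counts on each side), such a bijection exists exactly when the two lists of expected values, sorted in descending order, dominate pointwise (pair by rank for one direction; conversely, the $f$-preimages of the top $k$ expected values for $B$ witness $k$ expected values for $A$ at least as large). So the rule can be checked by sorting and rank-pairing. A sketch, with hypothetical inputs holding $\mathbb{E}_{X \sim P}[U(X)]$ for each $P$:

```python
# Sketch of the proposed bijection rule for finitely many distributions.
# eu_a and eu_b hold the expected values E[U(X)] under each distribution in
# P_A and P_B respectively (equal length n). A bijection satisfying (1) exists
# iff the descending-sorted lists dominate pointwise, so rank-pairing suffices.

def at_least_as_good(eu_a, eu_b):
    """Return (A ≿ B, A ≻ B) under the bijection rule."""
    a_sorted = sorted(eu_a, reverse=True)
    b_sorted = sorted(eu_b, reverse=True)
    weak = all(a >= b for a, b in zip(a_sorted, b_sorted))
    strict = weak and any(a > b for a, b in zip(a_sorted, b_sorted))
    return weak, strict

print(at_least_as_good([10.0, -1.0], [9.0, -1.0]))  # (True, True): A ≻ B
```

If neither `at_least_as_good(eu_a, eu_b)` nor `at_least_as_good(eu_b, eu_a)` reports a weak preference, the options are incomparable under this rule and neither is ruled out.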
This amounts to pairing asymmetric/complex cluelessness arguments. Suppose you think helping an elderly person cross the street might have some important effect on the far future (you have some $P \in \mathcal{P}_A$), but that not doing so could have a similar far-future effect (according to some $P' \in \mathcal{P}_B$), with worse short-term consequences. If, under some pairing of distributions/arguments $f : \mathcal{P}_A \to \mathcal{P}_B$, helping the elderly person always looks at least as good, and looks strictly better under one pair $(P, f(P))$, then you should do it. Pairing distributions like this in some sense forces us to give equal weight to $P$ and $f(P)$, and maybe this goes too far and assumes away too much of our cluelessness or deep uncertainty?
The maximality rule as described in Maximal Cluelessness effectively assumes a pairing is already given to you, by instead using a single set of distributions $\mathcal{P}$ that can each be conditioned on taking action $A$ or $B$. We’d omit $f$, and the expression replacing (1) above would be
$$\mathbb{E}_{X \sim P \mid A}[U(X)] \ge \mathbb{E}_{X \sim P \mid B}[U(X)] \quad \text{for all } P \in \mathcal{P}.$$
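In code, the difference is just that the pairing comes packaged with each shared distribution; a sketch under the same hypothetical setup as above:

```python
# Sketch of the Maximal Cluelessness version: a single shared set of
# distributions, each conditioned on taking A or B. dists is a hypothetical
# list of (E[U(X) | A], E[U(X) | B]) pairs, one per distribution P.

def weakly_better_shared(dists):
    """A ≿ B iff E[U | A] >= E[U | B] under every shared distribution."""
    return all(eu_given_a >= eu_given_b for eu_given_a, eu_given_b in dists)

# Unlike the bijection version, there is no freedom in choosing f: each shared
# distribution pairs its own two conditional expectations.
```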
I’m not sure what to do when the options have different numbers of distributions, or infinitely many. Maybe the function $f$ should be assumed given, as a preferred mapping between distributions, and we could relax surjectivity, totality, injectivity, and even the requirement that it be a function, e.g. we could compare over pairs $(P, P') \in R$, for some relation (subset) $R \subseteq \mathcal{P}_A \times \mathcal{P}_B$. But assuming we already have such a function or relation seems to assume away too much of our deep uncertainty.
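A sketch of the relation-based version, where $R$ is represented by a hypothetical list of expected-value pairs; the bijection rule is the special case where $R$ is the graph of $f$:

```python
# Sketch of the relation-based generalization. related_pairs is a hypothetical
# list of (E[U(X)] under P, E[U(X)] under P') tuples, one per (P, P') in
# R ⊆ P_A × P_B; taking R to be the graph of f recovers the bijection rule.

def weakly_better_under_relation(related_pairs):
    """A ≿ B iff A's expected value is at least B's on every related pair."""
    return all(eu_a >= eu_b for eu_a, eu_b in related_pairs)
```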
One plausibly useful first step is to sort $\mathcal{P}_A$ and $\mathcal{P}_B$ by the expected value of $U(X)$ under each of their distributions, as in the rank-pairing sketch above. Should the mapping or relation preserve the min and max? How should we deal with everything in between? I suspect any proposal will seem arbitrary.
Perhaps we can assume slightly more structure on the set $\mathcal{P}_A$ for each option $A$ by placing multiple probability distributions over $\mathcal{P}_A$ itself, going up a level (and we could repeat this). Basically, I want to give probability ranges for the expected value of the action $A$, and then compare the possible expected values of these expected values. However, if we just multiply our higher-order probability distributions through the lower-order ones, this collapses back to the original scenario.
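A small numeric sketch of that collapse, with made-up weights and expected values; by the law of total expectation, each second-order distribution, multiplied through the first-order ones, turns into a single mixture expected value:

```python
# Hypothetical example: two first-order expected utilities for A, and two
# candidate second-order distributions (weightings) over them.

first_order_eus = [10.0, -2.0]
second_order = [[0.7, 0.3], [0.2, 0.8]]

# Law of total expectation: multiplying a second-order distribution through
# the first-order ones yields one number, the mixture's expected value...
collapsed = [sum(w * eu for w, eu in zip(ws, first_order_eus)) for ws in second_order]
print(collapsed)  # [6.4, 0.4]

# ...so a set of second-order distributions collapses to a set of expected
# values, i.e. the original scenario, just one level up.
```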