I’ve been thinking more lately about how I should think about causal effects for cost-effectiveness estimates, in order to clarify my own skepticism of more speculative causes, especially longtermist ones, and to better understand how skeptical I ought to be. Maybe I’m far too skeptical. Maybe I just haven’t come across a convincing full model for causal effects, since I haven’t been looking for one specifically. I’ve been referred to this in the past, and plan to get through it, since it might provide some missing pieces for the value of research. This also came up here.
Suppose I have two random variables, X and Y, and I want to know the causal effect of manipulating X on Y, if any.
1. If I’m confident there’s no causal relationship between the two, say due to spatial separation, I assume there is no causal effect, and Y under the intervention that sets X to the value A (possibly random), written Y|do(X=A), is identical to Y, i.e. Y|do(X=A)=Y. (The do notation is Pearl’s do-calculus notation; intervening on X is not the same as conditioning on it.)
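To make the do(·) semantics concrete, here’s a minimal Python sketch of case 1. The variables and distributions are arbitrary stand-ins I’ve made up: in a structural model where X never appears in Y’s equation, intervening on X leaves Y’s distribution untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural equations: X and Y are causally unrelated.
x = rng.binomial(1, 0.5, size=n)    # X: say, a coin flipped far away
y = rng.normal(0.0, 1.0, size=n)    # Y's equation never mentions X

# do(X = 1): overwrite X's equation, leave every other equation alone.
x_intervened = np.ones(n)
y_intervened = rng.normal(0.0, 1.0, size=n)  # same equation, same distribution

print(y.mean(), y_intervened.mean())  # agree up to sampling error
```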
2. If X could affect Y, but I know nothing else,
a. I might assume, based on symmetry (and chaos?) for Y, that Y|do(X=A) and Y are identical in distribution, but not necessarily literally equal as random variables. They might be slightly “shuffled” or permuted versions of each other (see symmetric decreasing rearrangements for specific examples of such a permutation). The difference in expected values is still 0. This is how I think about the effects of my everyday decisions, like going to the store, breathing at particular times, etc., on future populations. I might assume the same for variables that depend on Y.
b. Or, I might think that manipulating X just injects noise into Y, possibly while preserving some of its statistics, e.g. the mean or median. A simple case is just adding random symmetric noise with mean and median 0 to Y. However, whether or not a statistic is preserved under the extra noise can be sensitive to the scale on which Y is measured. For example, if Y is real-valued and f:R→R is strictly increasing, then the median commutes with rescaling, med(f(Y))=f(med(Y)), so noise that preserves the median of Y also preserves the median of f(Y); the analogous claim can fail for the expected value of Y, and for other variables that depend on Y.
c. Or, I might think that manipulating X moves Y closer to a “default” distribution over the possible values of Y, often but not always uninformed or uniform. This can shift the mean, median, etc., of Y. For example, Y could be the face of the coin I see on my desk, and X could be whether I flip the coin or not, with the default being not to flip. So, if I do flip the coin and hence manipulate X, this randomizes the value of Y, making my probability distribution for its value uniform instead of concentrated on a known, deterministic value. You might think that some systems are the result of optimization and therefore fragile, so random interventions might return them to prior “defaults”, e.g. naive systemic change or changes to ecosystems. This could be (like) regression to the mean.
I’m not sure how to balance these three possibilities in general. If I think the effects are symmetric, I might go with a or b or some combination of them. In particular cases that are asymmetric, I might also mix in c. (The sketch below contrasts a, b, and c numerically.)
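Here is a minimal numerical sketch contrasting the three options; the baseline distribution, noise scale, rescaling f, and “default” are all arbitrary choices of mine for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_001  # odd, so the sample median is an actual sample point
y = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # arbitrary skewed baseline Y

# (a) Pure permutation: identical distribution, different realizations.
y_a = rng.permutation(y)
print(np.mean(y_a) - np.mean(y))  # ~0 (same values, reordered)

# (b) Injected symmetric noise: the mean of Y is preserved, but only the
# median survives a strictly increasing rescaling f of Y's measurement scale.
y_b = y + rng.normal(0.0, 0.5, size=n)
f = lambda t: t ** 3  # strictly increasing on all of R
print(np.mean(y), np.mean(y_b))              # approximately equal
print(np.median(f(y_b)), f(np.median(y_b)))  # exactly equal (n odd, monotone f)
print(np.mean(f(y_b)), f(np.mean(y_b)))      # generally far apart

# (c) Pulled toward a "default" distribution: keep Y with probability 1/2,
# otherwise resample from a uniform default. This shifts the mean and median.
default = rng.uniform(0.0, 10.0, size=n)
keep = rng.random(n) < 0.5
y_c = np.where(keep, y, default)
print(np.mean(y), np.mean(y_c))  # mean moves toward the default's mean (5.0)
```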
3. Suppose I have a plausible argument for how X could affect Y in a particular way, but no observations that can be used as suitable proxies, even very indirect ones, for counterfactuals with which to estimate the size of the effect. I lean towards handling this case as in 2, rather than just making assumptions about effect sizes without observations.
For example, someone might propose a causal path through which X affects Y, with a missing estimate of effect size at at least one step along the path, but an argument that the effect should increase the value of Y. It is not enough to consider only one such path, since there may be many paths from X to Y, e.g. different considerations for how X could affect Y, and these would need to be combined. Some could have opposite effects. By 2, those other paths, when combined with the proposed causal path, dampen the effect of X on Y through the proposed path. And the longer the proposed path, the more unknown alternate paths there are.
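One way to cash out this dampening, under assumptions I’m adding purely for illustration (each unknown path contributes an independent, zero-mean, symmetric effect, i.e. case 2 applied to every path we haven’t examined): as unknown paths accumulate, the proposed path tells us less and less about even the sign of the net effect.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_agreement(known_effect, n_unknown, trials=100_000):
    """P(net effect shares the sign of the proposed path's effect), when
    each unmodeled path adds an independent, symmetric, zero-mean effect."""
    unknown = rng.normal(0.0, 1.0, size=(trials, n_unknown)).sum(axis=1)
    return np.mean(np.sign(known_effect + unknown) == np.sign(known_effect))

for k in [0, 1, 4, 16, 64]:
    print(k, sign_agreement(known_effect=1.0, n_unknown=k))
# Decays from 1.0 toward 0.5: with more unknown paths, the proposed path
# says less and less about even the sign of the net effect.
```

The unit-normal effect sizes are arbitrary; the decay toward 0.5 only needs the symmetry assumption.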
I think this is where I am now with speculative longtermist causes. Part of this may be my ignorance of the proposed causal paths and estimates of effect sizes, since I haven’t looked too deeply at the justifications for these causes, but the dampening from unknown paths also applies when the effect sizes along a path are known, which is the next case.
4. Suppose I have a causal path through some other variable Z, X→Z→Y, so that X causes Z and Z causes Y, and I model both of the effects X→Z and Z→Y based on observations. Should I just combine the two for the effect of X on Y? In general, not in the straightforward way. As in 3, there could be another causal path, X→Z′→Y (and it could be longer, with more than one intermediate variable).
As in case 3, you can think of X→Z′→Y as dampening the effect of X→Z→Y, and with long proposed causal paths, we might expect the net effect to be small, consistent with the intuition that predictable impacts on the far future decrease over time due to ignorance/noise and chaos, even though the actual impacts may compound due to chaos.
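A sketch of why, in one simple setting: in a linear structural causal model (a simplification I’m assuming here, with invented coefficients), a path’s effect is the product of its edge coefficients and the total effect is the sum over directed paths, so opposite-signed paths can cancel, and long paths with per-edge effects below 1 in magnitude shrink geometrically.

```python
import numpy as np

# Linear SCM: path effect = product of edge coefficients,
# total effect of do(X) on Y = sum over directed paths.
beta_xz, beta_zy = 0.5, 0.4      # estimated path: X -> Z -> Y
beta_xz2, beta_z2y = 0.3, -0.6   # unmodeled path: X -> Z' -> Y

via_z = beta_xz * beta_zy            # 0.20
via_z2 = beta_xz2 * beta_z2y         # -0.18
print(via_z + via_z2)                # 0.02: the two paths nearly cancel

# If every edge coefficient has magnitude below 1 (my assumption, not a
# theorem), a path's contribution shrinks geometrically in its length:
rng = np.random.default_rng(0)
for length in [1, 2, 4, 8, 16]:
    print(length, np.prod(rng.uniform(-0.8, 0.8, size=length)))
```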
Maybe I’ll write this up as a full post after I’ve thought more about it. I imagine there’s been writing related to this, including in the EA and rationality communities.