This is a fantastic, clearly written post. Thank you for writing it up and sharing!
In the 3 models, why is outcome_2 not included as a predictor?
I'm just trying to wrap my head around how the 3-wave separation works, but can't quite follow how the confounders will be controlled for if the treatment is the only variable included from wave 2.
For example, in the first model:
Suppose "activism" was a confounder for the effect of "veganuary" on "outcome" (so "activism" caused increased "veganuary" exposure, as well as increased "outcome").
Suppose we have 2 participants with identical Wave 1 responses.
Between wave 1 and wave 2, the first participant is exposed to "activism", which increases both their "veganuary" and "outcome" values, and this change persists all the way through to wave 3.
The first participant now has higher outcome_3 and veganuary_2 than the second participant, with all other predictors in the model equal, so this will lead to a positive coefficient for veganuary_2, even though the relationship between veganuary and outcome is not causal.
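To make this scenario concrete, here is a minimal simulation sketch. The coefficients, the sample size, and the `ols` helper are all made up for illustration and are not taken from the post. It assumes linear effects and that activism exposure comes before veganuary exposure. In this simulated data veganuary_2 has no causal effect on outcome_3 at all.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(0)
n = 5000  # hypothetical sample size

outcome_1 = [random.gauss(0, 1) for _ in range(n)]
# Activism exposure between wave 1 and wave 2 (assumed to precede veganuary).
activism_2 = [random.gauss(0, 1) for _ in range(n)]
# Activism raises veganuary exposure...
veganuary_2 = [0.8 * a + random.gauss(0, 1) for a in activism_2]
# ...and raises the outcome, but veganuary_2 itself does NOT enter this equation.
outcome_3 = [0.5 * o + 0.8 * a + random.gauss(0, 1)
             for o, a in zip(outcome_1, activism_2)]

# Model with wave-1 outcome only: the veganuary_2 coefficient absorbs the activism effect.
b_unadjusted = ols([veganuary_2, outcome_1], outcome_3)[1]
# Same model plus activism_2: the spurious coefficient disappears.
b_adjusted = ols([veganuary_2, outcome_1, activism_2], outcome_3)[1]

print(f"veganuary_2 coefficient, activism omitted:  {b_unadjusted:.2f}")
print(f"veganuary_2 coefficient, activism included: {b_adjusted:.2f}")
```

With these made-up values the first coefficient is clearly positive (about 0.4 in expectation) and the second is close to zero, even though the true causal effect is zero in both cases.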
I can see how this problem is avoided if outcome_2 is included as a predictor instead of (or maybe as well as?) outcome_1. So maybe this is just a typo? If so, I would be interested in the explanation for whether you need both outcome_1 and outcome_2, or if just outcome_2 is enough. I'm finding that quite confusing to think about!
Thanks Toby! Great question: outcome_2 isn't included because it would over-adjust our estimate for veganuary_2. By design, outcome_2 occurs after (or at the same time as) veganuary_2. If it occurs after, outcome_2 will "contain" the effect of veganuary_2 (and in the real world, this contained effect may be larger than the effect on outcome_3, given attenuation over time). If we include outcome_2, our model will adjust for the already-updated outcome_2 and "control away" most or all of the effect when estimating for outcome_3. On the other hand, including activism_2 would successfully adjust for any inter-wave activism exposure.
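The over-adjustment point can be illustrated with a small simulation sketch. Everything in it is an assumption for illustration (the coefficients, the sample size, the `ols` helper, and the simplification that veganuary_2 affects outcome_3 only through outcome_2); it is not the model from the post.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(1)
n = 5000  # hypothetical sample size

outcome_1 = [random.gauss(0, 1) for _ in range(n)]
veganuary_2 = [random.gauss(0, 1) for _ in range(n)]
# outcome_2 is measured after veganuary_2, so it already "contains" its effect (0.5).
outcome_2 = [0.6 * o + 0.5 * v + random.gauss(0, 1)
             for o, v in zip(outcome_1, veganuary_2)]
# The effect carries forward to wave 3 with some attenuation (0.8),
# so the true total effect of veganuary_2 on outcome_3 is 0.5 * 0.8 = 0.4.
outcome_3 = [0.8 * o2 + random.gauss(0, 1) for o2 in outcome_2]

# Adjusting for the wave-1 outcome only recovers the total effect.
b_total = ols([veganuary_2, outcome_1], outcome_3)[1]
# Adding outcome_2 conditions on a post-exposure variable and controls the effect away.
b_over_adjusted = ols([veganuary_2, outcome_1, outcome_2], outcome_3)[1]

print(f"veganuary_2 coefficient, outcome_1 only:  {b_total:.2f}")
print(f"veganuary_2 coefficient, with outcome_2: {b_over_adjusted:.2f}")
```

Under these assumptions the first estimate lands near the true total effect of 0.4, while the second shrinks toward zero because outcome_2 sits on the path from veganuary_2 to outcome_3.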
There are then two directly related follow-up questions:
1. If outcome_2 occurs after veganuary_2, why not make it the primary outcome instead of outcome_3?
2. If exposure to activism occurs between wave 1 and 2, why don't we include activism_2 when estimating veganuary_2's effect on the outcome?
This is interesting and directly relevant to inferring events from measurements. In this study, the outcome was prospective (e.g., what is your current consumption), while the predictors were both prospective and retrospective (e.g., what happened in the last six months). For question 1, outcome_2 occurs after the retrospective predictors but not after the prospective ones, so we have a reverse causation problem for some of the predictors. For question 2, from the framing of the survey questions (how much activism were you exposed to in the last six months), it's not possible to determine whether activism_2 or veganuary_2 occurred first, meaning we would again have over-adjustment for many of the models.
In an ideal scenario, we would adjust for all potential confounders immediately prior to the exposure. But in practice it's a tug-of-war between temporal precedence and ruling out alternative explanations, because as soon as you start measuring extremely close to the exposure, it becomes unclear where the confounding control ends and where the exposure begins.
I hope that helps and feel free to follow up!
Thank you for the detailed reply Jared!
It makes sense that including outcome_2 would risk controlling away much of any effect of veganuary on outcome. And your answers to those pre-empted follow up questions make sense to me as well!
But does that then mean my original concern is still valid..? There is still a possibility that a statistically significant coefficient for veganuary_2 in the model might not be causal, but due to a confounder? Even a confounder that was actually measured, like activism exposure?
Thanks for the in-depth questions! You're right, and this is another limitation. Even for cases where there is no inter-wave activism, I should make it clear that the estimates are only truly causal if you adjust for all relevant confounders, which is unlikely in practice. So the results we get are associations, but less biased (aka causal under certain assumptions).
The main way we address this issue is through the sensitivity analysis, since it gives a sense of how much unmeasured confounding is required (from a variable not collected or a variable collected not granularly enough like you pointed out) to overturn significance. In our case, a moderate amount would be needed, so the estimates are likely at least directionally consistent.
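This thread doesn't say which sensitivity analysis was used. One widely used option is the E-value, which for a risk ratio RR above 1 is RR + sqrt(RR × (RR − 1)). The sketch below assumes the estimate is already on the risk-ratio scale; the function names and the example numbers are hypothetical, not results from the study.

```python
import math


def e_value(rr):
    """Minimum strength of association (risk-ratio scale) that an unmeasured
    confounder would need with both exposure and outcome to explain away `rr`."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are handled by taking the inverse
    return rr + math.sqrt(rr * (rr - 1))


def e_value_ci(rr, lower, upper):
    """E-value for the confidence limit closest to the null.
    Returns 1.0 when the interval already includes the null value of 1."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if rr > 1 else upper)


# Hypothetical numbers for illustration only.
print(round(e_value(1.5), 2))               # point estimate
print(round(e_value_ci(1.5, 1.2, 1.9), 2))  # confidence limit nearest the null
```

A larger E-value for the confidence limit means more unmeasured confounding would be needed to overturn significance; a value of 1.0 means none is needed because the interval already covers the null.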