Thanks for doing this. I really appreciate your running this as a controlled trial. I hope this fosters a range of additional experimental work and evidence-gathering. Also great that you are making the data and analysis public, and that you pre-registered your hypotheses. I think this was a success in terms of following a careful protocol, learning, and getting better at this stuff. It is putting us on a good path.
A few things I might have done or reported differently (we have discussed much of this, but I want to share it publicly so that others can consider it and perhaps weigh in):
You did state your results tentatively, but I would have been a bit more tentative still. Given the differential attrition, I’m just not sure we can really be confident that the interventions had an effect. And given this, plus the self-selection into each treatment group, we can’t say much about the relative efficacy of the two interventions.
The (differential) attrition problem is a really substantial one. As we’ve discussed, a large (and differing) share of participants in each treatment group, and in the control group, did not complete both rounds of the longitudinal study.
The chief concern here is that the treatments themselves may have affected the composition of those who completed the second survey. We can see some evidence in the data that something other than ‘random attrition’ is going on: attrition was greater in the control group than in the treatment groups. If I recall correctly, the (ex-post) composition of the treated and control groups also differed in terms of ex-ante observable traits, traits that could not have been affected by the treatments themselves. Given randomly assigned treatment, those differences could only be due to differential attrition.
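To make that last check concrete, here is a minimal sketch of a balance test on baseline traits among second-round completers. The DataFrame layout and column names (‘treated’, ‘completed’, and the baseline columns) are my assumptions for illustration, not the study’s actual schema:

```python
import pandas as pd
from scipy import stats

def balance_check(df: pd.DataFrame, baseline_cols,
                  group_col="treated", completed_col="completed"):
    """Compare ex-ante (baseline) traits between treated and control
    completers. Under random assignment, significant differences here
    point to differential attrition."""
    done = df[df[completed_col] == 1]
    for col in baseline_cols:
        t = done.loc[done[group_col] == 1, col].dropna()
        c = done.loc[done[group_col] == 0, col].dropna()
        # Welch's t-test (unequal variances allowed)
        stat, p = stats.ttest_ind(t, c, equal_var=False)
        print(f"{col}: diff = {t.mean() - c.mean():+.3f}, p = {p:.3f}")
```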
Note that there are some reasonable statistical bounding approaches (see, e.g., Lee 2009) for dealing with differential attrition, although these tend to yield very wide bounds when attrition is substantial (and differential).
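For concreteness, here is a minimal sketch of the Lee (2009) trimming bounds, under simplifying assumptions (a continuous outcome, one treatment arm versus control, and monotonic selection into response); the function name and inputs are mine, not from any particular package:

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl, retain_treat, retain_ctrl):
    """Lee (2009) bounds on the treatment effect for 'always-responders'.
    y_treat / y_ctrl: outcomes of second-round completers in each arm;
    retain_treat / retain_ctrl: retention rates among everyone assigned."""
    y_treat, y_ctrl = np.sort(y_treat), np.sort(y_ctrl)
    if retain_treat >= retain_ctrl:
        # Treatment arm retained more: trim its outcomes until retention matches.
        p = (retain_treat - retain_ctrl) / retain_treat
        k = int(np.floor(p * len(y_treat)))
        upper = y_treat[k:].mean() - y_ctrl.mean()                 # drop lowest k
        lower = y_treat[:len(y_treat) - k].mean() - y_ctrl.mean()  # drop highest k
    else:
        # Control arm retained more: trim it instead.
        p = (retain_ctrl - retain_treat) / retain_ctrl
        k = int(np.floor(p * len(y_ctrl)))
        upper = y_treat.mean() - y_ctrl[:len(y_ctrl) - k].mean()
        lower = y_treat.mean() - y_ctrl[k:].mean()
    return lower, upper
```

With attrition as heavy and uneven as here, expect the resulting interval to be wide; that is, in a sense, the point: it makes explicit how much the missing respondents could matter.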
I appreciate your use of the LinkedIn data for follow-up, and I would pursue this further. To the extent that you can track down the future outcomes of a large set of respondents through LinkedIn, this will, in my opinion, help recover very meaningful estimates. You note, correctly, that this approach is much less vulnerable to differential attrition bias, and also less subject to ‘pleasing those you spoke to’ (differential desirability bias). I would follow up on this for future outcomes, and do it more carefully, using a blind external rater (or an AI tool to rate these, maybe GPT-3 as a classifier).
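On the ‘blind rater’ point, here is a minimal sketch of one way to set that up: strip the group labels and shuffle before anyone (or any model) rates the LinkedIn outcomes, merging ratings back by id only afterwards. The record keys (‘id’, ‘linkedin_summary’) are hypothetical:

```python
import csv
import random

def make_blind_rating_file(records, out_path, seed=42):
    """Write outcome snippets in random order, with treatment labels
    stripped, so the rater cannot condition on group assignment.
    'records' is a list of dicts with hypothetical keys."""
    rows = [{"id": r["id"], "text": r["linkedin_summary"]} for r in records]
    random.Random(seed).shuffle(rows)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "text", "rating"])
        writer.writeheader()
        for row in rows:
            writer.writerow({**row, "rating": ""})  # rater fills this in
```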
You noted the self-selection into the two types of treatments and the resulting limits to their comparability. Even so, a great deal of the post discusses the differences between the two and the implications for cost-effectiveness. To me, this seems to dig too deeply into an area where we don’t yet have strong evidence.
I’d love to see follow-ups to this or similar experiments. Perhaps you can run more in the future, with larger sample sizes and with more carefully considered plans to limit the possibility of differential attrition. Limiting the study to those with LinkedIn accounts would be one way of doing this. Another possibility (which you could in principle pursue even with the previously tested group) would be to find the funds to pay fairly large rewards for completing a follow-up survey, for everyone in each group. If the rewards were sufficient, I’d guess you could track down everyone, or nearly everyone.
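On sample sizes, a quick back-of-the-envelope power calculation illustrates the stakes; the effect size here (Cohen’s d = 0.2) is an assumption for illustration, not an estimate from this study:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed effect size (d = 0.2) is illustrative only.
n_per_arm = TTestIndPower().solve_power(
    effect_size=0.2, alpha=0.05, power=0.8, alternative="two-sided"
)
print(round(n_per_arm))        # ~394 completers needed per arm
print(round(n_per_arm / 0.5))  # ~787 recruits per arm if only half complete both rounds
```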
By the way, I made a recording in which I read your post (with some comments, mostly overlapping this one), which I will put up on my podcast (and link here) shortly.
AUDIO on my podcast HERE
Thanks David! And thanks again for all your help. I agree with lots of this, e.g. differential attrition being a substantial problem and follow-ups being very desirable. More on some of that in the next forum post that I’ll share next week.
(Oh, and thanks for recording!)