GiveWell seems untroubled by not having an RCT comparing deworming to cash transfers, or malaria prevention to vitamin-A supplementation, and I'm inclined to believe this is something they've thought about. That's not meant to be a knockdown argument against the idea, but if an RCT like this is needed for comparing psychotherapy to cash transfers, why not for every other intervention currently recommended in the EA space?
This does seem different, though. When you’re studying whether bednets or vitamin A save lives, there’s no plausible basis for thinking the beneficiary’s knowledge that they are in the treatment group, or the non-effective portions of the experimental situation, could skew results. So it’s fine to use a control group that consists of no intervention. In contrast, when you’re studying a new medication for headache, you very much do not want the treatment group and the control group to know who they are—you want them to believe they are receiving something equally effective. Hence we have placebos.
I see that many of the studies had what you characterize as some form of “care as usual” or a placebo like “HIV education.” I flipped through a few of the linked studies, and I didn’t walk away with an impression that the control group received an intervention that was nearly as immersive—or that would lead participants to think their mental health would benefit—as the psychotherapy intervention. (Although to be fair, most research articles don’t dwell on the control group very much!)
And it seems that placebo “quality” can matter a lot—e.g., this small study, where anti-depressant + supportive care reduced HRSD scores about 10 points, placebo pill + supportive care about 7.5, supportive care only less than 1.5. If you just looked at anti-depressant vs. the weak control of supportive care only, that anti-depressant looks awfully good. Likewise, on immersiveness, sham surgery does a lot better than sham acupuncture, which does a lot better than sham pills, for migraine headache.
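To put rough numbers on that comparison, here is a minimal sketch in Python, using only the approximate point estimates quoted above (not the study's actual data; the variable names are mine):

```python
# Approximate HRSD score reductions quoted above (illustrative only).
drug_plus_support = 10.0      # anti-depressant + supportive care
placebo_plus_support = 7.5    # placebo pill + supportive care
support_only = 1.5            # supportive care only ("less than 1.5")

# Comparison against the weak control (supportive care only):
naive_effect = drug_plus_support - support_only                # 8.5 points
# Placebo-controlled comparison, subtracting out the placebo response:
controlled_effect = drug_plus_support - placebo_plus_support   # 2.5 points

print(f"vs. weak control:  {naive_effect:.1f} points")
print(f"vs. placebo pill:  {controlled_effect:.1f} points")
# The drug looks more than three times as effective when judged
# against the weak control.
```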
So at some point, I think it’s reasonable to ask for an assessment of SM—or a similar program with a similar client population—against a control group that receives an intervention that is both of similar intensity and that study participants believed would likely improve their subjective well-being and/or depression. I hear that HLI doesn’t currently have capacity to fund that, though.
As for the control: I don't think something like HIV education works; the participants would not expect receiving that to improve their subjective well-being. Cash transfers are an obvious option, but probably not the only one. Pill placebos would work in Western countries, but maybe not in other places. Some sort of religious control-group experience (e.g., eight sessions of prayer vs. eight sessions of SM) would be a controversial active control, but seems potentially plausible if consistent with the cultural beliefs of the study population. Sham psychotherapy seems hard to pull off unless you have highly trained experimenters, but could be an option if you do.
In short, you're trying to measure an outcome variable that is far more sensitive to these sorts of issues than GiveWell's (whose outcome measures are primarily loaded on whether the beneficiaries are less likely to die).
There are two separate topics here. The one I was discussing in the quoted text was whether an intra-RCT comparison of two interventions was necessary, or whether two meta-analyses of the two interventions would be sufficient. The references to GiveWell were not about the control groups they accept, but about their willingness to use meta-analyses instead of RCTs with arms comparing the different interventions they recommend.
The other topic is the appropriate control group to compare psychotherapy against, and there I think you make a decent argument that placebo quality could matter. It's given me some things to think about, thank you.
Thanks, Joel. I agree that an RCT of SM vs cash wouldn’t be useful as a head-to-head comparison of the two interventions. Among other things, “cash transfers to people who report being very depressed” is unlikely to be a scalable intervention anyway—people in the service area would figure out what the “correct” answers were to obtain resources they needed for themselves and their families, and the program would largely turn into “generic cash transfers.”
I think your idea of sham psychotherapy is a great one, Jason, and could well work, although unfortunately it wouldn't be ethical, so it couldn't be done. Thinking of alternatives to cash is a good idea, but it's hard.
I think the purpose of testing StrongMinds vs. cash is good not because we are considering giving cash instead to people who are depressed (you are right about it not being able to scale), but to see whether SM really is better than cash under the before-and-after subjective-question system. If SM squarely beat out cash, it would give me far more confidence that before-and-after subjective well-being questions can work without a crippling amount of bias, as cash is far more likely than psychotherapy to elicit a positive future-hope rating bias.
I'd be interested to hear what's included in your "among other things" that you don't like about cash vs. StrongMinds.
I understand the discussion above to be about whether it is necessary or advisable to have a SM arm and a cash arm in the same RCT. One major issue I would have with that design is that (based on what I understand of typical study recruitment) a fair number of people in the SM arm would know what people in the other arm got. I imagine some people would be rather disappointed once they found out that the other group got several months' worth of income while they got lay psychotherapy sessions.
Likewise, if I were running an RCT of alprazolam vs. cognitive-behavioral therapy for panic disorder, I wouldn't want the CBT arm participants to see how the alprazolam arm was doing after a few weeks. Seeing the quick symptom relief of a benzo in other participants, and realizing they would be experiencing that relief themselves but for a coin flip, would risk biasing the CBT group.
It's not obvious to me why concerns about potential crippling bias in subjective well-being questions couldn't be met with the alternative Joel mentioned, "two high-quality trials run separately about two different interventions but measuring similar outcomes." If cash creates high bias (and shows the measurement of certain subjective states to be unreliable), it should show this bias in a separate trial as effectively as in a head-to-head in the same RCT. Of course, the outcome measures would need to be similar enough, and the participant populations would need to be similar enough.
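For what it's worth, here is a minimal sketch of how that indirect comparison might go, with entirely invented summary statistics (the means, SDs, and sample sizes below are hypothetical, not from any real trial): standardize each trial's effect separately, then compare the two standardized effects and combine their uncertainties.

```python
import math

def cohens_d(mean_t, mean_c, sd_pooled):
    """Standardized mean difference (treatment minus control)."""
    return (mean_t - mean_c) / sd_pooled

def se_d(d, n_t, n_c):
    """Approximate large-sample standard error of Cohen's d."""
    return math.sqrt((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))

# Hypothetical summary statistics from two *separate* trials measuring
# the same subjective well-being outcome (all numbers made up).
d_therapy = cohens_d(mean_t=2.0, mean_c=1.2, sd_pooled=2.5)  # ~0.32
d_cash = cohens_d(mean_t=1.5, mean_c=1.2, sd_pooled=2.5)     # ~0.12

se_therapy = se_d(d_therapy, n_t=300, n_c=300)
se_cash = se_d(d_cash, n_t=300, n_c=300)

# Indirect comparison: difference of effects, SEs combined in quadrature.
diff = d_therapy - d_cash
se_diff = math.sqrt(se_therapy ** 2 + se_cash ** 2)
print(f"therapy minus cash (indirect): {diff:.2f} +/- {1.96 * se_diff:.2f}")
# Only meaningful if the two trial populations and outcome
# measures really are comparable.
```

The caveat in the last comment is the crux: the indirect comparison is only as trustworthy as the comparability of the two trials.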
As for other factors, I think cost is a potentially significant one. It's been almost twenty years since I took a graduate research design course (and it was in sociology), but it seems a lot cheaper to use existing literature on cash transfers (if appropriate) or to try to piggyback your subjective well-being questions into someone else's cash-transfer study for an analogous population. If SM continues to raise money at the rate it did in 2021 (vs. significantly lower funding levels in prior years), the weight I give that factor will diminish.
“but it seems a lot cheaper to use existing literature on cash transfers (if appropriate) or to try to piggyback your subjective well-being questions into someone else’s cash-transfer study for an analogous population” I really like this.
You are right again that two trials would show the bias separately, but doing two separate trials loses the key RCT benefit of (almost) removing confounding and bias between the arms. Selecting two comparable populations for two different trials is very, very difficult.
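To illustrate the worry, here is a toy simulation (all numbers invented) in which either intervention would be equally effective within any given population, but effects are larger in a more-deprived population. Each trial's own estimate is unbiased; the cross-trial comparison is not:

```python
import random

random.seed(0)

def estimate_effect(baseline, true_effect, n=5000):
    """Simulate one RCT and return its estimated treatment effect."""
    control = [random.gauss(baseline, 1.0) for _ in range(n)]
    treated = [random.gauss(baseline + true_effect, 1.0) for _ in range(n)]
    return sum(treated) / n - sum(control) / n

# Invented scenario: *either* intervention would produce a 0.8-point gain
# in a highly deprived population but only a 0.4-point gain in a
# better-off one. The therapy trial happens to recruit the deprived
# population; the cash trial recruits the better-off one.
therapy_effect = estimate_effect(baseline=4.0, true_effect=0.8)
cash_effect = estimate_effect(baseline=6.0, true_effect=0.4)

print(f"therapy trial estimate: {therapy_effect:.2f}")  # ~0.8, unbiased
print(f"cash trial estimate:    {cash_effect:.2f}")     # ~0.4, unbiased
# The cross-trial gap (~0.4) looks like a therapy advantage, but in this
# simulation it is entirely a difference between the two populations.
print(f"apparent therapy advantage: {therapy_effect - cash_effect:.2f}")
```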
My view on whether a cash vs. SM RCT is necessary / worth the money could definitely change based on the results of a good literature review or piggyback.