Nick, thank you for your comment. I always appreciate your friendly tone.
but I also believe that when an org gets to the scale that StrongMinds have now reached, they should have an RCT vs cash at least in the works.
I agree this would be great … but it also seems like a really strict requirement, and I’m not sure it’s necessary. GiveWell seems unbothered by not having an RCT comparing deworming to cash transfers, or malaria prevention to vitamin-A supplementation, and I’m inclined to believe this is something they’ve thought about. That’s not meant to be a knockdown argument against the idea, but if an RCT like this is needed to compare psychotherapy to cash transfers—why not for every other intervention currently recommended in the EA space?
More directly, if we have two high-quality trials run separately on two different interventions but measuring similar outcomes—how much better is running a single RCT with two arms? The single trial certainly reduces differences in confounders (particularly unobserved ones) between the comparisons. But I think it could also have weaknesses.
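To make the statistical trade-off concrete, here is a minimal sketch (with made-up numbers, not real trial data) of a Bucher-style indirect comparison between effects estimated in two separate trials. The difference of two independent estimates has standard errors added in quadrature, so it is noisier than either trial alone, and it leans on the assumption that the two trial populations are exchangeable:

```python
import math

# Hypothetical effect estimates (standardized mean differences vs. control)
# from two *separate* trials -- illustrative numbers only.
psychotherapy_effect, psychotherapy_se = 0.50, 0.10
cash_effect, cash_se = 0.30, 0.12

# Bucher-style indirect comparison: difference of two independent
# estimates, with standard errors combined in quadrature.
diff = psychotherapy_effect - cash_effect
diff_se = math.sqrt(psychotherapy_se**2 + cash_se**2)

print(f"psychotherapy minus cash: {diff:.2f} (SE {diff_se:.3f})")
# The SE of the indirect comparison exceeds either trial's own SE;
# a single two-arm RCT would avoid both this extra noise and the
# exchangeability assumption.
```

This is the quantitative cost of separate trials; the points below are about why a single joint trial has costs of its own.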
It seems plausibly more expensive to coordinate two high-quality interventions in a single trial than to let them be run separately. For instance, if I recall correctly, in Haushofer et al. (2020) the comparison between psychotherapy and GiveDirectly was a bit apples to oranges. For the psychotherapy arm, they hired a local NGO to start a new programme from scratch, which they compared to GiveDirectly, which by that time was a well-oiled cash-slinging machine. Getting two organizations that already know how to deploy their interventions well to collaborate on a single RCT seems difficult and expensive.
It may also have limited generalizability. Running separate trials of charity interventions makes it likelier that the results reflect the circumstances each charity actually operates in. A single two-arm trial would need an area of overlap between the charities—which is possible, but finding one seems like another reason this could be difficult.
Lastly, regarding making the dream RCT happen: HLI is currently rather resource-constrained, so in our work we have to make do with the existing literature, and we’re only just now exploring “research advocacy” as an option. Running an RCT would probably cost a multiple of our annual budget. StrongMinds has more resources, but not much more. If another RCT with a psychotherapy arm and a cash arm were desired, I wonder whether the GiveDirectly RCT machine might be the most promising way to get that evidence.
You also say that there are 39 studies analysed, but it looks like there are a lot fewer studies than that, with individual studies broken into different groups (like a, b, c, d).
I think what may be happening here is that I use “studies” as synonymous with “trials”. So in my usage, one paper can analyse multiple studies (or trials). However, on reflection, I realise I sometimes refer to papers as studies, which is unhelpful. It’d be clearer if I referred to each separate intervention experiment as a “trial”. Another thing that may be confusing is that authors sometimes publish multiple papers in the same year; I distinguish these papers by adding an “a”, “b”, etc. to the end of the reference.
But if you count all of the different unique “trials”, it does come out to 39.
Also, have you thought about publishing your meta-analysis in a peer-reviewed journal?
We’re keen to do this, but the existing meta-analysis is probably 65% of the rigour necessary for an academic paper. This year we are trying to redo the analysis with an academic collaborator so that the search will be systematic, the data will be double-screened, and many more robustness tests will be run.
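For readers unfamiliar with what such a meta-analysis actually computes, here is a minimal sketch of random-effects pooling (a DerSimonian–Laird estimate of between-trial variance; all numbers are invented for illustration and are not HLI’s data):

```python
import math

# Hypothetical per-trial effect sizes (e.g., Cohen's d) and their
# sampling variances -- illustrative only.
effects = [0.9, 0.2, 0.8, 0.1]
variances = [0.02, 0.02, 0.03, 0.03]

# Fixed-effect (inverse-variance) weights and pooled estimate.
w = [1 / v for v in variances]
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# DerSimonian-Laird estimate of between-trial variance tau^2,
# from Cochran's Q heterogeneity statistic.
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
df = len(effects) - 1
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights fold tau^2 into each trial's variance.
w_re = [1 / (v + tau2) for v in variances]
pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
pooled_se = math.sqrt(1 / sum(w_re))

print(f"pooled effect {pooled:.2f} (SE {pooled_se:.2f}), tau^2 = {tau2:.3f}")
```

With heterogeneous trial results like these, tau² comes out well above zero, which widens the pooled standard error; robustness tests of the kind mentioned above probe how sensitive that pooled figure is to the choice of trials and model.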
(I’ll answer the selection criterion question separately)
GiveWell seems unbothered by not having an RCT comparing deworming to cash transfers, or malaria prevention to vitamin-A supplementation, and I’m inclined to believe this is something they’ve thought about. That’s not meant to be a knockdown argument against the idea, but if an RCT like this is needed to compare psychotherapy to cash transfers—why not for every other intervention currently recommended in the EA space?
This does seem different, though. When you’re studying whether bednets or vitamin A save lives, there’s no plausible basis for thinking the beneficiary’s knowledge that they are in the treatment group, or the non-effective portions of the experimental situation, could skew results. So it’s fine to use a control group that consists of no intervention. In contrast, when you’re studying a new medication for headache, you very much do not want the treatment group and the control group to know who they are—you want them to believe they are receiving something equally effective. Hence we have placebos.
I see that many of the studies had what you characterize as some form of “care as usual” or a placebo like “HIV education.” I flipped through a few of the linked studies, and I didn’t walk away with an impression that the control group received an intervention that was nearly as immersive—or that would lead participants to think their mental health would benefit—as the psychotherapy intervention. (Although to be fair, most research articles don’t dwell on the control group very much!)
And it seems that placebo “quality” can matter a lot—e.g., this small study, where antidepressant + supportive care reduced HRSD scores by about 10 points, placebo pill + supportive care by about 7.5, and supportive care alone by less than 1.5. If you just looked at the antidepressant vs. the weak control of supportive care alone, that antidepressant looks awfully good. Likewise, on immersiveness, sham surgery does a lot better than sham acupuncture, which does a lot better than sham pills, for migraine headache.
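The arithmetic makes the point starkly. Using the approximate HRSD reductions quoted above:

```python
# Approximate HRSD point reductions from the small study cited above.
drug_plus_care = 10.0      # antidepressant + supportive care
placebo_plus_care = 7.5    # placebo pill + supportive care
care_only = 1.5            # supportive care only (roughly "less than 1.5")

# The apparent drug effect depends entirely on the comparator chosen.
vs_weak_control = drug_plus_care - care_only      # vs. supportive care only
vs_placebo = drug_plus_care - placebo_plus_care   # vs. pill placebo

print(f"vs. supportive care only: {vs_weak_control:.1f} points")
print(f"vs. pill placebo:         {vs_placebo:.1f} points")
```

Against the weak control the drug appears more than three times as effective as it does against a pill placebo; the same logic applies to psychotherapy measured against “care as usual.”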
So at some point, I think it’s reasonable to ask for an assessment of SM—or a similar program with a similar client population—against a control group that receives an intervention of similar intensity that study participants believe would likely improve their subjective well-being and/or depression. I hear that HLI doesn’t currently have the capacity to fund that, though.
As for the control: I don’t think something like HIV education works—participants would not expect it to improve their subjective well-being. Cash transfers are an obvious option, but probably not the only one. Pill placebos would work in Western countries, but maybe not elsewhere. Some sort of religious control-group experience (e.g., eight sessions of prayer vs. eight sessions of SM) would be a controversial active control, but seems potentially plausible if consistent with the cultural beliefs of the study population. Sham psychotherapy seems hard to pull off unless you have highly trained experimenters, but could be an option if you do.
In short, you’re trying to measure an outcome variable that is far more sensitive to these sorts of issues than GiveWell (whose outcome measure is primarily loaded on whether the beneficiaries are less likely to die).
There are two separate topics here. The one I was discussing in the quoted text was whether an intra-RCT comparison of two interventions is necessary, or whether two meta-analyses of two interventions would be sufficient. The references to GiveWell were not about the control groups they accept, but about their willingness to use meta-analyses instead of RCTs with arms directly comparing the different interventions they recommend.
Another topic is the appropriate control group to compare psychotherapy against. But I think you make a decent argument that placebo quality could matter. It’s given me some things to think about; thank you.
Thanks, Joel. I agree that an RCT of SM vs cash wouldn’t be useful as a head-to-head comparison of the two interventions. Among other things, “cash transfers to people who report being very depressed” is unlikely to be a scalable intervention anyway—people in the service area would figure out what the “correct” answers were to obtain resources they needed for themselves and their families, and the program would largely turn into “generic cash transfers.”
Jason, I think your idea of sham psychotherapy is a great one and could well work, although unfortunately it wouldn’t be ethical, so it couldn’t be done. Thinking of alternatives to cash is a good idea, but hard.
I think the purpose of testing StrongMinds vs. cash is good not because we are considering giving cash to people who are depressed instead (you are right that it couldn’t scale), but to see whether SM really is better than cash using the before-and-after subjective question system. If SM squarely beat out cash, it would give me far more confidence that before-and-after subjective wellbeing questions can work without a crippling amount of bias, as cash is far more likely than psychotherapy to elicit a positive future-hope rating bias.
I’d be interested to hear what’s included in your “among other things” that you don’t like about cash vs. StrongMinds.
I understand the discussion above to be about whether it is necessary or advisable to have an SM arm and a cash arm in the same RCT. One major issue I would have with that design is that (based on what I understand of typical study recruitment) a fair number of people in the SM arm would know what people in the other arm got. I imagine some people would be rather disappointed once they found out that the other group got several months’ worth of income while they got lay psychotherapy sessions.
Likewise, if I were running an RCT of alprazolam vs. cognitive-behavioral therapy for panic disorder, I wouldn’t want the CBT arm participants to see how the alprazolam arm was doing after a few weeks. Seeing the quick symptom relief of a benzo in other participants, and realizing that, but for a coin flip, they might be experiencing that relief themselves, would risk biasing the CBT group.
It’s not obvious to me why concerns about potential crippling bias in subjective well-being questions couldn’t be met with the alternative Joel mentioned, “two high-quality trials run separately about two different interventions but measuring similar outcomes.” If cash creates high bias (and shows the measurement of certain subjective states to be unreliable), it should show this bias in a separate trial as effectively as in a head-to-head in the same RCT. Of course, the outcome measures would need to be similar enough, and the participant population would need to be similar enough.
As far as other factors, I think cost is a potentially significant one—it’s been almost twenty years since I took a graduate research design course (and it was in sociology), but it seems a lot cheaper to use existing literature on cash transfers (if appropriate) or to try to piggyback your subjective well-being questions into someone else’s cash-transfer study for an analogous population. If SM continues to raise money at the rate it did in 2021 (vs. significantly lower funding levels in prior years), my consideration of that factor will diminish.
“It seems a lot cheaper to use existing literature on cash transfers (if appropriate) or to try to piggyback your subjective well-being questions into someone else’s cash-transfer study for an analogous population.” I really like this.
You are right again that two trials would show the bias separately, but two separate trials lose the key benefit of a single RCT: (almost) removing confounding and bias between the arms. Selecting two comparable populations for different trials is very, very difficult.
My view on whether a cash vs. SM RCT is necessary / worth the money could definitely change based on the results of a good literature review or piggyback.