Hi Nick,
Thank you for sharing! This is valuable to hear. The issue of being primed to respond in a certain way has surprisingly not been explored widely in low-income countries.
We’re concerned about this, but to our knowledge, the existing evidence suggests this isn’t a large concern. The only study we’ve seen that explicitly tries to measure the impact of this was a module in Haushofer et al. (2020, section III.I), where they find an effect of about zero. If there’s more evidence we don’t know about, please share!
Here’s the excerpt (they call these effects “experimenter demand effects”).
You also said “Also, we’re comparing self-reports to other self-reports”, which doesn’t help the matter, because those who don’t get help are likely to keep scoring the survey low because they feel like they didn’t get help.
Allow me to disagree—I think this could help the matter when we’re comparing between interventions. If this effect were substantial, the big concern would be if it differed dramatically between interventions, e.g., if the priming effect were larger for psychotherapy than for cash transfers. I.e., it’s okay if the same bias is balanced across interventions. Of course, I’m not sure if it’s plausible for this to be the case—my prior is pretty uniform here.
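To put rough numbers on that logic (purely made-up figures for illustration, not from any actual trial): if a reporting bias of the same size is triggered whenever someone receives *any* help, it cancels out of a head-to-head comparison but inflates a comparison against a no-intervention control.

```python
# Illustration with invented numbers: a "courtesy"/reporting bias that is
# triggered whenever a respondent receives *some* help, of equal size in
# every arm that receives something.
true_effect_therapy = 0.30  # hypothetical true SWB gain from psychotherapy
true_effect_cash = 0.25     # hypothetical true SWB gain from cash
reporting_bias = 0.20       # hypothetical inflation from receiving any help

reported_therapy = true_effect_therapy + reporting_bias  # 0.50
reported_cash = true_effect_cash + reporting_bias        # 0.45
reported_control = 0.0  # receives nothing, so no bias is triggered

# Head-to-head: the shared bias cancels, recovering the true gap of 0.05.
print(round(reported_therapy - reported_cash, 2))     # 0.05

# Versus a no-intervention control: the bias is baked into the estimate.
print(round(reported_therapy - reported_control, 2))  # 0.5, not the true 0.3
```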
All that being said, I totally endorse more research on this topic!
We can’t ignore how people feel, but we need to try and find objective ways of assessing it, especially in contexts like here in Uganda where NGOs have wrecked any chance of self-reporting being very accurate.
I think this is probably a point where we disagree. A point we’ve expanded upon previously (see McGuire et al., 2022) is that we seriously doubt that we can measure feelings objectively. The only way we can know if that objective measure is capturing a feeling is by asking someone if it does so—so why not just do that?
I am much more pro “find the flaws and fix them” than “abandon ship when it leaks” regarding measuring subjective wellbeing.
Alternatively I’d be VERY interested in a head-to-head cash transfer vs. StrongMinds RCT—should be pretty straightforward, even potentially using your same subjective before and after scores. Surely this would answer some important questions.
This may be in the works!
Thanks so much Joel and I’m stoked by your response. I don’t think I’ve been in a forum where discussion and analysis is this good.
I’m not sure that having close to no evidence on a major concern about potential validity of before and after surveys should be very reassuring.
That tiny piece of evidence you cited only looks at the “experimenter demand effect”, which is a concern, yes, but not my biggest concern. My largest concern is, let’s say, the “future hope” effect, which I gave an example of in my first reply – where participants rate the effect of interventions more positively than their actual experience, because they CORRECTLY assess that a higher rating may bring them better help in future. That’s what I think is most likely to wreck these before and after surveys.
I don’t know this field anywhere near as well as you, so yes, it seems likely this is a poorly researched issue. We have experience and anecdotes, like those from Dambisa Moyo, me, and many others working in the NGO world, that these incentives and vested interests can greatly affect before and after surveys. You recently wrote an article which included (to your credit) 4 disadvantages of the WELLBY. My inclination is (with enormous uncertainty) that the problems I’ve outlined with before and after subjective surveys in low-income countries are at least as big a disadvantage to the WELLBY approach as any of the 4 issues you outlined. I agree that SWB is a well-validated tool for one-time surveys where there are no serious vested interests involved. It’s the before and after surveys in low-income countries that are problematic.
These are the 3 major unaccounted-for problems I see with before and after self-reporting (the first 2 already discussed).
My biggest concern is the “future hope” effect—people positively responding because they correctly believe a positive response is likely to get them more and even better help in future.
The “experimenter demand effect” (as you discussed) is where interviewees are primed to give the answer they think the experimenters want.
A third potential problem (which I haven’t mentioned already but have experienced) is that interviewers manipulate results positively in the direction of the intervention, either through the way they ask questions or even by increasing scores fraudulently. This is again rational, as their jobs often depend on the NGO raising more money to support “successful interventions” to keep them employed. I have seen this here a number of times during surveys within my own organisation AND in other organisations. It’s very hard to stop this effect even after talking to researchers about it – I have tried, but there was still manipulation present. This is probably a smaller problem than the other two, and easier to control and stamp out.
On the “Also, we’re comparing self-reports to other self-reports” front—I agree this could be fine if comparing between two interventions (e.g. cash and psychotherapy), as the effect might be similar between interventions. I think, though, most of the studies that you have listed compare intervention to no intervention, so your point may not stand in these cases.
I’ll change my mind somewhat in your direction and give you some benefit of the doubt on your point about objective measures not working well for wellbeing assessment, given that I haven’t researched it very well and I’m not an expert. Let’s leave objective measures out of the discussion for the moment!
I love the idea of your RCT of cash transfers vs. psychotherapy, but I’m confused about a number of parts of the design and have a few questions, if you will humour me.
- The study design seems to say you are giving cash only to the intervention groups and not the control group? I suspect this is a mistake in the reporting of the study design, but please clarify. To compare cash vs. psychotherapy, would you not give the cash to the whole control group and either not to the intervention group at all, or only to a small subsection of the intervention group? I might well have missed something here...
- Why are you giving money at the end of the intervention rather than at the start? Does that not give less time for the benefits of the cash to take effect?
- Why are you only giving a $50 cash transfer and not around $130 (the cost of the therapy)? Would it not be a fairer comparison to compare like for like in terms of money spent on the intervention?
It seems logical that most RCTs in the effective altruism space now should be intervention vs. cash transfer, or at least have 3 arms: intervention vs. cash transfer vs. nothing. Hopefully I’ve read it wrong and that is the case in this study!
To finish, I like positivity and I’ll get alongside your point about fixing the boat rather than dismissing it. I feel like the boat has more than a little leak at this stage, but I hope I’m wrong. I love the idea of using before and after subjective wellness measures to assess the effectiveness of interventions; I’m just not yet convinced it can give very meaningful results, based on my issues above.
Thanks so much if you got this far and sorry it’s so long!
A quick response—I’ll respond in more depth later.
I love the idea of your RCT of cash transfers vs. psychotherapy, but I’m confused about a number of parts of the design and have a few questions, if you will humour me.
To be clear, the planned RCT has nothing to do with HLI, so I would forward all questions to the authors! :)
And to be clear: when you say “before and after measures”, do you think this applies to RCTs where the effect is measured by comparing a control group to a treatment group, rather than before and after a group receives an intervention?
Apologies for assuming that the RCT involved HLI—the StrongMinds involvement led me to that wrong assumption! They will then be addressing different questions than we are here—unfortunately for us, that trial as stated in the protocol doesn’t test cash transfers vs. psychotherapy.
I don’t quite understand the distinction in your question, sorry; it will need rephrasing! I’m referencing the problems with any SWB measurement which involves measuring SWB at baseline and then after an intervention, whether there is a control arm or not.
Looking forward to hearing more, nice one!
Right, our concern is that if this bias exists, it is stronger for one intervention than another. E.g., say psychotherapy is more susceptible than cash transfers. If the bias is balanced across both interventions, then again, not as much of an issue.
I’m wondering if this bias you’re concerned with would be captured by a “placebo” arm of an RCT. Imagine a control group that receives an intervention that we think has little to no effect. If you expect any intervention to activate this “future hope” bias, then we could potentially estimate the extent of this bias with more trials including three arms: a placebo arm, a receive-nothing arm, and an intervention arm.
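As a sketch of how that could be analysed (invented numbers, and assuming the placebo arm has roughly zero true effect but triggers the same “future hope” response as a real intervention):

```python
import statistics

# Hypothetical endline SWB scores (0-10 scale) from the three arms.
treatment = [6.1, 5.8, 6.4, 6.0, 6.3]  # real intervention
placebo = [5.4, 5.6, 5.2, 5.5, 5.3]    # inert intervention, same delivery
nothing = [5.0, 4.9, 5.1, 5.2, 4.8]    # receives nothing at all

# If the placebo has ~zero true effect, (placebo - nothing) estimates the
# bias induced simply by receiving something.
bias_estimate = statistics.mean(placebo) - statistics.mean(nothing)

# (treatment - placebo) then estimates the intervention's effect net of
# that bias, since both arms should trigger it equally.
debiased_effect = statistics.mean(treatment) - statistics.mean(placebo)

print(f"estimated bias: {bias_estimate:.2f}")
print(f"debiased effect: {debiased_effect:.2f}")
```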
Do you have any ideas on how to elicit this bias experimentally? Could we instruct interviewers, for a subsample of the people in a trial, to state explicitly something like “any future assistance [will / will not] depend on the benefit this intervention provided”? Anything verbal like this would be cheapest to test.
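And a minimal sketch of how the verbal-script version could be analysed, randomising the script within one arm and comparing means (again, hypothetical data; a persistent gap between framings would be direct evidence of the bias):

```python
import statistics

# Hypothetical scores from one treatment arm, split by which (randomised)
# script the interviewer read out before the survey questions.
will_depend = [6.5, 6.8, 6.3, 6.6]  # told future help WILL depend on answers
wont_depend = [5.9, 6.1, 5.8, 6.0]  # told future help will NOT depend on them

# If respondents answer honestly regardless of framing, this gap should be
# close to zero; a large positive gap points to a "future hope" effect.
framing_gap = statistics.mean(will_depend) - statistics.mean(wont_depend)
print(f"framing gap: {framing_gap:.2f}")
```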
Hi Joel—Nice one again.
“Right, our concern is that if this bias exists, it is stronger for one intervention than another. E.g., say psychotherapy is more susceptible than cash transfers. If the bias is balanced across both interventions, then again, not as much of an issue.”
I would have thought the major concern would be whether the bias exists at all, rather than whether it is balanced between interventions. Both StrongMinds’ evidence assessing their own program and most of the studies used in your WELLBY analysis are NOT vs. another intervention, but rather psychotherapy vs. no intervention. This is often labelled “normal care”, which is usually nothing in low-income countries. So if the bias exists at all in any magnitude, it will be affecting your results.
Onto your question, though, which is still important but, I believe, of secondary importance.
Your idea of a kind of “fake” placebo arm would work—provided the fake placebo was hyped up as much and taken as seriously as the treatment arm, AND the study participants really didn’t know it was a placebo. Unfortunately you can’t do this, as it’s not ethical to run an RCT with an intervention that you think has little or no effect. So I don’t think it’s possible.
I like your idea of interviewers in a trial stating, for a subset of people, that their answers won’t change whether or not they get more help in future. I doubt this would mitigate the effect much or at all, but it’s a good idea to try!
My very strong instinct is that cash transfers would elicit a FAR stronger effect than psychotherapy. It’s hard to imagine anything that would elicit “future hope” more than the possibility of getting cash in future. This seems almost self-evident. Which extremely poor person in their right mind (whether their mental health really is better or not) is going to say that cash didn’t help their mental health if they think it might increase their chance of getting more cash in future?
Again, I think direct RCTs vs. cash transfers are the best way to test your intervention and control for this bias. It’s hard to imagine anything having a larger “future hope” effect than cash. If psychotherapy really beat cash (given that cash will almost certainly have a bigger “future hope” bias than psychotherapy), then I’d say you’d have a slam-dunk case.
I am a bit bewildered that StrongMinds has never done a psychotherapy vs. cash transfer trial. This is the big question, and you claim that StrongMinds therapy produces more WELLBYs than cash transfers, yet there is no RCT? It looks like that trial you sent me is another painful missed opportunity to do that research as well. The absence of a straight cash transfer arm in that trial doesn’t make any sense.
As far as I can see, the Kenyan trial is the only RCT with subjective wellbeing assessment which pits cash transfers vs. psychotherapy (although it used a disproportionately large cash transfer and not the StrongMinds-style group therapy). If group psychotherapy beat an equivalent cash transfer in, say, 2 RCTs, I might give up my job and start running group psychotherapy sessions—that would be powerful!
Hi Nick. I found more details about the Baird et al. RCT here. I’ve copied the section about the ‘cash alone’ arm below, as I know you’ll be interested to read it:
One of the issues that can cause difficulties in the interpretation of the findings from this study comes from the fact that it does not have a classical 2x2 factorial design that includes a “cash alone” arm. While it would have been ideal, from a study design perspective, to have such a design, which would have enabled us to experimentally reject (or not) the hypothesis that cash alone would have been as effective as IPT-G+, and while such benchmarking of the (cost-)effectiveness of a group therapy intervention against cash transfers alone would have been nice, the three-arm trial we designed for our setting, leaving out a “cash-only” arm, is an adequate study design for the following three reasons.
First, the extant evidence on the effects of economic interventions in general, or cash transfers in particular, does not support the idea of improved mental health outcomes past the short run. For example, in their review, Lund et al. (2011) find that the mental health effects of poverty alleviation programs were inconclusive. Blattman, Jamison, and Sheridan (2017) find no effects of lump-sum cash transfers alone on mental health outcomes in the short or the medium run. Effects of cash transfers (monthly or lump-sum) on psychological wellbeing that were observed in the short run dissipated a couple of years later (Baird, de Hoop, and Özler 2013; Baird, McIntosh, and Özler 2019; Haushofer and Shapiro 2016, 2018). Hence, we do not think that there is sufficient equipoise to include a “cash only” arm when it comes to sustained effects on depression two years after the end of the intervention.
Second, even if one could make a case that there may be a sustained income effect on mental health – for example, such as those indicated by studies of lottery winners (Gardner and Oswald 2007; Lindahl 2005) – the amounts offered to IPT-G participants here are too small to have this kind of an effect two years after they are transferred. A similar argument has been made in another experiment that lacked a pure unconditional cash transfer (UCT) arm: Benhassine et al. (2015) argue that the labelling of the cash transfer as an education support program increased school participation through its effect on the “...parents’ belief that education was a worthwhile investment,” rather than through a pure income effect, because the transfers were too small to cause the observed effects.
Finally, and related to the point above, one of the aims of our trial is to test the efficacy of a low-cost and scalable intervention through two NGOs that have a track record of implementing programs that are being utilized here (BRAC Uganda and StrongMinds Uganda). While BRAC Uganda is interested in taking advantage of its ELA girls’ clubs’ platform to provide mental health services across Uganda, it does not have any plans to provide UCTs, especially not in transfer sizes that might perhaps have sustained effects. Hence, the lack of evidence on the potential effectiveness of UCTs on sustained reductions in depression, combined with a lack of interest from the implementing partners, resulted in the study team designing a trial that has only three arms. Should the trial show that IPT-G+ is significantly more effective than IPT-G alone in reducing depression in the medium-run, our interpretation will be that there is a complementarity between the two interventions, and not that cash is effective on its own for sustained improvements in psychological wellbeing.
Thanks Barry, I tried to find this earlier but couldn’t.
I find these arguments rather unconvincing. What do you think, Barry and Joel? (I wish I could tag people on this forum haha)
That they feel the need to write 4 paragraphs to defend against this elephant in the room says a lot. The question we are all still asking is how much better (if at all) StrongMinds really is than cash for wellbeing.
My first question is: why don’t they reference the 2020 Haushofer study, the only RCT comparing psychotherapy to cash, which shows cash is better? https://www.nber.org/papers/w28106
Second, their equipoise argument is very poor. The control arm should have been BRAC ELA club + cash. Then you keep 3 arms and avoid their strawman 4-arm problem. You would lose nothing in equipoise by giving cash to the control arm—I don’t understand the equipoise argument; perhaps I’m missing something?
Then third there’s this...
“Should the trial show that IPT-G+ is significantly more effective than IPT-G alone in reducing depression in the medium-run, our interpretation will be that there is a complementarity between the two interventions, and not that cash is effective on its own for sustained improvements in psychological wellbeing.”
This is the most telling paragraph. It’s like: we designed our study so that even if we see that cash gives a big boost, we aren’t going to consider the alternative that we don’t like. It seems to me like they are defending a poor design post hoc, rather than having made a good decision in advance.
The more I see this, the more I suspect that leaving the cash arm out was either a big mistake or an intentional move by the NGOs. What we have now is a million-dollar RCT which doesn’t conclusively answer the most important question we are all asking. This leaves organisations like your HLI having to use substandard data to assess psychotherapy vs. cash, because there is no direct gold-standard comparison.
It’s pretty sad that a million dollars will be spent on a study that at best fails to address the elephant in the room (while spending 4 paragraphs explaining why it does not). Other than that, the design and reasoning in this study seem fantastic.