DISCLAIMER (perhaps a double-edged sword): I’ve lived here in Uganda for 10 years working in healthcare.
Thanks Michael for all your efforts. I love StrongMinds and am considering donating myself. I run health centers here in Northern Uganda and have thought about getting in touch with you to see if we can use something like the StrongMinds program in the health centers we manage. From my experience working as a doctor here, my estimate is that for perhaps between 1 in 5 and 1 in 10 of our patients, depression or anxiety is the biggest medical problem in their lives. I feel bad every year that we do nothing at all to help these people.
Point 1
First I read a reply below that seriously doubted that improving depression could have more positive psychological effect than preventing the grief of the death of a child. On this front I think it’s very hard to make a call in either direction, but it seems plausible to me that lifting someone out of depression could have a greater effect in many cases.
Point 2
However, I strongly disagree with your statement here about self-reporting. Sadly, I think it is not a good measure, especially as a primary outcome measure.
“Also, what’s wrong with the self-reports? People are self-reporting how they feel. How else should we determine how people feel? Should we just ignore them and assume that we know best? Also, we’re comparing self-reports to other self-reports, so it’s unclear what bias we need to worry about.”
Self-reporting doesn’t work because poor people, here in Northern Uganda at least, are primed to give low marks when reporting how they feel before an intervention, and then high marks afterwards—whether the intervention did anything or not. I have seen it personally here a number of times with fairly useless aid projects. I even asked people one time, after a terrible farming training, whether they really thought the training helped as much as they had reported on the piece of paper. A couple of people laughed and said something like “No of course it didn’t help, but if we give high grades we might get more and better help in future”. This is an intelligent and rational response by recipients of aid, as of course good reports of an intervention increase their chances of getting more stuff in future, useful or not.
Dambisa Moyo says it even better in her book “Dead Aid”, but I couldn’t find the quote. There might also be good research papers and other effective altruism posts that describe this failing of self-reporting better than I have, so apologies if this is the case.
You also said “Also, we’re comparing self-reports to other self-reports”, which doesn’t help the matter, because those who don’t get help are likely to keep giving low scores because they feel like they didn’t get help.
Because of this I struggle to get behind any assessment that relies on self-reporting, especially in low-income countries like Uganda where people are often reliant on aid, and desperate for more. Ironically perhaps, I have exactly the same criticism of GiveDirectly. I think that researchers of GiveDirectly should use exclusively (or almost exclusively) objective measures of improved life (hemoglobin levels, kids’ school grades, weight-for-height charts, assets at home) rather than the before-and-after surveys they do. To their credit, recent GiveDirectly research seems to be using more objective measures in their effectiveness research.

https://www.givedirectly.org/research-at-give-directly/
We can’t ignore how people feel, but we need to try and find objective ways of assessing it, especially in contexts like here in Uganda where NGOs have wrecked any chance of self-reporting being very accurate. I feel like measuring improvement in physical aspects of depression could be a way forward. Just off the top of my head, you could measure before-and-after mental agility scores, which should improve as depression improves, or quality of sleep before and after using a smart watch or phone. Perhaps you could even use continuous body monitoring for a small number of people, as they did here:

https://www.vs.inf.ethz.ch/edu/HS2011/CPS/papers/sung05_measures-depression.pdf
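Purely as a hypothetical illustration (the field names and numbers below are invented, not from any real device or study), an objective before-and-after proxy could be as simple as a sleep-efficiency calculation from exported smart-watch records:

```python
# Hypothetical sketch only: a simple objective before-vs-after proxy built from
# sleep records exported by a smart watch. Field names and numbers are invented.

def sleep_efficiency(nights):
    """Fraction of time in bed actually spent asleep, pooled over many nights."""
    in_bed = sum(n["in_bed_min"] for n in nights)
    asleep = sum(n["asleep_min"] for n in nights)
    return asleep / in_bed if in_bed else 0.0

# made-up nightly records for the weeks before and after the intervention
before = [{"in_bed_min": 480, "asleep_min": 300}, {"in_bed_min": 450, "asleep_min": 290}]
after = [{"in_bed_min": 470, "asleep_min": 380}, {"in_bed_min": 460, "asleep_min": 400}]

print(f"sleep efficiency before: {sleep_efficiency(before):.2f}")  # ~0.63
print(f"sleep efficiency after:  {sleep_efficiency(after):.2f}")   # ~0.84
```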
Alternatively, I’d be VERY interested in a head-to-head cash transfer vs. StrongMinds RCT—it should be pretty straightforward, even potentially using your same subjective before-and-after scores. Surely this would answer some important questions.
A similar comparative RCT was done in Kenya in 2020 of cash transfers vs. psychotherapy, and the cash transfers clearly came through on top: https://www.nber.org/papers/w28106.
Anyway, I think StrongMinds is a great idea and probably works well, to the point that I really want to use it myself in our health centers, but I don’t like the way you measure its effectiveness and therefore doubt whether it is as effective as stated here.

Thanks for all the good work!
Hi Nick,

Thank you for sharing! This is valuable to hear. The issue of being primed to respond in a certain way has surprisingly not been explored widely in low-income countries.
We’re concerned about this, but to our knowledge, the existing evidence suggests this isn’t a large concern. The only study we’ve seen that explicitly tries to measure the impact of this was a module in Haushofer et al. (2020, section III.I) – where they find about a zero effect. If there’s more evidence we don’t know about, please share!
Here’s the excerpt (they call these effects “experimenter demand effects”).
You also said “Also, we’re comparing self-reports to other self-reports”, which doesn’t help the matter, because those who don’t get help are likely to keep giving low scores because they feel like they didn’t get help.
Allow me to disagree—I think this could help the matter when we’re comparing between interventions. If this effect was substantial, the big concern would be if this effect differs dramatically between interventions. E.g., the priming effect is larger for psychotherapy than for cash transfers. I.e., it’s okay if the same bias is balanced across interventions. Of course, I’m not sure if it’s plausible for this to be the case—my prior is pretty uniform here.
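To make the “balanced bias” point concrete, here is a minimal toy simulation (purely illustrative numbers of my own, assuming the bias is a constant added to every endline self-report in both arms):

```python
# Toy simulation (invented numbers): a constant reporting bias added to endline
# self-reports inflates each arm's before-and-after change, but cancels when we
# compare the two arms against each other.

import random

random.seed(0)

TRUE_EFFECT = {"psychotherapy": 1.5, "cash": 1.0}  # assumed true SWB gains (arbitrary units)
REPORTING_BIAS = 0.8                               # assumed upward bias on endline reports

def arm_means(name, n=500):
    """Mean baseline and mean (biased) endline self-report for one arm."""
    baselines = [random.gauss(4.0, 1.0) for _ in range(n)]
    endlines = [b + TRUE_EFFECT[name] + REPORTING_BIAS + random.gauss(0.0, 1.0)
                for b in baselines]
    return sum(baselines) / n, sum(endlines) / n

results = {arm: arm_means(arm) for arm in TRUE_EFFECT}

for arm, (pre, post) in results.items():
    # each arm's reported change is inflated by roughly the 0.8 bias
    print(f"{arm}: reported change = {post - pre:.2f} (true effect {TRUE_EFFECT[arm]})")

diff = ((results["psychotherapy"][1] - results["psychotherapy"][0])
        - (results["cash"][1] - results["cash"][0]))
print(f"between-arm difference = {diff:.2f} (true difference 0.5)")  # bias has cancelled
```

Under that assumption the bias inflates each arm’s before-and-after change but drops out of the between-arm comparison; if the bias differs across arms, it of course no longer cancels.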
All that being said, I totally endorse more research on this topic!
We can’t ignore how people feel, but we need to try and find objective ways of assessing it, especially in contexts like here in Uganda where NGOs have wrecked any chance of self-reporting being very accurate.
I think this is probably a point where we disagree. A point we’ve expanded upon previously (see McGuire et al., 2022) is that we seriously doubt that we can measure feelings objectively. The only way we can know if that objective measure is capturing a feeling is by asking someone if it does so—so why not just do that?
I am much more pro “find the flaws and fix them” than “abandon ship when it leaks” regarding measuring subjective wellbeing.
Alternatively, I’d be VERY interested in a head-to-head cash transfer vs. StrongMinds RCT—it should be pretty straightforward, even potentially using your same subjective before-and-after scores. Surely this would answer some important questions.

This may be in the works!
Thanks so much Joel and I’m stoked by your response. I don’t think I’ve been in a forum where discussion and analysis is this good.
I’m not sure that having close to no evidence on a major concern about the potential validity of before-and-after surveys should be very reassuring.
That tiny piece of evidence you cited only looks at the “experimenter demand effect”, which is a concern yes, but not my biggest concern. My largest concern is, let’s say, the “future hope” effect, which I gave an example of in my first reply – where participants rate the effect of interventions more positively than their actual experience, because they CORRECTLY assess that a higher rating may bring them better help in future. That’s what I think is most likely to wreck these before-and-after surveys.
Unlike you, I don’t know this field well at all, so yes, it seems likely this is a poorly researched issue. We have experience and anecdotes, like those from Dambisa Moyo, myself and many others working in the NGO world, suggesting that these incentives and vested interests can greatly affect before-and-after surveys. You recently wrote an article which included (to your credit) 4 disadvantages of the WELLBY. My inclination is (with enormous uncertainty) that these problems I’ve outlined with before-and-after subjective surveys in low-income countries are at least as big a disadvantage to the WELLBY approach as any of the 4 issues you outlined. I agree that SWB is a well-validated tool for one-time surveys where there are no serious vested interests involved. It’s the before-and-after surveys in low-income countries that are problematic.
These are the 3 major unaccounted-for problems I see with before-and-after self-reporting (the first 2 already discussed).
My biggest concern is the “future hope” effect—people positively responding because they correctly believe a positive response is likely to get them more and even better help in future.
The “experimenter demand effect” (as you discussed) is when interviewees are primed to give the answer they think the experimenters want.
A third potential problem (which I haven’t mentioned already but have experienced) is that interviewers manipulate results positively in the direction of the intervention, either through the way they ask questions, or even by increasing scores fraudulently. This is again rational, as their jobs often depend on the NGO raising more money to support “successful interventions” and keep them employed. I have seen this here a number of times during surveys within my own organisation AND in other organisations. It’s very hard to stop this effect even after talking to researchers about it – I have tried but there was still manipulation present. This is probably a smaller problem than the other two, and easier to control and stamp out.
On the “Also, we’re comparing self-reports to other self-reports” front—I agree this could be fine when comparing between two interventions (e.g. cash and psychotherapy), where the effect might be similar between interventions. I think, though, that most of the studies you have listed compare intervention to no intervention, so your point may not stand in those cases.
I’ll change my mind somewhat in your direction and give you some benefit of the doubt on your point about objective measures not working well for wellbeing assessment, given that I haven’t researched it very well and I’m not an expert. Let’s leave objective measures out of the discussion for the moment!
I love the idea of your RCT of cash transfers vs. psychotherapy but I’m confused about a number of parts of the design and have a few questions, if you will humour me.
- The study design seems to say you are giving cash only to the intervention groups and not the control group? I suspect this is a mistake in the reporting of the study design but please clarify. To compare cash vs. psychotherapy, would you not give the cash to the whole control group and either not to the intervention group at all, or only to a small subsection of the intervention group? I might well have missed something here...
- Why are you giving money at the end of the intervention rather than at the start? Does that not give less time for the benefits of the cash to take effect?
- Why are you only giving a $50 cash transfer and not around $130 (the cost of the therapy)? Would it not be a fairer comparison to compare like for like in terms of money spent on the intervention?
It seems logical that most RCTs in the effective altruism space now should be intervention vs. cash transfer, or at least have 3 arms: intervention vs. cash transfer vs. nothing. Hopefully I’ve read it wrong and that is the case in this study!
To finish, I like positivity and I’ll get alongside your point about fixing the boat rather than dismissing it. I feel like the boat has more than a little leak at this stage, but I hope I’m wrong. I love the idea of using before-and-after subjective wellness measures to assess the effectiveness of interventions, I’m just not yet convinced it can give very meaningful results, based on my issues above.
Thanks so much if you got this far and sorry it’s so long!
A quick response—I’ll respond in more depth later.
I love the idea of your RCT of cash transfers vs. psychotherapy but I’m confused about a number of parts of the design and have a few questions, if you will humour me.
To be clear, the planned RCT has nothing to do with HLI, so I would forward all questions to the authors! :)
And to be clear when you say “before and after measures”, do you think this applies to RCTs where the measurement of an effect is comparing a control to a treatment group instead of before and after a group receives an intervention?
Apologies for assuming that the RCT involved HLI—the StrongMinds involvement led me to that wrong assumption! They will then be addressing different questions than we are here—unfortunately for us, that trial as stated in the protocol doesn’t test cash transfers vs. psychotherapy.
I don’t quite understand the distinction in your question, sorry; it will need rephrasing! I’m referencing the problems with any SWB measurement which involves measuring SWB at baseline and then after an intervention, whether there is a control arm or not.

Looking forward to hearing more. Nice one!
Right, our concern is that if this bias exists, it is stronger for one intervention than another. E.g., say psychotherapy is more susceptible than cash transfers. If the bias is balanced across both interventions, then again, not as much of an issue.
I’m wondering if this bias you’re concerned with would be captured by a “placebo” arm of an RCT. Imagine a control group that receives an intervention that we think has little to no effect. If you expect any intervention to activate this “future hope” bias, then we could potentially estimate the extent of this bias with more trials including three arms: a placebo arm, a receive-nothing arm, and an intervention arm.
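A rough sketch of how the three-arm version could quantify the bias (toy numbers I have made up, assuming the “future hope” bump attaches to receiving any intervention and that the placebo triggers it to the same degree as the real treatment):

```python
# Toy sketch (invented numbers): with a placebo arm, the "future hope" bump can be
# estimated as (placebo change - nothing change) and removed from the treatment effect.

import random

random.seed(1)

SECULAR_TREND = 0.3     # assumed change everyone experiences regardless of the trial
FUTURE_HOPE_BIAS = 0.8  # assumed reporting bump from receiving *any* intervention
TRUE_EFFECT = 1.2       # assumed genuine effect of the real intervention

def mean_change(extra, n=500):
    """Average reported before-to-after change in one arm."""
    return sum(SECULAR_TREND + extra + random.gauss(0.0, 1.0) for _ in range(n)) / n

nothing = mean_change(0.0)                              # receive-nothing arm
placebo = mean_change(FUTURE_HOPE_BIAS)                 # bias, but no true effect
treated = mean_change(FUTURE_HOPE_BIAS + TRUE_EFFECT)   # bias plus true effect

bias_estimate = placebo - nothing
naive_effect = treated - nothing        # inflated by the bias
corrected_effect = treated - placebo    # bias differences out

print(f"estimated bias   = {bias_estimate:.2f} (true {FUTURE_HOPE_BIAS})")
print(f"naive effect     = {naive_effect:.2f} (true {TRUE_EFFECT})")
print(f"corrected effect = {corrected_effect:.2f}")
```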
Do you have any ideas on how to elicit this bias experimentally? Could we instruct interviewers to, for a subsample of the people in a trial, say something explicit like “any future assistance [will / will not] depend on the benefit this intervention provided”? Anything verbal like this would be cheapest to test.
Hi Joel—nice one again.

“Right, our concern is that if this bias exists, it is stronger for one intervention than another. E.g., say psychotherapy is more susceptible than cash transfers. If the bias is balanced across both interventions, then again, not as much of an issue.”
I would have thought the major concern would be whether the bias exists at all, rather than whether it is balanced between interventions. Both StrongMinds’ evidence assessing their own program and most of the studies used in your WELLBY analysis are NOT vs. another intervention, but rather psychotherapy vs. no intervention. This is often labelled “normal care”, which is usually nothing in low-income countries. So if it exists at all, in any magnitude, it will be affecting your results.
Onto your question though, which is still important but I believe of secondary importance.
Your idea of a kind of “fake” placebo arm would work—provided the fake placebo was hyped up as much and taken as seriously as the treatment arm, AND the study participants really didn’t know it was a placebo. Unfortunately you can’t do this, as it’s not ethical to run an RCT with an intervention that you think has little or no effect. So I don’t think it’s possible.
I like your idea of interviewers in a trial stating, for a subset of people, that their answers won’t change whether they get more help or not in future. I doubt this would mitigate the effect much or at all, but it’s a good idea to try!
My very strong instinct is that cash transfers would elicit a FAR stronger effect than psychotherapy. It’s hard to imagine anything that would elicit “future hope” more than the possibility of getting cash in future. This seems almost self-evident. Which extremely poor person in their right mind (whether their mental health really is better or not) is going to say that cash didn’t help their mental health if they think it might increase their chance of getting more cash in future?
Again, I think direct RCTs vs. cash transfers are just the best way to test your intervention and control for this bias. It’s hard to imagine anything having a larger “future hope” effect than cash. If psychotherapy really beat the cash, given that cash will almost certainly have a bigger “future hope” bias than psychotherapy, then I’d say you’d have a slam-dunk case.
I am a bit bewildered that StrongMinds has never done a psychotherapy vs. cash transfer trial. This is the big question, and you claim that StrongMinds therapy produces more WELLBYs than cash transfers, yet there is no RCT? It looks like that trial you sent me is another painful missed opportunity to do that research as well. The lack of a straight cash transfer arm in that trial doesn’t make any sense.
As far as I can see, the Kenyan trial is the only RCT with subjective wellness assessment which pits cash transfers against psychotherapy (although it was a disproportionately large cash transfer and not the StrongMinds-style group therapy). If group psychotherapy beat an equivalent cash transfer in, say, 2 RCTs, I might give up my job and start running group psychotherapy sessions—that would be powerful!
Hi Nick. I found more details about the Baird et al. RCT here. I’ve copied the section about the ‘cash alone’ arm below as I know you’ll be interested to read that:
One of the issues that can cause difficulties in the interpretation of the findings from this study comes from the fact that it does not have a classical 2x2 factorial design that includes a “cash alone” arm. While it would have been ideal, from a study design perspective, to have such a design, which would have enabled us to experimentally reject (or not) that cash alone would have been as effective as IPT-G+. Such benchmarking of the (cost-) effectiveness of a group therapy intervention to cash transfers alone would have been nice, the three-arm trial we designed for our setting, leaving out a “cash-only” arm, is an adequate study design for the following three reasons.
First, the extant evidence on the effects of economic interventions in general, or cash transfers in particular, do not support the idea of improved mental health outcomes past the short-run. For example, in their review, Lund et al. (2011) finds that the mental health effects of poverty alleviation programs were inconclusive. Blattman, Jamison, and Sheridan (2017) find no effects of lump-sum cash transfers alone on mental health outcomes in the short- or the medium-run. Short-term effects of cash transfers (monthly or lump-sum) on psychological wellbeing that were observed in the short-run dissipated a couple of years later (Baird, de Hoop, and Özler 2013; Baird, McIntosh and Özler 2019; Haushofer and Shapiro 2016, 2018). Hence, we do not think that there is sufficient equipoise to include a “cash only” arm when it comes to sustained effects on depression two-years after the end of the intervention.
Second, even if one could make a case that there may be a sustained income effect on mental health – for example, such as those indicated by studies of lottery winners (Gardner and Oswald 2007; Lindahl 2005) – the amounts offered to IPT-G participants here are too small to have this kind of an effect two years after they are transferred. A similar argument has been made in another experiment that lacked a pure unconditional cash transfer (UCT) arm: Benhassine et al. (2015) argue that the labelling of the cash transfer as an education support program increased school participation through its effect on the “...parents’ belief that education was a worthwhile investment,” rather than through a pure income effect, because the transfers were too small to cause the observed effects.
Finally, and related to the point above, one of the aims of our trial is to test the efficacy of a low-cost and scalable intervention through two NGOs that have a track record of implementing programs that are being utilized here (BRAC Uganda and StrongMinds Uganda). While BRAC Uganda is interested in taking advantage of its ELA girls’ clubs’ platform to provide mental health services across Uganda, it does not have any plans to provide UCTs, especially not in transfer sizes that might perhaps have sustained effects. Hence, the lack of evidence on the potential effectiveness of UCTs on sustained reductions in depression, combined with a lack of interest from the implementing partners, resulted in the study team designing a trial that has only three arms. Should the trial show that IPT-G+ is significantly more effective than IPT-G alone in reducing depression in the medium-run, our interpretation will be that there is a complementarity between the two interventions, and not that cash is effective on its own for sustained improvements in psychological wellbeing.
Thanks Barry, I tried to find this earlier but couldn’t.
I find these arguments rather uncompelling. What do you think Barry and Joel? (I wish I could tag people on this forum haha)
That they feel the need to write 4 paragraphs to defend against this elephant in the room says a lot. The question we are all still asking is how much better (if at all) StrongMinds really is than cash for wellbeing.
My first question is why don’t they reference the 2020 Haushofer study, the only RCT comparing psychotherapy to cash and showing cash is better? https://www.nber.org/papers/w28106
Second, their equipoise argument is very poor. The control arm should have been BRAC ELA club + cash. Then you keep 3 arms and avoid their straw-man 4-arm problem. You would lose nothing in equipoise by giving cash to the control arm—I don’t understand the equipoise argument; perhaps I’m missing something?
Then third there’s this...
“Should the trial show that IPT-G+ is significantly more effective than IPT-G alone in reducing depression in the medium-run, our interpretation will be that there is a complementarity between the two interventions, and not that cash is effective on its own for sustained improvements in psychological wellbeing.”
This is the most telling paragraph. It’s like saying: we designed our study so that even if we see that cash gives a big boost, we aren’t going to consider the alternative that we don’t like. It seems to me like they are defending a poor design post hoc, rather than having made a good decision in advance.
The more I see this, the more I suspect that leaving the cash arm out was either a big mistake or an intentional move by the NGOs. What we have now is a million-dollar RCT which doesn’t conclusively answer the most important question we are all asking. This leaves organisations like your HLI having to use substandard data to assess psychotherapy vs. cash, because there is no direct gold-standard comparison.
It’s pretty sad that a million dollars will be spent on a study that at best fails to address the elephant in the room (while spending 4 paragraphs explaining why it does not). Other than that, the design and reasoning in this study seem fantastic.
Hey Nick, thanks for this very valuable experience-informed comment. I’m curious what you make of the original 2002 RCT that first tested IPT-G in Uganda. When we (at Founders Pledge) looked at StrongMinds (which we currently recommend, in large part on the back of HLI’s research), I was surprised to see that the results from the original RCT lined up closely with the pre/post scores reported by recent program participants.
Would your take on this result be that participants in the treated group were still basically giving what they saw as socially desirable answers, irrespective of the efficacy of the intervention? It’s true that the control arm in the 2002 RCT did not receive a comparable placebo treatment, so that does seem a reasonable criticism. But if the social desirability bias is so strong as to account for the massive effect size reported in the 2002 paper, I’d expect it to appear in the NBER paper you cite, which also featured a pure control group. But that paper seems to find no effect of psychotherapy alone.
Matt, these are fantastic questions that I definitely don’t have great answers to, but here are a few thoughts.
First, I’m not saying at all that the StrongMinds intervention is likely useless—I think it is likely very useful. Just that the positive effects may well be grossly overstated for the reasons outlined above.
My take on the results of that original 2002 RCT and StrongMinds: yes, like you say, in both cases it could well be that the treatment group is giving positive answers, both to appease the interviewer (incredibly, the before and after interviews were done by the same researcher in that study, which is deeply problematic!) and because they may have been hoping positive responses might provide them with further help in future.
Also, in most of these studies, participants are given something physical for being part of the intervention groups. Perhaps small allowances for completing interviews, or tea and biscuits during the sessions. These tiny physical incentives can be more appreciated than the actual intervention. Knowing World Vision, this would almost certainly be the case.
I have an immense mistrust of World Vision, who were heavily involved in that famous 2002 RCT, for a whole range of reasons. This is due to their misleading advertising and a number of shocking experiences of their work here in Northern Uganda which I won’t expand on. I even wrote a blog about this a few years ago, encouraging people not to give them money. I know this may be a poor reason to mistrust a study, but my previous experience heavily biases me all the same.
Great point about the NBER paper, which featured a pure control group. First, it was a different intervention—individual CBT, not group therapy.
Second, it feels like the Kenyan study was more dispassionate than some of the other big ones. I might be wrong, but a bunch of the other RCTs are partly led and operated by organisations with something to prove. I did like that the Kenyan RCT felt less likely to be biased, as there didn’t seem to be as much of an agenda as with some other studies.
Third, the Kenyan study didn’t pre-select people with depression; the intervention was performed on people randomly selected from the population. Obviously this means you are comparing different situations when comparing this to the studies of group psychotherapy for people with depression.
Finally, allow me to speculate with enormous uncertainty. I suspect having the huge 1000-dollar cash transfers involved really changed the game here. ALL participants would have known for sure that some people were getting the cash, and this would have changed dynamics a lot. One outcome could have been that other people getting a wad of cash might have devalued the psychotherapy in participants’ eyes. Smart participants may even have decided they were more likely to get cash in future if they played down the effect of the therapy. Or, even more extreme, the confounding could go in the opposite direction of other studies, if participants assigned to psychotherapy undervalued a potentially positive intervention out of disappointment at not getting the cash in hand. Again, I’m really just surmising, but never underrate the connectivity and intelligence of people in villages in this region!