Thanks—I had looked at the HLI research and I do have a bunch of issues with the analysis (both presentation and research). My biggest issue at the moment is I can’t join up the dots between:
“a universal metric called wellbeing-adjusted life years (WELLBYs). One WELLBY is equivalent to a 1-point increase on a 0-10 life satisfaction scale for one year” (here)
“First, we define a ΔWELLBY to denote a one SD change in wellbeing lasting for one year” (Appendix D here)
In all the HLI research, everything seems to be calculated in the latter terms, which isn’t something meaningful at all (to the best of my understanding). The standard deviations you are using aren’t some global “variance in subjective well-being” but the sample variance of subjective well-being, which is going to be materially lower. It’s also not clear to me that this is even a meaningful quantity. Especially when your metric for subjective well-being is a mental health survey in which a mentally healthy person in San Francisco would answer the same as a mentally healthy person in the most acute poverty.
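To make the gap between the two definitions concrete, here’s a toy sketch (every number here is made up, purely to show the mechanics):

```python
# Toy numbers only: the same "0.5 SD" effect converts to very different
# amounts of 0-10 life-satisfaction points depending on whose SD is the yardstick.
effect_in_sd_units = 0.5  # effect size under the second (SD-based) definition

population_sd = 2.0    # hypothetical SD of life satisfaction in the general population
trial_sample_sd = 1.2  # hypothetical (smaller) SD within a depression-selected trial sample

wellbys_via_population_sd = effect_in_sd_units * population_sd  # point-years on the 0-10 scale
wellbys_via_sample_sd = effect_in_sd_units * trial_sample_sd

print(wellbys_via_population_sd)  # 1.0
print(wellbys_via_sample_sd)      # 0.6
```

If the trial sample’s SD is materially lower than the population’s, then reading “0.5 SDs” as if the SD were the population one nearly doubles the implied WELLBYs in this example.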
Hi Simon, I’m one of the authors of HLI’s cost-effectiveness analysis of psychotherapy and StrongMinds. I’ll be able to engage more when I return from vacation next week.
I see why there could be some confusion there. Regarding the two specifications of WELLBYs, the latter was unique to that appendix, and we consider the first specification to be conventional. In an attempt to avoid this confusion, we denoted all the effects as changes in ‘SDs’ or ‘SD-years’ of subjective wellbeing / affective mental health in all the reports (1,2,3,4,5) that were direct results in the intervention comparison.
Regarding whether these changes are “meaningful at all”: it’s unclear what you mean. Which of the following are you concerned with?
That standard deviation differences (i.e., Cohen’s d or Hedges’ g effect sizes) are a reasonable way to do meta-analyses?
Or is your concern more that even if SDs are reasonable for meta-analyses, they aren’t appropriate for comparing the effectiveness of interventions? We flag some possible concerns in Section 7 of the psychotherapy report. But we haven’t found sufficient evidence after several shallow dives to change our minds.
Or, you may be concerned that similar changes in subjective wellbeing and affective mental health don’t represent similar changes in wellbeing? (We discuss this in Appendix A of the psychotherapy report).
Or is it something else I haven’t articulated?
Most of these issues are technical, and we recognise that our views could change with further work. However, we aren’t convinced there’s a ready-to-use method that is a better alternative for use with subjective wellbeing analyses.
I also welcome further explanation of your issues with our analysis, public or private. If you’d like to have a low-stakes chat about our work, you can schedule a time here. If that doesn’t work, email or message me, and we can make something work.
In an attempt to avoid this confusion, we denoted all the effects as changes in ‘SDs’ or ‘SD-years’ of subjective wellbeing / affective mental health in all the reports (1,2,3,4,5) that were direct results in the intervention comparison.
This is exactly what confused me. In all the analytical pieces (and the places linked to in the reports defining WELLBY on the 0-10 scale) you use SD, but then there’s a chart which uses WELLBYs, and I couldn’t find where you convert from one to the other.
That standard deviation differences (i.e., Cohen’s d or Hedges’ g effect sizes) are a reasonable way to do meta-analyses?
I think this is a very reasonable way to do meta-analyses.
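For concreteness, this is the quantity I understand us both to mean (the numbers are invented):

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference: raw difference divided by the pooled sample SD."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def hedges_g(d, n_t, n_c):
    """Cohen's d with the usual small-sample bias correction."""
    df = n_t + n_c - 2
    return d * (1 - 3 / (4 * df - 1))

# Invented example: a 3-point drop on a depression scale whose pooled SD is 6
d = cohens_d(10.0, 13.0, 6.0, 6.0, 50, 50)
g = hedges_g(d, 50, 50)
print(d, round(g, 3))  # -0.5 -0.496
```

The key point for this thread is the denominator: d is denominated in whatever SD the pooled sample happens to have, not in any fixed global unit.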
Or is your concern more that even if SDs are reasonable for meta-analyses, they aren’t appropriate for comparing the effectiveness of interventions? We flag some possible concerns in Section 7 of the psychotherapy report. But we haven’t found sufficient evidence after several shallow dives to change our minds.
Yes. This is exactly my confusion, specifically:
A potential issue with using SD changes is that the mental health (MH) scores for recipients of different programmes might have different size standard deviations – e.g. SD could be 15 for cash transfers and 20 for psychotherapy, on a given mental health scale. We currently do not have much evidence on this. If we had more time we would test and adjust for any bias stemming from differences in variances of psychological distress between intervention samples by comparing the average SD for equivalent measures across intervention samples.
In the absence of evidence, my prior is very strong that a group of people selected to have a certain level of depression is going to have a lower SD than a group of randomly sampled people. Furthermore, I would expect the SD of “generally healthy people” to be quite low and interventions to have low impact. For example, giving a healthy person a PS5 for Christmas might massively boost their subjective well-being, but probably doesn’t do much for their mental health. (This is related to your third point, but is more about the magnitude of changes I’d expect to see rather than anything else.)
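A quick simulation of that prior, with illustrative numbers only:

```python
import random
import statistics

# Selecting people above a severity cutoff truncates the distribution,
# so the selected group's SD is smaller than the population's.
random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]  # hypothetical MH scores
selected = [score for score in population if score > 60]     # "depressed" subgroup (~top 16%)

pop_sd = statistics.stdev(population)
sel_sd = statistics.stdev(selected)
print(pop_sd, sel_sd)  # the selected group's SD is well below the population SD
```

So the same raw improvement, divided by the selected group’s SD, looks like a much larger effect “in SDs” than it would against the population SD.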
Or, you may be concerned that similar changes in subjective wellbeing and affective mental health don’t represent similar changes in wellbeing? (We discuss this in Appendix A of the psychotherapy report).
So I also have issues with this, although it’s not the specific issue I’m raising here.
Or is it something else I haven’t articulated?
Nope—it’s pretty much exactly point 2.
Most of these issues are technical, and we recognise that our views could change with further work. However, we aren’t convinced there’s a ready-to-use method that is a better alternative for use with subjective wellbeing analyses.
Well, my contention is subjective wellbeing analyses shouldn’t be the sole basis for evaluation (but again, that’s probably a separate point).
I also welcome further explanation of your issues with our analysis, public or private. If you’d like to have low stakes chat about our work, you can schedule a time here. If that doesn’t work, email or message me, and we can make something work.
Thanks! I’ve (hopefully) signed up to speak to you tomorrow.