To expand a little on “this seems implausible”: I feel like there is probably a mistake somewhere in the notion that anyone involved thinks that <doubling income has a 1.3 WELLBY effect and severe depression has a 1.3 WELLBY effect.>
The mistake might be in your interpretation of HLI’s document (it does look like the 1.3 figure is a small part of some more complicated calculation regarding the economic impacts of AMF and their effect on well being, rather than intended as a headline finding about the cash to well being conversion rate). Or it could be that HLI has an error or has inconsistencies between reports. Or it could be that it’s not valid to apply that 1.3 number to “income doubling” SoGive weights for some reason because it doesn’t actually refer to the WELLBY value of doubling.
I’m not sure exactly where the mistake is, so it’s quite possible that you’re right, or that we are both missing something about how the math behind this works which causes this to work out, but I’m suspicious because it doesn’t really fit together with various other pieces of information that I know. For instance, it doesn’t really square with how HLI reported that psychotherapy is 9x GiveDirectly when the cost of treating one person with therapy is around $80, or how they estimated that it took $1000 worth of cash transfers to produce 0.92 SD-years of subjective-well-being improvement (“totally curing just one case of severe depression for a year” should correspond to something more like 2-5 SD-years).
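As a rough illustration of why these figures seem hard to reconcile, here is a small Python sketch. It assumes (this is my reading, not something HLI states in this form) that the “9x” multiplier can be treated as a ratio of SD-years per dollar relative to cash. The function and variable names are hypothetical.

```python
# Figures quoted in the paragraph above.
CASH_SD_YEARS = 0.92       # SD-years of wellbeing improvement from the cash transfers
CASH_COST_USD = 1000.0     # cost of those cash transfers
THERAPY_MULTIPLIER = 9.0   # "Psychotherapy is 9x GiveDirectly"
THERAPY_COST_USD = 80.0    # approximate cost of treating one person


def implied_sd_years_per_patient(cash_sd_years, cash_cost, multiplier, therapy_cost):
    """SD-years per treated person.

    ASSUMPTION: the multiplier is a ratio of SD-years per dollar
    (therapy vs. cash). That is an interpretation, not a sourced fact.
    """
    cash_sd_years_per_dollar = cash_sd_years / cash_cost
    therapy_sd_years_per_dollar = multiplier * cash_sd_years_per_dollar
    return therapy_sd_years_per_dollar * therapy_cost


implied = implied_sd_years_per_patient(
    CASH_SD_YEARS, CASH_COST_USD, THERAPY_MULTIPLIER, THERAPY_COST_USD
)
print(f"Implied effect per person treated: {implied:.2f} SD-years")
```

Under that reading, the implied effect per person treated comes out below 1 SD-year, which is some distance from the 2-5 SD-years I would associate with fully curing a case of severe depression for a year.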
I wish I could give you a clearer “ah, here is where I think the mistake is” or perhaps an “oh, you’re right after all”, but I too am finding the linked analysis a little hard to follow and am a bit short on time (ironically, because I’m trying to publish a different piece of StrongMinds analysis before a deadline). Maybe one of the things we can talk about once we schedule a call is how you calculated this and whether it works? Or maybe HLI will comment and clear things up regarding the 1.3 figure you pulled out and what it really means.
I thought it cleaner to reply to this comment about moral weights here where you could see my calculations as it will make it easier to find the discussion and it is more related to moral weights.
I feel like there is probably a mistake somewhere in the notion that anyone involved thinks that <doubling income has a 1.3 WELLBY effect and severe depression has a 1.3 WELLBY effect.>
It’s certainly plausible, although I don’t know where my mistake is.
Or it could be that HLI has an error or has inconsistencies between reports.
I am very confident HLI are inconsistent between reports. I have already queried them on this. I don’t know if I have Joel’s permission to publish his full reply, but he is looking into it. I also noted it in the footnotes here
(“totally curing just one case of severe depression for a year” should correspond to something more like 2-5 SD-years).
I’m not sure 2-5 SD-years is plausible for severe depression. 3 SDs would saturate the entire scale 0-24.
0.92 SD-years gets converted to 2.0 WELLBYs since they multiply SD-years by the 2.17 figure. This is something I have had confirmed with Joel and this is how they are creating their figures on this page.
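The conversion described above is a single multiplication; a minimal Python sketch follows. The 2.17 factor and the 0.92 and 1.3 figures come from this thread. The function names are hypothetical, and running the conversion in reverse on the 1.3 WELLBY figure assumes the same factor applies there, which has not been confirmed.

```python
SD_TO_WELLBY = 2.17  # factor HLI multiplies SD-years by, as quoted above


def sd_years_to_wellbys(sd_years, factor=SD_TO_WELLBY):
    """Convert SD-years of wellbeing change into WELLBYs."""
    return sd_years * factor


def wellbys_to_sd_years(wellbys, factor=SD_TO_WELLBY):
    """Inverse conversion; ASSUMES the same factor applies in reverse."""
    return wellbys / factor


cash_wellbys = sd_years_to_wellbys(0.92)
print(f"0.92 SD-years -> {cash_wellbys:.1f} WELLBYs")

doubling_sd_years = wellbys_to_sd_years(1.3)
print(f"1.3 WELLBYs  -> {doubling_sd_years:.2f} SD-years")
```

If the same factor does apply, the 1.3 WELLBY figure would correspond to roughly 0.6 SD-years.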
My response to this post overall is that I think some of what is going on here is that different people and different organizations mean very different things when we say “Depression”. Since “depression” is not really a binary, the value of averting “1 case of severe depression” can change a lot depending on how you define severity, in such a way that differences in reasonable definitions of “sufficiently bad depression” can plausibly differ by 1-3x when you break it down into “how many SD counts as curing depression” terms.
However, the in-progress nature of SoGive’s mental health work makes pinning down what we do mean sort of tricky. What exactly did the participants in the SoGive Delphi Process mean when they said “severe depression”? How should I, as an analyst who isn’t aiming to set the moral weights but is attempting to advise people using them, interpret that? These things are currently in flux, in the sense that I’m basically in the process of making various judgement calls about them right now, which I’ll describe below.
You commented:
I’m not sure 2-5 SD-years is plausible for severe depression. 3 SDs would saturate the entire scale 0-24.
It’s true that the PHQ-9, which runs to 27 points, maxes out around 2-4sd. How many SD it is exactly depends on the spread of your population, of course (for example, if 1sd=6.1 points then the range of a 27-point scale spans about 4.43sd), and for some population spreads it would be 3sd.
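To make the dependence on population spread concrete, here is a short Python sketch. The 27-point range and the 6.1-point SD are the figures from the paragraph above; the other two SD values are hypothetical, chosen only to show which spreads would give a span of 3sd or 2sd.

```python
def scale_span_in_sd(scale_max, sd_points, scale_min=0.0):
    """How many population standard deviations fit across the whole scale."""
    if sd_points <= 0:
        raise ValueError("sd_points must be positive")
    return (scale_max - scale_min) / sd_points


PHQ9_MAX = 27  # top score on the PHQ-9

# 6.1 is the example spread from the text; 9.0 and 13.5 are hypothetical.
for sd_points in (6.1, 9.0, 13.5):
    span = scale_span_in_sd(PHQ9_MAX, sd_points)
    print(f"1sd = {sd_points:>4} points -> full scale spans {span:.2f}sd")
```

A tighter population spread makes the same 27 points cover more standard deviations, which is why a fixed “number of SDs” does not map onto one fixed position on the questionnaire.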
These two things are related actually! I think the trouble is that the phrase “severe depression” is ambiguous as to how bad it is, so different people can mean different things by it.
One might argue that the following was an awkward workaround which should have been done differently, but to be transparent, my internal thought process (in terms of what I thought after joining SoGive, starting this analysis, and encountering these weights) was the following:
→ “hm, this implies we’re willing to trade averting 25 years of depression against one (mostly neonatal) death. Is this unusual?”
→ “Maybe we are thinking about the type of severe, suicidal depression that is an extremely net negative experience, a state which is worse than death.”
→ “Every questionnaire creator seems to have recommended cut-offs for gradients of depression such as “mild” and “moderate” (e.g. the creators of the PHQ-9 scale are recommending 20 points as the cut-off for “severe” depression) but these aren’t consistent between scales and are ultimately arbitrary choices.”
→ “extrapolating linearly from the time-trade-off literature, people seemed to think that a year of depression breaks even with dying a year earlier around 5.5sd. Maybe less if it’s not linear.”
→ “But maybe it should be more because what’s really happening here is that we’re seeing multiple patients improve by 0.5-0.8 sd. The people surveyed in that paper think that the difference between 2sd->3sd is bigger than 1sd->2sd. People might disagree on the correct way to sum these up.”
→ concluding with me thinking that various reasonable people might set the standard for “averting severe depression” between 2-6 sd, depending on whether they wanted ordinary severity or worse-than-death severity.
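The linear extrapolation in that chain of reasoning can be sketched in a few lines of Python. This is only an illustration: the 5.5sd break-even point is the figure mentioned above, the linearity is an assumption (and, as noted, the surveyed people did not seem to treat it as linear), and the function name is hypothetical.

```python
BREAK_EVEN_SD = 5.5  # severity at which a year of depression ~ a year of life lost


def life_years_equivalent(severity_sd, break_even_sd=BREAK_EVEN_SD):
    """Life-years treated as equal to one year of depression at `severity_sd`.

    ASSUMPTION: badness scales linearly with severity in SD units.
    """
    return severity_sd / break_even_sd


# The 2-6 sd range of plausible thresholds for "severe" discussed above.
for severity_sd in (2, 3, 4, 5, 6):
    equivalent = life_years_equivalent(severity_sd)
    print(f"{severity_sd}sd-year of depression ~ {equivalent:.2f} life-years")
```

Across the 2-6 sd range, the implied trade moves from well under one life-year to slightly more than one, which is the sense in which the choice of threshold for “severe” drives the result.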
So, hopefully that answers your question as to why I wrote to you that 2-5sd is reasonable for severe depression. I’m going to try to justify this further in subsequent posts. Some additional thoughts that I had were:
→ I notice that this is still weighting depression more heavily than the people surveyed in the time-trade-off literature, but if we set it on the higher range of 3-6sd it still feels like a morally plausible view (especially considering that some people might have assigned lower moral weight to neonates).
→ My role is to tell people what the effect is, not to tell them what moral weights to use. However, I’m noticing that all the wiggle room to interpret what “severe” means is on me, and I notice that I keep wanting to nudge the SD-years I accept higher in order to make the view match what I think is morally plausible.
→ I’ll just provisionally use something between 3-5 sd-years for the purpose of completing analysis, because my main aim is to figure out what therapy does in terms of sd.
→ But I should probably publish a tool that allows people to think about moral weights in terms of standard deviation, and maybe we can survey people for moral weights again in the future in a manner that lets them talk about standard deviations rather than whatever connotations they attached to “severe depression”. Then we can figure out what people really think about various grades of depression and how much income and life they’re willing to trade against it.
In fact, the next thing I’m scheduled to publish is a write-up that talks in detail about how to translate SD into something more morally intuitive. So hopefully that will help us make some progress on the moral weights issue.
So to summarize, I think (assuming your calculations w.r.t. everyone else’s weights are correct) what’s going on here is that it looks like SoGive is weighing depression 4x more than everyone else, but those moral weights were set in the absence of a concrete recommendation. Arguably this is an artifact of me choosing, after the fact, to set a really high SD threshold for “severity” as a reaction to the weights, and what really needs to happen is that we go through the process I described of polling people again in a way that breaks down “severity” differently. In the final analysis, once a concrete recommendation comes out, it probably won’t be that different? (Though you’ve added two items, sd<->daly/wellby and cash<->sd, to my list of things to check for robustness, and if it ends up being notable I’m definitely going to flag it, so thank you for that.) I do think that this story will ultimately end with some revisiting of moral weights: how they should be set, what they mean, and how to communicate them.
(There’s another point that came up in the other thread though, regarding “does it pass the sanity check w.r.t. cash transfer effects on well being”, which this doesn’t address. Although it falls outside the scope of my current work, I have been wanting to get a firmer sense of the empirical cash <-> wellby <-> sd depression correlations, and apropos of your comments perhaps this should be made more explicit in moral weights agendas.)
From ishaan here.