I only skimmed this, but I think the majority of EAs don’t actually look into the how and why of GiveWell’s recommendations, and even fewer dig into the processes and publications behind the numbers that GiveWell eventually uses. An indirect result is that GiveWell doesn’t get as much feedback as it could likely benefit from, and too many EAs can’t speak to M&E professionals in international development at a meaningful level.
What’s explained and alluded to here, along with the criticisms, is important basic information for the many EAs who are unfamiliar with it. The various methodologies for costing and discounting (both those included here and others) are definitely worth investigating further for those who haven’t.
Thanks for posting this, it’s a really thorough write-up of the issue.
I wrote a bit about this about a year ago, where I argued effective altruism is overlooking happiness, and I’m pleased to see you reached the same conclusions (I also found the Dolan 2011 paper very persuasive)! I think your analysis 1. was much more substantial than mine and 2. didn’t hide the information in an additional document people had to go find (on reflection, that was a mistake on my part). Where I think this criticism of DALYs leads us, in our quest to do the most good, is towards mental health as a substantial cause area and away from physical health.
As a separate point: this post does raise the more general worry about the effectiveness with which information gets shared in EA circles. I’d looked into this before and there’s some duplication of effort here: if I’d found a way to make my research better known, the author might have researched something else instead. To be clear, I mean this in no way as a criticism of the author, I think it’s just unfortunate. It’s not the first time I’ve come across this phenomenon in the EA world either and I may make a post on the general problem soon.
Thanks for your comments, Michael!
You raise a good point. I think a possible way of improving communication would be labelling posts with keywords. That way, everyone could easily find everything that has been posted about a topic. I am not sure that would have helped in this case, though. My goal with this post was to be comprehensive about what is not so good about DALYs. I did as much reading as I could and wrote more extensively about the factors that seemed most important. Mental health came up as one of these factors (I read your article and was persuaded, so I decided to dig deeper), so I did not feel that I could leave it out, even if something had already been written about it. You may be right that this was inefficient, and in the future I think referencing posts rather than writing new ones may be a better alternative. Something definitely worth keeping in mind.
Great post!
Nitpick:
I think “11111” usually refers to full health. (cf. the “EQ-5D Value Sets: Inventory, Comparative Review and User Guide” by Szende, Oppe & Devlin, 2007).
As part of a bigger project on descriptive (population) ethics, I’ve been working on a literature review of health economics. It also contains a section on the EQ-5D and its weaknesses. Here are some excerpts:
(Incorporating the TTO lead-time approach can easily overcome this problem.)
Anyway, you write:
I couldn’t agree more.
IMHO, another big problem is the evaluation of states worse than death (SWD) (and states of severe mental illness such as depression arguably belong in this category). For example, most studies don’t even allow for SWD assessments. Furthermore, most researchers transform negative evaluations, limiting them to a lower bound of −1. If people with a history of mental illness more often evaluate health states indicating severe mental illness as highly negative (i.e. give utilities lower than −1), then this ex-post transformation causes their judgements to have less influence than the judgements of uninformed people who underestimate the severity of mental illness.
I discuss this problem, as well as other problems, in much greater detail in my doc.
I plan on publishing the doc within the next months, but if you’re interested I’m happy to send you a link to the current version.
Some nitpicks in turn!
I don’t think this follows. If these states are impossible (I don’t disagree), then they’ll never come up in real life, so it won’t matter what people say about them in the TTOs. As long as people make sensible judgements about the health states that actually occur, it doesn’t matter what they say about impossible ones. I think you should push the fact that they don’t make sensible judgements in general: affective forecasting stuff, etc.
Curious. Hmm. IIRC, DALYs and QALYs don’t have a neutral point: 1 is full health and 0 is dead, but it’s not specified where between 0 and 1 neutral lies. Is neutral 0.5? 0? Unless you know where neutral is, you can’t specify the minimum point on the scale, because it doesn’t make sense.
What would −1 mean here? DALYs and QALYs aren’t well-being scales and can’t straightforwardly be interpreted as such.
Good point. But I wonder whether they reinterpret the meanings of some of the dimensions of the EQ-5D in order to make sense of some of the health states they are asked to rate.
Agree.
This depends on the study. I’m afraid it will take me a couple of paragraphs to explain the methodology, but I hope you’ll bear with me :)
The literature review by Tilling et al. (2010) concluded that only 8% of all TTO studies even allow for subjects to rate health states as worse than death (i.e. as below 0), so for the vast majority of studies, the minimum point on the scale is indeed 0. I think this is problematic since e.g. health states like 33333 (if they are permanent) are probably worse than death for many, maybe even most people.
Of the few TTO studies that allow for negative values, almost all use the protocols of Torrance et al. (1982) or Dolan (1997). Below is a quote from Tilling et al. (2010) describing these two methods:
These two TTO protocols would, in theory, allow for extremely negative (even infinitely negative) values. Tilling et al. (2010) explain:
How do researchers respond? Again, I’ll quote Tilling et al. (2010, emphasis mine):
In the two most commonly used TTO protocols, the smallest unit of time the TTO procedure iterates toward for SWD is 1 year. Consequently, the lower bound is −9. (Sometimes the smallest unit of time is 3 months, in which case the lowest possible value is −39.)
To give a concrete example: the subject is indifferent between A) living for 2 years in full health and for 8 years in health state 33333, and B) dying immediately. Thus the value of health state 33333, for this subject, is −8/2 = −4.
Almost all researchers then transform these values such that the lowest possible value is −1. In my view, this is somewhat arbitrary.
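To make the arithmetic concrete, here is a small sketch of the worse-than-death TTO value and one example bounding transform. The function names are mine, and the specific rescaling shown (v / (1 − v)) is just one of several transforms used in the literature, picked for illustration:

```python
def tto_swd_value(years_full_health: float, years_in_state: float) -> float:
    """Raw TTO value for a state worse than dead.

    The respondent is indifferent between (A) a life containing
    years_full_health in full health and years_in_state in the poor
    state, and (B) immediate death, giving the value
    -(years_in_state / years_full_health).
    """
    return -years_in_state / years_full_health


def rescale(v: float) -> float:
    """Example ex-post transform mapping negative values into [-1, 0).

    This is one illustrative choice (v / (1 - v)); it compresses
    strongly negative judgements, which is exactly the worry raised
    in the comment above.
    """
    return v / (1 - v) if v < 0 else v


# The worked example: 2 years full health + 8 years in state 33333.
print(tto_swd_value(2, 8))          # -4.0

# With a 10-year horizon and a 1-year smallest unit, the most extreme
# answer is 1 year healthy + 9 years in the state:
print(tto_swd_value(1, 9))          # -9.0
# With a 3-month (0.25-year) smallest unit:
print(tto_swd_value(0.25, 9.75))    # -39.0

# After rescaling, -4 and -39 end up barely distinguishable:
print(rescale(-4.0))                # -0.8
print(rescale(-39.0))               # -0.975
```

Note how the transform squeezes a 35-point gap in raw values (−4 vs. −39) into less than 0.2 on the rescaled scale, which is the compression of extreme judgements being criticized.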
Below are some quotes from Devlin et al. (2011) on the matter:
...
And here is another quote from Tilling et al. (2010):
I hope this explains my previous comment.
References:
Devlin, N. J., Tsuchiya, A., Buckingham, K., & Tilling, C. (2011). A uniform time trade off method for states better and worse than dead: Feasibility study of the ‘lead time’ approach. Health Economics, 20(3), 348-361.
Dolan, P. (1997). Modeling Valuations for EuroQol Health States. Medical Care, 35(11), 1095-1108.
Tilling, C., Devlin, N., Tsuchiya, A., & Buckingham, K. (2010). Protocols for Time Tradeoff Valuations of Health States Worse than Dead: A Literature Review. Medical Decision Making, 30(5), 610-619.
Torrance, G. W., Boyle, M. H., & Horwood, S. P. (1982). Application of Multi-Attribute Utility Theory to Measure Social Preferences for Health States. Operations Research, 30(6), 1043-1069.
Categorizing quality of life based on personal testimony is a challenging task. The reasons you listed show many specific problems, and more generally, human judgement is fickle and error-prone. For instance, Thinking, Fast and Slow claims that we are loss-averse and overweight the cost of losing something. I wonder, then, how perceived-quality-of-life responses differ between people who were born with a particular condition (like blindness) and people who acquired it later in life.
The inherent fallacies in human judgement make me wonder whether it can ever be a reliable source for quantifying the effect of illnesses. At the risk of being hyper-pragmatic, perhaps we should quantify the effect of an illness only by the degree to which it impacts a person’s ability to provide useful social function.
Of course, this approach also has many inherent issues. For one, meaningfully quantifying it would be incredibly challenging if not infeasible. It would also likely weight the lives of the rich much more highly than those of the poor.
If you don’t think you can quantify QoL by self-reports, I’m not sure how you’re going to be able to quantify useful social functions instead!
FWIW, measuring happiness turns out to be basically fine. You might like this article on the topic which discusses it: http://journals.sagepub.com/doi/10.1111/j.1745-6916.2007.00030.x
Maybe a relevant post from the past.
http://effective-altruism.com/ea/pu/we_care_about_walys_not_qalys/
“QALYs only measure health, and health is not all that matters. Most effective altruists care about increasing the number of “WALYs” or well-being adjusted life years, where health is just one component of wellbeing.”
Keep in mind that there are some differences between DALYs and QALYs, for example see the discussion in https://academic.oup.com/heapol/article/21/5/402/578296/Calculating-QALYs-comparing-QALY-and-DALY