Hello William, thanks for this. I’ve been scratching my head about how best to respond to the concerns you raise.
First, your TL;DR is that this post doesn’t address your concerns about the WELLBY. That’s understandable, not least because that was never the purpose of this post. Here, we aimed to set out our charity recommendations and give a non-technical overview of our work, not get into methodological and technical issues. If you want to know more about the WELLBY approach, I would send you to this recent post instead, where we talk about the method overall, including concerns about neutrality, linearity, and comparability.
Second, on scientific validity: a measure is valid if it successfully captures what you set out to measure. See e.g. Alexandrova and Haybron (2022) on the concept of validity and its application to wellbeing measures. I’m not going to give you chapter and verse on this.
Regarding linearity and comparability, you’re right that people *could* be using the scales in different ways. But are they? And would it matter if they did? You always get measurement error, whatever you do. An initial response is to point out that if the differences are random, they will wash out as ‘noise’. Further, even if a measure is slightly biased, that wouldn’t make it useless: a bent measuring stick might be better than nothing. The scales don’t need to be exactly linear and comparable to be informative. I’ve looked into this issue previously, as have some others, and at HLI we plan to do more on it: again, see this post. I’m not incredibly worried about these things. Some quick evidence: if you look at a map of global life satisfaction, it’s pretty clear there’s a shared scale in general. It would be an issue if e.g. Iraq gave themselves 9/10. Equally, it’s pretty clear that people can and do use words and numbers in a meaningful and comparable way.
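To make the “noise washes out” and “bent measuring stick” points concrete, here is a toy simulation. It is only a sketch under made-up assumptions, not HLI’s analysis: the two groups, their true averages (6.0 and 5.0), the noise level, the 0.3-point bias, and the helper name `mean_report` are all hypothetical, and answers are not clipped to a 0–10 scale.

```python
import random
import statistics


def mean_report(true_mean, n, noise_sd, bias=0.0, seed=0):
    """Average of n survey answers, each = true level + personal noise + bias."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    answers = [true_mean + rng.gauss(0, noise_sd) + bias for _ in range(n)]
    return statistics.fmean(answers)


# Hypothetical groups: A is truly 1 point better off than B (6.0 vs 5.0).
N, NOISE_SD = 10_000, 1.5

# Case 1: purely random differences in how individuals use the scale.
gap_noise_only = (
    mean_report(6.0, N, NOISE_SD, seed=1) - mean_report(5.0, N, NOISE_SD, seed=2)
)

# Case 2: group B also systematically over-reports by 0.3 (a "bent stick").
gap_with_bias = (
    mean_report(6.0, N, NOISE_SD, seed=1)
    - mean_report(5.0, N, NOISE_SD, bias=0.3, seed=2)
)

print(f"gap, noise only:      {gap_noise_only:.2f}")
print(f"gap, noise plus bias: {gap_with_bias:.2f}")
```

With these assumptions, the first gap should land close to the true 1.0 and the second close to 0.7: the random noise averages away, and the modest bias shrinks the measured difference without reversing it. A bias larger than the true gap would, of course, change that conclusion.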
In your MacAskill quotation, MacAskill is attacking a straw man. When people say something is, e.g., “the best”, we don’t mean the best it is logically possible to be. That wouldn’t be helpful. We mean something more like “the best that’s actually possible”, i.e. possible in the real world. That’s how we make language meaningful. But yes, in another recent report, we stress that we need more work on understanding the neutral point.
Finally, the thing I think you’ve really missed about all this is the following: if we’re not going to use subjective wellbeing surveys to find out how well or badly people’s lives are going, what are we going to use instead? Indeed, MacAskill himself says in the same chapter of What We Owe The Future that you quote from:
You might ask, Who am I to judge what lives are above or below neutral? The sentiment here is a good one. We should be extremely cautious to figure out how good or bad others’ lives are, as it’s so hard to understand the experiences of people with lives very different to one’s own. The answer is to rely primarily on self-reports
Thank you very much for taking the time to write this detailed reply, Michael! I haven’t read the To WELLBY or not to WELLBY? post, but definitely want to check that out to understand this all better.
I also want to apologize for my language sounding overly critical/harsh in my previous comment. For example, making my first sentence “This post didn’t address my concerns related to using WELLBYs...” when I knew full well that wasn’t what the post was intending to address was very unfair of me.
I know you’ve put a lot of work into researching the WELLBY approach and are no doubt advancing our frontier of knowledge of how to do good effectively in the process, so I want to acknowledge that I appreciate what you do regardless of any level of skepticism I may still have related to heavily relying on WELLBY measurements as the best way to evaluate impact.
As a final note, I want to clarify that while my previous comment may have made it sound like I was confident that the WELLBY approach was no good, in fact my tone was more reflective of my (low-information) intuitive independent impression, not my all-things-considered view. I think there’s a significant chance that when I read into your research on neutrality, linearity, comparability, etc., I’ll update toward thinking that the WELLBY approach makes considerably more sense than I initially assumed.
Thanks for saying that. Yeah, I couldn’t really understand where you were coming from (and honestly ended up spending 2+ hours drafting a reply).
On reflection, we should probably have done more WELLBY-related referencing in the post, but we were trying to keep the academic side light. In fact, we probably need to combine our various scratchings on the WELLBY and put them on a single page on our website; it’s been a lower priority than doing the object-level charity analysis work.
If you’re doing the independent impression thing again, it would be really helpful, as a recipient, to know that up front. Then I would have read it more as a friendly “I’m new to this and sceptical about X and Y: what’s going on with those?” and less as a “I’m sceptical, you clearly have no idea what you’re talking about” (which was more-or-less how I initially interpreted it… :) )
Then I would have read it more as a friendly “I’m new to this and sceptical about X and Y: what’s going on with those?” and less as a “I’m sceptical, you clearly have no idea what you’re talking about”
Ah, I’m really sorry I didn’t clarify this!
For the record, you’re clearly an expert on WELLBYs and I’m quite new to thinking about them.
My initial exposure to HLI’s WELLBY approach to evaluating interventions was the post Measuring Good Better and this post is only my second time reading about WELLBYs. I also know very little about subjective wellbeing surveys. I’ve been asked to report my subjective wellbeing on surveys before, but I’ve basically never read about them before besides that chapter of WWOTF.
The rest of this comment is me offering an explanation of what I think happened here:
Scott Alexander has a post called Socratic Grilling that I think offers useful insight into our exchange. In particular, while I absolutely could and should have written my initial comment to be a lot friendlier, I think my comment was essentially an all-at-once example of Socratic grilling (me being the student and you being the teacher). As Scott points out, there’s a known issue with this:
Second, to a hostile observer, it would sound like the student was challenging the teacher. Every time the teacher tried to explain germ theory, the student “pounced” on a supposed inconsistency. When the teacher tried to explain the inconsistency, the student challenged her explanations. At times he almost seems to be mocking the teacher. Without contextual clues – and without an appreciation for how confused young kids can be sometimes – it could sound like this kid is an arrogant know-it-all who thinks he’s checkmated biologists and proven that germ theory can’t possibly be true. Or that he thinks that he, a mere schoolchild, can come up with a novel way to end all sickness forever that nobody else ever thought of.
Later:
Tolerating this is harder than it sounds. Most people can stay helpful for one or two iterations. But most people are bad at explaining things, so one or two iterations isn’t always enough. I’ve had times when I need five or ten question-answer rounds with a teacher in order to understand what they’re telling me. The process sounds a lot like “The thing you just said is obviously wrong”… “no, that explanation you gave doesn’t make sense, you’re still obviously wrong”… “you keep saying the same thing over and over again, and it keeps being obviously wrong”… “no, that’s irrelevant to the point that’s bothering me”… “no, that’s also irrelevant, you keep saying an obviously wrong thing”… “Oh! That word means something totally different from what I thought it meant, now your statement makes total sense.”
But it’s harder even than that. Sometimes there is a vast inferential distance between you and the place where your teacher’s model makes sense, and you need to go through a process as laborious as converting a religious person to a materialist worldview (or vice versa) before the gap gets closed.
When I first read about HLI’s approach in the Measuring Good Better article my reaction was “Huh, this seems like a poor way to evaluate impact given [all the aspects of subjective wellbeing surveys that intuitively seemed problematic to me].”
If I had been talking with you in person, I probably would have done a back-and-forth Socratic grilling about it. But I didn’t comment. I then got to this post some weeks later, hoped it would answer my concerns, was disappointed that this was not the post’s purpose, and proceeded to write a long post explaining all my concerns with the WELLBY approach so that you or someone else could address them. In short, I dumped a lot of work on you and completely failed to think about how (in Scott’s words) “it would sound like the student was challenging the teacher,” how I could come across as an “arrogant know-it-all who thinks he’s checkmated” you, and how “Tolerating this is harder than it sounds”.
So I’m really sorry about that. Next time I’m tempted to “Socratically grill” someone, I will make a point of actually thinking about how my comment will be received, so that it comes across as friendly.