Vanessa comments on To WELLBY or not to WELLBY? Measuring non-health, non-pecuniary benefits using subjective wellbeing

Vanessa 12 Aug 2022 7:43 UTC
6 points
0 ∶ 0
I am skeptical of using answers to questions such as “how satisfied are you with your life?” as a measure of human preferences. I suspect that the meaning of the answer might differ substantially between people in different cultures and/or be normalized w.r.t. some complicated implicit baseline, such as what a person thinks they should “expect” or “deserve”. I would be more optimistic of measurements based on revealed preferences, i.e. what people actually choose given several options when they are well-informed or what people think of their past choices in hindsight (or at least what they say they would choose in hypothetical situations, but this is less reliable).
[I’m assuming that something like preference utilitarianism is a reasonable model of our goal here, I do realize some people might disagree but didn’t want to dive into those weeds just yet.]
(I only skimmed the article, so my apologies if this was addressed somewhere and I missed it.)
- JoelMcGuire 12 Aug 2022 19:31 UTC
  13 points
  0 ∶ 0
  Parent
  Hello, Vanessa
  To complement Michael’s reply, I think there’s been some decent work related to two of your points, which happens to all be by the same group.
  I would be more optimistic of measurements based on revealed preferences, i.e. what people actually choose given several options when they are well-informed or what people think of their past choices in hindsight (or at least what they say they would choose in hypothetical situations, but this is less reliable).
  In Benjamin et al. (2012; 2014a) they find that what people choose is well predicted by what they think would make them happier or more satisfied with their life—so there may not be too much tension between these measures as is. However, if you’re interested in a measure of wellbeing more in line with people’s revealed preferences, then it seems your best bet may still lie within the realm of SWB. See Benjamin et al., (2014b) whose title hints at the thrust of their argument “Beyond Happiness and Satisfaction: Toward Well-Being Indices Based on Stated Preference” -- but note, that their approach doesn’t mean abandoning subjective wellbeing as the approach is still based on asking people about their life. They discuss their approach to SWB more in Benjamin et al., (2021).
  I suspect that the meaning of the answer might differ substantially between people in different cultures and/or be normalized w.r.t. some complicated implicit baseline, such as what a person thinks they should “expect” or “deserve”.
  The difference in meaning of SWB questions is still, as we note in Section 5, an area of active exploration. For instance, some recent work finds that people will respond to ambiguously worded questions about their life’s wellbeing to include considerations of how their family is doing (Benjamin et al., 2021, which contains a few other interesting findings!).
  I wouldn’t be surprised if we discover that we need to do some fine-tuning to make these questions more precise, but that that to me seems like the normal hard work of iterative refinement, instead of an indictment of the whole enterprise!
  - Vanessa 21 Aug 2022 8:12 UTC
    1 point
    0 ∶ 0
    Parent
    Hi Joel,
    Thank you for the informative reply!
    I think there’s a big difference between asking people to rate their present life satisfaction and asking people what would make them more satisfied with their life. The latter is a comparison: either between several options or between future and present, depending on the phrasing of the questions. In a comparison it makes sense people report their relative preferences. On the other hand, the former is in some ill-posed reference frame. So I would be much more optimistic about a variant of WELLBY based on the former than on the latter.
- MichaelPlant 12 Aug 2022 16:31 UTC
  8 points
  0 ∶ 0
  Parent
  I’m not sure I understand your point. Kahneman famously distinguishes between decision utility—what people do or would choose—and experience utility—how they felt as a result of their choice. SWB measures allow us to get at the second. How would you empirically test which is the better measure of preferences?
  - Vanessa 13 Aug 2022 16:44 UTC
    4 points
    0 ∶ 0
    Parent
    Suppose I’m the intended recipient of a philanthropic intervention by an organization called MaxGood. They are considering two possible interventions: A and B. If MaxGood choose according to “decision utility” then the result is equivalent to letting me choose, assuming that I am well-informed about the consequences. In particular, if it was in my power to decide according to what measure they choose their intervention, I would definitely choose decision-utility. Indeed, making MaxGood choose according to decision-utility is guaranteed to be the best choice according to decision-utility, assuming MaxGood are at least as well informed about things as I am, and by definition I’m making my choices according to decision-utility.
    On the other hand, letting MaxGood choose according to my answer on a poll is… Well, if I knew how the poll is used when answering it, I could use it to achieve the same effect. But in practice, this is not the context in which people answer those polls (even if they know the poll is used for philanthropy, this philanthropy usually doesn’t target them personally, and even if it did individual answers would have tiny influence^[1]). Therefore, the result might be what I actually want or it might be e.g. choosing an intervention which will influence society in a direction that makes putting higher numbers culturally expected or will lower the baseline expectations w.r.t. which I’m implicitly calculating this number^[2].
    Another issue with polls is, how do we know the answer is utility rather than some monotonic function of utility? The difference is important if we need to compute expectations. But this is the least of the problem IMO.
    Now, in reality it is not in the recipient’s power to decide on that measure. Hence MaxGood are free to decide in some other way. But, if your philanthropy is explicitly going against what the recipient would choose for themself^[3], well… From my perspective (as Vanessa this time), this is not even altruism anymore. This is imposing your own preferences on other people^[4].
    ^
    A similar situation arises in voting, and I indeed believe this causes people to vote in ways other than optimizing the governance of the country (specifically, vote according to tribal signalling considerations instead).
    ^
    Although in practice, many interventions have limited predictable influence on this kind of factors, which might mean that poll-based measures are usually fine. It might still be difficult to see the signal through the noise in this measure. And, we need to be vigilant about interventions that don’t fall into this class.
    ^
    It is ofc absolutely fine if e.g. MaxGood are using a poll-based measure because they believe, with rational justification, that in practice this is the best way to maximize the recipient’s decision-utility.
    ^
    I’m ignoring animals in this entire analysis, but this doesn’t matter much since the poll methodology is in applicable to animals anyway.
    - Lorenzo Buonanno🔸 16 Aug 2022 12:50 UTC
      2 points
      0 ∶ 0
      Parent
      But, if your philanthropy is explicitly going against what the recipient would choose for themself, well… From my perspective (as Vanessa this time), this is not even altruism anymore. This is imposing your own preferences on other people
      Would this also apply to e.g. funding any GiveWell top charity besides GiveDirectly, or would that fall into “in practice, this is the best way to maximize the recipient’s decision-utility”?
      I don’t think most recipients would buy vitamin supplementation or bednets themselves, given cash.
      I guess you could say that it’s because they’re not “well informed”, but then how could you predict their “decision utility when well informed” besides assuming it would correlate strongly with maximizing their experience utility?
      A bit off-topic, but I found GiveWell’s staff documents on moral weights fascinating for deciding how much to weigh beneficiaries’ preferences, from a very different angle.
      - Vanessa 17 Aug 2022 6:37 UTC
        5 points
        0 ∶ 0
        Parent
        I don’t know much about supplements/bednets, but AFAIU there are some economy of scale issues which make it easier for e.g. AMF to supply bednets compared with individuals buying bednets for themselves.
        As to how to predict “decision utility when well informed”, one method I can think of is look at people who have been selected for being well-informed while similar to target recipients in other respects.
        But, I don’t at all claim that I know how to do it right, or even that life satisfaction polls are useless. I’m just saying that I would feel better about research grounded in (what I see as) more solid starting assumptions, which might lead to using life satisfaction polls or to something else entirely (or a combination of both).
        What links here?
        Lorenzo Buonanno🔸's comment on Rhodri Davies on why he’s not an EA by Sanjay (19 Aug 2022 20:02 UTC; 2 points)
- helmetedhornbill 17 Aug 2022 14:37 UTC
  3 points
  0 ∶ 0
  Parent
  Hi Vanessa, I really liked how specific and critical your comment was, which I think is ultimately how research can improve, so I’ve upvoted it :)
  I’m not linked to this report but have an interest in subjective measures broadly so thought I would add a different perspective for the sake of discussion in response to the two issues your raise.
  1. I am skeptical of using answers to questions such as “how satisfied are you with your life?” as a measure of human preferences. I suspect that the meaning of the answer might differ substantially between people in different cultures and/or be normalized w.r.t. some complicated implicit baseline, such as what a person thinks they should “expect” or “deserve”.
  I think the fact that SWB measures differs across cultures is actually a good sign that these measures capture what they are supposed to capture. Cultures differ in e.g. values (collectivistic vs individualistic), social and gender norms, economic systems, ethics and moral. Surely some of these facets should influence how people see what a good life is, what happiness is, what wellbeing is. In fact, I would be more concerned if different people with different views and circumstances did not, as you say, ‘differ substantially.’
  I think these differences, attributable to culture or individual variance, are not likely to be of concern for what I would imagine would be the more common ways WELLBYs could be used. Most cost effectiveness analyses rely on RCTs or comparable designs with pre and post measures. You could look at changes within the same group of people easily pre and post and compare their differences. Or even beyond such designs, controlling for different sources of variance that we think are important (like age and gender most commonly) is not that tricky. This doesn’t seem a big methodological concern to me but would be keen to hear more about how things look from your view.
  1. I would be more optimistic of measurements based on revealed preferences, i.e. what people actually choose given several options when they are well-informed or what people think of their past choices in hindsight (or at least what they say they would choose in hypothetical situations, but this is less reliable).
  What I like about the original post here is that there is caution about the uncertainties and challenges with SWB measures, e.g. comparability issues, neutral points. So I think it’s only fair to point out some of the challenges for revealed preferences. In my reading, there’s a long body of researcher suggesting these are stable, yet in practice your ‘revealed’ preference at $5 is likely to be different than at $10. Many scholars have now critiqued the notion of revealed preferences and instead suggested that we should be talking about constructed preferences. Most notably I am thinking of Itamar Simonson’s work, though this as a field can be traced back at least to Slovic in the 1950s (to my knowledge).
  Constructed preferences are seen as constructed in the process of making a choice—different tasks and contexts highlight different aspects of the available options, thus focusing decision-makers on different considerations that lead to seemingly inconsistent decisions (Bettman, Luce, and Payne 1998). And I think there is an argument to be made that your wellbeing can influence your constructed preferences. For instance, negative appraisals and rumination are common for low levels of wellbeing, and there is evidence to suggest that perceived choice difficulty is linked to variances for preferences (Dhar and Simonson 2003; Payne, Bettman, and Johnson 1992). Further, there is evidence broader metacognitive process influence constructed preferences, and those too can shift depending on your (lack of) happiness. So I wouldn’t be surprised that your preferences vary at e.g. low vs high SWB, in fact it sounds to me like it would be important to know SWB and be able to account for it.
  - Vanessa 17 Aug 2022 16:18 UTC
    0 points
    0 ∶ 0
    Parent
    I think the fact that SWB measures differs across cultures is actually a good sign that these measures capture what they are supposed to capture… In fact, I would be more concerned if different people with different views and circumstances did not, as you say, ‘differ substantially.’
    My claim is not “SWB is empirically different between cultures therefore SWB is bad”. My claim is, I suspect that cultural factors cause people to choose different numbers for reasons orthogonal to what they actually want. For example, maybe Alice wants to be a career woman instead of her current role as a housewife (and would make choices to this effect if she had an opportunity), but she reports high life satisfaction because she feels that is expected of her (and it’s not like reporting a low number would help her). Or, maybe people in Fooland consistently report higher life satisfaction than people in Baristan (because they have lower expectations of how life should be), but nobody from Baristan wants to move to Fooland and everyone from Fooland want to move to Baristan if they can (because life is actually better in Baristan).
    I think these differences, attributable to culture or individual variance, are not likely to be of concern for what I would imagine would be the more common ways WELLBYs could be used. Most cost effectiveness analyses rely on RCTs or comparable designs with pre and post measures.
    I agree that directly comparing “pre” to “post” SWB might work okay for many interventions, because the intervention doesn’t affect the confounding factors, as long as you’re comparing different interventions applied to similar populations. I would still rely more on asking people directly how much this intervention helped them / how much their life improved over this period (as opposed to comparing numbers reported at different points of time)^[1]. And, we should still be vigilant about situations in which the confounders cannot be ignored (e.g. interventions that cause cultural shifts). And, there might be a non-linear relationship between SWB and decision-utility which should be somehow divulged if we are averaging these numbers.
    In my reading, there’s a long body of researcher suggesting these are stable, yet in practice your ‘revealed’ preference at $5 is likely to be different than at $10.
    I’m guessing you are not talking about things like, how much free time you would exchange for an additional $1? Because that’s consistent with constant preferences? So, Alice has $5 and Bob has $10, they are asked to choose between X and Y, and they have predictably different preferences despite the fact that post-X-Alice has the same wealth (and other circumstances) and post-X-Bob and the same for Y? And this despite somehow controlling for confounders are correlated both with the causes for Alice’s and Bob’s wealth and with their preferences?
    I imagine such things can happen, in which case I would try to add hindsight judgements and judgements of people who experienced different circumstances into the mix. I expect that as people become more informed and experienced they roughly converge to some stable set of preferences, and the tradeoffs that don’t converge are not really important. If I’m wrong and they are important, then we need to use the revealed preferences of people in those particular circumstances (which, yes, might include SWB, might also include other parameters).
    ^
    Even under optimistic assumptions about SWB, this seems less noisy. Under pessimistic assumptions, I can imagine e.g. people implicitly interpreting the question as comparing their life to their neighbors (which were also affected by the intervention) or comparing their life now to their life in the past (which was still after the intervention), in which case SWB has no signal at all.
    - helmetedhornbill 18 Aug 2022 11:53 UTC
      1 point
      0 ∶ 0
      Parent
      Thanks so much for replying, I learned a lot from your response and its clarity helped me update my thinking.
      
      My claim is, I suspect that cultural factors cause people to choose different numbers for reasons orthogonal to what they actually want.
      Thanks, the specificity here helped me understand your view better. I suppose with the examples you give—I would expect these to be exceptions rather than norms (because if e.g. wanting to have a career was the norm, over enough time, that would tend to become culturally normative and even in the process of it becoming a more normative view the difference with a SWB measure should diminish). And more broadly, interventions that have large samples and aim for generalizability should be reasonably representative and also diminish this as a concern.
      
      I suppose I’m also thinking about the potential difference in specific SWB scales. Something like the SWLS scale or the single item measures would not be very domain specific but scales based around the e.g. Wheel of Life tradition tell you a lot more different facets of your life (e.g. you can see high overall scale but low for job satisfaction), so it seems to me that with the right scales and enough items you can address culture or other variance even further.
      
      I’m guessing you are not talking about things like, how much free time you would exchange for an additional $1? Because that’s consistent with constant preferences? So, Alice has $5 and Bob has $10, they are asked to choose between X and Y, and they have predictably different preferences despite the fact that post-X-Alice has the same wealth (and other circumstances) and post-X-Bob and the same for Y? And this despite somehow controlling for confounders are correlated both with the causes for Alice’s and Bob’s wealth and with their preferences?
      Thanks again for responding with such precision. What I was unable to articulate well is that your individual preferences are not stable (or I suppose: per person, rather than across people), i.e. Alice when she has $5 will exchange a different amount of free time for an extra $1 then when Alice has $10.
      
      I agree with everything else you’ve said and especially with:
      I would still rely more on asking people directly how much this intervention helped them / how much their life improved over this period (as opposed to comparing numbers reported at different points of time)
      I think this is a hugely underappreciated point. I think some of the SWB measures target this issue somewhat but in a limited fashion. I’d love to see more qualitative interviews and participatory / or co-production interventions. I am always surprised by how many interventions say they cannot ascertain a causal mechanism quantitatively and so do not attempt to… well, ask people what worked and didn’t.
      - Vanessa 21 Aug 2022 8:28 UTC
        2 points
        0 ∶ 0
        Parent
        Thanks so much for replying, I learned a lot from your response and its clarity helped me update my thinking.
        You’re very welcome, I’m glad it was useful!
        I would expect these to be exceptions rather than norms (because if e.g. wanting to have a career was the norm, over enough time, that would tend to become culturally normative and even in the process of it becoming a more normative view the difference with a SWB measure should diminish).
        I’m much more pessimistic. The processes that determine what is culturally normative are complicated, there are many examples of norms that discriminate against certain groups or curtail freedoms lasting over time, and if you’re optimizing for the near future then “over enough time” is not a satisfactory solution.
        I suppose I’m also thinking about the potential difference in specific SWB scales. Something like the SWLS scale or the single item measures would not be very domain specific but scales based around the e.g. Wheel of Life tradition tell you a lot more different facets of your life (e.g. you can see high overall scale but low for job satisfaction), so it seems to me that with the right scales and enough items you can address culture or other variance even further.
        I don’t know how those scales work, but (as I wrote in my reply to Joel), I would be much more optimistic about scales that are relative i.e. ask you to compare your well-being in situation A to situation B (whether these situations are familiar or hypothetical) rather than absolute (in which case it’s not clear what’s the reference frame).
        What I was unable to articulate well is that your individual preferences are not stable (or I suppose: per person, rather than across people), i.e. Alice when she has $5 will exchange a different amount of free time for an extra $1 then when Alice has $10.
        This is considered a consistent preference in standard (VNM) decision theory. It is entirely consistent that U(6$ and X free time) > U(5$ and Y free time) but U(11$ and X free time) < U(10$ and Y free time).