Hi Vanessa, I really liked how specific and critical your comment was, which I think is ultimately how research can improve, so I’ve upvoted it :)
I’m not linked to this report but have an interest in subjective measures broadly, so I thought I would add a different perspective for the sake of discussion in response to the two issues you raise.
I am skeptical of using answers to questions such as “how satisfied are you with your life?” as a measure of human preferences. I suspect that the meaning of the answer might differ substantially between people in different cultures and/or be normalized w.r.t. some complicated implicit baseline, such as what a person thinks they should “expect” or “deserve”.
I think the fact that SWB measures differ across cultures is actually a good sign that these measures capture what they are supposed to capture. Cultures differ in e.g. values (collectivistic vs individualistic), social and gender norms, economic systems, ethics and morals. Surely some of these facets should influence how people see what a good life is, what happiness is, what wellbeing is. In fact, I would be more concerned if different people with different views and circumstances did not, as you say, ‘differ substantially.’
I think these differences, attributable to culture or individual variance, are not likely to be of concern for what I would imagine would be the more common ways WELLBYs could be used. Most cost-effectiveness analyses rely on RCTs or comparable designs with pre and post measures. You could easily look at changes within the same group of people pre and post and compare their differences. Or even beyond such designs, controlling for different sources of variance that we think are important (most commonly age and gender) is not that tricky. This doesn’t seem like a big methodological concern to me, but I would be keen to hear more about how things look from your view.
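To make the pre/post point concrete, here is a minimal sketch. It assumes, purely for illustration, that culture enters a reported score as a fixed additive offset on top of some latent wellbeing; the groups, offsets and effect size are all made-up numbers, not data from any study. Under that assumption the offset cancels when you take within-group differences, even though the reported levels differ between groups.

```python
from statistics import mean

# Hypothetical model: reported score = latent wellbeing + culture-specific offset.
def reported(latent, offset):
    return latent + offset

# Made-up participants as (latent wellbeing before intervention, cultural offset).
group_a = [(5.0, 1.5), (6.0, 1.5), (4.0, 1.5)]    # tends to report higher numbers
group_b = [(5.0, -1.0), (6.0, -1.0), (4.0, -1.0)]  # tends to report lower numbers

TRUE_EFFECT = 0.8  # assumed change in latent wellbeing caused by the intervention

def mean_pre(group):
    # Average reported score before the intervention.
    return mean(reported(pre, off) for pre, off in group)

def mean_change(group, effect):
    # Average post-minus-pre difference in reported scores.
    return mean(
        reported(pre + effect, off) - reported(pre, off) for pre, off in group
    )

# Reported levels differ between the groups...
print(mean_pre(group_a), mean_pre(group_b))
# ...but the measured change is (up to rounding) the same in both.
print(round(mean_change(group_a, TRUE_EFFECT), 2),
      round(mean_change(group_b, TRUE_EFFECT), 2))
```

The cancellation depends entirely on the additive-offset assumption. If the intervention itself shifts the offset, or the offset is not additive, the difference no longer isolates the change in latent wellbeing.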
I would be more optimistic of measurements based on revealed preferences, i.e. what people actually choose given several options when they are well-informed or what people think of their past choices in hindsight (or at least what they say they would choose in hypothetical situations, but this is less reliable).
What I like about the original post here is that there is caution about the uncertainties and challenges with SWB measures, e.g. comparability issues, neutral points. So I think it’s only fair to point out some of the challenges for revealed preferences. In my reading, there’s a large body of research suggesting these are stable, yet in practice your ‘revealed’ preference at $5 is likely to be different than at $10. Many scholars have now critiqued the notion of revealed preferences and instead suggested that we should be talking about constructed preferences. Most notably I am thinking of Itamar Simonson’s work, though this as a field can be traced back at least to Slovic in the 1950s (to my knowledge).
Constructed preferences are seen as constructed in the process of making a choice: different tasks and contexts highlight different aspects of the available options, thus focusing decision-makers on different considerations that lead to seemingly inconsistent decisions (Bettman, Luce, and Payne 1998). And I think there is an argument to be made that your wellbeing can influence your constructed preferences. For instance, negative appraisals and rumination are common at low levels of wellbeing, and there is evidence to suggest that perceived choice difficulty is linked to variance in preferences (Dhar and Simonson 2003; Payne, Bettman, and Johnson 1992). Further, there is evidence that broader metacognitive processes influence constructed preferences, and those too can shift depending on your (lack of) happiness. So I wouldn’t be surprised if your preferences vary at e.g. low vs high SWB. In fact, it sounds to me like it would be important to know SWB and be able to account for it.
I think the fact that SWB measures differ across cultures is actually a good sign that these measures capture what they are supposed to capture… In fact, I would be more concerned if different people with different views and circumstances did not, as you say, ‘differ substantially.’
My claim is not “SWB is empirically different between cultures therefore SWB is bad”. My claim is, I suspect that cultural factors cause people to choose different numbers for reasons orthogonal to what they actually want. For example, maybe Alice wants to be a career woman instead of her current role as a housewife (and would make choices to this effect if she had an opportunity), but she reports high life satisfaction because she feels that is expected of her (and it’s not like reporting a low number would help her). Or, maybe people in Fooland consistently report higher life satisfaction than people in Baristan (because they have lower expectations of how life should be), but nobody from Baristan wants to move to Fooland and everyone from Fooland wants to move to Baristan if they can (because life is actually better in Baristan).
I think these differences, attributable to culture or individual variance, are not likely to be of concern for what I would imagine would be the more common ways WELLBYs could be used. Most cost effectiveness analyses rely on RCTs or comparable designs with pre and post measures.
I agree that directly comparing “pre” to “post” SWB might work okay for many interventions, because the intervention doesn’t affect the confounding factors, as long as you’re comparing different interventions applied to similar populations. I would still rely more on asking people directly how much this intervention helped them / how much their life improved over this period (as opposed to comparing numbers reported at different points of time)[1]. And, we should still be vigilant about situations in which the confounders cannot be ignored (e.g. interventions that cause cultural shifts). And, there might be a non-linear relationship between SWB and decision-utility which should be somehow divulged if we are averaging these numbers.
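The averaging concern can be illustrated with a toy example. The sketch below assumes a hypothetical concave mapping from reported SWB to decision-utility (a square root, chosen only because it is simple); the two sets of post-intervention scores are invented. It shows that ranking interventions by mean SWB and by mean utility can disagree when the mapping is non-linear.

```python
import math
from statistics import mean

def decision_utility(swb):
    # Hypothetical concave mapping from a 0-10 SWB report to utility.
    # The true shape, if there is one, is unknown; sqrt is only an illustration.
    return math.sqrt(swb)

# Invented post-intervention SWB reports for the same four people.
after_a = [4.0, 4.0, 4.0, 4.0]  # intervention A: everyone ends up in the middle
after_b = [0.0, 9.0, 0.0, 9.0]  # intervention B: half very high, half at the floor

mean_swb_a, mean_swb_b = mean(after_a), mean(after_b)
mean_util_a = mean(decision_utility(s) for s in after_a)
mean_util_b = mean(decision_utility(s) for s in after_b)

print(mean_swb_a, mean_swb_b)    # B looks better on average SWB
print(mean_util_a, mean_util_b)  # A looks better on average utility
```

With a linear mapping the two rankings would always agree, so the disagreement here comes only from the assumed curvature.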
In my reading, there’s a large body of research suggesting these are stable, yet in practice your ‘revealed’ preference at $5 is likely to be different than at $10.
I’m guessing you are not talking about things like, how much free time you would exchange for an additional $1? Because that’s consistent with constant preferences? So, Alice has $5 and Bob has $10, they are asked to choose between X and Y, and they have predictably different preferences despite the fact that post-X-Alice has the same wealth (and other circumstances) as post-X-Bob, and the same for Y? And this despite somehow controlling for confounders that are correlated both with the causes of Alice’s and Bob’s wealth and with their preferences?
I imagine such things can happen, in which case I would try to add hindsight judgements and judgements of people who experienced different circumstances into the mix. I expect that as people become more informed and experienced they roughly converge to some stable set of preferences, and the tradeoffs that don’t converge are not really important. If I’m wrong and they are important, then we need to use the revealed preferences of people in those particular circumstances (which, yes, might include SWB, might also include other parameters).
[1] Even under optimistic assumptions about SWB, this seems less noisy. Under pessimistic assumptions, I can imagine e.g. people implicitly interpreting the question as comparing their life to their neighbors (who were also affected by the intervention) or comparing their life now to their life in the past (which was still after the intervention), in which case SWB has no signal at all.
Thanks so much for replying, I learned a lot from your response and its clarity helped me update my thinking.
My claim is, I suspect that cultural factors cause people to choose different numbers for reasons orthogonal to what they actually want.
Thanks, the specificity here helped me understand your view better. I suppose with the examples you give—I would expect these to be exceptions rather than norms (because if e.g. wanting to have a career was the norm, over enough time, that would tend to become culturally normative and even in the process of it becoming a more normative view the difference with a SWB measure should diminish). And more broadly, interventions that have large samples and aim for generalizability should be reasonably representative and also diminish this as a concern.
I suppose I’m also thinking about the potential difference in specific SWB scales. Something like the SWLS scale or the single item measures would not be very domain specific, but scales based around e.g. the Wheel of Life tradition tell you a lot more about different facets of your life (e.g. you can see a high overall score but a low one for job satisfaction), so it seems to me that with the right scales and enough items you can address culture or other variance even further.
I’m guessing you are not talking about things like, how much free time you would exchange for an additional $1? Because that’s consistent with constant preferences? So, Alice has $5 and Bob has $10, they are asked to choose between X and Y, and they have predictably different preferences despite the fact that post-X-Alice has the same wealth (and other circumstances) as post-X-Bob, and the same for Y? And this despite somehow controlling for confounders that are correlated both with the causes of Alice’s and Bob’s wealth and with their preferences?
Thanks again for responding with such precision. What I was unable to articulate well is that your individual preferences are not stable (or I suppose: per person, rather than across people), i.e. Alice when she has $5 will exchange a different amount of free time for an extra $1 than when Alice has $10.
I agree with everything else you’ve said and especially with:
I would still rely more on asking people directly how much this intervention helped them / how much their life improved over this period (as opposed to comparing numbers reported at different points of time)
I think this is a hugely underappreciated point. I think some of the SWB measures target this issue somewhat but in a limited fashion. I’d love to see more qualitative interviews and participatory and/or co-production interventions. I am always surprised by how many interventions say they cannot ascertain a causal mechanism quantitatively and so do not attempt to… well, ask people what worked and didn’t.
Thanks so much for replying, I learned a lot from your response and its clarity helped me update my thinking.
You’re very welcome, I’m glad it was useful!
I would expect these to be exceptions rather than norms (because if e.g. wanting to have a career was the norm, over enough time, that would tend to become culturally normative and even in the process of it becoming a more normative view the difference with a SWB measure should diminish).
I’m much more pessimistic. The processes that determine what is culturally normative are complicated, there are many examples of norms that discriminate against certain groups or curtail freedoms lasting over time, and if you’re optimizing for the near future then “over enough time” is not a satisfactory solution.
I suppose I’m also thinking about the potential difference in specific SWB scales. Something like the SWLS scale or the single item measures would not be very domain specific, but scales based around e.g. the Wheel of Life tradition tell you a lot more about different facets of your life (e.g. you can see a high overall score but a low one for job satisfaction), so it seems to me that with the right scales and enough items you can address culture or other variance even further.
I don’t know how those scales work, but (as I wrote in my reply to Joel), I would be much more optimistic about scales that are relative i.e. ask you to compare your well-being in situation A to situation B (whether these situations are familiar or hypothetical) rather than absolute (in which case it’s not clear what’s the reference frame).
What I was unable to articulate well is that your individual preferences are not stable (or I suppose: per person, rather than across people), i.e. Alice when she has $5 will exchange a different amount of free time for an extra $1 than when Alice has $10.
This is considered a consistent preference in standard (VNM) decision theory. It is entirely consistent that U($6 and X free time) > U($5 and Y free time) but U($11 and X free time) < U($10 and Y free time).
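As a quick sanity check, here is a minimal sketch of one fixed utility function that produces exactly this pattern. The functional form (log of money plus a linear term in free time), the weight 0.13 and the values of X and Y are all assumptions picked for illustration, not anything from the report.

```python
import math

def utility(money, free_time, time_weight=0.13):
    # Hypothetical utility: diminishing returns to money, linear in free time.
    return math.log(money) + time_weight * free_time

X, Y = 7.0, 8.0  # hours of free time; taking the extra $1 costs one hour (Y -> X)

# Starting from $5: is $6 with X hours better than $5 with Y hours?
accepts_at_5 = utility(6, X) > utility(5, Y)
# Starting from $10: is $11 with X hours better than $10 with Y hours?
accepts_at_10 = utility(11, X) > utility(10, Y)

print(accepts_at_5, accepts_at_10)  # True False
```

The same function, with nothing about Alice changing except her wealth, accepts the trade at $5 and rejects it at $10, because an extra dollar adds less utility the more money she already has.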