Might be worth noting that utilities in this sense are preferences, which may or may not matter intrinsically. On preference/desire theories of well-being, your life goes better the more you get of what you want. But on, say, hedonist theories of well-being, your life goes better the more happiness you have (where happiness is often understood as a positive balance of pleasure over displeasure). Historically, ‘utility’ in economics referred to happiness rather than preferences. This switched in the early 20th century with work by Pareto, Robbins, and others.
I thought these ideas were interesting, but it would be useful to have a less technical and/or more intuitive explanation.
I think the issue with your comment was that someone said “I want to do some good, can anyone help me?” and your response reads as “oh, well, you and your type don’t seem as smart or important as another group of people”, which seemed needlessly rude to me. I say it was needless because, pace your follow-up comment, there was no strategic decision to make; it wasn’t as if the decision was whether to help fundraise from athletes or from poker players, but just a request for assistance relevant to the former group.
Thanks very much for writing this up, Sam. Two points from my perspective at the Happier Lives Institute, which you kindly mention and which is a new entrant to cause prioritisation work.
First, you say this on theories of change:
But for a new organisation to solely focus on doing the research that they believed would be most useful for improving the world it is unclear what the theory of change would be. Some options are:
Do research → build audience on quality of research → then influence audience
Do research + persuade other organisations to use your research → influence their audiences and money
I think this nails the difficulty for new cause prioritisation research (where ‘new’ means ‘not being done by an existing EA organisation’). The existing organisations are the ‘gatekeepers’ for resources, but doing novel cause prioritisation work requires, of necessity, doing work those organisations themselves consider low-priority (otherwise they would do it themselves). This creates a tension: funders often want potential entrants to show they have ‘buy-in’ from existing orgs. But the more novel the project, the less ‘buy-in’ it will have, and so the less chance it has of getting off the ground. I confess I don’t have a solution for this, other than that, if funders want to see new research, they need to be prepared to back it themselves.
Second, you say you’d like to see research on
unexplored areas that could be highly impactful such as access to painkillers or mental health
I’m pleased to say HLI is working on both those areas—see our April update.
It’s not clear to me what relationship one should expect between the cardinality (or not) of subjective scales and the relationship between ratings of overall SWB and ratings of sub-domains (and thus what one could infer about one from results about the other).
As a separate point, I’m not sure how to make sense of the putative inconsistency Kaj notes. I haven’t looked into the relationship between overall ratings and ratings of sub-domains; it’s not something I’ve heard SWB researchers discuss much either. The most obvious explanations, in addition to those mentioned below, are to appeal to missing domains and/or different temporal foci (i.e. you think about sub-domains as they are now, but for your life as a whole you also think about the future).
TL;DR Evidence suggests there aren’t shifts in SWB scales over time. This topic isn’t well understood. I’ve got a paper on this area in the works.
The question you’re asking here, whether individuals rescale (that is, alter what the end-points of their scales refer to), is one component of a broader concern. The broader question is whether subjective scales, those on which individuals give numerical ratings of subjective phenomena, are cardinally comparable, that is, whether a one-point change on a given scale represents the same size of change in subjective experience for different people and at different times. For instance, if I say my happiness has gone from a 4 to a 5 out of 10, and you say your happiness has gone from a 3 to a 4, can we conclude we each had the same increase in happiness?
Given how fundamental the concern is (it applies to all subjective data, not just SWB data), I’ve been surprised to find the topic hasn’t been looked into a great deal. Two leading SWB researchers, Stone and Krueger, said this in a 2018 review article:
one of the most important issues inadequately addressed by current [SWB] research is that of systematic differences in question interpretation and response styles between population groups. Is there conclusive evidence that this is a problem? And, if so, are there ways to adjust for it? Information is needed about which types of group comparisons are affected, about the magnitude of the problem, and about the psychological mechanisms underlying these systematic differences
I’ve been looking at the cardinality of subjective scales and have a working paper that I’m not quite ready to put online; it should only be another couple of months. The paper is an evolution of work in my DPhil thesis (p. 135), where I broke cardinal comparability into a number of components, reviewed the evidence for each, and concluded that SWB data are probably best interpreted as cardinally comparable.
The topic is pretty complicated and addressing all of it would take too long here. I’ll just provide a ‘quick and dirty’ answer to the specific concern you raise about rescaling (aka ‘intertemporal cardinality’). Prati and Senik (2020) compare remembered SWB (how satisfied individuals recall having been) with observed past SWB (how satisfied they said they were at the time). They use German panel data in which individuals were shown 9 different pictures of changes in life satisfaction over time (e.g. staying flat, going up, going up then going down, etc.) and asked to pick the one that best represented their own life.
There turns out to be a (I think) pretty amazing match between the patterns of observed past and remembered SWB. This is only possible if either (A) individuals both use the same scale over time and have good memories, or (B) individuals change the scale they use and have bad memories. If individuals used the same scales but had bad memories, or used different scales but had good memories, there would be an inconsistency between the recalled and the observed past patterns. Of the two options, (A) seems far more probable than (B). It’s hard to believe individuals really can’t remember how their lives have gone. Further, we might expect individuals to try not to rescale, so that their answers are comparable over time.*
Hence, there doesn’t seem to be rescaling at the population level. Further research into whether there are some individuals who rescale, and what causes this to happen, would be good; I’m not aware of any.
*In fact, (B) requires quite specific and implausible patterns of memory failure. To illustrate, suppose your experienced satisfaction has been flat but, because your scale has been shrinking, your reported 0-10 level of satisfaction has been rising over time. To make your observed past satisfaction and your recalled satisfaction consistent, given this scale shrinkage, you would need to falsely recall that your satisfaction has increased. If you instead accurately recalled that your satisfaction had been flat (or erroneously recalled that it had decreased), there would be an inconsistency between observation and recall.
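The footnote’s logic can be sketched numerically. This is a toy illustration of my own, not from the Prati and Senik paper; the flat satisfaction level and the linear ‘scale shrinkage’ are assumed numbers chosen purely for demonstration:

```python
# Toy model: experienced satisfaction is flat, but the respondent's scale
# shrinks over time, so each 0-10 point comes to cover fewer "experience
# units" and the reported score drifts upward.
years = range(10)
true_satisfaction = [6.0 for _ in years]           # flat experience
units_per_point = [1.0 - 0.04 * t for t in years]  # assumed shrinking scale
reported = [s / u for s, u in zip(true_satisfaction, units_per_point)]

# Observed reports rise even though experience was flat.
assert reported[-1] > reported[0]

# So recall that matches the observed pattern ("my satisfaction went up")
# must misremember a flat experience as an improving one; accurate recall
# ("it was flat") would conflict with the rising observed scores.
```

The point of the sketch is just that (B) forces recall errors to line up exactly with the direction and size of the scale shift, which is the specific, implausible pattern the footnote describes.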
Glad you were impressed! Would welcome any suggestions on how to improve the analysis.
Thanks for clarifying. Yes, I understand that economists lean towards a desire satisfaction theory of well-being and development economists lean towards Sen-style objective list theories. We’re in discussion with a development economist about whether and how to transform this into an article for a development econ journal, and there we expect to have to say a lot more about justifying the approach. That didn’t seem so necessary here: EAs tend to be quite sympathetic to hedonism and/or measuring well-being using SWB, and we’ve argued for that elsewhere, so we thought it more useful just to present the method.
Hello Jack, thanks for the comment. As you note, the document doesn’t attempt to address the issues you raised. We’re particularly interested to have people engage with the details of how we’ve done the analysis, although we recognise this will be far too much ‘in the weeds’ for most (even members of this august forum).
I’d like to reply to your comment though, seeing as you’ve made it. There are quite a few separate points you could be making and I’m not sure which you mean to press.
You wonder about the suitability of SWB scores in low-income settings and raise Sen’s adaptive preferences point.
One way to understand the adaptive preferences point is as an argument against hedonism: poor people are happy, but their lives aren’t going well, so happiness can’t be what matters. From this it would follow that SWB scores might not be a good measure of well-being anywhere, not just in low-income contexts. Two replies. First, I’m pretty sympathetic to hedonism: if people are happy, then I think their lives are going well. Considering adaptive preferences doesn’t pull me to revise that. Second, as an empirical aside, it’s not at all obvious that people do adapt to poverty: the IDInsight survey found the Kenyan villagers had life satisfaction of around 2/10. That’s much lower than the Kenyan average of around 4.5. A quick gander at the worldwide distribution of life satisfaction scores (https://ourworldindata.org/happiness-and-life-satisfaction) tells you that poorer people are less satisfied than richer ones. The story might be interestingly different for measures of happiness (sometimes called ‘affect balance’).
Another way to understand the force of adaptive preferences is as being about what we owe one another. Here the idea is that we should help poor people even if doing so doesn’t improve their well-being (whatever well-being is), the further thought being that it won’t improve their well-being because they’ve adapted. I don’t find this plausible. If I can provide resources for A or B, but helping A will have no impact on their well-being, whereas helping B will increase theirs, I say we help B. (To pull out the intuition that adaptive preferences is really about normative commitments, note we might think it makes sense for people in unfavourable circumstances to change their views to increase their well-being, but that there’s something odious about not helping people because they’ve managed to adapt; it’s as if we’re punishing them for their ingenuity.)
A different concern one might have is that those in low-income contexts use scales very differently from those elsewhere: someone who says they are 4/10 but lives in poverty actually has a very different set of psychological states from someone who says they are 4/10 in the UK. In that case, it would be a mistake to take these numbers at face value. The response to this problem is to have a theory of how and why people interpret subjective scales differently, so you can account for and adjust the scores, i.e. determine what the true SWB values are on a common scale. This is one of the most important issues not adequately addressed by current research. I’ve got a (long) paper on this I’ve nearly finished. The very short answer is that I think the answers are (cardinally) comparable, and this is because individuals try to answer subjective scales in the same way as everyone else in order to make themselves understood. On this basis, I think it’s reasonable to interpret SWB scores at face value.
I think population ethics and infinite ethics should be separated. They are different topics, although they are relevant to each other.
I enjoyed reading the paper but was unconvinced any serious problem was being raised (rather than merely a perception of a problem resulting from a misunderstanding).
Put very simply, the structure of the original case is that a person chooses option B instead of option A because new information makes option B look better in expectation. It then turns out that option A, despite having lower expected value, produced the outcome with higher value. But there’s nothing mysterious about this: it happens all the time and poses no challenge to expected value theory or act utilitarianism. The fact that I would have won if I’d put all my money on number 16 at the roulette table does not mean I was mistaken not to do so.
At HLI, we’ve found creating a Theory of Change (ToC) very useful. It was (at least for me) quite a painful process of making explicit various assumptions and uncertainties and then talking through them. I think if we hadn’t done it explicitly we would (a) have made a less thoughtful plan and (b) have had different members of the team carrying around their own plans in their heads.
Going through a ToC process has also helped us to focus on meeting the needs of our target audiences. After developing our ToC, we sent out surveys to some of our key stakeholders to identify their concerns about subjective well-being measures and what new information would make them more likely to use them. Their responses provided the basis for our research agenda and the questions we have chosen to investigate this year.
We have a slightly more detailed version of our ToC diagram on our blog. Thanks for pointing out that it’s hard to find; we’ll think about putting it on a main page.
Hmm. Okay, that’s fair; on re-reading I note the OP did discuss this at the start, but I’m still unconvinced. I think the context may make a difference. If you are speaking to a member of the public, I think my concern stands, because of how they will misinterpret the thoughtfulness of your prediction. If you are speaking to other predict-y types, the concern disappears, as they will interpret your statements the way you mean them. And if you’re putting a set of predictions together into a calculation, not only is it useful to carry that precision through, but it’s not as if your calculation will misinterpret you, so to speak.
I had a worry on similar lines that I was surprised not to see discussed.
I think the obvious objection to using additional precision is that it will falsely convey certainty and expertise to most folks (i.e. those outside the EA/rationalist bubble). If I say to a man in the pub either (A) “there’s a 12.4% chance of famine in Sudan” or (B) “there’s a 10% chance of famine in Sudan”, I expect him to interpret me as an expert in case (A) (how else could I get so precise?), even if I know nothing about Sudan and all I’ve read about discussing probabilities is this forum post. I might expect him to take my estimate more seriously than that of someone who knows about Sudan but not about conveying uncertainty.
(In philosophy-of-language jargon, the use of a non-rounded percentage carries a conversational implicature that you have enough information, by the standards of ordinary discourse, to be that precise.)
I agree with this comment—thanks! A follow up: can you say why political theorists accept high stakes instrumentalism (as opposed to stating that they do)? It sounds like this is effectively a re-run of familiar debates between consequentialists and non-consequentialists (e.g. “can you kill one to save five? what about killing one to save a million?”), just wrapped in different language, so I’m wondering if something else is going on. I suppose I’m a bit surprised the view has no detractors—I imagine there are some (Kant?) who would hold the seemingly equivalent view you can never kill one to save any number of others.
Thanks for this write-up. The list is quite substantial, which makes me wonder: do you have a list of problems you’ve considered and concluded are probably quite unpromising, and would therefore dissuade people from undertaking? I could imagine someone reading this and thinking “X and Y are on the list, so Z, which wasn’t mentioned explicitly [but which 80k would advise against], is also likely a good area”.
Just a quick note: it would be helpful if, at the start, you explained who you think this post is for and/or its practical upshot. I skimmed through the first 30% and wasn’t sure whether this was a purely academic discussion or you were suggesting a way for donors to coordinate.
A couple of quick replies.
First, all your comments on the weirdness of Western mental healthcare are probably better described as ‘the weirdness of the US healthcare system’ rather than anything to do with mental health specifically. Note they are mostly to do with insurance issues.
Second, I think one can always raise the question of whether it’s better to (A) improve the best version of service/good X or (B) improve the distribution of existing versions of X. This also isn’t specific to mental health: one might retort to donors to AMF that they should be funding improvements in (say) health treatment in general or malaria treatment in particular. There’s a saying I like: “the future is here, it just isn’t very evenly distributed”; compare SpaceX launching rockets that can land themselves with people not having clean drinking water. There seems to be very little we can say from the armchair about whether (A) or (B) is the more cost-effective option for a given X. I suspect that if there were a really strong ‘pull’ for goods/services to be widely provided, we would already have ‘solved’ world poverty, which makes me think distribution is only weakly related to innovation.
Aside: I wonder if there is some concept of ‘trickle-down’ innovation at play, and whether it is relevantly analogous to ‘trickle-down’ economics.
I’m not sure what you mean by going from 0 to 1 vs 1 to n; can you elaborate? I take it you mean the challenge of going from no treatment to current best-practice treatment (in developing countries) vs improving best practice (in developed countries).
I don’t have a cached answer on that question, but it’s an interesting one. You’d need to make quite a few more assumptions to work through it, e.g. how much better MH treatment could be than the current best practice, how easy it would be to get it there, how fast this would spread, etc. If you’d thought through some of this, I’d be interested to hear it.
Right. My thought is that we assume humans have the same capacity on average because, while there might be differences, we don’t know which way they’ll go, so they should ‘wash out’ as statistical noise. Pertinently, this same response doesn’t work for animals, because we really don’t know what their respective maximum capacities are.
FWIW, the analogue of my response here would be to say we can expect all chickens to have approximately the same capacity as each other, even if individual chickens differ. The claim isn’t about humans per se, but about similarities borne out of genetics.
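The ‘washing out’ claim can be checked with a quick simulation. This is a sketch under my own assumptions (capacities drawn from a common distribution, mean 1.0, with no systematic direction to the individual differences), not an empirical claim about humans or chickens:

```python
import random

random.seed(0)

def group_mean_capacity(n):
    """Mean capacity of a group whose members vary randomly around 1.0."""
    return sum(1.0 + random.gauss(0.0, 0.2) for _ in range(n)) / n

# Individual differences are sizeable (sd 0.2), but because they have no
# systematic direction they cancel in large groups: the means of two
# independently sampled groups end up nearly identical.
group_a = group_mean_capacity(100_000)
group_b = group_mean_capacity(100_000)
assert abs(group_a - group_b) < 0.01
```

The assumption doing all the work is that the differences are direction-less noise around a shared mean, which is precisely what we cannot assert for cross-species comparisons.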