I’m the Director of the Happier Lives Institute and Postdoctoral Research Fellow at Oxford’s Wellbeing Research Centre. I’m a philosopher by background and did my DPhil at Oxford, primarily under the supervision of Peter Singer and Hilary Greaves. I’ve previously worked for an MP and failed to start a start-up.
Hello Bob and team. Looking forward to reading this. To check, are you planning to say anything explicitly about your approach to moral uncertainty? I can’t see anything directly mentioned in section 5, which is where I guessed it would go.
On that note, Bob, you might recall that, a while back, I mentioned some work I’m doing with a couple of other philosophers on developing an approach to moral uncertainty along these lines, one that will sometimes justify the practice of worldview diversification. That draft is nearly complete, and your post series inspires me to try and get it over the line!
This report seems commendably thorough and thoughtful. Could you possibly spell out its implications for effective altruists, though? I take it the conclusion is that humanity was most violent in the subsistence-farming period, rather than before or after, but I’m not sure what to make of that. Presumably, it shows that how violent people are changes quite radically across contexts, so should I be reassured if, as seems likely, modern-type societies continue? A return to hunter-gathering or subsistence farming does not seem on the cards.
Sorry if I’ve missed something. But I reckoned that, if it wasn’t obvious to me, some others would have missed it too.
Hello Jack, I’m honoured you’ve written a review of my review! Thanks also for giving me sight of this before you posted. I don’t think I can give a quick satisfactory reply to this, and I don’t plan to get into a long back and forth. So, I’ll make a few points to provide some more context on what I wrote. [I wrote the remarks below based on the original draft I was sent. I haven’t carefully reread the post above to check for differences, so there may be a mismatch if the post has been updated]
First, the piece you’re referring to is a book review in an academic philosophy journal. I’m writing primarily for other philosophers who I can expect to have lots of background knowledge (which means I don’t need to provide it myself).
Second, book reviews are, by design, very short; you’re even discouraged from referencing things outside the text you’re reviewing. The word limit was 1,500 words (I think my review may even be shorter than your review of my review!), so the aim is just to give a brief overview and make a few comments.
Third, the thrust of my article is that MacAskill makes a disquietingly polemical, one-sided case for longtermism. My objective was to point this out and deliberately give the other side so that, once readers have read both, they are, hopefully, left with a balanced view. I didn’t seek to, and couldn’t possibly hope to, give a balanced argument that refutes longtermism in a few pages. I merely explain why, in my opinion, the case for it in the book is unconvincing. Hence, I’d have lots of sympathy with your comments if I’d written a full-length article, or a whole book, challenging longtermism.
Fourth, I’m not sure why you think I’ve misrepresented MacAskill (do you mean ‘misunderstood’?). In the part you quote, I am (I think?) making my own assessment, not stating MacAskill’s view at all. What’s more, I don’t believe MacAskill and I disagree about the importance of the intuition of neutrality for longtermism. I only observe that accepting that intuition would weaken the case—I do not claim there is no case for longtermism if you accept it. Specifically, you quote MacAskill saying:
[if you endorse the intuition of neutrality] you wouldn’t regard the absence of future generations in itself as a moral loss.
But the cause du jour of longtermism is preventing existential risks so that many future happy generations exist. If one accepts the intuition of neutrality, that reduces or removes the good of doing that. Hence, it does present a severe challenge to longtermism in practice—especially if you want to claim, as MacAskill does, that longtermism changes the priorities.
Finally, on whether ‘many’ philosophers are sympathetic to person-affecting views: in my experience of floating around seminar rooms, it seems to be a view held by a large minority of discussants (indeed, it seems far more popular than totalism). Further, it’s taken as a default, or starting position, which is why other philosophers have strenuously argued against it; there is little need to argue against views that no one holds! I don’t think we should assess philosophical truth ‘by the numbers’, ie by polling people, rather than by arguments, particularly when those you poll aren’t familiar with the arguments. (If we took such an approach, utilitarianism would be conclusively ‘proved’ false.) That said, off the top of my head, philosophers who have written sympathetically about person-affecting views include Bader, Narveson (two classic articles here and here), Roberts (especially here, but she’s written on it a few times), Frick (here and in his thesis), Heyd, Boonin, and Temkin (here and probably elsewhere). There are not ‘many’ philosophers in the world, and population ethics is a small field, so this is a non-trivial number of authors! For an overview of the non-identity problem in particular, see the SEP.
Yup, I’d be inclined to agree it’s easier to ground the idea that life is getting better for humans on objective measures. The author’s comparison is made in terms of happiness, though:
This work draws heavily on the Moral Weight Project from Rethink Priorities and relies on the same assumptions: utilitarianism, hedonism, valence symmetry, unitarianism, use of proxies for hedonic potential, and more
I’m actually not sure how I’d think about the animal side of things on the capabilities approach. Presumably, factory farming looks pretty bad on that view: there are increasingly many animals with low or negative-capability lives, so it’s unclear how this works out at a global level.
This is a minor comment, but you say:
There’s compelling evidence that life has gotten better for humans recently
I don’t think that is compelling evidence. Neither Pinker nor Karnofsky looks at averages of self-reported happiness or life satisfaction, which would be the most relevant and comparable evidence, given your assumptions. According to the so-called Easterlin Paradox, average subjective wellbeing has not been going up over the past few decades and won’t go up with further economic growth. There have been years of debate over this (I confess I got sucked in, once) but, either way, there is no consensus among happiness researchers that there is compelling evidence life has gotten better (at least as far as happiness is concerned).
While I agree that net global welfare may be negative and declining, in light of the reasoning and evidence presented here, I think you could and should have claimed something like this: “net global welfare may be negative and declining, but it may also be positive and increasing, and really we have no idea which it is—any assessment of this type is enormously speculative and uncertain”.
As I read the post, the two expressions that popped into my head were “if it’s worth doing, it’s worth doing with made-up numbers” and “if you saw how the sausage is made …”.
The problem here is that all of the numbers for ‘animal welfare capacity’ and ‘welfare percentages’ are essentially—and unfortunately—made up. You cite Rethink Priorities for the former and Charity Entrepreneurship for the latter, and express some scepticism, but then more or less take them at face value. You don’t explain how those people came up with the numbers or whether they should be trusted. I don’t think I am disparaging the good folk at either organisation—and I am certainly not trying to!—because, if you asked them about this, I think they would freely say, “look, we don’t really know how to do this. We have intuitions about this, of course, but we’re not sure if there’s any good evidence-based way to come up with these numbers”;* indeed, that is, in effect, the conclusion Rethink Priorities stated in the write-up of their recent workshop (see my comment on that too). Hence, such numbers should be taken not with a mere pinch of salt, but with a bucketload.
You don’t account for uncertainty here (you use point estimates), and I appreciate that doing so is extra hassle, but I think the uncertainty here is the story. If you were to use upper and lower subjective bounds for, eg, “how unhappy are chickens compared to how happy humans are?”, they would be very large. They must be very large because, as noted, we don’t even know what factual, objective evidence we would use to narrow them down, so we have nothing to constrain the bounds of what’s plausible. But given how large they would be, we’d end up with the conclusion that we really don’t know whether global welfare is negative or positive. (I give a toy numerical sketch of this after the footnote below.)
* People are often tempted to say that we could look at objective measures, like neuron counts, for interspecies comparison. But this merely kicks the can down the road. How do we know what the relationship is between neuron counts and levels of pleasure and pain? We don’t. We have intuitions, yes, but what evidence could we point to in order to settle the question? I do not know.
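To make the bounds point concrete, here is a toy Monte Carlo sketch in Python. Every number in it is made up purely for illustration: the capacity and welfare ranges are hypothetical, and the population figures are only rough. The point is just that, with bounds this wide, the sign of the total flips between draws.

```python
import random

random.seed(0)
N = 100_000
positive = 0

for _ in range(N):
    # Hypothetical bounds on chickens' welfare capacity relative to
    # humans (humans normalised to 1). The range is made up.
    chicken_capacity = random.uniform(0.002, 0.8)
    # Hypothetical bounds on how bad the average farmed chicken's life
    # is, as a fraction of its capacity (-1 = maximally bad).
    chicken_welfare = random.uniform(-0.9, -0.1)
    # Hypothetical bounds on average human welfare (0 to 1 scale).
    human_welfare = random.uniform(0.1, 0.6)

    # Rough population sizes, in billions: ~8bn humans, ~25bn chickens
    # alive at any one time.
    total = 8 * human_welfare + 25 * chicken_capacity * chicken_welfare
    if total > 0:
        positive += 1

print(f"Net welfare comes out positive in {positive / N:.0%} of draws")
```

With ranges like these, a substantial fraction of draws comes out positive and a substantial fraction negative, which is just the “we really don’t know the sign” conclusion in numerical form.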
Thanks for this, and great diagrams! To think about the relationship between EA and AI safety, it might help to ask what EA is for in general. I see a/the purpose of EA as helping people figure out how they can do the most good—to learn about the different paths, the options, and the landscape. In that sense, EA is a bit like a university, or a market, or maybe even just a signpost: once you’ve learnt what you needed, or found what you want and where to go, you don’t necessarily stick around: maybe you need to ‘go out’ into the world to do what calls you.
This explains your Venn diagram: GHD and animal welfare are causes that exist prior to, and independently of, EA. They, rather than EA, are where the action is if you prioritise those things. AI safety, by contrast, grew up inside EA.
I imagine AI safety will naturally form its own ecosystem independent of EA: much as, if you care about global development, you don’t need to participate in the EA community, a time will come when, for AI safety, you won’t need to participate in EA either.
This doesn’t mean that EA becomes irrelevant, much like a university doesn’t stop mattering when students graduate—or a market ceases to be useful when some people find what they want. There will be further cohorts who want to learn—and some people have to stick around to think about and highlight their options.
I suppose you could think of it as a matter of degree, right? Submitting feedback, doing interviews, etc. are a good start, but involve people having less of a say than either 1. being part of the conversation or 2. having decision-making power, eg through a vote. People like to feel their concerns are heard—not just in EA, but in general—and when, eg, a company says “please send in this feedback form”, I’m not sure many people feel as heard as when someone (important) from that company listens to them live and publicly responds.
Thanks for this, which I read with interest! Can I see if I understood this correctly?
1. You were interested in finding a way to assess the severity of pains in farmed animals so that you can combine severity with duration and determine the total badness. In jargon, you’re after a cardinal measure of pain intensity.
2. Your conclusion was a negative one, specifically that there is no clear way to assess the severity of pain. As you note, for humans we have self-reports, but for non-human animals we don’t, so we have to look for something else, such as how the animals behave. However, there is no obvious non-self-report method that would give us a quantitative measure of pain.
3. (From 1 and 2) We are forced to rely on our priors, that is, our intuitions, to make comparisons.
For what it’s worth, I agree with 1-3, but it does leave me with a feeling of hopelessness about animal welfare comparisons. Certainly, we have intuitions about how to do them, but we do not, as far as I can see, have reason to think our intuitions are informed or reliable—what evidence would tell us we were wrong? So, I wonder if it would be true to say that making evidence-based (cardinal) animal welfare comparisons is not merely difficult (which implies it is possible) but actually not possible. I’m not sure what follows from this.
Hey LondonGal, thank you for following up on this. I appreciate you clarifying your intentions about your post. Our team has read your comments and will take your feedback into consideration in our future work. I hope you’ll forgive us for not responding in detail at this time. We are currently trying to focus on our current projects (and to avoid spending too much time on the EA Forum, which we’ve done a lot of, particularly recently!). I expect that some (but probably not all) of the points you’ve raised in your original post will be addressed in some of our upcoming research. Thanks again for engaging with our work, and for sending the olive branch. It’s been received, and we look forward to future constructive interactions.
Hello LondonGal (sorry, I don’t know your real name). I’m glad that, after your recent scepticism, you looked further into subjective wellbeing data and think it can be useful. You’ve written a lot and I won’t respond to it in detail.
I think the most important points to make are (1) there is a lot more research than you suggest and (2) it didn’t just start around COVID.
You are right that, if you search for “subjective wellbeing”, not much comes up (I get 706 results on PubMed). However, that’s because the trend among researchers to write “subjective wellbeing” rather than “subjective well-being”, ie with a hyphen, is very recent (and, AFAIK, unrelated to COVID). Searching for “subjective well-being” yields, by comparison, 4,806 results.
If I expand the search to other keywords, namely “happiness” OR “life satisfaction” OR “subjective wellbeing” OR “subjective well-being”, I get over 150,000 results on PubMed. This is displayed below. Note the results go back to 1838, but the research only really kicks off after 1980.
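In case anyone wants to check or update these counts, here’s a minimal sketch of running the same keyword search against NCBI’s public E-utilities interface to PubMed (the exact count will drift as new articles are indexed):

```python
import requests

# The four keywords used above, combined with OR.
query = ('"happiness" OR "life satisfaction" OR '
         '"subjective wellbeing" OR "subjective well-being"')

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": query, "retmode": "json", "retmax": 0},
)
# 'count' is the total number of matching PubMed records.
print(resp.json()["esearchresult"]["count"])
```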
I’m not an expert in academic databases, so I don’t know how comprehensive PubMed’s coverage of the research is, but I’m guessing it’s a subset. FWIW, Ed Diener et al., in a 2018 article on subjective wellbeing, state that there were “170,000 articles and books published on the topic in the past 15 years”, although I haven’t looked into their numbers.
You might be interested in this article in the recent World Happiness Report, which looks at various trends related to happiness, including academic interest, and finds the fraction of articles on the topic has been trending up since the 80s: note the steady linear increase on the logarithmic y-axis.
Hence, as you suspected, many of the topics you raise here are quite well-trodden in the literature.
If you’re interested in looking further at the pros and cons of the WELLBY, the easiest thing for me to point you to is HLI’s To WELLBY Or Not To WELLBY report and references therein. You may also find this reading list useful.
In terms of the state of the literature, if you’ll forgive further laziness and self-promotion, I’d suggest my EAG London talk. The short answer is that ‘we’ (ie happiness researchers) know quite a bit about the nature and measurement of wellbeing and its causes and correlates, but relatively little about what the best ways are to increase it; work on WELLBY cost-effectiveness is barely older than COVID.
Hello Linch. We’re reluctant to recommend organisations that we haven’t been able to vet ourselves, but we are planning to vet some new mental health and non-mental-health organisations in time for Giving Season 2023. The details are in our Research Agenda. For mental health, we say:
We expect to examine Friendship Bench, Sangath, and CorStone unless we find something more promising.
On how we chose StrongMinds, you’ve already found our selection process. Looking back at the document, I see that we don’t get into the details, but it wasn’t just procedural. We hadn’t done a deep-dive analysis at that point—the point of the search process was to work out what we should look at in more depth—but our prior was that StrongMinds would come out at or close to the top anyway. To explain, it was delivering the intervention we thought would do most good per person (therapy for depression), doing this cheaply (via lay-delivered interpersonal group therapy), and it seemed to be a well-run organisation. I thought Friendship Bench might beat it (Friendship Bench had a volunteer model, and so plausibly much lower costs, but also lower efficacy), but they didn’t offer us their data at the time, something they’ve since done. I don’t think I knew about Sangath or CorStone back then. I think I would advise donors to wait until the end of this year. However, my money would be on Friendship Bench being the best MH org that isn’t StrongMinds, and I wouldn’t rule out it being more cost-effective.
This was really helpful, thanks! I’ll discuss it with the team.
Oh yes. I agree with you that it would be good if people could make helpful suggestions as to what we could do, rather than just criticise.
[I don’t plan to make any (major) comments on this thread after today. It’s been time-and-energy-intensive and I plan to move back to other priorities]
Hello Jason,
I really appreciated this comment: the analysis was thoughtful and the suggestions constructive. Indeed, it was a lightbulb moment. I agree that some people do have us on epistemic probation, in the sense that they think it’s inappropriate to grant us the principle of charity, and that they should instead look for mistakes (and conclude incompetence or motivated reasoning if they find them).
I would disagree that HLI should be on epistemic probation, but I am, of course, at risk of bias here, and I’m not sure I can defend our work without coming off as counter-productively defensive! That said, I want to make some comments that may help others understand what’s going on so they can form their own view, then set out our mistakes and what we plan to do next.
Context
I suspect that some people have had HLI on epistemic probation since we started—for perhaps understandable reasons. These are:
1. We are advancing a new methodology, the happiness/SWB/WELLBY approach. Although there are decades of work in social science on this, and it’s now used by the UK government, it was new to most EAs, who could ask, “if it’s so good, why aren’t we already doing it?” Of course, new ideas have to start sometime.
2. HLI is a second-generation EA org that is setting out to publicly re-assess some conclusions of an existing (understandably!) well-beloved first-generation org, GiveWell. I can’t think of another case like this; usually, EA orgs do non-overlapping work. Some people have welcomed us offering a different perspective; others have really not liked it; we’ve clearly ruffled some feathers.
3. As a result of 1 and 2, there is something of a status quo effect and a scepticism that wouldn’t exist if we were offering recommendations in a new area for the first time. To illustrate, suppose you know nothing about global health and wellbeing and someone tells you they’ve done lots of research based on happiness measures and found that cash transfers are good, treating depression is about 7x as good as cash, deworming has no clear long-run effect, and life-saving bednets are 1-8x cash depending on difficult moral assumptions. I expect most people would say “yeah, that seems reasonable” rather than “why are you engaged in motivated reasoning?”.
Our mistakes (so far)
The discussion in this thread has been a bit vague about what mistakes HLI has made that have led to suspicion. I want to set out what, from my perspective, those are. I reserve the right to add things to this list! We’ll probably put a version of this on our website.
1. Not modelling spillovers in our cash vs psychotherapy meta-analyses.
This was the first substantive empirical criticism we received. We had noted in the original report that not including spillovers was a limitation of the analysis, but we hadn’t explicitly modelled them. This was for a couple of reasons: we hadn’t seen any other EA org empirically model spillovers, so it seemed a non-standard thing to do, and the data were low-quality anyway, so we hadn’t thought much about including them. We were surprised when some claimed this was a serious (possibly deliberate) omission.
That said, we took the objection very seriously and reallocated several months of staff time in early 2022 from other topics to produce the best spillovers analysis we could on the available data, which we then shared with others. In the end, it only somewhat reduced the result (therapy went from 12x cash to 9x).
2. We were too confident and clumsy in our 2022 Giving Season post.
At that point, we had incorporated nearly all the available data into our cash and psychotherapy meta-analyses, accounted for spillovers, and looked at deworming (for which long-term effects on wellbeing are non-significant) and at life-extending vs life-saving interventions (where psychotherapy seemed better under almost all assumptions). So we felt proud of our work and quite confident.
In retrospect, as I’ve alluded to before, we were overconfident, our language and execution were clumsy, and this really annoyed some people. I’m sorry about this and I hope people can forgive us. We have since spent some time internally thinking about how to communicate our confidence in our conclusions.
3. Not communicating better how we’d done our meta-analysis of psychotherapy, including that we hadn’t taken StrongMinds’ own studies at face value.
SimonM’s post has been mentioned a few times in this thread. As I mentioned in point 3 here, SimonM criticised the recommendation of StrongMinds based on concerns about StrongMinds’ own study, not our analysis. He said he didn’t engage with our analysis because he was ‘confused’ about the methodology but that, in any case, the “key thing about HLI methodology is that [it] follows the same structure as the Founders Pledge analysis and so all the problems I mention above regarding data apply just as much to them as FP”. However, our evaluation didn’t have the problems he was referring to, because of how we’d done the meta-analysis.
In retrospect, it seems the fact that we’d done a meta-analysis, and not put much weight on StrongMinds’ own study, wasn’t something people knew, and we should have communicated it much more prominently; it was buried in some super long posts. We need to own our inadequate comms there. It was tough to learn that he and some other members of EA have been thinking of us with such suspicion. Psychologically, the team took this very hard.
4. We made some errors in the spillovers analysis (as pointed out by James Snowden).
The main error here was that, as my colleague Joel conceded (“I blundered”), he coded some data the wrong way, and the correction reduced the result from 9x to 7.5x cash transfers. This is embarrassing but not, I think, sinister by itself. These things happen; they’re awkward, but not well explained by motivated reasoning: coding errors are checkable and, in any case, the overall conclusion is unchanged with the correction (see my comment here too).
I recognise that some will think this a catalogue of errors best explained by a corrupting agenda; the reader must make up their own mind. Two of the four are analysis errors of the sort that routinely appear when researchers review each other’s work. Two are errors in communication, either about being overconfident, or not communicating enough.
Next steps:
Jason suggests those on epistemic probation should provide a credible exit plan. Leaving aside whether we are, or should be, on epistemic probation, I am happy to set out what we plan to do next. For our research re-evaluating psychotherapy, we had already set this out in our new research agenda, at Section 2.1, which we published at the same time as this post. We are still committed to digging into the details of this analysis that have been brought up.
About bounties: I like this idea and wish we could implement it but, in light of our funding position, I don’t think we’ll be able to do so in the near term.
In addition, we’ll consider adding something like an ‘Our mistakes’ page to our website to chronicle our blunders. At the least, we’ll add a version history to our cost-effectiveness analysis so people can see how the numbers have changed over time and why.
I am open to—indeed, I welcome—further constructive suggestions about what work people would like us to do to change their minds and/or reassure them. I do ask that these are realistic: as noted, we are a small, funding-and-capacity-constrained team with a substantial research agenda. We therefore might not be able to take all suggestions on board.
Hello Gregory. With apologies, I’m going to pre-commit to making this my last reply to you on this post. This thread has been very costly in terms of my time and mental health, and your points below are, as far as I can tell, largely restatements of your earlier ones. As briefly as I can, and point by point again.
1.
A casual reader looking at your original comment might mistakenly conclude that we used only StrongMinds’ own study, and no other data, for our evaluation. Our point was that SM’s own work has relatively little weight and that we rely on many other sources. At this point, your argument seems rather ‘motte-and-bailey’. I would agree with you that there are different ways to do a meta-analysis (your point 3), and we plan to publish our new psychotherapy meta-analysis in due course so that it can be reviewed.
2.
Here, you are restating your prior suggestion that HLI should be assumed to be acting in bad faith. Your claim is that HLI is good at spotting errors in others’ work, but not in its own. But there is an obvious explanation in terms of ‘survivorship’ effects: if you spot errors in your own research, you strip them out, so, by the time you publish, you’ve found all the ones you’re going to find. This is why peer review is important: external reviewers will spot the errors that authors have missed themselves. Hence, there’s nothing odd about having errors in your own work while also finding them in others’. This is the normal stuff of academia!
3.
I’m afraid I don’t understand your complaint. I think your point is that “any way you slice the meta-analysis, psychotherapy looks more cost-effective than cash transfers”, but you conclude that this shows the meta-analysis must be wrong, rather than that it’s sensible to conclude psychotherapy is better. You’re right that you would have to deflate all the effect sizes by a large proportion to reverse the result; this should give you confidence in psychotherapy being better! It’s worth pointing out that if psychotherapy costs about $150pp, but cash transfers cost about $1,100pp ($1,000 transfer + delivery costs), therapy will be more cost-effective per dollar unless its per-person effect is a small fraction of the effect of cash.
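To spell out that arithmetic (using only the per-person costs stated above; the effect sizes stay symbolic because the point is the ratio):

```python
# Costs per person, as stated above.
cost_therapy = 150     # $ per person treated with psychotherapy
cost_cash = 1100       # $ per recipient ($1,000 transfer + delivery)

# Therapy is more cost-effective per dollar unless its per-person
# effect falls below this fraction of the per-person effect of cash.
break_even = cost_therapy / cost_cash
print(f"Break-even effect ratio: {break_even:.0%}")  # ~14%
```

So therapy’s per-person effect would have to be less than about a seventh of cash’s before cash wins per dollar.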
The explanation behind finding a new charity on our first go is not complicated or sinister. In earlier work, including my PhD, I had suggested that, on an SWB analysis, mental health was likely to be relatively neglected compared to status quo prioritisation methods. I explained this in terms of the existing psychological literature on affective forecasting errors: we’re not very good at imagining internal suffering, we probably overstate the badness of material deprivation due to focusing illusions, and our forecasts don’t account for hedonic adaptation (which doesn’t occur for mental health problems). So the simple explanation is that we were ‘digging’ where we thought we were most likely to find ‘altruistic gold’, which seems sensible given limited resources.
4.
As much as I enjoyed your football analogies, here too you’re restating, rather than further substantiating, your earlier accusations. You seem to conclude, from the fact that you found some problems with HLI’s analysis, that HLI, but only HLI, should be distrusted, while we retain our confidence in all the other charity evaluators. This seems unwarranted. Why not conclude you would find mistakes elsewhere too? I am reminded of the expression, “if you knew how the sausage was made, you wouldn’t want to eat the sausage”. What I think is true is that HLI is a second-generation charity evaluator, we are aiming to be extremely transparent, and we are proposing novel priorities. As a result, I think we have come in for a far higher level of public scrutiny than others have, so more of our errors have been found, but I don’t know that we have made more and worse errors. Quite possibly, where errors have been noticed in others’ work, they have been quietly and privately identified and corrected with less fanfare.
Hello Jason. FWIW, I’ve drafted a reply to your other comment and I’m getting it checked internally before I post it.
On this comment about you not liking that we hadn’t updated our website to include the new numbers: we all agree with you! It’s a reasonable complaint. The explanation is fairly boring: we have been working on a new charity recommendations page for the website, at which point we were going to update the numbers and add a note, so we could do it all in one go. (We still plan to do a bigger reanalysis later this year.) However, that has gone slower than expected and hasn’t happened yet. Because of your comment, we’ll add a ‘hot fix’ update in the next week, and hopefully have the new charity recommendations page live in a couple of weeks.
I think we’d have moved faster on this if it had substantially changed the results. On our numbers, StrongMinds is still the best life-improving intervention (it’s several times better than cash, and we’re not confident deworming has a long-term effect). You’re right it would slightly change the crossover point for choosing between life-saving and life-improving interventions, but we’ve got the impression that donors weren’t making much use of our analysis anyway; even if they were, it’s a pretty small difference, and well within the margin of uncertainty.
Thanks for this. I think it’s very valuable and I really appreciate it being set out. I expect to come back to it a few times. One query and one request for further work—from someone, not necessarily you, as this is already a sterling effort!
I’ve heard Thorstad’s TOP talk a couple of times, but it’s now a bit foggy and I can’t remember where his argument ends and yours starts. Is it that Thorstad argues (some version of) longtermism relies on the TOP thesis, but doesn’t investigate whether TOP is true, whereas you set about investigating whether it is?
The request for further work: 18 is a lot of premises for a philosophical argument, and your analysis is very hedged. I recognise you don’t want to claim too much but, as a reader who has thought about this far less than you, I would really appreciate you telling me what you think. Specifically, it would be useful to know which of the premises are the most crucial, in the sense of being the least plausible. Presumably, some of the 18 premises we don’t need to worry about, and we can concentrate our attention on a subset. Or, if you think all the premises are similarly plausible, that would be useful to know too!