Everything written in the post above strongly resonates with my own experiences, in particular the following lines:

“the creation of this paper has not signalled epistemic health. It has been the most emotionally draining paper we have ever written.”

“the burden of proof placed on our claims was unbelievably high in comparison to papers which were considered less ‘political’ or simply closer to orthodox views.”

“The EA community prides itself on being able to invite and process criticism. However, warm welcome of criticism was certainly not our experience in writing this paper.”
I think criticism of EA orthodoxy is routinely dismissed. I would like to share a few more stories of being publicly critical of EA in the hope that doing so adds some useful evidence to the discussion:
Consider systemic change. “Some critics of effective altruism allege that its proponents have failed to engage with systemic change” (source). I have always found the responses (eg here and here) to this critique to be dismissive and to miss the point. Why can we not just say: yes, we are a new community, this area feels difficult, and we are not there yet? Why do we have to pretend EA is perfect and does systemic change stuff well?
My own experience (risk planning). I have some relevant expertise from engaging with professional risk managers, military personnel, counterterrorism staff and so on. I have really struggled to communicate any of this to EA folk, especially where it suggests that EAs are not thinking about risks well. I tend to find I get downvoted or told I am strawmanning EA. Avoiding that is possible, but only if I put in huge amounts of time and mental energy.
Mental health. Consider that Michael Plant has, for 6 years now, been making the case that GiveWell and other neartermist EAs don’t put enough weight on mental health. I believe his experience has mostly been one of people dismissing him rather than engaging with him.
Other. A few years back I remember being unimpressed that EAs’ response to Iason Gabriel’s critique was largely to argue back and then ignore it. There was no effort to see if any of the criticisms contained useful grains of truth that could help us improve EA.
Other evidence to note is that the top things EAs think other EAs get wrong (source) are “Reinventing the wheel” and “Overconfidence and misplaced deference”, and many EAs worry that EA is intellectually stale / stagnant (in answers to this question). On the other hand, many EA orgs are very good at recognising the mistakes they have made (e.g. with ‘Our mistakes’ pages), which is a great cultural thing that we as a community should be proud of.
I think we should also recognise that both Carla and Luke have full-time EA research jobs, and even they found this time-consuming; for someone without a full-time position, it can become almost impossibly time-consuming and draining to do a half-decent job. This essentially closes off a lot of people from critiquing EA.
If there were one change I could make, it would be a cultural shift such that when someone posts something critical, we try to steelman it rather than dismiss it. (Here is an example of steelmanning some of Phil Torres’ arguments [edit: although we should of course not knowingly steelman/endorse arguments made in bad faith].) We could also on occasion say “yes, we get this wrong and we still have much to learn” and not treat every critique as an attack.
Hope some extra views help.
I do think there is a difference between this article and stuff from people like Torres, in terms of good faith.
I agree with this, and would add that the appropriate response to arguments made in bad faith is not to “steelman” them (or to add them to a syllabus, or to keep disseminating a cherry-picked quote from a doctoral dissertation), but to expose them for what they are or ignore them altogether. Intellectual dishonesty is the epistemic equivalent of defection in the cooperative enterprise of truth-seeking; to cooperate with defectors is not a sign of virtue, but quite the opposite.
I’ve seen “in bad faith” used in two ways:
1. This person’s argument is based on a lie.
2. This person doesn’t believe their own argument, but they aren’t lying within the argument itself.
While it’s obvious that we should point out lies where we see them, I think we should distinguish between (1) and (2). An argument’s original promoter not believing it isn’t a reason for no one to believe it, and shouldn’t stop us from engaging with arguments that aren’t obviously false.
(See this comment for more.)
I agree that there is a relevant difference, and I appreciate your pointing it out. However, I also think that knowledge of the origins of a claim or an argument is sometimes relevant for deciding whether one should engage seriously with it, or engage with it at all, even if the person presenting it is not himself/herself acting in bad faith. For example, if I know that the oil or the tobacco industries funded studies seeking to show that global warming is not anthropogenic or that smoking doesn’t cause cancer, I think it’s reasonable to be skeptical even if the claims or arguments contained in those studies are presented by a person unaffiliated with those industries. One reason is that the studies may consist of filtered evidence—that is, evidence selected to demonstrate a particular conclusion, rather than to find the truth. Another reason is that by treating arguments skeptically when they originate in a non-truth-seeking process, one disincentivizes that kind of intellectually dishonest and socially harmful behavior.
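To make the “filtered evidence” point concrete, here is a minimal back-of-the-envelope sketch in Python. All numbers are invented purely for illustration; the point is only that the same reported study should move your beliefs much less when it was selected from many unreported ones.

```python
# Toy numbers, purely for illustration: how much one reported "no harm"
# study should shift beliefs, depending on how it was selected.

def honest_lr(p_noharm_if_harmful, p_noharm_if_safe):
    # One study, reported whatever it finds.
    # Likelihood ratio in favour of "safe" given a "no harm" result.
    return p_noharm_if_safe / p_noharm_if_harmful

def filtered_lr(p_noharm_if_harmful, p_noharm_if_safe, n_studies):
    # A motivated source runs n studies and publicises a "no harm" study
    # whenever at least one exists. We only observe that it published one.
    p_pub_if_harmful = 1 - (1 - p_noharm_if_harmful) ** n_studies
    p_pub_if_safe = 1 - (1 - p_noharm_if_safe) ** n_studies
    return p_pub_if_safe / p_pub_if_harmful

if __name__ == "__main__":
    # Assume a single study finds "no harm" 10% of the time if the product
    # is harmful, and 80% of the time if it is safe (invented numbers).
    print(honest_lr(0.1, 0.8))        # ~8.0  -> strong evidence of safety
    print(filtered_lr(0.1, 0.8, 20))  # ~1.14 -> almost no evidence
```

The study itself is identical in both cases; what changes is the process that put it in front of you, which is why the origin of a claim can be relevant to how much weight it gets.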
In the case at hand, I think what’s going on is pretty clear. A person who became deeply hostile to longtermism (for reasons that look prima facie mostly unrelated to the intellectual merits of those views) diligently went through most of the longtermist literature fishing for claims that would, if presented in isolation to a popular audience using technically true but highly tendentious or misleading language and/or stripped of the relevant context, cause serious damage to the longtermist movement. In light of this, I think it is not only naive but epistemically unjustified to insist that this person’s findings be assessed on their merits alone. (Again, consider what your attitude would be if the claims originated e.g. in an industry lobbyist.)
In addition, I think that it’s inappropriate to publicize this person’s writings, by including them in a syllabus or by reproducing their cherry-picked quotes. In the case of Nick Beckstead’s quote, in particular, its reproduction seems especially egregious, because it helps promote an image of him diametrically opposed to the truth: an early Giving What We Can member who pledged to donate 50% of his income to global poverty charities for the rest of his life is presented—from a single paragraph excerpted from a 180-page doctoral dissertation intended to be read primarily by an audience of professional analytic philosophers—as “support[ing] white supremacist ideology”. Furthermore, even if Nick were just an ordinary guy without impeccable cosmopolitan credentials, I think it would be perfectly appropriate to write what he did in the context of a thesis advancing the argument that our moral judgments are less reliable than is generally assumed. More generally, and more importantly, I believe that as EAs we should be willing to question established beliefs related to the cost-effectiveness of any cause, even if this risks reaching very uncomfortable conclusions, as long as the questioning is done as part of a good-faith effort in cause-prioritization and subject to the usual caveats related to possible reputational damage or the spreading of information hazards. It frightens me to think what our movement might become if it became an accepted norm that explorations of the sort exemplified by the quote can only be carried out “through a postcolonial lens”!
Note: Although I generally oppose disclaimers, I will add one here. I’ve known Nick Beckstead for a decade or so. We interacted a bit back when he was working at FHI, though after he moved to Open Phil in 2014 we had no further communication, other than exchanging greetings when he visited the CEA office around 2016 and corresponding briefly in a professional capacity. I am also an FTX Fellow, and as I learned recently, Nick has been appointed CEO of the FTX Foundation. However, I made this same criticism ten months ago, way before I developed any ties to FTX (or had any expectations that I would develop such ties or that Nick was being considered for a senior position). Here’s what I wrote back then:

“I personally do not think it is appropriate to include an essay in a syllabus or engage with it in a forum post when (1) this essay characterizes the views it argues against using terms like ‘white supremacy’ and in a way that suggests (without explicitly asserting it, to retain plausible deniability) that their proponents—including eminently sensible and reasonable people such as Nick Beckstead and others—are white supremacists, and when (2) its author has shown repeatedly in previous publications, social media posts and other behavior that he is not writing in good faith and that he is unwilling to engage in honest discussion.”
The “incentives” point is reasonable, and it’s part of the reason I’d want to deprioritize checking into claims with dishonest origins.
However, I’ll note that establishing a rule like “we won’t look at claims seriously if the person making them has a personal vendetta against us” could lead to people trying to argue against examining someone’s claims by arguing that they have a personal vendetta, which gets weird and messy. (“This person told me they were sad after org X rejected their job application, so I’m not going to take their argument against org X’s work very seriously.”)
Of course, there are many levels to what a “personal vendetta” might entail, and there are real trade-offs to whatever policy you establish. But I’m wary of taking the most extreme approach in any direction (“let’s just ignore Phil entirely”).
As for filtered evidence — definitely a concern if you’re trying to weigh the totality of evidence for or against something. But not necessarily relevant if there’s one specific piece of evidence that would be damning if true. For example, if Phil had produced a verifiable email exchange showing an EA leader threatening to fire a subordinate for writing something critical of longtermism, it wouldn’t matter much to me how much that leader had done to encourage criticism in public.
Regarding your point that it’s “not only naive but epistemically unjustified to insist that this person’s findings be assessed on their merits alone”: I agree with this to the extent that those findings allow for degrees of freedom — so I’ll be very skeptical of conversations reported third-hand or cherry-picked quotes from papers, but still interested in leaked emails that seem like the genuine article.
As for everything from “In addition...” onwards: no major disagreements. I certainly wouldn’t put Phil’s white-supremacy work on a syllabus, though I could imagine excerpts of his criticism on other topics making it in — of the type “this point of view implies this objection” rather than “this point of view implies that the person holding it is a dangerous lunatic”.
Agree with this.
Yes, I think that is fair.
At the time (before he wrote his public critique) I had not yet realised that Phil Torres was acting in bad faith.
Just to clarify (since I now realize my comment was written in a way that may have suggested otherwise): I wasn’t alluding to your attempt to steelman his criticism. I agree that at the time the evidence was much less clear, and that steelmanning probably made sense back then (though I don’t recall the details well).
Strong upvote from me—you’ve articulated my main criticisms of EA.
I think it’s particularly surprising that EA still doesn’t pay much attention to mental health and happiness as a cause area, especially when we discuss pleasure and suffering all the time, Yew Kwang Ng focused so much on happiness, and Michael Plant has collaborated with Peter Singer.
In your view, what would it look like for EA to pay sufficient attention to mental health?
To me, it looks like there’s a fair amount of engagement on this:
Peter Singer obviously cares about the issue, and he’s a major force in EA by himself.
Michael Plant’s last post got a positive writeup in Future Perfect and serious engagement from a lot of people on the Forum and on Twitter (including Alexander Berger, who probably has more influence over neartermist EA funding than any other person); Alex was somewhat negative on the post, but at least he read it.
Forum posts with the “mental health” tag generally seem to be well-received.
Will MacAskill invited three very prominent figures to run an EA Forum AMA on psychedelics as a promising mental health intervention.
Founders Pledge released a detailed cause area report on mental health, which makes me think that a lot of their members are trying to fund this area.
EA Global has featured several talks on mental health.
I can’t easily find engagement with mental health from Open Phil or GiveWell, but this doesn’t seem like an obvious sign of neglect, given the variety of other health interventions they haven’t closely engaged with.
I’m limited here by my lack of knowledge w/r/t funding constraints for orgs like StrongMinds and the Happier Lives Institute. If either org were really funding-constrained, I’d consider them to be promising donation targets for people concerned about global health, but I also think that those people — if they look anywhere outside of GiveWell — have a good chance of finding these orgs, thanks to their strong presence on the Forum and in other EA spaces.
I’ve only just seen this and thought I should chime in. Before I describe my experience, I should note that I will respond to Luke’s specific concerns about subjective wellbeing separately in a reply to his comment.
TL;DR Although GiveWell (and Open Phil) have started to take an interest in subjective wellbeing and mental health in the last 12 months, I have felt considerable disappointment and frustration with their level of engagement over the previous six years.
I raised the “SWB and mental health might really matter” concerns in meetings with GiveWell staff about once a year since 2015. Before 2021, my experience was that they more or less dismissed my concerns, even though they didn’t seem familiar with the relevant literature. When I asked what their specific doubts were, these were vague and seemed to change each time (“we’re not sure you can measure feelings”, “we’re worried about experimenter demand effect”, etc.). I’d typically point out their concerns had already been addressed in the literature, but that still didn’t seem to make them more interested. (I don’t recall anyone ever mentioning ‘item response theory’, which Luke raises as his objection.) In the end, I got the impression that GiveWell staff thought I was a crank and were hoping I would just go away.
GiveWell’s public engagement has been almost non-existent. When HLI published, in August 2020, a document explaining how GiveWell could (re)estimate their own ‘moral weights’ using SWB, GiveWell didn’t comment on this (a Founders Pledge researcher did, however, provide detailed comments). The first and only time GiveWell has responded publicly about this was in December 2020, where they set out their concerns in relation to our cash transfer vs therapy meta-analyses; I’ve replied to those comments (many of which expressed quite non-specific doubts) but not yet received a follow-up.
The response I was hoping for—indeed, am still hoping for—was the one Will et al. gave above, namely, “We’re really interested in serious critiques. What do you think we’re getting wrong, why, and what difference would it make if you were right? Would you like us to fund you to work on this?” Obviously, you wouldn’t expect an organisation to engage with critiques that are practically unimportant and from non-credible sources. In this case, however, I was raising fundamental concerns that, if true, could substantially alter the priorities, both for GiveWell and EA more broadly. And, for context, at the time I initially highlighted these points I was doing a philosophy PhD supervised by Hilary Greaves and Peter Singer and the measurement of wellbeing was a big part of my thesis.
There has been quite good engagement from other EAs and EA orgs, as Aaron Gertler notes above. I can add to those that, for instance, Founders Pledge have taken SWB on board in their internal decision-making and have since made recommendations in mental health. However, GiveWell’s lack of engagement has really made things difficult because EAs defer so much to GiveWell: a common question I get is “ah, but what does GiveWell think?” People assume that, because GiveWell didn’t take something seriously, that was strong evidence they shouldn’t either. This frustration was compounded by the fact that, because there wasn’t a clear, public statement of what GiveWell’s concerns were, I could neither try to address those concerns nor placate the worries of others by saying something like “GiveWell’s objection is X. We don’t share that because of Y”.
This is pure speculation on my part, but I wonder if GiveWell (and perhaps Open Phil too) developed an ‘ugh field’ around subjective wellbeing and mental health. They didn’t look into it initially because they were just too damn busy. But then, after a while, it became awkward to start engaging with because that would require admitting they should have done so years ago, so they just ignored it. I also suspect there’s been something of an information cascade where someone originally looked at all this (see my reply to Luke above), decided it wasn’t interesting, and then other staff members just took that on trust and didn’t revisit it—everyone knew an idea could be safely ignored even if they weren’t sure why.
Since 2021, however, things have been much better. In late 2020, as mentioned, HLI published a blog post showing how SWB could be used to (re)estimate GiveWell’s ‘moral weights’. I understand that some of GiveWell’s donors asked them for an opinion on this and that pushed them to engage with it. HLI had a productive conversation with GiveWell in February 2021 (see GiveWell’s notes) where, curiously, no specific objections to SWB were raised. GiveWell are currently working on a blog post responding to our moral weights piece and they kindly shared a draft with us in July asking for our feedback. They’ve told us they plan to publish reports on SWB and psychotherapy in the next 3-6 months.
Regarding Open Phil, it seemed pointless to engage unless GiveWell came on board, because Open Phil also defer strongly to GiveWell’s judgements, as Alex Berger has recently stated. However, we recently had some positive engagement from Alex on Twitter, and a member of his team contacted HLI for advice after reading our report and recommendations on global mental health. Hence, we are now starting to see some serious engagement, but it’s rather overdue and still less fulsome than I’d want.
Really sad to hear about this, thanks for sharing. And thank you for keeping at it despite the frustrations. I think you and the team at HLI are doing good and important work.
To me (as someone who has funded the Happier Lives Institute), it just should not have taken founding an institute and 6 years of repeating this message (while feeling largely ignored and dismissed by existing EA orgs) to reach the point we are at now.
I think expecting orgs and donors to change direction is certainly a very high bar. But in that case I don’t think we can pride ourselves on being a community that pivots and changes direction when new data (e.g. on subjective wellbeing) is made available to us.
FWIW, one of my first projects at Open Phil, starting in 2015, was to investigate subjective well-being interventions as a potential focus area. We never published a page on it, but we did publish some conversation notes. We didn’t pursue it further because my initial findings were that there were major problems with the empirical literature, including weakly validated measures, unconvincing intervention studies, one entire literature using the wrong statistical test for decades, etc. I concluded that there might be cost-effective interventions in this space, perhaps especially after better measure validation studies and intervention studies are conducted, but my initial investigation suggested it would take a lot of work for us to get there, so I moved on to other topics.
At least for me, I don’t think this is a case of an EA funder repeatedly ignoring work by e.g. Michael Plant — I think it’s a case of me following the debate over the years and disagreeing on the substance after having familiarized myself with the literature.
That said, I still think some happiness interventions might be cost-effective upon further investigation, and I think our Global Health & Well-Being team has been looking into the topic again as that team has gained more research capacity in the past year or two.
Hello Luke, thanks for this, which was illuminating. I’ll make an initial clarifying comment and then go on to the substantive issues of disagreement.
You wrote that you don’t think “this is a case of an EA funder repeatedly ignoring work by e.g. Michael Plant”. I’m not sure what you mean here. Are you saying GiveWell didn’t repeatedly ignore the work? That Open Phil didn’t? Something else? As I set out in another comment, my experience with GiveWell staff was of being ignored by people who weren’t all that familiar with the relevant literature—FWIW, I don’t recall the concerns you raise in your notes being raised with me. I’ve not had interactions with Open Phil staff prior to 2021—for those reading, Luke and I have never spoken—so I’m not able to comment regarding that.
Onto the substantive issues. Would you be prepared to state more precisely what your concerns are, and what sort of evidence would change your mind? Reading your comments and your notes, I’m not sure exactly what your objections are and, insofar as I do, they don’t seem like strong objections.
You mention “weakly validated measures” as an issue but in the text you say “for some scales, reliability and validity have been firmly established”, which implies to me you think (some) scales are validated. So which scales are you worried about, to what extent, and why? Are they so non-validated we should think they contain no information? If some scales are validated, why not just use those ones? By analogy, we wouldn’t give up on measuring temperature if we thought only some of our thermometers were broken. I’m not sure if we’re even on the same page about what it is to ‘validate’ a measure of something (I can elaborate, if helpful).
On “unconvincing intervention studies”, I take it you’re referring to your conversation notes with Sonja Lyubomirsky. The ‘happiness interventions’ you talk about are really just those from the field of ‘positive psychology’ where, basically, you take mentally healthy people and try to get them to change their thoughts and behaviours to be happier, such as by writing down what they’re grateful for. This implies a very narrow interpretation of ‘happiness interventions’. Reducing poverty or curing diseases are ‘happiness interventions’ in my book because they increase happiness, but they are certainly not positive psychology interventions. One can coherently think that subjective wellbeing measures, eg self-reported happiness, are valid and capture something morally important but deny that gratitude journalling etc. are particularly promising ways, in practice, of increasing it. Also, there’s a big difference between the lab-style experiments psychologists run and the work economists tend to do looking at large panel and cohort data sets.
Regarding “one entire literature using the wrong statistical test for decades”, again, I’m not sure exactly what you mean. Is the point about ‘item response theory’? I confess that’s not something that gets discussed in the academic world of subjective wellbeing measurement—I don’t think I’ve ever heard it mentioned. After a quick look, it seems to be a method to relate scores of psychometric tests to real-world performance. That seems to be a separate methodological ballgame from concerns about the relationship between how people feel and how they report those feelings on a numerical scale, e.g. when we ask “how happy are you, 0-10?”. Subjective wellbeing researchers do talk about the issue of ‘scale cardinality’, ie, roughly, does your “7/10” feel the same to you as my “7/10” does to me? This issue has been starting to get quite a bit of attention in just the last couple of years but has, I concede, been rather neglected by the field. I’ve got a working paper on this under review which is (I think) the first comprehensive review of the problem.
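To illustrate the cardinality worry with a minimal sketch (the latent values and reporting functions below are invented, not drawn from any dataset): if two groups map the same underlying feelings onto the 0-10 scale differently, comparing average reported scores can even reverse the true ordering.

```python
import numpy as np

# Invented latent happiness levels on a 0-1 scale: group A is genuinely
# happier than group B.
latent_a = np.array([0.55, 0.60, 0.65])
latent_b = np.array([0.45, 0.50, 0.55])

def report_conservative(x):
    # One (made-up) way of using the 0-10 scale: reluctant to give high scores.
    return np.round(10 * x ** 2)

def report_generous(x):
    # Another (made-up) way: quick to give high scores.
    return np.round(10 * np.sqrt(x))

print(latent_a.mean() > latent_b.mean())      # True: A really is happier
print(report_conservative(latent_a).mean())   # ~3.7
print(report_generous(latent_b).mean())       # 7.0
```

Whether real respondents actually differ this much in how they use the scale is exactly the empirical question the cardinality literature tries to answer; the sketch only shows why it matters for cross-group comparisons.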
To me, it looks like in your initial investigation you had the bad luck to run into a couple of dead ends and, quite understandably given those, didn’t go further. But I hope you’ll let me try to explain further to you why I think happiness research (like happiness itself) is worth taking seriously!
Hi Michael,
I don’t have much time to engage on this, but here are some quick replies:
I don’t know anything about your interactions with GiveWell. My comment about ignoring vs. not-ignoring arguments about happiness interventions was about me / Open Phil, since I looked into the literature in 2015 and have read various things by you since then. I wouldn’t say I ignored those posts and arguments, I just had different views than you about likely cost-effectiveness etc.
On “weakly validated measures,” I’m talking in part about lack of IRT validation studies for SWB measures used in adults (NIH funded such studies for SWB measures in kids but not adults, IIRC; a minimal sketch of the kind of item model IRT fits follows this comment), but also about other things. The published conversation notes only discuss a small fraction of my findings/thoughts on the topic.
On “unconvincing intervention studies” I mean interventions from the SWB literature, e.g. gratitude journals and the like. Personally, I’m more optimistic about health and anti-poverty interventions for the purpose of improving happiness.
On “wrong statistical test,” I’m referring to the section called “Older studies used inappropriate statistical methods” in the linked conversation notes with Joel Hektner.
TBC, I think happiness research is worth engaging with and has things to teach us, and I think there may be some cost-effective happiness interventions out there. As I said in my original comment, I moved on to other topics not because I think the field is hopeless, but because it was in a bad enough state that it didn’t make sense for me to prioritize it at the time.
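For readers unfamiliar with the term, here is a minimal sketch (in Python, with invented parameter values, not taken from any actual SWB instrument) of the kind of item model item response theory fits: a two-parameter logistic curve linking a latent trait such as wellbeing to the probability of endorsing a questionnaire item. An IRT validation study fits models of this kind and then checks whether items behave consistently (adequate discrimination, sensible locations, invariance across groups).

```python
import numpy as np

def item_endorse_prob(theta, a, b):
    """Two-parameter logistic IRT model: probability that a respondent with
    latent wellbeing `theta` endorses an item with discrimination `a` and
    location (difficulty) `b`."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Invented parameters for two items on a hypothetical wellbeing questionnaire.
# A well-behaved item discriminates sharply around its location (high a);
# a poorly behaved one barely tracks the latent trait (low a).
thetas = np.linspace(-3, 3, 7)        # latent wellbeing, standardised
good_item = item_endorse_prob(thetas, a=2.0, b=0.0)
weak_item = item_endorse_prob(thetas, a=0.3, b=0.0)

print(np.round(good_item, 2))  # rises steeply around theta = 0
print(np.round(weak_item, 2))  # nearly flat: tells us little about theta
```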
Hello Luke,
Thanks for this too. I appreciate you’ve since moved on to other things, so this isn’t really your topic to engage on anymore. However, I’ll make two comments.
First, you said you read various things in the area, including by me, since 2015. It would have been really helpful (to me) if, given you had different views, you had engaged at the time and set out where you disagreed and what sort of evidence would have changed your mind.
Second, and similarly, I would really appreciate it if the current team at Open Philanthropy could more precisely set out their perspective on all this. I did have a few interactions with various Open Phil staff in 2021, but I wouldn’t say I’ve got anything like canonical answers on what their reservations are about (1) measuring outcomes in terms of SWB (Alex Berger’s recent technical update didn’t comment on this) and (2) doing more research or grantmaking into the things that, from the SWB perspective, seem overlooked.
This is an interesting conversation, though it’s veering off into a separate topic. I wish there were a way to “rebase” these spin-off discussions into a different place, for better organisation.
Thank you Luke – super helpful to hear!!
Do you feel that existing data on subjective wellbeing is so compelling that it’s an indictment of EA for GiveWell/Open Phil not to have funded more work in that area? (Founders Pledge released their report in early 2019 and was presumably working on it much earlier, so they wouldn’t seem to be blameworthy.)
I can’t say much more here without knowing the details of how Michael/others’ work was received when they presented it to funders. The situation I’ve outlined seems to be compatible both with “this work wasn’t taken seriously enough” and “this work was taken seriously, but seen as a weaker thing to fund than the things that were actually funded” (which is, in turn, compatible with “funders were correct in their assessment” and “funders were incorrect in their assessment”).
That Michael felt dismissed is moderate evidence for “not taken seriously enough”. That his work (and other work like it) got a bunch of engagement on the Forum is weak evidence for “taken seriously” (what the Forum cares about =/= what funders care about, but the correlation isn’t 0). I’m left feeling uncertain about this example, but it’s certainly reasonable to argue that mental health and/or SWB hasn’t gotten enough attention.
(Personally, I find the case for additional work on SWB more compelling than the case for additional work on mental health specifically, and I don’t know the extent to which HLI was trying to get funding for one vs. the other.)
TL;DR: Hard to judge. Maybe: yes for GiveWell, no for Open Phil, mixed for the EA community as a whole.
I think I will slightly dodge the question and answer a separate one instead: are these orgs doing enough exploratory-type research? (I think this is a more pertinent question, and although subjective wellbeing is worth looking into as an example, it is not clear it is at the very top of the list of things to look into more that might change how we think about doing good.)
Firstly, a massive caveat: I do not know for sure. It is hard to judge, and knowing exactly how seriously various orgs have looked into topics is very hard to do from the outside. So take the below with a pinch of salt. That said:
Open Phil – AOK.
Open Phil (neartermists) generally seem good at exploring new areas and experimenting (and as Luke highlights, did look into this).
GiveWell – hmmm could do better.
GiveWell seem to have a pattern of saying they will do more exploratory research (e.g. into policy) and then not doing it (mentioned here; I think 2020 saw some, but only minimal, progress).
I am genuinely surprised GiveWell have not found things better than anti-malaria and deworming (sure, there are limits on how effective scalable charities can be, but it seems odd that our first guesses are still the top recommendations).
There is limited catering to anyone who is not a classical utilitarian – for example, if you care about wellbeing (e.g. years lived with disability) but not lives saved, it is unclear where to give.
EA in general – so-so.
There has been interest from EAs (individuals, Charity Entrepreneurship, Founders Pledge, EAG) in the value of happiness, addressing mental health issues, etc.
It is not just Michael. I get the sense that the folk working on Improving Institutional Decision Making (IIDM) have struggled to get traction and funding and support too. (Although maybe promoters of new cause areas within EA always feel their ideas are not taken seriously.)
The EA community (not just GiveWell) seems very bad at catering to folk who are not roughly classical (or negative leaning) utilitarians (a thing I struggled with when working as a community builder).
I do believe there is a lack of exploratory research happening given the potential benefits (see here and here). Maybe Rethink are changing this.
Not sure I really answered the question. And anyway none of those points are very strong evidence as much as me trying to explain my current intuitions. But maybe I said something of interest.
Strong upvote for this if nothing else.
(the rest is also brilliant though, thank you so much for speaking up!)