Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.
Feel free to reach out if you think there’s anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you’re a medical student / junior doctor reconsidering your clinical future, or if you’re quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.
Outside of EA, I do a bit of end of life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed up classical violinist and Oly-lifter.
All comments in personal capacity unless otherwise stated.
bruce
Can you assure me that Rethink’s researchers are independent?
I no longer work at RP, but I thought I’d add a data point from someone who doesn’t stand to benefit from your donations, in case it was helpful.
I think my take here is that if my experience doing research with the GHD team is representative of RP’s work going forwards, then research independence should not be a reason not to donate.[1]
My personal impression is that of the work that I / the GHD team has been involved with, I have been afforded the freedom to look for our best guess of what the true answers are, and have personally never felt constrained or pushed into a particular answer that wasn’t directly related to interpretation of the research. I have also consistently felt free to push back on lines of research that I feel would be less productive, or suggest stronger alternatives. I think credit here probably goes both to clients as well as the GHD team, though I’m not sure exactly how to attribute this.
I feel less confident about biases that may arise from the research agenda / selection of research questions or worldviews and assumptions of clients, but this could (for example) make one more inclined towards funding RP to do their own independent research, or specifying research you think is particularly important and neglected.
Edit: See thread by Saulius detailing his views.
- ^
Caveats: I can’t speak for the teams outside of GHD, and I can’t speak for RP’s work in 2024. This comment should not be seen as an endorsement of the claim that RP is the best place to donate to all things considered, which obviously is influenced by other variables beyond research independence.
- ^
Evidentiary standards. We drew on a large number of RCTs for our systematic reviews and meta-analyses of cash transfers and psychotherapy (42 and 74, respectively). If one holds that the evidence for something as well-studied as psychotherapy is too weak to justify any recommendations, charity evaluators could recommend very little.
A comparatively minor point, but it doesn’t seem to me that the claims in Greg’s post [more] are meaningfully weakened by whether or not psychotherapy is well-studied (as measured by how many RCTs HLI has found on it, noting that you already push back on some object level disagreement on study quality in point 1, which feels more directly relevant).
It also seems pretty unlikely to be true that psychotherapy being well studied necessarily means that StrongMinds is a cost-effective intervention comparable to current OP / GW funding bars (which is one main point of contention), or that charity evaluators need 74+ RCTs in an area before recommending a charity. Is the implicit claim being made here that the evidence for StrongMinds being a top charity is stronger than that of AMF, which is (AFAIK) based on fewer than 74 RCTs?[1]
I never worked directly with Meghan when we were colleagues, but my interactions with her were v positive and give me the impression that she would be a great supervisor to work with—infectiously passionate about her research, an excellent communicator, and kind + supportive.
This sounds like a terribly traumatic experience. I’m so sorry you went through this, and I hope you are in a better place and feel safer now.
Your self-worth is so, so much more than how well you can navigate what sounds like a manipulative, controlling, and abusive work environment.
spent months trying to figure out how to empathize with Kat and Emerson, how they’re able to do what they’ve done, to Alice, to others they claimed to care a lot about. How they can give so much love and support with one hand and say things that even if I’d try to model “what’s the worst possible thing someone could say”, I’d be surprised how far off my predictions would be.
It sounds like despite all of this, you’ve tried to be charitable to people who have treated you unfairly and poorly—while this speaks to your compassion, I know this line of thought can often lead to things that feel like you are gaslighting yourself, and I hope this isn’t something that has caused you too much distress.
I also hope that Effective Altruism as a community becomes a safer space for people who join it aspiring to do good, and I’m grateful for your courage in sharing your experiences, despite it (very reasonably!) feeling painful and unsafe for you.[1] All the best for whatever is next, and I hope you have access to enough support around you to help with recovering what you’ve lost.
============
[Meta: I’m aware that there will likely be claims around the accuracy of these stories, but I think it’s important to acknowledge the potential difficulty of sharing experiences of this nature with a community that rates itself highly on truth-seeking, possibly acknowledging your own lived experience as “stories” accordingly; as well as the potential anguish it might be for these experiences to have been re-lived over the past year and possibly again in the near future, if/when these claims are dissected, questioned, and contested.]
- ^
That being said, your experience would be no less valid had you chosen not to share these. And even though I’m cautiously optimistic that the EA community will benefit from you sharing these experiences, your work here is supererogatory, and improving Nonlinear’s practices or the EA community’s safety is not your burden to bear alone. In a different world it would have been totally reasonable for you to not have shared this, if that was what you needed to do for your own wellbeing. I guess this comment is more for past Chloes or other people with similar experiences who may have struggled with these kinds of decisions than it is for Chloe today, but thought it was worth mentioning.
- ^
Both Kat and Emerson are claiming that there have been edits to this post.[1]
I wonder whether an appendix or summary of changes to important claims would be fair and appropriate, given the length of post and severity of allegations? It’d help readers keep up with these changes, and it is likely most time-efficient for the author making the edits to document these as they go along.
[Edit: Kat has since retracted her statement.]
- ^
his original post (he’s quietly redacted a lot of points since publishing) had a lot of falsehoods that he knew were not true. He has since removed some of them after the fact, but those have still been causing us damage.
Ben has also been quietly fixing errors in the post, which I appreciate, but people are going around right now attacking us for things that Ben got wrong, because how would they know he quietly changed the post?
This is why every time newspapers get caught making a mistake they issue a public retraction the next day to let everyone know. I believe Ben should make these retractions more visible
- ^
To add sources to recent examples that come to mind that broadly support MHR’s point above RE: visible (ex post) failures that don’t seem to be harshly punished (most responses seem somewhere between neutral and supportive, at least publicly):
Lightcone
Alvea
ALERT
AI Safety Support
EA hub
No Lean Season
Some failures that came with a larger proportion of critical feedback probably include the Carrick Flynn campaign (1, 2, 3), but even here “harshly punish” seems like an overstatement. HLI also comes to mind (and despite highly critical commentary in earlier posts, I think the highly positive response to this specific post is telling).
============
On the extent to which Nonlinear’s failures relate to integrity / engineering, I think I’m sympathetic to both Rob’s view:
I think the failures that seem like the biggest deal to me (Nonlinear threatening people and trying to shut down criticism and frighten people) genuinely are matters of character and lack of integrity, not matters of bad engineering.
As well as Holly’s:
If you wouldn’t have looked at it before it imploded and thought the engineering was bad, I think that’s the biggest thing that needs to change. I’m concerned that people still think that if you have good enough character (or are smart enough, etc), you don’t need good boundaries and systems.
but do not think these are necessarily mutually exclusive.
Specifically, it sounds like Rob is mainly thinking about the source of the concerns, and Holly is thinking about what to do going forwards. And it might be the case that the most helpful actionable steps going forward are things that look more like improving boundaries and systems, regardless of whether you believe failures specific to Nonlinear are caused by deficiencies in integrity or engineering.
That said, I agree with Rob’s point that the most significant allegations raised about Nonlinear quite clearly do not fit the category of ‘appropriate experimentation that the community would approve of’, under almost all reasonable perspectives.
I was a participant and largely endorse this comment.
one contributor to a lack of convergence was attrition of effort and incentives. By the time there was superforecaster-expert exchange, we’d been at it for months, and there weren’t requirements for forum activity (unlike the first team stage)
[Edit: wrote this before I saw lilly’s comment, would recommend that as a similar message but ~3x shorter].
============
I would consider Greg’s comment as “brought up with force”, but would not consider it an “edge case criticism”. I also don’t think James / Alex’s comments are brought up particularly forcefully.
I do think it is worth making the case that pushing back on comments that are easily misinterpreted or misleading is also not edge case criticism, especially if these are comments that directly benefit your organisation.
Given the stated goal of the EA community is “to find the best ways to help others, and put them into practice”, it seems especially important that strong claims are sufficiently well-supported, and made carefully + cautiously. This is in part because the EA community should reward research outputs if they are helpful for finding the best ways to do good, not solely because they are strongly worded; in part because EA donors who don’t have capacity to engage at the object level may be happy to defer to EA organisations/recommendations; and in part because the counterfactual impact of money diverted from an EA donor is likely higher than that of the average donor.
For example:
“We’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money”.[1]
Michael has expressed regret about this statement, so I won’t go further into this than I already have. However, there is a framing in that comment that suggests this is an exception, because “HLI is quite well-caveated elsewhere”, and I want to push back on this a little.
HLI has previously been mistaken for an advocacy organisation (1, 2). This isn’t HLI’s stated intention (which is closer to a “Happiness/Wellbeing GiveWell”). I outline why I think this is a reasonable misunderstanding here (including important disclaimers that outline HLI’s positives).
Despite claims that HLI does not advocate for any particular philosophical view, I think this is easily (and reasonably) misinterpreted.
James’ comment thread below: “Our focus on subjective wellbeing (SWB) was initially treated with a (understandable!) dose of scepticism. Since then, all the major actors in effective altruism’s global health and wellbeing space seem to have come around to it”
See alex’s comment below, where TLYCS is quoted to say: “we will continue to rely heavily on the research done by other terrific organizations in this space, such as GiveWell, Founders Pledge, Giving Green, Happier Lives Institute [...]”
I think excluding “to identify candidates for our recommendations, even as we also assess them using our own evaluation framework” [emphasis added] gives a fairly different impression to the actual quote, in terms of whether or not TLYCS supports WELLBYs as an approach.
While I wouldn’t want to exclude careless communication / miscommunication, I can understand why others might feel less optimistic about this, especially if they have engaged more deeply at the object level and found additional reasons to be skeptical.[2] I do feel like I subjectively have a lower bar for investigating strong claims by HLI than I did 7 or 8 months ago.
(commenting in personal capacity etc)
============
Adding a note RE: Nathan’s comment below about bad blood:
Just for the record, I don’t consider there to be any bad blood between me and any members of HLI. I previously flagged a comment I wrote with two HLI staff, worrying that it might be misinterpreted as uncharitable or unfair. Based on positive responses there and from other private discussions, my impression is that this is mutual.[3]
- ^
-This was the claim that originally prompted me to look more deeply into the StrongMinds studies. After <30 minutes on StrongMinds’ website, I stumbled across a few things that stood out as surprising, which prompted me to look deeper. I summarise some thoughts here (which has been edited to include a compilation of most of the critical relevant EA forum commentary I have come across on StrongMinds), and include more detail here.
-I remained fairly cautious about the claims I made: this entire process took three years / 10,000 hours, so I assumed by default I was missing information or that there was a reasonable explanation.
-However, after some discussions on the forum / in private DMs with HLI staff, I found it difficult to update meaningfully towards believing this statement was a sufficiently well-justified one. I think a fairly charitable interpretation would be something like “this claim was too strong, it is attributable to careless communication, but unintentional.”
- ^
Quotes above do not imply any particular views of commentors referenced.
- ^
I have not done this for this message, as I view it as largely a compilation of existing messages that may help provide more context.
A commonly used model in the trust literature (Mayer et al., 1995) is that trustworthiness can be broken down into three factors: ability, benevolence, and integrity.
RE: domain specific, the paper incorporates this under ‘ability’:
The domain of the ability is specific because the trustee may be highly competent in some technical area, affording that person trust on tasks related to that area. However, the trustee may have little aptitude, training, or experience in another area, for instance, in interpersonal communication. Although such an individual may be trusted to do analytic tasks related to his or her technical area, the individual may not be trusted to initiate contact with an important customer. Thus, trust is domain specific.
There are other conceptions but many of them describe something closer to trust that is domain specific rather than generalised.
...All of these are similar to ability in the current conceptualization. Whereas such terms as expertise and competence connote a set of skills applicable to a single, fixed domain (e.g., Gabarro’s interpersonal competence), ability highlights the task- and situation-specific nature of the construct in the current model.
This is a conversation I have a fair amount when I talk to non-EA + non-medical friends about work, some quick thoughts:
If someone asks me Qs around DALYs at all (i.e. “why measure”), I would point to general cases where this happens fairly uncontroversially, e.g.:
-If you were in charge of the health system, how would you choose to distribute the resources you get?
-If you were building a hospital, how would you go about choosing how to allocate your wards to different specialties?
-If you were in an emergency waiting room and you had 10 people in the waiting room, how would you choose who to see first?
These kinds of questions entail some kind of “diverting resources from one person to another” in a way that is pretty understandable (though they also point to reasonable considerations for why you might not only use DALYs in those contexts)
If someone is challenging me over using DALYs in context of it being a measurement system that is potentially ableist, then I generally just agree—it is indeed ableist by some framings![1]
Though, often in these conversations the underlying theme isn’t necessarily a “I have a problem with healthcare prioritisation” but a general sense that disabled folk aren’t receiving enough resources for their needs—so when having these conversations it’s important to acknowledge that disabled folk do just face a lot more challenges navigating the healthcare system (and society generally) through no fault of their own, and that we haven’t worked out the answers to prioritising accordingly or for solving the barriers that disabled folk face.
If the claim goes further and is explicitly saying interventions for disabilities are more cost effective than the current DALYs approach gives them credit for, then that’s also worth considering—though the standard would correspondingly increase if they are suggesting a new approach to resource allocation—as Larks’ comment illustrates, it is difficult to find a singular approach / measure that doesn’t push against intuitions or have something problematic at the policy level.[2]
On how you’re feeling when talking about prioritising:
But then I feel like I’m implicitly saying something about valuing some people’s lives less than others, or saying that I would ultimately choose to divert resources from one person’s suffering to another’s.
This makes sense, though I do think there is a decent difference between the claim of “some people’s lives are worth more than others” and the claim of “some healthcare resources go further in one context than others (and thus justify the diversion)”. For example, I think if you never actively deprioritised anyone you would end up implicitly/passively prioritising based on things like [who can afford to go to the hospital / who lives closer / other access constraints]. But these are going to be much less correlated to what people care about when they say “all lives are equal”.
But if we have data on what the status quo is, then “not prioritising” / “letting the status quo happen” is still a choice we are making! And so we try to improve on the status quo and save more lives, precisely because we don’t think the 1000 patients on diabetes medication are worth less than the one cancer patient on a third-line immunotherapy.
- ^
E.g., for DALYs, the disability weight of 1 person with (condition A+B) is mathematically forced to be lower than the combined disability weight of two separate individuals with condition A and condition B respectively. That means for any cure of condition A, those who have only condition A would theoretically be prioritised under the DALY framework over those who have other health issues (e.g. have a disability). While I don’t have a good sense of when/if this specific part of the DALY framework has impacted resource allocation in practice, it is important to acknowledge the (many!) limitations the measures we use have.
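(To illustrate the footnoted point, here's a quick sketch using the standard multiplicative way of combining disability weights for comorbidities; the weights below are made up for illustration, not real GBD values:)

```python
# Multiplicative combination of disability weights for comorbid conditions
# (hypothetical weights; actual GBD values differ).
def combined_dw(dw_a, dw_b):
    """Disability weight for one person who has both condition A and B."""
    return 1 - (1 - dw_a) * (1 - dw_b)

dw_a, dw_b = 0.3, 0.4
both = combined_dw(dw_a, dw_b)       # 0.58, less than 0.3 + 0.4 = 0.7

# Disability burden averted per year by curing condition A:
averted_if_only_a = dw_a             # 0.30 for someone with A alone
averted_if_comorbid = both - dw_b    # 0.18 for someone who also has B
```

Because curing A averts fewer DALYs for the person who also has condition B, a pure DALY-maximising allocation would, all else equal, prioritise the person with A alone.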
- ^
Also, different folks within the disability community have a wide range of views around what it means to live with a disability / be a disabled person (e.g. functional VS social models of disability), so it’s not actually clear that e.g., WELLBYs would necessarily lead to more healthcare resources in that direction, depending on which groups you were talking to.
- ^
Thanks for writing this! RE: We would advise against working at Conjecture
We think there are many more impactful places to work, including non-profits such as Redwood, CAIS and FAR; alignment teams at Anthropic, OpenAI and DeepMind; or working with academics such as Stuart Russell, Sam Bowman, Jacob Steinhardt or David Krueger. Note we would not in general recommend working at capabilities-oriented teams at Anthropic, OpenAI, DeepMind or other AGI-focused companies.
Additionally, Conjecture seems relatively weak for skill building [...] We expect most ML engineering or research roles at prominent AI labs to offer better mentorship than Conjecture. Although we would hesitate to recommend taking a position at a capabilities-focused lab purely for skill building, we find it plausible that Conjecture could end up being net-negative, and so do not view Conjecture as a safer option in this regard than most competing firms.
I don’t work in AI safety and am not well-informed on the orgs here, but did want to comment on this as this recommendation might benefit from some clarity about who the target audience is.
As written, the claims sound something like:
CAIS et al., alignment teams at Anthropic et al., and working with Stuart Russell et al., are better places to work than Conjecture
Though not necessarily recommended, capabilities research at prominent AI labs is likely to be better than working at Conjecture for skill building, since Conjecture is not necessarily safer.
However:
The suggested alternatives don’t seem like they would be able to absorb a significant amount of additional talent, especially given the increase in interest in AI.
I have spoken to a few people working in AI / AI field building who perceive mentoring to be a bottleneck in AI safety at the moment.
If both of the above are true, what would your recommendation be to someone who had an offer from Conjecture, but not your recommended alternatives? E.g., choosing between independent research funded by LTFF VS working for Conjecture?
Just seeking a bit more clarity about whether this recommendation is mainly targeted at people who might have a choice between Conjecture and your alternatives, or whether this is a blanket recommendation that one should reject offers from Conjecture, regardless of seniority and what their alternatives are, or somewhere in between.
Thanks again!
Some very quick thoughts from EY’s TIME piece from the perspective of someone ~outside of the AI safety work. I have no technical background and don’t follow the field closely, so likely to be missing some context and nuance; happy to hear pushback!
Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
My immediate reaction when reading this was something like “wow, is this representative of AI safety folks? Are they willing to go to any lengths to stop AI development?”. I’ve heard anecdotes of people outside of all this stuff saying this piece reads like a terrorist organisation, for example, which is a stronger term than I’d use, but I think suggestions like this do unfortunately play into potential comparisons to ecofascists.
As someone seen publicly to be a thought leader and widely regarded as a founder of the field, there are some risks to this kind of messaging. It’s hard to evaluate how this trades off, but I definitely know communities and groups that would be pretty put off by this, and it’s unclear how much value the sentences around willingness to escalate nuclear war are actually adding.
It’s an empirical Q how to trade off between risks from nuclear war and risks from AI, but the claim that “preventing AI extinction is a priority above a nuclear exchange” is ~trivially true; the reverse is also true: “preventing extinction from nuclear war is a priority above preventing AI training runs”. Given the difficulty of illustrating and defending to the general public a position that the risks of AI training runs are substantially higher than those of a nuclear exchange, I would have erred on the side of caution when saying things that are as politically charged as advocating for nuclear escalation (or at least something that can be interpreted as such).
I wonder which superpower EY trusts to properly identify a hypothetical “rogue datacentre” that’s worthy of a military strike for the good of humanity, or whether this will just end up with parallels to other failed excursions abroad ‘for the greater good’ or to advance individual national interests.
If nuclear weapons are a reasonable comparison, we might expect limitations to end up with a few competing global powers to have access to AI developments, and countries that do not. It seems plausible that criticism around these treaties being used to maintain the status quo in the nuclear nonproliferation / disarmament debate may be applicable here too.
Unlike nuclear weapons (though nuclear power may weaken this somewhat), developments in AI have the potential to help immensely with development and economic growth.
Thus the conversation may eventually bump something that looks like:
Richer countries / first movers that have obtained significant benefits of AI take steps to prevent other countries from catching up.[1]
Rich countries using the excuse of preventing AI extinction as a guise to further national interests
Development opportunities from AI for LMICs are similarly hindered, or only allowed in a way that is approved by the first movers in AI.
Given the above, and that conversations around and tangential to AI risk already receive some pushback from the Global South community for distracting and taking resources away from existing commitments to UN Development Goals, my sense is that folks working in AI governance / policy would likely strongly benefit from scoping out how these developments are affecting Global South stakeholders, and how to get their buy-in for such measures.
(disclaimer: one thing this gestures at is something like—“global health / development efforts can be instrumentally useful towards achieving longtermist goals”[2], which is something I’m clearly interested in as someone working in global health. While it seems rather unlikely that doing so is the best way of achieving longtermist goals on the margin[3], it doesn’t exclude some aspect of this in being part of a necessary condition for important wins like an international treaty, if that’s what is currently being advocated for. It is also worth mentioning because I think this is likely to be a gap / weakness in existing EA approaches).
In our new report, The Elephant in the Bednet, we show that the relative value of life-extending and life-improving interventions depends very heavily on the philosophical assumptions you make. This issue is usually glossed over and there is no simple answer.
We conclude that the Against Malaria Foundation is less cost-effective than StrongMinds under almost all assumptions. We expect this conclusion will similarly apply to the other life-extending charities recommended by GiveWell.
In suggesting James quote these together, it sounds like you’re saying something like “this is a clear caveat to the strength of recommendation behind StrongMinds, HLI doesn’t recommend StrongMinds as strongly as the individual bullet implies, it’s misleading for you to not include this”.
But in other places HLI’s communication around this takes on a framing of something closer to “The cost effectiveness of AMF, (but not StrongMinds) varies greatly under these assumptions. But the vast majority of this large range falls below the cost effectiveness of StrongMinds”. (extracted quotes in footnote)[1]
As a result of this framing, despite the caveat that HLI “[does] not advocate for any particular view”, I think it’s reasonable to interpret this as being strongly supportive of StrongMinds, which can be true even if HLI does not have a formed view on the exact philosophical view to take.[2]
If you did mean the former (that the bullet about philosophical assumptions is primarily included as a caveat to the strength of recommendation behind StrongMinds), then there is probably some tension here between (emphasis added):
-”the relative value of life-extending and life-improving interventions depends very heavily on the philosophical assumptions you make...there is no simple answer”, and
-”We conclude StrongMinds > AMF under almost all assumptions”
Additionally I think some weak evidence to suggest that HLI is not as well-caveated as it could be is that many people (mistakenly) viewed HLI as an advocacy organisation for mental health interventions. I do think this is a reasonable outside interpretation based on HLI’s communications, even though this is not HLI’s stated intent. For example, I don’t think it would be unreasonable for an outsider to read your current pinned thread and come away with conclusions like:
“StrongMinds is the best place to donate”,
“StrongMinds is better than AMF”,
“Mental health is a very good place to donate if you want to do the most good”,
“Happiness is what ultimately matters for wellbeing and what should be measured”.
If these are not what you want people to take away, then I think pointing to this bullet point caveat doesn’t really meaningfully address this concern—the response kind of feels something like “you should have read the fine print”. While I don’t think it’s necessary for HLI to take a stance on specific philosophical views, I do think it becomes an issue if people are (mis)interpreting HLI’s stance based on its published statements.
(commenting in personal capacity etc)
- ^
-We show how much cost-effectiveness changes by shifting from one extreme of (reasonable) opinion to the other. At one end, AMF is 1.3x better than StrongMinds. At the other, StrongMinds is 12x better than AMF.
-StrongMinds and GiveDirectly are represented with flat, dashed lines because their cost-effectiveness does not change under the different assumptions.
-As you can see, AMF’s cost-effectiveness changes a lot. It is only more cost-effective than StrongMinds if you adopt deprivationism and place the neutral point below 1.
- ^
As you’ve acknowledged, comments like “We’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money.” perhaps add to the confusion.
That makes sense, thanks for clarifying!
If I understand correctly, the updated figures should then be:
For 1 person being treated by StrongMinds (excluding all household spillover effects) to be worth the WELLBYs gained for a year of life[1] with HLI’s methodology, the neutral point needs to be at least 4.95-3.77 = 1.18.
If we include spillover effects of StrongMinds (and use the updated / lower figures), then the benefit of 1 person going through StrongMinds is 10.7 WELLBYs.[2] Under HLI’s estimates, this is equivalent to more than two years of wellbeing benefits from the average life, even if we set the neutral point at zero. Using your personal neutral point of 2 would suggest the intervention for 1 person including spillovers is equivalent to >3.5 years of wellbeing benefits. Is this correct or am I missing something here?
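To make the arithmetic above explicit, here is a minimal sketch using the figures quoted in this comment (the variable names are mine, not HLI’s, and this is not HLI’s own model):

```python
# Figures as quoted above (assumptions for illustration, not HLI's code)
avg_life_satisfaction = 4.95   # average 0-10 life-satisfaction score, 6 African countries
individual_effect = 3.77       # WELLBYs per person treated, excluding spillovers
total_effect = 10.7            # WELLBYs per person treated, including household spillovers

# Neutral point at which treating one person equals one year of average life
breakeven_neutral_point = avg_life_satisfaction - individual_effect
print(round(breakeven_neutral_point, 2))  # 1.18

# Years-of-life equivalent of the spillover-inclusive effect
years_if_neutral_0 = total_effect / (avg_life_satisfaction - 0)
years_if_neutral_2 = total_effect / (avg_life_satisfaction - 2)
print(round(years_if_neutral_0, 2))  # 2.16 (i.e. more than two years)
print(round(years_if_neutral_2, 2))  # 3.63 (i.e. more than 3.5 years)
```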
1.18 as the neutral point seems pretty reasonable, though the idea that 12 hours of therapy for an individual is worth the wellbeing benefits of 1 year of an average life when considering only the impacts on them, and anywhere between 2 and 3.5 years of life when including spillovers, does seem rather unintuitive to me, despite my view that we should probably do more work on subjective wellbeing measures on the margin. I’m not sure if this means:
WELLBYs as a measure can’t capture what I care about in a year of healthy life, so we should not use solely WELLBYs when measuring wellbeing;
HLI isn’t applying WELLBYs in a way that captures the benefits of a healthy life;
The existing way of estimating 1 year of life via WELLBYs is wrong in some other way (e.g. the 4.95 assumption is wrong, the 0-10 scale is wrong, the ~1.18 neutral point is wrong);
HLI have overestimated the benefits of StrongMinds;
I have a very poorly calibrated view of how much 12 hours of therapy / a year of life is worth, though this seems less likely.
Would be interested in your thoughts on this / let me know if I’ve misinterpreted anything!
- ^
More precisely, the average wellbeing benefits from 1 year of life from an adult in 6 African countries
- ^
Thanks Joel.
this comparison, as it stands, doesn’t immediately strike me as absurd. Grief has an odd counterfactual. We can only extend lives. People who’re saved will still die and the people who love them will still grieve. The question is how much worse the total grief is for a very young child (the typical beneficiary of e.g., AMF) than the grief for the adolescent, or a young adult, or an adult, or elder they’d become
My intuition, which is shared by many, is that the badness of a child’s death is not merely due to the grief of those around them. So presumably the question should not be comparing just the counterfactual grief of losing a very young child VS an [older adult], but also “lost wellbeing” from living a net-positive-wellbeing life in expectation?
I also just saw that Alex claims HLI “estimates that StrongMinds causes a gain of 13 WELLBYs”. Is this for 1 person going through StrongMinds (i.e. ~12 hours of group therapy), or something else? Where does the 13 WELLBYs come from?
I ask because if we are using HLI’s estimates of WELLBYs per death averted, and use your preferred estimate for the neutral point, then 13 / (4.95-2) is >4 years of life. Even if we put the neutral point at zero, this suggests 13 WELLBYs is worth >2.5 years of life.[1]
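For reference, the back-of-envelope conversion I’m doing here, using the 13-WELLBY figure and HLI’s 4.95 average (this is my own sketch of the calculation, not HLI’s model):

```python
# Back-of-envelope conversion (figures as quoted above; assumptions, not HLI's code)
wellbys_per_treatment = 13.0    # claimed WELLBY gain per person through StrongMinds
avg_life_satisfaction = 4.95    # average 0-10 score, 6 African countries

years_equiv_neutral_2 = wellbys_per_treatment / (avg_life_satisfaction - 2)
years_equiv_neutral_0 = wellbys_per_treatment / avg_life_satisfaction
print(round(years_equiv_neutral_2, 1))  # 4.4 years of life, neutral point at 2
print(round(years_equiv_neutral_0, 1))  # 2.6 years of life, neutral point at 0
```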
I think I’m misunderstanding something here, because GiveWell claims “HLI’s estimates imply that receiving IPT-G is roughly 40% as valuable as an additional year of life per year of benefit or 80% of the value of an additional year of life total.”
Can you help me disambiguate this? Apologies for the confusion.
- ^
13 / 4.95
- ^
To be a little more precise:
HLI’s estimates imply, for example, that a donor would pick offering StrongMinds’ intervention to 20 individuals over averting the death of a child, and that receiving StrongMinds’ program is 80% as good for the recipient as an additional year of healthy life.
I.e., is it your view that 4-8 weeks of group therapy (~12 hours) for 20 people is preferable to averting the death of a child?
it seems low cost and potentially quite valuable to put up a title and perhaps just a one-para abstract of all the projects you have done/are doing
This is a great suggestion, thanks!
Thanks for this! Yeah, the research going out of date is definitely a relevant concern in some faster-moving areas. RE: easiest to put it up ~immediately—I think if our reports for clients could just be copy-pasted into a public-facing version for a general audience this would be true, but in practice this is often not the case, e.g. because the client has some underlying background knowledge that it would be unreasonable to expect the public to have, or because we need to run quotes by interviewees to see if they’re happy with being quoted publicly, etc.
There’s a direct tradeoff here between spending time on turning a client-facing report to a public-facing version and just starting the next client-facing report. In most cases we’ve just prioritised the next client-facing report, but it is definitely something we want to think more about going forward, and I think our most recent round of hires has definitely helped with this.
In an ideal world the global health team just has a lot of unrestricted funding to use so we can push these things out in parallel etc, in part because it is one way (among many others we’d like to explore) of helping us increase the impact of research we’ve already done, and also because this would provide extra feedback loops that can improve our own process + work.
Thanks for engaging! I’ll speak for myself here, though others might chime in or have different thoughts.
How do you determine if you’re asking the right questions?
Generally we ask our clients at the start something along the lines of “what question is this report trying to help answer for you?”. Often this is fairly straightforward, like “is this worth funding”, or “is this worth more researcher hours in exploring”. And we will often push back or add things to the brief to make sure we include what is most decision-relevant within the timeframe we are allocated. An example of this is when we were asked to look into the landscape of philanthropic spending for cause area X, but it turned out that the non-philanthropic spending might also be pretty decision-relevant, so we suggested incorporating that into the report.
We have multiple check-ins with our client to make sure the information we get is the kind of information they want, and to have opportunities to pivot if new questions come up as a result of what we find that might be more decision-relevant.
What is your process for judging information quality?
I don’t think we have a formalised organisational-level process around this; I think this is just fairly general research appraisal stuff that we do independently. There’s a tradeoff between following a thorough process and speed: it might be clear on skimming that a given study should update us much less because of its recruitment or allocation methods etc, but if we needed to e.g. run every study we read through the MMAT, this would be pretty time-consuming. In general we try to transparently communicate what we’ve done in check-ins with each other, with our client, and in our reports, so they’re aware of limitations in the search and in our conclusions.
Do you employ any audits or tools to identify/correct biases (e.g. what studies you select, whom you decide to interview, etc.)?
Can you give me an example of a tool to identify biases in the above? I assume you aren’t referring to tools that we can use to appraise individual studies/reviews but one level above that?
RE: interviews, one approach we frequently take is to look for key papers or reports in the field that are most likely to be decision-relevant and reach out to their authors. Sometimes we will intentionally aim to find views that push us in opposing directions on the potential decision. Other times we just need technical expertise in an area that our team doesn’t have. Generally we will reach out to the client with the list to make sure they’re happy with the choices we’ve made, which is intended to reduce doubling up on the same expert, but also serves as a checkpoint I guess.
We don’t have audits but we do have internal reviews, though admittedly I think our current process is unlikely to pick up issues around interviewee selection unless the reviewer is well connected in this space, and it will similarly likely only pick up issues in study selection if the reviewer knows of specific papers or has some strong priors around the existence of stronger evidence on the topic. My guess is that the likelihood of an audit making meaningful changes to our report is sufficiently low that, if it takes more than a few days, it just wouldn’t be worth the time for most of the reports we are doing. That being said, it might be a reasonable thing to consider as part of a separate retrospective review of previous reports etc! Do you have any suggestions here, or are there good approaches you know about / have seen?
I think it is entirely possible that people are being unkind because they updated too quickly on claims from Ben’s post that are now being disputed, and I’m grateful that you’ve written this (ditto chinscratch’s comment) as a reminder to be empathetic. That being said, there are also some reasons people might be less charitable than you are for reasons that are unrelated to them being unkind, or the facts that are in contention:
Without commenting on whether Ben’s original post should have been approached better or worded differently or was misleading etc, this comment from the Community Health/Special Projects team might add some useful additional context. There are also previous allegations that have been raised.[1]
Perhaps you are including both of these as part of the same set of allegations, but some may suggest that not being permitted to run sessions / recruit at EAGs, and attendance being considered for blocking (especially given the reference class of actions that have prompted various responses that you can see here), is qualitatively important and may affect whether commenters are being charitable or not (as opposed to if they just considered the contents of Ben’s post vs Nonlinear (NL)’s response). Of course, this depends on how much you think the Community Health/Special Projects team are trustworthy with their judgement / investigation, or how likely this is all just an information cascade etc.
It is possible for altruistic people to be poor managers, poor leaders, make bad decisions about professional boundaries, have a poor understanding of power dynamics, or indeed, be abusive. The extent to which people at NL are altruistic is (afaict) not a major point of contention, and it is possible to not update about how altruistic someone is while also wanting to hold them accountable to some reasonable standard like “not being abusive or manipulative towards people you manage”.
The claims in question from Alice/Chloe/Ben are not anonymous, the identities of Alice and Chloe are known to the Nonlinear team.
Independent of my personal views on these issues, I do think the pushback around ‘stylistic mistakes’ is reasonable insofar as people interpret these to be indicative of something concerning about NL’s approach towards managing staff / criticism / conflict (1, 2, 3), rather than e.g. just being nitpicky about tone, though I appreciate both interpretations are plausible.
I think (much) less is more in this case.[2] I think there are parts of this current post that feel more subjective and not supported by facts, and may be reasonably interpreted by a cynical outsider to look like a distraction or a defensive smear campaign. I think these choices are counterproductive (both for a truth-seeking outsider, and for NL’s own interests), especially given the allegations of frame control and being retaliatory.
There are other parts that might similarly be reasonably interpreted to range from irrelevant (Alice’s personal drug use habits), to unproductive (links to Kathy Forth), to misleading (inclusion of photos, inconsistent usage of quotation marks, unnecessary paraphrasing, usage of quotes that miss the full context). I disagreed with the approaches here, though I acknowledge there were competing opinions and I wasn’t privy to the internal discussions that led to the decisions.
I think a cleaner version of this would have probably been something 5 to 10x shorter (not including the appendix), and looked something like:[3]
Apology for harms done
Acknowledgement of which allegations are seen as the most major (much closer to top 3-5 than all 85)
Responses to major allegations, focusing only on factual differences and claims that are backed up by ~irrefutable evidence
Charitable interpretations of Alice/Chloe/Ben’s position, despite above factual disagreement (what kinds of things need to be true for their allegations to be plausibly reasonable or fair from their perspective),
Lessons learnt, and things NL will do differently in future (some expression of self-awareness / reflection)
An appendix containing a list of unresolved but less critical allegations
Disclaimer: I offered to (and did) help review an early draft, in large part because I expected the NL team to (understandably!) be in panic mode after Ben’s post/getting dogpiled, and I wanted further community updates to be based on as much relevant information as was possible.
This footnote added in response to Jeff’s comment: I agree that it’s likely not double counting, because the story there appears to be one where Kat left the working relationship, which is inconsistent with the accounts of Alice / Chloe’s situations, but also makes it unlikely that the “current employee of NL / Kat” hypothesis is correct.
Perhaps hypocritical given the length of this comment
Acknowledging that I have no PR expertise