Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.
Feel free to reach out if you think there’s anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you’re a medical student / junior doctor reconsidering your clinical future, or if you’re quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.
Outside of EA, I do a bit of end of life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed up classical violinist and Oly-lifter.
All comments in personal capacity unless otherwise stated.
bruce
Speaking for myself / not for anyone else here:
My (highly uncertain + subjective) guess is that each lethal infection is probably worse than 0.5 host-year equivalents, but the number of worms per host animal could probably vary significantly.
That being said, personally I am fine with the assumption of modelling ~0 additional counterfactual suffering for screwworms that are never brought into existence, rather than e.g. an eradication campaign that involves killing existing animals.
I’m unsure how to think about the possibility that screwworms might be living significantly net-positive lives, such that this trumps the benefit of reduced suffering from screwworm-related deaths, but I’d personally prefer stronger evidence of wellbeing or harms on the worm’s end to justify inaction here (i.e. not looking into the possibility/feasibility of this).
Again, speaking only for myself—I’m not personally fixated on either gene drives or sterile insect approaches! I am also very interested in finding out reasons not to proceed with the project, or to find alternative approaches, which doesn’t preclude the possibility that the net welfare of screwworms should be more heavily weighed as a consideration. That being said, I would be surprised if something like “we should do nothing to alleviate host animal suffering because their suffering can provide more utils for the screwworm” were a sufficiently convincing reason to not do more work / investigation in this area (for non-utilitarian reasons), though I understand there is a set of assumptions / views one might hold that could drive disagreement here.[1]
- ^
If a highly uncertain BOTEC showed you that torturing humans would bring more utility to digital beings than the suffering incurred on the humans, would you endorse allowing this? At what ratio would you change your mind, and how many OOMs of uncertainty on the BOTEC would you be OK with?
Or—would you be in favour of taking this further and spreading the screwworm globally simply because it provides more utils, rather than just not eradicating the screwworm?
- ^
Launching Screwworm-Free Future – Funding and Support Request
It’s fine to apply regardless; there’s one application form for all 2025 in-person EAGs. You’ll likely be sent an email separately closer to the time reminding you that you can register for the East Coast EAG, and be directed to a separate portal where you can do this without needing to apply again.
Hey team—are you happy to share a bit more about who would be involved in these projects, and their track record (or Whylome’s more broadly)? I only spent a minute or so on this, but I can’t find any information online beyond your website and these links related to SMTM’s “exposure to subclinical doses of lithium is responsible for the obesity epidemic” hypothesis (1, 2).
More info on how much money you’re looking for the above projects would also be useful.
Ah my bad, I meant extreme pain above there as well, edited to clarify! I agree it’s not a super important assumption for the BOTEC in the grand scheme of things though.
However, if one wants to argue that I overestimated the cost-effectiveness of SWP, one has to provide reasons for my guess overestimating the intensity of excruciating pain.
I don’t actually argue for this in either of my comments.[1] I’m just saying that it sounds like if I duplicated your BOTEC, and changed this one speculative parameter to 2 OOMs lower, an observer would have no strong reason to choose one BOTEC over another just by looking at the BOTEC alone. Expressing skepticism of an unproven claim doesn’t produce a symmetrical burden of proof on my end!
Mainly just from a reasoning transparency point of view, I think it’s worth fleshing out what these assumptions imply and what is grounding these best guesses.[2] In part this is because I personally want to know how much I should update based on your BOTEC; in part because knowing your reasoning might help me better argue why you might (or might not) have overestimated the intensity of excruciating pain if I knew where your ratio came from (this is why I was checking the maths, seeing if these figures were correct, and asking if there’s stronger evidence, before critiquing the 100k figure); and because I think other EAF readers, as well as the broader, lower-context audience of EA bloggers, would benefit from this too.
If you did that, SWP would still be 434 (= 43.4*10^3*10^3/(100*10^3)) times as cost-effective as GiveWell’s top charities.
Yeah, I wasn’t making any inter-charity comparisons or claiming that SWP is less cost-effective than GW top charities![3] But since you mention it, it wouldn’t be surprising to me if losing 2 OOMs might make some donors favour other animal welfare charities over SWP for example—but again, the primary purpose of these comments is not to litigate which charity is the best, or whether this is better or worse than GW top charities, but mainly just to explore a bit more around what is grounding the BOTEC, so observers have a good sense on how much they should update based on how compelling they find the assumptions / reasoning etc.
I think it is also worth wondering about whether you truly believe that updated intensity. Do you think 1 day of fully healthy life plus 86.4 s (= 0.864*100*10^3/100) of scalding or severe burning events in large parts of the body, dismemberment, or extreme torture would be neutral?
Nope! I would rather give up 1 day of healthy life than 86 seconds of this description. But this varies depending on the timeframe in question.
For example, I’d probably be willing to endure 0.86 seconds of this for 14 minutes of healthy life, and I would definitely rather endure 0.086 seconds of this than give up 86 seconds of healthy life.
And using your assumptions (ratio of 100k), I would easily rather have 0.8 seconds of this than give up 1 day of healthy life, but if I had to endure many hours of this I could imagine my tradeoffs approaching, or even exceeding 100k.
I do want to mention that I think it’s useful that someone is trying to quantify these comparisons, and I’m grateful for this work. I also want to emphasise that these comments are about making the underlying reasoning more transparent / understanding the methodology that leads to the assumptions in the BOTEC, rather than any kind of personal criticism!
So I suppose I would be wary of saying that GiveDirectly now have 3–4x the WELLBY impact relative to Vida Plena—or even to say that GiveDirectly have any more WELLBY impact relative to Vida Plena
Ah right—yeah, I’m not making either of these claims. I’m just saying that if the previous claim (from VP’s predictive CEA) was that “Vida Plena...is 8 times more cost-effective than GiveDirectly”, and GD has since been estimated to be 3-4x more cost-effective than it was at the time the predictive CEA was published, then we should discount the 8x claim downwards somewhat (though not necessarily by 3-4x).
I think one could probably push back on whether 7.5 minutes of [extreme] pain is a reasonable estimate for a person who dies from malaria, but I think the bigger potential issue is still that the result of the BOTEC seems highly sensitive to the “excruciating pain is 100,000 times worse than fully healthy life is good” assumption—for both air asphyxiation and ice slurry, the time spent under excruciating pain makes up more than 99.96% of the total equivalent loss of healthy life.[1]
I alluded to this on your post, but I think your results imply you would prefer to avert 10 shrimp days of excruciating pain (e.g. air asphyxiation / ice slurry) over saving 1 human life (51 DALYs).[2]
If I use your assumption and also value human excruciating pain as 100,000 times as bad as healthy life is good,[3] then this means you would prefer to save 10 shrimp days of excruciating pain (using your air asphyxiation figures) over 4.5 human hours of excruciating pain,[4] and your shrimp to human ratio is less than 50:1 - that is, you would rather avert 50 shrimp minutes of excruciating pain than 1 human minute of excruciating pain.
To be clear, this isn’t a claim that one shouldn’t donate to SWP, but just that if you do bite the bullet on those numbers above then I’d be keen to see some stronger justification beyond “my guess” for a BOTEC that leads to results that are so counterintuitive (like I’m kind of assuming that I’ve missed a step or OOMs in the maths here!), and is so highly sensitive to this assumption.[5]
- ^
Air asphyxiation: 1 - (5.01 / 12,605.01) = 0.9996
Ice slurry: 1 - (0.24 / 604.57) = 0.9996
- ^
1770 * 7.5 = 13275 shrimp minutes
13275 / 60 / 24 = 9.21875 shrimp days
- ^
There are arguments in either direction, but that’s probably not a super productive line of discussion.
- ^
51 * 365.25 * 24 * 60 = 26,823,960 human minutes
26,823,960 / 100,000 = 268.2396 human minutes of excruciating pain
268.2396 / 60 = 4.47 human hours of excruciating pain
13275 / 268.2396 = 49.49 (shrimp : human ratio; see the sketch after these footnotes)
- ^
Otherwise I could just copy your entire BOTEC, and change the bottom figure to 1000 instead of 100k, and change your topline results by 2 OOMs.
Annoying pain is 10% as intense as fully healthy life.
Hurtful pain is as intense as fully healthy life.
Disabling pain is 10 times as intense as fully healthy life.
Excruciating pain is 100 k times as intense as fully healthy life.
- ^
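For anyone wanting to check the footnote arithmetic above, here is a minimal Python sketch that just reproduces it; the inputs (the 1,770 shrimp figure, 7.5 minutes of excruciating pain per death, 51 DALYs per human life, and the 100,000:1 pain ratio) are taken from the BOTEC being discussed, not independently verified.

```python
# Minimal reproduction of the footnote arithmetic above. All inputs come from the
# BOTEC under discussion; nothing here is a new estimate.

PAIN_RATIO = 100_000              # "excruciating pain is 100,000x as bad as healthy life is good"
SHRIMP = 1_770                    # shrimp figure used in the footnote
MIN_PAIN_PER_SHRIMP = 7.5         # minutes of excruciating pain per shrimp death
DALYS_PER_LIFE = 51               # DALYs averted per human life saved

shrimp_pain_minutes = SHRIMP * MIN_PAIN_PER_SHRIMP          # 13,275 shrimp minutes
shrimp_pain_days = shrimp_pain_minutes / 60 / 24            # ~9.2 shrimp days

human_healthy_minutes = DALYS_PER_LIFE * 365.25 * 24 * 60   # ~26.8 million healthy minutes
human_pain_minutes = human_healthy_minutes / PAIN_RATIO     # ~268 minutes of excruciating pain
human_pain_hours = human_pain_minutes / 60                  # ~4.5 hours

print(f"{shrimp_pain_minutes:,.0f} shrimp minutes (~{shrimp_pain_days:.1f} shrimp days)")
print(f"1 human life ~ {human_pain_minutes:.1f} min (~{human_pain_hours:.2f} h) of excruciating pain")
print(f"Implied shrimp:human ratio: {shrimp_pain_minutes / human_pain_minutes:.1f}")
```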
I might be misunderstanding you, but are you saying that after reading the GD updates, we should update on VP equally to GD / that we should expect the relative cost-effectiveness ratio between the two to remain the same?
Thanks for the response, and likewise—hope you’ve been well! (Sorry I wasn’t sure if it was you or someone else on the account).
I agree that it is pretty reasonable to stick with the same benchmark, but I think this means it should be communicated accordingly: VP is sometimes referring to a benchmark and other times to the GD programme, whereas GW is sticking to the same benchmark for its cost-effectiveness analyses but updating its estimates of GD’s programmes.[1]
E.g. the predictive CEA (pg 7) referenced says:
“This means that a $1000 donation to Vida Plena would produce 58 WELLBYs, which is 8 times more cost-effective than GiveDirectly (a charity that excels in delivering cash transfers—simply giving people money—and a gold standard in effective altruism)”[2]
I think people would reasonably misinterpret this to mean you are referring to the GD programme, rather than the GW benchmark.[3] Again, I know this is a v recent update and so I hadn’t expected it to be updated already! But just flagging this as a potential source of confusion in the future.
Separately, I just thought I’d register interest in a more up-to-date predictive CEA that comes before your planned 2026 analysis: in part because there’s decent reason to do so (though I’m not making the stronger claim that this is more important than other things on your plate!), in part because 2026 is a while away, and in part because it’s plausibly decision-relevant for potential donors if they’re not sure the extent to which the HLI updates are applicable to VP.
- ^
“Thus, we will be using our historic benchmark until we have thought it through. For now, you can think of our benchmark as “GiveWell’s pre-2024 estimate of the impacts of cash transfers in Kenya,” with GiveDirectly’s current programs in various countries coming in at 3 to 4 times as cost-effective as that benchmark.”
- ^
The summary table on the same page also just says “GiveDirectly”.
- ^
To VP’s credit, I think “eight times more cost-effective than the benchmark of direct cash transfers” in this post would likely be interpreted correctly in a high context setting (but I also think reasonably might not be, and so may still be worth clarifying).
- ^
Hey team, thanks for sharing this update!
A few comments, not intended as a knock on Vida Plena’s programme, but perhaps more relevant to how it’s communicated:
You can save a life as depression not only causes immense human suffering but is a deadly disease. Approximately 24% of the individuals we support are at high risk of suicide and as a result, face a 3- to 11-fold increased likelihood of dying by suicide within the next 30 days.
Given this is the first bullet under “helping a life flourish” I thought this might benefit from some clarification, because the vast majority of the value of this programme is likely not from suicide prevention, given low absolute rates of suicide.
From the same source: “at two years, the cumulative hazard of suicide death ranged from 0.12% in young adults to 0.18% in older adults.” Under unreasonably optimistic assumptions,[1] Vida Plena would prevent 1 suicide for every 500 participants / prevent a suicide for $116,500, which is somewhere between 21x and 39x less cost-effective than GiveWell top charities.[2] More reasonable assumptions would drop this upper bound to 1 suicide prevented for every ~1,200 participants, or ~$272,000 per suicide prevented / ~50-90x less cost-effective than GW top charities.[3]
Given you hope to reach 2,000 people by the end of 2025 for $50,000, this suggests a reasonable upper bound is something like 2 additional suicides prevented.[4]
This isn’t a claim that the cost-effectiveness claims are necessarily incorrect, even with minimal suicide prevention. A quick sense check RE: $462/DALY and 0.22 DALYs per participant would imply that Vida Plena would need to more than halve their cost per participant (from $233 down to $101), and then achieve results comparable to “~100% of people with severe / moderate / mild depression concluding the programme having gone down one level of severity, or something like ~5 points on the PHQ9 score (severe --> moderate; moderate --> mild; mild --> no depression)”.[5] This is well within your listed results—though as you note in your annual report, these have some fairly significant sources of bias and (IMO) probably should not be taken at face value.
Some other comments:
The NBER paper quoted in “g-IPT has also demonstrated long-term positive effects” looked at the “Healthy Activity Programme” (HAP)[6] and the “Thinking Healthy Programme Peer-Delivered” (THPP).[7] Neither of these are g-IPT programmes.
The minimal and unsustained results from the Baird RCT may be worth incorporating in an updated analysis, given the predictive CEA is from 2022.[8]
From the predictive CEA: “Vida Plena’s overall effect for a household is 7.18*0.75*0.83 = 4.49 (95% CI: 0.77, 31.04) WELLBYs per person treated”. HLI recently decreased their estimate for StrongMinds treatment effects by 80%, from 10.49 to 2.15 WELLBYs per treatment (also including household spillovers), and estimated StrongMinds to be “3.7x (previously 8x) as cost-effective as GiveDirectly”.
The cost-effectiveness of GiveDirectly has gone up by 3-4x (GW blog, GD blog). This is recent news and does not necessarily imply that WELLBYs will also go up by 3-4x (most of this increase is attributable to increased consumption), but it should warrant at least some discount.
- ^
Even if 100% (rather than 24%) of individuals were in the high-risk group (i.e. suicidal ideation nearly every day), even if you dropped 100% of individuals’ risk of suicide from 0.2% to zero (rather than reducing it by 3-11x or to baseline), and even if this effect persisted forever rather than just for the initial 30 days.
- ^
233 * 500 / 3000 = 38.83
233 * 500 / 5500 = 21.18 (assuming 1 prevented suicide = 1 life saved)
- ^
If 24% of your participants were high risk (7x risk, at 0.18%), and the other 76% of them were at half of that (3.5x risk, at 0.09%), and you successfully reduced 100% of participants to baseline (0.026%), you would prevent 1 suicide for every 1,169 participants, which comes to ~$272,000 per life saved, or ~50-90x less cost-effective than GW top charities (see also the sketch after these footnotes).
(0.18-0.026) * 0.24 + (0.09-0.026) * 0.76 = 0.0856
100 / 0.0856 = 1168.2
1168.2 * 233 = 272190.6
272190.6 / 3000 = 90.73
272190.6 / 5500 = 49.4892
- ^
It’s also worth noting these are cumulative hazards at 2 years rather than at 30 days, and the hazard ratios at 365 days are approximately halved compared to 30 days (1.7-5.7 instead of 3.3-10.8), so these figures are plausibly still optimistic by a further factor of a few.
- ^
Severe --> moderate depression is about 0.262 DALYs averted, moderate --> mild depression is about 0.251 DALYs averted, and mild --> no depression is about 0.145 DALYs averted.
- ^
HAP is described as “a psychological treatment based on behavioral activation...consist[ing] of 6 to 8 weekly sessions of 30 to 40 minutes each, delivered individually at participants’ homes or at the local PHC.”
- ^
THPP is a simplified version of a psychological intervention (THP) for treating perinatal depression that has been found to be effective in similar settings and is recommended by the WHO (Rahman et al., 2008, 2013; WHO, 2015; Baranov et al., 2020). While the original THP trials employed a full-fledged cognitive behavioral therapy (CBT) intervention, THPP was a simpler intervention focused on behavioral activation, as in the HAP trial described above. THPP was designed to be delivered by peer counselors, instead of community health workers as in previous trials.
- ^
[taken from here, emphasis added]:
- Our findings add to this evidence base by showing 12-month modest improvements of 20%-30% in rates of minimal depression for adolescents assigned to IPT-G, with these effects completely dissipating by the 24-month follow-up. We similarly find small short-term impacts on school enrollment, delayed marriage, desired fertility and time preferences, but fail to conclude that these effects persist two years after therapy.
- Given impact estimates of a reduction in the prevalence of mild depression of 0.054 pp for a period of one year, it implies that the cost of the program per case of depression averted was nearly USD 916, or 2,670 in 2019 PPP terms.
- This implies that ultimately the program cost USD PPP (2019) 18,413 per DALY averted (almost 8x Uganda’s GDP per capita).
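As a transparency aid, here is a minimal Python sketch that reproduces the two suicide-prevention bounds from the footnotes above; all inputs (cost per participant, cumulative hazards, and the $3,000-$5,500 per life saved range) are the same figures used in the footnotes, not new estimates.

```python
# Reproducing the two suicide-prevention bounds from the footnotes above.
# All inputs come from the comment; this is just the arithmetic, not an
# endorsement of the underlying assumptions.

COST_PER_PARTICIPANT = 233          # USD
GIVEWELL_RANGE = (3_000, 5_500)     # USD per life saved

# Upper bound: everyone's 2-year cumulative suicide hazard drops from 0.2% to 0.
optimistic_risk_drop = 0.002
participants_per_suicide = 1 / optimistic_risk_drop                    # 500
cost_per_suicide = participants_per_suicide * COST_PER_PARTICIPANT     # $116,500
print([round(cost_per_suicide / gw, 1) for gw in GIVEWELL_RANGE])      # ~39x and ~21x GW cost

# "More reasonable" bound: 24% high risk (0.18%), 76% at half that (0.09%),
# everyone reduced to the 0.026% baseline.
risk_drop_pp = (0.18 - 0.026) * 0.24 + (0.09 - 0.026) * 0.76           # 0.0856 percentage points
participants_per_suicide = 100 / risk_drop_pp                          # ~1,168
cost_per_suicide = participants_per_suicide * COST_PER_PARTICIPANT     # ~$272,000
print([round(cost_per_suicide / gw, 1) for gw in GIVEWELL_RANGE])      # ~91x and ~49x GW cost
```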
Without checking the assumptions / BOTEC in detail, I’ll just flag that this implies you are indifferent between saving 1 human life (not just averting 1 malaria death) and helping ~1000-1900 shrimp die via electrical stunning instead of dying via air asphyxiation / ice slurry.[1]
(This is not an anti-SWP comment tbc!)
- ^
Depending on whether you are using $3000 to $5500 per life saved;
15000 * 3000 / 43500 = 1034.5
15000 * 5500 / 43500 = 1896.6
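For transparency, the same arithmetic as a small Python sketch; my reading of the footnote is that 15,000 is shrimp helped per dollar and 43,500 is the claimed cost-effectiveness multiple relative to GiveWell top charities, but both figures are simply taken from the BOTEC under discussion rather than verified here.

```python
# Reproducing the footnote arithmetic above: at the claimed cost-effectiveness
# multiple, how many shrimp moved from asphyxiation / ice slurry to electrical
# stunning is being treated as equivalent to one human life saved?
# (Figures taken from the BOTEC under discussion, as I read it.)

SHRIMP_PER_DOLLAR = 15_000
CLAIMED_MULTIPLE_VS_GIVEWELL = 43_500

for cost_per_life in (3_000, 5_500):      # GiveWell-style $ per life saved range
    shrimp_equivalent = SHRIMP_PER_DOLLAR * cost_per_life / CLAIMED_MULTIPLE_VS_GIVEWELL
    print(f"${cost_per_life:,} per life saved -> ~{shrimp_equivalent:,.0f} shrimp stunned instead")
```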
I think that certain EA actions in ai policy are getting a lot of flak.
Also, I suspect that the current EA AI policy arm could find ways to be more diplomatic and cooperative
Would you be happy to expand on these points?
It sounds like you’re interpreting my claim to be “the Baird RCT is a particularly good proxy (or possibly even better than other RCTs on group therapy in adult women) for the SM adult programme’s effectiveness”, but this isn’t actually my claim here. While I think one could reasonably make some different, stronger (donor-relevant) claims based on the discussions on the forum and the Baird RCT results, mine are largely just: “it’s an important proxy”, “it’s worth updating on”, and “the relevant considerations/updates should be easily accessible on various recommendation pages”. I definitely agree that an RCT on the adult programme would have been better for understanding the adult programme.
(I’ll probably check out of the thread here for now, but good chatting as always Nick! hope you’re well)
Yes, because:
1) I think this RCT is an important proxy for StrongMinds (SM)‘s performance ‘in situ’, and worth updating on—in part because it is currently the only completed RCT of SM. Uninformed readers who read what is currently on e.g. the GWWC[1]/FP[2]/HLI websites might reasonably get the wrong impression of the evidence base behind the recommendation around SM (i.e. that there are no concerns sufficiently noteworthy to merit inclusion as a caveat). I think the effective giving community should have a higher bar for being proactively transparent here—it is much better to include (at minimum) a relevant disclaimer like this than to be asked questions by donors and respond that there wasn’t capacity to include one.[3]
2) If a SM recommendation is justified as a result of SM’s programme changes, this should still be communicated for trust-building purposes (e.g. “We are recommending SM despite [Baird et al RCT results], because …”), both for those who are on the fence about deferring, and for those who now have a reason to re-affirm their existing trust in EA org recommendations.[4]
3) Help potential donors make more informed decisions—for example, informed readers who may be unsure about HLI’s methodology and wanted to wait for the RCT results should not have to search this up themselves, or look for a fairly buried comment thread on a post from >1 year ago, in order to make this decision when looking at EA recommendations / links to donate. I don’t think it’s an unreasonable amount of effort compared to how much it may help. This line of reasoning may also apply to other evaluators (e.g. GWWC evaluator investigations).[5]
- ^
GWWC’s website currently says it only includes recommendations after reviewing them through their Evaluating Evaluators work, and their evaluation of HLI did not include any quality checks of HLI’s work itself, nor did it finalise a conclusion. Similarly, they say: “we don’t currently include StrongMinds as one of our recommended programs but you can still donate to it via our donation platform”.
- ^
Founders Pledge’s current website says:
We recommend StrongMinds because IPT-G has shown significant promise as an evidence-backed intervention that can durably reduce depression symptoms. Crucial to our analysis are previous RCTs
- ^
I’m not suggesting at all that they should have done this by now, only ~2 weeks after the Baird RCT results were made public. But I do think three months is a reasonable timeframe for this.
- ^
If there was an RCT that showed malaria chemoprevention cost more than $6000 per DALY averted in Nigeria (GDP/capita * 3), rather than per life saved (ballpark), I would want to know about it. And I would want to know about it even if Malaria Consortium decided to drop their work in Nigeria, and EA evaluators continued to recommend Malaria Consortium as a result. And how organisations go about communicating updates like this does impact my personal view on how much I should defer to them wrt charity recommendations.
- ^
Of course, based on HLI’s current analysis/approach, the ?disappointing/?unsurprising result of this RCT (even if it was on the adult population) would not have meaningfully changed the outcome of the recommendation, even if SM did not make this pivot (pg 66):
Therefore, even if the StrongMinds-specific evidence finds a small total recipient effect (as we present here as a placeholder), and we relied solely on this evidence, then it would still result in a cost-effectiveness that is similar or greater than that of GiveDirectly because StrongMinds programme is very cheap to deliver.
And while I think this is a conversation that has already been hashed out enough on the forum, I do think the point stands—potential donors who disagree with or are uncertain about HLI’s methodology here would benefit from knowing the results of the RCT, and it’s not an unreasonable ask for organisations doing charity evaluations / recommendations to include this information.
- ^
Based on Nigeria’s GDP/capita * 3
- ^
Acknowledging that this is DALYs not WELLBYs! OTOH, this conclusion is not based on the GiveWell or GiveDirectly bar, but on a ~mainstream global health cost-effectiveness standard of ~3x GDP per capita per DALY averted (in this case, SM’s ~$18k USD PPP/DALY averted does not meet the ~$7k USD PPP/DALY bar for Uganda)
- ^
My view is that HLI[1], GWWC[2], Founders Pledge[3], and other EA / effective giving orgs that recommend or provide StrongMinds as a donation option should ideally at least update their page on StrongMinds to include relevant considerations from this RCT, and do so well before Thanksgiving / Giving Tuesday in Nov/Dec this year, so donors looking to decide where to spend their dollars most cost-effectively can make an informed choice.[4]
- ^
Listed as a top recommendation
- ^
Not currently a recommendation (but still included as an option to donate)
- ^
Currently tagged as an “active recommendation”
- ^
Acknowledging that HLI’s current schedule is “By Dec 2024”, though this may only give donors 3 days before Giving Tuesday.
- ^
Congratulations on the pilot!
I just thought I’d flag some initial skepticism around the claim:
Our estimates indicate that next year, we will become 20 times as cost-effective as cash transfers.
Overall I expect it may be difficult for the uninformed reader to know how much they should update based on this post (if at all), but given you have acknowledged many of these (fairly glaring) design/study limitations in the text itself, I am somewhat surprised the team is still willing to make the extrapolation from 7x to 20x GD within a year. It also requires that the team is successful with increasing effective outreach by 2 OOMs despite currently having less than 6 months of runway for the organisation.[1]
I also think this pilot should not give the team “a reasonable level of confidence that [the] adaptation of Step-by-Step was effective”, insofar as the claim is that charitable dollars here are cost-competitive with top GiveWell charities / that there is good reason to believe you will be 2x top GiveWell charities next year (though perhaps you just meant effective from an implementation perspective, not a cost-effectiveness one). My current view is that while this might be a reasonable place to consider funding for non-EA funders (or e.g. funders specifically interested in mental health, or in mental health in India), I’d hope that EA community members looking to maximise impact through their donations in the GHD space would update based on higher evidentiary standards than what has been provided in this post, which IMO indicates little beyond feasibility and acceptability (which is still promising and exciting news, and I don’t want to diminish this!).
I don’t want this to come across as a rebuke of the work the team is trying to do—I am on the record for being excited about more people doing work that uses subjective wellbeing on the margin, and I think this is work worth doing. But I hope the team is mindful that continued overconfident claims in this space may cause people to update negatively and become less likely to fund this work in future, for totally preventable communication-related reasons, and not because wellbeing approaches are bad / not worth funding in principle.
- ^
A very crude BOTEC based only on the increased time needed for the 15min / week calls with 10,000 people indicates something like 17 additional guides doing the 15min calls full time, assuming they do nothing but these calls every day. The increase in human resources to scale up to reaching 10,000 people are of course much more intensive than this, even for a heavily WhatsApp based intervention.
10000 * 0.25 * 6 * 0.27 / 40 / 6 = 16.875
(people reached * call hours per person per week * weeks * retention / guide hours per week / weeks)
- ^
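Here is the same crude BOTEC as a Python sketch, using only the assumptions stated in the footnote above (15-minute weekly calls, 6-week programme, 0.27 retention, 40-hour guide work week); it is purely a reproduction of that arithmetic, not a staffing model.

```python
# Crude check of the footnote BOTEC: full-time guides needed just for the
# 15-minute weekly calls if outreach scales to 10,000 people.

PEOPLE_REACHED = 10_000
CALL_HOURS_PER_PERSON_PER_WEEK = 0.25   # 15-minute weekly call
PROGRAMME_WEEKS = 6
RETENTION = 0.27
GUIDE_HOURS_PER_WEEK = 40

total_call_hours = (PEOPLE_REACHED * CALL_HOURS_PER_PERSON_PER_WEEK
                    * PROGRAMME_WEEKS * RETENTION)
guides_needed = total_call_hours / (GUIDE_HOURS_PER_WEEK * PROGRAMME_WEEKS)
print(f"~{guides_needed:.1f} full-time guides doing nothing but calls")   # ~16.9
```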
Hey Ben! A few quick Qs:
Did the team consider a paid/minimum wage position instead of an unpaid one? How did it decide on the unpaid positions?
Is the theory of change for impact here mainly an “upskill students/early career researchers” thing, or for the benefits to RP’s research outputs?
What is RP’s current policy on volunteers?
Does RP expect to continue recruiting volunteers for research projects in the future?
Articles about recent OpenAI departures
I think it is entirely possible that people are being unkind because they updated too quickly on claims from Ben’s post that are now being disputed, and I’m grateful that you’ve written this (ditto chinscratch’s comment) as a reminder to be empathetic. That being said, people might also be less charitable than you are for reasons that are unrelated to them being unkind, or to the facts that are in contention:
I have only heard good things about Nonlinear, outside these accusations
Right now, on the basis of what could turn out to have been a lot of lies, their reputations, friendship futures and careers are at risk of being badly damaged
Without commenting on whether Ben’s original post should have been approached better or worded differently or was misleading etc, this comment from the Community Health/Special Projects team might add some useful additional context. There are also previous allegations that have been raised.[1]
Perhaps you are including both of these as part of the same set of allegations, but some may suggest that not being permitted to run sessions / recruit at EAGs and considering blocking attendance (especially given the reference class of actions that have prompted various responses that you can see here) is qualitatively important and may affect whether commentors are being charitable or not (as opposed to if they just considered the contents of Ben’s post VS Nonlinear (NL)’s response). Of course, this depends on how much you think the Community Health/Special Projects team are trustworthy with their judgement / investigation, or how likely this is all just an information cascade etc.
It seems reasonable to assume that the people at Nonlinear are altruistic people.
It is possible for altruistic people to be poor managers, poor leaders, make bad decisions about professional boundaries, have a poor understanding of power dynamics, or indeed, be abusive. The extent to which people at NL are altruistic is (afaict) not a major point of contention, and it is possible to not update about how altruistic someone is while also wanting to hold them accountable to some reasonable standard like “not being abusive or manipulative towards people you manage”.
Instead, as I see it, the main, or at least most upvoted, response here has been to critique stylistic mistakes made in their almost impossible task of refuting very damaging claims from anonymous sources in unknown contexts.
The claims in question from Alice/Chloe/Ben are not anonymous, the identities of Alice and Chloe are known to the Nonlinear team.
Independent of my personal views on these issues, I do think the pushback around ‘stylistic mistakes’ is reasonable insofar as people interpret this to be indicative of something concerning about NL’s approach towards managing staff / criticism / conflict (1, 2, 3), rather than e.g. just being nitpicky about tone, though I appreciate both interpretations are plausible.
I’d like people to imagine what they would do in a similar situation if they were faced with similar accusations. How would you successfully persuade people that you didn’t do the things you were accused of, and that the context was not as portrayed?
I think (much) less is more in this case.[2] I think there are parts of this current post that feel more subjective and not supported by facts, and may be reasonably interpreted by a cynical outsider to look like a distraction or a defensive smear campaign. I think these choices are counterproductive (both for a truth-seeking outsider, and for NL’s own interests), especially given the allegations of frame control and being retaliatory.
There are other parts that might similarly be reasonably interpreted as ranging from irrelevant (Alice’s personal drug use habits) to unproductive (links to Kathy Forth) to misleading (inclusion of photos, inconsistent usage of quotation marks, unnecessary paraphrasing, usage of quotes that miss the full context). I disagreed with the approaches here, though I acknowledge there were competing opinions and I wasn’t privy to the internal discussions that led to the decisions.
I think a cleaner version of this would have probably been something 5 to 10x shorter (not including the appendix), and looked something like:[3]
Apology for harms done
Acknowledgement of which allegations are seen as the most major (much closer to top 3-5 than all 85)
Responses to major allegations, focusing only on factual differences and claims that are backed up by ~irrefutable evidence
Charitable interpretations of Alice/Chloe/Ben’s position, despite above factual disagreement (what kinds of things need to be true for their allegations to be plausibly reasonable or fair from their perspective),
Lessons learnt, and things NL will do differently in future (some expression of self-awareness / reflection)
An appendix containing a list of unresolved but less critical allegations
Disclaimer: I offered to (and did) help review an early draft, in large part because I expected the NL team to (understandably!) be in panic mode after Ben’s post / getting dogpiled, and I wanted further community updates to be based on as much relevant information as possible.
- ^
This footnote added in response to Jeff’s comment: I agree that it’s likely not double counting, because the story there appears to be one where Kat left the working relationship, which is inconsistent with the accounts of Alice / Chloe’s situations, but also makes it unlikely that the “current employee of NL / Kat” hypothesis is correct.
- ^
Perhaps hypocritical given the length of this comment
- ^
Acknowledging that I have no PR expertise
I’ll say up front that I definitely agree that we should look into the impacts on worms a nonzero amount! The main reason for the comment is that I don’t think the appropriate bar for whether or not the project should warrant more investigation is whether or not it passes a BOTEC under your set of assumptions (which I am grateful for you sharing—I respect your willingness to share this and your consistency).
Again, not speaking on behalf of the team—but I’m happy to bite the bullet and say that I’m much more willing to defer to some deontological constraints in the face of uncertainty, rather than follow impartiality and maximising expected value all the way to its conclusion, whatever those conclusions are. This isn’t an argument against the end goal that you are aiming for, but more my best guess in terms of how to get there in practice.
I suspect this might be driven by it not being considered bad under your own worldview? Like it’s unsurprising that your preferred worldview doesn’t recommend actions that you consider bad, but actually my guess is that not working on global poverty and development because of the meat eater problem is in fact an action that might be widely considered bad in real life under many reasonable operationalisations (though I don’t have empirical evidence to support this).[1]
I do agree with you on the word choices under this technical conception of excruciating pain / extreme torture,[2] though I think the idea that it ‘definitionally’ can’t be sustained beyond minutes does have some potential failure modes.
That being said, I wasn’t actually using torture as a descriptor for the screwworm situation; I was more just illustrating what I see as a point of difference between our views, i.e. that I would not be in favour of allowing humans to be tortured by AIs even if you created a BOTEC that showed this caused net positive utils in expectation, and I would not be in favour of an intervention to spread the New World screwworm around the world even if you created a BOTEC that showed it was the best way of creating utils—I would reject these at least on deontological grounds in the current state of the world.
This is not to suggest that I think “widely considered bad” is a good bar here! A lot of moral progress came from ideas that were initially “widely considered bad”. I’m just suggesting that this particular defence of impartiality + hedonism, namely that it “does not recommend actions widely considered bad in real life”, seems unlikely to be correct, simply because most people are not impartial hedonists to the extent you are.
Neither of which were my wording!