Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.
Feel free to reach out if you think there’s anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you’re a medical student / junior doctor reconsidering your clinical future, or if you’re quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.
Outside of EA, I do a bit of end of life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed up classical violinist and Oly-lifter.
All comments in personal capacity unless otherwise stated.
bruce
The ethical vegan must therefore decide whether their objection is to animals dying or to animals living.
One might object to animal suffering, rather than to living/dying. So a utilitarian might say factory farming is bad because of the significantly net-negative states that animals endure while alive, while being OK with eating meat from a cow raised such that it lives a robustly net-positive life, for example.[1]
If you’re really worried about reducing the number of animal life years, focus on habitat destruction—it obviously kills wildlife on net, while farming is about increasing lives.
This isn’t an obvious comparison to me: there are clear potential downsides of habitat destruction (loss of ecosystem services) that don’t apply to reducing factory farming. There are also a lot of uncertainties around the impacts of destroying habitats—it is much harder to recreate an ecosystem and its benefits than to re-introduce factory farming, if we turn out to be wrong in either case. One might also argue that we have a special obligation to reduce the harms we cause (via factory farming), rather than to attempt habitat destruction, which reduces suffering that exists ~independently of humans.
...the instrumentalization of animals as things to eat is morally repugnant, so we should make sure it’s not perpetuated. This seems to reflect a profound lack of empathy with the perspective of a domesticate that might want to go on existing. Declaring a group’s existence repugnant and acting to end it is unambiguously a form of intergroup aggression.
I’m not sure I’m understanding this correctly. Are you saying animals in factory farms have to be able to indicate to you that they don’t want to go on existing in order for you to consider taking action on factory farming? What bar do you think is appropriate here?
If factory farming seems like a bad thing, you should do something about the version happening to you first.
If there were 100 billion humans being killed for meat / other products every year and living in the conditions of modern factory farms, I would most definitely advocate for that as a priority over factory farming.
The domestication of humans is particularly urgent precisely because, unlike selectively bred farm animals, humans are increasingly expressing their discontent with these conditions, and—more like wild animals in captivity than like proper domesticates—increasingly failing even to reproduce at replacement rates.
Can you say more about what you mean by “the domestication of humans”? It seems like you’re trying to draw a parallel between domesticated animals and domesticated humans, or modern humans and wild animals in captivity, but I’m not sure what the parallel you are trying to draw is. Could you make this more explicit?
This suggests our priorities have become oddly inverted—we focus intense moral concern on animals successfully bred to tolerate their conditions, while ignoring similar dynamics affecting creatures capable of articulating their objections...
This seems like a confusing argument. Most vegans I know aren’t against factory farming because it affects animal replacement rates. It also seems unlikely to me that reduced fertility rates in humans are a good proxy/correlate for the amount of suffering that exists (it’s possible that the relationship isn’t entirely linear, but if anything, historically the opposite is more true—countries have reduced fertility rates as they develop and standards of living improve). It’s weird that you use fertility rates as evidence for human suffering but seem to have an extremely high bar for animal suffering! Most of the evidence I’m aware of strongly points to factory farmed animals in fact not tolerating their conditions well.
...who are moreover the only ones known to have the capacity and willingness to try to solve problems faced by other species.
This is a good argument to work on things that might end humanity or severely diminish its ability to meaningfully + positively affect the world. Of all the options that might do this, where would you rank reduced fertility rates?
- ^
Though (as you note) one might also object to farming animals for food for rights-based rather than welfare-based reasons.
- ^
Screwworm Free Future is hiring for a Director
Reposting from LessWrong, for people who might be less active there:[1]
TL;DR: FrontierMath was funded by OpenAI[2]
This was not publicly disclosed until December 20th, the date of OpenAI’s o3 announcement, including in earlier versions of the arXiv paper where this was eventually made public.
There was allegedly no active communication about this funding to the mathematicians contributing to the project before December 20th, due to the NDAs Epoch signed, but also no communication after the 20th, once the NDAs had expired.
OP claims that “I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation. I am not aware of an agreement between Epoch AI and OpenAI that prohibits using this dataset for training if they wanted to, and have slight evidence against such an agreement existing.”
Seems to have confirmed the OpenAI funding + NDA restrictions
Claims OpenAI has “access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities.”
They also have “a verbal agreement that these materials will not be used in model training.”
Edit (19/01): Elliot (the project lead) points out that the holdout set does not yet exist (emphasis added):
“As for where the o3 score on FM stands: yes I believe OAI has been accurate with their reporting on it, but Epoch can’t vouch for it until we independently evaluate the model using the holdout set we are developing.”[3]
Edit (24/01):
Tamay tweets an apology (possibly including the timeline drafted by Elliot). It’s pretty succinct so I won’t summarise it here! Blog post version for people without twitter. Perhaps the most relevant point:
“OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned work, OpenAI retains ownership of these questions and has access to the problems and solutions.”
Nat from OpenAI with an update from their side:
We did not use FrontierMath data to guide the development of o1 or o3, at all.
We didn’t train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular
I’m extremely confident, because we only downloaded frontiermath for our evals *long* after the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.
============
Some quick uncertainties I had:
What does this mean for OpenAI’s 25% score on the benchmark?
What steps did Epoch take or consider taking to improve transparency between the time they were offered the NDA and the time of signing the NDA?
What is Epoch’s level of confidence that OpenAI will keep to their verbal agreement not to use these materials in model training, both in some technically true sense, and in a broader interpretation of the agreement? (see e.g. the bottom paragraph of Ozzie’s comment).
In light of the confirmation that OpenAI not only has access to the problems and solutions but has ownership of them, what steps did Epoch consider before signing the relevant agreement to get something stronger than a verbal agreement that this won’t be used in training, now or in the future?
- ^
Epistemic status: quickly summarised + liberally copy pasted with ~0 additional fact checking given Tamay’s replies in the comment section
- ^
arXiv v5 (Dec 20th version) “We gratefully acknowledge OpenAI for their support in creating the benchmark.”
- ^
See clarification in case you interpreted Tamay’s comments (e.g. that OpenAI “do not have access to a separate holdout set that serves as an additional safeguard for independent verification”) to mean that the holdout set already exists
I’ll say up front that I definitely agree that we should look into the impacts on worms a nonzero amount! The main reason for the comment is that I don’t think the appropriate bar for whether or not the project should warrant more investigation is whether or not it passes a BOTEC under your set of assumptions (which I am grateful for you sharing—I respect your willingness to share this and your consistency).
Again, not speaking on behalf of the team—but I’m happy to bite the bullet and say that I’m much more willing to defer to some deontological constraints in the face of uncertainty, rather than follow impartiality and maximising expected value all the way to its conclusion, whatever those conclusions are. This isn’t an argument against the end goal that you are aiming for, but more my best guess in terms of how to get there in practice.
Impartiality and hedonism often recommend actions widely considered bad in super remote thought experiments, but, as far as I am aware, none in real life.
I suspect this might be driven by it not being considered to be bad under your own worldview? Like it’s unsurprising that your preferred worldview doesn’t recommend actions that you consider bad, but actually my guess is that not working on global poverty and development for the meat eater problem is in fact an action that might be widely considered bad in real life for many reasonable operationalisations (though I don’t have empirical evidence to support this).[1]
I do agree with you on the word choices under this technical conception of excruciating pain / extreme torture,[2] though I think the idea that it ‘definitionally’ can’t be sustained beyond minutes does have some potential failure modes.
That being said, I wasn’t actually using torture as a descriptor for the screwworm situation; I was more just illustrating what I might consider a point of difference between our views, i.e. that I would not be in favour of allowing humans to be tortured by AIs even if you created a BOTEC that showed this caused net positive utils in expectation; and I would not be in favour of an intervention to spread the new world screwworm around the world, even if you created a BOTEC that showed it was the best way of creating utils—I would reject these at least on deontological grounds in the current state of the world.
- ^
This is not to suggest that I think “widely considered bad” is a good bar here! A lot of moral progress came from ideas that initially were “widely considered bad”. Just suggesting this particular defence of impartiality + hedonism; namely that it “does not recommend actions widely considered bad in real life” seems unlikely to be correct—simply because most people are not impartial hedonists to the extent you are.
- ^
Neither of which were my wording!
- ^
Speaking for myself / not for anyone else here:
My (highly uncertain + subjective) guess is that each lethal infection is probably worse than 0.5 host-years equivalents, but the number of worms per host animal probably could vary significantly.
That being said, personally I am fine with the assumption of modelling ~0 additional counterfactual suffering for screwworms that are never brought into existence, rather than e.g. an eradication campaign that involves killing existing animals.
I’m unsure how to think about the possibility that screwworms might be living significantly net-positive lives, such that this trumps the benefit of reduced suffering from screwworm deaths, but I’d personally prefer stronger evidence of wellbeing or harms on the worm’s end to justify inaction here (i.e. not looking into the possibility/feasibility of this).
Again, speaking only for myself—I’m not personally fixated on either gene drives or sterile insect approaches! I am also very interested in finding reasons not to proceed with the project, or alternative approaches, which doesn’t preclude the possibility that the net welfare of screwworms should be more heavily weighed as a consideration. That being said, I would be surprised if something like “we should do nothing to alleviate host animal suffering because their suffering can provide more utils for the screwworm” were a sufficiently convincing reason not to do more work / investigation in this area (for nonutilitarian reasons), though I understand there are a set of assumptions / views one might hold that could drive disagreement here.[1]
- ^
If a highly uncertain BOTEC showed you that torturing humans would bring more utility to digital beings than the suffering incurred on the humans, would you endorse allowing this? At what ratio would you change your mind, and how many OOMs of uncertainty on the BOTEC would you be OK with?
Or—would you be in favour of taking this further and spreading the screwworm globally simply because it provides more utils, rather than just not eradicating the screwworm?
- ^
Launching Screwworm-Free Future – Funding and Support Request
It’s fine to apply regardless; there’s one application form for all 2025 in-person EAGs. You’ll likely be sent an email separately closer to the time reminding you that you can register for the East Coast EAG, and be directed to a separate portal where you can do this without needing to apply again.
Hey team—are you happy to share a bit more about who would be involved in these projects, and their track record (or Whylome’s more broadly)? I only spent a minute or so on this but I can’t find any information online beyond your website and these links, related to SMTM’s “exposure to subclinical doses of lithium is responsible for the obesity epidemic” hypothesis (1, 2).
More info on how much money you’re looking for the above projects would also be useful.
Ah my bad, I meant extreme pain above there as well, edited to clarify! I agree it’s not a super important assumption for the BOTEC in the grand scheme of things though.
However, if one wants to argue that I overestimated the cost-effectiveness of SWP, one has to provide reasons for my guess overestimating the intensity of excruciating pain.
I don’t actually argue for this in either of my comments.[1] I’m just saying that it sounds like if I duplicated your BOTEC, and changed this one speculative parameter to 2 OOMs lower, an observer would have no strong reason to choose one BOTEC over another just by looking at the BOTEC alone. Expressing skepticism of an unproven claim doesn’t produce a symmetrical burden of proof on my end!
Mainly just from a reasoning transparency point of view I think it’s worth fleshing out what these assumptions imply and what is grounding these best guesses[2] - in part because I personally want to know how much I should update based on your BOTEC, in part because knowing your reasoning might help me better argue why you might (or might not) have overestimated the intensity of excruciating pain if I knew where your ratio came from (and this is why I was checking the maths and seeing if these were correct, and asking if there’s stronger evidence if so, before critiquing the 100k figure), and because I think other EAF readers, as well as broader, lower-context audience of EA bloggers would benefit from this too.
If you did that, SWP would still be 434 (= 43.4*10^3*10^3/(100*10^3)) times as cost-effective as GiveWell’s top charities.
Yeah, I wasn’t making any inter-charity comparisons or claiming that SWP is less cost-effective than GW top charities![3] But since you mention it, it wouldn’t be surprising to me if losing 2 OOMs might make some donors favour other animal welfare charities over SWP for example—but again, the primary purpose of these comments is not to litigate which charity is the best, or whether this is better or worse than GW top charities, but mainly just to explore a bit more around what is grounding the BOTEC, so observers have a good sense on how much they should update based on how compelling they find the assumptions / reasoning etc.
I think it is also worth wondering about whether you truly believe that updated intensity. Do you think 1 day of fully healthy life plus 86.4 s (= 0.864*100*10^3/100) of scalding or severe burning events in large parts of the body, dismemberment, or extreme torture would be neutral?
Nope! I would rather give up 1 day of healthy life than 86 seconds of this description. But this varies depending on the timeframe in question.
For example, I’d probably be willing to endure 0.86 seconds of this for 14 minutes of healthy life, and I would definitely endure 0.086 seconds of this rather than give up 86 seconds of healthy life.
And using your assumptions (ratio of 100k), I would easily rather have 0.8 seconds of this than give up 1 day of healthy life, but if I had to endure many hours of this I could imagine my tradeoffs approaching, or even exceeding 100k.
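To make the tradeoffs above concrete, here’s a quick sketch of the pain : healthy-life intensity ratio each one implies (the trade values are copied directly from the paragraph above; nothing new is assumed):

```python
# Implied excruciating-pain : healthy-life intensity ratios for each tradeoff
# above (ratio = healthy seconds given up per second of excruciating pain).
trades = [
    (0.86, 14 * 60),        # 0.86 s of pain vs 14 min of healthy life
    (0.086, 86),            # 0.086 s vs 86 s
    (0.864, 24 * 60 * 60),  # 0.864 s vs 1 day (the BOTEC's 100k assumption)
]
for pain_seconds, healthy_seconds in trades:
    print(round(healthy_seconds / pain_seconds, 1))
# ratios: ~977, ~1000, ~100000
```

i.e. the first two tradeoffs sit near a ratio of ~1000, while the 1-day tradeoff reflects the BOTEC’s 100k assumption.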
I do want to mention that I think it’s useful that someone is trying to quantify these comparisons, and I’m grateful for this work. I want to emphasise that these comments are about making the underlying reasoning more transparent / understanding the methodology that leads to the assumptions in the BOTEC, rather than any kind of personal criticism!
So I suppose I would be wary of saying that GiveDirectly now have 3–4x the WELLBY impact relative to Vida Plena—or even to say that GiveDirectly have any more WELLBY impact relative to Vida Plena
Ah right—yeah I’m not making either of these claims, I’m just saying that if the previous claim (from VP’s predictive CEA) was that: “Vida Plena...is 8 times more cost-effective than GiveDirectly”, and GD has since been updated to 3-4x more cost-effective than it was compared to the time the predictive CEA was published, we should discount the 8x claim downwards somewhat (but not necessarily by 3-4x).
I think one could probably push back on whether 7.5 minutes of [extreme] pain is a reasonable estimate for a person who dies from malaria, but I think the bigger potential issue is still that the result of the BOTEC seems highly sensitive to the “excruciating pain is 100,000 times worse than fully healthy life is good” assumption—for both air asphyxiation and ice slurry, the time spent under excruciating pain makes up more than 99.96% of the total equivalent loss of healthy life.[1]
I alluded to this on your post, but I think your results imply you would prefer to avert 10 shrimp days of excruciating pain (e.g. air asphyxiation / ice slurry) over saving 1 human life (51 DALYs).[2]
If I use your assumption and also value human excruciating pain as 100,000 times as bad as healthy life is good,[3] then this means you would prefer to avert 10 shrimp days of excruciating pain (using your air asphyxiation figures) over 4.5 human hours of excruciating pain,[4] and your shrimp to human ratio is less than 50:1 - that is, you would rather avert 50 shrimp minutes of excruciating pain than 1 human minute of excruciating pain.
To be clear, this isn’t a claim that one shouldn’t donate to SWP, but just that if you do bite the bullet on those numbers above then I’d be keen to see some stronger justification beyond “my guess” for a BOTEC that leads to results that are so counterintuitive (like I’m kind of assuming that I’ve missed a step or OOMs in the maths here!), and is so highly sensitive to this assumption.[5]
- ^
Air asphyxiation: 1- (5.01 / 12,605.01) = 0.9996
Ice slurry: 1 - (0.24 / 604.57) = 0.9996
- ^
1770 * 7.5 = 13275 shrimp minutes
13275 / 60 / 24 = 9.21875 shrimp days
- ^
There are arguments in either direction, but that’s probably not a super productive line of discussion.
- ^
51 * 365.25 * 24 * 60 = 26,823,960 human minutes
26,823,960 / 100,000 = 268.2396 human minutes of excruciating pain
268.2396 / 60 = 4.47 human hours of excruciating pain
13275 / 268.2396 = 49.49 (shrimp : human ratio)
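For anyone who wants to re-run the footnote arithmetic, here’s a quick sketch (all figures come from footnotes 2 and 4 above; nothing new is assumed):

```python
# Recheck of the shrimp : human excruciating-pain tradeoff implied by the
# BOTEC's assumptions (figures taken from the footnotes above).
RATIO = 100_000  # "excruciating pain is 100,000x worse than healthy life is good"

shrimp_minutes = 1770 * 7.5            # shrimp-minutes of excruciating pain
shrimp_days = shrimp_minutes / 60 / 24

human_minutes = 51 * 365.25 * 24 * 60  # minutes of healthy life in 51 DALYs
human_pain_minutes = human_minutes / RATIO

print(shrimp_days)                          # 9.21875 shrimp days
print(human_pain_minutes / 60)              # ~4.47 human hours of excruciating pain
print(shrimp_minutes / human_pain_minutes)  # ~49.5 (shrimp : human ratio)
```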
- ^
Otherwise I could just copy your entire BOTEC, and change the bottom figure to 1000 instead of 100k, and change your topline results by 2 OOMs.
Annoying pain is 10% as intense as fully healthy life.
Hurtful pain is as intense as fully healthy life.
Disabling pain is 10 times as intense as fully healthy life.
Excruciating pain is 100 k times as intense as fully healthy life.
- ^
I might be misunderstanding you, but are you saying that after reading the GD updates, we should update on VP equally to GD / that we should expect the relative cost-effectiveness ratio between the two to remain the same?
Thanks for the response, and likewise—hope you’ve been well! (Sorry I wasn’t sure if it was you or someone else on the account).
I agree that it is pretty reasonable to stick with the same benchmark, but I think this means it should be communicated accordingly, as VP are sometimes referring to a benchmark and other times referring to the GD programme, while GW are sticking to the same benchmark for their cost-effectiveness analyses, but updating their estimates of GD programmes.[1]
E.g. the predictive CEA (pg 7) referenced says:
“This means that a $1000 donation to Vida Plena would produce 58 WELLBYs, which is 8 times more cost-effective than GiveDirectly (a charity that excels in delivering cash transfers—simply giving people money—and a gold standard in effective altruism)”[2]
I think people would reasonably misinterpret this to mean you are referring to the GD programme, rather than the GW benchmark.[3] Again I know this is a v recent update and so hadn’t expected it to be already updated! But just flagging this as a potential source of confusion in the future.
Separately, I just thought I’d register interest in a more up-to-date predictive CEA that comes before your planned 2026 analysis—in part because there’s decent reason to do so (though I’m not making the stronger claim that this is more important than other things on your plate!), in part because 2026 is a while away, and because it’s plausibly decision-relevant for potential donors if they’re not sure the extent to which HLI updates are applicable to VP.
- ^
“Thus, we will be using our historic benchmark until we have thought it through. For now, you can think of our benchmark as “GiveWell’s pre-2024 estimate of the impacts of cash transfers in Kenya,” with GiveDirectly’s current programs in various countries coming in at 3 to 4 times as cost-effective as that benchmark.”
- ^
The summary table on the same page also just says “GiveDirectly”.
- ^
To VP’s credit, I think “eight times more cost-effective than the benchmark of direct cash transfers” in this post would likely be interpreted correctly in a high context setting (but I also think reasonably might not be, and so may still be worth clarifying).
- ^
Hey team, thanks for sharing this update!
A few comments, not intended as a knock on Vida Plena’s programme, but perhaps more relevant to how it’s communicated:
You can save a life as depression not only causes immense human suffering but is a deadly disease. Approximately 24% of the individuals we support are at high risk of suicide and as a result, face a 3- to 11-fold increased likelihood of dying by suicide within the next 30 days.
Given this is the first bullet under “helping a life flourish” I thought this might benefit from some clarification, because the vast majority of the value of this programme is likely not from suicide prevention, given low absolute rates of suicide.
From the same source: “at two years, the cumulative hazard of suicide death ranged from 0.12% in young adults to 0.18% in older adults.” Under unreasonably optimistic assumptions,[1] Vida Plena would prevent 1 suicide every 500 participants / prevent a suicide for $116,500, which is something between 21x and 39x less cost-effective than GiveWell top charities.[2] More reasonable assumptions would drop this upper bound to 1 suicide prevented every ~1200 participants, or ~$272,000 per suicide prevented / ~50-90x less effective than GW top charities.[3]
Given you hope to reach 2,000 people by the end of 2025 for $50,000, this suggests a reasonable upper bound is something like 2 additional suicides prevented.[4]
This isn’t a claim that the cost-effectiveness claims are necessarily incorrect, even with minimal suicide prevention. A quick sense check RE: $462/DALY and 0.22 DALYs per participant would imply that Vida Plena would need to more than halve their cost per participant (from $233 down to $101), and then achieve results comparable to “~100% of people with severe / moderate / mild depression conclude the programme going down one level of severity, or something like ~5 points on the PHQ9 score (severe --> moderate; moderate --> mild; mild --> no depression)”.[5] This is well within your listed results—though as you note in your annual report these have some fairly significant sources of bias and (IMO) probably should not be taken at face value.

Some other comments:
The NBER paper quoted in “g-IPT has also demonstrated long-term positive effects” looked at the “Healthy Activity Programme” (HAP)[6] and the “Thinking Healthy Programme Peer-Delivered” (THPP).[7] Neither of these are g-IPT programmes.
The minimal and unsustained results from the Baird RCT may be worth incorporating in an updated analysis, given the predictive CEA is from 2022[8]
From the predictive CEA: “Vida Plena’s overall effect for a household is 7.18*0.75*0.83 = 4.49 (95% CI: 0.77, 31.04) WELLBYs per person treated”. HLI recently decreased their estimate for StrongMinds treatment effects by 80%, from 10.49 to 2.15 WELLBYs per treatment (also including household spillovers), and estimated StrongMinds to be “3.7x (previously 8x) as cost-effective as GiveDirectly”.
The cost-effectiveness of GiveDirectly has gone up by 3-4x (GW blog, GD blog), though this was recent news and does not necessarily imply that WELLBYs will also go up by 3-4x (most of this increase is attributable to increased consumption) - but should constitute a discount at least.
- ^
Even if 100% (rather than 24%) of individuals were in the high risk group (i.e. suicidal ideation nearly every day), and even if you dropped 100% of individuals risk of suicide from 0.2% to zero (rather than reducing it by 3-11x or to baseline), and if this effect persisted forever rather than just the initial 30 days
- ^
233 * 500 / 3000 = 38.83
233 * 500 / 5500 = 21.18 (assuming 1 prevented suicide = 1 life saved)
- ^
If 24% of your participants were high risk (7x risk, at 0.18%), and the other 76% of them were half of that (3.5x risk, at 0.09%), and you successfully reduced 100% of participants to baseline (0.026%), you would prevent 1 suicide every 1169 participants, which comes to ~$272,000 per life saved, or ~50-90x less cost effective than GW top charities.
(0.18-0.026) * 0.24 + (0.09-0.026) * 0.76 = 0.0856
100 / 0.0856 = 1168.2
1168.2 * 233 = 272190.6
272190.6 / 3000 = 90.73
272190.6 / 5500 = 49.4892
- ^
It’s also worth noting these are cumulative hazards at 2 years rather than 30, and the hazard ratios at 365 days are approximately halved compared to 30 days (1.7- to 5.7 instead of 3.3-10.8), so these figures are plausibly a few factors optimistic still.
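The “more reasonable assumptions” BOTEC in footnote 3 can be rechecked with a quick script (all inputs are the footnote’s own figures; nothing new is assumed):

```python
# Recheck of footnote 3: suicides prevented under "more reasonable assumptions".
high_risk = 0.18   # cumulative hazard of suicide (%), high-risk group (24%)
other_risk = 0.09  # assumed hazard (%) for the remaining 76%
baseline = 0.026   # baseline hazard (%)

suicides_prevented_per_100 = (high_risk - baseline) * 0.24 + (other_risk - baseline) * 0.76
participants_per_suicide = 100 / suicides_prevented_per_100
cost_per_suicide_prevented = participants_per_suicide * 233  # $233 per participant

print(round(suicides_prevented_per_100, 4))         # 0.0856
print(round(participants_per_suicide))              # ~1168 participants
print(round(cost_per_suicide_prevented))            # ~$272,000
print(round(cost_per_suicide_prevented / 3000, 1))  # ~90.7x GW top charities ($3000/life)
print(round(cost_per_suicide_prevented / 5500, 1))  # ~49.5x ($5500/life)
```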
- ^
Severe --> moderate depression is about 0.262 DALYs averted, moderate --> mild depression is about 0.251 DALYs averted, and mild --> no depression is about 0.145 DALYs averted.
- ^
HAP is described as “a psychological treatment based on behavioral activation...consist[ing] of 6 to 8 weekly sessions of 30 to 40 minutes each, delivered individually at participants’ homes or at the local PHC.”
- ^
THPP is a simplified version of a psychological intervention (THP) for treating perinatal depression that has been found to be effective in similar settings and is recommended by the WHO (Rahman et al., 2008, 2013; WHO, 2015; Baranov et al., 2020). While the original THP trials employed a full-fledged cognitive behavioral therapy (CBT) intervention, THPP was a simpler intervention focused on behavioral activation, as in the HAP trial described above. THPP was designed to be delivered by peer counselors, instead of community health workers as in previous trials.
- ^
[taken from here, emphasis added]:
- “Our findings add to this evidence base by showing 12-month modest improvements of 20%-30% in rates of minimal depression for adolescents assigned to IPT-G, with these effects completely dissipating by the 24-month follow-up. We similarly find small short-term impacts on school enrollment, delayed marriage, desired fertility and time preferences, but fail to conclude that these effects persist two years after therapy.”
- “Given impact estimates of a reduction in the prevalence of mild depression of 0.054 pp for a period of one year, it implies that the cost of the program per case of depression averted was nearly USD 916, or 2,670 in 2019 PPP terms.”
- “This implies that ultimately the program cost USD PPP (2019) 18,413 per DALY averted.” (almost 8x Uganda’s GDP per capita)
Without checking the assumptions / BOTEC in detail, I’ll just flag that this implies you are indifferent between saving 1 human life (not just averting 1 malaria death) and helping ~1000-1900 shrimp die via electrical stunning instead of dying via air asphyxiation / ice slurry.[1]
(This is not an anti-SWP comment tbc!)
- ^
Depending on whether you are using $3000 to $5500 per life saved;
15000 * 3000 / 43500 = 1034.5
15000 * 5500 / 43500 = 1896.6
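For observers wanting to recheck the footnote, a one-liner using the figures as given above:

```python
# Shrimp helped (electrical stunning instead of asphyxiation / ice slurry)
# per human-life-equivalent donation, per the footnote's figures.
for cost_per_life in (3000, 5500):
    shrimp = 15000 * cost_per_life / 43500
    print(cost_per_life, round(shrimp, 1))
# 3000 -> 1034.5 shrimp; 5500 -> 1896.6 shrimp
```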
- ^
I think that certain EA actions in ai policy are getting a lot of flak.
Also, I suspect that the current EA AI policy arm could find ways to be more diplomatic and cooperative
Would you be happy to expand on these points?
It sounds like you’re interpreting my claim to be “the Baird RCT is a particularly good proxy (or possibly even better than other RCTs on group therapy in adult women) for the SM adult programme effectiveness”, but this isn’t actually my claim here; and while I think one could reasonably make some different, stronger (donor-relevant) claims based on the discussions on the forum and the Baird RCT results, mine are largely just: “it’s an important proxy”, “it’s worth updating on”, and “the relevant considerations/updates should be easily accessible on various recommendation pages”. I definitely agree that an RCT on the adult programme would have been better for understanding the adult programme.
(I’ll probably check out of the thread here for now, but good chatting as always Nick! hope you’re well)
Yes, because:
1) I think this RCT is an important proxy for StrongMinds (SM)‘s performance ‘in situ’, and worth updating on—in part because it is currently the only completed RCT of SM. Uninformed readers of what is currently on e.g. the GWWC[1]/FP[2]/HLI websites might reasonably get the wrong impression of the evidence base behind the recommendation around SM (i.e. that there are no concerns sufficiently noteworthy to merit inclusion as a caveat). I think the effective giving community should have a higher bar for being proactively transparent here—it is much better to include (at minimum) a relevant disclaimer like this than to be asked questions by donors and respond that there wasn’t capacity to include one.[3]
2) If a SM recommendation is justified as a result of SM’s programme changes, this should still be communicated for trust-building purposes (e.g. “We are recommending SM despite [Baird et al RCT results], because …”), both for those who are on the fence about deferring, and for those who now have a reason to re-affirm their existing trust in EA org recommendations.[4]
3) Help potential donors make more informed decisions—for example, informed readers who may be unsure about HLI’s methodology and wanted to wait for the RCT results should not have to search for this themselves, or dig up a fairly buried comment thread on a post from >1 year ago, in order to make this decision when looking at EA recommendations / links to donate. I don’t think it’s an unreasonable amount of effort compared to how much it may help. This line of reasoning may also apply to other evaluators (e.g. GWWC evaluator investigations).[5]
- ^
GWWC website currently says it only includes recommendations after they review it through their Evaluating Evaluators work, and their evaluation of HLI did not include any quality checks of HLI’s work itself nor finalise a conclusion. Similarly, they say: “we don’t currently include StrongMinds as one of our recommended programs but you can still donate to it via our donation platform”.
- ^
Founders Pledge’s current website says:
We recommend StrongMinds because IPT-G has shown significant promise as an evidence-backed intervention that can durably reduce depression symptoms. Crucial to our analysis are previous RCTs
- ^
I’m not suggesting at all that they should have done this by now, only ~2 weeks after the Baird RCT results were made public. But I do think three months is a reasonable timeframe for this.
- ^
If there was an RCT that showed malaria chemoprevention cost more than $6000 per DALY averted in Nigeria (GDP/capita * 3), rather than per life saved (ballpark), I would want to know about it. And I would want to know about it even if Malaria Consortium decided to drop their work in Nigeria, and EA evaluators continued to recommend Malaria Consortium as a result. And how organisations go about communicating updates like this does impact my personal view on how much I should defer to them wrt charity recommendations.
- ^
Of course, based on HLI’s current analysis/approach, the ?disappointing/?unsurprising result of this RCT (even if it was on the adult population) would not have meaningfully changed the outcome of the recommendation, even if SM did not make this pivot (pg 66):
Therefore, even if the StrongMinds-specific evidence finds a small total recipient effect (as we present here as a placeholder), and we relied solely on this evidence, then it would still result in a cost-effectiveness that is similar or greater than that of GiveDirectly because StrongMinds programme is very cheap to deliver.
And while I think this is a conversation that has already been hashed out enough on the forum, I do think the point stands—potential donors who disagree with or are uncertain about HLI’s methodology here would benefit from knowing the results of the RCT, and it’s not an unreasonable ask for organisations doing charity evaluations / recommendations to include this information.
- ^
Based on Nigeria’s GDP/capita * 3
- ^
Acknowledging that this is DALYs, not WELLBYs! OTOH, this comparison is not against the GiveWell or GiveDirectly bar, but against a ~mainstream global health cost-effectiveness standard of ~3x GDP per capita per DALY averted (in this case, SM’s ~$18k USD PPP per DALY averted does not meet the ~$7k USD PPP/DALY bar for Uganda)
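The threshold arithmetic in these footnotes can be sketched as follows (the GDP-per-capita figures are rough approximations inferred from the text, not authoritative data):

```python
def ce_bar(gdp_per_capita: float, multiplier: float = 3.0) -> float:
    """Cost-effectiveness threshold in dollars per DALY averted (~3x GDP/capita)."""
    return gdp_per_capita * multiplier

# Nigeria (from the malaria footnote): ~$2,000 GDP/capita implies a ~$6,000/DALY bar.
nigeria_bar = ce_bar(2_000)

# Uganda (USD PPP): the text's ~$7k/DALY bar vs StrongMinds' ~$18k/DALY estimate.
uganda_bar_ppp = 7_000
sm_cost_per_daly_ppp = 18_000

print(nigeria_bar)                             # 6000.0
print(sm_cost_per_daly_ppp <= uganda_bar_ppp)  # False: SM misses this bar
```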
- ^
My view is that HLI[1], GWWC[2], Founders Pledge[3], and other EA / effective giving orgs that recommend or provide StrongMinds as a donation option should ideally at least update their page on StrongMinds to include relevant considerations from this RCT, and do so well before Thanksgiving / Giving Tuesday in Nov/Dec this year, so donors looking to decide where to spend their dollars most cost-effectively can make an informed choice.[4]
- ^
Listed as a top recommendation
- ^
Not currently a recommendation (but included as an option to donate)
- ^
Currently tagged as an “active recommendation”
- ^
Acknowledging that HLI’s current schedule is “By Dec 2024”, though this may only give donors 3 days before Giving Tuesday.
- ^
I didn’t catch this post until I saw this comment, and it prompted a response. I’m not well calibrated on how much upvotes different posts should get,[1] but personally I didn’t feel disappointed that this post wasn’t on the front page of the EA Forum, and I don’t expect this is a post I’d share with e.g., non-vegans who I’d discuss the meat eater problem with.[2]
I’m assuming you’re talking about the downvotes, rather than the comments? I may be mistaken though.
This isn’t something I’d usually comment on, because I do think the EA Forum should be more welcoming on the margin and I think there are a lot of barriers to people posting. But I’m just providing one data point given your disappointment/surprise.