I’m not the expert on effective altruism. I don’t identify with that terminology. My impression is that it’s a bit of an outdated term.
Håkon Harnes 🔸
Thanks for sharing this update. I appreciate the transparency and your engagement with the broader community!
I have a few questions about this strategic pivot:
On organizational structure: Did you consider alternative models that would preserve 80,000 Hours’ established reputation as a more “neutral” career advisor while pursuing this AI-focused direction? For example, creating a separate brand or group dedicated to AI careers while maintaining the broader 80K platform for other cause areas? This might help avoid potential confusion where users encounter both your legacy content presenting multiple cause areas and your new AI-centric approach.
On the EA pathway: I’m curious about how this shift might affect the “EA funnel”, where people typically enter effective altruism through more intuitive cause areas like global health or animal welfare before gradually engaging with longtermist ideas like AI safety. By positioning 80,000 Hours primarily as an AI-focused organization, are you concerned this might make it harder for newcomers to find their way into the community if AI risk arguments initially seem abstract or speculative to them?
On reputational considerations: Have you weighed the potential reputational risks if AI development follows a more moderate trajectory than anticipated? If we see AI plateau at impressive but clearly non-transformative capabilities, this strategic all-in approach could affect 80,000 Hours’ credibility for years to come. The past decade of 80K’s work as a cause-diverse advisor has created tremendous value—might a spinoff organization for AI-specific work better preserve that accumulated trust while still allowing you to pursue what you see as the highest-impact path?
The slides for GiveWell’s 2020 analysis of AMF are stellar; hopefully we can draw from them at Gi Effektivt! I particularly liked the slides that draw directly from the CEAs.
In the World Malaria Report 2023, the WHO claims the bump is due to COVID-19 disruptions.
From page 18:
“Between 2000 and 2019, case incidence in the WHO African Region decreased from 370 to 226 per 1000 population at risk, but increased to 232 per 1000 population at risk in 2020, mainly because of disruptions to services during the COVID-19 pandemic. In 2022, case incidence declined to 223 per 1000 population at risk.”
This is an argument for the effectiveness of existing interventions.
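To put the quoted figures in proportion, here is a quick sketch of the relative changes. It uses only the incidence numbers from the WHO quote above; the helper name `relative_change` is my own.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means a decline)."""
    return (new - old) / old

# Case incidence per 1000 population at risk, WHO African Region
# (figures taken from the quote above)
incidence = {2000: 370, 2019: 226, 2020: 232, 2022: 223}

decline_2000_2019 = relative_change(incidence[2000], incidence[2019])
bump_2019_2020 = relative_change(incidence[2019], incidence[2020])
change_2020_2022 = relative_change(incidence[2020], incidence[2022])

print(f"2000-2019: {decline_2000_2019:+.1%}")
print(f"2019-2020: {bump_2019_2020:+.1%}")
print(f"2020-2022: {change_2020_2022:+.1%}")
```

The 2020 bump is a rise of under 3%, against a decline of nearly 40% over the two preceding decades.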
Funding for malaria prevention has stalled since 2010.[1] It’s misleading to point to GiveWell funding alone. I don’t think it’s particularly surprising that, given ~constant funding, progress has slowed. As noted, the nets last only a couple of years, and presumably you get diminishing marginal returns when scaling up, as always. I’m not sure exactly how they are counting here, as “other funders” is suspiciously small, but the general point still stands: the largest donors have stalled over the last decade. WHO estimates for the counterfactual[2] show that just keeping the deaths constant is an achievement in itself (although the counterfactual is of course more uncertain). I think this comes down to population growth and a little bit of climate change, but I haven’t looked into it deeply.
I think there is an argument to be made here: as the world shifted priorities with the SDGs and funding for the “old” efforts stalled, we unfortunately got an unusually good opportunity in malaria prevention in terms of marginal impact. While just keeping up the pressure doesn’t yield results as spectacular-sounding as the initial ramp-up, it’s still likely saving thousands of lives.
- ^
World Health Organization. (2023). Funding for malaria control and elimination, 2000–2022, by channel (constant 2022 US$) [Figure 6.4]. In World Malaria Report 2023 (p. 49). WHO.
- ^
World Health Organization. (2023). WHO methods for estimating cases and deaths averted. In World Malaria Report 2023 (p. 123), Annex 1. WHO.
Yes, this makes sense if I understand you correctly. If we set the effect size to 0 for all the dropouts, while having reasonable grounds for thinking it might be slightly positive, this would lead us to underestimate the top-line cost-effectiveness.
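A minimal sketch of that point, assuming the 27% completion rate and the 10-point average PHQ-9 reduction mentioned elsewhere in this thread (the latter appears to refer to completers, and with no control group it is a pre-post change rather than a causal effect). The dropout values are purely illustrative assumptions, not results from the pilot.

```python
def whole_group_effect(completion_rate: float,
                       completer_effect: float,
                       dropout_effect: float) -> float:
    """Average change across all participants, weighting completers and dropouts."""
    return completion_rate * completer_effect + (1 - completion_rate) * dropout_effect

COMPLETION_RATE = 0.27    # share of participants completing the program (from the post)
COMPLETER_EFFECT = 10.0   # average PHQ-9 reduction, apparently among completers (from the post)

# Conservative assumption: dropouts get no benefit at all
conservative = whole_group_effect(COMPLETION_RATE, COMPLETER_EFFECT, dropout_effect=0.0)

# Hypothetical alternative: dropouts improve by 1 point on average
slightly_positive = whole_group_effect(COMPLETION_RATE, COMPLETER_EFFECT, dropout_effect=1.0)

print(f"dropouts at 0: {conservative:.2f} points")
print(f"dropouts at 1: {slightly_positive:.2f} points")
```

Setting dropouts to 0 gives the lower of the two whole-group averages, and both sit far below the completer-only figure, which is why the two should not be conflated.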
I’m mostly reacting to the choice of presenting the results of the completer subgroup which might be conflated with all participants in the program. Even the OP themselves seem to mix this up in the text.
Context: To offer a few points of comparison, two studies of therapy-driven programs found that 46% and 57.5% of participants experienced reductions of 50% or more, compared to our result of 72%. For the original version of Step-by-Step, it was 37.1%. There was an average PHQ-9 reduction of 6 points compared to our result of 10 points.
As far as I can tell, they are talking about completers in this paragraph, not participants. @RachelAbbott could you clarify this?
When reading the introduction again I think it’s pretty balanced now (possibly because it was updated in response to the concerns). Again, thank you for being so receptive to feedback @RachelAbbott!
Very interesting, thanks for highlighting this!
I hope this is not what is happening. It’s at best naive. This assumes no issues will crop up during scaling, that “fixed” costs are indeed fixed (they rarely are) and that the marginal cost per treatment will fall (this is a reasonable first approximation, but it’s by no means guaranteed). A maximally optimistic estimate IMO. I don’t think one should claim future improvements in cost effectiveness when there are so many incredibly uncertain parameters in play.
My concrete suggestion would be to rather write something like: “We hope to reach 10 000 participants next year with our current infrastructure, which might further improve our cost-effectiveness.”
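To illustrate why I think the projection is so sensitive, here is a rough sketch of the usual economies-of-scale reasoning. Apart from the 10 000 participant target, every number is hypothetical and chosen only to show the mechanism; none of them come from the pilot.

```python
def cost_per_participant(n: int, fixed_cost: float, marginal_cost: float) -> float:
    """Average cost per participant: fixed costs spread over n, plus the per-person cost."""
    return fixed_cost / n + marginal_cost

# Hypothetical figures, for illustration only
PILOT_N, SCALED_N = 1_000, 10_000
FIXED, MARGINAL = 50_000.0, 10.0

pilot = cost_per_participant(PILOT_N, FIXED, MARGINAL)

# Optimistic projection: "fixed" costs really stay fixed, marginal cost unchanged
optimistic = cost_per_participant(SCALED_N, FIXED, MARGINAL)

# Less optimistic: "fixed" costs triple at scale and marginal cost rises by 50%
less_optimistic = cost_per_participant(SCALED_N, FIXED * 3, MARGINAL * 1.5)

print(f"pilot: {pilot:.2f}")
print(f"optimistic: {optimistic:.2f}")
print(f"less optimistic: {less_optimistic:.2f}")
```

Under these made-up inputs, the projected gain halves once the "fixed" costs and marginal cost are allowed to move, and that is before any drop in effect size at scale.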
Thanks for this thorough and thoughtful response John!
I think most of this makes sense. I agree that if you are using an evidence-based intervention, it might not make sense to increase the cost by adding a control group. I would for instance not think of this as a big issue for bednet distribution in an area broadly similar to other areas where bednet distribution works. Given that in this case they are simply implementing a programme from WHO with two positive RCTs (which I have not read), it seems reasonable to do an uncontrolled pilot.
I pushed back a little on a comment from you further down, but I think this point largely addresses my concerns there.
With regards to your explanations for why people drop out, I would argue that at least 1, 2 and 3 are in fact because of the ineffectiveness of the intervention, but it’s mostly a semantic discussion.
The two RCTs cited seem to be about displaced Syrians, which makes me uncomfortable straightforwardly assuming it will transfer to the context in India. I would also add that there is a big difference between the evidence base for ITN distribution compared to this intervention. I look forward to seeing what the results are in the future!
This is fair, we don’t know why people drop out. But it seems much more plausible to me that looking at only the completers with no control is heavily biased in favor of the intervention.
I could spin the opposite story of course: it works so well that people drop out early because they are cured, and we never hear from them. My gut feeling is that this is unlikely to balance out, but again, we don’t know, and I contend this is a big problem. And I don’t think it’s the kind of issue you can hand-wave away and proceed to casually present the results for completers as if they represent the effect of the program as a whole. (To be clear, this post does not claim this, but I think it might easily be read like this by a naive reader).
There are all sorts of other stories you could spin as well. For example, have the completers recently solved some other issue, e.g. gotten a job or resolved a health issue? Are they at the tail-end of the typical depression peak? Are the completers in general higher conscientiousness and thus more likely to resolve their issues on their own regardless of the programme? Given the information presented here, we just don’t know.
Qualitative interviews with the completers only get you so far; people are terrible at attributing cause and effect, and that’s before factoring in the social pressure to report positive results in an interview. It’s not no evidence, but it is again biased in favor of the intervention.
Completers are a highly selected subset of the participants, and while I appreciate that in these sorts of programmes you have to make some judgement calls given the very high drop-out rate, I still think it is a big problem.
I don’t know about this, Open Phil have given billions to GiveWell charities and GHD programmes. A couple of million to a forecasting platform seems niche in comparison.
I don’t understand what you are saying here, could you elaborate?
By restricting to the people who completed the program, we get to understand the effect that the program itself has. This is important for understanding its therapeutic value.
I disagree with this. If this were a biomedical intervention where we gave a pill regimen, and two-thirds of the participants dropped out of the evaluation before the end because the pills had no effect (or had negative side-effects for that matter), it would not be right to look at only the remaining third that stuck with it to evaluate the effect of the pills. That said, I do agree that it’s impressive and relevant that 27% complete the treatment, and that this is evidence of its relative effectiveness given the norm for such programmes.
I also wholeheartedly agree that the topline cost-effectiveness is what matters in the end.
Thanks for making these changes and responding to my concerns!
Also great to hear that HLI is doing a more in-depth analysis, that will be exciting to read.
With regards to the projections, it seems to me you just made up the number 10 000 participants? As in, there is no justification for why you chose this value. Perhaps I am missing something here, but it feels like without further context this projection is pretty meaningless.
Congratulations on your first pilot program! I’m very happy to see more work on direct well-being interventions!
I have a few questions and concerns:
Firstly, why did you opt to not have a control group? I find it problematic that you cite the reductions in depression, followed by a call to action for donations, before clarifying that there was no control. Given that the program ran for several months for some participants, and we know that in high income countries almost 50% recover without any intervention at all within a year[1], this feels disingenuous.
Secondly, isn’t it a massive problem that you only look at the 27% that completed the program when presenting results? You write that you got some feedback on why people were not completing the program unrelated to depression, but I think it’s more than plausible that many of the dropouts dropped out because they were depressed and saw no improvement. This choice makes stating things like “96% of program completers said they were likely or very likely to recommend the program” at best uninformative.
Thirdly, you say that you project the program will increase in cost-effectiveness to 20x cash transfers, but give no justification for this number other than general statements about optimisations and economies of scale. How do you derive this number? Most pilots see reduced cost-effectiveness when scaling up[2]. I think you should be very careful about publicly claiming this while soliciting donations.
Finally, you say Joel McGuire performed an analysis to derive the effect size of 0.54. Could you publish this analysis?
I hope I don’t come off as too dismissive, I think this is a great initiative and I look forward to seeing what you achieve in the future! It’s so cool to see more work on well-being interventions! Congratulations again on this exciting pilot!
Yes, this was the reason I chose the word robustly! I wholeheartedly agree that all three premises are certainly debatable. The reason I’m wondering is primarily because I think quite a few EAs might in fact have these views, whether correct or not. I’m therefore a little surprised that I have not seen anyone act on them.
That is, I have not seen anyone say that they have substantially increased their near-termist donations (although I have not gone looking either).
My suspicion is that a lot of the people holding these views might be more “grassroots” or in the periphery. Not the type of EA on a podcast or writing on the forum, but perhaps a city group member, student, earning to give etc.
If you believe that:
- ASI might come fairly soon
- ASI will either fix most of the easy problems quickly, or wipe us out
- You have no plausible way of robustly shaping the outcome of the arrival of ASI for the better
does it follow that you should spend a lot more on near-term cause areas now? Are people doing this?
I see some people argue for increasing consumption now, but surely this would apply even more so to donations to near-term cause areas?
Håkon Harnes’s Quick takes
It’s unclear to me why you think the procurement of tanks would demonstrate more of a closeness to the US than any other weapons system purchased from the US. It’s a weird kind of trade-off indeed if they can choose between the US-made Patriot launchers (as you suggest) or the US-made Abrams tanks, and they go for the tanks despite a clear military inferiority. I honestly don’t follow the reasoning here.
EA Norway has shifted from in-person to digital general assemblies since COVID. This change has sparked some ongoing debate.
Benefits of in-person assemblies:
- More informal networking and conversation
- Better discussion environment
- More enjoyable experience (digital meeting fatigue is real)
- Previously combined with weekend conferences featuring talks and group discussions
Benefits of digital assemblies:
- Easier attendance
  - Especially for members with families
  - Especially for people not living in Oslo, the capital
- Lower costs (minor)
- Lower bar of entry for new members
EA Norway now also maintains an annual in-person gathering, essentially a mini-EAGx for Norway, where we plan to increasingly focus on organizational strategic planning to better capture some of the benefits of an in-person assembly.