I can see why this piece’s examples and tone will rankle folks here. But speaking for myself, I think its core contention is directionally correct: EA’s leading orgs’ and thinkers’ predictions and numeric estimates have an “all fur coat and no knickers” problem—putative precision but weak foundations. My entry to GiveWell’s Change Our Mind contest made basically the same point (albeit more politely).
Another way to frame this critique is to say it’s an instance of the Shirky principle: institutions will try to preserve the problem to which they are the solution. If GiveWell (or whoever) tried to clear up the ambiguous evidence underpinning its recommendations by funding more research (on the condition that the research would provide clear cost-benefit analyses in terms of lives saved per dollar), then what further purpose would the evaluator have once that estimate came back?
There are very reasonable counterpoints to this. I just think the critique is worth engaging with.
I looked at the eval for SMC, and it seems they relied largely on a Cochrane meta-analysis and then tried to correct down for a smaller effect in subsequent RCTs. If even relying on the allegedly gold-standard, famously intervention-skeptical Cochrane and then searching for published disconfirmation isn’t reliable, how can anyone ever be reasonably confident anything works?
As I argue in the SMC piece, not just any RCT will suffice, and today we know a lot more about what good research looks like. IMO, we should (collectively) be revisiting things we think we know with modern research methods. So yes, I think we can know things. But we are talking about hundreds of millions of dollars. Our evidentiary standards should be high.
I guess my pessimism is partly “if the gold standard of 2012 was total garbage, even from an organisation (Cochrane) that has zero qualms about saying there’s not much evidence for popular interventions, why should I trust that our 2024 idea of what good research looks like isn’t also wildly wrong?” I wasn’t criticising you, by the way; it’s good you’re holding GiveWell to account! I was just expressing stress/upset about the idea that we’re all wasting our time or making fools of ourselves.
Some research evaluations last over time! But Munger’s ‘temporal validity’ argument really stuck with me: the social world changes over time, so things that work in one place and time could fail in another for reasons that have nothing to do with rigor and everything to do with changing context.
More broadly, for me personally, the way forward is to incentivize, champion, and promote better and more robust scientific work. I find this motivating and encouraging, and an efficient antidote against cynicism creep. I find it intellectually rewarding because it is an effort that spans many areas including teaching science, doing science, and communicating science. And I find it socially rewarding because it is a teamwork effort embedded in a large group of (largely early career) scientists trying to improve our fields and build a more robust, cumulative science.
I mean, I guess that is sort of encouraging, if you personally are a scientist, since it suggests you can do good work yourself. But it doesn’t offer me much sense that I, who am not a scientist, will ever in fact be able to trust very much outside established theory in the hard sciences, unless you think better methodology will come to be used nearly always by the big reputable orgs and journals. (I mean I already mostly didn’t have trust, but I kind of hoped GiveWell were relying on the minority of actually solid stuff.)
Obviously, ‘don’t trust anything’ could just be the right conclusion, and people should say it if it’s true! But it’s hard not to get disheartened about giving, if the message is “don’t trust any research before c.2015, or also a lot of it afterward, even from the most apparently reliable and skeptical sources, and also, even good research produced now often has little external validity, so probably don’t trust that the good current stuff tells you much about what will happen going forward, either”.
I share your view that the criticism of seeming precision in EA is directionally correct, though attacking the cost-effectiveness of anti-malaria interventions sounds like it’s homing in on the least controversial predictions and strongest evidence base!
I’m less convinced the Shirky principle applies here. I don’t think clearing up ambiguous evidence for SMC would leave GiveWell or any other research org short of purpose; I think it would leave them in a position where they’d be able to get on with evaluating other causes, possibly with more foundation money headed their way to do so. For malaria specifically, I also don’t think it’s possible to eliminate the uncertainty even with absurd research budgets, because background malaria prevalence and seasonal patterns vary so much by region and time (and are themselves endogenous with respect to prevention strategies used), so comparisons between areas require plugging assumptions into a model, and there will always be some areas where it has more or less effect.
I am very surprised to read that GiveWell doesn’t at all try to factor in deaths caused by the charities when calculating lives saved. I don’t agree that you need a separate number for lives lost as for lives saved, but I had always implicitly assumed that ‘lives saved’ was a net calculation.
The rest of the post is moderately misleading though (e.g. saying that Holden didn’t start working at Open Phil, and the EA-aligned OpenAI board members didn’t take their positions, until after FTXFF had launched).
The “deaths caused” example picked was pretty tendentious. I don’t think it’s reasonable to consider an attack at a facility by a violent criminal in a region with high baseline violent crime “deaths caused by the charity” or to extrapolate that into the assumption that two more people will be shot dead for every $100,000 donated. (For the record, if you did factor that into their spreadsheet estimate, it would mean saving a life via that program would cost $4776 rather than $4559.)
I would expect the lives saved from the vaccines to be netted out against deaths from extremely rare vaccine side effects (and the same with analysis of riskier medical interventions), but I suspect the net size of that effect is 0 to several significant figures and already factored into the source data.
I don’t think you incorporate the number at face value, but plausibly you do factor it in to some degree, given the level of detail GiveWell goes into for other factors.
I think if there’s no credible reason to assign responsibility to the intervention, there’s no need to include it in the model. I think assigning the charity responsibility for the consequences of a crime they were the victim of is just not (by default) a reasonable thing to do.
It is included in the detailed write-up (the article even links to it). But without any reason to believe this level of crime is atypical for the context or specifically motivated by e.g. anger against the charity, I don’t think anything else needs to be made of it.
I don’t agree that you need a separate number for lives lost as for lives saved, but I had always implicitly assumed that ‘lives saved’ was a net calculation.
Interesting! I think the question of whether 1 QALY saved (in expectation) is canceled out by the loss of 1 QALY (in expectation) is a complicated question. I tend to think there’s an asymmetry between how good well-being is & how bad suffering is, though my views on this have oscillated a lot over the years. I’d like GiveWell to keep the tallies separate because I’d prefer to make the moral judgement depending on my current take on this asymmetry, rather than have them default to saying it’s 1:1.
I tend to think there’s an asymmetry between how good well-being is & how bad suffering is
This isn’t relevant if you think GiveWell charities mostly act to prevent suffering. I think this is certainly true for the health stuff, and arguably still plausible for the economic stuff.
This is an important point. People often confuse harm/benefit asymmetries with doing/allowing asymmetries. Wenar’s criticism seems to rest on the latter, not the former. Note that if all indirect harms are counted within the constraint against causing harm, almost all actions would be prohibited. (And on any plausible restriction, e.g. to “direct harms”, it would no longer be true that charities do harm. Wenar’s concerns involve very indirect effects. I think it’s very unlikely that there’s any consistent and plausible way to count these as having disproportionate moral weight. To avoid paralysis, such unintended indirect effects just need to be weighed in aggregate, balancing harms done against harms prevented.)
I don’t think it can be separated neatly. If the person who has died as a result of the charity’s existence is a recipient of a disease reduction intervention, then they may well have died from the disease instead if not for the intervention.
Related: Kevin Munger on temporal validity: https://journals.sagepub.com/doi/10.1177/20531680231187271
In general, null results should be our default expectation in behavioral research: https://www.bu.edu/bulawreview/files/2023/12/STEVENSON.pdf
However, see https://eiko-fried.com/antidotes-to-cynicism-creep/#6_Antidotes_to_cynicism_creep on antidotes to cynicism creep.