Director of Research at CEARCH: https://exploratory-altruism.org/
I construct cost-effectiveness analyses of various cause areas, identifying the most promising opportunities for impactful work.
Previously a teacher in London, UK.
Longtermist work is suspiciously comfortable
Much Longtermist work is clean, abstract and suspiciously well-suited to your typical EA. Could this be clouding our judgement?
One surprising thing about bednet-era EA was the disconnect between EAs and the kind of work they were championing. Oxbridge-grad MacAskill implored us to donate to malaria charities, or even to help fix global poverty directly. I actually found this reassuring—nerdy philosophy types probably don’t inherently love thinking about Sub-Saharan supply chains, so the fact that they did it anyway was a sign that perhaps their reasoning really was impartial.
Contrast this with Longtermism. Longtermist work is generally more theoretical and less messy. It can be conducted on a laptop with a flat white and a Huel on hand. Longtermists don’t need to build networks in developing countries. In many cases, they don’t even need to prove that their work is making a difference.
All of the above differences make Longtermist work more appealing to a typical Western, university-educated person.
“So what?” I hear you say. “We’re rational, and we’re pursuing Longtermism because it is so impactful.”
Perhaps. But we should be wary. We know how prone we are to post-rationalising our decisions. We should be careful to separate the worthiness of Longtermist work from the appeal of Longtermist roles.
I am not questioning the validity of Longtermism. I merely think that we should be aware of the likely bias we have towards it.
We are allowed to be swayed by good working conditions or better wages. The danger is that the comforts of the job stop us asking difficult questions about Longtermism.
Tonight, on the 80,000 Hours job page, as my cursor glides past the $1,000/month manager roles in Nairobi and hovers over the $100,000/year AI job in Silicon Valley, I will try to remember this.
I think you could build a very compelling case for this. Even if official data sources underestimate key numbers like overdose deaths, the figures they do report are still a stirring call to action.
1. Drug problems have got considerably worse in the past decade. This CDC source implies that overdose rates have more than doubled since 2015. Much of the increase came during the pandemic, which could add a little narrative spice to your argument.
2. Other “similar” problems are not getting worse. Other “despair” indicators like suicide and depression appear to be stable. Road accidents and violence have fallen. On one hand it’s a bit sneaky to pick and choose comparisons like this, but it could be argued that they are all societal problems that often cause (very) early death. They’re tragic.
3. Vaccines/other pharma interventions may offer an unusually tractable and scalable solution. Addiction and all of the other problems in the chart above are very difficult problems to fight. At best, interventions usually take a chunk out of the burden but offer no hope of big change. Drug interventions can be controversial, with effects of uncertain sign. If you can show that your ideas are significantly better, you are doing well.
I expect that a major difficulty is that your solutions involve developing new vaccines/drugs, which is of course an expensive, uncertain and lengthy process. Will pharma companies see potential for a profit? Are there scientific grounds for optimism that these new drugs are possible?
Unfortunately I don’t have the spare capacity to volunteer much time. I’d be interested in giving feedback on any future work. Good luck!
Using PDF rather than CDF to compare the cost-effectiveness of preventing events of different magnitudes here seems off.
You show that preventing (say) all potential wars next year with a death toll of 100 is 1000^1.6 = 63,000 times better in expectation than preventing all potential wars with a death toll of 100k.
More realistically, intervention A might decrease the probability of wars of magnitude 10-100 deaths and intervention B might decrease the probability of wars of magnitude 100,000 to 1,000,000 deaths. Suppose they decrease the probability of such wars over the next n years by the same amount. Which intervention is more valuable? We would use the same methodology as you did except we would use the CDF instead of the PDF. Intervention A would be only 1000^0.6 = 63 times as valuable.
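To make the difference concrete, here is a minimal sketch in Python. I’m assuming a power law for war death tolls with tail exponent 1.6, i.e. P(deaths > x) proportional to x^-1.6, which I believe matches the post’s fit; the exact functional form is my assumption.

```python
# Minimal sketch, assuming P(deaths > x) is proportional to x**(-BETA)
# with BETA = 1.6. Only ratios matter, so normalising constants are dropped.
BETA = 1.6

def deaths_at_magnitude(x):
    # PDF approach: expected deaths from wars of *exactly* magnitude x,
    # i.e. x * f(x), with density f(x) proportional to x**(-(BETA + 1)).
    return x * x ** -(BETA + 1)

def deaths_above_magnitude(x):
    # CDF approach: expected deaths from wars of magnitude x *or greater*,
    # i.e. x * P(deaths > x), proportional to x**(1 - BETA).
    return x * x ** -BETA

small, large = 100, 100_000
print(deaths_at_magnitude(small) / deaths_at_magnitude(large))      # ~63,000 (= 1000**1.6)
print(deaths_above_magnitude(small) / deaths_above_magnitude(large))  # ~63 (= 1000**0.6)
```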
As an intuition pump we might look at the distribution of military deaths in the 20th century. Should the League of Nations/UN have spent more effort preventing small wars and less effort preventing large ones?
The data actually makes me think that even the 63x from above is too high. I would say that in the 20th century, great-power conflict > interstate conflict > intrastate conflict should have been the order of priorities (if we wish to reduce military deaths). When it comes to things that could be even deadlier than WWII, like nuclear war or a pandemic, it’s obvious to me that the uncertainty about the death toll of such events increases at least linearly with the expected toll, and hence the “100-1000 vs 100k-1M” framing is superior to the PDF approach.
I think you are right that we often forget the marginal nature of the contributions made in a highly-sought-after job. “Do I offer more than the next best candidate?” is a question we forget to ask.
I suspect the effectiveness of “nurses, child care workers, truck drivers, and home health aides”, while higher than a typical job, might pale in comparison to more targeted work like independent projects or effective giving. Someone donating 10% of the median US salary to effective causes can expect to save approximately one life per year (10% of a roughly $55,000 median salary is about $5,500, close to GiveWell’s estimated cost to save a life)—a high threshold indeed.
Thank you for the kind words, Nick!
Completely agree that if the participants know they are receiving a special treatment they are likely to show more response bias. By “well-conducted RCT” I was thinking of studies with an active control like Nakimuli-Mpungu et al. (2020) in which patients “were randomly assigned to deliver either [group support psychotherapy] or group HIV education”. If done well, the participants won’t know which treatment the scientists “want” to perform better and so response bias will be constant.
Thanks for sharing an explicit theory of change. I think you make a convincing case for filling a niche and being more cost-effective than some other interventions in the GCR community-building space (which in my opinion are often unnecessarily flashy).
Still, as an EA in Global Health who recently enjoyed a stay at CEEALAR, I’m disappointed to see the narrowing of focus to GCRs only. I think you can probably make a compelling case for why CEEALAR can be more effective if it specialises, but am I right in guessing that the change is driven mainly by trustees’ assessment that GCRs are just way more important than other cause areas?
Identify and validate better methods of eliciting low-probability forecasts
I think this is important work, so I’m glad to hear that it’s a priority.
It’s a two-pronged approach, right? Measuring how reliable forecasters are at working with small probabilities, and using better elicitation measures to reduce the size of any error effect.
I suspect that when measuring how reliable forecasters are at working with small probabilities you’ll find a broad range of reliability. It would be interesting to see how the XPT forecasts change if you exclude those with poor small-probability understanding, or if you weight each response according to the forecaster’s aptitude.
Using comparative judgements seems like a good avenue for exploration. Have you thought about any of the following?
Using “1-in-x” style probabilities instead of x%. This might be a more “natural” way of thinking about small probabilities
Eliciting probabilities by steps: first get respondents to give the order of magnitude (is your prediction between 1-in-10 and 1-in-100 or between 1-in-100 and 1-in-1000 or…), then have them narrow down further. This is still more abstract than the “struck by lightning” idea, but does not rely on the respondent’s level of lightning-strike knowledge
Giving respondents sense-checks on the answers they have given before, and an opportunity to amend their answers: “your estimate of X-risk through Bio is 1% and your estimate of X-risk through natural pandemics is 0.8%, so you think 80% of the x-risk from Bio comes from natural pandemics”
One more thing:
If we are not sure whether forecasters can tell 0.001% apart from 0.000001% (a magnitude difference of 1,000x), then we should treat a 0.000001% forecast of a catastrophic risk as if it were 0.001% and be much more cautious about potential dangers.
In theory, yes, but I think people are generally much more likely to say 0.001% when their “true” probability is 0.000001% than vice versa—maybe because we very rarely think about events of the order of 0.000001%, so 0.001% seems to cover the most unlikely events.
You might counter that we just need a small proportion of respondents to say 0.000001% when their “true” probability is 0.001% to risk undervaluing important risks. But not if we are using medians, as the XPT does.
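A quick illustrative simulation (all numbers invented) of why medians are robust here:

```python
# Illustrative only: even if 10% of forecasters misreport 0.001% as
# 0.000001%, the median barely moves, unlike the mean.
import numpy as np

forecasts = np.full(100, 1e-5)   # 100 forecasters whose "true" answer is 0.001%
forecasts[:10] = 1e-8            # 10 of them misreport 0.000001%

print(np.median(forecasts))  # 1e-05: the median is unaffected
print(np.mean(forecasts))    # ~9e-06: the mean is dragged down
```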
I could be wrong on the above, but my take is that understanding the likely direction of errors in the “0.001% vs 0.000001%” scenarios maybe ought to be a priority.
Could you elaborate? In my mind, catastrophes leading to over 90% of the population dying flow through AI and bio risk, which are the two longtermist priorities, so I am not sure which “threats to long-term value that were previously not considered” you have in mind.
I’d say that “succumbing to long-term demographic decline after a major catastrophe” is the threat that has not previously been considered. People might assume that if hundreds of millions of people survive a catastrophe and begin reindustrializing, the main existential problem is that they are more vulnerable to the next catastrophe. I think population decline is an extra worry, and so the existential consequences of a global catastrophe are slightly greater.
Do you have any thoughts on the chance of 90% of the population dying leading to a preindustrial society?
I don’t think I have many useful insights here, except that we have very limited historical data to rely upon. It’s possible that even 10% mortality could trigger ripple effects that bring everything crashing down. If the cause of mortality was a global agricultural shortfall, we might see governments become very protectionist, banning food exports, which in turn leads to sanctions on other goods. If global trade shuts down, who knows what will happen...
It’s possible that industrialization could persist in one place, but it would be difficult, as no one country has all of the resources that we use globally today. The islands that would be most resilient to the climate effects of a nuclear winter would struggle with a lack of energy sources and other inputs. The US seems to be well-endowed with resources, but still lacks some important metals. Rich economies are dependent upon cheap labour from abroad, through imports and immigration. So I’d say regional industrialization is possible but uncertain.
You may want to look into historical examples of fertility increasing after catastrophes, like after the Black Death.
Yes! In my first draft I looked at the Black Death, China’s Great Leap Forward Famine, and the Irish Famine. The population collapse in the Americas after the arrival of Europeans could be useful too, although nobody really knows how many people died.
It’s worth pointing out that these catastrophes happened in places that had not yet had a demographic transition—so they may be a good test of Malthusian models, but they don’t really help us with what would happen to a mature society.
It took a long time for Europe to recover from the Black Death, partly because the plague kept coming back. But once the plague was gone, growth may have been faster than pre-catastrophe. Perhaps there were more resources available per person, and better tech to exploit them with. To me that would be a sign that mediaeval demographics were somewhat Malthusian.
The Irish Famine is tricky because more people emigrated than starved, and the famine was the beginning of a long demographic decline for the country. I couldn’t find any evidence of skyrocketing birth rates.
Although China’s famine was the biggest ever, it still killed only a few percent of China’s population. It’s difficult to tell amid the noise whether or how the famine altered China’s long-term population trajectory. We know that fertility dropped during the famine and spiked afterwards (maybe partly due to people “replacing” lost babies/pregnancies). But a few years later, China resumed the fertility decline that had begun shortly before the famine. The fertility rate dropped from 5.7 to 2.8 between 1970 and 1980.
I’d be interested to hear of any historical examples that might be useful!
My attempt to summarize why the model predicts that preventing famine in China and other countries will have a negative effect on the future:
Much of the value of the future hinges on whether values become predominantly democratic or antidemocratic
The more prevalent antidemocratic values (or populations) are after a global disaster, the likelier it is that such values will become predominant
Hence preventing deaths in antidemocratic countries can have a negative effect on the future.
Or as the author puts it in a discussion linked above:
To be blunt for the sake of transparency, in this model, the future would improve if the real GDP of China, Egypt, India, Iran, and Russia dropped to 0, as long as that did not significantly affect the level of democracy and real GDP of democratic countries. However, null real GDP would imply widespread starvation, which is obviously pretty bad! I am confused about this, because I also believe worse values are associated with a worse future. For example, they arguably lead to higher chances of global totalitarianism or great power war.
I agree with the author that the conclusion is confusing. Even concerning.
I’d suggest that the conclusion is out of sync with how most people feel about saving lives in poor, undemocratic countries. We typically don’t hesitate to tackle neglected tropical diseases just because doing so boosts the populations of dictatorships.
Hi James, thanks for sharing this. As others have said, it is a difficult thing to do. I’m actually weirdly looking forward to the EA criticisms that will come out of this FTX business. You often hear of the abstract need for criticism and “red-teaming” but not much about the actual criticisms.
I think your story chimes with a bigger difficulty in the EA movement: how small-scale effectiveness measures (ie not talking to junior EAs) end up stymying the movement on a larger scale (by being unfriendly and putting people off).
I’m also worried about whether a utilitarian movement really can value integrity, friendliness etc. I can see how it might see the value in appearing to have integrity or appearing to value diversity. But if those things get in the way of effectiveness, won’t they be covertly canned?
I’m a 30-year-old in London and consider myself fairly friendly. If you want to talk about stuff, get in touch.
So glad you’re looking into this.
Interesting to see that you didn’t use IHME’s DALY weights for mild, moderate and severe depression, which are derived from surveys of ordinary people making pairwise comparisons (GBD, 2019).
[Figure from WHO (2020).]
In my report on mental health interventions (coming soon!) I took data from Pyne et al. (2009) which asked sufferers, ex-sufferers and never-sufferers of depression to rate the badness of different levels of depression. My analysis of the data suggests that sufferers rate depression to be approximately 20% worse than a typical person would. Interestingly, this phenomenon of rating your own condition worse than others would does not seem to hold for most health conditions (see Pyne et al., 2009).
I also tried to model the DALY burden that comes from the additional suicide risk associated with depression, and reached an estimate of 0.066 DALYs for each year of depression. This is contingent on some dodgy data on the effect of depression on suicide rates, and ought to vary a lot by gender and nationality.
Putting it all together, I got an estimated weighting of 0.392 for an average case of depression. My central estimate is 0.145 DALYs per SD of depression symptoms, so about 22% lower than yours.
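For what it’s worth, a rough sketch of how these pieces might combine; the base morbidity weight here is back-solved from my final figure and purely illustrative, not a number from the report:

```python
# Illustrative reconstruction only: the base weight is back-solved,
# not taken from the underlying analysis.
base_weight = 0.272      # assumed average DALY weight across severities
sufferer_uplift = 1.20   # sufferers rate depression ~20% worse than others do
suicide_dalys = 0.066    # estimated DALYs per year from added suicide risk

total_weight = base_weight * sufferer_uplift + suicide_dalys
print(round(total_weight, 3))  # ~0.392
```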
Thanks again!
I think I have been trying to portray the point-estimate/interval-estimate trade-off as a difficult decision, but probably interval estimates are the obvious choice in most cases.
So I’ve re-done the “Should we always use interval estimates?” section to be less about pros/cons and more about exploring the importance of communicating uncertainty in your results. I have used the Ord example you mentioned.
Thanks for your feedback, Vasco. It’s led me to make extensive changes to the post:
More analysis on the pros/cons of modelling with distributions. I argue that sometimes it’s good that the crudeness of point-estimate work reflects the crudeness of the evidence available. Interval-estimate work is more honest about uncertainty, but runs the risk of encouraging overconfidence in the final distribution.
I include the lognormal mean in my analysis of means. You have convinced me that the sensitivity of lognormal means to heavy right tails is a strength, not a weakness! But the lognormal mean appears to be sensitive to the width of the confidence interval you use to calculate it—which means subjective methods are required to pick that width, introducing bias (illustrated in the sketch below).
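A quick sketch of that sensitivity, assuming we fit a lognormal to a subjective 90% CI; the interval endpoints are invented:

```python
# Two CIs with the same geometric midpoint: the wider one implies a much
# larger lognormal mean, because sigma enters the mean formula exp(mu + sigma^2/2).
import math

def lognormal_mean_from_ci(low, high, z=1.645):  # z-score for a 90% CI
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return math.exp(mu + sigma ** 2 / 2)

print(lognormal_mean_from_ci(1, 100))     # ~27
print(lognormal_mean_from_ci(0.1, 1000))  # ~503, despite the same midpoint of 10
```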
Overall I agree that interval estimation is better suited to the Drake equation than to GiveWell CEAs. But I’d summarise my reasons as follows:
The Drake Equation really seeks to ask “how likely is it that we have intelligent alien neighbours?”, but point-estimate methods answer the question “what is the expected number of intelligent alien neighbours?”. With such high variability the expected number is virtually useless, but the distribution of this number lets us estimate the probability that we have any alien neighbours at all (see the toy simulation below). GiveWell CEAs probably have much less variation, and hence a point-estimate answer is relatively more useful.
Reliable research on the numbers that go into the Drake equation often doesn’t exist, so it’s not too bad to “make up” interval estimates to go into it. We know much more about the charities GiveWell studies, so made-up distributions (even those informed by reliable point-estimates) are much less permissible.
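Here’s a toy Monte Carlo of the first point above; the number of factors and their spreads are invented, not real Drake-equation inputs:

```python
# Toy model: the product of a few very wide lognormal factors has a huge
# expected value, yet roughly a 50% chance of being below 1 ("no neighbours").
import numpy as np

rng = np.random.default_rng(0)
factors = [rng.lognormal(mean=0.0, sigma=3.0, size=100_000) for _ in range(3)]
n_neighbours = factors[0] * factors[1] * factors[2]

print(n_neighbours.mean())        # expected number: enormous
print((n_neighbours < 1).mean())  # probability of "no neighbours": ~0.5
```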
Thanks again, and do let me know what you think!
Thanks for pointing this out. I agree, and I think we can trace the elitism in the movement to well-informed efforts to get the most from the human resources available.
While EA remains on the fringe we can keep thinking in terms of maximising marginal gains (ie only targeting an elite with the greatest potential for doing good). But as EA grows it is worth considering the consequences of maintaining such an approach:
1. Focusing on EA jobs & earning-to-give will limit the size of the movement, as newcomers increasingly see no place for themselves in it
2. With limited size comes limited scope for impact: eg you can’t change things that require a democratic majority
3. Even if 2) proves false, we probably don’t want a future society run by a vaunted, narrow elite (at least based on past experience)
According to this (not that rigorous) paper, cataract surgery can cost about $300. It improves vision and sometimes prevents blindness. Even if it’s not as cost-effective as a GiveWell-recommended charity, it can still illustrate the point about guide dogs.
As someone who has thought about cost-effectiveness, I agree that comparing willingness-to-pay-for-a-QALY/life is a more robust point. But for people who haven’t thought about this much, the more visceral preventing-blindness comparison might be better.
Maybe it would be worth someone checking that cataract-surgery charities like Sightsavers are passably cost-effective.
I skimmed the piece on axiological asymmetries that you linked and am quite puzzled that you seem to start with the assumption of symmetry and look for evidence against it. I would expect asymmetry to be the more intuitive, and therefore default, position. As the piece says:
At just the first-order level, people tend to assume that (the worst) pain is worse than (the best) pleasure is pleasurable. The agonizing ends for non-human animals in factory farms and in the wild seem far worse than the best sort of life they could realize would be good. [...] it’s hard to find any organisms that risk the worst pains for the greatest pleasures and vice versa.
I would expect a difference in magnitude between the best possible pleasure and the worst possible pain to be the most obvious explanation, but the piece concludes that these judgments are “far more plausibly explained by various cognitive biases”.
As far as I can tell this would suggest that either:
1. Someone who has recently experienced or is currently experiencing intense suffering (and therefore has a better understanding of the stakes) would be more willing to take the kind of roulette gamble described in the piece. This seems unlikely.
2. People’s assessments of hedonic states are deeply unreliable even if they have recent experience of the states in question. I don’t like this much because it means we have to fall back on physiological evidence for human pleasure/suffering, which, as shown by the mayonnaise example, can’t give us the full picture.
On a slightly separate note, I played around with the BOTEC to check the claim that assuming symmetry doesn’t change the numbers much, and I was convinced. The extreme suffering-focused assumption (where perfect health is merely neutral) resulted in double the welfare gain of the symmetric assumption (when the increase in welfare as a percentage of the animals’ negative welfare range is held constant).
My main question on this last point is: why use “percentage of the animals’ negative welfare range” when “percentage of the animals’ total welfare range” seems more relevant and would not vary at all across different (a)symmetry assumptions?
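If I’ve understood the BOTEC correctly, here is a stylized sketch of where the factor of two comes from, and why normalising by the total range would remove it; the numbers are invented, not the BOTEC’s actual parameters:

```python
# Stylized: the negative welfare range has width 1; the symmetric
# assumption adds an equal positive range, doubling the total range.
gain_as_frac_of_neg_range = 0.10  # held constant across assumptions

for name, total_range in [("suffering-focused", 1.0), ("symmetric", 2.0)]:
    absolute_gain = gain_as_frac_of_neg_range * 1.0  # same absolute gain in both cases
    print(name, absolute_gain / total_range)         # gain per unit of the full range
# suffering-focused: 0.1, symmetric: 0.05 -- the factor-of-two difference
```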
Great summary. You must have an incredibly organised system for keeping track of your reading and what you take from each post!
I suspect this has given me most of the benefit of hours of unguided reading at a fraction of the time cost.
I think it’s a great idea. My intuition is that you ought to exaggerate what makes your work different from the existing EA canon. For example, you might want to be much more accessible than the works put out by moral philosophers. To this end, I suggest partnering with someone with a track record of publishing pop-science, self-help or similar.
It doesn’t mean you have to water down EA ideas. But it would probably mean distilling the essence of EA thought into a few clear principles, which you can then use to illustrate why EA leads to various conclusions.
For example (off the top of my head) your principles might be:
The outcomes are what matters (consequentialism)
Do your best with the information available (Bayesian thinking)
Then, in your chapter on personal consumption choices, you can show why transitioning from chicken to grass-fed beef with an offset donation to Vegan Outreach (to borrow Geoffrey Miller’s example), a bewildering choice to most people, stems from those principles.
In short, you should aim to be accessible, but not one of those books that you have to flick back through to find the answers to each of life’s questions. Readers should be left with a clear and lasting grounding in the basics.
While I’m writing about setting yourself apart from the EA canon, I feel I should point out the obvious—women, especially mothers, are not well-represented among EA authors. If you can find some way to productively collaborate with Julia, you should.
On purely pragmatic grounds, having a female name on the cover will broaden the readership
Poor representation is plausibly linked to increased attrition (I see the higher rates of attrition among women studying math as an example of this)
Parenthood, which is essentially a lifelong commitment to favour your child over others, urgently needs discussing in the context of EA, especially by those who are both EAs and parents.
Thanks for sharing this! I would love to see similar transparency on pay decisions from other orgs, even if they don’t have such a formalised system.