I am a researcher at the Happier Lives Institute. In my work, I assess the cost-effectiveness of interventions in terms of subjective wellbeing.
Joel McGuire
Fair jabs, but the PRC-Taiwan comparison was because it was the clearest natural experiment that came to mind where different bits of a nation (shared language, culture, etc.) were somewhat randomly assigned to authoritarianism or pluralistic democracy. I’m sure you could make more comparisons with further statistical jiggery-pokery.
The PRC-Taiwan comparison is also because, imagining we want to think of things in terms of life satisfaction, it’s not clear there’d be a huge (war-justifying) loss in wellbeing if annexation by the PRC only meant a relatively small dip in life satisfaction. This is the possibility I found distressing. Surely there’s something we’re missing, no?
I think inhabitants of both countries probably have similar response styles to surveys with these scales. Still, if a state is totalitarian, we should probably not be surprised if people are suspicious of surveys.
Sure, Taiwan could be invaded, and that could put a dampener on things. But, notably, Taiwan is more satisfied than Japan and South Korea, peers of similar wealth and democracy that are less likely to be invaded.
I expect one response is, “well, we shouldn’t use these silly surveys”. But what other existing single type of measure is a better assessment of how people’s lives are going?
Taiwan has about a 0.7-point advantage on a 0-to-10 life satisfaction scale and, most recently, 5% more of its population reporting being happy.
I agree that the agency of newer NATO members (or Ukraine) has been neglected. Still, I don’t think this was a primary driver of underestimating Ukraine’s chances—unless I’m missing what “agency” means here.
I assume predictions were dim about Ukraine’s chances at the beginning of the war primarily because Russia and the West had done an excellent job of convincing us that Russia’s military was highly capable. E.g., I was disconcerted by the awe/dread with which my family members in the US Army spoke about Russian technical capabilities across multiple domains.
That said, I think some of these predictions came from a sense that Ukraine would just “give up”. In which case, missing the agency factor was a mistake.
Has Russia’s Invasion of Ukraine Changed Your Mind?
What’s the track record of secular eschatology?
A recent SSC blog post depicts a dialogue about eugenics. This raised a question: what is the track record of communities of reasonable people at identifying the risks of previous potential catastrophes?
As noted in the post, at different times:
Many people were concerned about overpopulation posing an existential threat (cf. The Population Bomb, discussed at length in The Wizard and the Prophet). It now seems widely accepted that the risk overpopulation posed was overblown. But this depends on how contingent the Green Revolution was. If there hadn’t been a Norman Borlaug, would someone else have tried a little bit harder than others to find more productive cultivars of wheat?
Historically, there also appeared to be more worry about the perceived threat posed by a potential decline in population IQ. This flowed from the reasonable-sounding argument: “Smart people seem to have fewer kids than their less intellectually endowed peers. Extrapolate this over many generations, and we have an idiocracy that at best will be marooned on Earth or at worst will no longer be capable of complex civilization.” I don’t hear these concerns much these days (an exception being a recent Clearer Thinking podcast episode). I assume the dismissal would sound something like: “A. The Flynn effect.[1] B. If the effect exists, it will take a long time to bite into technological progress, and by the time it poses a threat, we should have more elegant ways of increasing IQ than selective breeding. Or C. Technological progress may depend more on total population size than average IQ, since we need a few von Neumanns instead of hordes of B-grade thinkers.”
I think many EAs would characterize global warming as tentatively in the same class: “We weren’t worried enough when action would have been high leverage, but now we’re relatively too worried because we seem to be making good progress (see the decline in solar cost), and we should predict this progress to continue.”
There have also been concerns about the catastrophic consequences of: A. Depletion of key resources such as water, fertilizer, oil, etc. B. Ecological collapse. C. Nanotechnology(???). These concerns are also considered overblown in the EA community relative to the preoccupation with AI and engineered pathogens.
Would communism’s prediction of an inevitable collapse of capitalism count? I don’t know how harmful this would have been considered in the short run, since most attention was on the utopia it would afford.
Most of the examples I’ve come up with make me lean towards the view that “these past fears were overblown because they consistently discounted the likelihood that someone would fix the problem in ways we can’t yet imagine.”
But I’d be curious to know if someone has examples or interpretations that lean more towards “We were right to worry! In hindsight, these issues received about the right amount of resources. Heck, they should have gotten more!”
What would an ideal EA have done if teleported back in time and mindwiped of foresight when these issues were discovered? If reasonable people acted in folly then, and EAs would have acted in folly as well, what does that mean for our priors?
- ^
I can’t find an OWID page on this, despite Google image searches making it apparent one once existed. Perhaps letting people compare IQs across countries didn’t feed the right conversations?
While there are some Metaculus questions that ask for predictions of the actual risk, the ones I selected are all conditional, of the form “If a global catastrophe occurs, will it be due to X?”. So they should be more comparable to the RP question “Which of the following do you think is most likely to cause human extinction?”
I know this wasn’t the goal, but this was the first time I’d seen general polls of how people rank existential risks, and I’m struck by how much the public differs from Rationalists / EAs (using Metaculus and Toby as a proxy). [1]
| Risk | Public (RP) | Metaculus | Difference |
| --- | --- | --- | --- |
| Nukes | 42% | 31% | −11% |
| Climate | 27% | 7% | −20% |
| Asteroid | 9%[2] | ~0% (Toby Ord) | −9% |
| Pandemic | 8% | Natural: 14%, Eng: 28% | +6 to +20% |
| AI | 4% | 46% | +42% |
In general, I agree politeness doesn’t require that, but I’d encourage following up in case something got lost in junk mail if the critique could be quite damaging to its subject.
In case it’s not obvious, the importance of previewing a critique also depends on the nature of the critique and the relative position of the critic and the critiqued. I think those possibly “punching down” should be more generous and careful than those “punching up”.
The same goes for the implications of the critique “if true”, whether it’s picking nits or questioning whether the organisation is causing net harm.
That said, I think these considerations only make a difference between waiting one or two weeks for a response and sending one versus several emails to a couple of people if there’s no response the first time.
Lead exposure: a shallow cause exploration
Hi Alex, I’m heartened to see GiveWell engage with and update based on our previous work!
[Edited to expand on takeaway]
My overall impression is:
This update clearly improves GiveWell’s deworming analysis.
Each % point change in deworming cost-effectiveness could affect where hundreds of thousands of dollars are allocated. So getting it right seems important.
More work building an empirical prior seems likely to change the estimated decay of income effects and thus deworming’s cost-effectiveness, although it’s unclear in which direction.
Further progress appears easy to make.
This work doesn’t update HLI’s view of deworming much because:
We primarily focus on subjective wellbeing as an outcome, which deworming doesn’t appear to affect in the long run.
The long-term income effects of deworming remain uncertain.
In either case, analysing deworming’s long-term effects still relies on a judgement-driven analysis of a single, well-done, but noisy study.
[Note: I threw this comment together rather quickly, but I wanted to get something out that gives my approximate views.]
1. There are several things I like about this update:
In several ways, it clarifies GiveWell’s analysis of deworming.
It succinctly explains where many of the numbers come from.
It clarifies the importance of explicit subjective assumptions (they seem pretty important).
It lays out the evidence it uses to build its prior in a manner that’s pretty easy to follow.
Helpfully, it lists the sources and reasons for the studies not included.
2. There are a few things that I think could be a bit clearer:
The decay rate from the raw (unadjusted) data is 13% yearly.
Assuming the same starting value as GiveWell, using this decay rate would lead to a total present value of 0.06 log-income units, compared to 0.09 for their “3% decay” model and 0.11 for their “no-decay” model.
Different decay rates imply very different discounts relative to the “no-decay” baseline / prior: 13% decay → 49% discount; 3% decay → 19% discount.
They arrive at a decay rate of 3% instead of 13% because they subjectively discount the effect sizes from earlier time points more. While some of their justifications seem quite plausible,[1] after some light spreadsheet crawling I’m still confused about what’s happening under the hood here.
The 10% decrease in effectiveness comes from assigning 50% of the weight to their prior of no decay and 50% to their estimate of a 3% decay rate. So whether the overall adjustment lands closer to 0% or 50% depends primarily on two factors:
How much to subjectively (unclear if this has an empirical element) discount each follow-up.
How much weight should be assigned to the prior for deworming’s time-trajectory, which they inform with a literature review.
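To make the moving parts above concrete, here is a rough sketch of the arithmetic as I understand it. This is not GiveWell’s actual model: the starting effect size and time horizon are hypothetical placeholders, so the absolute totals won’t match their 0.06/0.09/0.11 figures, but the final blending step shows where the ~10% comes from.

```python
# Hedged sketch of geometric decay plus 50/50 prior blending.
# The initial effect and horizon are hypothetical placeholders.

def total_effect(initial_effect, decay_rate, years=40):
    """Total log-income effect summed under geometric yearly decay."""
    return sum(initial_effect * (1 - decay_rate) ** t for t in range(years))

initial = 0.005  # hypothetical starting log-income effect per year

no_decay = total_effect(initial, 0.00)
slow_decay = total_effect(initial, 0.03)
fast_decay = total_effect(initial, 0.13)
assert fast_decay < slow_decay < no_decay  # faster decay -> smaller total

# The ~10% overall adjustment: 50% weight on the no-decay prior
# (a 0% discount) and 50% on the 3% decay estimate (which carries
# a ~19% discount in GiveWell's model).
blended_discount = 0.5 * 0.00 + 0.5 * 0.19
print(round(blended_discount, 3))  # 0.095, i.e. roughly a 10% discount
```

The point of the sketch is just that the headline adjustment is a weighted average of discounts, so the two judgement calls listed above (per-follow-up discounting, which sets the decay rate, and the prior weight) drive everything.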
All this being said, I think this update is a big improvement to the clarity of GiveWell’s deworming analysis.
My next two comments are related to some limitations of this update that Alex acknowledges:
It’s possible we’ve missed some relevant studies altogether.
We have not tried to formally combine these to get point estimates over time or attempted to weight studies based on relevance, study quality, etc.
We are combining studies that may have little ability to inform what we’d expect from deworming (twin studies, childcare programs, etc.).
It could be possible to re-assess other studies measuring long-term benefits of early childhood health interventions. When we set our prior, we excluded studies that did not report separate effects on income at different time periods. We guess that for several of these studies, it would be possible to re-analyze the primary data and create estimates of the effect on income at different time periods.
3. After briefly looking over the literature review GiveWell uses to build a prior on the long-term effects of deworming, it seems like further research would lead to different results.
GiveWell takes a “vote counting” approach where the studies are weighted equally[2]. But I would be very surprised if further research assigned equal weight to these studies because they appear to vary considerably in relevance, sample size, and quality.
Deworming analogies include preschool, schooling, low birth weight, early childhood stimulation, pollution, twin height differences, and nutritional school lunches. It’s unclear how relevant these are to deworming because the mechanisms for deworming to benefit income seem poorly understood.
Sample sizes aren’t noted. This could matter, as one of the “pro-growth trajectory” studies, Gertler et al. (2021), has a follow-up sample size of only ~50. That seems unusually small, so it’s unclear how much weight it should receive relative to the others. However, it is one of the only studies in an LMIC!
There are also two observational studies, which typically receive less weight than quasi-experimental trials or RCTs (Case et al. 2005, Currie and Hyson 1999).
4. Progress towards building a firmer prior seems straightforward. Is GiveWell planning on refining its prior for deworming’s trajectory? Or incentivizing more research on this topic, e.g., via a prize or a bounty? Here are some reasons why I think further progress may not be difficult:
The literature review seems like it could be somewhat easily expanded:
It seems plausible that you could use Liu and Liu (2019), another causal study of deworming’s long-term effects on income, to see if the long-term effects change depending on age. They were helpful when we asked them for assistance.
Somewhat at random, I looked at Duflo et al. (2021), which was passed over for inclusion in the review, and found that it contains multiple follow-ups and weak evidence for incomes increasing over time due to additional education.
The existing literature review on priors could be upgraded to a meta-analysis with time (data extraction is more tedious than technically challenging). A resulting meta-analysis where each study is weighted by precision and potentially a subjective assessment of relevance would be more clarifying than the present “vote counting” method.
It’s unclear if all the conclusions were warranted. GiveWell reads Lang and Nystedt (2018) as finding “Increases for males; mixed for females” and notes some quotes from the original study:
“From ages 30–34 and onwards, the height premium increases over the life cycle for men, starting at approximately 5%, reaching 10% at ages 45-64 and approximately 11-12% at ages 65-79 (i.e., in retirement).” [...] “Almost the opposite trend is found for women. Being one decimeter taller is associated with over 11% higher earnings for women aged 25–29. As the women age, the height premium decreases and levels off at approximately 6–7%.” [...] “The path of the height premium profile over the female adult life cycle is quite unstable, and no obvious trend can be seen (see Fig. 2).” (17-18)
But when I look up that same table (shown below), I see decay for women and growth for men.
- ^
“Higher ln earnings effects from KLPS-2 to KLPS-3 are driven by lower control group earnings in KLPS-2 ($330 vs. $1165).[8] In KLPS-3, researchers started measuring farming profits in addition to other forms of earnings,[9] so part of the apparent increase in control group earnings from KLPS-2 to KLPS-3 is likely driven by a change in measurement, not real standards of living or catch-up growth.”
- ^
“We found 10 longitudinal studies with at least two adult follow-ups from a number of countries examining the impact of a range of childhood interventions or conditions (see this table), in addition to the deworming study (Hamory et al. 2021). Of those 10 studies, 3 found decreasing effects on income, 3 found increasing effects, and 4 found mixed effects (either similar effects across time periods, different patterns across males and females, or increases and then decreases over the life cycle). Based on this, we think it makes sense to continue to assume as a prior that income effects would be constant over time. I have low confidence in these estimates, though, and it’s possible further work could lead to a different conclusion.”
Hi John, it’s truly a delight to see someone visually illustrate our work better than we do. Great work!
Great piece. Short and sweet.
Given the stratospheric karma this post has reached, and the ensuing likelihood it becomes a referenced classic, I thought it’d be a good time to descend to some pedantry.
“Scope sensitivity” as a phrase doesn’t click with me. For some reason, it bounces off my brain (please let me know if I seem alone in this regard). What scope are we sensitive to? The scope of impact?

Some of the related slogans, “shut up and multiply” and “cause neutral”, aren’t much clearer. “Shut up and multiply”, which seems slightly off-putting / crass as a phrase stripped of context, gives no hint at what we’re multiplying.[1] “Cause neutral”, without elaboration, seems objectionable. We shouldn’t be neutral about causes! We should prefer the ones that do the most good! Both require extra context and elaboration.

If this is something that will be used to introduce EA, which now seems likelier, I think this section confuses a bit. A good slogan should have a clear, difficult-to-misinterpret meaning that requires little elaboration. “Radical compassion / empathy” does this well. “Scout mindset” is slightly more in-groupy, but I don’t think newbies would be surprised that thinking like a scout involves careful exploration of ideas and emphasizes the importance of reporting the truth of what you find.
Some alternatives to “scope sensitivity” are:
“Follow the numbers” / “crunch the numbers”: we don’t quite primarily “follow the data / evidence” anymore, but we certainly try to follow the numbers.
“More is better” / “More-imization”: okay, this is a bit silly, but I assume Peter was intentionally avoiding something like “maximization mindset”, which is more intuitive than “scope sensitivity” but has probably fallen a bit out of vogue. We think that doing more good for the same cost is always better.
“Cost-effectiveness guided”: while it sounds technocratic, that’s kind of the point. Ultimately, it all comes back to cost-effectiveness. Why not say so?
- ^
If I knew nothing else, I’d guess it’s a suggestion of the profound implications of viewing probabilities as dependent (multiplicative) instead of independent (additive) and, consequently, support for complex systems approaches / GEM modelling instead of reductive OLSing with sparse interaction terms. /Joke
Jason,
You raise a fair point. One we’ve been discussing internally. Given the recent and expected adjustments to StrongMinds, it seems reasonable to update and clarify our position on AMF to say something like, “Under more views, AMF is better than or on par with StrongMinds. Note that currently, under our model, when AMF is better than StrongMinds, it isn’t wildly better.” Of course, while predicting how future research will pan out is tricky, we’d aim to be more specific.
A high neutral point implies that many people in developing countries believe their lives are not worth living.
This isn’t necessarily the case. I assume that if people described their lives as having negative wellbeing, this wouldn’t imply they thought their life was not worth continuing.
People can have negative wellbeing and still want to live for the sake of others or causes greater than themselves.
Life satisfaction appears to be increasing over time in low income countries. I think this progress is such that many people who may have negative wellbeing at present, will not have negative wellbeing their whole lives.
Edit: To expand a little, for these reasons, as well as the very reasonable drive to survive (regardless of wellbeing), I find it difficult to interpret revealed preferences and it’s unclear they’re a bastion of clarity in this confusing debate.
Anecdotally, I’ve clearly had periods of negative wellbeing before (sometimes starkly), but I never wanted to die during those periods. If I knew such periods were permanent, I’d probably think it was good for me to not-exist, but I’d still hesitate to say I’d prefer to not-exist, because I don’t just care about my own wellbeing. As Tyrion said, “Death is so final, and life is so full of possibilities.”
I think this should highlight that the difficulties here aren’t localized to just this area of the topic.
2. I don’t think 38% is a defensible estimate for spillovers, which puts me closer to GiveWell’s estimate of StrongMinds than HLI’s estimate of StrongMinds.
I wrote this critique of your estimate that household spillovers was 52%. That critique had three parts. The third part was an error, which you corrected and brought the answer down to 38%. But I think the first two are actually more important: you’re deriving a general household spillover effect from studies specifically designed to help household members, which would lead to an overestimate.
I thought you agreed with that from your response here, so I’m confused as to why you’re still defending 38%. Flagging that I’m not saying the studies themselves are weak (though it’s true that they’re not very highly powered). I’m saying they’re estimating a different thing from what you’re trying to estimate, and there are good reasons to think the thing they’re trying to estimate is higher. So I think your estimate should be lower.
I could have been clearer: the 38% is a placeholder while I do the Barker et al. 2022 analysis. You did update me about the previous studies’ relevance. My arguments are less about supporting the 38% figure, which I expect to update with more data, and more about explaining why I have a higher prior for household spillovers from psychotherapy than you and Alex seem to. But really, the hope is that we can soon be discussing more and better evidence.
My intuition, which is shared by many, is that the badness of a child’s death is not merely due to the grief of those around them. Thus the question should not be comparing just the counterfactual grief of losing a very young child VS an [older adult], but also “lost wellbeing” from living a net-positive-wellbeing life in expectation.
I didn’t mean to imply that the badness of a child’s death is just due to grief. As I said in my main comment, I place substantial credence (2/3rds) in the view that death’s badness is the wellbeing lost. Again, this is my view, not HLI’s.
The 13 WELLBY figure is the household effect of a single person being treated by StrongMinds. But that uses the uncorrected household spillover (53% spillover rate). With the correction (38% spillover) it’d be 10.5 WELLBYs (3.7 WELLBYs for recipient + 6.8 for household).
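For concreteness, here is one way those numbers combine. The average number of other household members (~4.8) is my own back-of-the-envelope figure chosen for illustration, not a number from HLI’s model, so treat this only as a sketch of the arithmetic.

```python
# Hypothetical reconstruction of the corrected household total.
# The household size is assumed by me for illustration.

recipient_effect = 3.7   # WELLBYs for the person treated by StrongMinds
spillover_rate = 0.38    # corrected per-member spillover rate
other_members = 4.8      # assumed average number of other household members

household_effect = spillover_rate * recipient_effect * other_members
total = recipient_effect + household_effect
print(round(household_effect, 1), round(total, 1))  # 6.7 10.4
```

With this assumed household size, the sketch lands close to the ~6.8 household and ~10.5 total WELLBY figures quoted above.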
GiveWell arrives at the figure of 80% because they take a year of life as valued at 4.95 − 0.5 = 4.45 WELLBYs according to their preferred neutral point, and StrongMinds’ benefit to the direct recipient, according to HLI, is 3.77 WELLBYs --> 3.77 / 4.45 ≈ 80%. I’m not sure where the 40% figure comes from.
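Spelled out, the ~80% figure can be reconstructed from those numbers (taking GiveWell’s neutral point of 0.5 at face value; this is my reading of their calculation, not their spreadsheet):

```python
# Reconstructing GiveWell's ~80% figure from the numbers above.
life_satisfaction = 4.95   # average life satisfaction on a 0-10 scale
neutral_point = 0.5        # GiveWell's preferred neutral point
wellbys_per_life_year = life_satisfaction - neutral_point  # 4.45 WELLBYs

recipient_effect = 3.77    # HLI's estimate for the direct recipient
ratio = recipient_effect / wellbys_per_life_year
print(round(ratio, 2))  # 0.85, i.e. roughly 80%
```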
To be clear on what the numbers are: we estimate that group psychotherapy has an effect of 10.5 WELLBYs on the recipient’s household, and that the death of a child in a LIC has a −7.3 WELLBY effect on the bereaved household. But the estimate for grief was very shallow. The report this estimate came from was not focused on making a cost-effectiveness estimate of saving a life (with AMF). Again, I know this sounds weasel-y, but we haven’t yet formed a view on the goodness of saving a life, so I can’t say how much group therapy HLI thinks is preferable to averting the death of a child.
That being said, I’ll explain why this comparison, as it stands, doesn’t immediately strike me as absurd. Grief has an odd counterfactual. We can only extend lives. People who are saved will still die, and the people who love them will still grieve. The question is how much worse the total grief is for a very young child (the typical beneficiary of, e.g., AMF) than the grief for the adolescent, young adult, adult, or elder they’d become,[1] all multiplied by mortality risk at those ages.
So is psychotherapy better than the counterfactual grief averted? Again, I’m not sure because the grief estimates are quite shallow, but the comparison seems less absurd to me when I hold the counterfactual in mind.
- ^
I assume people, who are not very young children, also have larger social networks and that this could also play into the counterfactual (e.g., non-children may be grieved for by more people who forged deeper bonds). But I’m not sure how much to make of this point.
- ^
I’d point to the literature on time-lagged correlations between household members’ emotional states that I quickly summarised in the last installment of the household spillover discussion. I think it implies a household spillover of 20%. But I don’t know if this type of data should over- or underestimate the spillover ratio relative to what we’d find in RCTs. I know I’m being really slippery about this, but the Barker et al. analysis so far makes me think it’s larger than that.
I find nothing objectionable in that characterization. And if we only had these three studies to guide us, then I’d concede that a discount of some size seems warranted. But we also have A. our priors and B. some new evidence from Barker et al. Both of these point me away from very small spillovers, but again, I’m still very unsure. I think I’ll have clearer views once I’m done analyzing the Barker et al. results and have had someone, ideally Nathanial Barker, check my work.
[Edit: Michael edited to add: “It’s not clear any specific number away from 0 could be justified.”] Well not-zero certainly seems more justifiable than zero. Zero spillovers implies that emotional empathy doesn’t exist, which is an odd claim.
Hah! Yeah, stepping back, I think these events are a distraction for most people. Especially if they worsen one’s mental health. For me, reflecting on the war makes me feel so grateful and lucky to live where I do.
Another reason to pay attention is when it seems like it could shortly and sharply affect the chances of catastrophe. At the beginning of the war, I kept asking myself, “At what probability of nuclear war should I: make a plan, consider switching jobs, move to Argentina, etc.” But I think we’ve moved out of the scary zone for a while.