I expected low effects based on background assumptions like utility being sublogarithmic to income but I didn’t expect the effect to be actually zero (at the level of power that the very big studies could detect).
I’d be interested in replicating these trials in other developed countries. It could just be because the US is unusually wealthy.
Of course, like you say, this is further evidence we should increase foreign aid, since money could do far more good in very poor countries than very rich ones.
I haven’t run the numbers, and this is not my field so the below is very low confidence, but now that you mention it I wouldn’t be surprised if isoelastic utility would be enough to explain the lack of results.
LLMs claim that if the effect size is 10 times smaller, you need a sample size 100x larger to have the same statistical significance (someone correct me if this is wrong)
So if a GiveDirectly RCT in Kenya needs a sample size of 2,000 individuals to detect a statistically significant effect, an RCT in the US, where you expect the effect to be 10x smaller, would need 200,000 individuals, which is intractable[1].
Another intuition is that the effects of cash transfers in LMICs are significant but not huge, and iirc many experts claim that ~5 years after the transfer there are negligible effects on subjective well-being, so it wouldn’t take much for the effect to become undetectable.
But again, this is all an uninformed vague guess.
Edit: Gemini raises a good point that variance could also be higher in the US. If the standard deviation of wellbeing for beneficiaries in the US is 2x larger than in Kenya, and the effect is 10x smaller, I think you’d need a 400x larger sample size, not “just” 100x.
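For what it’s worth, both scaling claims can be sanity-checked with the standard normal-approximation power formula for a two-sample comparison of means, n per arm ≈ 2·((z_α/2 + z_β)·σ/δ)². This is a generic back-of-the-envelope sketch, not the actual power calculation from any of these studies:

```python
def n_per_arm(effect, sd, z_alpha=1.96, z_beta=0.84):
    """Approximate per-arm sample size for a two-sample test of means
    (normal approximation, 5% two-sided alpha, 80% power)."""
    return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

base = n_per_arm(effect=0.25, sd=1.0)           # reference case
small = n_per_arm(effect=0.025, sd=1.0)         # 10x smaller effect
small_noisy = n_per_arm(effect=0.025, sd=2.0)   # 10x smaller effect, 2x the s.d.

print(small / base)        # sample size scales with 1/effect^2 -> 100x
print(small_noisy / base)  # doubling the s.d. quadruples it again -> 400x
```

The ratios fall out of the formula directly: halving the effect quadruples the required n, and doubling the standard deviation does the same, so 10x smaller effect plus 2x the s.d. gives 10² · 2² = 400x.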
Hmm, I’d guess the s.d.s to be lower in the US than in Kenya, for what it’s worth. Under-five child mortality rates are about 10x higher in Kenya than in the US, and I expect stuff like that to bleed through to other places.
But even if we assume a smaller s.d. (check: is there a smaller s.d., empirically?), this might be a major problem. The top paper on OpenResearch says they can rule out health improvements greater than 0.023-0.028 standard deviations from giving $1000/month. I’m not sure how that compares to household incomes, but let’s assume household income is $2000-$3000/month for the US recipients, so the transfer is 33-50% of household income.
From Googling around, the GiveDirectly studies show mental health effects around a quarter of a standard deviation from a doubling of income.
In other words, the study can rule out the effect sizes theory would predict whenever theory predicts an effect >0.1x that of the GiveDirectly experiments (~0.025 s.d. vs. ~0.25 s.d.).
Does theory predict effect sizes >0.1x that of the GiveDirectly experiments? Well, it depends on ň, the risk-aversion constant[1]! If risk aversion is between 0 (linear, aka insane) and 1 (logarithmic), we should predict changes of >0.41x to >0.58x that of the GD experiments. So we can rule out linear, logarithmic, and super-logarithmic utility!
But if ň=1.5, then theory will predict changes on the scale of 0.076x to 0.128x that of the GD experiments. I.e., exactly on the boundary of whether it’s possible to detect an effect at all!
If ň=2, then theory will predict changes on the scale of 0.014x to 0.028x, much smaller than the experiments are powered to detect.
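Here’s a rough sketch of where multipliers like these come from, under isoelastic utility u(c) = c^(1−ň)/(1−ň) (with u = ln c at ň = 1). The baseline incomes are my guesses, not study data: I’m assuming ~$100/month for GiveDirectly recipients and using the midpoint of the $2,000-$3,000/month range above for the US recipients:

```python
import math

def delta_u(c0, c1, eta):
    """Utility gain from moving monthly income c0 -> c1 under
    isoelastic (CRRA) utility with risk-aversion parameter eta."""
    if eta == 1:
        return math.log(c1) - math.log(c0)
    return (c1 ** (1 - eta) - c0 ** (1 - eta)) / (1 - eta)

# Assumed baseline incomes (USD/month) -- guesses, not study data.
kenya = 100   # hypothetical GiveDirectly recipient baseline
us = 2500     # midpoint of the $2000-3000 range above

for eta in (1.0, 1.5, 2.0):
    gd_effect = delta_u(kenya, 2 * kenya, eta)  # GD roughly doubles income
    us_effect = delta_u(us, us + 1000, eta)     # $1000/month transfer
    print(f"eta={eta}: US effect is {us_effect / gd_effect:.3f}x the GD effect")
```

With these made-up baselines the ratios come out near 0.49x, 0.11x, and 0.02x for ň = 1, 1.5, and 2, which is in the same ballpark as the multipliers above; the exact numbers are quite sensitive to the assumed Kenyan baseline income.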
For what it’s worth, before the studies I would’ve guessed a risk-aversion constant across countries to be something in between 1.2 and 2, so this study updates me some but not a lot.
@Kelsey Piper and others, did you or the study authors pre-register your beliefs on what risk-aversion constant you expected?
rendering the Greek constant economists use for risk aversion as ň since it otherwise doesn’t render correctly on my laptop.
I might be misreading, but on a quick skim Sam Nolan’s analysis seemed pertinent; then I noticed you’d already commented. Sam’s reply still seems useful to me, in particular the data here, although none of those countries are low-income, so your concern re: OOD generalisation still applies.
lmao when I commented 3 years ago I said
and then I just did an out-of-country and out-of-distribution generalization with no caveats! I can be really silly sometimes lol.
Thank you for running the numbers!
I’m not sure about using these results to update your estimates of ň (as there are too many other differences between the US and LMICs, e.g. access to hospitals, no tuberculosis). But it does seem that reasonable values of ň would explain most of the lack of effects, especially for the study where mothers received “just” $333/month and similar ones.