I agree that longer-term data collection can help here in principle, if the initial differences in impact timing wash out over the years. One reason we didn’t do that was statistical power: we expected our impact to decrease over time, so longer-term surveys would require a larger sample to detect this smaller impact. I think we were powered to measure something like a $12/month difference in household consumption. I think I’d still call a program that cost $120 and increased consumption by, say, $3/month 10 years later a “success”, but cutting the detectable effect to a quarter takes 16x the sample size. Throw in a cash arm, and that’s a 32x bigger sample (64,000 households in our case). We could get a decent sense of whether our program had worked vs. control over a shorter (smaller-sample) timeline, and so we went with that.
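(A rough sketch of that power arithmetic, for anyone who wants the scaling: the required sample grows with the inverse square of the detectable effect, so detecting a quarter of the effect needs roughly 16x the households. The consumption standard deviation, significance level, and power below are assumed purely for illustration, not the study’s actual parameters.)

```python
# Illustrative two-arm power calculation: cutting the detectable effect to a
# quarter multiplies the required sample by ~16, since n scales as 1/d^2.
# The SD of monthly household consumption is a made-up placeholder.
from statsmodels.stats.power import TTestIndPower

sd = 48.0  # hypothetical SD of monthly consumption, $/month
analysis = TTestIndPower()
for diff in (12.0, 3.0):  # detectable difference in $/month
    d = diff / sd  # standardised effect size (Cohen's d)
    n_per_arm = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8,
                                     ratio=1.0, alternative="two-sided")
    print(f"detect ${diff:.0f}/month (d = {d:.3f}): ~{n_per_arm:,.0f} households per arm")
```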
If the concern is about which measure of impact to use—you cite issues with people remembering their spending—then the (I think) obvious response is to measure individuals’ subjective wellbeing, e.g. asking “how satisfied are you with your life nowadays?” on a 0-10 scale, which allows people to integrate all the background information of their lives when answering the question.
The subjective wellbeing idea is interesting (and I will read your study; I only skimmed it for now, but I was impressed). It isn’t obvious to me that subjective wellbeing isn’t also just a snapshot of a person’s welfare, and so prone to similar issues to consumption: e.g. you might see immediate subjective welfare gains in the cash arm, but the program arm won’t start feeling better until they harvest their crops. I’m not really familiar with the measure, though, so I might be missing something there.
I agree with you that you don’t need a cash arm to prove your alternative didn’t work. But if you already knew in advance that your alternative would be worse, it raises the question of why you’d do it at all.
Agreed—I’m sure they expected their program to work; I just don’t think adding a cash arm really helped them determine whether it did.
“It isn’t obvious to me that subjective wellbeing isn’t also just a snapshot of a person’s welfare”
A life satisfaction score is a snapshot, while the WELLBY is its integral over time, so it’s the WELLBY you want. The World Happiness Report’s article on this is a good primer.
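(For concreteness, the usual WELLBY bookkeeping treats one point of 0-10 life satisfaction sustained for one year as one WELLBY, so a program’s effect is roughly the area between the treatment and control life-satisfaction curves over the follow-up period. The notation below is just an illustration of that idea, not a formula quoted from the report.)

$$\text{WELLBYs gained} = \int_{t_0}^{t_1} \big( LS_T(t) - LS_C(t) \big)\, dt \approx \sum_k \big( LS_{T,k} - LS_{C,k} \big)\, \Delta t_k$$

where $LS_T(t)$ and $LS_C(t)$ are the 0-10 life-satisfaction scores in the treatment and control arms, and $\Delta t_k$ are the gaps between survey rounds, in years.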
Thanks for the interesting reflections.