[Epistemic status: Writing off-the-cuff about issues I haven’t thought about in a while—would welcome pushback and feedback]
Thanks for this post, I found it thought-provoking! I’m happy to see insightful global development content like this on the Forum.
My views after reading your post are:
You’re probably right that it doesn’t make sense for all studies to benchmark their interventions against cash transfers;
I still think there are good reasons for practitioners to think hard about whether their programs do more good than budget-equivalent cash transfers would;
Your post raises issues that challenge the usefulness of RCTs in general, not just RCTs that compare interventions to cash transfers.
Why I like cash benchmarking
You write:

That’s the role that a cash arm plays: rather than just check if a program is better than doing nothing at all (comparing to a control), we index it against a simple intervention that we know works well: cash.
The reason I find a cash benchmark useful feels a bit different from this. IMO the purpose of cash benchmarking is to compare a program to a practical counterfactual: just giving the money to beneficiaries instead of funding a more complicated program. It feels intuitive to me that it’s bad to fund a development program that ends up helping people less than giving them the cash directly would. So the key thing is not that ‘we know cash works well’; it’s that giving cash away is almost always a feasible alternative to whatever development program one is funding.
That still feels pretty compelling to me. I previously worked in development and was often annoyed, and sometimes furious, about the waste and bureaucratic bs we had to put up with to run simple interventions. Cash benchmarking to me is meant to test whether the beneficiaries would be better off if, instead of hiring another consultant or buying more equipment, we had just given them the money.
Problems with RCTs
You write:

I am most familiar with our own program but I expect this applies to many other international development programs too: your medicine/training/infrastructure/etc program will very likely deliver benefits over a different timeline to cash, making a direct RCT comparison dependent more on survey timing than intervention efficacy.
This is a really good point. Between your write-up and the graph you posted, I’m not sure I’ve seen it laid out so clearly before. But it seems like you’ve raised an issue with not just cash benchmarking, but with our ability to use RCTs to usefully measure program effects at all.
In your graph, you point out that the timing of your follow-up survey will affect your estimate of the gap between the effects of your intervention and the effects of a cash benchmark. But we’d have the same issue if we wanted to compare your intervention’s effects to those of all the other interventions we could possibly fund or deliver. And if we want to maximize impact, we should be considering all these different possibilities.
More worryingly: what we really care about is not the gap between the effects at a given point in time. What we care about is the difference between the integrals of those curves: the difference in total impact, divided by program cost.
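To make this concrete, here’s one way to write it down (my notation, not anything from the original post): if $E(t)$ is the treatment-versus-control effect at time $t$ and $C$ is the program cost, the quantity we’d ideally compare across interventions is

$$\frac{1}{C}\int_0^{T} E(t)\,dt,$$

whereas a single follow-up survey at time $t^*$ only measures $E(t^*)$: one point on the curve, not its area.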
But, as you say, surveys are expensive and difficult. It’s rare to even have one follow-up survey, much less a sufficient number of surveys to construct the shape of the benefits curve.
It seems to me people mostly muddle through and ignore this issue. But the people who really care fill in the blanks with assumptions. GiveWell, for example, makes a lot of assumptions about the benefits-over-time of the interventions they compare. To their eternal credit, you can see these in their public cost-effectiveness model. They make an assumption about how much of the transfer is invested;[1] they make an assumption about how much that investment returns over time; they make an assumption about how many years that investment lasts; and so on. And they do similar things for the other interventions they consider.
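As a toy illustration of how assumptions like these turn one measured number into a total-benefit estimate (the model structure and all parameter values below are invented for illustration; they are not GiveWell’s actual figures):

```python
# Toy benefits-over-time model for a cash transfer. Every parameter is
# an assumption: the share invested, the annual return on investment,
# and how many years the returns last. None are GiveWell's real values.

def total_benefit(transfer, share_invested=0.4, annual_return=0.10, years=15):
    """Immediate consumption benefit plus the stream of investment returns."""
    consumed = transfer * (1 - share_invested)   # spent right away
    invested = transfer * share_invested         # assumed to earn returns
    returns = invested * annual_return * years   # undiscounted, for simplicity
    return consumed + returns

# Shift any assumption and the estimate moves with it:
print(total_benefit(1000))                      # 1200.0
print(total_benefit(1000, share_invested=0.2))  # 1100.0
```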
All of this, though, is updating me further against RCTs really providing that much practical value for practitioners or funders. Estimating the true benefits of even the most highly-scrutinized interventions requires making a lot of assumptions. I’m a fan of doing this. I think we should accept the uncertainty we face and make decisions that seem good in expectation. But once we’ve accepted that, I start to question why we’re messing around with RCTs at all.
They base this on a study of cash transfers in Kenya. But of course the proportion of the transfer that is invested likely differs across times and locations.
You beat me to commenting this!
I’ve always thought the question of “when should we measure endpoints?” is the biggest weakness of RCTs and doesn’t get enough attention.
Yes. I don’t think the issue is with cash transfers alone. It’s that most RCTs (I’m most familiar with the subjective wellbeing / mental health literature) don’t perform, or even allow, an analysis of the total impact of an intervention. The general shortcoming is the lack of information about how effects decay or grow over time.
But I don’t quite share your update away from using RCTs. Instead, we should demand better data and analysis from RCTs.
Despite the limited change-over-time data on many interventions, we often don’t need to guess what happens over time (if by “requires making a lot of assumptions” you mean more of an educated guess rather than modelling something with empirically estimated parameters). At the Happier Lives Institute, we estimate the total effects by first evaluating an initial effect (what’s commonly measured in meta-analyses) and then empirically estimating how the effect changes over time (which is rarely done). You can dig into how we’ve done this by using meta-analyses in our cash transfers and psychotherapy reports here.
While not perfect, if we have two time points within or between studies, we can use the change in the impact between those time points to inform our view on how long the effect lasts, and thus estimate the total effect of an intervention. The knot is then cut.
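Here’s a minimal sketch of that two-time-point logic, assuming linear decay (my simplification for illustration; HLI’s actual models are more involved):

```python
# Given effects measured at two follow-ups, assume linear decay and
# integrate to get the total effect. A deliberate simplification of the
# approach described above; one could just as well fit exponential decay.

def total_effect_linear(t1, e1, t2, e2):
    """Area under a linearly decaying effect curve, from t = 0 until it hits zero."""
    slope = (e2 - e1) / (t2 - t1)
    if slope >= 0:
        raise ValueError("Effect isn't decaying; total effect is unbounded in this model.")
    e0 = e1 - slope * t1     # extrapolate back to the effect at t = 0
    t_end = -e0 / slope      # time at which the effect reaches zero
    return 0.5 * e0 * t_end  # area of the triangle under the curve

# e.g. an effect of 0.5 SD at year 1 that falls to 0.3 SD by year 3
# implies the effect hits zero at year 6, for 1.8 SD-years in total:
print(total_effect_linear(1, 0.5, 3, 0.3))  # 1.8
```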
FWIW, from our report on cash transfers, we expect the effect on subjective wellbeing to last around a decade or less. Interestingly, some studies of “big push” asset transfers find effects on subjective wellbeing that have not declined (or have even grown) a decade post-intervention. If that holds up to scrutiny, that’s a way in which asset transfers could be more cost-effective than cash transfers.
Note to reader: Michael Plant (who commented elsewhere) and I both work at HLI, which is related to why we’re expressing similar views.