Super exciting work! Sharing a few quick thoughts:
1. I wonder if you’ve explored some of the reasons for effect size heterogeneity in ways that go beyond formal moderator analyses. In other words, I’d be curious if you have a “rough sense” of why some programs seem to be so much better than others. Is it just random chance? Study design factors? Or could it be that some CT programs are implemented much better than others, and there is a “real” difference between the best CT programs and the average CT programs?
This seems important because, in practice, donors are rarely deciding between funding the “average” CT program and the “average” [something else] program. Instead, they’d ideally want to choose between the “best” CT program and the “best” [something else] program. In other words, when I go to GiveWell, I don’t want to know about the “average” malaria program or the “average” CT program; I want to know the best program in each category and how they compare to each other.
This might become even more important in analyses of other kinds of interventions, where the implementation factors might matter more. For instance, in the psychotherapy literature, I know a lot of people are cautious about making too many generalizations based on “average” effect sizes (which can be weighed down by studies that had poor training procedures, recruited populations that were unlikely to benefit, etc.).
With this in mind, what do you think is currently the “best” CT program, and how effective is it?
2. I’d be interested in seeing the measures that the studies used to measure life satisfaction, depression, and subjective well-being.
I’m especially interested in the measurement of life satisfaction. My impression is that the most commonly used life satisfaction measure (this one) might lead to an overestimation of the relationship between CTs and life satisfaction. I think two of the five items could prime people to think more about their material conditions than their “happiness.” Items listed below:
The conditions of my life are excellent (when people think about “conditions,” I think many people might think about material/economic conditions more so than affective/emotional conditions).
So far I have gotten the important things I want in life (when people think about things they want, I think many people will consider material/economic things more so than affective/emotional things).
I have no data to suggest that this is true, so I’m very open to being wrong. Maybe these don’t prime people toward thinking in material/economic terms at all. But if they do, I think they could inflate the effect size of CT programs on life satisfaction (relative to the effect size that would be found if we used a measure of life satisfaction that was less likely to prime people to think materialistically).
Also, a few minor things I noticed:
1. “The average effect size (Cohen’s d) of 38 CT studies on our composite outcome of MH and SWB is 0.10 standard deviations (SDs) (95% CI: 0.8, 0.13).”
I believe there might be a typo here—was it supposed to be “0.08, 0.13”?
2. I believe there are two “Figure 5”s—the forest plot should probably be Figure 6.
Best of luck with next steps—looking forward to seeing analyses of other kinds of interventions!
Hi Akash, it’s been a few months since your comment, but I’m replying in case it’s still useful.
I’d be curious if you have a “rough sense” of why some programs seem to be so much better than others.
A general note: for at least the next year, I am mostly staying away from comparing programs and will instead compare interventions. Hopefully one can estimate the impacts of a program from the work I do modeling interventions.
That being said, let me try to answer your question.
One of the reasons CTs make an elegant benchmark is that there are relatively few moving parts on both ends. You inform someone they will receive cash. They then do what needs to be done to receive it, which at most means walking a long way. The issues with “quality” seem to arise primarily from (A) how convenient the provider makes it to receive the cash and (B) whether the provider reliably follows through with the transfers. The biggest variation I’m concerned with is in administrative costs as a share of the CT, which we still have very little information on. But that’s a factor on the cost side, not the effect side, of things.
From this simple description, I expect the programs that do best are those that use digital or otherwise automatic transfers AND are reliable. I don’t think this is a situation where the best is 10x as good as the average; I’m not sure there’s enough play in the system (although 3-5x variation in cost-effectiveness seems possible).
I think GiveDirectly is a good program and quite a bit better than the average government unconditional CT (I can put a number on that in private if you’d like). I’m not saying it’s the “best” because, as I said at the start of this comment, I’m not actively searching for the best program right now. I do have some ideas for how we’d quickly compare programs, though, and I’d be happy to talk about that in private.
However, I can’t help but comment that there are some hard-to-quantify factors I haven’t incorporated that could favor government programs. For instance, there’s evidence that CTs, when reliably run, can increase trust in governments.
But the decision maker isn’t always a donor. It may be a mid-level bureaucrat who can allocate money between programs, in which case intervention-level analyses could be useful.
This might become even more important in analyses of other kinds of interventions, where the implementation factors might matter more.
Yes!
But if they do, I think they could inflate the effect size of CT programs on life satisfaction (relative to the effect size that would be found if we used a measure of life satisfaction that was less likely to prime people to think materialistically).
I agree. It may be worth it to roughly classify the “materialness” of different measures and see if that predicts larger effects of a cash transfer.
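A rough sketch of what that check could look like, under simplifying assumptions: each measure is hand-coded with a hypothetical binary `is_material` flag, and the two groups are compared with a simple inverse-variance (fixed-effect) pooled estimate. This is not the method used in the report; a real analysis would more likely use a random-effects meta-regression, and the function names here are made up for illustration.

```python
from math import sqrt


def pooled_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    return estimate, sqrt(1.0 / total)


def materialness_gap(studies):
    """Pooled effect on 'material' measures minus pooled effect on the rest.

    Each study is a tuple (effect_size_d, std_error, is_material), where
    is_material is a hypothetical hand-coded flag (an assumption, not
    something the source data provides). Both groups must be non-empty.
    Returns (gap, standard error of the gap, z-score).
    """
    material = [(d, se) for d, se, m in studies if m]
    other = [(d, se) for d, se, m in studies if not m]
    d_mat, se_mat = pooled_effect(*zip(*material))
    d_oth, se_oth = pooled_effect(*zip(*other))
    gap = d_mat - d_oth
    # The two groups are treated as independent, so their variances add.
    se_gap = sqrt(se_mat ** 2 + se_oth ** 2)
    return gap, se_gap, gap / se_gap
```

A positive gap with a large z-score would be consistent with the priming worry, though it would not distinguish priming from measures that simply capture different things.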