I think I agree with the rest of this analysis (or at least the parts I could understand). However, the following paragraph seems off:
To its credit, the write-up does highlight this, but does not seem to appreciate that the implications are crazy: any PT intervention, so long as it is cheap enough, should be thought better than GD, even if studies upon it show a very low effect size
Apologies if I’m being naive here, but isn’t this just a known problem with first-order cost-effectiveness analysis in general, not with this particular analysis per se? I mean, since cheapness could be arbitrarily low (or at least down to $0.01), “better than GD” is a bit of a red herring; the claim is merely that a single (even high-quality) study is not enough for someone to update their prior all the way down to zero, or negative.
And stated in English, this seems eminently reasonable to me. There might be good second-order (etc.) reasons not to act on a naive first-order analysis (e.g. risk/ambiguity aversion, wanting to promote better studies). But ultimately the literal claim doesn’t seem crazy to me, and naively it looks like something that falls naturally out of a direct cost-effectiveness framework.
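(To illustrate the “a single study shouldn’t drag the prior all the way to zero” point, here is a minimal normal-normal Bayesian update sketch. The prior and study numbers below are purely illustrative placeholders, not HLI’s actual figures.)

```python
# Illustrative normal-normal Bayesian update: a noisy, null-ish study shrinks a
# positive prior on the effect size but does not drag it to zero.
# All numbers are hypothetical placeholders.

prior_mean, prior_sd = 0.5, 0.3    # prior belief about the effect size (Cohen's d)
study_est, study_se = 0.05, 0.15   # small, non-significant point estimate from one study

prior_prec = 1 / prior_sd**2
study_prec = 1 / study_se**2

post_prec = prior_prec + study_prec
post_mean = (prior_prec * prior_mean + study_prec * study_est) / post_prec
post_sd = post_prec ** -0.5

print(f"posterior effect: {post_mean:.2f} +/- {post_sd:.2f}")
# The posterior mean stays positive (~0.14 here), so a naive expected-value
# cost-effectiveness calculation can still favour a sufficiently cheap intervention.
```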
So the problem I had in mind was in the parenthetical in my paragraph:
To its credit, the write-up does highlight this, but does not seem to appreciate that the implications are crazy: any PT intervention, so long as it is cheap enough, should be thought better than GD, even if studies upon it show a very low effect size (which would usually be reported as a negative result, as almost any study in this field would be underpowered to detect effects as low as those being stipulated).
To elaborate: the actual data on StrongMinds comes from an n ≈ 250 study by Bolton et al. 2003, then followed up by Bass et al. 2006. HLI models this in Table 19:
So an initial effect of g = 1.85, and a total impact of 3.48 WELLBYs. To simulate what the SM data will show once the (anticipated to be disappointing) forthcoming Baird et al. RCT is included, they discount this[1] by a factor of 20.
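For concreteness, and anticipating the footnote below about which quantity the factor of 20 is applied to, the arithmetic under the two readings is:

$$\underbrace{1.85 \times \tfrac{1}{20} \approx 0.09}_{\text{discounting the intercept}} \qquad \text{vs.} \qquad \underbrace{3.48 \times \tfrac{1}{20} \approx 0.17 \text{ WELLBYs}}_{\text{discounting the total effect}}$$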
Thus the simulated effect size of Bolton and Bass is now ~0.1. In this simulated case, the Bolton and Bass studies would be reporting negative results, as they would not be powered to detect an effect size as small as g = 0.1. To benchmark, the forthcoming Baird et al. study is 6x larger than these, and its power calculations give minimum detectable effects of g = 0.1 or greater.
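As a rough sanity check on those power figures, here is a minimal sketch using statsmodels, assuming a simple two-arm, individually randomised comparison with two-sided α = 0.05 and 80% power. The per-arm sample sizes (125, and 6x that) are back-of-the-envelope splits of the totals above; the real trials’ designs (clustering, covariate adjustment, repeated measures) will shift the numbers, so treat these as order-of-magnitude only.

```python
# Minimum detectable effect (Cohen's d) for a simple two-arm comparison at
# 80% power and two-sided alpha = 0.05. Illustrative assumptions only: the
# actual Bolton/Bass and Baird et al. designs are more complicated.
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()

for n_per_arm in (125, 750):  # ~250 total, vs. a study ~6x larger
    mde = power_calc.solve_power(effect_size=None, nobs1=n_per_arm,
                                 alpha=0.05, power=0.8, ratio=1.0)
    print(f"n per arm = {n_per_arm}: MDE ~ {mde:.2f}")

# ~125 per arm gives an MDE of roughly 0.35 -- far above the simulated g ~ 0.1 --
# so the simulated Bolton/Bass results would indeed read as null findings.
```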
Yet, apparently, in such a simulated case we should conclude that StrongMinds is fractionally better than GD purely on the basis of two trials reporting negative findings, because numerically the treatment groups did slightly (but not significantly) better than the control ones.
Even if in general we are happy with ‘hey, the effect is small, but it is cheap, so it’s a highly cost-effective intervention’, we should not accept this at the point where ‘small’ becomes ‘too small to be statistically significant’. Analysis method + negative findings ≠ fractionally better in expectation than cash transfers, so I take it as diagnostic that the analysis is going wrong.
[1] I think ‘this’ must be the initial effect size/intercept, as 3.48 × 0.05 ≈ 0.17, not 0.38. I find this counter-intuitive, as I think the drop in total effect should be super-, not sub-, linear in the intercept, but ignore that.