Thank you for sharing these Joel. You’ve got a lot going on in the comments here, so I’m only going to make a few brief specific comments and one larger one. The larger one relates to something you’ve noted elsewhere in the thread, which is:
“That the quality of this analysis was an attempt to be more rigorous than most shallow EA analyses, but definitely less rigorous than a quality peer-reviewed academic paper. I think this [...] is not something we clearly communicated.”
This work forms part of the evidence base behind some strong claims from HLI about where to give money, so I did expect it to be more rigorous. I wondered if I was alone in being surprised here, so I did a very informal (n = 23!) Twitter poll in the EA group asking about what people expected re: the rigor of evidence for charity recommendations. (I fixed my stupid Our World in Data autocorrect glitch in a follow-up tweet.)
I don’t want to lean on this too much, but I do think it suggests that I’m not alone in expecting a higher degree of rigor when it comes to where to put charity dollars. This is perhaps mostly a communication issue, but I also think that as the quality of analysis and evidence becomes less rigorous, claims should be toned down, or at least the uncertainty (in the broad sense) needs to be more strongly expressed.
On the specifics, first, I appreciate you noting the apparent publication bias. That’s both important and not great.
Second, I think comparing the cash transfer funnel plot to the other one is informative. The cash transfer one looks “right”: it has the expected shape, and it’s comforting to see the Egger regression line is basically flat. This is definitely not the case with the StrongMinds MA. That funnel plot looks incredibly weird, which could reflect heterogeneity that we can model, but it should regardless make everyone skeptical, because doing that kind of modelling well is very hard. It’s also rough to see that if we project the Egger regression line back to the origin then the predicted effect when the SE is zero is basically zero. In other words, unwinding publication bias in this way would lead us to guess at a true effect of around nothing. Do I believe that? I’m not sure. There are good reasons to be skeptical of Egger-type regressions, but all of this definitely increases my skepticism of the results. While I’m glad it’s public now, I don’t feel great that this wasn’t part of the very public first cut of the results.
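For readers who want the mechanics: the “project the line back to the origin” idea corresponds to a PET-style (precision-effect test) regression, where the intercept of an inverse-variance-weighted regression of effect size on standard error estimates what a hypothetically infinitely precise study (SE = 0) would find. A minimal sketch, using made-up numbers rather than HLI’s actual data:

```python
import numpy as np

# Made-up effect sizes (Hedges' g) and standard errors for eight
# hypothetical studies -- illustrative only, not HLI's actual data.
g  = np.array([0.9, 0.7, 0.6, 0.5, 0.45, 0.3, 0.25, 0.2])
se = np.array([0.40, 0.35, 0.30, 0.25, 0.22, 0.15, 0.12, 0.10])

# PET-style regression: effect ~ intercept + slope * SE, weighted by
# inverse variance. The fitted intercept is the predicted effect for a
# study with SE = 0, i.e. the "project back to the origin" estimate.
w = 1.0 / se**2
X = np.column_stack([np.ones_like(se), se])
intercept, slope = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * g))
print(f"predicted effect at SE = 0: {intercept:.3f}")
print(f"small-study slope: {slope:.3f}")
```

With small-study effects this strong (larger SE, larger reported effect), the intercept lands near zero even though every individual study reports a positive effect, which is exactly the pattern being described.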
Again, I appreciate you responding. I do think going forward it would be worth taking seriously community expectations about what underlies charity recommendations, and if something is tentative or rough then I hope that it gets clearly communicated as such, both originally and in downstream uses.
Interesting poll Ryan! I’m not sure how much to take away from it, because I think “epistemic/evidentiary standards” is a pretty fuzzy concept in the minds of most readers. But still, point taken that people probably expect high standards.
It’s also rough to see that if we project the Egger regression line back to the origin then the predicted effect when the SE is zero is basically zero.
I’m not sure about that. Here’s the output of the Egger test. If I’m interpreting it correctly, the corrected effect is smaller, but not zero. I’ll try to figure out what the p-curve-based correction suggests.
Edit: I’m also not sure how much to trust the Egger test to tell me what the corrected effect size should be, so this wasn’t an endorsement that I think the real effect size should be halved. It seems different ways of making this correction give very different answers. I’ll add a further comment with more details.
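To illustrate why different corrections can disagree so much, here is a sketch comparing a naive inverse-variance pooled estimate against two standard regression-based corrections, PET (regress on SE) and PEESE (regress on SE squared, which typically corrects less aggressively). These are generic methods from the publication-bias literature, run on made-up numbers, not the specific tests or data from this analysis:

```python
import numpy as np

# Made-up study effects and standard errors, illustrative only.
g  = np.array([0.9, 0.7, 0.6, 0.5, 0.45, 0.3, 0.25, 0.2])
se = np.array([0.40, 0.35, 0.30, 0.25, 0.22, 0.15, 0.12, 0.10])
w  = 1.0 / se**2  # inverse-variance weights

# 1. Naive pooled estimate: inverse-variance weighted mean, no correction.
naive = np.sum(w * g) / np.sum(w)

# 2. PET: intercept of (effect ~ SE), inverse-variance weighted.
X_pet = np.column_stack([np.ones_like(se), se])
pet = np.linalg.solve(X_pet.T @ (w[:, None] * X_pet), X_pet.T @ (w * g))[0]

# 3. PEESE: intercept of (effect ~ SE^2), inverse-variance weighted.
X_peese = np.column_stack([np.ones_like(se), se**2])
peese = np.linalg.solve(X_peese.T @ (w[:, None] * X_peese),
                        X_peese.T @ (w * g))[0]

print(f"naive: {naive:.3f}, PET: {pet:.3f}, PEESE: {peese:.3f}")
```

On the same data the three estimates can span anywhere from roughly zero to the uncorrected pooled mean, which is the “very different answers” problem in miniature.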
I do think going forward it would be worth taking seriously community expectations about what underlies charity recommendations, and if something is tentative or rough then I hope that it gets clearly communicated as such, both originally and in downstream uses.
Seems reasonable.
Fair re: Egger. I just eyeballed the figure.