I found (I think) the spreadsheet for the included studies here. I did a lazy replication (i.e. excluding duplicate follow-ups from studies, only including the 30 studies where 'raw' means and SDs were extracted, then plugging this into metamar). I copy and paste the (random effects) forest plot and funnel plot below; doubtless you would be able to perform a much more rigorous replication.
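For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of computation involved: Hedges' g from raw means and SDs, pooled with a DerSimonian-Laird random-effects model. All numbers are invented for illustration; they are not the extracted study data.

```python
import numpy as np

# Made-up raw summaries for a handful of studies (NOT the spreadsheet data):
# (treatment mean, SD, n, control mean, SD, n); lower scores = less depression.
studies = [
    (10.2, 4.0, 60, 13.5, 4.2, 58),
    (11.0, 5.1, 35, 12.1, 4.8, 40),
    (9.5, 3.9, 120, 12.8, 4.1, 115),
]

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    # Standardized mean difference with Hedges' small-sample correction.
    sp = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m2 - m1) / sp  # a drop in the depression score counts as a positive effect
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

effects, variances = map(np.array, zip(*(hedges_g(*s) for s in studies)))

# DerSimonian-Laird random-effects pooling.
w = 1 / variances
mu_fe = (w * effects).sum() / w.sum()
q = (w * (effects - mu_fe) ** 2).sum()
c = w.sum() - (w**2).sum() / w.sum()
tau2 = max(0.0, (q - (len(effects) - 1)) / c)
w_re = 1 / (variances + tau2)
mu_re = (w_re * effects).sum() / w_re.sum()
se_re = np.sqrt(1 / w_re.sum())
print(f"pooled g = {mu_re:.2f} (SE {se_re:.2f}), tau^2 = {tau2:.3f}")
```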
This is why we like to see these plots! Thank you, Gregory, though this should not have been on you to do.
Having results like this underpin a charity recommendation and not showing it all transparently is a bad look for HLI. Hopefully there has been a mistake in your attempted replication and that explains e.g. the funnel plot. I look forward to reading the responses to your questions to Joel.
I'd love to hear which parts of my comment people disagree with. I think the following points, which I tried to make in my comment, are uncontentious:
The plots I requested are indeed informative, and they cast some doubt on the credibility of the original meta-analysis
Basic meta-analysis plots like a forest or funnel plot, which are incredibly common in meta-analyses, should have been provided by the authors rather than made by community members
Relatedly, transparency about the strength and/or quality of evidence underpinning a charity recommendation is good (not checking the strength or quality of evidence is bad, as is not sharing that information if one did check)
The funnel plot looks very asymmetric as well as just weird, and it would be nice if this was due to e.g. data entry mistakes by Gregory as opposed to anything else
I didn't vote, but people may feel "not showing it all transparently is a bad look for HLI" is a little premature and unfriendly without allowing HLI time for a response to fresh analysis.
Thank you for responding, Jason. That makes sense. The analysis in question here was done in Oct 2021, so I do think there was enough time to check a funnel plot for publication bias or odd heterogeneity. I really do think it's a bad look if no one checked for this, and it's a worse look if people checked and didn't report it. This is why I hope the issue is something like data entry.
Your core point is still fair, though: there might be other explanations for this that I'm not considering, so while waiting for clarification from HLI I should be clear that I'm agnostic on motives or anything else. Everyone here is trying.
Hi Ryan,

Our preferred model uses a meta-regression with follow-up time as a moderator, not the typical "average everything" meta-analysis. Because of my experience presenting the cash transfers meta-analysis, I wanted to avoid people fixating on the forest plot and getting confused about the results, since it's not the takeaway result. But in hindsight I think it probably would have been helpful to include the forest plot somewhere.
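For readers unfamiliar with the setup, a rough sketch of what "follow-up time as a moderator" means in practice. This is not HLI's actual model or data: it uses plain weighted least squares as a stand-in for a proper meta-regression (e.g. metafor's rma() in R), ignores the between-study variance tau^2, and the numbers are invented.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-follow-up data points: Hedges' g, its variance,
# and the follow-up time in months.
g = np.array([0.9, 0.7, 0.5, 0.4, 0.6, 0.3])
v = np.array([0.04, 0.03, 0.02, 0.02, 0.05, 0.01])
months = np.array([1, 3, 6, 12, 2, 24])

# Inverse-variance weighted regression of effect on follow-up time,
# so the model estimates how the effect decays rather than one average.
X = sm.add_constant(months)
fit = sm.WLS(g, X, weights=1 / v).fit()
print(fit.params)  # [effect at t = 0, change in g per month]
```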
I don't have a good excuse for the publication bias analysis. Instead of making a funnel plot, I embarked on a quest to find a more general system for adjusting for biases between intervention literatures. This was, perhaps unsurprisingly, an incomplete piece of work that failed to achieve many of its aims (see Appendix C), but it did lead to a discount of psychotherapy's effects relative to cash transfers. In hindsight, I see the time spent on that mini-project as a distraction. In the future I think we will focus on using extant methods to adjust for publication bias quantitatively.
Part of the reasoning was that we weren't trying to do a systematic meta-analysis, but a quicker version on a convenience sample of studies. As we said on page 8: "These studies are not exhaustive (footnote: There are at least 24 studies, with an estimated total sample size of 2,310, we did not extract. Additionally, there appear to be several protocols registered to run trials studying the effectiveness and cost of non-specialist-delivered mental health interventions.). We stopped collecting new studies due to time constraints and the perception of diminishing returns."
I wasn't sure if a funnel plot was appropriate when applied to a non-systematically selected sample of studies. As I've said elsewhere, I think we could have made the depth (or shallowness) of our analysis clearer.
so I do think there was enough time to check a funnel plot for publication bias or odd heterogeneity
While it's technically true that there was enough time, it certainly doesn't feel like it! HLI is a very small research organization (from 2020 through 2021 I was pretty much the lone HLI empirical researcher), and we have to constantly balance exploring new cause areas / searching for interventions against updating / improving previous analyses. It feels like I hit publish on this yesterday. I concede that I could have done better, and I plan to do so in the future, but this balancing act is an art. It sometimes takes conversations like this to put items on our agenda.
FWIW, here are some quick plots I cooked up with the cleaner data. Some obvious remarks:
The StrongMinds-relevant studies (Bolton et al., 2003; Bass et al., 2006) appear to be unusually effective (outliers?).
There appears to be more evidence of publication bias than was the case with our cash transfers meta-analysis (see last plot).
I also added a p-curve. What you don't want to see is a larger number of studies at the 0.05 significance level than at the 0.04 level, but that's what you see here.

Here are the cash transfer plots for reference:
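(For concreteness, here is a minimal sketch of how the p-curve above can be tabulated. The effect sizes and standard errors are made up purely to show the mechanics; none of this is the actual study data.)

```python
import numpy as np
from scipy import stats

# Made-up effects and standard errors, deliberately clustered so that
# most p-values land just under 0.05, the worrying pattern noted above.
g = np.array([0.42, 0.38, 0.55, 0.31, 0.49, 0.35, 0.60])
se = np.array([0.20, 0.18, 0.26, 0.15, 0.24, 0.17, 0.29])

# Two-sided p-value for each study, then counts per 0.01-wide bin.
# In a healthy literature most significant p-values pile up near 0.01;
# a spike just below 0.05 suggests selection on significance.
p = 2 * stats.norm.sf(np.abs(g / se))
bins = np.arange(0.0, 0.051, 0.01)
counts, _ = np.histogram(p[p < 0.05], bins=bins)
for lo, hi, n in zip(bins[:-1], bins[1:], counts):
    print(f"p in ({lo:.2f}, {hi:.2f}]: {n} studies")
```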
Thank you for sharing these, Joel. You've got a lot going on in the comments here, so I'm going to make only a few brief specific comments and one larger one. The larger one relates to something you've noted elsewhere in the thread, which is:
"That the quality of this analysis was an attempt to be more rigorous than most shallow EA analyses, but definitely less rigorous than a quality peer-reviewed academic paper. I think this [...] is not something we clearly communicated."
This work forms part of the evidence base behind some strong claims from HLI about where to give money, so I did expect it to be more rigorous. I wondered if I was alone in being surprised here, so I ran a very informal (n = 23!) Twitter poll in the EA group asking what people expected re: the rigor of evidence behind charity recommendations. (I fixed my stupid Our World in Data autocorrect glitch in a follow-up tweet.)
I don't want to lean on this too much, but I do think it suggests that I'm not alone in expecting a higher degree of rigor when it comes to where to put charity dollars. This is perhaps mostly a communication issue, but I also think that as the quality of analysis and evidence becomes less rigorous, claims should be toned down, or at least the uncertainty (in the broad sense) needs to be more strongly expressed.
On the specifics, first, I appreciate you noting the apparent publication bias. That's both important and not great.
Second, I think comparing the cash transfer funnel plot to the other one is informative. The cash transfer one looks "right": it has the correct shape, and it's comforting to see the Egger regression line is basically zero. This is definitely not the case with the StrongMinds MA. The funnel plot looks incredibly weird, which could be heterogeneity that we can model, but it should regardless make everyone skeptical, because doing that kind of modelling well is very hard. It's also rough to see that if we project the Egger regression line back to the origin, the predicted effect when the SE is zero is basically zero. In other words, unwinding publication bias in this way would lead us to guess at a true effect of around nothing. Do I believe that? I'm not sure. There are good reasons to be skeptical of Egger-type regressions, but all of this definitely increases my skepticism of the results. While I'm glad it's public now, I don't feel great that this wasn't part of the very public first cut of the results.
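To spell out the "projection back to the origin" point, here is a minimal sketch of the Egger-style regression, with invented numbers chosen so the pattern resembles the one described (small studies reporting big effects):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative numbers only, not the meta-analysis data.
g = np.array([0.9, 0.8, 0.6, 0.5, 0.3, 0.25, 0.15])
se = np.array([0.40, 0.35, 0.28, 0.22, 0.15, 0.12, 0.08])

# Weighted regression of effect size on standard error. The slope is
# the Egger-style asymmetry term; the intercept is the predicted effect
# at SE = 0, i.e. the projection back to the origin (what the
# publication-bias literature calls the PET estimate).
fit = sm.WLS(g, sm.add_constant(se), weights=1 / se**2).fit()
print(f"asymmetry slope: {fit.params[1]:.2f} (p = {fit.pvalues[1]:.3f})")
print(f"predicted effect at SE = 0: {fit.params[0]:.2f}")
```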
Again, I appreciate you responding. I do think going forward it would be worth taking seriously community expectations about what underlies charity recommendations, and if something is tentative or rough then I hope that it gets clearly communicated as such, both originally and in downstream uses.
Interesting poll, Ryan! I'm not sure how much to take away from it, because I think "epistemic / evidentiary standards" is a pretty fuzzy concept in the minds of most readers. But still, point taken that people probably expect high standards.
It's also rough to see that if we project the Egger regression line back to the origin then the predicted effect when the SE is zero is basically zero.
I'm not sure about that. Here's the output of the Egger test. If I'm interpreting it correctly, the implied effect is smaller, but not zero. I'll try to figure out what the p-curve-suggested correction says.
Edit: I'm also not sure how much to trust the Egger test to tell me what the corrected effect size should be, so this wasn't an endorsement that I think the real effect size should be halved. It seems different ways of making this correction give very different answers. I'll add a further comment with more details.
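As one illustration of how much the answer depends on the correction chosen, here is a sketch (again with invented numbers) of two common regression-based adjustments, PET and PEESE, which often disagree:

```python
import numpy as np
import statsmodels.api as sm

# Invented data: small studies show larger effects.
g = np.array([0.9, 0.8, 0.6, 0.5, 0.3, 0.25, 0.15])
se = np.array([0.40, 0.35, 0.28, 0.22, 0.15, 0.12, 0.08])

w = 1 / se**2
pet = sm.WLS(g, sm.add_constant(se), weights=w).fit()       # effect ~ SE
peese = sm.WLS(g, sm.add_constant(se**2), weights=w).fit()  # effect ~ SE^2
print(f"PET-adjusted effect:   {pet.params[0]:.2f}")
print(f"PEESE-adjusted effect: {peese.params[0]:.2f}")
```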
I do think going forward it would be worth taking seriously community expectations about what underlies charity recommendations, and if something is tentative or rough then I hope that it gets clearly communicated as such, both originally and in downstream uses.

Seems reasonable.

Fair re: Egger. I just eyeballed the figure.