“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose[1] psychotherapy’s effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”
1. To your first point, I think adding another layer of priors is a plausible way to do things – but given the effects of psychotherapy in general appear to be similar to the estimates we come up with[1] – it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, incentives that could bias results probably differ across countries. I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In a HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial to go get prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
“To my not-very-well-trained eyes, one hint to me that there’s an issue with application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence.”
2. To your second point, does our model predict charity specific effects?
In general, I think it’s a fair test of a model to say it should do a reasonable job at predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT – we will know how well our model works at predicting that RCT when it’s released, but for the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. But this is in part because we’re using a combination of charity specific evidence to inform our prior and the data. Let me explain.
We have two sources of charity specific evidence. First, we have the RCTs, which are based on a charity programme but not as it’s deployed at scale. Second, we have monitoring and evaluation data, which can show how well the charity intervention is implemented in the real world. We don’t have a psychotherapy charity at present that has RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it has a high ecological validity. While the ecological validity of these RCTs is obviously higher than the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate of around 90% to psychotherapy sessions, but the Friendship Bench M&E data reports an attendance rate more like 30%. We discuss this in Section 8 of the report.
So for the Friendship Bench case we have a couple reasonable quality RCTs for Friendship Bench, but it seems like, based on the M&E data, that something is wrong with the implementation. This evidence of lower implementation quality should be adjusted for, which we do. But we include this adjustment in the prior. So we’re injecting charity specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fair when predicting the charity specific effects isn’t clearly straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects from an RCT.
If we try and predict the RCT effects for Friendship Bench (which have much higher attendance than the “real” programme), then the gap between the predicted RCT effects and actual RCT effects is much smaller, but still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn’t quite as bad as it seems if we’re predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
Cuijpers et al. 2023 finds an effect of psychotherapy of 0.49 SDs for studies with low RoB in low, middle, and high income countries (comparisons = 218#), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low RoB in non-western countries (primarily low and middle income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments). The results tend to be lower (between 0.27 and 0.57, or 0.42 and 0.60) SDs when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.) the authors present the effects after using three publication bias corrected methods: trim-and-fill (0.6; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and using a selection model (0.49; 0.57 SDs). If we averaged their publication bias corrected results (which they did without removing outliers beforehand) the estimated effect of psychotherapy would be 0.5 SDs and 0.41 for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers is 0.70 SDs, and our publication bias correction is 36%, implying that we estimate our initial effect to be 0.46 SDs. You can play around with the data they use on the metapsy website.
Hi Jason,
1. To your first point, I think adding another layer of priors is a plausible way to do things – but given the effects of psychotherapy in general appear to be similar to the estimates we come up with[1] – it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, incentives that could bias results probably differ across countries. I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In a HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial to go get prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
2. To your second point, does our model predict charity specific effects?
In general, I think it’s a fair test of a model to say it should do a reasonable job at predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT – we will know how well our model works at predicting that RCT when it’s released, but for the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. But this is in part because we’re using a combination of charity specific evidence to inform our prior and the data. Let me explain.
We have two sources of charity specific evidence. First, we have the RCTs, which are based on a charity programme but not as it’s deployed at scale. Second, we have monitoring and evaluation data, which can show how well the charity intervention is implemented in the real world. We don’t have a psychotherapy charity at present that has RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it has a high ecological validity. While the ecological validity of these RCTs is obviously higher than the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate of around 90% to psychotherapy sessions, but the Friendship Bench M&E data reports an attendance rate more like 30%. We discuss this in Section 8 of the report.
So for the Friendship Bench case we have a couple reasonable quality RCTs for Friendship Bench, but it seems like, based on the M&E data, that something is wrong with the implementation. This evidence of lower implementation quality should be adjusted for, which we do. But we include this adjustment in the prior. So we’re injecting charity specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fair when predicting the charity specific effects isn’t clearly straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects from an RCT.
If we try and predict the RCT effects for Friendship Bench (which have much higher attendance than the “real” programme), then the gap between the predicted RCT effects and actual RCT effects is much smaller, but still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn’t quite as bad as it seems if we’re predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
Cuijpers et al. 2023 finds an effect of psychotherapy of 0.49 SDs for studies with low RoB in low, middle, and high income countries (comparisons = 218#), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low RoB in non-western countries (primarily low and middle income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments). The results tend to be lower (between 0.27 and 0.57, or 0.42 and 0.60) SDs when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.) the authors present the effects after using three publication bias corrected methods: trim-and-fill (0.6; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and using a selection model (0.49; 0.57 SDs). If we averaged their publication bias corrected results (which they did without removing outliers beforehand) the estimated effect of psychotherapy would be 0.5 SDs and 0.41 for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers is 0.70 SDs, and our publication bias correction is 36%, implying that we estimate our initial effect to be 0.46 SDs. You can play around with the data they use on the metapsy website.