(@Burner1989 @David Rhys Bernard @Karthik Tadepalli)
I think the fundamental point (i.e. "You cannot use the distribution for the expected value of an average therapy treatment as the prior distribution for a SPECIFIC therapy treatment, as there will be a large amount of variation between possible therapy treatments that is missed when doing this.") is on the right lines, although subsequent discussion of fixed/random effect models might confuse the issue. (Cf. my reply to Jason.)
The typical output of a meta-analysis is an (~) average effect size estimate (the diamond at the bottom of the forest plot, etc.). The confidence interval given for that is (very roughly)[1] the interval in which we predict the true average effect likely lies. So for the basic model given in Section 4 of the report, the average effect size is 0.64, 95% CI (0.54 to 0.74). So (again, roughly) our best guess of the "true" average effect size of psychotherapy in LMICs from our data is 0.64, and we're 95% sure(*) this average is somewhere in (0.54, 0.74).
Clearly, it is not the case that if we draw another study from the same population, we should be 95% confident(*) the effect size of this new data point will lie between 0.54 and 0.74. This would not be true even in the unicorn case where there's no between-study heterogeneity (e.g. all the studies are measuring the same effect modulo sampling variance), and even less so when heterogeneity is marked, as here. To answer that question, what you want is a prediction interval.[2] This interval is always wider than the confidence interval for the average effect, and almost always significantly so: in the same analysis with the 0.54-0.74 confidence interval, the prediction interval was -0.27 to 1.55.
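To make the two intervals concrete, here is a minimal sketch under a standard random-effects model, using the figures above. The between-study SD (tau ≈ 0.46) is not quoted in the report excerpt; it is back-calculated from the reported prediction interval and is purely illustrative.

```python
import numpy as np
from scipy import stats

# Figures from the report's basic model (Section 4); tau is back-calculated
# from the reported prediction interval and is illustrative only.
mu_hat = 0.64                          # pooled average effect size
se_mu = (0.74 - 0.54) / (2 * 1.96)     # ~0.05, SE of the pooled mean
tau = 0.46                             # between-study SD (assumed)

z = stats.norm.ppf(0.975)

# 95% CI for the *average* effect: reflects only uncertainty in the mean.
ci = (mu_hat - z * se_mu, mu_hat + z * se_mu)

# 95% prediction interval for the effect in a *new* study drawn from the same
# population: adds between-study heterogeneity (tau^2) to the variance.
# (Strictly this uses a t-distribution with k-2 df; z is close enough here.)
half = z * np.sqrt(tau**2 + se_mu**2)
pi = (mu_hat - half, mu_hat + half)

print(ci)   # ~ (0.54, 0.74)
print(pi)   # ~ (-0.27, 1.55)
```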
Although the full model HLI uses in constructing informed priors is different from that presented in S4 (e.g. it includes a bunch of moderators), the priors appear to be constructed by Monte Carlo on the confidence intervals for the average, not the prediction interval for the data. So I believe the informed prior is actually one for the (adjusted) "average effect of psychotherapy interventions as a whole", not a prior for (e.g.) "the effect size reported in a given PT study". The latter would need to use the prediction intervals, and would have a much wider distribution.[3]
I think this explains exactly why the Bayesian method gives very bizarre results for (e.g.) StrongMinds when deployed as the report does, but why those results make much more sense if re-interpreted as (in essence) computing the expected effect size of "a future StrongMinds-like intervention", rather than the effect size we should believe StrongMinds actually has once in receipt of trial data on it specifically. E.g.:
The histogram of effect sizes shows some comparisons had an effect size < 0, but the "informed prior" suggests P(ES < 0) is extremely low. As a prior for the effect size of the next study, it is much too confident, given the data, that a trial will report positive effects (more than 1 of the 72 studies in the sample is negative, so surely the probability cannot be <1%, etc.). As a prior for the average effect size, this confidence is warranted: given the large number of studies in our sample, most of which report positive effects, we would be very surprised to discover the true average effect size is negative.
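A rough Monte Carlo sketch of this contrast, reusing the illustrative mu, SE and tau from above (assumptions, not HLI's actual inputs): a prior sampled around the confidence interval of the average puts essentially zero mass below zero, whereas one sampled from the predictive distribution puts roughly 8% there, which sits far more comfortably with more than 1 in 72 (~1.4%) of the comparisons being negative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_hat, se_mu, tau = 0.64, 0.051, 0.46   # illustrative values as above
n = 1_000_000

# Prior built by Monte Carlo over the CI of the *average* effect
# (what the informed prior appears to do):
avg_prior = rng.normal(mu_hat, se_mu, n)

# Prior for the effect of a *specific* new PT study: the predictive
# distribution, which includes between-study heterogeneity.
new_study_prior = rng.normal(mu_hat, np.sqrt(se_mu**2 + tau**2), n)

print((avg_prior < 0).mean())        # ~0.0 -- negative effects ruled out
print((new_study_prior < 0).mean())  # ~0.08 -- negative effects unsurprising
```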
The prior doesn't update very much on the data provided. E.g. when we stipulate that the trials on StrongMinds report a near-zero effect of 0.05 WELLBYs, our estimate of 1.49 WELLBYs goes to 1.26: so we should (apparently) believe in such a circumstance that the efficacy of SM is ~25 times greater than the trial data on it indicate. This is, obviously, absurd. However, such a small update would be appropriate if it were an update to ~the average of PT interventions as a whole: observing that a new PT intervention has much-below-average results should cause our estimate of the average to shift a little towards the new findings, but not much.
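A toy normal-normal (conjugate) update illustrates the mechanics; the prior and likelihood SDs below are assumptions chosen only to reproduce the size of update described above, not HLI's actual figures.

```python
def posterior_mean(prior_mu, prior_sd, data_mu, data_se):
    """Normal-normal conjugate update: precision-weighted average."""
    w_prior, w_data = 1 / prior_sd**2, 1 / data_se**2
    return (w_prior * prior_mu + w_data * data_mu) / (w_prior + w_data)

prior_mu = 1.49   # informed prior mean (WELLBYs)
data_mu = 0.05    # stipulated near-zero StrongMinds trial result (WELLBYs)
data_se = 0.46    # assumed SE of the trial estimate -- illustrative only

# Narrow prior (CI-of-the-average width, assumed SD 0.20): barely moves.
print(posterior_mean(prior_mu, 0.20, data_mu, data_se))   # ~1.26

# Wide prior (prediction-interval width, assumed SD 0.90): tracks the data.
print(posterior_mean(prior_mu, 0.90, data_mu, data_se))   # ~0.35
```

With the narrow, CI-width prior the trial data get only about a sixth of the weight, which is how a stipulated 0.05 result can leave the estimate at ~1.26; with a prediction-interval-width prior the posterior lands near the data.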
In essence, the update we are interested in is not "How effective should we expect future interventions like StrongMinds to be, given the data on StrongMinds' efficacy?", but simply "How effective should we expect StrongMinds to be, given the data on how effective StrongMinds is?". Given the massive heterogeneity and wide prediction interval, the (correct) informed prior is pretty uninformative, as it isn't that surprised by anything in a very wide range of values, and so on finding trial data on SM with a given estimate in this range, our estimate should update to match it pretty closely.[4]
(This also means that, contrary to what the report suggests, the SM estimate is not that "robust" to adverse data. Eyeballing it, I'd guess the posterior should go down by a factor of 2-3 conditional on the stipulated data versus the currently reported results.)
[1] I'm aware confidence intervals are not credible intervals, and that "the 95% CI tells you where the true value is with 95% likelihood" strictly misinterprets what a confidence interval is, etc. (see). But perhaps it is "close enough", so I'm going to pretend these are credible intervals, and asterisk each time I assume the strictly incorrect interpretation.
[2] Cf. Cochrane: "The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3)."
[3] Although with the same mean, I think, so it will still give the right "best guess" initial estimates.
[4] Obviously, this is modulo all the other issues I suggest with the meta-analysis as a whole, the fact that we would in practice incorporate other sources of information into our actual prior, etc.