Something I’ve wondered is whether GiveWell has examined whether its methods are robust against “Pascal’s mugging” type situations, where a very high estimate of an intervention’s expected value leads to it being chosen even when that estimate seems very implausible a priori. The deworming case seems to fit this mould somewhat: an RCT finding a high expected impact despite no clear large near-term health benefits, and no obvious mechanism for the income improvements (as I understand it), does look a bit like the hypothetical mugger promising a high reward despite limited reason to believe the promise (though not as extreme as in the philosophical thought experiments).
Actually, doing a bit of searching turned up that Pascal’s mugging has been discussed in an old 2011 post on the GiveWell blog here, but only abstractly and not in the context of any real decisions. The post seems to argue that past some point, based on Bayesian reasoning, “the greater [the ‘explicit expected-value’ estimate] is, the lower the expected value of Action A”. So by that logic, it’s potentially the case that had the deworming RCT turned up a higher, even harder to believe estimate of the effect on income, a good evaluation could have given a lower estimate of expected value. Discounting the RCT expected value by a constant factor that is independent of the RCT result doesn’t capture this. (But I’ve not gone through the maths of the post to tell how general the result is.)
The post goes on to say ‘The point at which a threat or proposal starts to be called “Pascal’s Mugging” can be thought of as the point at which the claimed value of Action A is wildly outside the prior set by life experience (which may cause the feeling that common sense is being violated)’. Maybe it’s not common sense being violated in the case of deworming, but it does seem quite hard to think of a good explanation for the results (for an amateur reader like me anyway). Has any analysis been done on whether the deworming trial results should be considered past this point? It seems to me that that would require coming up with a prior estimate and checking that the posterior expectation does behave sensibly as hypothetical RCT results go beyond what seems plausible a priori. Of course, thinking may have evolved a lot since that post, but it seems to pick up on some key points to me.
It looks like >$10M was given by GiveWell to deworming programs in 2023, and from what I can tell a large proportion of the funds given to the “All Grants” fund went to this cause area, so it seems quite important to get the reasoning here correct. Since learning about the issues with the deworming studies, I’ve wondered whether donations to this cause can currently make sense: as an academic, my life experience tells me not to take big actions based on the results of individual published studies! This acts as a barrier to feeling comfortable donating to the “All Grants” fund, even though I’d otherwise like to hand over more of the decision-making to GiveWell.
We don’t always try to convert the answers to these questions to the same “currency” as our cost-effectiveness estimates, because we think entertaining multiple perspectives ultimately makes our decision-making more robust. We’ve previously written about this here, and we think these arguments still ring true. In particular, we think cluster-style thinking (Figure 6) handles unknown-unknowns in a more robust way, as we find that expert opinion is often a good predictor of “which way the arguments I haven’t thought of yet will point.”
This is the blog post being referenced. It’s about exactly the problem you describe.
Hmm, it’s not very clear to me that this would address the problem effectively; it seems a bit abstract as described. And addressing Pascal’s mugging issues potentially requires modifying how cost-effectiveness estimates are done, i.e. modifying one component of the “cluster”, rather than being purely a cluster-vs-sequence thinking matter. It would be good to hear more about how this kind of thinking is influencing actual grant decisions in cases like deworming, if it is being used.
Pascal’s mugging should be addressed by a prior which is more sceptical of extreme estimates.
GiveWell are approximating that process here:
We’re reluctant to take this estimate at face value because (i) this result has not been replicated elsewhere and (ii) it seems implausibly large given the more muted effects on intermediate outcomes (e.g., years of schooling).
It’s a potential solution, but I think it requires the prior to decrease quickly enough with increasing cost-effectiveness, and this isn’t guaranteed. So I’m wondering: is there any analysis showing that the methods being used are actually robust to this problem, e.g. exploring how the answers would change if the deworming RCT results had been higher or lower, and checking that they change sensibly?
A document that appears to give more info on the method used for deworming is here, so perhaps that can be built on. From a quick look, though, it doesn’t say exactly what shape is being used for the priors in all cases, although they look quite Gaussian from the plots.
Reflecting on this, in the everything-is-Gaussian case a prior doesn’t help much. Here, your posterior mean is a weighted average of the prior mean and the likelihood mean, with the weights depending only on the variances of the two distributions. So if the likelihood mean increases with constant variance, your posterior mean increases linearly. You’d probably need a bias term or something in your model (if you’re doing this formally).
This might actually be an argument in favour of GiveWell’s current approach, assuming they’d discount more as the study estimate becomes increasingly implausible.
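To make the everything-is-Gaussian point concrete, here’s a minimal sketch (with made-up numbers, not anything from GiveWell’s models) of the conjugate-normal posterior mean:

```python
# Minimal sketch of the everything-is-Gaussian case (made-up numbers,
# not anything from GiveWell's models).
# Prior: T ~ N(mu0, s0^2).  Study estimate: B | T ~ N(T, s^2).
# The posterior mean is a precision-weighted average of the prior mean
# and the study estimate.

def posterior_mean(b, mu0=1.0, s0=1.0, s=2.0):
    w = (1 / s0**2) / (1 / s0**2 + 1 / s**2)  # weight on the prior mean
    return w * mu0 + (1 - w) * b

# The weight depends only on the variances, never on b itself, so the
# posterior mean is linear in b: each unit increase in the study estimate
# adds the same fixed amount, however implausible the estimate becomes.
print(posterior_mean(5.0))    # 1.8
print(posterior_mean(50.0))   # 10.8
```

So a tenfold-more-extreme study result still drags the posterior up in strict proportion, which is exactly why a Gaussian prior on its own can’t produce the “discount more as it gets sillier” behaviour.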
exploring how the answers would change if the deworming RCT results had been higher or lower, and checking that they change sensibly?
Do you just mean that the change in the posterior expectation is in the correct direction? In that case, we know the answer from theory: yes, for any prior and a wide range of likelihood functions.
Andrews et al. 1972 (Lemma 1) shows that when the signal B is normally distributed, with mean T, then, for any prior distribution over T, E[T|B=b] is increasing in b.
This was generalised by Ma 1999 (Corollary 1.3) to any likelihood function arising from a B that (i) has T as a location parameter, and (ii) is strongly unimodally distributed.
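For what it’s worth, the monotonicity result is easy to check numerically. A sketch with a deliberately non-Gaussian (and entirely illustrative) prior and a normal likelihood, using simple grid integration:

```python
# Numerical check of the monotonicity result: with B | T ~ N(T, s^2),
# E[T | B = b] increases in b for ANY prior on T.  Illustrative prior:
# a deliberately non-Gaussian two-component mixture.
import numpy as np

t = np.linspace(-50, 200, 20001)  # grid over the parameter T
prior = 0.9 * np.exp(-0.5 * t**2) + 0.1 * np.exp(-0.5 * ((t - 10) / 20)**2)

def posterior_mean(b, s=3.0):
    like = np.exp(-0.5 * ((b - t) / s)**2)   # normal likelihood in t
    w = prior * like
    return np.sum(t * w) / np.sum(w)         # grid approximation of E[T|B=b]

means = [posterior_mean(b) for b in [0.0, 5.0, 20.0, 100.0]]
assert all(m1 < m2 for m1, m2 in zip(means, means[1:]))  # monotone in b
print(means)
```

The assertion holds however the prior is shaped, as the lemma predicts; only the likelihood’s tails matter for whether monotonicity can fail.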
I guess it depends on what the “correct direction” is thought to be. From the reasoning quoted in my first post, it could be that as the study result becomes larger, the posterior expectation should actually decrease. It’s not inconceivable that, as we saw the estimate go to infinity, we should start reasoning that the study is so ridiculous as to be uninformative, and so the posterior update should shrink towards nothing. But I don’t know. What you say seems to suggest that Bayesian reasoning could only do that for rather specific choices of likelihood function, which is interesting.
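As a sketch of that last point: swapping the normal likelihood for a heavy-tailed one (Cauchy, which is not strongly unimodal, so the conditions above fail) does produce exactly this “extreme results become uninformative” behaviour. Illustrative numbers only:

```python
# Sketch of the opposite behaviour: a normal prior with a heavy-tailed
# Cauchy likelihood (not strongly unimodal, so Ma's conditions fail).
# Illustrative numbers only.
import numpy as np

t = np.linspace(-20, 20, 40001)
prior = np.exp(-0.5 * t**2)  # standard normal prior on the effect

def posterior_mean(b, scale=1.0):
    like = 1.0 / (1.0 + ((b - t) / scale)**2)  # Cauchy likelihood
    w = prior * like
    return np.sum(t * w) / np.sum(w)

# The posterior mean first rises with b, then falls back towards the
# prior mean (0) as the estimate becomes too extreme to take seriously.
print([round(posterior_mean(b), 3) for b in (1.0, 3.0, 10.0, 100.0)])
```

Intuitively, a heavy-tailed likelihood concedes that wildly extreme estimates are almost as likely under a modest true effect as under a huge one, so the posterior reverts to the prior as the estimate explodes.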
A lognormal prior (and a normal likelihood function) might be a good starting point when adjusting for the statistical uncertainty in an effect size estimate. The resulting posterior cannot be calculated in closed form, but I have a simple website that calculates it using numerical methods. Here’s an example.
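For anyone wanting to reproduce this without the website, a grid-integration sketch (illustrative parameters; this is not the site’s actual code):

```python
# Grid-integration sketch of a lognormal prior with a normal likelihood
# (illustrative parameters; not the linked calculator's actual code).
import numpy as np

t = np.linspace(1e-6, 100.0, 100001)   # effect size restricted to positive
mu, sigma = 0.0, 1.0                    # lognormal prior parameters
prior = np.exp(-0.5 * ((np.log(t) - mu) / sigma)**2) / t  # up to a constant

def posterior_mean(b, se):
    """Posterior mean of the effect, given estimate b with standard error se."""
    like = np.exp(-0.5 * ((b - t) / se)**2)
    w = prior * like
    return np.sum(t * w) / np.sum(w)

# The lognormal's heavy right tail lets the posterior follow a large
# estimate upwards, but with a substantial discount towards the prior.
for b in (2.0, 5.0, 10.0):
    print(b, posterior_mean(b, se=1.5))
```

The normalising constants cancel in the ratio, which is why the unnormalised densities suffice on a uniform grid.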
Worth noting that adjusting for the statistical uncertainty in an effect size estimate is quite different from adjusting for the totality of our uncertainty in a cost-effectiveness estimate. For doing the latter, it’s unclear to me what likelihood function would be appropriate. I’d love to know if there are practical methods for choosing the likelihood function in these cases.
GiveWell does seem to be using mostly normal priors in the document you linked. I don’t have time to read the whole document and think carefully about what prior would be most appropriate. For its length (3,600 words including footnotes) the document doesn’t appear to give much reasoning for the choices of distribution families.