> the thrust of what you’re saying is “we should do uncertainty analysis (use Monte Carlo simulations instead of point-estimates) as our cost-effectiveness might be sensitive to it”
Yup, this is the thrust of it.
> But you haven’t shown that GiveWell’s estimates are sensitive to a reliance on point estimates (have you?)
I think I have—conditionally. The uncertainty analysis shows that, if you think the neutral uncertainty I use as input is an acceptable approximation, substantially different rankings are within the bounds of plausibility. If I put in my own best estimates, the conclusion would still be conditional. It’s just that instead of being conditional upon “if you think the neutral uncertainty I use as input is an acceptable approximation” it’s conditional upon “if you think my best estimates of the uncertainty are an acceptable approximation”.
So the summary point there is that there’s really no way to escape conditional conclusions within a subjective Bayesian framework. Conclusions will always be of the form “Conclusion C is true if you accept prior beliefs B”. This makes generic, public communication hard (as we’re seeing!), but offers lots of benefits too (which I tried to demonstrate in the post—e.g. an explicit quantification of uncertainty, a sense of which inputs are most influential).
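For concreteness, here’s a minimal sketch of the kind of analysis I mean—two entirely made-up interventions with made-up uncertainty, nothing here comes from GiveWell’s actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical intervention A: higher point-estimate value, but wide uncertainty.
cost_a = rng.lognormal(mean=np.log(40), sigma=0.3, size=N)
effect_a = rng.lognormal(mean=np.log(1.0), sigma=0.5, size=N)
# Hypothetical intervention B: slightly worse point estimate, tighter uncertainty.
cost_b = rng.lognormal(mean=np.log(50), sigma=0.1, size=N)
effect_b = rng.lognormal(mean=np.log(1.1), sigma=0.2, size=N)

ce_a = effect_a / cost_a  # benefit per dollar
ce_b = effect_b / cost_b

# A point-estimate comparison (medians here) gives a single, clean ranking...
print("median CE:", np.median(ce_a), np.median(ce_b))
# ...but the simulation tells you how confident to be in that ranking.
print("P(A beats B):", np.mean(ce_a > ce_b))
```

With these (invented) inputs, A wins on the point estimates but only beats B in a modest majority of simulations—exactly the kind of decision uncertainty that a single number hides.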
> here’s a new, really complicated methodology we could use
If I’ve given the impression that it’s really complicated, I think I might have misled you. One of the things I really like about the approach is that you pay a relatively modest fixed cost and then get this kind of analysis “for free”. By which I mean the complexity doesn’t infect all your actual modeling code. For example, the GiveDirectly model here actually reads more clearly to me than the corresponding spreadsheet, because I’m not constantly jumping around trying to figure out what a cell reference (e.g. B23) in a formula refers to.
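To make the readability point concrete, compare a spreadsheet-style formula to the same logic with named inputs (the function and parameter names below are purely illustrative, not GiveWell’s actual parameters):

```python
# Spreadsheet version: = B23 * B7 * B12  -- you have to chase down what each cell means.
# Named-input version: the formula documents itself.
def value_generated(transfer_size, fraction_reaching_recipients, value_per_dollar):
    money_delivered = transfer_size * fraction_reaching_recipients
    return money_delivered * value_per_dollar

print(value_generated(1000, 0.83, 0.003))
```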
Admittedly, some of the stuff about delta moment-independent sensitivity analysis and different distance metrics is a bit more complicated. But the distance-metric stuff is specific to this particular problem—not the methodology in general—and the sensitivity analysis can largely be treated as a black box. As long as you understand the properties of the resulting number (it ranges from 0 to 1, and 0 means independence), the internal workings aren’t crucial.
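If it helps demystify the black box a little: the delta index compares the output’s unconditional distribution with its distribution conditional on one input, averaged over that input’s values. A crude histogram-based estimator (a toy implementation of my own for illustration, not what I’d use for real analysis) exhibits the two properties I mentioned:

```python
import numpy as np

def delta_estimate(x, y, n_slices=20, n_bins=50):
    """Crude histogram estimator of the delta moment-independent index:
    delta = 0.5 * E_X[ integral |f_Y(y) - f_{Y|X}(y)| dy ].
    Ranges over [0, 1]; equals 0 when Y is independent of X."""
    bins = np.histogram_bin_edges(y, bins=n_bins)
    widths = np.diff(bins)
    f_y, _ = np.histogram(y, bins=bins, density=True)
    # Slice X into equal-probability bands and compare Y's conditional
    # distribution in each band against the unconditional one.
    edges = np.quantile(x, np.linspace(0, 1, n_slices + 1))
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if mask.sum() < 2:
            continue
        f_cond, _ = np.histogram(y[mask], bins=bins, density=True)
        # L1 distance between densities, weighted by P(X in this band)
        total += mask.mean() * np.sum(np.abs(f_y - f_cond) * widths)
    return 0.5 * total

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
noise = rng.normal(size=50_000)
print(delta_estimate(x, x + 0.1 * noise))  # output strongly driven by x: large
print(delta_estimate(x, noise))            # output independent of x: near 0
```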
> I think it would actually be very useful for you to input your best guess inputs (and it’s likely to be more useful for you to do it than an average EA, given you’ve thought about this more)
Given the responses here, I think I will go ahead and try that approach. Though I guess even better would be getting GiveWell’s uncertainty on all the inputs (rather than just the inputs highlighted in the “User weights” and “Moral inputs” tab).
Sorry for adding even more text to what’s already a lot of text :). Hope that helps.
Did you ever get round to running the analysis with your best guess inputs?
If that revealed substantial decision uncertainty (and especially if you were very uncertain about your inputs), I’d also like to see it run with GiveWell’s inputs. They could be aggregated distributions from multiple staff members, elicited using standard methods, or in some cases perhaps ‘official’ GiveWell consensus distributions. I’m kind of surprised this doesn’t seem to have been done already, given obvious issues with using point estimates in non-linear models. Or do you have reason to believe the ranking and cost-effectiveness ratios would not be sensitive to methodological changes like this?
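On the point-estimate issue in non-linear models: the problem is just Jensen’s inequality—pushing the mean of an input through a non-linear model is not the same as taking the mean of the model’s output. A toy example (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
cost = 100.0  # hypothetical fixed cost
# Hypothetical uncertain effect size, lognormally distributed
effect = rng.lognormal(mean=np.log(2.0), sigma=0.75, size=1_000_000)

plug_in = cost / np.mean(effect)   # point estimate pushed through the model
expected = np.mean(cost / effect)  # expectation taken over the uncertainty

# Because 1/x is convex, `expected` comes out substantially larger than
# `plug_in`: the point-estimate model understates expected cost per outcome.
print(plug_in, expected)
```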
Thanks for your thoughts.