Hi TSondo! Thanks for a wonderful analysis of structural patterns in AIM's cost-effectiveness analyses (CEAs). As AIM's Research Director, I'll respond to your findings and suggestions here. Filip Murár and Vicky Cox, Senior Research Managers, contributed to this response.
Finding one: probability of success
We recognize that it can be disconcerting to see us change our approach so much year on year. We take a fairly pragmatic approach to modelling and have a low bar for implementing changes where we think they are improvements, prioritizing that over comparability across years. For the avoidance of doubt, our CEAs are quite specific to the task of forecasting a broad point estimate for a plausible future intervention (where we have very little detail about how it will be implemented). They should not be read as literal evaluations of a specific intervention or charity, in the way that GiveWell's or Animal Charity Evaluators' assessments are.
However, you also identify wide variation within shorter timeframes. This is more worrying, as we do want good comparability within a round. We now use 100% for direct-delivery interventions and custom values for other types (namely policy interventions, or those that rely heavily on persuading other actors). In those cases, we may use different approaches to estimate a best guess, which is what is happening in some of the cases you mention. I agree that we should make this choice more explicit, and we will take you up on your suggestion to clarify our methodology for selecting p(Success). Thanks for the suggestion!
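To make the rule above concrete, here is a minimal sketch of how a p(Success) term typically enters a CEA point estimate. All function names and numbers are hypothetical illustrations, not AIM's actual model:

```python
# Hypothetical sketch: how a p(Success) term enters a CEA point estimate.
# All names and numbers here are illustrative, not AIM's model.

def expected_cost_effectiveness(p_success: float,
                                impact_if_success: float,
                                total_cost: float) -> float:
    """Expected units of impact per dollar, discounted by p(Success)."""
    return p_success * impact_if_success / total_cost

# Direct-delivery interventions would use p(Success) = 1.0 under the rule above,
direct = expected_cost_effectiveness(1.0, 50_000, 100_000)   # 0.5 units per $
# while policy interventions would use a custom, intervention-specific best guess.
policy = expected_cost_effectiveness(0.2, 500_000, 100_000)  # 1.0 units per $
```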
Finding two: validity adjustments
Bang on. We recognize this is an inconsistency. It has largely arisen from our repeated finding that each evidence base requires its own specific treatment, and we haven't worried much about it given that. However, your feedback has reignited an old debate about standardizing validity discounts in the way you suggest. Candidly, I can't promise any specific changes until we think a bit more about how to approach this, but we are keen to at least establish more guidance and use it in our quality assurance processes. We'll likely draw heavily on this great piece for that.
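For readers unfamiliar with the idea, here is one way standardized validity discounts could look in practice. This is a hypothetical sketch under invented categories and multipliers, not AIM guidance or a commitment to any particular scheme:

```python
# Hypothetical sketch of standardized validity discounts keyed by evidence type.
# The categories and multipliers are invented defaults, not AIM guidance; a CEA
# could still override them with a documented, case-specific value.

DEFAULT_VALIDITY_DISCOUNTS = {
    "rct_in_context": 0.9,     # strong, directly applicable evidence
    "rct_other_context": 0.6,  # external-validity concerns
    "observational": 0.4,      # confounding risk
    "expert_judgement": 0.25,  # weakest evidence base
}

def adjusted_effect(raw_effect: float, evidence_type: str,
                    override: float | None = None) -> float:
    """Apply the default discount unless an explicit override is documented."""
    discount = DEFAULT_VALIDITY_DISCOUNTS[evidence_type] if override is None else override
    return raw_effect * discount
```

A benefit of a scheme like this is that deviations from the default become visible and reviewable in quality assurance, rather than silent.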
Finding three: defaults left unchanged
Another good point. We'll start being clearer about active choices vs. defaults in future (in the notes). This mainly improves reasoning transparency, and it is clearly needed.
Other stuff you mention
For the avoidance of doubt: the patterns resonate, and we were aware of most of them. However, your feedback helps us calibrate the relative priority of the different improvements we have noticed. As a small team, this kind of input on what to prioritize is genuinely helpful.
We are quite keen to hear more about your suggestion for deeper analysis. I will email you about this next week!
Thanks Morgan! I appreciate the detailed response from you, Filip, and Vicky. Glad the patterns were useful. I’ll look out for your email and am happy to dig into whatever direction would be most useful on your end.