Hey Vasco—thanks for this. As we write in the update, we are unlikely to pay much mind to scaling in the near future, in part because it has proved difficult to put numbers onto scaling in a way that satisfies us.
Morgan Fairless
Growth Research, a Methods Site, and SADs 2.0 (AIM Research Update)
AIM’s new charity taxonomy
Hi TSondo! Thanks for a wonderful analysis of structural patterns in AIM’s Cost-effectiveness Analyses. As AIM’s Research Director, I respond to your findings and suggestions here. Filip Murár and Vicky Cox, Senior Research Managers, contributed to this response.
Finding one: probability of success
We recognize that it is disconcerting to see us change approach so much year on year. We take a fairly pragmatic approach to modelling and have a low bar for implementing changes where we think they are appropriate, so we care less about comparability across years and are less concerned about year-on-year changes in methodology. For the avoidance of doubt, our CEAs are quite specific to the task of forecasting a broad point estimate for a plausible future intervention (where we have very little detail about how it will be implemented). They should not be taken literally as evaluations of a certain intervention or charity, like those produced by GiveWell or Animal Charity Evaluators.
However, you also identify wide variation over shorter timeframes. This is more worrying, as we do want good comparability within a round. We now use 100% for direct delivery and custom amounts for other types of interventions (namely policy work or interventions relying heavily on persuading other actors). In these cases, we may use different approaches to estimate a best guess, which is what is happening in some of the cases you mention. I agree that we should make this choice more explicit, and will take you up on your suggestion to clarify our methodology for selecting p(Success). Thanks for the suggestion!
Finding two: validity adjustments
Bang on. We recognize this is an inconsistency. It has largely arisen from our consistent finding that each evidence base requires specific treatment, and we haven’t worried much about it given this fact. However, your feedback has reignited an old debate about standardizing validity discounts in the way you suggest. Candidly, I can’t promise any specific changes until we think a bit more about how to approach this, but we are keen to at least establish more guidance and use it in our quality assurance processes. We’ll likely draw heavily from this great piece for that.
Finding three: defaults left unchanged
Another good point. We’ll start being clearer about active choices vs. defaults in future (in the notes). I think this mostly improves reasoning transparency and is clearly needed.
Other stuff you mention
For the avoidance of doubt, the patterns resonate. We were aware of most of these. However, your feedback helps us calibrate the relative priority of the different improvements we notice. As a small team, we find this kind of feedback helpful for figuring out what to prioritize.
We are quite keen to hear more about your suggestion for deeper analysis. I will email you about this next week!
I think this is fair—it would be ridiculous to expect disclosure of the use of a dictionary. I’d be in favour of this being down to social norms / personal behaviour because I think it’s not the most clear thing in the world.
I still personally think there’s something qualitatively different. Imagine you have to go through 80 pages of calculations and the author tells you they used a calculator which routinely makes errors at random. In theory you expect the author to back their work and have checked it… in practice, I worry lots of people don’t. As a consumer I’d rather know what tool was used.
Another analogy would be how the code and packages used are standard disclosures in papers.
Reasoning transparency demands AI-use disclosure
Announcing Applications for the AIM Research Program April 2025
Hey Emmanuel,
This is great! I am the incoming director of research for Ambitious Impact—I’ll try to reach out via email over the next few weeks with some thoughts on how we can trade notes.
Hi Vasco,
Thanks for your suggestion on how to present CEAs; we’ll think about this further. I have limited capacity, so apologies if I don’t answer promptly.
Note: I am certainly not an animal welfare expert, and there may be very different views on this that I am not covering.
I am not sure marginal research money should be spent on pain scaling questions, and I am also not bullish on most pain scaling surveys.
I prefer having stronger research on the quantification of pain in different production systems (i.e., funding welfare science that aligns with existing proposed frameworks) and exploration of other research questions.
I believe this because:
I think that existing disaggregated (non-scaled) metrics already allow us to make reasonable guesses about what is cost-effective for helping animals.
I am not confident that we will ever gain much clarity on scaling, as there are lots of known unknowns and reasonable disagreements on questions upstream and downstream of how to scale things (sentience, range, etc.).
TTO surveys are super interesting and informative. As per the above I’d personally spend money on other research questions instead.
If someone were to go ahead with this, I’d be curious to try to predict what the learnings from such a survey could tangibly influence, as I expect they’d get tangled in a bunch of questions about validity anyway. Given the subjective nature of pain, I am not convinced a survey sampling from such a specific group is particularly externally valid (I also don’t know that any survey of humans will ever be externally valid for how we’d scale pain across other animals).