Hi Oliver.
Thanks for your comments.
I think there are some reasonable points here.
• I certainly agree that the model is overkill for this particular evaluation. As we note at the beginning, this was a ‘proof of concept’ experiment in more detailed evaluation of a kind that is common in other fields, such as health economics, but is not often (if ever) seen in EA. In my view – and I can’t speak for the whole team here – this kind of cost-effectiveness analysis is most suitable for cases where (a) it is not possible to run a cheap pilot study with short feedback loops, (b) there is more actual data to populate the parameters, and (c) there is more at stake, e.g. it is a choice between one-off funding of several hundred thousand dollars or nothing at all.
• I would also be interested to see an explicit case against Donational.
However, I’d like to push back on some of your criticisms as well, many of which are addressed in the text (often the Executive Summary).
• A description of what Donational has done so far, and the plans for CAP, is in the Introducing Donational section. This could also constitute a basic argument for Donational, but maybe you mean something else by that. I don’t know what you want to know about its operations beyond what is in this section and the Team Strength section. If you tell us what exactly you think is missing, maybe we can add it somewhere.
• We don’t give “an explanation of a set of cruxes and observations that would change the evaluators mind” as such, but we do say which parameters the CEE is most sensitive to (and that the pilot should focus on those), which amounts to much the same thing: e.g. if the number and size of donations were much higher or lower, our conclusions would be different. I’ve added a sentence to the relevant part of the Exec Summary: “The base case donation-cost ratio of around 2:1 is below the 3x return that we consider the approximate minimum for the project to be worthwhile, and far from the 10x or higher reported by comparable organizations. The results are sensitive to the number and size of pledges (recurring donations), and CAP’s ability to retain both ambassadors and pledgers. Because of the high uncertainty, very rough value of information calculations suggest that the benefits of running a pilot study to further understand the impact of CAP would outweigh the costs by a large margin.” EDIT: We also address potential ‘defeaters’ in the Team Strength section, and note that we would be reluctant to support a project with a high probability of one or more major indirect harms, or that looked very bad according to one or more plausible worldviews. This strongly implies at least some of the observations that would change our mind.
• We mention an early BOTEC of expected donations (which I assume is similar to the Fermi estimate that you’re suggesting) in at least three places. This includes the Model Verification section where I note that “Parameter values, and final results, were compared to our preliminary estimates, and any major disparities investigated.” Maybe I should have been clearer that this was the BOTEC, and perhaps we should have published the BOTEC alongside the main model.
• We make direct comparisons with OFTW, TLYCS, and GWWC throughout the CEA and to a lesser extent in other sections, and explain why we don’t fully model their costs and impacts alongside CAP.
• “after writing down their formal models and truly understanding their consequences, most decision makers are well-advised to throw away the formal models and go with what their updated gut-sense is.”
That’s kind of what we did. We converted the CE ratio, and scores on other criteria, to crude Low/Medium/High categories, and made a somewhat subjective final decision that was informed by, but not mechanistically determined by, those scores and other information. A more purely intuition-driven approach would likely have either enthusiastically embraced the full CAP or rejected it entirely, whereas a formal model led us to what we think is a more reasonable middle ground (though we may have arrived in a similar place with a simpler model).
• Even for this evaluation, there was some value in the more ‘advanced’ methods. E.g. the VOI calculation, rough though it was, was important for deciding how much to recommend be spent on a pilot; and our final CEE (<2x) was a fair bit lower than the BOTEC (about 3-4x), largely because of the more pessimistic inputs we elicited from people with a more detached perspective, and more precise modelling of costs.
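As a rough illustration of the sensitivity point above, here is a minimal sketch of a one-way sensitivity check on a donation-cost ratio. Every input is hypothetical and chosen only so that the base case lands near 2:1; none of the numbers, parameter names, or the simple geometric retention assumption come from the actual CEA, which is considerably more detailed.

```python
def donation_cost_ratio(n_pledges, mean_annual_pledge, retention, years, total_cost):
    """Donations over the horizon divided by total cost.

    Assumes (hypothetically) that a fixed fraction `retention` of pledgers
    carries over from each year to the next.
    """
    donations = sum(
        n_pledges * mean_annual_pledge * retention ** year for year in range(years)
    )
    return donations / total_cost


def one_way_sensitivity(base, name, factors):
    """Scale one input at a time, holding the others at their base values."""
    results = []
    for factor in factors:
        inputs = dict(base, **{name: base[name] * factor})
        results.append((factor, donation_cost_ratio(**inputs)))
    return results


THRESHOLD = 3.0  # the approximate 3x minimum quoted from the Exec Summary

# Hypothetical inputs, NOT taken from the evaluation.
BASE = {
    "n_pledges": 100,
    "mean_annual_pledge": 500.0,
    "retention": 0.7,
    "years": 3,
    "total_cost": 50_000.0,
}

if __name__ == "__main__":
    for name in ("n_pledges", "mean_annual_pledge", "retention"):
        for factor, ratio in one_way_sensitivity(BASE, name, (0.5, 1.0, 1.25)):
            verdict = "meets" if ratio >= THRESHOLD else "below"
            print(f"{name} x{factor}: ratio {ratio:.2f} ({verdict} {THRESHOLD}x)")
```

In this toy version the ratio scales proportionally with the number and size of pledges, while retention acts through the compounding term, so a pilot that narrows those three inputs would narrow the ratio directly.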
It seems like a large part of the problem is that most people don’t have time to read such a long post in detail. In future we should perhaps do a more detailed Exec Summary, and I’ll consider expanding this one further if there is enough demand.
Thanks again for engaging with this!
Thanks for the response!
I think you misunderstood what I was saying at least a bit, in that I did read the post in reasonably close detail (about half an hour of reading in total) and was already aware of most of what you say in your comment.
I will try to find the time to write a longer response that tries to explain my case in more detail, but can’t currently make any promises. I expect there are some larger inferential distances here that would take a while to cross for both of us.
Yeah, I did wonder if we were talking past each other a bit, and I’d be interested to clear that up – but no worries if you don’t have time.
Hey Oli, thanks for taking the time to come up with these points, and going out of your way to say, “...I think evaluations like this are quite important and a core part of what I think of as EA’s value proposition...and would like to see more people trying similar things in the future.” This is exactly the type of attitude toward agency and attempting to do good that I’d like to have encouraged more in EA.
Point-by-point, I think Derek covered a lot. I also mention in a comment how I was thinking about this evaluation in terms of a contribution to grant evaluation and the EA project space more broadly.
We might have done better to distill cruxes within our qualitative reasoning, though I do think a fair amount of this is presented in various sections. Agreed that swapping advanced mathematical models for BOTECs is often advisable, but at certain points in the future, I would imagine that evaluators could make good use of methods like these.