First of all, I think evaluations like this are quite important and a core part of what I think of as EA’s value proposition. I applaud the effort and dedication that went into this report, and would like to see more people trying similar things in the future.
Tee Barnett asked me for feedback in a private message. Here is a very slightly edited version of my response (hence why it is more off-the-cuff than what I would usually post on the forum):
-------
Hmm, I don’t know. I looked at the cost-effectiveness section and feel mostly that the post is overemphasizing formal models. Like, after reading the whole thing and looking at the spreadsheet for 5 minutes, I am still unable to answer the following core questions:
What is the basic argument for Donational?
Does that argument hold up after looking into it in more detail?
How does the quality of that argument compare against other things in the space?
What has Donational done so far?
What evidence do we have about its operations?
If you do a naive, simple Fermi estimate of Donational’s effectiveness, what is the bottom line?
I think I would have preferred just one individual writing a post titled “Why I am not excited about Donational”, that just tries to explain clearly, like you would in a conversation, why they don’t think it’s a good idea, or how they have come to change their mind.
Obviously I am strongly in favor of people doing evaluations like this, though I don’t think I am a huge fan of the format that this one chose.
------- (end of quote)
On a broader level, I think there might be some philosophical assumptions about the way this post deals with modeling cause prioritization that I disagree with. I have this sense that the primary purpose of mathematical analysis in most contexts is to help someone build a deeper understanding of a problem by helping them make their assumptions explicit and to clarify the consequences of their assumptions, and that after writing down their formal models and truly understanding their consequences, most decision makers are well-advised to throw away the formal models and go with what their updated gut-sense is.
When I look at this post, I have a lot of trouble understanding the actual reasons why someone might think Donational is a good idea, and what arguments would convince them (or maybe have convinced them) otherwise. Instead I see a large amount of rigor being poured into a single cost-effectiveness model, with a result that I am pretty confident could have been replaced by some pretty straightforward Fermi point estimates.
I think there is nothing wrong with also doing sensitivity analyses and more complicated parameter estimation, but in this context it seems that all of that mostly obscures the core aspects of the underlying uncertainty. It makes it harder both for the reader to understand what the basic case for Donational is (and why it fails), and (in my model) for the people constructing the model to actually interface with the core questions at hand.
All of this doesn’t mean that the tools employed here are never the correct tools to be used, but I do think that when trying to produce an evaluation that is primarily designed for external consumption, I would prefer much more emphasis to be given to clear explanations of the basic idea behind the organizations and an explanation of a set of cruxes and observations that would change the evaluators’ minds, instead of this much emphasis on both the creation of detailed mathematical models and the explanation of those models.
Hi Oliver.
Thanks for your comments.
I think there are some reasonable points here.
• I certainly agree that the model is overkill for this particular evaluation. As we note at the beginning, this was a ‘proof of concept’ experiment in more detailed evaluation of a kind that is common in other fields, such as health economics, but is not often (if ever) seen in EA. In my view – and I can’t speak for the whole team here – this kind of cost-effectiveness analysis is most suitable for cases where (a) it is not possible to run a cheap pilot study with short feedback loops, (b) there is more actual data to populate the parameters, and (c) there is more at stake, e.g. it is a choice between one-off funding of several hundred thousand dollars or nothing at all.
• I would also be interested to see an explicit case against Donational.
However, I’d like to push back on some of your criticisms as well, many of which are addressed in the text (often the Executive Summary).
• A description of what Donational has done so far, and the plans for CAP, is in the Introducing Donational section. This could also constitute a basic argument for Donational, but maybe you mean something else by that. I don’t know what you want to know about its operations beyond what is in this section and the Team Strength section. If you tell us what exactly you think is missing, maybe we can add it somewhere.
• We don’t give “an explanation of a set of cruxes and observations that would change the evaluators’ minds” as such, but we say what the CEE is most sensitive to (and that the pilot should focus on those), which sounds like kind of the same thing, e.g. if the number and size of donations were much higher or lower then our conclusions would be different. I’ve added a sentence to the relevant part of the Exec Summary: “The base case donation-cost ratio of around 2:1 is below the 3x return that we consider the approximate minimum for the project to be worthwhile, and far from the 10x or higher reported by comparable organizations. The results are sensitive to the number and size of pledges (recurring donations), and CAP’s ability to retain both ambassadors and pledgers. Because of the high uncertainty, very rough value of information calculations suggest that the benefits of running a pilot study to further understand the impact of CAP would outweigh the costs by a large margin.” EDIT: We also address potential ‘defeaters’ in the Team Strength section, and note that we would be reluctant to support a project with a high probability of one or more major indirect harms, or that looked very bad according to one or more plausible worldviews. This strongly implies at least some of the observations that would change our mind.
• We mention an early BOTEC of expected donations (which I assume is similar to the Fermi estimate that you’re suggesting) in at least three places. This includes the Model Verification section where I note that “Parameter values, and final results, were compared to our preliminary estimates, and any major disparities investigated.” Maybe I should have been clearer that this was the BOTEC, and perhaps we should have published the BOTEC alongside the main model.
• We make direct comparisons with OFTW, TLYCS, and GWWC throughout the CEA and to a lesser extent in other sections, and explain why we don’t fully model their costs and impacts alongside CAP.
• “after writing down their formal models and truly understanding their consequences, most decision makers are well-advised to throw away the formal models and go with what their updated gut-sense is.”
That’s kind of what we did. We converted the CE ratio, and scores on other criteria, to crude Low/Medium/High categories, and made a somewhat subjective final decision that was informed by, but not mechanistically determined by, those scores and other information. A more purely intuition-driven approach would likely have either enthusiastically embraced the full CAP or rejected it entirely, whereas a formal model led us to what we think is a more reasonable middle ground (though we may have arrived in a similar place with a simpler model).
• Even for this evaluation, there was some value in the more ‘advanced’ methods. E.g. the VOI calculation, rough though it was, was important for deciding how much to recommend be spent on a pilot; and our final CEE (<2x) was a fair bit lower than the BOTEC (about 3-4x), largely because of the more pessimistic inputs we elicited from people with a more detached perspective, and more precise modelling of costs.
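For readers who want the threshold comparison in concrete terms, here is a minimal Python sketch of a point-estimate donation-cost ratio check. The 3x minimum and the 10x benchmark are the figures from the Exec Summary sentence quoted above. Everything else is an assumption for illustration: the function names, the simple "pledges × annual donation × years retained" structure, and the input numbers are hypothetical placeholders, not the report's model or its BOTEC.

```python
# Thresholds taken from the Exec Summary sentence quoted in this comment.
MIN_WORTHWHILE_RATIO = 3.0   # approximate minimum return considered worthwhile
COMPARABLE_ORG_RATIO = 10.0  # "10x or higher" reported by comparable organizations


def donation_cost_ratio(pledges: int, annual_donation: float,
                        years_retained: float, total_cost: float) -> float:
    """Point estimate of donations moved per dollar of cost (hypothetical structure)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return pledges * annual_donation * years_retained / total_cost


def classify(ratio: float) -> str:
    """Bucket a donation-cost ratio against the two thresholds."""
    if ratio < MIN_WORTHWHILE_RATIO:
        return "below minimum"
    if ratio < COMPARABLE_ORG_RATIO:
        return "worthwhile, below comparable organizations"
    return "in line with comparable organizations"


# Hypothetical inputs, chosen only so that the ratio comes out at 2.0;
# they are NOT figures from the report.
example = donation_cost_ratio(pledges=100, annual_donation=500.0,
                              years_retained=2.0, total_cost=50_000.0)
print(example, classify(example))
```

A point estimate like this gives only the headline ratio. It says nothing about how sensitive the result is to pledge numbers or retention, which is what the sensitivity and VOI analyses were meant to address.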
It seems like a large part of the problem is that most people don’t have time to read such a long post in detail. In future we should perhaps do a more detailed Exec Summary, and I’ll consider expanding this one further if there is enough demand.
Thanks again for engaging with this!
Thanks for the response!
I think you misunderstood what I was saying at least a bit, in that I did read the post in reasonably close detail (about half an hour of reading in total) and was aware of most of the points in your comment.
I will try to find the time to write a longer response that tries to explain my case in more detail, but can’t currently make any promises. I expect there are some larger inferential distances here that would take a while to cross for both of us.
Yeah, I did wonder if we were talking past each other a bit, and I’d be interested to clear that up – but no worries if you don’t have time.
Hey Oli, thanks for taking the time to come up with these points, and going out of your way to say, “...I think evaluations like this are quite important and a core part of what I think of as EA’s value proposition...and would like to see more people trying similar things in the future.” This is exactly the type of attitude toward agency and attempting to do good that I’d like to see encouraged more in EA.
Point-by-point, I think Derek covered a lot. I also mention in a comment how I was thinking about this evaluation in terms of a contribution to grant evaluation and the EA project space more broadly.
We might have done better to distill cruxes within our qualitative reasoning, though I do think a fair amount of this is presented in various sections. Agreed that swapping advanced mathematical models for BOTECs is often advisable, but at certain points in the future, I would imagine that evaluators could make good use of methods like these.