I think in general, our research is pretty unusual in that we are quite willing to publish research that has had a fairly limited number of hours put into it. Partly, this is because our research is aimed less at external actors (e.g., convincing funders, the broader animal movement, other orgs) than at people already fairly convinced about founding a charity, and at the quite specific question of which org would be best to found. We do take an approach that is more accepting of errors, particularly ones that do not affect endline decisions connected directly to founding a charity.
Do you think there are additional steps you could/should take to make this philosophy / these limitations clearer to those who come across your reports?
I strongly support more transparency and more release of materials (including less polished work product), but I think it is essential that the would-be secondary user is well aware of the limitations. This could include (e.g.) noting the amount of time spent on the report, the intended audience and use case for the report, the amount of reliance you intend that audience to place on the report, any additional research you expect that intended audience to undertake before relying on the report, and the presence of any significant issues / weaknesses that may be of particular concern to either the intended audience or anticipated secondary users. If you specifically do not intend to correct any errors discovered after a certain time (e.g., after the idea was used or removed from recommended options), it would probably be good to state that as well.
Hi, I am the Director of Research at Charity Entrepreneurship (CE, now AIM). I wanted to quickly respond to this point.
– –
Quality of our reports
I would like to push back a bit on Joey’s response here. I agree that our research is quicker, scrappier, and goes into less depth than that of other orgs, but I am not convinced that our reports have more errors or worse reasoning than the reports of other organisations (thinking of non-peer-reviewed global health and animal welfare organisations like GiveWell, OpenPhil, Animal Charity Evaluators, Rethink Priorities, Founders Pledge).
I don’t have strong evidence for thinking this. Mostly I am going off the number of errors that incubatees find in the reports. In each cohort we have ~10 potential founders digging into ~4-5 reports for a few weeks. I estimate there are on average roughly 0.8 non-trivial non-major errors (i.e. something that would change a CEA by ~20%) and 0 major errors highlighted by the potential founders. This seems to be of the same order of magnitude as the number of errors GiveWell gets on scrutiny (e.g. here).
And ultimately all our reports are tested in the real world by people putting the ideas into practice. If our reports do not line up with reality in any major way, we expect to find out when founders do their own research or a charity pivots or shuts down, as MHI has done recently.
One caveat to this is that I am more confident about the reports on the ideas we do recommend than about the reports on non-recommended ideas. The latter receive less oversight internally, as they are less decision-relevant for founders, and they receive less scrutiny from incubatees and from being put into action.
I also note that in this entire critique, and having skimmed over the threads here, no one appears to have pointed out any actual errors in any CE report. So I find it hard to update on anything written here. (The possible exception is me, in this post, pointing to the MHI case, which unfortunately does seem to have shut down in part due to an error in the initial research.)
So I think our quality of research is comparable to other orgs, but my evidence for this is weak and I have not done a thorough benchmarking. I would be interested in ways to test this. It could be a good idea for CE to run a “change our mind” contest like GiveWell’s in order to test the robustness of our research. Something for me to consider. It could also be useful (although I doubt worth the effort) to have some external research evaluator review our work and benchmark us against other organisations.
[EDIT: To be clear, I am talking here about quality in terms of the number of mistakes/errors. I agree our research is often shorter and, as such, more willing to take shortcuts to reach conclusions.]
– –
That said, I do agree that we should make it very, very clear in all our reports who the report is written for, why, and what the reader should take from it. We do this in the introduction section of all our reports, and I will review the introduction for future reports to make sure this is absolutely clear.
I think it is quite clear that a lot of your research isn’t at the bar of those other organizations (though I think for the reasons Joey mentioned, that definitely can be okay). For example, I think in this report, collapsing 30 million species with diverse life histories into a single “Wild bug” and then taking what appear to be completely uncalibrated guesses at their life conditions, then using that to compare to other species is just well below the quality standards of other organizations in the space, even if it is a useful way to get a quick sense of things.
[previous comment is deleted, because I accidentally sent an unfinished one]
Thanks for the example! That makes sense and makes me wonder if part of the disagreement came from thinking about different reference classes. I agree that, in general, the research we did in our first year of operations, so 2018/2019, is well below the quality standard we expect of ourselves now, or what we expected of ourselves even in 2020. I agree it is easy to find a lot of errors (that weren’t decision-relevant) in our research from that year. That is part of the reason they are not on the website anymore.
That being said, I still broadly support our decision not to spend more time on research that year, because doing so would have come with a significant tradeoff. At the time, there was no other organization whose research we could have relied on, and the alternatives to the assessment you mention were either to not compare interventions across species (or to reduce the comparison to a simplistic metric like “the number of animals affected”) or to spend more time on research and run the Incubation Program a year later, in which case we would have lost a year of impact and might not have started the charities we did. That would have been a big loss. For example, that year we incubated Suvita, whose impact and promise were recently recognized by GiveWell, which provided Suvita with $3.3M to scale up. We also incubated Fish Welfare Initiative (FWI) and Animal Advocacy Careers, a decision I still consider to be a good one (FWI is an ACE Recommended Charity, and even though I agree with its co-founders that their impact could be higher, I’m glad they exist). We also couldn’t simply hire more staff and do things more in-depth because it was our first year of operation, and there was not enough funding and other resources available for what was, at the time, an unproven project.
I wouldn’t have wanted to spend more time on that, especially because one of the main principles of our research is “decision-relevance,” and the “wild bug” one-pager you mention and similar ones were not decision-relevant. If they had been, we would not have settled on something of that quality, and we would have put more time into them.
For what it is worth, I think there are things we could have done better. Specifically, we could have put more effort into communicating how little weight others should put on some of that research. We did state at the top (for example, in the wild bug one-pager you link) that “these reports were 1-5 hours time-limited, depending on the animal, and thus are not fully comprehensive,” and at the time, we thought that was sufficient. But we could have stressed epistemic status even more strongly and in more places so it is clear to others that we put very little weight on it. For full transparency, we also made another mistake. We didn’t recommend working on banning/reducing bait fish as an idea at the time because, from our shallow research, it looked less promising; later, upon researching it more in-depth, we decided to recommend it. It wouldn’t have made a difference then because there were not enough potential co-founders in year 1 to start more charities, but it was a mistake nevertheless.