First, a meta note less directly connected to the response:
Our funding circles fund a lot of different groups, and there is no joint pot, so it's closer to a moderated discussion about a given cause area than CE/AIM making granting calls. We are not looking for people to donate to us or our charities, and as far as I understand, OpenPhil and AWF do not have a participatory way to get involved other than just donating to their joint pot directly. This request is more aimed at people who want to put in significant personal time to making decisions independent from existing funding actors.
More connected response:
Thanks for the thoughts, and the support you have given our past charities. I can give a few quick comments on this. Our research team might also respond a bit more deeply.
1) Research quality: I think in general, our research is pretty unusual in that we are quite willing to publish research that has a fairly limited number of hours put into it. Partly, this is due to our research not being aimed at external actors (e.g., convincing funders, the broader animal movement, other orgs) as much as aimed at people already fairly convinced on founding a charity and aimed at a quite specific question of what would be the best org to found. We do take an approach that is more accepting of errors, particularly ones that do not affect endline decisions connected directly to founding a charity. E.g., for starting a charity on fish in a given country, we are not really concerned about the number of fish farmed unless that number is a significant determining factor in founding a charity in that space. We have gone back and forth on how much transparency to have on research and how much time to spend per report and have not come to a fixed answer. We are more likely to get criticism/pushback on higher transparency + lower hours per report, but we typically think it will still lead to more promising charities in the end.
2) CE's animal charity quality: I think both our ordering and assessment of charity quality would be different from what is described here. I also think animal welfare funds' and Open Phil's assessments (both of whom have funded the majority of these projects) would not match your description. However, in some ways these are small differences, as our general estimate is that 2–5 charities in a given area are highly promising. It is quite a hits-based game, and that is roughly the number of charities we would expect (and would rank internally as) performing really well.
2.5) Feedback on animal charities: I did a quick review of the charities that got the most positive vs. negative feedback from the broader animal community at the time of idea recommendation, relative to your rank order and relative to our internal one, and did not find a correlation. Generally, I think the space is pretty uncertain, and thus the charities that got the most positive expectations were typically the ones that deviated the least from actions already being taken in the space. I think that putting more time into the research reports (including getting more feedback) is one way to improve charity quality (at the cost of quantity), but I'm pretty skeptical it's the best way. So far, the biggest predictive factor has not been idea strength but the founder team, so when thinking about where to spend marginal resources to improve charities, I would still lean that way (although it's far from clear that will always be the case).
3) I would be interested in doing a survey on this to get better data. I get the impression that we are seen as pretty disconnected from the animal space (and I think that is fairly true). I think we are far more involved in, e.g., the EA space, both when it comes to more formal research and when it comes to softer social engagement. I think our charities tend to go deeper into whatever area they are focusing on than our team does, and I am pretty comfortable with that. I would not be surprised if we were both invited to fewer and attended fewer coordination events and meetings connected to the animal space; we like to stay focused quite directly on the problems we are working on.
Thanks again for writing this up. I put some chance on these being issues that are correct and important enough to prioritize, and it's valuable to get pushback and flags even if we end up disagreeing about the actions to take.
It would be helpful if you engaged with the plagiarism claims, because it is concerning that CE is running researcher training programs while failing to handle that well. I agree that the rest of what you say here is tricky, but I think it is pretty bad that you publish the low-confidence research publicly, and it's led to confusion in the animal space.
+ 2.5 - I think if your ordering is significantly different, it's probably fairly different from that of most people in the space, so that's somewhat surprising/an indicator that lots of feedback isn't reaching you all.
To be clear, I am certain that CE staff have not been invited to events in the animal welfare space due to impressions of your organization being unwilling to be cooperative.
My main view is that animal donors should seriously engage in a vetting process prior to taking large amounts of guidance on donations from CE / shouldn't update on your research in meaningful ways. I still think CE is probably the best bet for future new very high impact organizations in the animal space, though, so it's a tricky balance to critique CE. I'd bet that a fair number of the best giving opportunities in the animal space in 5 years will have come out of CE, but that it'll also have come with a large amount of generally avoidable waste of funding and talent.
I think in general, our research is pretty unusual in that we are quite willing to publish research that has a fairly limited number of hours put into it. Partly, this is due to our research not being aimed at external actors (e.g., convincing funders, the broader animal movement, other orgs) as much as aimed at people already fairly convinced on founding a charity and aimed at a quite specific question of what would be the best org to found. We do take an approach that is more accepting of errors, particularly ones that do not affect endline decisions connected directly to founding a charity.
Do you think there are additional steps you could/should take to make this philosophy / these limitations clearer to those who come across your reports?
I strongly support more transparency and more release of materials (including less polished work product), but I think it is essential that the would-be secondary user is well aware of the limitations. This could include (e.g.) noting the amount of time spent on the report, the intended audience and use case for the report, the degree of reliance you intend that audience to place on the report, any additional research you expect that intended audience to undertake before relying on the report, and the presence of any significant issues / weaknesses that may be of particular concern to either the intended audience or anticipated secondary users. If you specifically do not intend to correct any errors discovered after a certain time (e.g., after the idea was used or removed from recommended options), it would probably be good to state that as well.
Hi, I am the Director of Research at Charity Entrepreneurship (CE, now AIM). I wanted to quickly respond to this point.
– –
Quality of our reports
I would like to push back a bit on Joey's response here. I agree that our research is quicker, scrappier, and goes into less depth than that of other orgs, but I am not convinced that our reports have more errors or worse reasoning than the reports of other organisations (thinking of non-peer-reviewed global health and animal welfare organisations like GiveWell, OpenPhil, Animal Charity Evaluators, Rethink Priorities, Founders Pledge).
I don't have strong evidence for thinking this. Mostly I am going off the number of errors that incubatees find in the reports. In each cohort we have ~10 potential founders digging into ~4-5 reports for a few weeks. I estimate that, on average, roughly 0.8 non-trivial, non-major errors (i.e. something that would change a CEA by ~20%) and 0 major errors are highlighted by the potential founders. This seems to be in the same order of magnitude as the number of errors GiveWell gets under scrutiny (e.g. here).
And ultimately all our reports are tested in the real world by people putting the ideas into practice. If our reports do not line up with reality in any major way, we expect to find out when founders do their own research or a charity pivots or shuts down, as MHI has done recently.
One caveat to this is that I am more confident about the reports on the ideas we do recommend than about the reports on non-recommended ideas, which receive less internal oversight (as they are less decision-relevant for founders) and receive less scrutiny from incubatees and from being put into action.
I note also that in this entire critique, and having skimmed over the threads here, no one appears to have pointed out any actual errors in any CE report. So I find it hard to update on anything written here. (The possible exception is me, in this post, pointing to the MHI case: MHI does unfortunately seem to have shut down in part due to an error in the initial research.)
So I think our quality of research is comparable to other orgs, but my evidence for this is weak and I have not done a thorough benchmarking. I would be interested in ways to test this. It could be a good idea for CE to run a change-our-mind contest like GiveWell's in order to test the robustness of our research. Something for me to consider. It could also be useful (although I doubt it would be worth the effort) to have some external research evaluator review our work and benchmark us against other organisations.
[EDIT: To be clear, I am talking here about quality in terms of the number of mistakes/errors. I agree our research is often shorter and as such is more willing to take shortcuts to reach conclusions.]
– –
That said, I do agree that we should make it very, very clear in all our reports who the report is written for, why it was written, and what the reader should take from it. We do this in the introduction section of all our reports, and I will review the introductions of future reports to make sure this is absolutely clear.
I think it is quite clear that a lot of your research isn't at the bar of those other organizations (though I think, for the reasons Joey mentioned, that definitely can be okay). For example, I think in this report, collapsing 30 million species with diverse life histories into a single "Wild bug", taking what appear to be completely uncalibrated guesses at their life conditions, and then using that to compare to other species is just well below the quality standards of other organizations in the space, even if it is a useful way to get a quick sense of things.
[previous comment is deleted, because I accidentally sent an unfinished one]
Thanks for the example! That makes sense and makes me wonder if part of the disagreement came from thinking about different reference classes. I agree that, in general, the research we did in our first year of operations, i.e. 2018/2019, is well below the quality standard we expect of ourselves now, or what we expected of ourselves even in 2020. I agree it is easy to find a lot of errors (that weren't decision-relevant) in our research from that year. That is part of the reason those reports are not on the website anymore.
That being said, I still broadly support our decision not to spend more time on research that year. That's because spending more time on it would have come with significant tradeoffs. At the time, there was no other organization whose research we could have relied on, and the alternative to the assessment you mention was either to not compare interventions across species (or to reduce the comparison to a simplistic metric like "the number of animals affected"), or to spend more time on research and run the Incubation Program a year later, in which case we would have lost a year of impact and might not have started the charities we did. That would have been a big loss because, for example, that year we incubated Suvita, whose impact and promise were recently recognized by GiveWell, which provided Suvita with $3.3M to scale up, and we incubated Fish Welfare Initiative (FWI) and Animal Advocacy Careers, decisions I still consider to be good ones (FWI is an ACE Recommended Charity, and even though I agree with its co-founders that their impact could be higher, I'm glad they exist). We also couldn't simply hire more staff and do things more in-depth because it was our first year of operation, and there was not enough funding and other resources available for what was, at the time, an unproven project.
I wouldn't want to spend more time on that, especially because one of the main principles of our research is "decision-relevance," and the "wild bug" one-pager you mention, and similar ones, were not decision-relevant. If they had been, we would not have settled for something of that quality, and we would have put more time into them.
For what it is worth, I think there are things we could have done better. Specifically, we could have put more effort into communicating how little weight others should put on some of that research. We did that by stating at the top (for example, as in the wild bug one-pager you link), "these reports were 1-5 hours time-limited, depending on the animal, and thus are not fully comprehensive," and at the time we thought that was sufficient. But we could have stressed the epistemic status even more strongly and in more places, so it would be clear to others that we put very little weight on it. For full transparency, we also made another mistake. We didn't recommend working on banning/reducing bait fish as an idea at the time because, from our shallow research, it looked less promising; later, upon researching it more in-depth, we decided to recommend it. It wouldn't have made a difference then, because there were not enough potential co-founders in year 1 to start more charities, but it was a mistake nevertheless.