Thank you for writing this and for all the work you (and others) have put in over the years.
My question is to what extent you think CE’s impact measurement is tautological. If you determine something to be a high-impact opportunity and then go and do it, aren’t you by definition doing things you estimate to be high impact (as long as you don’t screw up the execution or realise you made an error)? To fully adjust for selection effects, you would have to ignore all research conducted before the decision was made and rely solely on new data, which is probably quite hard to come by.
The 40% seems very high. For-profit start-ups have a much higher failure rate. If that’s true, that’s incredible, but I’d expect to see more like 5% of charities and 50% of funds.
I think tautological measurement is a real concern for basically every meta charity, although I’m not sure I agree with your solution. I think the better solution is external evaluation by someone like GiveWell or Founders Pledge, who has no particular reason to rate CE charities highly. Typically, these organizations do their own independent research and compare it across their current portfolio of projects. If CE can, for example, fairly consistently incubate charities that GW/FP/etc. rank as best in the world, that is at least not organizationally tautological (though it does assume that these charity evaluators are in fact identifying the best areas/charities, and it replicates any flaws they have).
In terms of success rate, I agree 40% is high, but I would expect success rates at many NGO incubators to be considerably higher than in the for-profit space, for a few reasons (a couple listed below):
General competition: There are just not that many charities aiming for pure impact (in an EA way), unlike the for-profit market. The general efficiency of the charity market is pretty low, and thus there are lots of fairly easy wins.
Scale sensitivity: Generally, successful for-profits are seen as really large-scale ventures (e.g., unicorns), and the market is consistently hunting for those. Debatably, the only charity currently seen as highly impactful that can get to that sort of scale is GiveDirectly. The bar for success in the charity sector is thus significantly lower in terms of money spent. For example, if we founded a charity running on a $1m-a-year budget that was 2x as effective as top GW ones, we could count that as a success, but an organization of the same size would be considered a rounding error by YC. If we take size expectations into account, it might be more like 1 in 25 charities we incubate that have any significant chance of getting to unicorn-level size.
Thanks, Joey. Really appreciate you taking the time to engage on these questions.
To be clear, I’m not seriously suggesting ignoring all research from before the decision. I’m just saying that, mathematically, an independent test needs its backtest data to exclude all calibration data.
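To make the calibration-vs-backtest point concrete, here is a toy simulation (all numbers are made up for illustration, not CE’s actual figures): if you score opportunities on the same noisy estimates you used to select them, the selected set looks systematically better than a fresh, independent measurement shows it to be.

```python
import random

random.seed(0)

N = 1000      # candidate opportunities
NOISE = 1.0   # std dev of estimation error
TOP_K = 50    # how many we select and pursue

# True impact of each opportunity (unknown to the evaluator).
true_impact = [random.gauss(0.0, 1.0) for _ in range(N)]

# Calibration-stage estimate: true impact plus measurement noise.
estimate = [t + random.gauss(0.0, NOISE) for t in true_impact]

# Select the top K by estimated impact.
selected = sorted(range(N), key=lambda i: estimate[i], reverse=True)[:TOP_K]

# "Tautological" evaluation: reuse the selection-stage estimates.
in_sample = sum(estimate[i] for i in selected) / TOP_K

# Independent evaluation: a fresh, equally noisy measurement of the
# same selected projects.
out_of_sample = sum(
    true_impact[i] + random.gauss(0.0, NOISE) for i in selected
) / TOP_K

print(f"in-sample mean estimate:  {in_sample:.2f}")
print(f"fresh-data mean estimate: {out_of_sample:.2f}")
# The fresh-data mean comes out systematically lower: the
# selection-stage estimates of the winners are inflated by the very
# noise that got them selected (the winner's curse).
```

This is why excluding calibration data matters: the in-sample figure overstates impact even when every individual estimate is unbiased.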
It strikes me that there are broadly 3 buckets of risk / potential failure:
Execution risk—this is significant and you can only find out by trying, but even then you only really learn whether you’re succeeding on the left-hand side of the theory of change
Logic risk—having an external organisation take a completely fresh view should solve most of this
Evidence risk—even with an external organisation marking your homework, they are still probably drawing on the same pool of research and that might suffer from survivorship bias
The high success rate almost makes me think CE should be incubating even more ambitious, riskier projects, with the expectation of a lower success rate but higher overall EV. Very uncertain about this intuition though, would be interested to hear what CE thinks.
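The intuition can be put in expected-value terms with hypothetical numbers (these are illustrative only, not CE’s actual success rates or impact figures): a portfolio with a much lower hit rate can still have higher overall EV if its successes are large enough.

```python
# Illustrative portfolio comparison; all figures are hypothetical.

def portfolio_ev(success_rate, impact_if_success, n_projects=10):
    """Expected total impact of a portfolio of identical projects."""
    return n_projects * success_rate * impact_if_success

# "Safe" projects: 40% succeed, each success worth 10 impact units.
safe = portfolio_ev(success_rate=0.40, impact_if_success=10)

# "Ambitious" projects: only 10% succeed, but a success is worth 100.
ambitious = portfolio_ev(success_rate=0.10, impact_if_success=100)

print(f"safe portfolio EV:      {safe:.0f}")       # 40
print(f"ambitious portfolio EV: {ambitious:.0f}")  # 100
```

Whether the real trade-off looks like this depends on how impact actually scales with ambition, which is exactly the empirical question for CE.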
We have thought about this, but we are not confident that weaker charities would not crowd out stronger ones with funders, which would lead to less overall impact.