I think we mostly have a semantic difference here. At present, I think the method of analysis is so different that it’s better not to speak of the units as being of the same type. That’s in part based on clarity concerns—speaking of GIF units and GiveWell units as the same risks people trying to compare them without applying an appropriate method for allocating impact in a comparative context. I think it’s possible to agree on a range, but I think that is going to require a lot of data from GIF that it probably isn’t in a position to disclose (and which may require several more years of operation to collect).
If I’m understanding Ken correctly, I do not think GIF’s current calculation method is sufficient to allow for comparisons between GIF and GiveWell:
GIF invests in early-stage innovations that arguably might fail or falter absent the funding round in which we participate. So we think the counterfactual of no impact is a defensible assumption. Regarding contribution, we allocate future impact in proportion to our participation in that funding round.
Let’s say GIF gave a $1MM grant to an organization in the first funding round, which is 50% of the total round. Another grantor gave a $2MM grant in the second round (50% of that round), and a third grantor gave $4MM as 50% of a final funding round. (I’m using grants to simplify the toy model.)
The organization produces 14 million raw impact units, as projected. If I’m reading the above statement correctly, GIF allocates all 14 million raw units to the first funding round and assigns itself half of them: 7 million raw units, or 7 raw impact units per dollar of its $1MM grant. For this to be comparable to a GiveWell unit (which represents unduplicated impact), you’d have to assign the other two funders zero impact, which isn’t plausible. Stated differently, you’d have to assume the other grantors’ counterfactual use of the money in GIF’s absence would have been to light it on fire.
A generous-to-GiveWell option would be to assume that the other two grantors would have counterfactually given their $6MM to GiveWell. Under this assumption, GIF’s impact is 7 million raw impact units for the $1MM minus however many raw impact units GiveWell would have generated with an additional $6MM. Under the assumption that GiveWell converts money into raw impact units 1/3 as efficiently as GIF, that would actually make GIF severely net negative in the toy example, because the lost $6MM in GiveWell funding was more valuable than 50% of the organization’s impact.
I’m definitely not suggesting that is the correct approach, and it is certainly GiveWell-friendly. However, I’m not currently convinced it’s more GiveWell-friendly than allocating all impact to the funding round in which GIF participated is GIF-friendly. If the defensible range of comparisons is anywhere near as broad as the toy example, then no meaningful comparisons can be made on currently available information.
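To make the two bookends concrete, here is a minimal sketch of the toy model above in Python. All figures are the illustrative numbers from the example, not real GIF or GiveWell data:

```python
# Toy model: GIF gives $1MM (50% of round 1); later grantors give $6MM total.
gif_grant = 1_000_000        # GIF's round-1 grant, in dollars
other_grants = 6_000_000     # $2MM + $4MM from the other two grantors
total_units = 14_000_000     # projected raw impact units from the org

# GIF-friendly bookend: all impact is allocated to round 1,
# and GIF takes its 50% share of that round.
gif_friendly_units = total_units * 0.50                   # 7,000,000 units
gif_friendly_per_dollar = gif_friendly_units / gif_grant  # 7.0 units per dollar

# Generous-to-GiveWell bookend: the other grantors would otherwise have
# given their $6MM to GiveWell, assumed 1/3 as efficient as GIF's headline rate.
lost_givewell_units = other_grants * gif_friendly_per_dollar / 3  # 14,000,000 units
generous_units = gif_friendly_units - lost_givewell_units         # -7,000,000 units
```

Under these assumptions the generous-to-GiveWell bookend comes out net negative, matching the observation that the forgone $6MM of GiveWell funding outweighs GIF’s 50% share of the org’s impact.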
Yeah, it seems we do have a semantic difference here. But how you’re using ‘raw impact units’ makes sense to me.
Nice, clear examples! I feel inspired by them to sketch out what I think the “correct” approach would look like, with plenty of room for anyone to choose their own parameters.
Let’s simplify things a bit. Say the first round is as described above and its purpose is to fund the organization to test its intervention. Then let’s lump all future rounds together and say they total $14m and fund the implementation of the intervention if the tests are successful. That is, $14m of funding in the second round, assuming the tests are a success, produces 14m units of impact.
The 14m is what I would call the org’s potential ‘Gross Impact’, with no adjustment for the counterfactual. We need to adjust for what would otherwise happen without the org to get its potential ‘Enterprise Impact’ (relative to the counterfactual).
For one, yes, the funders would have invested their money elsewhere. So, the org will only have a positive Enterprise Impact if it is more cost-effective than the funders’ alternatives. I think the ‘generous-to-GiveWell option’ is more extreme than it might appear at first glance. It’s not only assuming that the funders would otherwise donate in line with GiveWell (GW). It’s also assuming that they are somehow suckered into donating to this less effective org, despite being GW donors.
A more reasonable assumption, in my view, is that the org only gets funding if its cost-effectiveness is above its funders’ bar. It also seems likely to me that the org, if successful, will be able to attract funders that are not GW donors. There are plenty of funders with preferences that are not that aligned with cost-effectiveness, and as long as those preferences line up with this hypothetical org, it could get non-GW funding. Indeed, in GW’s model for Malaria Consortium it looks like they are assuming the Global Fund is 2-3x less effective than GW spending and that domestic governments are 6-10x less effective. Furthermore, if the org is able to adopt a for-profit operating model, it could get commercial funding with relatively little impact in the counterfactual.
As an example, let’s say GW top charities produce 1 unit of impact per dollar and the org’s second-round funders typically make grants that are 10x less effective than GW. The counterfactual impact of the funders’ alternative grants would then be $14m × 0.1 units per dollar = 1.4 million units of impact. So, based on this consideration, the potential Enterprise Impact = 14 million − 1.4 million = 12.6 million units of impact.
Another consideration is that if the org didn’t exist, or even if it does, another org might have developed to solve the same problem for the same beneficiaries. Let’s say the probability of an alternative org replicating the Enterprise Impact is 21% (just an illustrative number). Adjusting for this consideration makes the potential Enterprise Impact (1 − 21%) × 12.6 million ≈ 10 million units of impact.
Next, we need to go from potential Enterprise Impact to expected Enterprise Impact. That is, we need to account for the probability that the org is successful after the first-round tests. Let’s say 10%, a fairly standard early-stage success rate. That makes the expected Enterprise Impact equal to roughly 1 million units.
Now we can look at the impact of GIF’s funding. That is, how did their decision to fund the $1m in the first round change the expected Enterprise Impact?
This will depend on a combination of how much potential interest there was from other funders and how much the organization is able to scale with more funding (e.g., improving the statistical power of their intervention test, testing in more locations, ...). At one extreme, all other funders may be non-cost-effective donors who are only willing to participate if GIF leads the round, in which case I’d say GIF’s $1m enabled the entire $2m round. At the other extreme, it could be that the org only really needed $1m and there were plenty of other funders willing to step in, in which case GIF’s Investor Impact would be close to zero.
For this example, let’s say there was some potential of other funding but it was far from guaranteed, that the typical cost-effectiveness of the other funders would be low, and that there were some diminishing returns to scale in the round. Altogether, suppose this means GIF’s 50% of the round has a contribution versus the counterfactual equal to only 30% of the expected Enterprise Impact. That is, an Investor Impact of 300k units of impact.
To summarize, the whole stream of calculations is (14m − 1.4m) × (1 − 0.21) × 10% × 30% ≈ 300k. That is, 0.3 units of impact per dollar for GIF’s $1m, or 0.3x GW (according to my illustrative assumptions).
Based on GIF’s published methodology and Ken’s comments here, I believe GIF’s reported numbers for this example would be something like 14m × 10% × 50% = 700k. Or, 0.7x GW. Given they actually reported 3x GW, to calibrate this example to their reporting, I’d increase the value of the (14m × 10%) part by 4.3 ≈ 3/0.7. This can be interpreted as the actual scale or duration of impact of the orgs GIF funds being greater than 14m, or their average probability of success being higher than 10%.
With the calibration change, my illustrative estimate of GIF’s effectiveness would be 4.3 times my original value of 0.3x. That is, 1.3x GW.
The only “real” number here is the calibration to being 3x GW according to my version of GIF’s calculations. The point of the 1.3x result is just to illustrate how I would adjust for the relevant counterfactuals. Relative to my version of GIF’s calculation, my calculation includes a 71% ≈ (14m − 1.4m)/14m × (1 − 0.21) multiplier, which translates Gross Impact into Enterprise Impact, and a 60% = 30%/50% multiplier, which gets us to the final Investor Impact. With these values, which I consider highly uncertain but plausible, the effectiveness of GIF would be above GW top charities.
For emphasis, I’m not claiming GIF is or is not achieving this effectiveness. I’m just seeking to illustrate that it is plausible. And, if someone were to do an independent analysis, I’d expect the results to shape up along the lines of the approach I’ve outlined here.
For one, yes, the funders would have invested their money elsewhere. So, the org will only have a positive Enterprise Impact if it is more cost-effective than the funders’ alternatives. I think the ‘generous-to-GiveWell option’ is more extreme than it might appear at first glance. It’s not only assuming that the funders would otherwise donate in line with GiveWell (GW). It’s also assuming that they are somehow suckered into donating to this less effective org, despite being GW donors.
Yes, I think the “generous-to-GiveWell” model should be seen as the right bookend on defensible models on available data, just as I see GIF’s current model as the left bookend. I think it’s plausible that $1 to GIF has either higher or lower impact than $1 to GiveWell.
As for the counterfactual impact that other funders would have, I would expect funders savvy enough and impact-motivated enough to give to GIF-supported projects to be a cut above the norm in effectiveness (although full-GiveWell effectiveness is a stretch as you note). Also, the later-round funders could plausibly make decisions while disregarding prior funding as sunk costs, if they concluded that the relevant project was going to go under otherwise. This could be because they are thinking in a one-off fashion or because they don’t think their fund/no fund decision will affect the future decisions of early-stage funders like GIF.
Although I like your model at a quick glance, I think it’s going to be challenging to come up with input numbers we can have a lot of confidence in. If there’s relatively low overlap between the GiveWell-style donor base and the GIF-style donor base, it may not be worthwhile to invest heavily enough in that analysis to provide a confidence interval that doesn’t include equality.
Also, GiveWell’s diminishing returns curve is fairly smooth, fairly stable over time, and fairly easy to calculate—most of its portfolio is in a few interventions, and marginal funding mostly extends one of those interventions to a new region/country. GIF’s impact model seems much more hits-based, so I’d expect diminishing returns to kick in more forcefully. Indeed, my very-low-confidence guess is that GIF is more effective at lower funding levels, but that the advantage switches to GiveWell at some inflection point. All that is to say that we’d probably need to invest resources into continuously updating the relevant inputs for the counterfactual impact formula.