Attracting more experienced staff with higher salary and nicer office: more experienced staff are more productive which would increase the average cost-effectiveness above the current level, so the marginal must be greater than the current average.
Wait, what? The costs are also increasing; it’s definitely possible for marginal cost-effectiveness to be lower than the current average. In fact, I would strongly predict it’s lower: if there’s an opportunity to get better marginal cost-effectiveness than average cost-effectiveness, that raises the question of why you don’t just cut funding from some of your less effective activities and repurpose it for this opportunity.
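A toy numbers-only sketch of this point, with entirely invented figures (not 80k’s actual budget or plan-change counts): average cost-effectiveness can look good while the *marginal* dollar buys less, because new spending carries its own, often higher, cost per unit of output.

```python
# Toy illustration (all numbers invented): marginal cost-effectiveness
# can sit below the current average even while output is growing.

def cost_effectiveness(outputs, costs):
    """Outputs (e.g. plan changes) per dollar spent."""
    return outputs / costs

# Current program: 200 plan changes for $100k.
avg_ce = cost_effectiveness(200, 100_000)      # 0.002 changes per dollar

# Hypothetical expansion: pricier, more experienced staff produce
# 80 extra plan changes for an extra $50k.
marginal_ce = cost_effectiveness(80, 50_000)   # 0.0016 changes per dollar

# The marginal dollar is less effective than the historical average dollar.
assert marginal_ce < avg_ce
```

With these made-up numbers the expansion still raises total output, yet each additional dollar buys fewer plan changes than the average dollar did, which is exactly the distinction the comment is drawing.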
Given the importance of such considerations and the difficulty of modelling them quantitatively, to holistically evaluate an organization, especially a young one, there is an argument for using a qualitative approach and “cluster thinking”, in addition to a quantitative approach and “sequential thinking.”

Please do; I think an analysis of the potential for growth (qualitative or quantitative) would significantly improve this post, since that consideration could easily swamp all others.
Wait, what? The costs are also increasing; it’s definitely possible for marginal cost-effectiveness to be lower than the current average.

Yes, I agree with this. As I say in the long comment below, I think that giving money to us right now probably has diminishing returns, because we already made our funding targets for this year.
Robin, regarding what you quoted about increasing returns: I was thinking only of the case of labor. Overall you are right that, if the organization has been maximizing cost-effectiveness, it probably would have used the money it had before reaching its fundraising targets in a way that makes it more cost-effective than money coming in later (assuming it is more certain about the amount of money up to the fundraising target, and less certain about money coming in after that).
Thanks for sharing this analysis (and the broader project)!
Given the lengthy section on model limitations, I would have liked to see a discussion of sensitivity to assumptions. The one that stood out to me was the estimate for the value of a GWWC Pledge, which serves as the basis for all your calculations. While it certainly seems reasonable to use their estimate as a baseline, there’s inherently a lot of uncertainty in estimating a multi-decade donation stream and adjusting for counterfactuals, time discounting, and attrition.
FWIW, I’m pretty dubious about the treatment of plan changes scored 10. The model implies each of those plan changes is worth >$500k (again, adjusted for counterfactuals, time discounting, and attrition), which is an extremely high hurdle to meet. If a university student tells me they’re going to “become a major advocate of effective causes” (sufficient for a score of 10), I wouldn’t think that has the same expected value as a half million dollars given to AMF today.
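The sensitivity point above can be made concrete with a minimal one-way sweep. All figures here are invented placeholders (not GWWC’s actual pledge-value estimate or 80k’s real costs); the key structural fact is that a model of this shape is linear in the pledge value, so the bottom line moves one-for-one with that single assumption.

```python
# Hypothetical one-way sensitivity check (all figures invented):
# vary the assumed dollar value of one GWWC Pledge and watch the
# headline cost-effectiveness ratio move with it.

PLAN_CHANGES = 1000          # impact-adjusted plan changes in a year (made up)
ANNUAL_COST = 1_000_000      # annual budget in dollars (made up)

for pledge_value in [30_000, 73_000, 150_000]:   # low / baseline / high guesses
    total_value = PLAN_CHANGES * pledge_value    # model is linear in this input
    ratio = total_value / ANNUAL_COST            # value created per dollar spent
    print(f"pledge value ${pledge_value:,}: value per $1 spent = {ratio:.1f}")
```

Because the output scales proportionally with the pledge-value input, a 5x disagreement about that one parameter is a 5x disagreement about the conclusion, which is why a sensitivity discussion matters here.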
Hi Jon,
I would have liked to see a discussion of sensitivity to assumptions.

I agree—I think, however, you can justify the cost-effectiveness of 80k in multiple, semi-independent ways, which help to make the argument more robust:
https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/
FWIW, I’m pretty dubious about the treatment of plan changes scored 10. The model implies each of those plan changes is worth >$500k...I wouldn’t think that has the same expected value as a half million dollars given to AMF today.

Yes, we only weight them at 10, rather than 40. However, here are some reasons the $500k figure might not be out of the question.
First, we care about the mean value, not the median or threshold. Although some of the 10s will probably have less impact than 500k to AMF now, some of them could have far more. For instance, there’s reason to think GPP might have had impact equivalent to over $100m given to AMF. https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#global-priorities-project
You only need a small number of outliers to pull up the mean a great deal.
Less extremely, some of the 10s are likely to donate millions to charity within the next few years.
Second, most of the 10s are focused on x-risk and meta-charity. Personally, I think efforts in these causes are likely at least 5-fold more cost-effective than AMF, so they’d only need to donate $100k to have as much impact as $500k to AMF.
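The outlier point above is simple arithmetic, shown here with made-up figures (not 80k’s actual plan-change data): a few very large values dominate the mean even when the typical value is modest.

```python
# Illustration with invented numbers: a handful of outliers pulls the
# mean far above the typical (median) value.

values = [50_000] * 95 + [5_000_000] * 5   # 95 modest changes, 5 outliers

mean = sum(values) / len(values)
median = sorted(values)[len(values) // 2]

print(f"mean:   ${mean:,.0f}")    # $297,500 — driven by the 5 outliers
print(f"median: ${median:,.0f}")  # $50,000  — the typical plan change
```

Here 5% of the plan changes account for over 80% of the total value, so a mean near $300k coexists with a typical value of $50k; this is the sense in which a $500k mean doesn’t require the typical 10-rated change to be worth $500k.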
Fair point about outliers driving the mean. Does suggest that a cost-effectiveness estimate should just try to quantify those outliers directly instead of going through a translation.
E.g. if “some of the 10s are likely to donate millions to charity within the next few years”, just estimate the value of that rather than assuming that giving will on average equal 10x GWWC’s estimate for the value of a pledge.
Yes, that’s the main way I think about our impact. But I think you can also justify it on the basis of getting lots of people to make moderate changes, so I think it’s useful to consider both approaches.
Hi there,
Thanks for writing this. A couple of quick comments (these are not thoroughly checked—our annual reviews are the more reliable source of information):
How should we think about all these things they could do with additional funding now?

Given that we made the higher end of our funding targets, I’d guess that money given to us right now has diminishing returns compared to the money we received earlier in the year. However, the returns are not super diminishing. First, additional funds give us the option to grow faster. Second, if we don’t take that option, then the worst-case scenario is that we raise less money next funding round. This means you funge with our marginal donor in early 2018 (which might well be Open Phil), while also saving us time and giving us greater financial strength in the meantime, which helps to attract staff.
Will our returns diminish from 2016 to 2017? That’s less clear.
If you’re looking at the ratio of costs to plan changes each year, as you do in your model, then there’s a good chance the ratio goes down in 2017: past investments will pay off, we learn how to be more efficient, and we get economies of scale. More discussion here: https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#whats-the-marginal-cost-per-plan-change
On the other hand, if we invest a lot in long-term growth, then the short-term ratio will go up.
This shows some of the limitations of looking at the ratio of costs to plan changes each year, which we discuss more here: https://80000hours.org/2015/11/take-the-growth-approach-to-evaluating-startup-non-profits-not-the-marginal-approach/
If you’re reading this and trying to evaluate 80,000 Hours, then I’d encourage you to consider other questions that are glossed over in this analysis but are similarly or more important, such as:
1) Is the EA community more talent constrained than funding constrained?
2) Will 80k continue to grow rapidly?
3) How pressing are the problems of poor career choice and promoting EA?
4) How effective is AMF vs other EA causes? (80k isn’t especially focused on global poverty)
5) Is 80k a well-run organisation with a good team?
You can see more of our thoughts on how to analyse a charity here: https://80000hours.org/articles/best-charity/
I think you should add more uncertainty to your model around the value of an 80K career change (in both directions). While one impact-adjusted change is approximately equal in value to a GWWC pledge, that doesn’t mean the two are equal in both mean and standard deviation, as your model suggests, since the plan changes involve a wide variety of different possibilities.
It might be good to work with 80K to get some more detail about the kinds of career changes that are being made and try to model the types of career changes separately. Certainly, some people do take the GWWC pledge, and that is a change that is straightforwardly comparable with the value of the GWWC pledge (minus concerns about the counterfactual share of 80K), but other people make much higher-risk higher-reward career changes, especially in the 10x category.
Speaking just for myself, having looked at a few examples of the 80K 10x category, I’ve found them to be highly variable (including some changes that I’d personally judge as less valuable than the GWWC pledge). While this is certainly not a systematic analysis on my part, it suggests your model should include more uncertainty than it currently does.
Lastly, I think your model right now assumes 80K has 100% responsibility for all their career changes. Maybe this is completely fine because 80K already weights their reported career change numbers for counterfactuality? Or maybe there’s some other good reason to not take this into account? I admit there’s a good chance I’m missing something here, but it would be nice to see it addressed more specifically.
I don’t think that’s true, because the GWWC pledge value figures have been counterfactually adjusted, and because we don’t count all of the people we’ve influenced to take the GWWC pledge.
More discussion here: https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#giving-what-we-can-pledges
While 1 impact-adjusted change is approximately the value of a GWWC pledge, that doesn’t mean it is equal in both mean and standard deviation as your model suggests, since the plan changes involve a wide variety of different possibilities.

Agree with that—the standard deviation should be larger.
Peter, your point #2 about uncertainty is indeed what I discuss in the last point of “2) Outcome measures”, under “Model limitations”. I argued in a hand-waving way that, because 80K still causes some lower-risk, lower-return global-health-type interventions (which our aggregation model seems to favor, probably due to the Bayesian prior), it will probably still beat MIRI, which focuses exclusively on the high-risk, high-return work that the model seems to penalize. But yes, we should have modeled it this way.
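The “Bayesian prior penalizes high-variance estimates” effect can be sketched with a standard normal-normal shrinkage calculation. This is a generic illustration with invented numbers, not the aggregation model the analysis actually uses: the posterior mean weights the prior and the estimate by their precisions, so a noisier estimate gets pulled harder toward a skeptical prior.

```python
# Sketch of Bayesian shrinkage (invented numbers): with a normal prior
# and a normal estimate, the posterior mean is a precision-weighted
# average, so high-variance estimates are shrunk more toward the prior.

def posterior_mean(prior_mu, prior_sd, est_mu, est_sd):
    w_prior = 1 / prior_sd**2   # precision of the skeptical prior
    w_est = 1 / est_sd**2       # precision of the stated estimate
    return (w_prior * prior_mu + w_est * est_mu) / (w_prior + w_est)

PRIOR_MU, PRIOR_SD = 1.0, 1.0   # prior: roughly 1x a baseline charity

# Low-risk intervention: modest claimed edge, tight uncertainty.
low_risk = posterior_mean(PRIOR_MU, PRIOR_SD, est_mu=5.0, est_sd=1.0)

# High-risk intervention: huge claimed edge, very wide uncertainty.
high_risk = posterior_mean(PRIOR_MU, PRIOR_SD, est_mu=50.0, est_sd=20.0)

print(low_risk)   # 3.0  — keeps much of its claimed edge
print(high_risk)  # ~1.12 — almost fully shrunk back to the prior
```

Under these made-up inputs the intervention claiming 50x ends up rated *below* the one claiming 5x, which is the mechanism by which such a model favors lower-risk, lower-return work.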
My hunch is also that 80,000 Hours and most organisations have diminishing marginal cost-effectiveness. As far as I know from our conversations, on balance this is Sindy’s view too.

You need to be very careful about which margin and which output you’re talking about.
As I discuss in my long comment above, I think it’s unclear whether our annual cost per plan change will go up or down, and I think there’s a good chance it continues to drop, as it has for the last 4 years.

On the other hand, if you’re talking about total value created per dollar (including all forms of value), then that seems more likely to be going down. It seems intuitive that our earliest supporters, who made 80k possible, had more impact than supporters today.
Though even that’s not clear. You could get increasing returns due to economies of scale or tipping point effects and so on.
Actually, I was suggesting you use a qualitative approach (which is what the quoted section says). I don’t think I could come up with a quantitative model that I would believe over my intuition, because, as you said, the counterfactuals are hard. But just because you can’t easily quantify an argument doesn’t mean you should discard it altogether; in this particular case it’s one of the most important arguments, and could be the only one that matters, so you really shouldn’t ignore it even if it can’t be quantified.