Issues with how to assess impact, metrics etc. are discussed in-depth in the organisation’s impact evaluations.
What I’m keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.
Just to clarify, you’d like to see funding to meta-charities increase, so don’t think these worries are actually sufficient to warrant a move back to first order charities?
Cheers,
Ben
PS. One other small thing – it’s odd to class GiveWell as not meta, but 80k as meta. I often think of 80k as the GiveWell of career choice. Just as GiveWell does research into which charities are most effective and publicises it, we do research into which career strategies are most effective and publicise it.
Note that it is possible for the credit to sum to more than 100%.
Yes, I agree that this is possible (this is why I said it could be “a reasonable conclusion by each organization”). My point is that because of this phenomenon, you can have the pathological case where from a global perspective, the impact does not justify the costs, even though the impact does justify the costs from the perspective of every organization.
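A toy numerical version of this pathology (all numbers invented for illustration): two orgs are each counterfactually necessary for the same donation, so each org's own counterfactual analysis credits it with the full amount, yet the combined spending exceeds the impact.

```python
# Hypothetical sketch -- every number here is made up for illustration.
impact = 80_000                              # donation the two orgs jointly caused
costs = {"org_a": 50_000, "org_b": 50_000}   # what each org spent

# Single-org view: remove either org and the donation disappears, so each
# org's counterfactual impact is the full $80k, which exceeds its own cost.
for org, cost in costs.items():
    counterfactual_impact = impact
    assert counterfactual_impact > cost      # each org looks cost-effective

# Global view: total spending exceeds the impact produced.
total_cost = sum(costs.values())
assert impact < total_cost                   # jointly, the costs weren't justified
```

This is the sense in which credit can sum to more than 100% while still being a reasonable conclusion by each organization.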
I discuss point 6 here
Yeah, I agree that potential economies of scale are much greater than diminishing marginal returns, and I should have mentioned that. Mea culpa.
Issues with how to assess impact, metrics etc. are discussed in-depth in the organisation’s impact evaluations.
My impression is that organizations acknowledge that there are issues, but the issues remain. I’ll write up an example with GWWC soon.
Just to clarify, you’d like to see funding to meta-charities increase, so don’t think these worries are actually sufficient to warrant a move back to first order charities?
That’s correct.
PS. One other small thing – it’s odd to class GiveWell as not meta, but 80k as meta. I often think of 80k as the GiveWell of career choice. Just as GiveWell does research into which charities are most effective and publicises it, we do research into which career strategies are most effective and publicise it.
I agree that 80k’s research product is not meta the way I’ve defined it. However, 80k does a lot of publicity and outreach that GiveWell for the most part does not do. For example: the career workshops, the 80K newsletter, the recent 80K book, the TEDx talks, the online ads, the flashy website that has popups for the mailing list. To my knowledge, of that list GiveWell only has online ads.
What I’m keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.
I’ve got a speculative one for GWWC, and a more concrete one for chapter seeding.
GWWC pledges: I’ve mentioned that I don’t worry about traps #2 and #4, and traps #3 and #5 don’t apply to a specific organization, so I’ll skip those.
Meta Trap #1a. Probability of not having an impact
I don’t think this is a problem for GWWC.
Meta Trap #1b. Overestimating impact
Here are some potential ways that GWWC could be overestimating impact:
They assume that the rate at which people drop out of the pledge is constant, but I would guess that the rate increases for the first ~10 years and then drops. In addition, I would guess that the average 2016 pledge taker is less committed than the average 2015 pledge taker (larger numbers suggest lower quality), so that would also increase the rate.
In one section, instead of computing a weighted average, they compute the average weight and then multiply it by the total donations, for reasons unclear to me. Fixing this could cause the impact to go up or down. I mentioned this to GWWC and so far haven’t gotten a definitive answer.
Counterfactuals are self-reported and so could be systematically wrong. GWWC’s response to this ameliorated my concerns but I would still guess that they are biased in GWWC’s favor. For example, all of the pledges from EA Berkeley members would have happened as long as some sort of pledge existed. You may credit GWWC with some impact since we needed it to exist, but at the very least it wouldn’t be marginal impact. However, I think (I’m guessing) that some of our members assigned counterfactual impact to GWWC.
Their time discounting could be too small (since meta trap #5 suggests a larger time discount).
Now since GWWC talks about their impact in terms of their impact on the organizations directly beneath them on the chain, you don’t see any amplification of overestimates. However, consider the case of local EA groups. They could be overestimating their impact too:
They assume that a GWWC pledge is worth $73,000, which is what GWWC says, but the average local group pledge may be worse than the average GWWC pledge, because the members are not as committed to effectiveness, or they are more likely to drop out due to lack of community. (I say “may be worse” because I don’t know what the average GWWC pledge looks like; it may turn out there are similar problems there.)
They may simply overcount the number of members who have taken the pledge. (I have had at least one student say that they took the pledge, but their name wasn’t on the website, and many students say they will take the pledge but then don’t.)
Meta Trap #1c. Issues with metrics
I do think that local groups sometimes sacrifice quality of pledges for number of pledges when they shouldn’t.
It actually is the case that student pledge takers from EA Berkeley have basically no interaction with the GWWC community. I don’t know why this is or if it’s normal for pledge takers in general. Sometimes I worry that GWWC spends too much time promoting their top charities since that would improve their metrics, but I have no real reason for thinking that this is actually happening.
Meta Trap #6. Marginal impact may be much lower than average impact.
My original post explained how this would be the case for GWWC. I agree though that economies of scale will probably dominate for some time.
Meta Trap #7. Meta suffers more from coordination problems.
I think local EA groups and GWWC both take credit for pledges originating from local groups. (It depends on what those pledge takers self-reported as the counterfactual.) If they came from an 80,000 Hours career workshop, then we now have three organizations claiming the impact.
There’s also a good Facebook thread about this. I forgot about it when writing the post.
Meta Trap #8. Counterfactuals are harder to assess.
I’ve mentioned this above—GWWC uses self-reported counterfactuals. If you agree that you should penalize expected value estimates if the evidence is not robust, then I think you should do the same here.
Here’s the second example:
There was an effort to “seed” local EA groups, and in the impact evaluation we see “each hour of staff time generated between $187 and $1270 USD per hour.”
First problem: The entire point of this activity was to get other people to start a local EA group, but the time spent by these other people isn’t included as a cost (meta trap #7, kind of). Those other people would probably have to put in ~20 person hours per group to get this impact. If you include these hours as costs, then the estimated cost-effectiveness becomes something more like $167-890.
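To make the arithmetic of that adjustment explicit, here is a sketch with made-up numbers (the actual staff hours, group counts, and dollar figures from the evaluation aren't reproduced in this thread, so these are purely illustrative):

```python
# Illustrative only: value, staff_hours, and groups_seeded are invented.
value = 100_000                  # counterfactual dollars moved by seeded groups
staff_hours = 100                # staff time spent on seeding
groups_seeded = 5
volunteer_hours_per_group = 20   # the ~20 person hours per group estimated above

staff_only = value / staff_hours                                   # 1000.0 $/hr
all_hours = staff_hours + groups_seeded * volunteer_hours_per_group
including_volunteers = value / all_hours                           # 500.0 $/hr

# Counting everyone's time can substantially shrink the headline figure;
# the size of the drop depends on the ratio of volunteer to staff hours.
assert including_volunteers < staff_only
```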
Second problem: I would bet money at even odds that the pessimistic estimate for cold-email seeding was too optimistic (meta trap #1b). (I’m not questioning the counterfactual rate, I’m questioning the number of chapters that “went silent”.)
Taking these two into account, I think that the chapter seeding was probably worth ~$200 per hour. Now if GWWC itself is too optimistic in its impact calculation (as I think it is), this falls even further (meta trap #1b), and this seems just barely worthwhile.
That said, there are other benefits that aren’t incorporated into the calculation (both for GWWC pledges in general and chapter seeding). So overall it still seems like it was worthwhile, but it’s not nearly as exciting as it initially seemed.
These are all reasonable concerns. I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates. Though you could easily differ from them in your judgement calls.
With LEAN not including the costs of the chapter heads, they might have just decided that the costs of this time are low. Typically, in these estimates, people are trying to work out something like GiveWell dollars in vs. GiveWell dollars out. If a chapter head wouldn’t have worked on an EA project or earned to give to GiveWell charities otherwise, then the opportunity cost of their time could be small when measured in GiveWell dollars. In practice, it seems like much chapter time comes out of other leisure activities.
With 80k, we ask people taking the pledge whether they would have taken it if 80k never existed, and only count people who say “probably not”. These people might still be biased in our favor, but on the other hand, there are people we’ve influenced who were pushed over the edge by another org. We don’t count these people towards our impact, even though we made it easier for the other org.
(We also don’t count people who were influenced by us indirectly, and so don’t know they were influenced.)
Zooming out a bit, ultimately what we do is make people more likely to pledge.
Here’s a toy model.
At time 0, you have 3 people.
Amy has a 10% chance of taking the pledge
Bob has a 80% chance
Carla has a 90% chance
80k shows them a workshop, which makes each of them 10 percentage points more likely to take it, so at time 1, the probabilities are:
Amy: 20%
Bob: 90%
Carla: 100% → she actually takes it
Then GWWC shows them a talk, which has the same effect. So at time 2:
Amy: 30%
Bob: 100% → actually takes it
Carla: 100% (overdetermined)
Given current methods, 80k gets zero impact. Although they got Carla to pledge, Carla tells them she would have taken it otherwise due to GWWC, which is true.
GWWC counts both Carla and Bob as new pledgers in their total, but when they ask them how much they would have donated otherwise, Carla says zero (80k had already persuaded her) and Bob probably gives a high number too (~90%), because he was already close to doing it. So this reduces GWWC’s estimate of the counterfactual value per pledge. In total, GWWC adds 10% of the value of Bob’s donations to their estimates of counterfactual money moved.
This is pessimistic for 80k, because without 80k, GWWC wouldn’t have persuaded Bob, but this isn’t added to our impact.
It’s also a bit pessimistic for GWWC, because none of their effect on Amy is measured, even though they’ve made it easier for other organisations to persuade her.
In either case, what’s actually happening is that 80k is adding 30 probability points and GWWC 20 probability points. The current method of asking people what they would have done otherwise is a rough approximation for this, but it can both overcount and undercount what’s really going on.
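The toy model's accounting can be written out directly, computing probability points added rather than relying on self-reports (the +10-point bump is the effect size from the example above):

```python
# Probabilities of pledging before any intervention (from the toy model).
start = {"Amy": 0.10, "Bob": 0.80, "Carla": 0.90}

def bump(p, points=0.10):
    """Each intervention adds 10 percentage points, capped at certainty."""
    return min(p + points, 1.0)

after_80k = {name: bump(p) for name, p in start.items()}       # 80k workshop
after_gwwc = {name: bump(p) for name, p in after_80k.items()}  # GWWC talk

# Probability points each org actually added, summed over the three people.
added_80k = sum(after_80k[n] - start[n] for n in start)        # ~0.30
added_gwwc = sum(after_gwwc[n] - after_80k[n] for n in start)  # ~0.20

# Self-reported counterfactuals, by contrast, credit 80k with nothing
# (Carla says she would have pledged anyway) and GWWC with only ~10% of Bob.
```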
Re: Leisure time. I think I would have probably either taken another class, gotten a part-time paying job as a TA, or done technical research with a professor if I weren’t leading EAB (which took ~10 hours of my time each week). I’m not positive how representative this is across the board, but I think this is likely true of at least some other chapter leaders, and more likely to be true of the most dedicated (who probably produce a disproportionate amount of the value of student groups).
On second thoughts, “leisure time” isn’t quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).
Anyway, I think there are three main types of cost:
Immediate impact you could have had doing something else, e.g. a part-time job and donating the proceeds.
Better career capital you could have gained otherwise. I think this is probably the bigger issue. However, I also think running a local group is among the best options for career capital while a student, especially if you’re into EA. So it’s plausible the op cost is near zero. If you want to do research and give up doing a research project though, it could be pretty significant.
More fun you could have had elsewhere. This could be significant on a personal level, but it wouldn’t be a big factor in a calculation measured in terms of GiveWell dollars.
Okay, this makes more sense. I was mainly thinking of the second point—I agree that the first and third points don’t make too much of a difference. (However, some students can take on important jobs, e.g. Oliver Habryka working at CEA while being a student.)
Another possibility is that you graduate faster. Instead of running a local group, you could take one extra course each semester. Aggregating this, for every two years of not running a local group, you could graduate a semester earlier.
(This would be for UC Berkeley, I think it should generalize about the same to other universities as well.)
In practice, it seems like much chapter time comes out of other leisure activities.
I strongly disagree.
I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates.
This is exactly why I focused on general high-level meta traps. I can give several plausible ways in which the meta traps may be happening, but it’s very hard to actually prove that it is indeed happening without being on the inside. If GWWC has an issue where it is optimizing metrics instead of good done, there is no way for me to tell since all I can see are its metrics. If GWWC has an issue with overestimating their impact, I could suggest plausible ways that this happens, but they are obviously in a better position to estimate their impact and so the obvious response is “they’ve probably thought of that”.

To have some hard evidence, I would need to talk to lots of individual pledge takers, or at least see the data that GWWC has about them. I don’t expect to be better than GWWC at estimating counterfactuals (and I don’t have the data to do so), so I can’t show that there’s a better way to assess counterfactuals. To show that coordination problems actually lead to double-counting impact, I would need to do a comparative analysis of data from local groups, GWWC and 80k that I do not have.
There is one point that I can justify further. It’s my impression that meta orgs consistently don’t take into account the time spent by other people/groups, so I wouldn’t call that one a judgment call. Some more examples:
CEA lists “Hosted eight EAGx conferences” as one of their key accomplishments, but as far as I can tell don’t consider the costs to the people who ran the conferences, which can be huge. And there’s no way that you could expect this to come out of leisure time.
I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
(We also don’t count people who were influenced by us indirectly, and so don’t know they were influenced.)
Yes, I agree that there is impact that isn’t counted by these calculations, but I expect this is the case with most activities (with perhaps the exception of global poverty, where most of the impacts have been studied and so the “uncounted” impact is probably low).
Here’s a toy model.
The main issue is that I don’t expect that people are performing these sorts of counterfactual analyses when reporting outcomes. It’s a little hard for me to imagine what “90% chance” means, so it’s hard for me to predict what would happen in this scenario, but your analysis seems reasonable. (I still worry that Bob would attribute most or all of the impact to GWWC rather than just 10%.)
However, I think this is mostly because you’ve chosen a very small effect size. Under this model, it’s impossible for 80k to ever have impact—people will only say they “probably wouldn’t” have taken the GWWC pledge if they started under 50%, but if they started under 50%, 80k could never get them to 100%. Of course this model will undercount impact.
Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
More generally, my model is that there are many sources that lead to someone taking the GWWC pledge (80k, the local group, online materials from various orgs), and a simple counterfactual analysis would lead to every such source getting nearly 100% of the credit, and based on how questions are phrased I think it is likely that people are actually attributing impact this way. Again, I can’t tell without looking at data. (One example would be to look at what impact EA Berkeley members attribute to GWWC.)
Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
I can’t speak for the other orgs, but 80k probably wouldn’t count this as “full impact”.
First, the person would have to say they made the pledge “due to 80k”. Whereas if they were heavily influenced by the local group, they might say they would have taken it otherwise.
Second, as a first approximation, we use the same figure GWWC does for a value of a pledge in terms of donations. IIRC this already assumes only 30% is additional, once counterfactually adjusted. This % is based on their surveys of the pledgers. (Moreover, for the largest donors, who determine 75% of the donations, we ask them to make individual estimates too).
Taken together, 80k would attribute at most 30% of the value.
Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
What would you estimate is the opportunity cost of student group organiser time per hour?
First, the person would have to say they made the pledge “due to 80k”.
Yes, I’m predicting that they would say that almost always (over 90% of the time).
this already assumes only 30% is additional, once counterfactually adjusted.
That does make quite a difference. It seems plausible then that impact is mostly undercounted rather than overcounted. This seems more like an artifact of a weird calculation (why use GWWC’s counterfactual instead of having a separate one?). And you still have the issue that impact may be double counted; it’s just that since you tend to undercount impact in the first place, the effects seem to cancel out.
That’s a little uncharitable of me, but the point I’m trying to make is that there is no correction for double-counting impact—most of your counterarguments seem to be saying “we typically underestimate our impact so this doesn’t end up being a problem”. You aren’t using the 30% counterfactual rate because you’re worried about double counting impact with GWWC. (I’m correct about that, right? It would be a really strange way to handle double counting of impact.)
Nitpick: This spreadsheet suggests 53%, and then adds some more impact based on changing where people donate (which could double count with GiveWell).
Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
I accept that impact is often undercounted, to such a degree that double counting would not get you over 100%. I still worry that people think “Their impact numbers are great and probably significant underestimates” without thinking about the issue of double counting, especially since most orgs make sure to mention how their impact estimates are likely underestimates.
Even if people just donated on the basis of “their impact numbers are great” without thinking about both undercounting and overcounting, I would worry that they are making the right decision for the wrong reasons. We should promote more rigorous thinking.
My perspective is something like “donors should know about these considerations”, whereas you may be interpreting it as “people who work in meta don’t know/care about these considerations”. I would only endorse the latter in the one specific case of not valuing the time of other groups/people.
What would you estimate is the opportunity cost of student group organiser time per hour?
The number I use for myself is $20, mostly just made up so that I can use it in Fermi estimates.
How would it compare to time spent by 80k staff?
Unsure. Probably a little bit higher, but not much. Say $40?
(I have not thought much about the actual numbers. I do think that the ratio between the two should be relatively small.)
I also don’t care too much that 80k doesn’t include costs to student groups because those costs are relatively small compared to the costs to 80k (probably). This is why I haven’t really looked into it. This is not the case with GWWC pledges or chapter seeding.
Hey Rohin, without getting into the details, I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse, so I’m not sure how we should act. We win out in some cases (we get bragging rights from someone who found out about EA from another source and then changes their career) and lose in others (someone who finds out about GiveWell through 80k but doesn’t then attribute their donations to us).
There’s double counting, yes, but the orgs are also legitimately complementary to one another—not sure if the double counting exceeds the real complementarity.
We could try to measure the benefit/cost of the movement as a whole—this gets rid of the attribution and complementarity problem, though loses the ability to tell what is best within the movement.
I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse
I’m a little unclear on what you mean here. I see three different factors:
Various orgs are undercounting their impact because they don’t count small changes that are part of a larger effort, even though, in theory, from a single-player perspective they should count the impact.
In some cases, two (or more) organizations both reach out to an individual, but either one of the organizations would have been sufficient, so neither of them gets any counterfactual impact (more generally, the sum of the individually recorded impacts is less than the impact of the system as a whole).
Multiple orgs have claimed the same object-level impact (e.g. an additional $100,000 to AMF from a GWWC pledge) because they were all counterfactually responsible for it (more generally, the sum of the individually recorded impacts is more than the impact of the system as a whole).
Let’s suppose:
X is the impact of an org from a single-player perspective
Y is the impact of an org taking a system-level view (so that the sum of Y values for all orgs is equal to the impact of the system as a whole)
Point 1 doesn’t change X or Y, but it does change the estimate we make of X and Y, and tends to increase it.
Point 2 can only tend to make Y > X.
Point 3 can only tend to make Y < X.
Is your claim that the combination of points 1 and 2 may outweigh point 3, or just that point 2 may outweigh point 3? I can believe the former, but the latter seems unlikely—it doesn’t seem very common for many separate orgs to all be capable of making the same change; it seems more likely to me that in such cases all of the orgs are necessary, which would be an instance of point 3.
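To make points 2 and 3 concrete, here is a toy counterfactual calculation with two hypothetical orgs and a single $100 outcome (all invented for illustration):

```python
IMPACT = 100.0  # value of the outcome when it happens

def counterfactual_credits(outcome):
    """Single-player credit for each org: impact with everyone acting,
    minus impact if only that org drops out."""
    orgs = {"org_a", "org_b"}
    full = outcome(orgs)
    return {o: full - outcome(orgs - {o}) for o in orgs}

def either_sufficient(acting):
    # Point 2: either org alone would have sufficed (overdetermination).
    return IMPACT if acting else 0.0

def both_needed(acting):
    # Point 3: both orgs were necessary.
    return IMPACT if len(acting) == 2 else 0.0

assert sum(counterfactual_credits(either_sufficient).values()) == 0.0   # < 100
assert sum(counterfactual_credits(both_needed).values()) == 200.0       # > 100
```

In the overdetermined case the single-player credits sum to 0, below the system-level impact of 100 (Y > X in aggregate); when both orgs are necessary they sum to 200, double the system-level impact (Y < X).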
We could try to measure the benefit/cost of the movement as a whole
Yeah, this is the best idea I’ve come up with so far, but I don’t really like it much. (Do you include local groups? Do you include the time that EAs spend talking to their friends? If not, how do you determine how much of the impact to attribute to meta orgs vs. normal network effects?) It would be a good start though.
Another possibility is to cross-reference data between all meta orgs, and try to figure out whether for each person, the sum of the impacts recorded by all meta orgs is a reasonable number. Not sure how feasible this actually is (in particular, it’s hard to know what a “reasonable number” would be, and coordinating among so many organizations seems quite hard).
I agree the double-counting issue is pretty complex. (I think maybe the “fraction of value added” approach I mention in the value of coordination post is along the right lines)
I think the key point is that it seems unlikely that (given how orgs currently measure impact) they’re claiming significantly more than 100% in aggregate. This is partly because there’s already lots of adjustments that pick up some of this (e.g. asking people if they would have done X due to another org) and because there are various types of undercounting.
Given this, adding a further correction for double counting doesn’t seem like a particularly big consideration—there are more pressing sources of uncertainty.
I agree that 80k’s research product is not meta the way I’ve defined it. However, 80k does a lot of publicity and outreach that GiveWell for the most part does not do. For example: the career workshops, the 80K newsletter, the recent 80K book, the TEDx talks, the online ads, the flashy website that has popups for the mailing list. To my knowledge, of that list GiveWell only has online ads.
Maybe instead of talking about “meta traps” we should talk about “promotion traps” or something?
I don’t think you’ll want to equate being a meta org with what proportion of your time you spend on outreach. Some object level charities do a lot of outreach too. If AMF started spending 25% of its budget on marketing, would it become a meta-charity?
Sure 80k puts more effort into outreach than GiveWell, but the core model is very similar.
Fair enough, I don’t particularly care about which organization is a “meta org” and which one is not, I mostly care about where these meta traps apply and where they don’t. Probably should have talked about “meta work” instead of “meta org”. Anyway, it does seem like the traps apply to the outreach portion of 80k and not to GiveWell (since they barely have any outreach).
If AMF started spending a lot on marketing, I would count that as “meta work”, though I think a lot of these traps would not apply to that specific scenario.
At a glance, it seems like most of the meta-traps don’t apply to stuff like promotion of object-level causes.
That’s why Peter Hurford distinguished between second-level and first-level meta, and focused his criticism on the second-level.
80,000 Hours and GiveWell are both mainly doing first-level meta (i.e. we promote specific first order opportunities for impact); though we also do some second-level meta (promoting EA as an idea). 80k does more second-level meta day-to-day than GiveWell, though GiveWell explains their ultimate mission in second-level meta terms:
We aim to direct as much funding as possible of this large pool to the best giving opportunities we can find, and create a global, public, open conversation about how best to help people. We picture a world in which donors reward charities for effectiveness in improving lives.
One other quick point is that I don’t think coordination problems arise especially from meta-work. Rather, coordination problems can arise anywhere in which the best action for you depends on what someone else is going to do. E.g. you can get coordination problems among global health donors (GiveWell has written a lot about this). The points you list under “coordination problems” seem more like examples of why the counterfactuals are hard to assess, which is already under trap 8.
At a glance, it seems like most of the meta-traps don’t apply to stuff like promotion of object-level causes. That’s why Peter Hurford distinguished between second-level and first-level meta, and focused his criticism on the second-level.
I mostly agree, but I think a lot of them do apply to first-level meta in many cases. For example I talked about how they apply to GWWC, which is first-level meta (I think).
80,000 Hours and GiveWell are both mainly doing first-level meta (i.e. we promote specific first order opportunities for impact)
Yes, and I specifically didn’t include that kind of first-level meta work. I think the parts of first-level meta that are affected by these traps are efforts to fundraise for effective organizations, mainly ones that target EAs specifically. Even for general fundraising though, I think several traps still do apply, such as trap #1, #6 and #8.
One other quick point is that I don’t think coordination problems arise especially from meta-work.
I agree, I think it’s just disproportionately the case that donors to meta work are not taking into account these considerations. GiveWell and ACE take these considerations into account when making recommendations, so anyone relying on those recommendations has already “taken it into account”. This may arise in X-risk, I’m not sure—certainly it seems to apply to the part of X-risk that is about convincing other people to work on X-risk.
The points you list under “coordination problems” seem more like examples of why the counterfactuals are hard to assess, which is already under trap 8.
Well, even if each organization assesses counterfactuals perfectly, you still have the problem that the sum of the impacts across all organizations may be larger than 100%. The made-up example with Alice was meant to illustrate a case where each organization assesses their impact perfectly, comes to a ratio of 2:1 correctly, but in aggregate they would have spent more than was warranted.
I agree, I think it’s just disproportionately the case that donors to meta work are not taking into account these considerations.
What makes you think this? I found this post interesting, but not new; it’s all stuff I’ve thought about quite hard before. I would have thought I was roughly representative of meta donors here (I certainly know people who have thought harder), though I’d be happy for other such donors to contradict me.
I’ve had conversations with people who said they’ve donated to GWWC because of high leverage ratios, and my impression based on those conversations is that they take the multiplier fairly literally (“even if it’s off by an order of magnitude it’s still worthwhile”) without really considering the alternatives.
In addition, it’s really easy to find all of the arguments in favor of meta, including (many of) the arguments that impact is probably being undercounted—you just have to read the fundraising posts by meta orgs. I don’t know of any post other than Hurford’s that suggests considerations against meta. It took me about a year to generate all of the ideas not in that post, and it certainly helped that I was working in meta myself.
I think the arguments in favor of meta are intuitive, but not easy to find. For one thing, the orgs’ posts tend to be org-specific (unsurprisingly) rather than a general defense of meta work. In fact, to the best of my knowledge the best general arguments have never been made on the forum at the top level, because it’s sort-of-assumed that everybody knows them. So while you’re saying Peter’s post is the only such post you could find, that’s still more than the reverse (and with your post, it’s now 2–0).
At the comment level it’s easy to find plenty of examples of people making anti-meta arguments.
I think it’s not quite what you’re looking for, but I wrote How valuable is movement growth?, which is an article analysing the long-term counterfactual impact of different types of short-term movement growth effects. (It doesn’t properly speak to the empirical question of how short-term effort into meta work translates into short-term movement growth effects.)
I think the arguments in favor of meta are intuitive, but not easy to find. For one thing, the org’s posts tend to be org-specific (unsurprisngly) rather than a general defense of meta work.
Huh, there is a surprising lack of a canonical article that makes the case for meta work. (Just tried to find one.) That said, it’s very common when getting interested in EA to hear about GiveWell, GWWC and 80K, and to look them up, which gives you a sense of the arguments for meta.
Also, I would actually prefer that the arguments against also be org-specific, since that’s typically more decision-relevant, but a) that’s more work and b) it’s hard to do without actually being a part of the organization.
Anyway, even though there’s not a general article arguing for meta (which I am surprised by), that doesn’t particularly change my belief that a lot of people know the arguments for but not the arguments against. This has increased my estimate of the number of people who know neither the arguments for nor the arguments against.
I’m hoping/planning to plug both of those holes (a lack of org-specific criticism, and the uncompiled general arguments in favour) in the next few weeks, so did want to double-check that there wasn’t a canonical piece that I was missing.
Hi Rohin,
I agree these are good concerns.
I partially address points 7 and 8 in the post that you link to: https://80000hours.org/2016/02/the-value-of-coordination/#attributing-impact Note that it is possible for the credit to sum to more than 100%.
I discuss point 6 here: https://80000hours.org/2015/11/stop-talking-about-declining-returns-in-small-organisations/
Issues with how to assess impact, metrics etc. are discussed in-depth in the organisation’s impact evaluations.
What I’m keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.
Just to clarify, you’d like to see funding to meta-charities increase, so don’t think these worries are actually sufficient to warrant a move back to first order charities?
Cheers,
Ben
PS. One other small thing – it’s odd to class GiveWell as not meta, but 80k as meta. I often think of 80k as the GiveWell of career choice. Just as GiveWell does research into which charities are most effective and publicises it, we do research into which career strategies are most effective and publicise it.
Yes, I agree that this is possible (this is why I said it could be “a reasonable conclusion by each organization”). My point is that because of this phenomenon, you can have the pathological case where from a global perspective, the impact does not justify the costs, even though the impact does justify the costs from the perspective of every organization.
Yeah, I agree that potential economies of scale are much greater than diminishing marginal returns, and I should have mentioned that. Mea culpa.
My impression is that organizations acknowledge that there are issues, but the issues remain. I’ll write up an example with GWWC soon.
That’s correct.
I agree that 80k’s research product is not meta the way I’ve defined it. However, 80k does a lot of publicity and outreach that GiveWell for the most part does not do. For example: the career workshops, the 80K newsletter, the recent 80K book, the TEDx talks, the online ads, the flashy website that has popups for the mailing list. To my knowledge, of that list GiveWell only has online ads.
I’ve got a speculative one for GWWC, and a more concrete one for chapter seeding.
GWWC pledges: I’ve mentioned that I don’t worry about traps #2 and #4, and traps #3 and #5 don’t apply to a specific organization, so I’ll skip those.
I don’t think this is a problem for GWWC.
Here are some potential ways that GWWC could be overestimating impact:
They assume that the rate people drop out of the pledge is constant, but I would guess that the rate increases for the first ~10 years and then drops. In addition, I would guess that the average 2016 pledge taker is less committed than the average 2015 pledge taker (a larger number of pledge takers suggests lower average quality), so that would also increase the rate.
In one section, instead of computing a weighted average, they compute the average weight and then multiply it by the total donations, for reasons unclear to me. Fixing this could cause the impact to go up or down. I mentioned this to GWWC and so far haven’t gotten a definitive answer.
Counterfactuals are self-reported and so could be systematically wrong. GWWC’s response to this ameliorated my concerns but I would still guess that they are biased in GWWC’s favor. For example, all of the pledges from EA Berkeley members would have happened as long as some sort of pledge existed. You may credit GWWC with some impact since we needed it to exist, but at the very least it wouldn’t be marginal impact. However, I think (I’m guessing) that some of our members assigned counterfactual impact to GWWC.
Their time discounting could be too small (since meta trap #5 suggests a larger time discount).
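The weighted-average point in the list above can be seen with a toy illustration (the numbers are made up): multiplying the average weight by total donations only matches the correct weighted sum when counterfactual weights are uncorrelated with donation size.

```python
# Hypothetical pledger data: a small donor with high counterfactual
# weight and a large donor with low counterfactual weight.
donations = [1000, 10000]   # dollars donated
weights   = [0.9, 0.1]      # counterfactual credit per pledger

# Correct: weight each pledger's donations individually.
weighted_sum = sum(w * d for w, d in zip(weights, donations))   # 1900

# The shortcut: average weight times total donations.
shortcut = (sum(weights) / len(weights)) * sum(donations)       # 5500
```

If large donors tend to have low counterfactual weights (which seems plausible), the shortcut substantially overstates impact; with the opposite correlation it would understate it.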
Now since GWWC talks about their impact in terms of their impact on the organizations directly beneath them on the chain, you don’t see any amplification of overestimates. However, consider the case of local EA groups. They could be overestimating their impact too:
They assume that a GWWC pledge is worth $73,000, which is what GWWC says, but the average local group pledge may be worse than the average GWWC pledge, because the members are not as committed to effectiveness, or they are more likely to drop out due to lack of community. (I say “may be worse” because I don’t know what the average GWWC pledge looks like; it may turn out there are similar problems there.)
They may simply overcount the number of members who have taken the pledge. (I have had at least one student say that they took the pledge, but their name wasn’t on the website, and many students say they will take the pledge but then don’t.)
I do think that local groups sometimes sacrifice quality of pledges for number of pledges when they shouldn’t.
It actually is the case that student pledge takers from EA Berkeley have basically no interaction with the GWWC community. I don’t know why this is or if it’s normal for pledge takers in general. Sometimes I worry that GWWC spends too much time promoting their top charities since that would improve their metrics, but I have no real reason for thinking that this is actually happening.
My original post explained how this would be the case for GWWC. I agree though that economies of scale will probably dominate for some time.
I think local EA groups and GWWC both take credit for pledges originating from local groups. (It depends on what those pledge takers self-reported as the counterfactual.) If they came from an 80,000 Hours career workshop, then we now have three organizations claiming the impact.
There’s also a good Facebook thread about this. I forgot about it when writing the post.
I’ve mentioned this above—GWWC uses self-reported counterfactuals. If you agree that you should penalize expected value estimates if the evidence is not robust, then I think you should do the same here.
Here’s the second example:
There was an effort to “seed” local EA groups, and in the impact evaluation we see “each hour of staff time generated between $187 and $1270 USD per hour.”
First problem: The entire point of this activity was to get other people to start a local EA group, but the time spent by these other people isn’t included as costs (meta trap #7, kind of). Those other people would probably have to put in ~20 person-hours per group to get this impact. If you include these hours as costs, then the estimated cost-effectiveness becomes something more like $167-890.
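The shape of this adjustment can be sketched with made-up per-group numbers (none of these figures come from the actual impact evaluation; the point is only that omitted organiser hours shrink the dollars-per-hour figure):

```python
# All numbers here are my own assumptions, chosen only to show the shape
# of the adjustment (they are not taken from the impact evaluation).
value_per_group = 25400      # $ attributed to one seeded group
staff_hours = 20             # staff time per group counted in the estimate
organiser_hours = 20         # chapter-head time the estimate leaves out

rate_staff_only = value_per_group / staff_hours                     # $1270/hr
rate_all_time = value_per_group / (staff_hours + organiser_hours)   # $635/hr
```

How far the figure falls depends on the ratio of staff hours to organiser hours per group, which is why the adjusted range above differs from this sketch.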
Second problem: I would bet money at even odds that the pessimistic estimate for cold-email seeding was too optimistic (meta trap #1b). (I’m not questioning the counterfactual rate, I’m questioning the number of chapters that “went silent”.)
Taking these two into account, I think that the chapter seeding was probably worth ~$200 per hour. Now if GWWC itself is too optimistic in its impact calculation (as I think it is), this falls even further (meta trap #1b), and this seems just barely worthwhile.
That said, there are other benefits that aren’t incorporated into the calculation (both for GWWC pledges in general and chapter seeding). So overall it still seems like it was worthwhile, but it’s not nearly as exciting as it initially seemed.
These are all reasonable concerns. I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates. Though you could easily differ from them in your judgement calls.
With LEAN not including the costs of the chapter heads, they might have just decided that the costs of this time are low. Typically, in these estimates, people are trying to work out something like GiveWell dollars in vs. GiveWell dollars out. If a chapter head wouldn’t have worked on an EA project or earned to give to GiveWell charities otherwise, then the opportunity cost of their time could be small when measured in GiveWell dollars. In practice, it seems like much chapter time comes out of other leisure activities.
With 80k, we ask people taking the pledge whether they would have taken it if 80k never existed, and only count people who say “probably not”. These people might still be biased in our favor, but on the other hand, there are people we’ve influenced who were pushed over the edge by another org. We don’t count these people towards our impact, even though we made it easier for the other org.
(We also don’t count people who were influenced by us indirectly, so don’t know they were influenced)
Zooming out a bit, ultimately what we do is make people more likely to pledge.
Here’s a toy model.
At time 0, you have 3 people.
Amy has a 10% chance of taking the pledge
Bob has a 80% chance
Carla has a 90% chance
80k shows them a workshop, which makes each of them 10 percentage points more likely to take it, so at time 1, the probabilities are:
Amy: 20%
Bob: 90%
Carla: 100% → she actually takes it
Then GWWC shows them a talk, which has the same effect. So at time 2:
Amy: 30%
Bob: 100% → actually takes it
Carla: 100% (overdetermined)
Given current methods, 80k gets zero impact. Although they got Carla to pledge, Carla tells them she would have taken it otherwise due to GWWC, which is true.
GWWC counts both Carla and Bob as new pledgers in their total, but when they ask them how much they would have donated otherwise, Carla says zero (80k had already persuaded her) and Bob probably gives a high number too (~90%), because he was already close to doing it. So this reduces GWWC’s estimate of the counterfactual value per pledge. In total, GWWC adds 10% of the value of Bob’s donations to their estimates of counterfactual money moved.
This is pessimistic for 80k, because without 80k, GWWC wouldn’t have persuaded Bob, but this isn’t added to our impact.
It’s also a bit pessimistic for GWWC, because none of their effect on Amy is measured, even though they’ve made it easier for other organisations to persuade her.
In either case, what’s actually happening is that 80k is adding 30 percentage points and GWWC 20 percentage points. The current method of asking people what they would have done otherwise is a rough approximation for this, but it can both overcount and undercount what’s really going on.
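The toy model above can be written out directly; this is just a transcription of the example (not anyone's actual methodology), showing where the probability-points-added numbers come from.

```python
# Pledge probabilities at time 0, taken from the example above.
p0 = {"Amy": 0.1, "Bob": 0.8, "Carla": 0.9}

def nudge(p, amount=0.1):
    # Each intervention adds 10 percentage points, capped at certainty.
    return {name: min(1.0, prob + amount) for name, prob in p.items()}

p1 = nudge(p0)   # after the 80k workshop: Amy 0.2, Bob 0.9, Carla 1.0
p2 = nudge(p1)   # after the GWWC talk:   Amy 0.3, Bob 1.0, Carla 1.0

# "Probability points added" view of each org's contribution:
points_80k  = sum(p1[n] - p0[n] for n in p0)   # 0.3
points_gwwc = sum(p2[n] - p1[n] for n in p1)   # 0.2
```

Under the survey-based method, 80k gets credit for none of this (Carla reports she would have pledged anyway) while GWWC gets roughly 10% of Bob, so neither figure matches the probability points actually added.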
Re: Leisure time. I think I would have probably either taken another class, gotten a part-time paying job as a TA, or done technical research with a professor if I weren’t leading EAB (which took ~10 hours of my time each week). I’m not positive how representative this is across the board, but I think this is likely true of at least some other chapter leaders, and more likely to be true of the most dedicated (who probably produce a disproportionate amount of the value of student groups).
Hmm, my comment about this was lost.
On second thoughts, “leisure time” isn’t quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).
Anyway, I think there’s 3 main types of cost:
Immediate impact you could have had doing something else e.g. part-time job and donating the proceeds.
Better career capital you could have gained otherwise. I think this is probably the bigger issue. However, I also think running a local group is among the best options for career capital while a student, especially if you’re into EA. So it’s plausible the op cost is near zero. If you want to do research and give up doing a research project though, it could be pretty significant.
More fun you could have had elsewhere. This could be significant on a personal level, but it wouldn’t be a big factor in a calculation measured in terms of GiveWell dollars.
Based on other students I know who put time into rationalist or EA societies this seems right.
Okay, this makes more sense. I was mainly thinking of the second point—I agree that the first and third points don’t make too much of a difference. (However, some students can take on important jobs, eg. Oliver Habryka working at CEA while being a student.)
Another possibility is that you graduate faster. Instead of running a local group, you could take one extra course each semester. Aggregating this, for every two years of not running a local group, you could graduate a semester earlier.
(This would be for UC Berkeley, I think it should generalize about the same to other universities as well.)
I strongly disagree.
This is exactly why I focused on general high-level meta traps. I can give several plausible ways in which the meta traps may be happening, but it’s very hard to actually prove that it is indeed happening without being on the inside. If GWWC has an issue where it is optimizing metrics instead of good done, there is no way for me to tell since all I can see are its metrics. If GWWC has an issue with overestimating their impact, I could suggest plausible ways that this happens, but they are obviously in a better position to estimate their impact and so the obvious response is “they’ve probably thought of that”. To have some hard evidence, I would need to talk to lots of individual pledge takers, or at least see the data that GWWC has about them. I don’t expect to be better than GWWC at estimating counterfactuals (and I don’t have the data to do so), so I can’t show that there’s a better way to assess counterfactuals. To show that coordination problems actually lead to double-counting impact, I would need to do a comparative analysis of data from local groups, GWWC and 80k that I do not have.
There is one point that I can justify further. It’s my impression that meta orgs consistently don’t take into account the time spent by other people/groups, so I wouldn’t call that one a judgment call. Some more examples:
CEA lists “Hosted eight EAGx conferences” as one of their key accomplishments, but as far as I can tell don’t consider the costs to the people who ran the conferences, which can be huge. And there’s no way that you could expect this to come out of leisure time.
I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
Yes, I agree that there is impact that isn’t counted by these calculations, but I expect this is the case with most activities (with perhaps the exception of global poverty, where most of the impacts have been studied and so the “uncounted” impact is probably low).
The main issue is that I don’t expect that people are performing these sorts of counterfactual analyses when reporting outcomes. It’s a little hard for me to imagine what “90% chance” means so it’s hard for me to predict what would happen in this scenario, but your analysis seems reasonable. (I still worry that Bob would attribute most or all of the impact to GWWC rather than just 10%.)
However, I think this is mostly because you’ve chosen a very small effect size. Under this model, it’s impossible for 80k to ever have impact—people will only say they “probably wouldn’t” have taken the GWWC pledge if they started under 50%, but if they started under 50%, 80k could never get them to 100%. Of course this model will undercount impact.
Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
More generally, my model is that there are many sources that lead to someone taking the GWWC pledge (80k, the local group, online materials from various orgs), and a simple counterfactual analysis would lead to every such source getting nearly 100% of the credit, and based on how questions are phrased I think it is likely that people are actually attributing impact this way. Again, I can’t tell without looking at data. (One example would be to look at what impact EA Berkeley members attribute to GWWC.)
I can’t speak for the other orgs, but 80k probably wouldn’t count this as “full impact”.
First, the person would have to say they made the pledge “due to 80k”. Whereas if they were heavily influenced by the local group, they might say they would have taken it otherwise.
Second, as a first approximation, we use the same figure GWWC does for a value of a pledge in terms of donations. IIRC this already assumes only 30% is additional, once counterfactually adjusted. This % is based on their surveys of the pledgers. (Moreover, for the largest donors, who determine 75% of the donations, we ask them to make individual estimates too).
Taken together, 80k would attribute at most 30% of the value.
Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
What would you estimate is the opportunity cost of student group organiser time per hour?
How would it compare to time spent by 80k staff?
Yes, I’m predicting that they would say that almost always (over 90% of the time).
That does make quite a difference. It seems plausible then that impact is mostly undercounted rather than overcounted. This seems more like an artifact of a weird calculation (why use GWWC’s counterfactual instead of having a separate one?). And you still have the issue that impact may be double counted; it’s just that since you tend to undercount impact in the first place, the effects seem to cancel out.
That’s a little uncharitable of me, but the point I’m trying to make is that there is no correction for double-counting impact—most of your counterarguments seem to be saying “we typically underestimate our impact so this doesn’t end up being a problem”. You aren’t using the 30% counterfactual rate because you’re worried about double counting impact with GWWC. (I’m correct about that, right? It would be a really strange way to handle double counting of impact.)
Nitpick: This spreadsheet suggests 53%, and then adds some more impact based on changing where people donate (which could double count with GiveWell).
I agree that impact is often undercounted, perhaps even to such a degree that double counting would not get you over 100%. I still worry that people think “Their impact numbers are great and probably significant underestimates” without thinking about the issue of double counting, especially since most orgs make sure to mention how their impact estimates are likely underestimates.
Even if people just donated on the basis of “their impact numbers are great” without thinking about both undercounting and overcounting, I would worry that they are making the right decision for the wrong reasons. We should promote more rigorous thinking.
My perspective is something like “donors should know about these considerations”, whereas you may be interpreting it as “people who work in meta don’t know/care about these considerations”. I would only endorse the latter in the one specific case of not valuing the time of other groups/people.
The number I use for myself is $20, mostly just made up so that I can use it in Fermi estimates.
Unsure. Probably a little bit higher, but not much. Say $40?
(I have not thought much about the actual numbers. I do think that the ratio between the two should be relatively small.)
I also don’t care too much that 80k doesn’t include costs to student groups because those costs are relatively small compared to the costs to 80k (probably). This is why I haven’t really looked into it. This is not the case with GWWC pledges or chapter seeding.
Hey Rohin, without getting into the details, I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse, so I’m not sure how we should act. We win out in some cases (we get bragging rights from someone who found out about EA from another source then changes their career) and lose in others (someone who finds out about GiveWell through 80k but doesn’t then attribute their donations to us).
There’s double counting, yes, but the orgs are also genuinely complementary to one another—not sure if the double counting exceeds the real complementarity.
We could try to measure the benefit/cost of the movement as a whole—this gets rid of the attribution and complementarity problem, though loses the ability to tell what is best within the movement.
I’m a little unclear on what you mean here. I see three different factors:
Various orgs are undercounting their impact because they don’t count small changes that are part of a larger effort, even though in theory from a single player perspective, they should count the impact.
In some cases, two (or more) organizations reach out to the same individual, but any one of them would have been sufficient, so none of them gets any counterfactual impact (more generally, the sum of the individually recorded impacts is less than the impact of the system as a whole)
Multiple orgs have claimed the same object-level impact (eg. an additional $100,000 to AMF from a GWWC pledge) because they were all counterfactually responsible for it (more generally, the sum of the individually recorded impacts is more than the impact of the system as a whole).
Let’s suppose:
X is the impact of an org from a single player perspective
Y is the impact of an org taking a system-level view (so that the sum of Y values for all orgs is equal to the impact of the system as a whole)
Point 1 doesn’t change X or Y, but it does change the estimate we make of X and Y, and tends to increase it.
Point 2 can only tend to make Y > X.
Point 3 can only tend to make Y < X.
Is your claim that the combination of points 1 and 2 may outweigh point 3, or just that point 2 may outweigh point 3? I can believe the former, but the latter seems unlikely; it doesn’t seem very common for many separate orgs to all be capable of making the same change, and it seems more likely to me that in such cases all of the orgs are necessary, which would be an instance of point 3.
Yeah, this is the best idea I’ve come up with so far, but I don’t really like it much. (Do you include local groups? Do you include the time that EAs spend talking to their friends? If not, how do you determine how much of the impact to attribute to meta orgs vs. normal network effects?) It would be a good start though.
Another possibility is to cross-reference data between all meta orgs, and try to figure out whether for each person, the sum of the impacts recorded by all meta orgs is a reasonable number. Not sure how feasible this actually is (in particular, it’s hard to know what a “reasonable number” would be, and coordinating among so many organizations seems quite hard).
I agree the double-counting issue is pretty complex. (I think maybe the “fraction of value added” approach I mention in the value of coordination post is along the right lines)
I think the key point is that it seems unlikely that (given how orgs currently measure impact) they’re claiming significantly more than 100% in aggregate. This is partly because there’s already lots of adjustments that pick up some of this (e.g. asking people if they would have done X due to another org) and because there are various types of undercounting.
Given this, adding a further correction for double counting doesn’t seem like a particularly big consideration—there are more pressing sources of uncertainty.
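One way to make a “fraction of value added” correction concrete is a Shapley-style split, where each org is credited with its average marginal contribution over arrival orders. This is my illustration of the general idea with made-up coalition values, not a method any of the orgs actually uses:

```python
from itertools import permutations

# Made-up coalition values: donations realised if a given set of orgs acts.
# Both orgs together produce $3000; either alone produces $1000.
value = {
    frozenset(): 0,
    frozenset({"80k"}): 1000,
    frozenset({"GWWC"}): 1000,
    frozenset({"80k", "GWWC"}): 3000,
}
orgs = ["80k", "GWWC"]
orderings = list(permutations(orgs))

def shapley(org):
    # Average marginal contribution of `org` over all arrival orders.
    total = 0
    for order in orderings:
        before = frozenset(order[: order.index(org)])
        total += value[before | {org}] - value[before]
    return total / len(orderings)

credits = {org: shapley(org) for org in orgs}
# Each org is credited $1500; credits sum to exactly the $3000 total.
# Naive counterfactuals give each org $3000 - $1000 = $2000, summing to $4000.
```

Unlike per-org counterfactuals, these credits always sum to the system-level impact, which is exactly the property the double-counting worry is about.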
Yes, I agree with this. (See also my reply to Rob above.)
Maybe instead of talking about “meta traps” we should talk about “promotion traps” or something?
Yeah, that does seem to capture the idea better.
I don’t think you’ll want to equate being a meta org with what proportion of your time you spend on outreach. Some object level charities do a lot of outreach too. If AMF started spending 25% of its budget on marketing, would it become a meta-charity?
Sure 80k puts more effort into outreach than GiveWell, but the core model is very similar.
Fair enough, I don’t particularly care about which organization is a “meta org” and which one is not, I mostly care about where these meta traps apply and where they don’t. Probably should have talked about “meta work” instead of “meta org”. Anyway, it does seem like the traps apply to the outreach portion of 80k and not to GiveWell (since they barely have any outreach).
If AMF started spending a lot on marketing, I would count that as “meta work”, though I think a lot of these traps would not apply to that specific scenario.
At a glance, it seems like most of the meta-traps don’t apply to stuff like promotion of object-level causes.
That’s why Peter Hurford distinguished between second-level and first-level meta, and focused his criticism on the second-level.
80,000 Hours and GiveWell are both mainly doing first-level meta (i.e. we promote specific first order opportunities for impact); though we also do some second-level meta (promoting EA as an idea). 80k does more second-level meta day-to-day than GiveWell, though GiveWell explains their ultimate mission in second-level meta terms:
One other quick point is that I don’t think coordination problems arise especially from meta-work. Rather, coordination problems can arise anywhere in which the best action for you depends on what someone else is going to do. E.g. you can get coordination problems among global health donors (GiveWell has written a lot about this). The points you list under “coordination problems” seem more like examples of why the counterfactuals are hard to assess, which is already under trap 8.
I mostly agree, but I think a lot of them do apply to first-level meta in many cases. For example I talked about how they apply to GWWC, which is first-level meta (I think).
Yes, and I specifically didn’t include that kind of first-level meta work. I think the parts of first-level meta that are affected by these traps are efforts to fundraise for effective organizations, mainly ones that target EAs specifically. Even for general fundraising though, I think several traps still do apply, such as trap #1, #6 and #8.
I agree, I think it’s just disproportionately the case that donors to meta work are not taking into account these considerations. GiveWell and ACE take these considerations into account when making recommendations, so anyone relying on those recommendations has already “taken it into account”. This may arise in X-risk, I’m not sure—certainly it seems to apply to the part of X-risk that is about convincing other people to work on X-risk.
Sure, I think we’re on the same page here.