What I’m keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.
I’ve got a speculative one for GWWC, and a more concrete one for chapter seeding.
GWWC pledges: I’ve mentioned that I don’t worry about traps #2 and #4, and traps #3 and #5 don’t apply to a specific organization, so I’ll skip those.
Meta Trap #1a. Probability of not having an impact
I don’t think this is a problem for GWWC.
Meta Trap #1b. Overestimating impact
Here are some potential ways that GWWC could be overestimating impact:
They assume that the rate at which people drop out of the pledge is constant, but I would guess that the rate increases for the first ~10 years and then drops. In addition, I would guess that the average 2016 pledge taker is less committed than the average 2015 pledge taker (larger numbers suggest lower average quality), so that would also increase the rate.
In one section, instead of computing a weighted average, they compute the average weight and then multiply it by the total donations, for reasons unclear to me (see the sketch after this list). Fixing this could cause the impact to go up or down. I mentioned this to GWWC and so far haven’t gotten a definitive answer.
Counterfactuals are self-reported and so could be systematically wrong. GWWC’s response to this ameliorated my concerns, but I would still guess that the self-reports are biased in GWWC’s favor. For example, all of the pledges from EA Berkeley members would have happened as long as some sort of pledge existed. You may credit GWWC with some impact since we needed it to exist, but at the very least it wouldn’t be marginal impact. However, I think (I’m guessing) that some of our members assigned counterfactual impact to GWWC.
Their time discounting could be too small (since meta trap #5 suggests a larger time discount).
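On the weighted-average point above: here’s a minimal sketch with made-up numbers (the actual donations and weights in GWWC’s calculation are not reproduced here). The shortcut of multiplying the average weight by total donations only matches the true weighted average when weights and donation sizes are uncorrelated, which is why fixing it could move the estimate in either direction.

```python
# Made-up numbers for illustration only.
donations = [100_000, 10_000, 1_000]   # hypothetical per-pledger donations
weights = [0.2, 0.5, 0.9]              # hypothetical counterfactual weights

# Weighted average: weight each pledger's donations individually.
weighted = sum(w * d for w, d in zip(weights, donations))   # 25,900

# Shortcut: average weight times total donations.
shortcut = (sum(weights) / len(weights)) * sum(donations)   # 59,200

print(weighted, shortcut)
```

Here the shortcut overestimates because the largest donors happen to have the lowest weights; with the correlation reversed it would underestimate.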
Now since GWWC talks about their impact in terms of their impact on the organizations directly beneath them on the chain, you don’t see any amplification of overestimates. However, consider the case of local EA groups. They could be overestimating their impact too:
They assume that a GWWC pledge is worth $73,000, which is what GWWC says, but the average local group pledge may be worse than the average GWWC pledge, because the members are not as committed to effectiveness, or they are more likely to drop out due to lack of community. (I say “may be worse” because I don’t know what the average GWWC pledge looks like; it may turn out there are similar problems there.)
They may simply overcount the number of members who have taken the pledge. (I have had at least one student say that they took the pledge, but their name wasn’t on the website, and many students say they will take the pledge but then don’t.)
Meta Trap #1c. Issues with metrics
I do think that local groups sometimes sacrifice quality of pledges for number of pledges when they shouldn’t.
It actually is the case that student pledge takers from EA Berkeley have basically no interaction with the GWWC community. I don’t know why this is, or if it’s normal for pledge takers in general. Sometimes I worry that GWWC spends too much time promoting their top charities, since that would improve their metrics, but I have no real reason to think this is actually happening.
Meta Trap #6. Marginal impact may be much lower than average impact.
My original post explained how this would be the case for GWWC. I agree though that economies of scale will probably dominate for some time.
Meta Trap #7. Meta suffers more from coordination problems.
I think local EA groups and GWWC both take credit for pledges originating from local groups. (It depends on what those pledge takers self-reported as the counterfactual.) If they came from an 80,000 Hours career workshop, then we now have three organizations claiming the impact.
There’s also a good Facebook thread about this. I forgot about it when writing the post.
Meta Trap #8. Counterfactuals are harder to assess.
I’ve mentioned this above—GWWC uses self-reported counterfactuals. If you agree that you should penalize expected value estimates if the evidence is not robust, then I think you should do the same here.
Here’s the second example:
There was an effort to “seed” local EA groups, and in the impact evaluation we see “each hour of staff time generated between $187 and $1270 USD per hour.”
First problem: The entire point of this activity was to get other people to start a local EA group, but the time spent by these other people isn’t included as a cost (meta trap #7, kind of). Those other people would probably have to put in ~20 person-hours per group to get this impact. If you include these hours as costs, the estimated cost-effectiveness becomes something more like $167-$890 per hour.
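As a rough sketch of this adjustment: the staff-hour and group counts below are hypothetical, back-solved to roughly reproduce that range; only the $187-$1270 figures and the ~20 hours per group come from the discussion above.

```python
def adjusted_rate(dollars_per_staff_hour, staff_hours, groups, hours_per_group=20):
    """Dilute the per-hour figure by also counting organisers' hours as costs."""
    total_value = dollars_per_staff_hour * staff_hours
    total_hours = staff_hours + groups * hours_per_group
    return total_value / total_hours

# Hypothetical scenarios (staff hours and group counts invented to match the range):
print(adjusted_rate(187, staff_hours=500, groups=3))    # pessimistic: ~167 $/hour
print(adjusted_rate(1270, staff_hours=500, groups=11))  # optimistic:  ~882 $/hour
```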
Second problem: I would bet money at even odds that the pessimistic estimate for cold-email seeding was too optimistic (meta trap #1b). (I’m not questioning the counterfactual rate; I’m questioning the number of chapters that “went silent”.)
Taking these two into account, I think that the chapter seeding was probably worth ~$200 per hour. Now if GWWC itself is too optimistic in its impact calculation (as I think it is), this falls even further (meta trap #1b), and this seems just barely worthwhile.
That said, there are other benefits that aren’t incorporated into the calculation (both for GWWC pledges in general and chapter seeding). So overall it still seems like it was worthwhile, but it’s not nearly as exciting as it initially seemed.
These are all reasonable concerns. I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates. Though you could easily differ from them in your judgement calls.
With LEAN not including the costs of the chapter heads, they might have just decided that the costs of this time are low. Typically, in these estimates, people are trying to work out something like GiveWell dollars in vs. GiveWell dollars out. If a chapter head wouldn’t have worked on an EA project or earned to give to GiveWell charities otherwise, then the opportunity cost of their time could be small when measured in GiveWell dollars. In practice, it seems like much chapter time comes out of other leisure activities.
With 80k, we ask people taking the pledge whether they would have taken it if 80k never existed, and only count people who say “probably not”. These people might still be biased in our favor, but on the other hand, there are people we’ve influenced who were then pushed over the edge by another org. We don’t count these people towards our impact, even though we made it easier for the other org.
(We also don’t count people who were influenced by us indirectly, and so don’t know they were influenced.)
Zooming out a bit, ultimately what we do is make people more likely to pledge.
Here’s a toy model.
At time 0, you have 3 people.
Amy has a 10% chance of taking the pledge
Bob has an 80% chance
Carla has a 90% chance
80k gives them a workshop, which makes each of them 10 percentage points more likely to take it, so at time 1 the probabilities are:
Amy: 20%
Bob: 90%
Carla: 100% → she actually takes it
Then GWWC gives them a talk, which has the same effect. So at time 2:
Amy: 30%
Bob: 100% → actually takes it
Carla: 100% (overdetermined)
Given current methods, 80k gets zero impact. Although they got Carla to pledge, Carla tells them she would have taken it otherwise due to GWWC, which is true.
GWWC counts both Carla and Bob as new pledgers in their total, but when they ask how much each would have donated otherwise, Carla in effect credits GWWC with nothing (80k had already persuaded her, so she would have donated the same amount anyway), and Bob probably gives a high number too (~90%), because he was already close to doing it. So this reduces GWWC’s estimate of the counterfactual value per pledge. In total, GWWC adds 10% of the value of Bob’s donations to their estimate of counterfactual money moved.
This is pessimistic for 80k, because without 80k, GWWC wouldn’t have persuaded Bob, but this isn’t added to our impact.
It’s also a bit pessimistic for GWWC, because none of their effect on Amy is measured, even though they’ve made it easier for other organisations to persuade her.
In either case, what’s actually happening is that 80k is adding 30 probability points and GWWC 20. The current method of asking people what they would have done otherwise is a rough approximation of this, but it can both overcount and undercount what’s really going on.
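Here’s a runnable version of the toy model, under two simplifying assumptions that are mine rather than anything stated above: each org only surveys people who pledged at its own event, and a pledger’s reported counterfactual equals their pledge probability in a world where that org never acted.

```python
P0 = {"Amy": 0.10, "Bob": 0.80, "Carla": 0.90}
ORGS = ["80k", "GWWC"]   # 80k's workshop happens first, then GWWC's talk
BOOST = 0.10             # each intervention adds 10 probability points

def run(skip=None):
    """Apply each org's boost in order (optionally leaving one org out).

    Returns final probabilities, probability points added per org, and
    which org's boost first pushed each pledger to 100%.
    """
    p = dict(P0)
    points, pledged_by = {}, {}
    for org in ORGS:
        if org == skip:
            continue
        added = 0.0
        for name in p:
            new = min(p[name] + BOOST, 1.0)
            added += new - p[name]
            if new == 1.0 and p[name] < 1.0:
                pledged_by[name] = org
            p[name] = new
        points[org] = added
    return p, points, pledged_by

full_p, points, pledged_by = run()
print({org: round(pts, 2) for org, pts in points.items()})
# {'80k': 0.3, 'GWWC': 0.2} -- probability points actually added

# What the self-report survey records: each pledger credits the org that got
# them to 100% with (1 - probability they'd have pledged without that org).
recorded = {org: 0.0 for org in ORGS}
for name, org in pledged_by.items():
    without_p, _, _ = run(skip=org)
    recorded[org] += 1.0 - without_p[name]
print({org: round(val, 2) for org, val in recorded.items()})
# {'80k': 0.0, 'GWWC': 0.1} -- what the surveys record
```

Under these assumptions the surveys record 0 and 0.1 expected pledges for 80k and GWWC respectively, versus the 0.3 and 0.2 probability points they actually added.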
Re: Leisure time. I think I would have probably either taken another class, gotten a part-time paying job as a TA, or done technical research with a professor if I weren’t leading EAB (which took ~10 hours of my time each week). I’m not positive how representative this is across the board, but I think this is likely true of at least some other chapter leaders, and more likely to be true of the most dedicated (who probably produce a disproportionate amount of the value of student groups).
On second thoughts, “leisure time” isn’t quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).
Anyway, I think there are 3 main types of cost:
Immediate impact you could have had doing something else, e.g. working a part-time job and donating the proceeds.
Better career capital you could have gained otherwise. I think this is probably the bigger issue. However, I also think running a local group is among the best options for career capital while a student, especially if you’re into EA, so it’s plausible the opportunity cost is near zero. If you want to go into research and running the group means giving up a research project, though, the cost could be pretty significant.
More fun you could have had elsewhere. This could be significant on a personal level, but it wouldn’t be a big factor in a calculation measured in terms of GiveWell dollars.
Based on other students I know who put time into rationalist or EA societies, this seems right.

Okay, this makes more sense. I was mainly thinking of the second point—I agree that the first and third points don’t make too much of a difference. (However, some students can take on important jobs, e.g. Oliver Habryka working at CEA while being a student.)
Another possibility is that you graduate faster. Instead of running a local group, you could take one extra course each semester; at a typical load of about four courses per semester, that adds up to roughly one semester’s worth of courses every two years. So for every two years of not running a local group, you could graduate a semester earlier.
(This would be for UC Berkeley, I think it should generalize about the same to other universities as well.)
In practice, it seems like much chapter time comes out of other leisure activities.
I strongly disagree.
I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates.
This is exactly why I focused on general high-level meta traps. I can give several plausible ways in which the meta traps may be happening, but it’s very hard to actually prove that they are happening without being on the inside. If GWWC has an issue where it is optimizing metrics instead of good done, there is no way for me to tell, since all I can see are its metrics. If GWWC has an issue with overestimating its impact, I could suggest plausible ways that this happens, but GWWC is obviously in a better position to estimate its impact, so the obvious response is “they’ve probably thought of that”.

To have some hard evidence, I would need to talk to lots of individual pledge takers, or at least see the data that GWWC has about them. I don’t expect to be better than GWWC at estimating counterfactuals (and I don’t have the data to do so), so I can’t show that there’s a better way to assess counterfactuals. To show that coordination problems actually lead to double-counting impact, I would need a comparative analysis of data from local groups, GWWC, and 80k that I do not have.
There is one point that I can justify further. It’s my impression that meta orgs consistently don’t take into account the time spent by other people/groups, so I wouldn’t call that one a judgment call. Some more examples:
CEA lists “Hosted eight EAGx conferences” as one of their key accomplishments, but as far as I can tell they don’t consider the costs to the people who ran the conferences, which can be huge. And there’s no way that you could expect this to come out of leisure time.
I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
(We also don’t count people who were influenced by us indirectly, and so don’t know they were influenced.)
Yes, I agree that there is impact that isn’t counted by these calculations, but I expect this is the case with most activities (with perhaps the exception of global poverty, where most of the impacts have been studied and so the “uncounted” impact is probably low).
Here’s a toy model.
The main issue is that I don’t expect that people are performing these sorts of counterfactual analyses when reporting outcomes. It’s a little hard for me to imagine what a “90% chance” means here, so it’s hard to predict what would happen in this scenario, but your analysis seems reasonable. (I still worry that Bob would attribute most or all of the impact to GWWC rather than just 10%.)
However, I think this is mostly because you’ve chosen a very small effect size. Under this model, it’s impossible for 80k to ever have impact—people will only say they “probably wouldn’t” have taken the GWWC pledge if they started under 50%, but with a single 10-point boost, someone starting under 50% can never reach 100%. Of course this model will undercount impact.
Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80k would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
More generally, my model is that there are many sources that lead to someone taking the GWWC pledge (80k, the local group, online materials from various orgs), and a simple counterfactual analysis would lead to every such source getting nearly 100% of the credit, and based on how questions are phrased I think it is likely that people are actually attributing impact this way. Again, I can’t tell without looking at data. (One example would be to look at what impact EA Berkeley members attribute to GWWC.)
Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80k would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
I can’t speak for the other orgs, but 80k probably wouldn’t count this as “full impact”.
First, the person would have to say they made the pledge “due to 80k”. Whereas if they were heavily influenced by the local group, they might say they would have taken it otherwise.
Second, as a first approximation, we use the same figure GWWC does for the value of a pledge in terms of donations. IIRC this already assumes only 30% is additional, once counterfactually adjusted. This percentage is based on their surveys of the pledgers. (Moreover, for the largest donors, who determine 75% of the donations, we ask them to make individual estimates too.)
Taken together, 80k would attribute at most 30% of the value.
Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
What would you estimate is the opportunity cost of student group organiser time per hour?
First, the person would have to say they made the pledge “due to 80k”.
Yes, I’m predicting that they would say that almost always (over 90% of the time).
this already assumes only 30% is additional, once counterfactually adjusted.
That does make quite a difference. It seems plausible then that impact is mostly undercounted rather than overcounted. This seems more like an artifact of a weird calculation (why use GWWC’s counterfactual instead of having a separate one?). And you still have the issue that impact may be double counted; it’s just that, since you tend to undercount impact in the first place, the effects seem to cancel out.
That’s a little uncharitable of me, but the point I’m trying to make is that there is no correction for double-counting impact—most of your counterarguments seem to be saying “we typically underestimate our impact so this doesn’t end up being a problem”. You aren’t using the 30% counterfactual rate because you’re worried about double counting impact with GWWC. (I’m correct about that, right? It would be a really strange way to handle double counting of impact.)
Nitpick: This spreadsheet suggests 53%, and then adds some more impact based on changing where people donate (which could double count with GiveWell).
Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
I agree that impact is often undercounted, and I accept that it may be undercounted to such a degree that double counting would not get you over 100%. I still worry that people think “their impact numbers are great and probably significant underestimates” without thinking about the issue of double counting, especially since most orgs make sure to mention how their impact estimates are likely underestimates.
Even if people just donated on the basis of “their impact numbers are great” without thinking about both undercounting and overcounting, I would worry that they are making the right decision for the wrong reasons. We should promote more rigorous thinking.
My perspective is something like “donors should know about these considerations”, whereas you may be interpreting it as “people who work in meta don’t know/care about these considerations”. I would only endorse the latter in the one specific case of not valuing the time of other groups/people.
What would you estimate is the opportunity cost of student group organiser time per hour?
The number I use for myself is $20 per hour, mostly just made up so that I can use it in Fermi estimates.
How would it compare to time spent by 80k staff?
Unsure. Probably a little bit higher, but not much. Say $40?
(I have not thought much about the actual numbers. I do think that the ratio between the two should be relatively small.)
I also don’t care too much that 80k doesn’t include costs to student groups because those costs are relatively small compared to the costs to 80k (probably). This is why I haven’t really looked into it. This is not the case with GWWC pledges or chapter seeding.
Hey Rohin, without getting into the details, I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse, so I’m not sure how we should act. We win out in some cases (we get bragging rights from someone who found out about EA from another source and then changes their career) and lose in others (someone who finds out about GiveWell through 80k but doesn’t then attribute their donations to us).
There’s double counting, yes, but the orgs are also legitimately complementary to one another—not sure if the double counting exceeds the real complementarity.
We could try to measure the benefit/cost of the movement as a whole—this gets rid of the attribution and complementarity problems, though it loses the ability to tell what is best within the movement.
I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse
I’m a little unclear on what you mean here. I see three different factors:
Various orgs are undercounting their impact because they don’t count small changes that are part of a larger effort, even though, in theory, from a single-player perspective they should count that impact.
In some cases, two (or more) organizations both reach out to an individual, but either one of the organizations would have been sufficient, so neither of them gets any counterfactual impact (more generally, the sum of the individually recorded impacts is less than the impact of the system as a whole).
Multiple orgs have claimed the same object-level impact (e.g. an additional $100,000 to AMF from a GWWC pledge) because they were all counterfactually responsible for it (more generally, the sum of the individually recorded impacts is more than the impact of the system as a whole).
Let’s suppose:
X is the impact of an org from a single-player perspective
Y is the impact of an org taking a system-level view (so that the sum of Y values for all orgs is equal to the impact of the system as a whole)
Point 1 doesn’t change X or Y, but it does bias our estimates of both downward, so correcting for it tends to increase them.
Point 2 can only tend to make Y > X.
Point 3 can only tend to make Y < X.
Is your claim that the combination of points 1 and 2 may outweigh point 3, or just that point 2 may outweigh point 3? I can believe the former, but the latter seems unlikely—it doesn’t seem very common for several separate orgs to each be capable of making the same change on their own; it seems more likely to me that in such cases all of the orgs are necessary, which would be an instance of point 3.
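To make points 2 and 3 concrete, here’s a made-up two-org sketch, where X is each org’s single-player (leave-one-out) impact and Y splits the system’s total impact so that the Ys sum to the whole (an equal split, purely for symmetry):

```python
SYSTEM_IMPACT = 1.0    # e.g. one pledge that happened
Y = SYSTEM_IMPACT / 2  # system-level share per org (equal split of the whole)

def single_player_x(impact_without_org):
    """Leave-one-out counterfactual: what would be lost if this org vanished."""
    return SYSTEM_IMPACT - impact_without_org

# Point 2 ("OR"): either org alone would have sufficed, so removing one
# changes nothing and each org's counterfactual impact is zero.
print("OR: ", single_player_x(impact_without_org=1.0), "vs Y =", Y)  # X=0 < Y

# Point 3 ("AND"): both orgs were necessary, so removing either loses
# everything and each org claims the full impact (summing to 2x the whole).
print("AND:", single_player_x(impact_without_org=0.0), "vs Y =", Y)  # X=1 > Y
```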
We could try to measure the benefit/cost of the movement as a whole
Yeah, this is the best idea I’ve come up with so far, but I don’t really like it much. (Do you include local groups? Do you include the time that EAs spend talking to their friends? If not, how do you determine how much of the impact to attribute to meta orgs vs. normal network effects?) It would be a good start though.
Another possibility is to cross-reference data between all meta orgs, and try to figure out whether for each person, the sum of the impacts recorded by all meta orgs is a reasonable number. Not sure how feasible this actually is (in particular, it’s hard to know what a “reasonable number” would be, and coordinating among so many organizations seems quite hard).
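As a minimal sketch of that cross-referencing idea, with entirely invented data (the record format and all numbers are hypothetical): sum each person’s claimed counterfactual impact across orgs and flag anyone whose claims exceed what they actually gave.

```python
from collections import defaultdict

# Hypothetical records: org -> {person: donation dollars the org claims
# were counterfactually caused by it}.
claims = {
    "GWWC":        {"alice": 70_000, "bob": 20_000},
    "80k":         {"alice": 50_000},
    "local group": {"alice": 40_000},
}
actual_donations = {"alice": 100_000, "bob": 60_000}

claimed = defaultdict(float)
for per_person in claims.values():
    for person, amount in per_person.items():
        claimed[person] += amount

for person, total in claimed.items():
    ratio = total / actual_donations[person]
    if ratio > 1.0:  # more than 100% of this person's giving is claimed
        print(f"{person}: {ratio:.0%} of actual donations claimed")
# -> alice: 160% of actual donations claimed
```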
I agree the double-counting issue is pretty complex. (I think maybe the “fraction of value added” approach I mention in the value of coordination post is along the right lines.)
I think the key point is that it seems unlikely that (given how orgs currently measure impact) they’re claiming significantly more than 100% in aggregate. This is partly because there’s already lots of adjustments that pick up some of this (e.g. asking people if they would have done X due to another org) and because there are various types of undercounting.
Given this, adding a further correction for double counting doesn’t seem like a particularly big consideration—there are more pressing sources of uncertainty.
Yes, I agree with this. (See also my reply to Rob above.)