Benjamin_Todd comments on Thoughts on the “Meta Trap”

Benjamin_Todd Dec 21, 2016, 5:51 PM
2 points
These are all reasonable concerns. I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates. That said, you could easily differ from them in your judgement calls.

With LEAN not including the costs of the chapter heads, they might have just decided that the costs of this time are low. Typically, in these estimates, people are trying to work out something like GiveWell dollars in vs. GiveWell dollars out. If a chapter head wouldn’t have worked on an EA project or earned to give to GiveWell charities otherwise, then the opportunity cost of their time could be small when measured in GiveWell dollars. In practice, it seems like much chapter time comes out of other leisure activities.

With 80k, we ask people taking the pledge whether they would have taken it if 80k never existed, and only count people who say “probably not”. These people might still be biased in our favor, but on the other hand, there are people we’ve influenced who were pushed over the edge by another org. We don’t count these people towards our impact, even though we made it easier for the other org.

(We also don’t count people who were influenced by us indirectly, so don’t know they were influenced)

Zooming out a bit, ultimately what we do is make people more likely to pledge.

Here’s a toy model.
- At time 0, you have 3 people.
- Amy has a 10% chance of taking the pledge
- Bob has a 80% chance
- Carla has a 90% chance
80k shows them a workshop, which makes each of them 10 percentage points more likely to take it, so at time 1, the probabilities are:
- Amy: 20%
- Bob: 90%
- Carla: 100% → she actually takes it
Then GWWC shows them a talk, which has the same effect. So at time 2:
- Amy: 30%
- Bob: 100% → actually takes it
- Carla: 100% (overdetermined)
Given current methods, 80k gets zero impact. Although they got Carla to pledge, Carla tells them she would have taken it otherwise due to GWWC, which is true.

GWWC counts both Carla and Bob as new pledgers in their total, but when they ask them how much they would have donated otherwise, Carla says she would have donated all of it anyway (80k had already persuaded her), which leaves GWWC zero counterfactual credit for her, and Bob probably gives a high number too (~90%), because he was already close to doing it. So this reduces GWWC’s estimate of the counterfactual value per pledge. In total, GWWC adds 10% of the value of Bob’s donations to their estimates of counterfactual money moved.

This is pessimistic for 80k, because without 80k, GWWC wouldn’t have persuaded Bob, but this isn’t added to our impact.

It’s also a bit pessimistic for GWWC, because none of their effect on Amy is measured, even though they’ve made it easier for other organisations to persuade her.

In either case, what’s actually happening is that 80k is adding 30 probability points (10 each for Amy, Bob and Carla) and GWWC 20 probability points (10 each for Amy and Bob). The current method of asking people what they would have done otherwise is a rough approximation for this, but it can both overcount and undercount what’s really going on.
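
The arithmetic of the toy model can be checked with a short sketch. The function and variable names are illustrative assumptions rather than anything 80k or GWWC actually uses; the only inputs taken from the comment are the three starting probabilities and the 10-point boost per intervention, capped at 100%.

```python
# Toy model: each intervention adds 10 percentage points to a person's
# chance of taking the pledge, capped at 100%. Names are hypothetical.
BOOST = 10  # percentage points added per intervention

# Chance of pledging at time 0, in percent.
start = {"Amy": 10, "Bob": 80, "Carla": 90}


def apply_intervention(probs, boost=BOOST):
    """Return the probabilities after one intervention, capped at 100."""
    return {name: min(100, p + boost) for name, p in probs.items()}


def points_added(before, after):
    """Sum the percentage points of pledge probability added across people."""
    return sum(after[name] - before[name] for name in before)


after_80k = apply_intervention(start)        # time 1: 80k workshop
after_gwwc = apply_intervention(after_80k)   # time 2: GWWC talk

added_80k = points_added(start, after_80k)
added_gwwc = points_added(after_80k, after_gwwc)

print("After 80k: ", after_80k)
print("After GWWC:", after_gwwc)
print("Points added by 80k:", added_80k)
print("Points added by GWWC:", added_gwwc)
```

Under these assumptions the sketch gives 30 points for 80k and 20 for GWWC, whereas the survey-based method described above credits 80k with nothing and GWWC with only 10% of Bob’s donations.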
- Ajeya Dec 21, 2016, 8:24 PM
  2 points
  Re: Leisure time. I think I would have probably either taken another class, gotten a part-time paying job as a TA, or done technical research with a professor if I weren’t leading EAB (which took ~10 hours of my time each week). I’m not positive how representative this is across the board, but I think this is likely true of at least some other chapter leaders, and more likely to be true of the most dedicated (who probably produce a disproportionate amount of the value of student groups).
  - Benjamin_Todd Dec 22, 2016, 9:08 PM
    1 point
    Hmm, my comment about this was lost.
    
    On second thoughts, “leisure time” isn’t quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).
    
    Anyway, I think there are 3 main types of cost:

    1. Immediate impact you could have had doing something else, e.g. taking a part-time job and donating the proceeds.

    2. Better career capital you could have gained otherwise. I think this is probably the bigger issue. However, I also think running a local group is among the best options for career capital while a student, especially if you’re into EA. So it’s plausible the opportunity cost is near zero. If you want to do research and give up doing a research project though, it could be pretty significant.

    3. More fun you could have had elsewhere. This could be significant on a personal level, but it wouldn’t be a big factor in a calculation measured in terms of GiveWell dollars.
    - Castand Dec 31, 2016, 9:29 PM
      0 points
      
      > On second thoughts, “leisure time” isn’t quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).
      
      Based on other students I know who put time into rationalist or EA societies this seems right.
    - Rohin Shah Dec 23, 2016, 6:37 PM
      0 points
      Okay, this makes more sense. I was mainly thinking of the second point—I agree that the first and third points don’t make too much of a difference. (However, some students can take on important jobs, eg. Oliver Habryka working at CEA while being a student.)
      
      Another possibility is that you graduate faster. Instead of running a local group, you could take one extra course each semester. Aggregating this, for every two years of not running a local group, you could graduate a semester earlier.
      
      (This would be for UC Berkeley; I think it should generalize about the same to other universities as well.)
- Rohin Shah Dec 21, 2016, 8:12 PM
  1 point
  > In practice, it seems like much chapter time comes out of other leisure activities.
  
  I strongly disagree.
  
  > I can’t speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates.
  
  This is exactly why I focused on general high-level meta traps. I can give several plausible ways in which the meta traps may be happening, but it’s very hard to actually prove that it is indeed happening without being on the inside. If GWWC has an issue where it is optimizing metrics instead of good done, there is no way for me to tell since all I can see are its metrics. If GWWC has an issue with overestimating their impact, I could suggest plausible ways that this happens, but they are obviously in a better position to estimate their impact and so the obvious response is “they’ve probably thought of that”. To have some hard evidence, I would need to talk to lots of individual pledge takers, or at least see the data that GWWC has about them. I don’t expect to be better than GWWC at estimating counterfactuals (and I don’t have the data to do so), so I can’t show that there’s a better way to assess counterfactuals. To show that coordination problems actually lead to double-counting impact, I would need to do a comparative analysis of data from local groups, GWWC and 80k that I do not have.
  
  There is one point that I can justify further. It’s my impression that meta orgs consistently don’t take into account the time spent by other people/groups, so I wouldn’t call that one a judgment call. Some more examples:
  - CEA lists “Hosted eight EAGx conferences” as one of their key accomplishments, but as far as I can tell don’t consider the costs to the people who ran the conferences, which can be huge. And there’s no way that you could expect this to come out of leisure time.
  - I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.

  > (We also don’t count people who were influenced by us indirectly, so don’t know they were influenced)
  
  Yes, I agree that there is impact that isn’t counted by these calculations, but I expect this is the case with most activities (with perhaps the exception of global poverty, where most of the impacts have been studied and so the “uncounted” impact is probably low).
  
  > Here’s a toy model.
  
  The main issue is that I don’t expect that people are performing these sorts of counterfactual analyses when reporting outcomes. It’s a little hard for me to imagine what “90% chance” means so it’s hard for me to predict what would happen in this scenario, but your analysis seems reasonable. (I still worry that Bob would attribute most or all of the impact to GWWC rather than just 10%.)
  
  However, I think this is mostly because you’ve chosen a very small effect size. Under this model, it’s impossible for 80k to ever have impact—people will only say they “probably wouldn’t” have taken the GWWC pledge if they started under 50%, but if they started under 50%, 80k could never get them to 100%. Of course this model will undercount impact.
  
  Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
  
  More generally, my model is that there are many sources that lead to someone taking the GWWC pledge (80k, the local group, online materials from various orgs), and a simple counterfactual analysis would lead to every such source getting nearly 100% of the credit, and based on how questions are phrased I think it is likely that people are actually attributing impact this way. Again, I can’t tell without looking at data. (One example would be to look at what impact EA Berkeley members attribute to GWWC.)
  - Benjamin_Todd Dec 21, 2016, 8:57 PM
    0 points
    
    > Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.
    
    I can’t speak for the other orgs, but 80k probably wouldn’t count this as “full impact”.
    
    First, the person would have to say they made the pledge “due to 80k”. Whereas if they were heavily influenced by the local group, they might say they would have taken it otherwise.
    
    Second, as a first approximation, we use the same figure GWWC does for a value of a pledge in terms of donations. IIRC this already assumes only 30% is additional, once counterfactually adjusted. This % is based on their surveys of the pledgers. (Moreover, for the largest donors, who determine 75% of the donations, we ask them to make individual estimates too).
    
    Taken together, 80k would attribute at most 30% of the value.
    
    Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
    
    > I don’t know how 80k considers the impact of their career workshops, but I would bet money that they don’t take into account the costs to the local group that hosts the workshop.
    
    What would you estimate is the opportunity cost of student group organiser time per hour?
    
    How would it compare to time spent by 80k staff?
    - Rohin Shah Dec 21, 2016, 10:18 PM
      0 points
      
      > First, the person would have to say they made the pledge “due to 80k”.
      
      Yes, I’m predicting that they would say that almost always (over 90% of the time).
      
      > this already assumes only 30% is additional, once counterfactually adjusted.
      
      That does make quite a difference. It seems plausible then that impact is mostly undercounted rather than overcounted. This seems more like an artifact of a weird calculation (why use GWWC’s counterfactual instead of having a separate one?). And you still have the issue that impact may be double counted; it’s just that since you tend to undercount impact in the first place, the effects seem to cancel out.
      
      That’s a little uncharitable of me, but the point I’m trying to make is that there is no correction for double-counting impact; most of your counterarguments seem to be saying “we typically underestimate our impact so this doesn’t end up being a problem”. You aren’t using the 30% counterfactual rate because you’re worried about double counting impact with GWWC. (I’m correct about that, right? It would be a really strange way to handle double counting of impact.)
      
      Nitpick: This spreadsheet suggests 53%, and then adds some more impact based on changing where people donate (which could double count with GiveWell).
      
      > Third, you can still get the undercounting issue I mentioned. If someone later takes the pledge due to the local group, but was influenced by 80k, 80k probably wouldn’t count it.
      
      I agree that impact is often undercounted, and I accept that it is undercounted to such a degree that double counting would not get you over 100%. I still worry that people think “Their impact numbers are great and probably significant underestimates” without thinking about the issue of double counting, especially since most orgs make sure to mention how their impact estimates are likely underestimates.
      
      Even if people just donated on the basis of “their impact numbers are great” without thinking about both undercounting and overcounting, I would worry that they are making the right decision for the wrong reasons. We should promote more rigorous thinking.
      
      My perspective is something like “donors should know about these considerations”, whereas you may be interpreting it as “people who work in meta don’t know/care about these considerations”. I would only endorse the latter in the one specific case of not valuing the time of other groups/people.
      
      > What would you estimate is the opportunity cost of student group organiser time per hour?
      
      The number I use for myself is $20, mostly just made up so that I can use it in Fermi estimates.
      
      > How would it compare to time spent by 80k staff?
      
      Unsure. Probably a little bit higher, but not much. Say $40?
      
      (I have not thought much about the actual numbers. I do think that the ratio between the two should be relatively small.)
      
      I also don’t care too much that 80k doesn’t include costs to student groups because those costs are relatively small compared to the costs to 80k (probably). This is why I haven’t really looked into it. This is not the case with GWWC pledges or chapter seeding.
      - Robert_Wiblin Dec 23, 2016, 4:36 AM
        2 points
        Hey Rohin, without getting into the details, I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse, so I’m not sure how we should act. We win out in some cases (we get bragging rights from someone who found out about EA from another source then changes their career) and lose in others (someone who finds out about GiveWell through 80k but doesn’t then attribute their donations to us).
        
        There’s double counting yes, but the orgs are also legitimately complementary of one another—not sure if the double counting exceeds the real complementarity.
        
        We could try to measure the benefit/cost of the movement as a whole—this gets rid of the attribution and complementarity problem, though loses the ability to tell what is best within the movement.
        Rohin Shah Dec 23, 2016, 7:00 PM
        0 points
        
        > I’m pretty unsure whether correcting for impacts from multiple orgs makes 80,000 Hours look better or worse
        
        I’m a little unclear on what you mean here. I see three different factors:
        
        1. Various orgs are undercounting their impact because they don’t count small changes that are part of a larger effort, even though in theory, from a single player perspective, they should count the impact.

        2. In some cases, two (or more) organizations both reach out to an individual, but either one of the organizations would have been sufficient, so neither of them gets any counterfactual impact (more generally, the sum of the individually recorded impacts is less than the impact of the system as a whole).

        3. Multiple orgs have claimed the same object-level impact (e.g. an additional $100,000 to AMF from a GWWC pledge) because they were all counterfactually responsible for it (more generally, the sum of the individually recorded impacts is more than the impact of the system as a whole).
        
        Let’s suppose:
        
        X is the impact of an org from a single player perspective
        
        Y is the impact of an org taking a system-level view (so that the sum of Y values for all orgs is equal to the impact of the system as a whole)
        
        Point 1 doesn’t change X or Y, but it does change the estimate we make of X and Y, and tends to increase it.
        
        Point 2 can only tend to make Y > X.
        
        Point 3 can only tend to make Y < X.
        
        Is your claim that the combination of points 1 and 2 may outweigh point 3, or just that point 2 may outweigh point 3? I can believe the former, but the latter seems unlikely—it doesn’t seem very common for many separate orgs to all be capable of making the same change, it seems more likely to me that in such cases all of the orgs are necessary which would be an instance of point 3.
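
        A minimal worked example of points 2 and 3, using notation that is not in the original discussion: assume just two orgs, A and B, and let V(S) be the hypothetical value produced when only the orgs in the set S act, with V > 0 the value of the pledge. The single player impact of A is then X_A = V({A,B}) - V({B}). The equal split used for Y is only one possible convention, assumed here for illustration.

        ```latex
        \begin{align*}
        \text{Point 2 (either org alone suffices):}\quad
          & V(\{A\}) = V(\{B\}) = V(\{A,B\}) = V \\
          & X_A = X_B = V - V = 0, \qquad X_A + X_B = 0 < V \\
          & Y_A = Y_B = \tfrac{V}{2} > X_A \\[6pt]
        \text{Point 3 (both orgs are necessary):}\quad
          & V(\{A\}) = V(\{B\}) = 0, \qquad V(\{A,B\}) = V \\
          & X_A = X_B = V - 0 = V, \qquad X_A + X_B = 2V > V \\
          & Y_A = Y_B = \tfrac{V}{2} < X_A
        \end{align*}
        ```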
        
        > We could try to measure the benefit/cost of the movement as a whole
        
        Yeah, this is the best idea I’ve come up with so far, but I don’t really like it much. (Do you include local groups? Do you include the time that EAs spend talking to their friends? If not, how do you determine how much of the impact to attribute to meta orgs vs. normal network effects?) It would be a good start though.
        
        Another possibility is to cross-reference data between all meta orgs, and try to figure out whether for each person, the sum of the impacts recorded by all meta orgs is a reasonable number. Not sure how feasible this actually is (in particular, it’s hard to know what a “reasonable number” would be, and coordinating among so many organizations seems quite hard).
      - Benjamin_Todd Dec 22, 2016, 9:14 PM
        1 point
        I agree the double-counting issue is pretty complex. (I think maybe the “fraction of value added” approach I mention in the value of coordination post is along the right lines)
        
        I think the key point is that it seems unlikely that (given how orgs currently measure impact) they’re claiming significantly more than 100% in aggregate. This is partly because there are already lots of adjustments that pick up some of this (e.g. asking people if they would have done X due to another org) and partly because there are various types of undercounting.
        
        Given this, adding a further correction for double counting doesn’t seem like a particularly big consideration—there are more pressing sources of uncertainty.
        Rohin Shah Dec 23, 2016, 7:02 PM
        0 points
        Yes, I agree with this. (See also my reply to Rob above.)