Apparently this post has been nominated for the review! And here I thought almost no one had read it and liked it.
Reading through it again 5 years later, I feel pretty happy with this post. It’s clear about what it is and isn’t saying (in particular, it explicitly disclaims the argument that meta should get less money), and is careful in its use of arguments (e.g. trap #8 specifically mentions that counterfactuals being hard isn’t a trap until you combine it with a bias towards worse counterfactuals). I still agree that all of the traps mentioned here are worth keeping in mind when working on “meta”.
The biggest critique of this post is that it doesn't demonstrate that any of these traps actually happen(ed) in practice. It has several examples, but most are of the form "such-and-such bad thing could be happening, I can't tell from the outside". This comment makes more concrete claims about bad things that actually happened, but those claims are speculative, and the response suggests they were probably already taken into account.
I think this does in fact make the post less valuable than it otherwise could be. Nonetheless, I still find the post important, because it’s the closest we get to criticism of “meta” work. In theory, we could have better criticisms from people who are actually doing the work themselves, who can say more definitively whether in practice there are cases of these “traps”, but in practice I have not seen such critiques.
If I were rewriting this post today, I’d make a few changes:
Stop saying “meta”. I don’t know what I’d replace it with, but “meta” is too ambiguous and easily misunderstood. “Promotion traps” came up as a suggestion in the comments; that seems reasonable.
Focus on properties of the work. Instead of having a single type of work called “meta” and then talking about various traps, it seems better to talk about specific traps and say what kinds of work that trap applies to. For example, trap #1 applies to work that has a long chain of impact, whereas trap #6 and trap #7 apply whenever there are multiple things optimizing the same outcome. It happens that what I called “meta” work in 2016 satisfied both of these properties, but the analysis would be stronger if I had just talked about the properties and then noted that “meta” work has both of these properties. (This would also make it easier to talk about which traps apply to which pieces of work, rather than arguing about whether GiveWell and 80K count as “meta” work.)
Note positives of “meta”. This post is straightforwardly about the negatives; it would have been good to acknowledge the positives as well (which I had probably been taking as background knowledge), or at least say that I’m only focusing on negatives because the positives are more widely known.
Note the possibility of increasing marginal returns. Trap #6 is implicitly arguing for diminishing marginal returns, but there’s a strong case for increasing marginal returns instead. I don’t currently think this is quite as clear as it sounds in the comments—you want to distinguish between exponential-growth-that-would-have-happened-anyway vs. exponential-growth-that-happens-as-a-result-of-future-work—but I think it is more likely increasing than decreasing.
Emphasize taking the perspective of "EA as a whole". Trap #7 is centrally about how individually rational actions by orgs can be irrational from the perspective of an "EA superagent" trying to coordinate all of EA. Ideally I would have added motivation for why "what an EA superagent should do" was the appropriate perspective, rather than "what an EA org should do", "what an individual should do", or "what humanity as a whole should do".
Some miscellaneous thoughts:
The clearest critique of "meta"-in-practice I gave in this post is that cost-effectiveness analyses often don't take into account the costs incurred by people outside of the "meta" org. I think this critique has become stronger over time. As a small example, people working on improving the AI safety pipeline often ask for an hour or two of my time (which I value quite highly); I doubt these costs are making it into cost-effectiveness analyses.
Some commenters questioned whether the positive arguments for "meta" work really are better known than the negative ones. The fact that enough people read this post for it to be nominated for the review does give me some pause. Still, despite no longer working in "meta", I occasionally have conversations where a "meta" person seems more aware of the positive arguments than the negative ones, and never the other way around, so I continue to think that the positives are better known than the negatives.