Thanks, that’s helpful. If you’re saying that the stricter criterion would also apply to DM/CHAI/etc. papers then I’m not as worried about bias against younger researchers.
Regarding your 4 criteria, I think they don’t really delineate how to make the sort of judgment calls we’re discussing here, so it really seems like it should be about a 5th criterion that does delineate that. I’m not sure yet how to formulate one that is time-efficient, so I’m going to bracket that for now (recognizing that might be less useful for you), since I think we actually disagree in principle about which papers are building towards TAI safety.
To elaborate, let’s take verification as an example (since it’s relevant to the Wong & Kolter paper). Lots of people think verification is helpful for TAI safety—MIRI has talked about it in the past, and very long-termist people like Paul Christiano are excited about it as a current direction afaik. If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria. Now the reality is that verification does have lots of traction in the academic community, but why is that? It’s because Wong & Kolter and Raghunathan et al. wrote two early papers that provided promising paths forward on neural net verification, which many other people are now trying to expand on. This seems strictly better to me than the MIRI example, so it seems like either:
-The hypothetical MIRI work shouldn’t have made the cut
-There are actually two types of verification work (call them VerA and VerB), such that hypothetical MIRI was working on VerA, which was relevant, while the above papers are VerB, which is not relevant.
-Papers should make the cut on factors other than actual impact, e.g. perhaps the MIRI papers should be included because they’re from MIRI, or you should want to highlight them more because they didn’t get traction.
-Something else I’m missing?
I definitely agree that you shouldn’t just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included (e.g. I think the initial adversarial examples papers by Szegedy and Goodfellow should be included on similar grounds).
Also in terms of alternatives, I’m not sure how time-expensive this is, but some ideas for discovering additional work:
-Following citation trails (esp. to highly-cited papers)
-Going to the personal webpages of authors of relevant papers, to see if there’s more (also similarly for faculty webpages)
Well, it’s biased toward safety organizations, not large organizations.
Yeah, good point. I agree it’s more about organizations (although I do think that DeepMind is benefiting a lot here, e.g. you’re including a fairly comprehensive list of their adversarial robustness work while explicitly ignoring that work at large—it’s not super-clear on what grounds, for instance if you think Wong and Cohen should be dropped then about half of the DeepMind papers should be too since they’re on almost identical topics and some are even follow-ups to the Wong paper).
Not because it’s not high quality work, but just because I think it still happens in a world where no research is motivated by the safety of transformative AI; maybe that’s wrong?
That seems wrong to me, but maybe that’s a longer conversation. (I agree that similar papers would probably have come out within the next 3 years, but asking for that level of counterfactual irreplaceability seems kind of unreasonable imo.) I also think that the majority of the CHAI and DeepMind papers included wouldn’t pass that test (tbc I think they’re great papers! I just don’t really see what basis you’re using to separate them).
I think focusing on motivation rather than results can also lead to problems, and perhaps contributes to organization bias (by relying on branding to assess motivation). I do agree that counterfactual impact is a good metric, i.e. you should be less excited about a paper that was likely to soon happen anyways; maybe that’s what you’re saying? But that doesn’t have much to do with motivation.
Also let me be clear that I’m very glad this database exists, and please interpret this as constructive feedback rather than a complaint.
Thanks for curating this! You sort of acknowledge this already, but one bias in this list is that it’s very tilted towards large organizations like DeepMind, CHAI, etc. One way to see this is that you have AugMix by Hendrycks et al., but not the Common Corruptions and Perturbations paper, which has the same first author and publication year and 4x the number of citations (in fact it would top the 2019 list by a wide margin). The main difference is that AugMix had DeepMind co-authors while Common Corruptions did not.
I mainly bring this up because this bias probably particularly falls against junior PhD students, many of whom are doing great work that we should seek to recognize. For instance (and I’m obviously biased here), Aditi Raghunathan and Dan Hendrycks would be at or near the top of your citation count for most years if you included all of their safety-relevant work.
In that vein, the verification work from Zico Kolter’s group should probably be included, e.g. the convex outer polytope [by Eric Wong] and randomized smoothing [by Jeremy Cohen] papers (at least, it’s not clear why you would include Aditi’s SDP work with me and Percy, but not those).
I recognize it might not be feasible to really address this issue entirely, given your resource constraints. But it seems worth thinking about if there are cheap ways to ameliorate this.
Also, in case it’s helpful, here’s a review I wrote in 2019: AI Alignment Research Overview.
I didn’t mean to imply that laziness was the main part of your reply, I was more pointing to “high personal costs of public posting” as an important dynamic that was left out of your list. I’d guess that we probably disagree about how high those are / how much effort it takes to mitigate them, and about how reasonable it is to expect people to be selfless in this regard, but I don’t think we disagree on the overall list of considerations.
I think the reason people don’t post stuff publicly isn’t laziness, but that there’s lots of downside risk, e.g. of someone misinterpreting you and getting upset, and not much upside relative to sharing in smaller circles.
Thanks for writing this and for your research in this area. Based on my own read of the literature, it seems broadly correct to me, and I wish that more people had an accurate impression of polarization on social media vs mainstream news and their relative effects.
While I think your position is much more correct than the conventional one, I did want to point to an interesting paper by Ro’ee Levy, which has some very good descriptive and causal statistics on polarization on Facebook: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3653388. It suggests (among many other interesting findings) that Facebook probably is somewhat more slanted than mainstream news and that this may drive a small but meaningful increase in affective polarization. That being said, it’s unlikely to be the primary driver of US trends.
You also sort of touch on this but I think it’s also helpful to convey when you have genuine uncertainty (not at the cost of needless hedging and underconfidence) and also say when you think someone else (who they have access to) would be likely to have more informed advice on a particular question.
I like your guidelines. Some others that come to mind:
-Some people are not just looking for advice but to avoid the responsibility of choosing for themselves (they want someone else to tell them what the right answer is). I think it’s important to resist this and remind people that ultimately it’s their responsibility to make the decision.
-If someone seems to be making a decision out of fear or anxiety, I try to address this and de-dramatize the different options. People rarely make their best decisions if they’re afraid of the outcomes.
-I try to show my work and give the considerations behind different pieces of advice. That way if they get new evidence later they can integrate it with the considerations rather than starting from scratch.
Thanks! 1 seems believable to me, at least for EA as it currently presents. 2 seems believable on average but I’d expect a lot of heterogeneity (I personally know athletes who have gone on to be very good researchers). It also seems like donations are pretty accessible to everyone, as you can piggyback on other people’s research.
I personally wouldn’t pay that much attention to the particular language people use—it’s more highly correlated with their local culture than with abilities or interests. I’d personally be extra excited to talk to someone with a strong track record of handling uncertainty well who had a completely different vocabulary than me, although I’d also expect it to take more effort to get to the payoff.
This is a bit tangential, but I expect that pro athletes would be able to provide a lot of valuable mentorship to ambitious younger people in EA—my general experience has been that about 30% of the most valuable growth habits I have are imported from sports (and also not commonly found elsewhere). E.g. “The Inner Game of Tennis” was gold and I encourage all my PhD students to read it.
I didn’t downvote, but the analysis seems incorrect to me: most pro athletes are highly intelligent, and in terms of single attributes that predict success in subsequent difficult endeavors I can’t think of much better; I’d probably take it over successful startup CEO even. It also seems like the sort of error that’s particularly costly to make for reasons of overall social dynamics and biases.
Niceness and honesty are both things that take work, and can be especially hard when trying to achieve both at once. I think it’s often possible to achieve both, but this often requires either substantial emotional labor or unusual skill on the part of the person giving feedback. Under realistic constraints on time and opportunity cost, niceness and honesty do trade off against each other.
This isn’t an argument to not care about niceness, but I think it’s important to realize that there is an actual trade-off. I personally prefer people to err strongly on the honesty side when giving me feedback. In the most blunt cases it can ruin my day but I still prefer overall to get the feedback even then.
Okay, thanks for the clarification. I now see where the list comes from, although I personally am bearish on this type of weighting. For one, it ignores many people who are motivated to make AI beneficial for society but don’t happen to frequent certain web forums or communities. Secondly, in my opinion it underrates the benefit of extremely competent peers and overrates the benefit of like-minded peers.
While it’s hard to give generic advice, I would advocate for going to the school that is best at the research topic one is interested in pursuing, or where there is otherwise a good fit with a strong PI (though basing the decision on a single PI rather than one’s top-2/top-3 can sometimes backfire). If one’s interests are not developed enough to have a good sense of topic or PI then I would go with general strength of program.
I’m not sure what the metric for the “good schools” list is but the ranking seemed off to me. Berkeley, Stanford, MIT, CMU, and UW are generally considered the top CS (and ML) schools. Toronto is also top-10 in CS and particularly strong in ML. All of these rankings are of course a bit silly but I still find it hard to justify the given list unless being located in the UK is somehow considered a large bonus.
I intended the document to be broader than a research agenda. For instance I describe many topics that I’m not personally excited about but that other people are and where the excitement seems defensible. I also go into a lot of detail on the reasons that people are interested in different directions. It’s not a literature review in the sense that the references are far from exhaustive but I personally don’t know of any better resource for learning about what’s going on in the field. Of course as the author I’m biased.
Given that Nick has a PhD in Philosophy, and that OpenPhil has funded a large amount of academic research, this explanation seems unlikely.
Disclosure: I am working at OpenPhil over the summer. (I don’t have any particular private information, both of the above facts are publicly available.)
EDIT: I don’t intend to make any statement about whether EA as a whole has an anti-academic bias, just that this particular situation seems unlikely to reflect that.
If we think of the community as needing one ops person and one research person, the marginal value in each area drops to zero once that role is filled.
Yes, but these effects only show up when the number of jobs is small. In particular: If there are already 99 ops people and we are looking at having 99 vs. 100 ops people, the marginal value isn’t going to drop to zero. Going from 99 to 100 ops people means that mission-critical ops tasks will be done slightly better, and that some non-critical tasks will get done that wouldn’t have otherwise. Going from 100 to 101 will have a similar effect.
In contrast, in the traditional comparative advantage setting, there remain gains-from-coordination/gains-from-trade even when the total pool of jobs/goods is quite large.
The fact that gains-from-coordination only show up in the small-N regime here, whereas they show up even in the large-N regime traditionally, seems like a crucial difference that makes it inappropriate to apply standard intuition about comparative advantage in the present setting.
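To make the small-N vs. large-N point concrete, here is a minimal sketch comparing two toy value functions. Both are assumptions for illustration only: `single_role_value` encodes the "one ops person is all we need" picture, and `concave_value` uses a logarithm as one arbitrary example of diminishing but nonzero returns. Neither is meant as an estimate of real-world impact.

```python
import math

def single_role_value(n: int) -> float:
    """Toy model (assumed): the community needs exactly one person in the role."""
    return 1.0 if n >= 1 else 0.0

def concave_value(n: int) -> float:
    """Toy model (assumed): diminishing but nonzero returns; log is an arbitrary choice."""
    return math.log1p(n)

def marginal(value, n: int) -> float:
    """Value added by the n-th person, given n - 1 people already in the role."""
    return value(n) - value(n - 1)

if __name__ == "__main__":
    for n in (1, 2, 100, 101):
        print(n, marginal(single_role_value, n), round(marginal(concave_value, n), 4))
```

Under the single-role model the marginal value is 1 for the first person and 0 afterwards, so who fills the slot matters a lot. Under the concave model the 100th and 101st person add small, nearly identical amounts, so there is little left to coordinate over.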
If we want to analyze this more from first principles, we could pick one of the standard justifications for considering comparative advantage and I could try to show why it breaks down here. The one I’m most familiar with is the one by David Ricardo (https://en.wikipedia.org/wiki/Comparative_advantage#Ricardo’s_example).
I’m worried that you’re mis-applying the concept of comparative advantage here. In particular, if agents A and B both have the same values and are pursuing altruistic ends, comparative advantage should not play a role—both agents should just do whatever they have an absolute advantage at (taking into account marginal effects, but in a large population this should often not matter).
For example: suppose that EA has a “shortage of operations people” but person A determines that they would have higher impact doing direct research rather than doing ops. Then in fact the best thing is for person A to work on direct research, even if there are already many other people doing research and few people doing ops. (Of course, person A could be mistaken about which choice has higher impact, but that is different from the trade considerations that comparative advantage is based on.)
I agree with the heuristic “if a type of work seems to have few people working on it, all else equal you should update towards that work being more neglected and hence higher impact” but the justification for that again doesn’t require any considerations of trading with other people. In general, if A and B can trade in a mutually beneficial way, then either A and B have different values or one of them was making a mistake.
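As a rough sketch of the shared-values case: if everyone is maximizing the same objective, each person's choice reduces to picking the role with the highest marginal impact, where neglectedness is just one input to that estimate. The skill numbers, the `scarcity` multipliers, and the multiplicative form below are all hypothetical, chosen only to mirror the person A example.

```python
from typing import Dict

def marginal_impact(skill: Dict[str, float], scarcity: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical model: impact in a role = personal skill * how neglected the role is."""
    return {role: skill[role] * scarcity[role] for role in skill}

def best_role(impact: Dict[str, float]) -> str:
    """With shared values, simply pick the role with the highest marginal impact."""
    return max(impact, key=impact.get)

# Assumed numbers: ops is twice as neglected, but person A is much stronger at research.
scarcity = {"research": 1.0, "ops": 2.0}
person_a_skill = {"research": 10.0, "ops": 3.0}

impact_a = marginal_impact(person_a_skill, scarcity)
choice_a = best_role(impact_a)
print(impact_a, choice_a)
```

In this toy setup the ops shortage raises person A's estimated ops impact, but not by enough to overturn their advantage in research. Note that the decision rule never refers to trading with anyone else.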