Thanks for curating this! You sort of acknowledge this already, but one bias in this list is that it’s very tilted towards large organizations like DeepMind, CHAI, etc. One way to see this is that you have AugMix by Hendrycks et al., but not the Common Corruptions and Perturbations paper, which has the same first author and publication year and 4x the number of citations (in fact it would top the 2019 list by a wide margin). The main difference is that AugMix had DeepMind co-authors while Common Corruptions did not.
I mainly bring this up because this bias probably falls particularly hard on junior PhD students, many of whom are doing great work that we should seek to recognize. For instance (and I’m obviously biased here), Aditi Raghunathan and Dan Hendrycks would be at or near the top of your citation count for most years if you included all of their safety-relevant work.
In that vein, the verification work from Zico Kolter’s group should probably be included, e.g. the convex outer polytope [by Eric Wong] and randomized smoothing [by Jeremy Cohen] papers (at least, it’s not clear why you would include Aditi’s SDP work with me and Percy, but not those).
I recognize it might not be feasible to address this issue entirely, given your resource constraints. But it seems worth thinking about whether there are cheap ways to ameliorate it.
Also, in case it’s helpful, here’s a review I wrote in 2019: AI Alignment Research Overview.
Thanks Jacob. That last link is broken for me, but I think you mean this?
You sort of acknowledge this already, but one bias in this list is that it’s very tilted towards large organizations like DeepMind, CHAI, etc.
Well, it’s biased toward safety organizations, not large organizations. (Indeed, it seems to be biased toward small safety organizations over large ones, since they tend to reply to our emails!) We get good coverage of small orgs like Ought, but you’re right that we don’t have a way to easily track individual unaffiliated safety researchers, and that’s not fair.
I look forward to a glorious future where this database is so well known that all safety authors naturally send us a link to their work when it’s released, but for now the best way we have of finding papers is (1) asking safety organizations for what they’ve produced and (2) taking references from review articles. If you can suggest another option for getting more comprehensive coverage per hour of work, we’d be very interested to hear it (seriously!).
For what it’s worth, the papers by Hendrycks are very borderline based on our inclusion criteria, and in fact if I were classifying them today I think I would not include them. (Not because it’s not high quality work, but just because I think it still happens in a world where no research is motivated by the safety of transformative AI; maybe that’s wrong?) For now I’ve added the papers you mention by Hendrycks, Wong, and Cohen to the database, but my guess is they’ll get dropped for being too near-term-motivated when they get reviewed next year.
More generally, let me mention that we do want to recognize great work, but our higher priority is to (1) recognize work that is particularly relevant to TAI safety and (2) help donors assess safety organizations.
Thanks again! I’m adding your 2019 review to the list.
Well, it’s biased toward safety organizations, not large organizations.
Yeah, good point. I agree it’s more about organizations (although I do think that DeepMind is benefiting a lot here, e.g. you’re including a fairly comprehensive list of their adversarial robustness work while explicitly ignoring that work at large, and it’s not super-clear on what grounds; for instance, if you think Wong and Cohen should be dropped, then about half of the DeepMind papers should be too, since they’re on almost identical topics and some are even follow-ups to the Wong paper).
Not because it’s not high quality work, but just because I think it still happens in a world where no research is motivated by the safety of transformative AI; maybe that’s wrong?
That seems wrong to me, but maybe that’s a longer conversation. (I agree that similar papers would probably have come out within the next 3 years, but asking for that level of counterfactual irreplaceability seems kind of unreasonable imo.) I also think that the majority of the CHAI and DeepMind papers included wouldn’t pass that test (to be clear, I think they’re great papers! I just don’t really see what basis you’re using to separate them).
I think focusing on motivation rather than results can also lead to problems, and perhaps contributes to organization bias (by relying on branding to assess motivation). I do agree that counterfactual impact is a good metric, i.e. you should be less excited about a paper that was likely to happen soon anyway; maybe that’s what you’re saying? But that doesn’t have much to do with motivation.
Also let me be clear that I’m very glad this database exists, and please interpret this as constructive feedback rather than a complaint.
for instance, if you think Wong and Cohen should be dropped, then about half of the DeepMind papers should be too, since they’re on almost identical topics and some are even follow-ups to the Wong paper
Yea, I’m saying I would drop most of those too.
I think focusing on motivation rather than results can also lead to problems, and perhaps contributes to organization bias (by relying on branding to assess motivation).
I agree this can contribute to organizational bias.
I do agree that counterfactual impact is a good metric, i.e. you should be less excited about a paper that was likely to happen soon anyway; maybe that’s what you’re saying? But that doesn’t have much to do with motivation.
Just to be clear: I’m using “motivation” here in the technical sense of “What distinguishes this topic for further examination out of the space of all possible topics?”, i.e., “Is the topic unusually likely to lead to TAI safety results down the line?” (It has nothing to do with the author’s altruism or whatever.)
I think what would best advance this conversation would be for you to propose alternative practical inclusion criteria which could be contrasted with the ones we’ve given.
Here’s how I arrived at ours. The initial desiderata are:
1. Criteria are not based on the importance/quality of the paper. (Too hard for us to assess.)
2. Papers that are explicitly about TAI safety are included.
3. Papers are not automatically included merely for being relevant to TAI safety. (There are way too many.)
4. Criteria don’t exclude papers merely for failure to mention TAI safety explicitly. (We want to find and support researchers working in institutions where that would be considered too weird.)
(The only desiderata that we could potentially drop are #2 or #4. #1 and #3 are absolutely crucial for keeping the workload manageable.)
So besides papers explicitly about TAI safety, what else can we include given the fact that we can’t include everything relevant to safety? Papers that TAI safety researchers are unusually likely (relative to other researchers) to want to read, and papers that TAI safety donors will want to fund. To me, that means the papers that are building toward TAI safety results more than most papers are. That’s what I’m trying to get across by “motivated”.
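To make that a bit more concrete, here is a minimal sketch of the decision rule I have in mind. The two boolean flags are hypothetical labels that a human reviewer would assign after reading the paper; nothing here is computed automatically, and it’s meant to illustrate the desiderata above rather than to be the actual procedure.

```python
from dataclasses import dataclass

@dataclass
class PaperJudgment:
    """Hypothetical labels a human reviewer assigns after reading a paper."""
    explicitly_about_tai_safety: bool  # the paper itself frames its goal as TAI safety
    motivated_by_tai_safety: bool      # topic unusually likely to lead to TAI safety results

# Note: per desideratum 1, nothing below depends on the paper's importance or quality.
def include(p: PaperJudgment) -> bool:
    # Desideratum 2: papers explicitly about TAI safety are always included.
    if p.explicitly_about_tai_safety:
        return True
    # Desiderata 3 and 4: mere relevance to safety is not enough, but we also don't
    # require the paper to mention TAI safety explicitly; instead we ask whether the
    # topic is "motivated" in the sense described above.
    return p.motivated_by_tai_safety
```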
Perhaps that is still too vague. I’m very interested in your alternative suggestions!
Thanks, that’s helpful. If you’re saying that the stricter criterion would also apply to DM/CHAI/etc. papers then I’m not as worried about bias against younger researchers.
Regarding your 4 criteria, I think they don’t really delineate how to make the sort of judgment calls we’re discussing here, so it really seems like what’s needed is a 5th criterion that does delineate that. I’m not sure yet how to formulate one that is time-efficient, so I’m going to bracket that for now (recognizing that might be less useful for you), since I think we actually disagree in principle about what papers are building towards TAI safety.
To elaborate, let’s take verification as an example (since it’s relevant to the Wong & Kolter paper). Lots of people think verification is helpful for TAI safety—MIRI has talked about it in the past, and very long-termist people like Paul Christiano are excited about it as a current direction afaik. If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria. Now the reality is that verification does have lots of traction in the academic community, but why is that? It’s because Wong & Kolter and Raghunathan et al. wrote two early papers that provided promising paths forward on neural net verification, which many other people are now trying to expand on. This seems strictly better to me than the MIRI example, so it seems like either:
-The hypothetical MIRI work shouldn’t have made the cut
-There are actually two types of verification work (call them VerA and VerB), such that the hypothetical MIRI group was working on the relevant VerA, while the above papers are VerB, which is not relevant.
-Papers should make the cut on factors other than actual impact, e.g. perhaps the MIRI papers should be included because they’re from MIRI, or you should want to highlight them more because they didn’t get traction.
-Something else I’m missing?
I definitely agree that you shouldn’t just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included (e.g. I think the initial adversarial examples papers by Szegedy and Goodfellow should be included on similar grounds).
Also, in terms of alternatives: I’m not sure how time-expensive this is, but some ideas for discovering additional work:
-Following citation trails (esp. to highly-cited papers)
-Going to the personal webpages of authors of relevant papers, to see if there’s more (also similarly for faculty webpages)
Regarding your 4 criteria, I think they don’t really delineate how to make the sort of judgment calls we’re discussing here, so it really seems like what’s needed is a 5th criterion that does delineate that.
Sorry I was unclear. Those were just 4 desiderata that the criteria need to satisfy; the desiderata weren’t intended to fully specify the criteria.
If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria.
Certainly possible, but I think this would partly be because MIRI would explicitly talk in their paper about the (putative) connection to TAI safety, which makes it a lot easier for me to see. (Alternative interpretation: it would be tricking me, a non-expert, into thinking there was more of a substantive connection to TAI safety than is actually there.) I am trying not to penalize researchers for failing to talk explicitly about TAI, but I am limited.
I think it’s more likely the database has inconsistencies of the kind you’re pointing at from CHAI, OpenAI, and (as you’ve mentioned) DeepMind, since these organizations have a self-described (partial) safety focus while still doing lots of non-safety and near-term-safety research. When confronted with such inconsistencies, I will lean heavily toward not including any of them since this seems like the only feasible choice given my resources. In other words, I select your first option: “The hypothetical MIRI work shouldn’t have made the cut”.
I definitely agree that you shouldn’t just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included
Here I understand you to be suggesting that we use a notability criterion that can make up for the connection to TAI safety being less direct. I am very open to this suggestion, and indeed I think an ideal database would use criteria like this. (It would make the database more useful to both researchers and donors.) My chief concern is just that I have no way to do this right now because I am not in a position to judge the notability. Even after looking at the abstracts of the work by Raghunathan et al. and Wong & Kolter, I, as a layman, am unable to tell that they are quite notable.
Now, I could certainly infer notability by (1) talking to people like you and/or (2) looking at a citation trail. (Note that a citation count is insufficient because I’d need to know it’s well cited by TAI safety papers specifically.) But this is just not at all feasible for me to do for a bunch of papers, much less every paper that initially looked equally promising to my untrained eyes. This database is a personal side project, not my day job. So I really need some expert collaborators or, at the least, some experts who are willing to judge batches of papers based on some fixed set of criteria.
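To illustrate the “well cited by TAI safety papers specifically” check, here is a rough sketch of what it might look like, assuming access to the public Semantic Scholar Graph API (the endpoint and field names are my best understanding and would need double-checking) and a hypothetical safety_paper_ids whitelist of papers already classified as TAI safety work:

```python
# Rough sketch only: counts how many papers from a given whitelist cite a paper.
# Assumes the Semantic Scholar Graph API's /paper/{id}/citations endpoint;
# `safety_paper_ids` is a hypothetical set of Semantic Scholar paper IDs for
# papers already classified as TAI safety work.
import requests

API = "https://api.semanticscholar.org/graph/v1/paper"

def safety_citation_count(paper_id: str, safety_paper_ids: set) -> int:
    count, offset = 0, 0
    while True:
        resp = requests.get(
            f"{API}/{paper_id}/citations",
            params={"fields": "paperId,title", "offset": offset, "limit": 500},
        )
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        for entry in batch:
            if entry.get("citingPaper", {}).get("paperId") in safety_paper_ids:
                count += 1
        offset += len(batch)
    return count
```

Even with something like this, someone still has to curate the whitelist and interpret the numbers, which is where the expertise bottleneck comes right back in.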
Also, in terms of alternatives: I’m not sure how time-expensive this is, but some ideas for discovering additional work:
-Following citation trails (esp. to highly-cited papers)
-Going to the personal webpages of authors of relevant papers, to see if there’s more (also similarly for faculty webpages)
Sure, sure, we tried doing both of these. But they were just taking way too long in terms of new papers surfaced per hour worked. (Hence me asking for things that are more efficient than looking at reference lists from review articles and emailing the orgs.) Following the correct (promising) citation trail also relies more heavily on technical expertise, which neither Angelica nor I have.
I would love to have some collaborators with expertise in the field to assist on the next version. As mentioned, I think it would make a good side project for a grad student, so feel free to nudge yours to contact us!