Thanks, that’s helpful. If you’re saying that the stricter criterion would also apply to DM/CHAI/etc. papers then I’m not as worried about bias against younger researchers.
Regarding your 4 criteria, I think they don’t really delineate how to make the sort of judgment calls we’re discussing here, so it really seems like what’s needed is a 5th criterion that does delineate that. I’m not sure yet how to formulate one that is time-efficient, so I’m going to bracket that for now (recognizing that might be less useful for you), since I think we actually disagree in principle about which papers are building towards TAI safety.
To elaborate, let’s take verification as an example (since it’s relevant to the Wong & Kolter paper). Lots of people think verification is helpful for TAI safety—MIRI has talked about it in the past, and very long-termist people like Paul Christiano are excited about it as a current direction, as far as I know. If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria. Now, the reality is that verification does have lots of traction in the academic community, but why is that? It’s because Wong & Kolter and Raghunathan et al. wrote two early papers that provided promising paths forward on neural net verification, which many other people are now trying to expand on. This seems strictly better to me than the MIRI example, so it seems that one of the following must hold:
- The hypothetical MIRI work shouldn’t have made the cut.
- There are actually two types of verification work (call them VerA and VerB), such that hypothetical MIRI was working on VerA, which is relevant, while the above papers are VerB, which is not.
- Papers should make the cut on factors other than actual impact, e.g. perhaps the MIRI papers should be included because they’re from MIRI, or you should want to highlight them more because they didn’t get traction.
- Something else I’m missing?
I definitely agree that you shouldn’t just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included (e.g. I think the initial adversarial examples papers by Szegedy and Goodfellow should be included on similar grounds).
Regarding your 4 criteria, I think they don’t really delineate how to make the sort of judgment calls we’re discussing here, so it really seems like what’s needed is a 5th criterion that does delineate that.
Sorry I was unclear. Those were just 4 desiderata that the criteria need to satisfy; the desiderata weren’t intended to fully specify the criteria.
If a small group of researchers at MIRI were trying to do work on verification but not getting much traction in the academic community, my intuition is that their papers would reliably meet your criteria.
Certainly possible, but I think this would partly be because MIRI would explicitly talk in their paper about the (putative) connection to TAI safety, which makes it a lot easier for me to see. (Alternative interpretation: it would be tricking me, a non-expert, into thinking there was more of a substantive connection to TAI safety than is actually there.) I am trying not to penalize researchers for failing to talk explicitly about TAI, but my ability to do so is limited.
I think it’s more likely that the database has inconsistencies of the kind you’re pointing at from CHAI, OpenAI, and (as you’ve mentioned) DeepMind, since these organizations have a self-described (partial) safety focus while still doing lots of non-safety and near-term-safety research. When confronted with such inconsistencies, I will lean heavily toward not including any of the papers, since this seems like the only feasible choice given my resources. In other words, I select your first option: “The hypothetical MIRI work shouldn’t have made the cut”.
I definitely agree that you shouldn’t just include every paper on robustness or verification, but perhaps at least early work that led to an important/productive/TAI-relevant line should be included
Here I understand you to be suggesting that we use a notability criterion that can make up for the connection to TAI safety being less direct. I am very open to this suggestion, and indeed I think an ideal database would use criteria like this. (It would make the database more useful to both researchers and donors.) My chief concern is just that I have no way to do this right now because I am not in a position to judge the notability. Even after looking at the abstracts of the work by Raghunathan et al. and Wong & Kolter, I, as a layman, am unable to tell that they are quite notable.
Now, I could certainly infer notability by (1) talking to people like you and/or (2) looking at a citation trail. (Note that a citation count is insufficient because I’d need to know it’s well cited by TAI safety papers specifically.) But this is just not at all feasible for me to do for a bunch of papers, much less every paper that initially looked equally promising to my untrained eyes. This database is a personal side project, not my day job. So I really need some expert collaborators or, at the least, some experts who are willing to judge batches of papers based on some fixed set of criteria.
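To make the citation-trail point concrete, here is a minimal sketch (in Python, with made-up paper IDs and hand-built data structures; it does not call any real citation API) of what “well cited by TAI safety papers specifically” could look like mechanically: given the set of papers already judged to belong in the database and a map from each candidate paper to the papers that cite it, count only the citations coming from known TAI safety papers.

```python
# Hypothetical sketch: count "TAI-safety-weighted" citations for candidate papers.
# Inputs are assumed to already exist (e.g. exported from the database and from
# some citation index); nothing here queries a real service.

from typing import Dict, List, Set


def tai_weighted_citations(
    candidates: List[str],
    citing_map: Dict[str, List[str]],  # paper ID -> IDs of papers that cite it
    tai_safety_ids: Set[str],          # IDs of papers already in the TAI safety database
) -> Dict[str, int]:
    """For each candidate, count only the citations that come from known
    TAI safety papers, rather than using the raw citation count."""
    counts: Dict[str, int] = {}
    for paper_id in candidates:
        citers = citing_map.get(paper_id, [])
        counts[paper_id] = sum(1 for c in citers if c in tai_safety_ids)
    return counts


if __name__ == "__main__":
    # Made-up example: "wong-kolter-2018" scores 2 because two papers already
    # in the database cite it; the raw citation count (3) would overstate this.
    citing = {
        "wong-kolter-2018": ["safety-paper-a", "safety-paper-b", "unrelated-paper"],
        "some-other-paper": ["unrelated-paper"],
    }
    known_safety = {"safety-paper-a", "safety-paper-b"}
    print(tai_weighted_citations(["wong-kolter-2018", "some-other-paper"], citing, known_safety))
```

Of course, this only pushes the problem back a step: someone still has to assemble the citation map and maintain the set of known TAI safety papers, which is exactly the kind of expert input described above.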