I think it is fair to say that, so far, alignment research is not a standard research area in academic machine learning, unlike, for example, model interpretability. Do you think it would be desirable for it to become one, and if so, what would need to happen?
In particular, I had this toy idea of making progress legible to academic journals: formulating problems and metrics that are “publishing-friendly” could, despite the problems that optimizing for flawed metrics brings, allow researchers at regular universities to conduct work in these areas.
It would definitely be good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. non-x-risk-motivated researchers might produce valuable contributions, and 2. x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.
Three versions of this:
1. Somehow cause academia to intrinsically care about reducing x-risk, and also ensure that the power structures in academia have a good understanding of the problem, so that academia's own quality-control mechanisms cause academics to do useful work. I feel pretty pessimistic about the viability of convincing large swathes of academia to care about the right thing for the right reasons. Historically, basically the only way people have ended up thinking about alignment research in a way that I'm excited about is by spending a really long time thinking about AI x-risk and talking about it with other interested people. And so I'm not very optimistic about this first version.
2. Just get academics to do useful work on specific problems that seem relevant to x-risk. For example, I'm fairly excited about some work on interpretability and some techniques for adversarial robustness. On the other hand, my sense is that EA funders have on many occasions tried to get academics to do useful work on topics of EA interest, and have generally found it quite difficult, which makes me pessimistic about this version too. Perhaps an analogy here is: suppose you're Google, there's some problem you need solved, and there's an academic field with relevant expertise. How hard should you try to get academics in that field excited about working on the problem? It seems plausible to me that you shouldn't try that hard; you'd be better off having a higher-touch relationship where you employ researchers or make specific grants, rather than trying to convince the field to care about the subproblem intrinsically (even if in some sense they should).
3. Get academics to feel generally positive towards x-risk-motivated alignment research, even if they don't try to work on it themselves. This seems useful and more tractable.