Here are some things I think are fairly likely:
I think that there might be a bunch of progress on theoretical alignment, with various consequences:
More projects that look like “do applied research on various strategies to make imitative generalization work in practice”—that is, projects where the theory researchers have specific proposals for ML training schemes that have attractive alignment properties, but which have practical implementation questions that might require a bunch of effort to work out. I think that a lot of the impact from applied alignment research comes from making it easier for capabilities labs to adopt alignment schemes, and so I’m particularly excited for this kind of work.
More well-scoped, narrow theoretical problems, so that there are greater gains from parallelism among theory researchers.
A better sense of which kinds of practical research are useful.
I think I will probably be noticeably more optimistic or pessimistic—either there will be some plan for solving the problem that seems pretty legit to me, or else I’ll have updated substantially against such a plan existing.
We might have a clearer picture of AGI timelines. We might have better guesses about how early AGI will be trained. We might know more about empirical ML phenomena like scaling laws (which I think are somewhat relevant for alignment).
There will probably be a lot more industry interest in problems like “our pretrained model obviously knows a lot about topic X, but we don’t know how to elicit this knowledge from it.” I expect more interest because this becomes an increasingly important problem as pretrained models become more knowledgeable. I think this problem is pretty closely related to the alignment problem, so e.g. I expect that most research along the lines of Learning to Summarize with Human Feedback (a minimal sketch of that style of training appears after this list) will be done by people who need it for practical purposes, rather than by alignment researchers interested in the analogy to AGI alignment problems.
Hopefully we’ll have more large applied alignment projects, as various x-risk-motivated orgs like Redwood scale up.
Plausibly large funders like Open Philanthropy will start spending large amounts of money on funding alignment-relevant research through RFPs or other mechanisms.
Probably we’ll have much better resources for onboarding new people into cutting-edge thinking on alignment. I think these resources are already way better than they were two years ago, and I expect this trend to continue.
Similarly, I think that a bunch of arguments about futurism and technical alignment have now been written up much more clearly and carefully than they had been a few years ago, e.g. Joe Carlsmith’s report on x-risk from power-seeking AGI and Ajeya Cotra’s report on AGI timelines. I expect this trend to continue.
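Since one of the predictions above leans on the “elicit knowledge via human feedback” line of work, here is a minimal sketch of the pairwise preference objective used in work like Learning to Summarize with Human Feedback: a reward model is trained to score the output a human labeler preferred above the one they rejected, and that reward model is then used to fine-tune the policy. The tiny bag-of-embeddings model and random token ids below are placeholders I’m assuming purely for illustration; the actual work uses a pretrained language model with a scalar head, followed by RL fine-tuning against the learned reward.

```python
# Minimal, illustrative sketch of the pairwise reward-model loss from
# preference-based fine-tuning (as in Learning to Summarize with Human
# Feedback). TinyRewardModel is a stand-in for a pretrained LM with a
# scalar reward head; the data below is random and purely illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRewardModel(nn.Module):
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)  # scalar reward head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per sequence
        pooled = self.embed(token_ids).mean(dim=1)
        return self.score(pooled).squeeze(-1)


def preference_loss(model, chosen_ids, rejected_ids):
    # Bradley-Terry style objective: push reward(chosen) above reward(rejected).
    reward_chosen = model(chosen_ids)
    reward_rejected = model(rejected_ids)
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()


if __name__ == "__main__":
    model = TinyRewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake batch of token ids standing in for (output the labeler preferred,
    # output the labeler rejected) pairs.
    chosen = torch.randint(0, 1000, (8, 32))
    rejected = torch.randint(0, 1000, (8, 32))
    loss = preference_loss(model, chosen, rejected)
    loss.backward()
    optimizer.step()
    print(f"pairwise preference loss: {loss.item():.3f}")
```

In the real setup, the trained reward model then provides the reward signal for RL fine-tuning of the policy; the sketch only covers the reward-model step.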
What’s the main way that you think resources for onboarding people have improved?