That makes sense. But like Greg_Colbourn says, it seems like a non-trivial assumption that alignment research will become significantly more productive with newer systems.
Also, different researchers may expect very different degrees of “more productive.” It seems plausible to me that we could learn more about the motivations of AI models once we move to a paradigm that isn’t just “training next-token prediction on everything on the internet.” At the same time, it seems outlandish to me that there’d ever come a point where new systems could help us with the harder parts of alignment (due to the expert delegation problem: when your assistants may not all be competent and well-intentioned, delegating well becomes impossible unless you already have the expertise yourself).