“Crunch time” has many meanings, but in this post it mostly means a period shortly before critical systems arrive, during which alignment research is much more productive. We don’t seem to be in that crunch time yet.
I agree that US domestic policy can lead to international law; that should be a consideration.
That makes sense. But like Greg_Colbourn says, it seems like a non-trivial assumption that alignment research will become significantly more productive with newer systems.
Also, different researchers may expect very different degrees of “more productive.” It seems plausible to me that we could learn more about the motivations of AI models once we move to a paradigm that isn’t just “training next-token prediction on everything on the internet.” At the same time, it seems outlandish to me that there’d ever come a point where new systems could help us with the harder parts of alignment (due to the expert delegation problem: delegating well to assistants who may not all be competent and well-intentioned is impossible if you don’t already have the expertise yourself).
Thanks. I don’t share the expectation that alignment research will be much more productive shortly before critical systems. At least not to a degree where it reduces relative risk. We should only have systems more advanced than those we’ve already got once we’ve solved mechanistic interpretability for the current ones (and we’re a long way from that: the frontier of interpretability research is GPT-2-sized models and smaller!). Also, I think there is a non-zero chance that the next generation of models will be critical, so we’re basically at crunch time now in terms of having a good shot at averting extinction.