I believe we are not currently in crunch time. I expect we will be able to predict crunch time decently well (say) a year in advance by noticing AI systems’ near-dangerous capabilities.
I also agree with David Manheim that the path matters, and that incremental steps such as a US moratorium are therefore likely net positive, especially considering that it is crunch time now. International treaties can be built from such a precedent, and the US is probably at least 1-2 years ahead of the rest of the world currently.
“Crunch time” has many meanings, but in this post it mostly means a period shortly before critical systems arrive, during which alignment research is much more productive. We don’t seem to be in that crunch time yet.
I agree that US domestic policy can lead to international law; that should be a consideration.
That makes sense. But like Greg_Colbourn says, it seems like a non-trivial assumption that alignment research will become significantly more productive with newer systems.
Also, different researchers may expect very different degrees of “more productive.” It seems plausible to me that we could learn more about the motivations of AI models once we move to a paradigm that isn’t just “training next-token prediction on everything on the internet.” At the same time, it seems outlandish to me that there’d ever come a point where new systems could help us with the harder parts of alignment, due to the expert delegation problem: delegating well to assistants who may not all be competent and well-intentioned is impossible if you don’t already have the expertise yourself.
Thanks. I don’t share the expectation that alignment research will be much more productive shortly before critical systems, at least not to a degree where it reduces relative risk. We should only build systems more advanced than those we’ve already got once we’ve solved mechanistic interpretability for the current ones (and we’re so far off that: the frontier of interpretability research is looking at GPT-2-sized models and smaller!). Also, I think there is a non-zero chance that the next generation of models will be critical, so we’re basically at crunch time now in terms of having a good shot at averting extinction.
We are already in crunch time, doubly so post GPT-4. What predictors are you using that aren’t yet being triggered?
I am actually interested in answers to my question; it wasn’t rhetorical (and I’m not sure why my comment was downvoted; disagreement votes are fine).
There’s also a lot of overlap between disagreeing with a post and how much you like it: if you disagree with something, you are more likely to not like it. I don’t love this about the voting system, but I don’t really have a better alternative to suggest.