How crucial a role do you expect x-risk-motivated AI alignment will play in making things go well? What are the main factors you expect will influence this? (e.g. the occurrence of medium-scale alignment failures as warning shots)
We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)
I’m pretty unsure here. Today, my guess is something like a 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved it. But I don’t have explicit models here and just made the latter two numbers up right now; I wouldn’t be surprised if they moved noticeably after two hours of thought. I guess one thing you might take from these numbers is that I think x-risk-motivated AI alignment output is really important.
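For what it’s worth, those three guesses happen to be consistent with a simple log-linear relationship: roughly a ten-percentage-point change in P(doom) per doubling or halving of output. The sketch below is just my own toy gloss on the numbers above, not a model the answer actually proposes; the function name, the log-linear form, and the clamping to [0, 1] are assumptions for illustration.

```python
import math

# Toy gloss on the guesses above (not the author's model): 35% at half the current
# quality-adjusted x-risk-motivated alignment output, 25% at the current level, and
# 15% at double the output all lie on a log-linear curve that drops ~10 percentage
# points of P(doom) per doubling of output.
def p_doom(output_multiplier: float) -> float:
    """Hypothetical interpolation of P(doom) as a function of the output multiplier."""
    return min(1.0, max(0.0, 0.25 - 0.10 * math.log2(output_multiplier)))

for m in (0.5, 1.0, 2.0):
    print(f"{m:g}x output -> P(doom) ~ {p_doom(m):.0%}")
```

Obviously this is just one way to interpolate between three made-up points; it shouldn’t be read as a claim about the actual shape of the curve.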
What are the main factors you expect will influence this? (e.g. the occurrence of medium-scale alignment failures as warning shots)
I definitely think that AI x-risk seems lower in worlds where we expect medium-scale alignment failure warning shots. I’m not sure whether x-risk-motivated alignment research is less important in those worlds, though: even if everyone thinks that AI is potentially dangerous, we still need scalable solutions to alignment problems, and I don’t see a reliable route that takes us directly from “people are concerned” to “people solve the problem”.
I think the main factor that affects the importance of x-risk-motivated alignment research is whether it turns out that most of the alignment problem occurs in miniature in sub-AGI systems. If so, much more of the work required for aligning AGI will be done by people who aren’t thinking about how to reduce x-risk.