One thing to note is that there are varying degrees of solving the technical alignment problem. In particular, you've solved the alignment problem more thoroughly if you've made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me "hey, we think we've solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented", then I'd think the main focus of the applied alignment community should be applying these techniques to the most capable currently available ML systems, in the hope of working out all the kinks in them, and then repeating this every year, so that whenever it comes time to actually build AGI with these techniques, the relevant lab can just hire all the applied alignment people who are experts on them and have them apply them. (You might call this "fire drills for AI safety", or having an "anytime alignment plan"; someone else coined the latter term, though I don't remember who.)
Assuming that it’s taking too long to solve the technical alignment problem, what might be some of our other best interventions to reduce x-risk from AI? E.g., regulation, institutions for fostering cooperation and coordination between AI labs, public pressure on AI labs/other actors to slow deployment, …
I normally focus my effort on the question "how do we solve the technical alignment problem, make it as convenient as possible to build aligned systems, and then ensure that the relevant capabilities labs put effort into using these alignment techniques", rather than on your question, because the former seems relatively tractable compared to causing things to go well in worlds like those you describe.
One way of thinking about your question is to ask how many years the deployment of existentially risky AI could be delayed (which might buy time to solve the alignment problem). I don't have super strong takes on this question. I think there are many reasonable-seeming interventions, including all of those you describe. I guess I'm more optimistic about regulation and voluntary coordination between AI labs (e.g., I'm happy about "Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project." from the OpenAI Charter) than about public pressure, but I'm not confident.
If we solve the technical alignment problem in time, what do you think are the other major sources of AI-related x-risk that remain? How likely do you think these are, compared to x-risk from not solving the technical alignment problem in time?
Again, I think that maybe 30% of AI accident risk comes from situations where we sort of solved the alignment problem in time but the relevant labs don't use the known solutions. Excluding that, I think misuse risk is serious and worth worrying about, though I don't know how much value is destroyed in expectation by AI misuse compared to AI accident. I can also imagine narrow AI contributing to x-risk in various ways.