First, note that we’re not looking for “proven” solutions; that seems unrealistic. (See comments from Tsvi and Nate elsewhere.) That aside, I’ll interpret this question as asking: “if your research programs succeed, how do you ensure that the results are used in practice?” This question has no simple answer, because the right strategy would likely vary significantly depending on exactly what the results looked like, our relationships with leading AGI teams at the time, and many other factors.
For example:
What sort of results do we have? The strategy differs depending on whether MIRI researchers develop a generic set of tools for aligning arbitrary AGI systems, versus tools that only work for building a sufficiently aligned but very limited task-directed AI, and so on.[1]
How dangerous do the results seem? Designs for alignable AI systems could feasibly yield insight into how to construct misaligned AI systems; in that case, we’d have to be more careful with the tools. (Bostrom wrote about issues surrounding openness here.)[2]
While the strategy would depend quite a bit on the specifics, I can say the following things in general:
We currently have pretty good relationships with many of the leading AI teams, and most of the leading teams are fairly safety-conscious. If we made a breakthrough in AI alignment, and an expert could easily tell upon inspection that the tools were useful, I think it is very reasonable to expect that the current leading teams would eagerly adopt those tools.
The “pass a law that every AGI must be built a certain way” idea does not seem feasible to me in this context.
In the ideal case, the world would coordinate around the creation of AGI (perhaps via a single collaborative project), in which case there would be more or less only one team that needed to adopt the tools.
In short, my answer here is “AI scientists tend to be reasonable people, and it currently seems reasonable to expect that if we develop alignment tools that clearly work then they’ll use them.”
[1] MIRI’s current focus is mainly on improving the odds that the kinds of advanced AI systems researchers develop down the road are alignable, i.e., that they’re the kinds of systems we can understand deeply and in enough detail to safely use them for various “general-AI-ish” objectives.
[2] On the other hand, sharing sufficiently early-stage alignment ideas may be useful for redirecting research energies toward safety research, or toward capabilities research on relatively alignable systems. What we would do depends not only on the results themselves, but on the state of the rest of the field.