Problem: We donât know how to build a simulated robot that cleans houses well
Available techniques arenât suitable:
Simple hand-coded reward functions (e.g. dust minimization) wonât produce the desired behavior
We donât have enough data (or sufficiently relevant data) for imitation learning
Existing reward modeling approaches are probably insufficient
This is sort of an âAI alignment problem,â insofar as techniques currently classified as âalignment techniquesâ will probably be needed to solve it. But it also seems very different from the AI alignment problem as classically conceived.
...
One possible interpretation: If we canât develop âalignmentâ techniques soon enough, we will instead build powerful and destructive dust-minimizers
A more natural interpretation: We wonât have highly capable house-cleaning robots until we make progress on âalignmentâ techniques
Iâve concluded that the process orthogonality thesis is less likely to apply to real AI systems than I would have assumed (i.e. Iâve updated downward), and therefore, the âalignment problemâ as originally conceived is less likely to affect AI systems deployed in the real world. However, I donât feel ready to reject all potential global catastrophic risks from imperfectly designed AI (e.g. multi-multi failures), because Iâd rather be safe than sorry.
Ben gives a great example of how the âalignment problemâ might look different than we expect:
The case of the house-cleaning robot
Problem: We donât know how to build a simulated robot that cleans houses well
Available techniques arenât suitable:
Simple hand-coded reward functions (e.g. dust minimization) wonât produce the desired behavior
We donât have enough data (or sufficiently relevant data) for imitation learning
Existing reward modeling approaches are probably insufficient
This is sort of an âAI alignment problem,â insofar as techniques currently classified as âalignment techniquesâ will probably be needed to solve it. But it also seems very different from the AI alignment problem as classically conceived.
...
One possible interpretation: If we canât develop âalignmentâ techniques soon enough, we will instead build powerful and destructive dust-minimizers
A more natural interpretation: We wonât have highly capable house-cleaning robots until we make progress on âalignmentâ techniques
Iâve concluded that the process orthogonality thesis is less likely to apply to real AI systems than I would have assumed (i.e. Iâve updated downward), and therefore, the âalignment problemâ as originally conceived is less likely to affect AI systems deployed in the real world. However, I donât feel ready to reject all potential global catastrophic risks from imperfectly designed AI (e.g. multi-multi failures), because Iâd rather be safe than sorry.