Ben gives a great example of how the “alignment problem” might look different than we expect:
The case of the house-cleaning robot
Problem: We don’t know how to build a simulated robot that cleans houses well
Available techniques aren’t suitable:
- Simple hand-coded reward functions (e.g. dust minimization) won’t produce the desired behavior (see the sketch after this list)
- We don’t have enough data (or sufficiently relevant data) for imitation learning
- Existing reward modeling approaches are probably insufficient (sketched further below)
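To make the first bullet concrete, here is a minimal Python sketch of a hand-coded dust reward and the specification gaming it invites. The environment and numbers are hypothetical, purely for illustration; the point is that the reward-maximizing policy manufactures mess and re-vacuums it rather than cleaning the house.

```python
# Hypothetical toy sketch (not from any real codebase) of a hand-coded
# "dust collected" reward and how it can be gamed.

def dust_collected_reward(prev_dust: float, curr_dust: float) -> float:
    """Pay the robot for dust it picks up this step; making mess is free."""
    return max(0.0, prev_dust - curr_dust)

# An honest policy that cleans the initial 5 units of dust earns 5 in total.
# A gaming policy that alternates spilling and vacuuming earns more:
dust_levels = [5.0, 8.0, 3.0, 7.0, 2.0]  # spill, clean, spill, clean ...
total = sum(dust_collected_reward(a, b)
            for a, b in zip(dust_levels, dust_levels[1:]))
print(total)  # 10.0 -- double the honest reward, unbounded if the loop continues
```

A signed variant (`prev_dust - curr_dust`) closes this particular loop, but it still says nothing about curtains shredded or vases knocked over along the way, which is the deeper reason simple hand-coded rewards fall short.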
This is sort of an “AI alignment problem,” insofar as techniques currently classified as “alignment techniques” will probably be needed to solve it. But it also seems very different from the AI alignment problem as classically conceived.
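For context on the last bullet in the list above: “reward modeling” usually means learning a reward function from human feedback, e.g. pairwise preference comparisons in the style of Christiano et al.’s “Deep RL from Human Preferences.” A minimal sketch of the idea, with a Bradley-Terry-style loss; the features, names, and numbers here are hypothetical:

```python
import math

def reward_model(trajectory, w):
    """Linear reward over hand-picked features: (dust removed, objects broken)."""
    dust_removed, objects_broken = trajectory
    return w[0] * dust_removed + w[1] * objects_broken

def preference_loss(traj_a, traj_b, w):
    """Cross-entropy loss for a human judgment 'traj_a preferred over traj_b'."""
    ra, rb = reward_model(traj_a, w), reward_model(traj_b, w)
    p_a = 1.0 / (1.0 + math.exp(rb - ra))  # P(a preferred) under Bradley-Terry
    return -math.log(p_a)

# Example: the human prefers the trajectory that broke nothing.
tidy = (3.0, 0.0)      # (dust removed, objects broken)
careless = (5.0, 2.0)
w = [1.0, -4.0]
print(preference_loss(tidy, careless, w))  # ~0.0025: model agrees with the human
```

Even with perfect optimization, the learned reward is only as good as the features it sees and the comparisons the human can reliably make, which is one reason the slide hedges that existing approaches are “probably insufficient.”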
...
One possible interpretation: If we can’t develop “alignment” techniques soon enough, we will instead build powerful and destructive dust-minimizers
A more natural interpretation: We won’t have highly capable house-cleaning robots until we make progress on “alignment” techniques
I’ve concluded that the process orthogonality thesis is less likely to apply to real AI systems than I had assumed (i.e. I’ve updated downward), and that the “alignment problem” as originally conceived is therefore less likely to affect AI systems deployed in the real world. However, I don’t feel ready to dismiss all potential global catastrophic risks from imperfectly designed AI (e.g. multi-multi failures); I’d rather be safe than sorry.