Buck comments on My personal cruxes for working on AI safety

Buck 21 Feb 2020 5:55 UTC
For the problems-that-solve-themselves arguments, I feel like your examples have very “good” qualities for solving themselves: both personal and economic incentives are against them, they are obvious when one is confronted with the situation, and at the point where the problems become obvious, you can still solve them. I would argue that not all of these properties hold for AGI. What are your thoughts about that?
I agree that it’s an important question whether AGI has the right qualities to “solve itself”. To go through the ones you named:
- “Personal and economic incentives are aligned against them”—I think AI safety has somewhat good properties here. Basically no-one wants to kill everyone, and AI systems that aren’t aligned with their users are much less useful. On the other hand, it might be the case that people are strongly incentivised to be reckless and deploy things quickly.
- “they are obvious when one is confronted with the situation”—I think that alignment problems might be fairly obvious, especially if there’s a long process of continuous AI progress where unaligned non-superintelligent AI systems do non-catastrophic damage. So this comes down to questions about how rapid AI progress will be.
- “at the point where the problems become obvious, you can still solve them”: If the problems become obvious because non-superintelligent AI systems are behaving badly, then we may still be able to put more effort into aligning increasingly powerful AI systems after that, and hopefully we won’t lose much of the value of the future.