From a long-termist perspective, I think that the more gradual AI progress is, the more important concerns about “bad attractor states” and “instability” become relative to concerns about AI safety/alignment failures. (See slides.)
I think it is probably true, though, that AI safety/alignment risk is more tractable than these other risks. To some extent, the solution to safety risk is for enough researchers to put their heads down and work really hard on technical problems; there’s probably some amount of research effort that would be enough, even if this quantity is very large. In contrast, the only way to avoid certain risks associated with “bad attractor states” might be to establish stable international institutions that are far stronger than any that have come before; there might be structural barriers here that no amount of research effort or insight would be enough to overcome.
I think it’s at least plausible that the most useful thing for AI safety and governance researchers to do is ultimately to focus on brain-in-a-box-ish AI risk scenarios, even if they’re not very likely relative to other scenarios. (This would still entail some amount of work that’s useful for multiple scenarios; there would also be instrumental reasons, related to skill-building and reputation-building, to work on present-day challenges.) But I have some not-fully-worked-out discomfort with this possibility.
One thing that I do feel comfortable saying is that more effort should go into assessing the tractability of different influence pathways, the likelihood of different kinds of risks beyond the classic version of AI risk, and related questions.