In discussions on the difficulty of aligning transformative AI, I’ve seen reference class arguments like “When engineers build and deploy things, it rarely turns out to be destructive.”
I’ve always felt like this is pointing at the wrong reference class.
My above comment on framings explains why. I think the reference class for AI alignment difficulty should be more like: “When have the people who deployed transformative technology correctly foreseen long-term bad societal consequences and taken the right costly steps to mitigate them?”
(Examples could be: keeping a new technology secret; or Facebook, in an alternate history, setting up a governance structure where “our algorithm affects society poorly” would receive a lot of sincere attention even at management levels, reliably, throughout the company’s existence.)
Admittedly, I’m kind of lumping together the alignment and coordination problems. Someone could have the view that “AI alignment,” with a narrow definition of what counts as “aligned,” is comparatively easy, but coordination could still be hard.