I think an extremely dangerous failure mode for AI safety research would be to prioritize ‘hardcore-quantitative people’ trying to solve AI alignment using clever technical tricks, without understanding much about the 8 billion ordinary humans they’re trying to align AIs with.
If behavioral scientists aren’t involved in AI alignment research, including as skeptics about whether ‘alignment’ is even possible, it’s quite likely that whatever ‘alignment solution’ the hardcore-quantitative people invent is going to be brittle, inadequate, and risky.