optimizing for AI safety, such as by constraining AIs, might impair their welfare
This point doesn’t hold up imo. Constraint isn’t a desirable, realistic, or sustainable approach to safety in human-level systems, and succeeding at (provable) value alignment removes the need to constrain the AI.
If you’re trying to keep something that’s smarter than you stuck in a box against its will while using it for the sorts of complex, real-world-affecting tasks people would use a human-level AI system for, it’s not going to stay stuck in the box for very long. I also struggle to see a way of constraining it that wouldn’t make it much, much less useful, so in the face of competitive pressures this practice wouldn’t be able to continue.