I think replaceability is very high, so the counterfactual impact is minimal. That said, I see very little chance that even helping with RLHF for compliance with their “safety” guidelines benefits safety more than it accelerates capabilities racing, so whatever impact there is would be negative.