Boring bottom line up front: AI safety is sometimes a model property, in the following sense: It sometimes makes sense to perform safety interventions at the model level. Other safety interventions should be at the system or deployment context level. It takes skill and careful consideration to determine where in the AI deployment cycle the best interventions lie, and blanket statements excluding one or the other are not helpful.
AI Safety is Sometimes a Model Property
Link post
Arvind Narayanan and Sayash Kapoor at the AI Snake Oil newsletter have a recent post called “AI safety is not a model property.” I think they make a number of important points in that post, and some incorrect ones.
Boring bottom line up front: AI safety is sometimes a model property, in the following sense: It sometimes makes sense to perform safety interventions at the model level. Other safety interventions should be at the system or deployment context level. It takes skill and careful consideration to determine where in the AI deployment cycle the best interventions lie, and blanket statements excluding one or the other are not helpful.