I’m not sure whether I agree that “quietly lowering the bar at the last minute so you can meet requirements isn’t how safety policies are supposed to work”. (I’m not sure I disagree either, but I’m going to try to articulate a case against it.)
I think in a world where you understand the risks well ahead of time, of course this isn’t how safety policies should work. But in a world where you don’t understand the risks well ahead of time, you can gain clarity as the key moments approach, and this could lead you to rationally judge that a lower bar is appropriate. In a regime of voluntary safety policies, the idea, as I understand it, is that each actor makes its own judgements about what policy would be safe. So you could absolutely expect to see this pattern some of the time from actors simply following their own best judgements, even when those judgements are unbiased.
Of course:
This is a pattern that you could also see because judgements are getting biased (your discussion of the object-level merits of the change is very relevant to this question).
The difficulty, from the outside, of distinguishing cases where the change reflects a fair judgement from cases where it reflects a biased one is a major reason to potentially prefer regimes other than “voluntary safety policy”. But I don’t really think it’s good for people in the voluntary-safety-policy regime to act as though they’re already under a different regime (e.g. because this might slow down actually moving to a different regime).
(Someone might object that if people can make these changes, there’s little point in voluntary safety policies—the company will ultimately do what it thinks is good! I think there is something to this objection, and that these voluntary policies provide less assurance than is commonly supposed; nonetheless, I still think they are valuable, not for the commitments they provide but for the transparency they provide about how companies at a given moment are thinking about the tradeoffs.)
From my perspective, a large part of the point of safety policies is that people can comment on the policies in advance and provide some pressure toward better policies. If policies are changed at the last minute, then the world may not have time to understand the change and respond before it is too late.
So, I think it’s good to create an expectation/norm that you shouldn’t substantially weaken a policy right as it is being applied. That’s not to say that a reasonable company would never do this, just that I think it should by default be considered somewhat bad, particularly if no satisfactory explanation is given. In this case, I find the object-level justification for the change somewhat dubious (at least for the AI R&D trigger), and there is also no explanation of why this change was made at the last minute.
I guess I’m fairly sympathetic to this. It makes me think that voluntary safety policies should ideally include some meta-commentary about how companies view the purpose and value-add of the safety policy, along with meta-policies about how updates to the safety policy will be made. In particular, it might be good to specify a period for public comment before a change is implemented. (Even a short period could add some value.)
I appreciate the investigation here.