Improving the quality/quantity of output from safety teams within AI labs has a (much) bigger impact on the perceived safety of the lab than on its actual safety. The perception effect is therefore the dominant term in the impact of the team's work. Right now that term is negative.
If the perception of safety is higher than actual safety, it will lead to underinvestment in future safety, which increases the probability that the system fails.
I would agree that this is a good summary: "If the perception of safety is higher than actual safety, it will lead to underinvestment in future safety, which increases the probability that the system fails."
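To make the mechanism concrete, here is a minimal toy model in Python. This is a sketch under my own assumptions: the exponential failure curve, the risk target, and every number below are illustrative choices, not anything from the discussion. The idea is that a lab sets its safety investment to hit a target level of *perceived* risk, while the failure probability depends on *actual* safety, so inflated perception directly buys less real safety:

```python
# Toy model (hypothetical, for illustration only): a lab chooses its safety
# investment based on *perceived* risk, but failure depends on *actual* risk.
# All functional forms and parameters are assumptions, not from the discussion.

import math

def failure_probability(actual_safety: float, investment: float) -> float:
    """Failure probability falls exponentially with total actual safety."""
    return math.exp(-(actual_safety + investment))

def chosen_investment(perceived_safety: float, target_failure: float = 0.01) -> float:
    """The lab invests just enough that *perceived* failure risk hits the target."""
    # Solve exp(-(perceived_safety + I)) = target_failure for I.
    return max(0.0, -math.log(target_failure) - perceived_safety)

actual = 1.0  # actual safety level (arbitrary units)

for perceived in (1.0, 2.0, 3.0):  # perception equal to, then above, actual
    invest = chosen_investment(perceived)
    p_fail = failure_probability(actual, invest)
    print(f"perceived={perceived:.1f} invest={invest:.2f} P(fail)={p_fail:.4f}")
```

Running this, the case where perception matches reality hits the 1% target, while each unit of inflated perception cuts investment and raises the actual failure probability (roughly 2.7% and 7.4% in the last two rows): exactly the underinvestment dynamic described above.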