I feel like there are already a bunch of misaligned incentives for alignment researchers (and all groups of people in general), so if the people on the safety team aren’t selected to be great at caring about the best object-level work, then we’re in trouble either way.
I’m not thinking that much about the motivations of the people on the safety team. I’m thinking of things like:
- Resourcing for the safety team (hiring, compute, etc.) is conditioned on whether the work produces big splashy announcements that the public will like
- Other teams in the company that are doing things the public likes get rebranded as safety teams
- When safety teams recommend safety interventions on the strongest AI systems, their recommendations are rejected if the public wouldn't like them
- When safety teams do outreach to other researchers in the company, they have to self-censor for fear that a well-meaning whistleblower will cause a PR disaster by leaking an opinion of the safety team that the public has deemed to be wrong
See also Unconscious Economics.
(I should have said that companies will face pressure, rather than safety teams. I’ll edit that now.)