It would be bad to create significant public pressure for a pause through advocacy, because this would cause relevant actors (particularly AGI labs) to spend their effort on looking good to the public, rather than doing what is actually good.
I think I can reasonably model the safety teams at AGI labs as genuinely trying to do good. But I don’t know that the AGI labs as organizations are best modeled as trying to do good, rather than optimizing for objectives like outperforming competitors, attracting investment, and advancing exciting capabilities – subject to some safety-related concerns from leadership. That said, public pressure could manifest itself in a variety of ways, some of which might work toward more or less productive goals.
I agree that conditional pauses are better than unconditional pauses, due to pragmatic factors. But I worry about AGI labs specification-gaming their way through dangerous-capability evaluations, applying brittle band-aid fixes that don’t meaningfully contribute to safety.
I will go further—it’s definitely the latter one for at least Google DeepMind and OpenAI; Anthropic is arguable. I still think that’s a much better situation than having public pressure when the ask is very nuanced (as it would be for alignment research).
For example, I’m currently glad that the EA community does not have the power to exert much pressure on the work done by the safety teams at AGI labs. The EA community’s opinions on what alignment research should be done are bad, and the community lacks the self-awareness to notice this itself; if it had that power, safety teams at labs would have to spend hundreds of hours writing detailed posts explaining their work just to defuse the pressure.
At least there I expect the safety teams could defuse the pressure by spending hundreds of hours writing detailed posts, because EAs will read that. With the public there’s no hope of that; safety teams just won’t do the work they think is good, and will instead do work they can sell to the public.