One issue I have with this is that when someone calls this the ‘default’, I interpret them as implicitly making some prediction about the likelihood of such countermeasures not being taken. The issue is then that this is a very vague way to communicate one’s beliefs. How likely does an outcome need to be for it to count as the default? 90%? 70%? 50%? Something else?
The second concern is that it seems improbable that minimal or no safety measures will be taken, which makes it odd to treat this as a key baseline scenario. There is already substantial evidence that safety precautions are likely to be taken. For instance:
Most of the major AGI labs are investing substantially in safety (e.g. OpenAI has committed a substantial fraction of its compute budget to safety, a large fraction of Anthropic’s research staff appears to be dedicated to safety, etc.)
We already have a substantial amount of concrete empirical evidence that safety-enhancing innovations are important for unlocking economic value from AI systems (e.g. RLHF, constitutional AI, etc.)
It seems a priori very likely that alignment is important for unlocking economic value from AI, because alignment effectively increases the range of tasks that AI systems can do without substantial human oversight, and operating without such oversight is necessary for deriving value from automation
Major governments are interested in AI safety (e.g. the UK’s AI Safety Summit, the White House securing commitments around AI safety from AGI labs)
Maybe they think that the safety measures taken in a world where we observe this type of evidence will still fall far short of what is needed. However, it’s somewhat puzzling to be confident enough in this to label it the ‘default’ scenario at this point.