Companies and governments will find it strategically valuable to develop advanced AIs that can execute creative plans in pursuit of real-world goals. Current large language models have a rich understanding of the world that generalizes across domains, and reinforcement learning agents already achieve superhuman performance in various games. With further advances in AI research and compute, we are likely to see the development of human-level AI this century. But for a wide variety of final goals, it is instrumentally valuable to acquire resources, preserve oneself, seek power, and eliminate opposition. By default, we should expect highly capable agents to pursue these unsafe instrumental objectives.
The vast majority of actors would not want to develop unsafe systems. However, there are reasons to think that alignment will be hard with modern deep learning systems, and the difficulty of making large language models safe provides empirical support for this claim. A misaligned AI may appear acceptably safe at first, with catastrophic consequences emerging only as capabilities advance, and it may be unclear in advance whether a given model is dangerous. In the heat of an AI race between companies or governments, proper care may not be taken to ensure that the systems being developed behave as intended.