But I am a bit at a loss as to why people in the AI safety field think it is possible to build safe AI systems in the first place. I guess as long as it has not been proven that the properties of a safe AI system contradict each other, you could assume it is theoretically possible. But when it comes to ML, the best performance achievable in practice is sadly often worse than the theoretical best.
To me, this belief that AI safety is hard or impossible would imply that AI x-risk is quite high. Then, I’d think that AI safety is very important but unfortunately intractable. Would you agree? Or maybe I misunderstood what you were trying to say.
I agree that x-risk from AI misuse is quite underexplored.
For what it’s worth, AI safety and governance researchers do assign significant probability to x-risk from AI misuse. AI Governance Week 3 — Effective Altruism Cambridge comments (I add a rough sum of the quoted figures after the excerpt):
For context on the field’s current perspectives on these questions, a 2020 survey of AI safety and governance researchers (Clarke et al., 2021) found that, on average [1], researchers currently guess there is: [2]
A 10% chance of existential catastrophe from misaligned, influence-seeking AI [3]
A 6% chance of existential catastrophe from AI-exacerbated war or AI misuse
A 7% chance of existential catastrophe from “other scenarios”
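Taken at face value and naively summed (this assumes the three scenario categories are mutually exclusive, which the excerpt's framing suggests but does not guarantee, and it is only my back-of-the-envelope reading rather than a figure reported by the survey itself), these guesses would imply an overall chance of existential catastrophe from AI of roughly

$$0.10 + 0.06 + 0.07 = 0.23,$$

i.e. on the order of 20%, which is why I would call the probability assigned to these scenarios significant rather than negligible.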
Companies and governments will find it strategically valuable to develop advanced AIs that are able to execute creative plans in pursuit of goals, achieving real-world outcomes. Current large language models have a rich understanding of the world that generalizes across domains, and reinforcement learning agents already achieve superhuman performance at various games. With further advances in AI research and compute, we are likely to see the development of human-level AI this century. However, for a wide variety of final goals, it is often useful to pursue instrumental goals such as acquiring resources, preserving oneself, seeking power, and eliminating opposition. By default, we should therefore expect that highly capable agents will have these unsafe instrumental objectives.
The vast majority of actors would not want to develop unsafe systems. However, there are reasons to think that alignment will be hard with modern deep learning systems, and the difficulty of making large language models safe provides empirical support for this claim. A misaligned AI may seem acceptably safe and only have catastrophic consequences after further advances in AI capabilities, and it may be unclear in advance whether a model is dangerous. In the heat of an AI race between companies or governments, proper care may not be taken to ensure that the systems being developed behave as intended.
(This is technically two paragraphs haha. You could merge them into one paragraph, but note that the second paragraph is mostly by Joshua Clymer.)