Several people whom I respect hold the view that AI safety work might itself be dangerous. For example, here's Alexander Berger tweeting about it.
A brief list of potential risks:
Conflicts of interest: Much AI safety work is done by the companies that develop AIs. Max Tegmark makes this analogy: What would we think if a large part of climate change research were done by oil companies, or a large part of lung cancer research by tobacco companies? This situation probably weakens AI safety research. There is also the risk that it improves the reputation of AI companies, letting their non-safety work advance faster and more boldly. And it means safety is delegated to a subteam rather than being everyone's responsibility (unlike, say, information security, which is usually treated as everyone's job).
Speeding up AI: Even well-meaning safety work likely speeds up the overall development of AI. For example, interpretability seems really promising for safety, but at the same time it is practically a prerequisite for deploying a powerful AI system. If you look at (for example) the recent papers from anthropic.com, you will find many techniques that are also generally useful for building AIs.
Information hazard: I admire work like the Slaughterbots video from the Future of Life Institute. Yet it has clear infohazard potential. Similarly, Michael Nielsen writes “Afaict talking a lot about AI risk has clearly increased it quite a bit (many of the most talented people I know working on actual AI were influenced to by Bostrom.)”
Other failure modes mentioned by MichaelStJules:
creating a false sense of security,
publishing the results of the GPT models, demonstrating AI capabilities and showing the world how much further they can already be pushed, thereby accelerating AI development, or
slowing AI development more in countries that care about safety than in those that care less, risking a much worse AGI takeover if it matters who builds it first.