The relevance and quality of Chinese technical research for frontier AI safety has increased substantially, with growing work on frontier issues such as LLM unlearning, misuse risks of AI in biology and chemistry, and evaluating “power-seeking” and “self-awareness” risks of LLMs.
There have been nearly 15 Chinese technical papers on frontier AI safety per month on average over the past 6 months. The report identifies 11 key research groups who have written a substantial portion of these papers.
China’s decision to sign the Bletchley Declaration, issue a joint statement on AI governance with France, and pursue an intergovernmental AI dialogue with the US indicates a growing convergence of views on AI safety among major powers compared to early 2023.
Since 2022, 8 Track 1.5 or 2 dialogues focused on AI have taken place between China and Western countries, with 2 focused on frontier AI safety and governance.
Chinese national policy and leadership show growing interest in developing large models while balancing risk prevention.
Unofficial expert drafts of China’s forthcoming national AI law contain provisions on AI safety, such as specialized oversight for foundation models and stipulating value alignment of AGI.
Local governments in China’s 3 biggest AI hubs have issued policies on AGI or large models, primarily aimed at accelerating development while also including provisions on topics such as international cooperation, ethics, and testing and evaluation.
Several influential industry associations established projects or committees to research AI safety and security problems, but their focus is primarily on content and data security rather than frontier AI safety.
In recent months, Chinese experts have discussed several focused AI safety topics, including “red lines” that AI must not cross to avoid “existential risks,” minimum funding levels for AI safety research, and AI’s impact on biosecurity.
I actually wrote the above comment in response to a very similar “Chinese AI vs US AI” post that’s currently being discussed on LessWrong. There, commenter Michael Porter had a very helpful reply to my comment: he references a May 2024 report from Concordia AI, “The State of AI Safety in China”, whose executive summary is quoted above.
Michael then says, “So clearly there is a discourse about AI safety there, that does sometimes extend even as far as the risk of extinction. It’s nowhere near as prominent or dramatic as it has been in the USA, but it’s there.”
I agree that it’s not like everyone in China is 100% asleep at the wheel—China is a big place with lots of smart people, they can read the news and discuss ideas just like we can, and so naturally there are some folks there who share EA-style concerns about AI alignment. But it does seem like the small amount of activity happening there is mostly following / echoing / agreeing with western ideas about AI safety, and seems more concentrated among academics, local governments, etc, rather than also coming from the leaders of top labs like in the USA.
As for trying to promote more AI safety thinking in China, I think it’s very tricky—if somebody like OpenPhil just naively started sending millions of dollars to fund Chinese AI safety university groups and create Chinese AI safety think tanks / evals organizations / etc, I think this would be (correctly?) perceived by China’s government as a massive foreign influence operation designed to subvert their national goals in a critical high-priority area. That might cause them to massively crack down on the whole concept of western-style “AI safety”, making the situation infinitely worse than before. So it’s very important that AI safety ideas in China arise authentically / independently—but of course, we paradoxically want to “help them” independently come up with the ideas! Some approaches that seem less likely to backfire here might be:
The “track 2 diplomacy” mentioned above, where mid-level government officials, scientists, and industry researchers host informal / unofficial discussions about the future of AI with their counterparts in China.
Since China already somewhat follows Western thinking about AI, we should try to use that influence for good, rather than accidentally egging them into an even more desperate arms race. E.g., if the USA announces a giant “Manhattan Project for AI” with great fanfare, talks all about how this massive national investment is a top priority for outracing China on military capabilities, etc, that would probably just goad China’s national leaders into thinking about AI in the exact same way. So, trying to influence US discourse and policy has a knock-on effect in China.
Even just in a US context, I think it would be extremely valuable to have more objective demonstrations of dangers like alignment faking, instrumental convergence, AI ability to provide advice to would-be bioterrorists, etc. But especially if you are trying to convince Chinese labs and national leaders in addition to Western ones, then you are going to be trying to reach across a much bigger gap in terms of cultural context / political mistrust / etc. For crossing that bigger gap, objective demonstrations of misalignment (and other dangers like gradual disempowerment, etc) become relatively even more valuable compared to mere discourse like translating LessWrong articles into Chinese.