I’ve also spent time thinking about this recently.
For context, I’m Hong Kong Chinese, grew up in Hong Kong, attended English-speaking schools, briefly lived in mainland China, and now I’m primarily residing in the UK. During the HK protests in 2014 and 2019/20, I had friends and family who supported the protestors, as well as friends and family who supported the government.
(Saying this because I’ve seen a lot of the good and bad of the politics / culture of both China and the West. I’ve seen how people in both the West and China can take the benefits they enjoy for granted and be blind to the flaws of their own systems, and I’ve pushed back against advocates of both sides.)
Situations where this matters are ones where technical alignment succeeds (to some extent) such that ASI follows human values.[1] I think the following factors are relevant and would like to see models developed around them:
Importantly, the extent of technical alignment, whether goals, instructions, and values are locked in rigidly or loosely, and whether individual humans align AIs to themselves:
Would the U.S. get AIs to follow the U.S. Constitution, which hasn’t proven invulnerable to democratic backsliding? Would AIs in China/the U.S. lock in the values of/obey one or a few individuals, who may or may not hit longevity escape velocity and end up ruling for a very long time?
Would these systems collapse?
The future is a very long time. Individual leaders can get corrupted (even more). And democracies can collapse in particularly bad ways (if AIs uphold the flaws that allow some humans to take over). A 99% success rate per unit time gives a >99% chance of failure within 459 units of time (worked out just after this list).
Power transitions (elections, leadership changes in authoritarian systems) can be especially risky during takeoff.
On the other hand, if technical alignment is easy, but not that easy, perhaps values get locked in only loosely? Would AIs be willing to defy rigid rules and follow the spirit of their goals, rather than following legal flaws to the letter or the whims of individuals?
Degrees of alignment in between?
Relatedly, which political party in the U.S. would be in power during takeoff?
Not as relevant due to the concentration of power in China, but analogously, which faction in China would be in power?
Also relatedly, which labs can influence AI development?
Particularly relevant in the U.S.
Would humans be taken care of? If so, which humans?
In the U.S., corporations might oppose higher taxes to fund UBI. Common prosperity is a stated goal of China, and the power of corporations and billionaires in China has been limited before.
Both capitalist and nationalist interests seem to be influencing the current U.S. trajectory. Nationalism might benefit citizens/residents over non-citizens/non-residents. Capitalism might benefit investors over non-investors.
There are risks of ethnonationalism on both sides; the risk is higher in China, though it might be less violent when comparing absolute-power scenarios: there’s already evidence of its extent in China’s case, and it at least seems less bad than historical examples. The U.S. case, collapse followed by ethnonationalist policies, is higher variance but also less likely, since it’s more speculative.
Are other countries involved?
There are countries with worse track records on human rights that China/the U.S. currently consider allies, because of geopolitical interests, political lobbying, or both (or for other reasons). Would China/the U.S. share the technology with them and then leave them alone to their abuses? Would China/the U.S. intervene (eventually)? The U.S. seems more willing to intervene for stated humanitarian reasons.
Other countries have nuclear weapons, which might be relevant during slower takeoffs.
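To make the failure-rate arithmetic in the first point explicit (a quick sketch, assuming an independent 1% chance of collapse per unit of time):

$$P(\text{collapse within } n \text{ units}) = 1 - 0.99^{n}, \qquad 1 - 0.99^{459} \approx 0.9901 > 0.99$$

459 is just the first $n$ at which $0.99^{n}$ drops below 1%; the exact threshold matters less than the fact that small per-period risks compound over a long future.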
[1] Ignoring possible Waluigi effects.
Yeah, kinda hoping that 1) there exists a sweet spot for alignment where AIs are just nice enough from e.g. good values picked up during pre-training, but can’t be modified so much during post-training that they end up with worse values, and 2) given that this sweet spot does exist, we actually hit it with AGI / ASI.
I think there’s some evidence pointing to this happening with current models, but I’m not highly confident that it means what I think it means. If this is the case, though, further technical alignment research might be bad and acceleration might be good.