In general, it seems quite fruitful to me to examine in more detail whether, in fact, multipolarity of various kinds might alleviate concerns about value fragility. And to those who have the intuition that it would (especially in cases, like Multipolar value fragility, where agent A’s exact values aren’t shared by any of agents 1–n), I’d be curious to hear the case spelled out in more detail.
Here’s a case that I roughly believe: multipolarity raises the likelihood that one’s own values will be represented, because it gives each agent the opportunity to literally live in the world and act to bring about the outcomes they personally want.
This case is simple enough, and it’s consistent with the ordinary multipolarity the world already exhibits. Consider an entirely selfish person. Now, divide the world into two groups: the selfish person (whom we call Group A) and the rest of the world (which we call Group B).
Group A and Group B have very different values, even “upon reflection”. Group B is also millions or billions of times more powerful than Group A (as it comprises the entire world minus the selfish individual). Therefore, on a naive analysis, you might expect Group B to “take over the world” and then implement its values without any regard whatsoever for Group A. Because of the vast power differential, it would be “very easy” for Group B to achieve this world takeover. And such an outcome would indeed be very bad according to Group A’s values.
Of course, this naive analysis is flawed, because the real world is multipolar in an important respect: usually, Group B will let Group A (the individual) retain some autonomy and receive a tiny fraction of the world’s resources, rather than murdering Group A and taking all of their stuff. Group B will do this because of laws, moral norms, and respect for one’s fellow human beings. This multipolarity therefore sidesteps all the issues with value fragility, and allows Group A to achieve a pretty good outcome according to their values.
This is also my primary hope regarding misaligned AI. Even if misaligned AIs are collectively millions or billions of times more powerful than humans (or aligned AIs), I would hope that they would still allow the humans or aligned AIs some autonomy, leave us alone, and let us receive a sufficient fraction of resources to enjoy an OK outcome according to our values.