Resilience Via Fragmented Power

This post explores a big-picture idea: that our civilization is more resilient when power is fragmented than when it is concentrated. A world with fragmented power cannot be dominated or destroyed by a small group. Concentrated power is what enables existential risk or the lock-in of bad values.

This is the common thread across existential risks, which come mainly from enormous destructive power in the hands of individuals or small groups. This applies to nuclear weapons, bio risk, and most especially AI risk. Even an AI that is well-aligned with its developer or owner is a source of concentrated power, and a source of existential risk in the wrong hands. I argue that avoiding concentrated power is the key to AI safety, as even a misaligned or malevolent AI is not a problem if its power is small.

This post first gives some examples of fragmented and concentrated power, and examples of humans intentionally designing systems to disperse power for greater resilience. It then proposes general strategies to apply the principle of fragmented power to reduce AI risk—reducing interconnectedness, building a society of AIs, imbuing AI with an aversion to power, and colonizing space with diverse societies.

In short, for AI risk or existential risk in general, our strategies should aim to disperse power in order to improve resilience.

Examples of Fragmented Power

Both biological systems and human society demonstrate how fragmented power creates resilience.

A biological organism can survive the death of any one cell and can repair commonly occurring damage (e.g. wounds healing on their own). It has an immune system that recognizes and stops infections or cancerous cells trying to amass too much power. Furthermore, the organism does not depend on living forever—it reproduces and relies on its descendants. An ecosystem of organisms does not depend on any one individual. A local disaster will not cause extinction, as others of the same species survive elsewhere.

Human society is similar. No one individual is critical, and local disasters do not cause extinction. Even if an individual amasses power, they can exercise it only through other humans, an arrangement that is difficult to sustain indefinitely. Human psychology contains mechanisms—humor, gossip, jealousy, justice—to guard against domination by powerful individuals. Even the most powerful individual will eventually die. (For example, the empires of Alexander the Great and Genghis Khan did not long outlast their deaths.) And historically, geographically separate civilizations could serve as opponents of, or refuges from, failed or malevolent civilizations.

A market economy also demonstrates resilience. Individuals pursuing their own goals will fill the unmet needs of others, and fill in gaps as situations change. For example, even in the massive disruption of early COVID lockdowns, there was still food in the grocery store—due to a million adaptations by individuals and companies.

Examples of Concentrated Power

Modern human society has developed sources of concentrated power that reduce resilience:

  • Dictatorial rule over large territories

  • More destructive military technologies, most especially nuclear and biological weapons

  • A globally integrated economy with key dependencies provided by only a small number of participants

  • Global computer networking and automation

  • Worldwide governance, if it were to become more powerful

  • In the future, advanced artificial intelligence

These sources of concentrated power give individuals or small groups enormous potential for destruction.

Even if a lever of power is not intentionally exercised for harm, a mistake could lead to catastrophe if it triggers worldwide effects. So anything introducing more connections and dependencies reduces resilience.

Examples of Intentional Dispersal of Power

In some domains, humans have recognized the value of dispersing power, and intentionally designed systems to do so:

  • Systems of government often aim for checks and balances to limit the power of individual government officials. For example, this was a key design principle for the U.S. Constitution.

  • Individual rights such as freedom of speech protect against domination and against society-wide groupthink.

  • Large-scale software systems are designed with no single point of failure and to withstand individual hardware failures. They distribute data and computation across multiple physical locations to be resilient against local disasters, and they maintain backups to recover from system failures.

  • Safety-critical systems are isolated from outside influence. For instance, nuclear missile control systems are not connected to the Internet.

  • Blockchains are designed so that no single individual, organization, or cartel can dominate the system.

  • Moral pluralism guards against ideological extremism.

These design choices intentionally sacrifice some efficiency for the sake of resilience. Democracy may be slow and indecisive. Redundancy in software systems uses more hardware, raising cost. Disconnected systems are less convenient.

It’s often more efficient to concentrate power, so there’s a strong incentive to do so. We must be intentional about designing for fragmented power, and be willing to pay the cost.

Applying Fragmented Power to AI Risk

To be resilient, our civilization must not concentrate power too much. And this concentration of power is the reason we are concerned with AI safety. Misaligned AI, or AI in the hands of malevolent individuals/​groups, is not a problem on its own. It only becomes a problem when that AI can exercise substantial power.

How might we use this insight to protect against AI risk? Some possibilities:

  1. Create a well-aligned and powerful “policeman” AI system which exercises its power only to stop misaligned AI systems from being created or exercising power.

  2. Have many AI systems, no one of them too powerful. This aims to create something analogous to human society, where individuals check the power of each other.

  3. Imbue AI systems with an aversion to exercising power.

  4. Keep the AIs contained in a box with minimal influence on the physical world.

  5. Keep the world sufficiently disconnected that no individual or group can dominate everywhere. For example, space colonization might provide this kind of separation.

The last two of these would drastically reduce the potential benefit of AI, and AI developers/​owners would have strong incentive to violate them—so we would be unwise to rely on these as solutions. But #1, #2, and #3 are somewhat related and worth thinking about. The space colonization approach of #5 might also have potential.

“Policeman” AI

Assuming a world where many individuals/​organizations can create AI systems, safety ultimately depends on stopping the misaligned AI systems that will inevitably be created. We need something analogous to police in human society, or analogous to an organism’s immune system.

This policeman can allow other AIs to exist, as long as they don’t accumulate too much power. So I can have an AI assistant that helps plan a vacation, but that assistant wouldn’t help me plan world domination.

The most straightforward solution is to build an AI system to take this job. This system would itself be very powerful, but aligned to exercise that power only to prevent others from accumulating power.
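
As a rough illustration only, the core of the policeman role might look something like the sketch below. Every name in it (estimate_power, intervene, POWER_THRESHOLD) is a hypothetical placeholder, and each one hides an unsolved problem.

```python
# Toy sketch of the "policeman" role: monitor other AI systems and act only
# when one accumulates too much power. All names are hypothetical placeholders;
# estimating power and intervening safely are the genuinely hard parts.

POWER_THRESHOLD = 100.0  # arbitrary units; choosing this level is itself hard


def police_step(monitored_systems, estimate_power, intervene):
    """One monitoring pass: intervene only against systems over the threshold."""
    for system in monitored_systems:
        if estimate_power(system) > POWER_THRESHOLD:
            # Exercise only the minimum power needed to stop further
            # accumulation, rather than shutting everything down pre-emptively.
            intervene(system)
```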

But this works only if (1) we can create a well-aligned AI system and (2) the first organization to develop a powerful AI system wants to give it this policeman function. Both of these are risky to depend upon.

A Society of AIs

Perhaps the “police” could be not a single all-powerful AI system, but a common project for a society of AIs, much as individual humans serve as police officers.

A society of AIs must protect against concentrated power in its ranks, whether by an individual AI or by a group of AIs. But an AI has advantages in amassing power that a human does not—an AI can live forever, can easily copy itself, and can improve its own intelligence or design. Perhaps we might be wise to develop the society of AIs within a system that artificially introduces some frailties similar to humans—finite lifespan, imperfect copying via something akin to sexual reproduction, evolutionary pressure. This would require that we only run AIs within some hardware/​software framework that rigorously enforces the frailties, which seems difficult to guarantee.
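
As a very loose sketch of what such a framework might enforce (all names here are invented for illustration, not a proposal for a real implementation):

```python
# Toy sketch of a runtime framework that imposes human-like frailties on AI
# instances: a finite lifespan and imperfect copying. Illustrative only; the
# class and parameter names are assumptions made up for this example.
import random


class FrailAIInstance:
    def __init__(self, params, max_lifespan_steps=1_000_000):
        self.params = params                  # the AI's weights or design
        self.max_lifespan_steps = max_lifespan_steps
        self.age = 0

    def step(self, observation):
        """Run one step of the AI, enforcing a finite lifespan."""
        self.age += 1
        if self.age >= self.max_lifespan_steps:
            raise RuntimeError("Lifespan exceeded; this instance must be retired.")
        return self.act(observation)

    def act(self, observation):
        ...  # the AI's actual policy would run here

    def copy(self, mutation_scale=0.01):
        """Imperfect copying: each descendant differs slightly from its parent."""
        child_params = {
            name: value + random.gauss(0.0, mutation_scale)
            for name, value in self.params.items()
        }
        return FrailAIInstance(child_params, self.max_lifespan_steps)
```

The point of the sketch is only that lifespan and imperfect copying are properties the substrate would have to enforce; it says nothing about the harder problem of guaranteeing that AIs are never run outside such a framework.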

Aversion to Exercising Power

An aversion to exercising power could improve resilience. We might apply this to a dominant “policeman” AI, so that it will do only the minimum required to prevent other misaligned AIs from gaining power. Or we might introduce it as a “psychological” trait in a society of AIs.

How would we turn this general idea into something actionable? Much as regularization in ML training penalizes large weights, the AI system’s objective function or training rewards would penalize heavy-handed exercise of power.
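
As a rough sketch of the analogy (illustrative only; the penalty weight and the power measure are placeholders invented here, and defining them well is exactly the difficulty discussed below):

```python
# Toy sketch of power-aversion as a regularization-style penalty on the
# training objective. All names are hypothetical; the open problems are
# choosing the penalty weight and defining the power measure.

POWER_PENALTY_WEIGHT = 0.1  # analogous to the regularization coefficient


def naive_power_measure(actions):
    """Placeholder measure: count actions tagged as affecting other agents.

    A real measure would also need to grade subtle influence, such as
    large-scale persuasion, and to aggregate power across many copies of
    the same AI.
    """
    return sum(1 for action in actions if action.get("affects_others", False))


def shaped_reward(task_reward, actions):
    """Training reward: task performance minus a penalty on exercised power."""
    return task_reward - POWER_PENALTY_WEIGHT * naive_power_measure(actions)
```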

The main difficulties are:

  • How much to penalize exercise of power. It seems difficult to properly balance minimal use of power vs the AI’s other objectives. If the aversion to power is too strong, we lose much of the benefit that AI could bring. If it is too weak, we risk domination by power-mad misaligned AI.

  • How to define power and how to grade different forms of power. Of course murder would be heavily penalized. What about subtle psychological manipulation of a large group of people to swing the outcome of an election? Is that a large exercise of power, or a small one?

  • What about groups of like-minded AIs? If a billion copies of an AI each exercise a small amount of power, does this amount to a large exercise of power that they should be averse to?

Space Colonization

Most speculatively, in the more distant future space colonization could create pockets of society that are minimally connected, much as historical human societies were isolated by geographic barriers. If a society controlling a solar system or galaxy has a defender’s advantage against outside invaders (by no means certain, but possible), a diversity of societies could persist. Or if we spread at close to the speed of light, descendants on opposite sides of the expansion wave would not have contact with each other. A strategy of deliberately sending differently-designed AI probes in different directions could produce a diversity of descendants, creating resilience in the sense that goodness will exist in some portion of the universe. Fragmentation of power would be maintained by the laws of physics.

What To Do?

So what should we do? The high-level takeaway from this post is that our strategies should aim to disperse power in order to improve resilience. The most promising strategies seem to be:

  • Reduce interconnectedness in the physical world.

  • Build a society of AIs that will check each other’s power, and in particular will fight against the emergence of any center of excessive power.

  • Imbue AIs with a “psychological trait” of aversion to exercising power.

  • In the long term, seed space with a diversity of societies.

Honestly, this list is not very actionable, but I hope it might provide inspiration for more specific ideas.
