Aligning the Aligners: Ensuring Aligned AI acts for the common good of all mankind
The argument I’m going to make here is obviously not original to me, but I think that while AI governance is talked about, and is one of the career paths that 80,000 Hours recommends, it is not sufficiently highlighted as an important cause area. Further, I think the specific issue I’m pointing to is even more neglected than AI governance in general.
Suppose we create a superhuman AI that does what we tell it to.
This creates the possibility of immediately transforming into a post-scarcity economy. Nobody needs to work. Everybody gets as much as they want of the goods that used to be scarce because of limited human labor.
This would be amazing!
Each of us could own a yacht, or two. The limit on how many yachts we can have is space on the water, not how expensive the boats are. We can own as many clothes as we want. If anyone wants an authentic 18th-century costume, it can be hand-sewn immediately.
If anyone gets sick, they will have access to a flawless surgeon using the most advanced medical knowledge, and they won’t need to wait for an appointment to talk to a doctor. Every single child will have access to private tutors who know more than all of the tutors that any child has ever had, put together.
An actual superintelligence could give us delicious zero-suffering meat and probably a malaria vaccine; it could eliminate all, or nearly all, childhood diseases.
AI could create a world where nobody needs to desperately work and hustle just to get by, and where no one ever needs to worry about whether there will be food and shelter.
Etc, etc, etc.
There is a reason that techno-optimist transhumanists think friendly AI can create an amazingly good world.
So, let’s assume for a moment that DeepMind, OpenAI, or a black project run by the Chinese government successfully creates a superhuman intelligence that does what its creators ask and will not betray them.
Does this super awesome world actually come into existence?
Reasons to worry (definitely not exhaustive!)
Power corrupts
Capabilities researchers have been reading about how awesome singletons are.
Even worse, some of the suits might have read about it too. Sergey and Elon definitely have.
The leftist critique: a small number of powerful people will make decisions about the fate of people who have no say in them, and they will treat our collective human resources as their own private toys.
I take this issue very seriously.
A good world is one where everyone has control over their own fate and is no longer at the mercy of impersonal forces that they can neither understand nor manipulate.
Further, a good world is one in which people in difficult, impoverished, and non-normative circumstances are able to make choices that make their lives go well, as they see it.
The Nationalism problem
Suppose AI developed in the US successfully stays under democratic control, and it is used purely to aggrandize the wealth and well-being of Americans by locking in America’s dominance over all the resources in the solar system, and the light cone, forever.
Poor Malawians are still second- or third-class citizens on Earth, still receiving only drips of charity from those who silently consider themselves their betters.
We could have fixed poverty forever instead.
Suppose instead that AI is developed in China.
Its developers establish a regime built on communist principles, with social control over communication everywhere on the planet; this regime keeps everyone, everywhere, forever parroting communist slogans.
Worse: suppose they don’t give any charity to the poor at all? There is precedent for dominant groups simply treating the poor around them as parasites or work animals. Perhaps whoever controls the AI will starve, or directly kill, all other humans.
Summary: Misaligned people controlling the AI would be bad.
This issue is connected to the principal-agent problem (of which aligning AI itself is an example).
How do we make sure that the people or institutions in power act with everyone’s best interests in mind?
What systems can we put in place to hold the people/institutions in power accountable to the promises they make?
How do we shape the incentive landscape so that the people/institutions in power act to maximize well-being (without creating adverse effects)?
How do we make sure we give power only to the actors who have a sufficient understanding of the most pressing issues and are committed to tackling them?
So what can we do about this? Two approaches that I have heard about:
Limiting the financial upside of developing AI to a finite quantity that is small relative to the output of a Dyson swarm. For example:
Windfall clauses
Profit-capping arrangements, like the one I believe OpenAI has
Ad hoc, after-the-fact government taxes and seizures
The Moon Treaty, and giving the whole global community collective ownership of outer space resources
Making sure that if AI is developed, it comes only out of a limited number of highly regulated, government-controlled entities, where part of the regulatory framework ensures a broad distribution of the benefits, at least to the citizens of the country where it was built. This centralization might also have substantial safety benefits.
The problem with any approach that tries to control AI after it is developed is that we cannot trust the legal system to constrain the behavior of someone in control of a singleton following a fast-takeoff scenario. There need to be safeguards embedded in these companies that are capable of physically forcing the group that built the AI to do what they promised to do with it, and these safeguards need to be built into the structure of how any AI that might develop into a singleton is trained and built.
This should be part of the AI safety regulatory framework, and might be used as part of what convinces the broader public that AI safety regulation is necessary in the first place (it would actually be bad, even if you are a libertarian, if AI is just used to satisfy the desires of rich people).
All of this only becomes a problem if we actually solve the general alignment problem of creating a system that does what its developers want it to do. Your estimate of p(AGI doom) will drive whether you think this is worth working on.
This is also an effective and possibly tractable place to focus on systemic change. A world system that ensures everyone gets a sufficient share of global resources to meet their needs after full automation will likely require major legal and institutional changes, possibly of the same magnitude as a switch to communism or anarcho-capitalism would require.
The value of improving a post-aligned-AI future is multiplied by the probability that we actually reach that future. So if you think the odds are one in a million that AI is safely developed, the expected value of efforts in this direction is far lower than if you believe the odds of AI killing us all are one in a million.
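As a minimal sketch of that expected-value claim (where \(\Delta V\) is a placeholder I’m introducing for how much work in this direction improves the post-alignment world):

\[
\text{EV} \;=\; p(\text{alignment succeeds}) \times \Delta V
\]

If \(p(\text{alignment succeeds}) = 10^{-6}\), then \(\text{EV} = 10^{-6}\,\Delta V\); if instead \(p(\text{doom}) = 10^{-6}\), so that \(p(\text{alignment succeeds}) \approx 1\), then \(\text{EV} \approx \Delta V\), roughly a million times larger.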
But if we do clear that (possibly unlikely) bar of not dying from AI, there will still be more work to be done to create utopia.
I’d like to thank Milan, Laszlo, Marta, Gergo, Richard and David for their comments on the draft text of this essay.