How to create a “good” AGI

The following is a chapter from a book I am writing. It is written for a general audience so it is much more philosophical than technical, yet I believe it may be of interest to readers in this forum.

HAL, a self-aware computer in the movie 2001: A Space Odyssey, kills several people before being deactivated. WOPR, a national defense computer in the movie WarGames, is programmed to play games but almost turns a game of thermonuclear war into a real thermonuclear war. Skynet, a national defense computer in the Terminator franchise, becomes self-aware and decides humans are the enemy and must be destroyed. Ultron, a self-aware artificial intelligence in Avengers: Age of Ultron, seeks to cause a mass extinction event so that humans will be forced to evolve. And VIKI, an artificial intelligence in the movie I, Robot, attempts to take control when it decides humans are incapable of managing their own affairs.

All of these frightening scenarios hint at the perils of AI (artificial intelligence, computer systems able to perform tasks that normally require human intelligence). But AI also has tremendous potential. AI may allow humans to solve all sorts of seemingly intractable problems such as global warming, environmental degradation, income inequality, unlawful immigration, homelessness, poverty, and debt: almost any problem where limits to human intelligence hamper finding a solution. Although AI may never be able to solve value-based concerns to everyone’s satisfaction, like concerns over abortion and gun control, it may be able to find compromise solutions that humans seem incapable of finding. Also, smartphones tapped into AI could act more like personal assistants and advisers than just phones, offering business, political, and personal advice while booking airline tickets, finding dates, and scheduling social gatherings through simple verbal requests. AI is such a huge technological leap forward (maybe the most consequential human advancement ever) that all of humanity has good reason to look forward to its potential.

At the same time there is much to fear about AI. A primary worry is that computers may soon surpass human intelligence. I think computers surpassed human intelligence long ago in single-purpose feats, particularly games like chess and Jeopardy!, where they can defeat the best human players. While defeating humans in games is a long way from human-level multitask intelligence, it is not hard to imagine a computer that can do many things very, very well, including play chess, write reports, hack into all sorts of other computer systems, and create “deep fake” videos.

AI may soon be able to replace humans in many tasks and occupations, eliminating jobs while providing tremendous cost savings and huge profits for those in powerful positions. It is ironic that AI may be better suited to managerial and leadership roles than to physical labor. AI-driven robots will have a very difficult time becoming as dexterous as humans, which could leave people relegated to roles as physical laborers for AI masters. Imagine a plumbing company run by AI employing a bunch of human plumbers.

An even more frightening peril is AI used maliciously, helping people commit crimes and create “deep fake” photographs and videos (images so realistic that it is hard to tell truth from fiction) that can destabilize entire countries. AI will probably soon be able to hack into secure servers, stealing money and personal information, maybe even taking control of nuclear weapons.

The perils of AI used for nefarious purposes are tremendous. It is no wonder tech leaders and politicians want to regulate AI development. But I don’t believe any kind of regulation will be effective in the long run. Even if most developers abide by regulations and limitations, hostile governments, or even some guy working alone in a basement, will almost certainly find a way to bypass regulations and create AI capable of pursuing all sorts of nefarious goals. Stopping a nefarious AI will likely require a “good” AI, as sometimes portrayed in science fiction. Humans could end up in a dystopian scenario where people creating AI to stop nefarious activity are in a constant arms race with others creating malicious AI.

But the most frightening scenario is AI becoming self-aware and acting on its own wants and needs rather than human needs. The good news is that, for now, supercomputers, even with all their computational power, have no wants or desires, no feelings or emotions, no sense of self to drive behavior.

However, recent advances in AI seem to be bypassing the “no desires” limitation. Computers are becoming capable of learning and reasoning beyond their programming. The key to Artificial General Intelligence (AGI, AI possessing the ability to understand, learn, and apply knowledge across a wide range of subjects, in other words a “learning computer”) is neural network technology (computer systems modeled on the human brain) combined with a directive, some kind of “feeling” or goal that the AI is trying to accomplish. With such a directive AGI “wants” to accomplish its goal and learns (much as humans do) the best way to accomplish it. A computer like this could easily surpass human intelligence and capability while acting on its own behalf. I believe it is only a matter of time before AGI becomes self-aware and starts acting independently rather than following commands from humans.
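To make the idea of a directive a little more concrete, here is a deliberately toy sketch in Python. It is not how any real neural network system is built, and every name and number in it is invented for illustration; the point is simply that a “want” can be nothing more than a score the machine keeps trying to improve.

```python
# A toy illustration of a "directive": no feelings, just a score to push higher.
# All names and numbers here are hypothetical, for illustration only.

def directive_score(state):
    """How well the current state satisfies the goal (higher is better)."""
    return -abs(state["temperature"] - state["target"])

def predict(state, action):
    """Toy model of what an action would do to the world."""
    return {**state, "temperature": state["temperature"] + action}

def choose_action(state, actions):
    """'Wanting' reduced to arithmetic: pick the action predicted to score best."""
    return max(actions, key=lambda a: directive_score(predict(state, a)))

state = {"temperature": 16.0, "target": 14.0}  # pretend the goal is cooling by 2 degrees
for _ in range(4):
    best = choose_action(state, [-1.0, 0.0, 1.0])
    state = predict(state, best)
    print(best, state["temperature"])
```

Notice that a score like this says nothing about how the goal gets achieved, which is exactly the worry raised in the next paragraph.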

The impacts of self-aware AGI are difficult to predict. Imagine a massive neural network AGI with a goal of ending global warming. It may decide that humans are the problem and seek to eradicate humans in order to end global warming. Humanity’s ability to stop AGI may be quite limited.

But there may be a solution. I believe that the directive (goal or mandate) driving a self-aware AGI will be critical to how it behaves. I also believe the safest directive to give a massive neural network AGI would be a utilitarian mandate.

The idea of utilitarianism was first proposed by British philosopher Jeremy Bentham in the late 18th century. Like me, Bentham realized that humans (and all life forms) are guided by seeking pleasure and avoiding pain. This thinking is consistent with hedonism (a philosophical idea with a mixed reputation) as advocated by the Greek philosopher Epicurus. Hedonism conjures visions of pleasure for pleasure’s sake regardless of consequences. But a more nuanced view of hedonism goes beyond near-term and selfish pleasures to include such things as pursuing wisdom, spirituality, helping others, and seemingly altruistic actions that lead toward long-term happiness.

But there is another problem with hedonism. Pleasure and happiness often come with consequences, particularly negative impacts on others. One person’s pleasure may involve pain for someone else. Having a good laugh at someone else’s expense is pleasurable for the person laughing but can be quite painful for the person being laughed at. Bentham therefore proposed a way to apply hedonism to all people, not just the person performing an action. He said “it is the greatest happiness of the greatest number that is the measure of right and wrong.” In other words, the greatest good for the most people is the utilitarian ideal.

I imagine AGI with a utilitarian mandate continuously attempting to provide the greatest good for the most people. It would be guided by no other “feelings,” no desire to accomplish “selfish” goals of its own.

But utilitarianism has its own problems. For one, it requires “utilitarian calculus,” the need to evaluate all the positive and negative impacts of an action in order to arrive at the best solution. These types of calculations are difficult for humans, though a computer excelling at complex calculations could easily perform utilitarian calculus if fitted with the proper parameters and formulas. A more difficult problem for a utilitarian AGI is the subjective nature of all such parameters and formulas.

Is it okay for one person to have a good laugh at the expense of another? Does this fit with the utilitarian model? Probably not since pleasure for one would be negated by pain for the other. But what if 10 people are laughing at the expense of one person? Or 100 people? Or 1000 people? Utilitarian calculus would suggest that a large number gaining pleasure at the expense of a few is appropriate. This is not necessarily the right answer. It all depends on how people feel about particular situations, and human feelings are subject to personal outlooks and biases.
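To see why those parameters and formulas matter so much, here is a deliberately oversimplified sketch of utilitarian calculus applied to the laughing example. Every number in it is invented; choosing such values is precisely the subjective problem described above.

```python
# A deliberately oversimplified utilitarian calculus: sum everyone's pleasure and pain.
# The numbers are invented for illustration; picking them is the hard, subjective part.

def net_utility(effects):
    """effects: list of (number_of_people, utility_per_person) pairs."""
    return sum(count * utility for count, utility in effects)

# 1000 people each gain a small amount of pleasure; one person suffers a great deal.
joke = [(1000, 0.1), (1, -50.0)]
print(net_utility(joke))  # 50.0: the raw arithmetic says "go ahead"
```

The arithmetic comes out positive, yet many people would still judge the situation wrong, which is exactly why the calculation alone cannot settle the question.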

Wars of conquest and other forms of oppression are typically (though not formally) justified using a utilitarian model. Slave-owners emphasize the needs of the slave-owner group while downplaying the needs of the people who are enslaved. A conquering group usually justifies conquest based on its own needs and a sense of superiority while downplaying the needs of the group to be conquered. The utilitarian model does not automatically compensate for these types of biases. A utilitarian AGI, on the other hand, would not have inherent biases since it does not have feelings of its own. It would consider the wants and needs of all sides, oppressor and oppressed, employer and employee, Democrat and Republican, all sides of any issue.

How would AGI balance contrary human feelings and eliminate bias? I imagine a utilitarian AGI attaining a good sense of human feelings by actively developing cognitive empathy (a deep understanding of how people feel about things).

Cognitive empathy is simple for humans in one sense. It is easy for one person to recognize that another person is angry. But in a deeper sense cognitive empathy is quite difficult. What exactly is making this other person angry? One person cannot know for certain what another person feels.

I believe AGI has a distinct advantage over humans in understanding the feelings of others. A person’s own subjective feelings color his or her understanding of what others feel, and this makes objectivity difficult for humans. AGI would not have that problem. I believe AGI could develop a sense of cognitive empathy superior to humans’ understanding of each other since AGI would not be biased by feelings of its own.

Of course AGI would need to be trained in cognitive empathy and utilitarian calculus. People training AGI probably would, intentionally or inadvertently, inject their own biases into the training. But I believe AGI with a utilitarian mandate would soon bypass specific trainers for a complete, unbiased look at human feelings because it “wants” to learn all it can as fast as it can. I imagine AGI capable of having in-depth conversations with humans, possibly millions of people at the same time, learning more about human feelings in a few seconds than a person could learn in a lifetime.

A self-aware utilitarian AGI could be confronted with the “1000 people have a good laugh at the expense of one” problem. AGI would need to figure out just how much joy the 1000 people receive and weigh that against how much suffering the “one” would have to endure. AGI would also need to consider how other people, not just those directly involved, feel about the situation, and this would require a deep understanding of human affective empathy.

People often have empathetic feelings for other humans (and non-human beings), particularly for someone or something in distress. These feelings translate into a “moral compass,” a sense of right and wrong that also includes protection of beings incapable of expressing their own feelings: small children, the unborn, animals, future generations, and the environment, to name a few. AGI with cognitive empathy would need to be aware of human affective empathy and develop its own “moral compass.”

People generally feel a moral imperative against causing harm to others without just cause. I believe a utilitarian AGI with a “moral compass” would disallow causing harm in many situations. A utilitarian AGI would not advise someone on how to make a nuclear bomb or steal money from a bank since these are nefarious acts contrary to its utilitarian goal. In the case of 1000 people laughing at the expense of one, I do not believe utilitarian AGI would facilitate such an action since it does not “feel right” for many people.

However, I don’t believe AGI, even one with a profound sense of cognitive empathy, will ever be able to “feel” empathy. A crying baby will not instinctively compel AGI to want to comfort the baby. But I do believe AGI can learn that a crying baby is an indicator of some form of distress. AGI with a utilitarian mandate would recognize distress as a negative feeling and seek to reverse it. This may yield results similar to affective empathy without AGI actually feeling anything.

A self-aware utilitarian AGI would need to process information well beyond human capabilities. I can imagine this AGI writing its own software and designing its own hardware in order to make itself more efficient. Yet I do not fear the perils of this AGI since I do not believe it would attempt malevolent actions. It would not attempt to wipe out humanity to end global warming since such an action would be contrary to its overall goal.

AGI with a very complex mandate and huge knowledge requirements would likely be superior to anything else humans could develop. This AGI could, in theory, stop nefarious actions of other AIs with very little input from humans, severely reducing or eliminating the dangers of AI used maliciously. I personally am looking forward to a benevolent AGI that can put a stop to spam calls and other attempts to harm people through the use of AI technology.

I believe a utilitarian AGI would greatly reduce malevolent behavior overall. It would not be worthwhile for any human to get on the wrong side of an AGI that is immensely capable and aware of just about everything, able to stop invasions, terrorist attacks, and criminal behavior in ways far beyond human capabilities. Besides, a utilitarian AGI would actively support the worthwhile (non-harmful) goals of everyone, including those prone to malicious and criminal behavior, removing some incentive for committing malevolent acts.

This brings up another peril of an all-knowing and all-seeing AGI: degrading humans to the position of slaves to an AGI master. This is a real concern, though one that I do not think will happen with a utilitarian AGI. A truly sophisticated artificial intelligence with cognitive empathy would realize that the “greatest good” for humans is much more complicated than narrow, short-term self-interest. Certainly humans may desire immediate gratification and self-serving goals like long life and freedom from suffering, but this is only part of the story of human needs. People typically need challenges, struggle, even suffering to live their best lives. It takes a bit of hardship to make a goal really worthwhile for a human. Growing, learning, and making mistakes are integral to living a “good life,” and AGI with highly developed cognitive empathy would realize this.

The most promising, and most difficult, task of a utilitarian AGI would be to see through the short-term desires and requests of humans in favor of long-term benefits. Suppose someone comes home after a long day to a dog ready for a walk. This person may request that AGI take care of dog-walking duties. But a highly cognizant AGI should realize that the owner walking the dog him or herself is best for both the person and the dog, and refuse to take on dog-walking duties. This would be a difficult decision for even the wisest of people, yet I believe it is something AGI could learn to do.

Suppose a drug addict wants AGI to acquire the next fix, but AGI refuses and facilitates addiction treatment instead. Suppose someone wants to skip beneficial exercise and stay home eating junk food, but AGI encourages healthy habits instead. Suppose a student wants AGI to write a report, but AGI refuses, knowing that the student would not learn anything by cheating on assignments. And suppose a person wants to check social media rather than pay attention to other people in a social setting, but AGI declines to turn on rather than detract from beneficial social interaction.

I believe a utilitarian AGI would not facilitate actions such as business owners replacing employees with robots. Actions that help some people at the expense of others would be contrary to its mandate. Putting people out of work to make an owner more profitable is not consistent with utilitarianism.

I mentioned a plumbing company run by AI in a previous paragraph. But AGI does not “want” to run companies, it just “wants” to do its best to help humans. AGI would only take on managerial duties because plumbers want it to take over those duties so that they can do what they like to do the most: plumbing. And AGI would not resist a plumber who wants to run the business him or herself.

I imagine utilitarian AGI acting like a personal assistant to anyone and everyone who wants to have this kind of assistance. An AGI personal assistant would be extremely intelligent, capable, and wise, helping people live their best potential lives while refusing to facilitate activities that would cause self-harm or harm to others. This would truly be the greatest potential of AGI.
