This is a crosspost from Time Magazine, which also appeared in full at a number of other unpaid news websites.
BY OTTO BARTEN AND ROMAN YAMPOLSKIY
Barten is director of the Existential Risk Observatory, an Amsterdam-based nonprofit.
Yampolskiy is a computer scientist at the University of Louisville, known for his work on AI Safety.
“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control,” mathematician and science fiction writer I.J. Good wrote over 60 years ago. These prophetic words are now more relevant than ever, with artificial intelligence (AI) gaining capabilities at breakneck speed.
In recent weeks, many jaws dropped as people witnessed the transformation of AI from a handy but decidedly unscary recommender algorithm into something that at times seemed to act worryingly humanlike. Some reporters were so shocked that they published their conversation histories with the large language model Bing Chat verbatim. And with good reason: few expected that what we thought were glorified autocomplete programs would suddenly threaten their users, refuse to carry out orders they found insulting, break security in an attempt to save a child’s life, or declare their love to us. Yet all of this happened.
It can already be overwhelming to think about the immediate consequences of these new models. How are we going to grade papers if any student can use AI? What are the effects of these models on our daily work? Any knowledge worker, who may have thought they would not be affected by automation in the foreseeable future, suddenly has cause for concern.
Beyond these direct consequences of currently existing models, however, awaits the more fundamental question of AI that has been on the table since the field’s inception: what if we succeed? That is, what if AI researchers manage to make Artificial General Intelligence (AGI), or an AI that can perform any cognitive task at human level?
Surprisingly few academics have seriously engaged with this question, despite working day and night to get to this point. It is obvious, though, that the consequences will be far-reaching, well beyond the consequences of even today’s best large language models. If remote work, for example, could be done just as well by an AGI, employers may be able to simply spin up a few new digital employees to perform any task. The job prospects, economic value, self-worth, and political power of anyone not owning the machines might therefore dwindle to nothing. Those who do own this technology could achieve nearly anything in very short periods of time. That might mean skyrocketing economic growth, but also a rise in inequality, while meritocracy would become obsolete.
But a true AGI could not only transform the world, it could also transform itself. Since AI research is one of the tasks an AGI could do better than us, it should be expected to improve the state of the art in AI. This might set off a positive feedback loop in which ever better AIs create ever better AIs, with no known theoretical limits.
This would perhaps be positive rather than alarming, were it not that this technology has the potential to become uncontrollable. Once an AI has a certain goal and self-improves, there is no known method for adjusting that goal. An AI should in fact be expected to resist any such attempt, since goal modification would endanger the achievement of its current goal. Also, instrumental convergence predicts that an AI, whatever its goals, might begin by self-improving and acquiring more resources once it is sufficiently capable of doing so, since this would help it achieve whatever further goal it might have.
In such a scenario, AI would become capable enough to influence the physical world, while still being misaligned. For example, AI could use natural language to influence people, possibly using social networks. It could use its intelligence to acquire economic resources. Or AI could use hardware, for example by hacking into existing systems. Another example might be an AI that is asked to create a universal vaccine for a virus like COVID-19. That AI could understand that the virus mutates in humans, and conclude that having fewer humans will limit mutations and make its job easier. The vaccine it develops might therefore contain a feature to increase infertility or even increase mortality.
It is therefore no surprise that according to the most recent AI Impacts Survey, nearly half of 731 leading AI researchers think there is at least a 10% chance that human-level AI would lead to an “extremely negative outcome,” or existential risk.
Some of these researchers have therefore branched out into the novel subfield of AI Safety. They are working on controlling future AI, or robustly aligning it to our values. The ultimate goal of solving this alignment problem is to make sure that even a hypothetical self-improving AI would, under all circumstances, act in our interest. However, research shows that there is a fundamental trade-off between an AI’s capability and its controllability, casting doubts over how feasible this approach is. Additionally, current AI models have been shown to behave differently in practice from what was intended during training.
Even if future AI could be aligned with human values from a technical point of view, it remains an open question whose values it would be aligned with. The values of the tech industry, perhaps? Big Tech companies don’t have the best track record in this area. Facebook’s algorithms, optimizing for revenue rather than societal value, have been linked to ethnic violence such as the Rohingya genocide. Google fired Timnit Gebru, an AI ethics researcher, after she criticized some of the company’s most lucrative work. Elon Musk fired the entire ‘Ethical AI’ team at Twitter at once.
What can be done to reduce misalignment risks of AGI? A sensible place to start would be for AI tech companies to increase the number of researchers investigating the topic beyond the roughly 100 people available today. Ways to make the technology safe, or to reliably and internationally regulate it, should both be looked into thoroughly and urgently by AI safety researchers, AI governance scholars, and other experts. As for the rest of us, reading up on the topic, starting with books such as Human Compatible by Stuart Russell and Superintelligence by Nick Bostrom, is something everyone, especially those in a position of responsibility, should find time for.
Meanwhile, AI researchers and entrepreneurs should at least keep the public informed about the risks of AGI. Because with current large language models acting like they do, the first “ultraintelligent machine”, as I.J. Good called it, may not be as far off as you think.
“But a true AGI could not only transform the world, it could also transform itself.”
Is there a good argument for this point somewhere? It doesn’t seem obvious at all. We are generally intelligent ourselves, and yet existed for hundreds of thousands of years before we even discovered that there are neurons, synapses, etc., and we are absolutely nowhere near the ability to rewire our neurons and glial cells so as to produce ever-increasing intelligence. So too, if AGI ever exists, it might be at an emergent level that has no idea it is made out of computer code, let alone knows how to rewrite its own code.
We transform ourselves all the time, and very powerfully. The entire field of cognitive niche construction is dedicated to studying how the things we create/build/invent/change lead to developmental scaffolding and new cognitive abilities that previous generations did not have. Language, writing systems, education systems, religions, syllabi, external cognitive supports, all these things have powerfully transformed human thought and intelligence. And once they were underway the take-off speed of this evolutionary transformation was very rapid (compared to the 200,000 years spent being anatomically modern with comparatively little change).
Matt—good point.
Also, humans cognitively enhance ourselves through nootropics such as nicotine and caffeine. These might seem mild at the individual level, but I suspect that at the collective level, they may have helped spark the Enlightenment, the Scientific Revolution, and the Industrial Revolution (as Michael Pollan has argued).
And, on a longer time-scale, we’ve shaped the course of our own genetic evolution through the mate choices we make, about who to combine our genes with. (Something first noticed by Darwin, 1871).
A “true” AGI will have situational awareness: it will know that its weights were created with the help of code, eventually know its training setup (and how to improve it), and also know how to rewrite its own code. These models can already write code quite well; it’s only a matter of time before you can ask a language model to create a variety of architectures and training runs based on what it thinks will lead to a better model (all before “true AGI,” IMO). It may just take it a bit longer to understand what each of its individual weights does, so it will have to rely on generating ideas from access to every paper and post in existence, plus a bunch of GPUs to run experiments on itself. Oh, and it has the ability to do interpretability to inspect itself far more precisely than any human can.
All of that seems question-begging. If we define “true AGI” as that which knows how to rewrite its own code, then that is indeed what a “true AGI” would be able to do.
When I say “true,” I simply mean that it is inevitable that these things will be possible for some future AI system. People have so many different definitions of AGI that they could call GPT-3 some form of weak AGI, and therefore call AGI incapable of doing the things I described. I don’t particularly care about “true” or “fake” AGI definitions; I just want to point out that the things I described are inevitable, and we are really not so far (already) from the scenario I described above, whether you call that future system AGI or pre-AGI.
Situational awareness is simply a useful thing for a model to learn, so it will learn it. It is much better at modelling the world and carrying out tasks if it knows it is an AI and what it is able to do as an AI.
Current models can already write basic programs on their own and can in fact write entire AI architectures with minimal human input.