Strategic implications of AI scenarios
[Originally posted on my new website on cause prioritization. This article is an introductory exploration of what different AI scenarios imply for our strategy in shaping advanced AI, and might be of interest to the broader EA community, which is why I'm crossposting it here.]
Efforts to mitigate the risks of advanced artificial intelligence may be a top priority for effective altruists. If this is true, what are the best means to shape AI? Should we write math-heavy papers on open technical questions, or opt for broader, non-technical interventions like values spreading?
The answer to these questions hinges on how we expect AI to unfold. That is, what do we expect advanced AI to look like, and how will it be developed?
Many of these issues have been discussed at length, but the implications for the action-guiding question of how to best work on the problem often remain unclear. This post aims to fill the gap with a rigorous analysis of how different views on AI scenarios relate to the possible ways to shape advanced AI.
Key questions
We can slice the space of possible scenarios in infinitely many ways, some of which are more useful for our thinking than others. Commonly discussed questions about AI scenarios include:
When will humanity build general artificial intelligence, assuming that it happens at all?
Will the takeoff be hard or soft? That is, how long will it take to get from a human-level AI to a superintelligence?
To what extent will the goals of an AI be aligned with human values?
What architecture will be used to build advanced AI? For instance, will it use an explicit utility function or a reward module? Will it be based on “clean” mathematical principles or on a “messy” collection of heuristics?
Will advanced AI act as a single agent, as MIRI’s models tend to assume, or will superintelligence reside in a distributed system like the economy?
The reason why we ask these questions is that the answers determine how we should work on the problem. We can choose from a plethora of possible approaches:
We might work on technical aspects of the AI alignment problem.
We could do other kinds of technical research, such as finding scalable solutions to short-term problems or specifically trying to prevent the worst possible outcomes.
We could focus on philosophical and conceptual work to raise awareness of AI-related issues.
We could work on AI policy or AI strategy.
Instead of shaping AI directly, we might opt for broader, more indirect interventions such as improving international cooperation and spreading altruistic values.
Which factors determine the value of technical AI safety work?
To avoid the complexity of considering many strategic questions at the same time, I will focus on whether we should work on AI in a technical or non-technical way, which I believe to be the most action-guiding dimension.
The control problem
The value of technical work depends on whether it is possible to find well-posed and tractable technical problems whose solution is essential for a positive AI outcome. The most common candidate for this role is the control problem (and subproblems thereof), or how to make superintelligent AI systems act in accordance with human values. The viability of technical work therefore depends to some extent on whether it makes sense to think about AI in this way – that is, whether the control problem is of central importance.
This, in turn, depends on our outlook on AI scenarios. For instance, we might think that the technical problems of AI safety are less difficult than they seem, that they will likely be solved anyway, or that the most serious risks instead relate to security aspects, coordination problems, and selfish values.
The following views support work on the control problem:
Uncontrolled AI is the “default outcome” or at least somewhat likely, which makes technical work on the problem a powerful lever for influencing the far future. Also, uncontrolled AI would mean that human values will matter less in the future, which renders many other interventions – such as values spreading – futile.
A hard takeoff or intelligence explosion is likely. This matters because it correlates with the likelihood of uncontrolled AI. Also, it might mean that humans will quickly be “out of the loop”, making it more difficult to shape AI with non-technical means.
We cannot rule out very short timelines. In this case, the takeoff will be unexpected and research on AI safety will be more neglected, which means that we can have a larger impact.
In contrast, if AI is like the economy, then the control problem does not apply in its usual form – there is no unified agent to control. Influencing the technical development of AI would be harder because of its gradual nature, just as it was arguably difficult to influence industrialization in the past.
It is often argued that an agent-like superintelligence would ultimately emerge even if AI takes a different form at first. I think this is likely, though not certain. Even so, the strategic picture is radically different if economy-like AI comes first: we can mainly hope to (directly) shape the first kind of advanced AI, since it is hard to predict, and hard to influence, what happens afterward.
In other words, the first transition may constitute an “event horizon” and therefore be most relevant to strategic considerations. For example, if agent-like AI is built second, then the first kind of advanced AI systems will be the driving force behind its creation. Those systems will be intellectually superior to us by many orders of magnitude, which makes it all but impossible for us to (directly) influence the agent-like AI via technical work.
How much safety work will be done anyway?
This brings us to another intermediate variable, namely how much technical safety work will be done by others anyway. If the timeline to AI is long, if the takeoff is soft, or if AI is like the economy, then large amounts of money and skilled time may be dedicated to AI safety, comparable to contemporary mainstream discussion of climate change.
As AI is applied to more and more industrial contexts, large-scale failures of AI systems will likely become dangerous or costly, so we can expect that the AI industry will be forced to make them safe, either because their customers demand it or because of regulation. We may also experience an AI Sputnik moment that leads to more investment in safety research.
Since the resources of effective altruists are small in comparison to large companies and governments, this scenario reduces the value of technical AI safety work. Non-technical approaches such as spreading altruistic values among AI researchers or work on AI policy might be more promising in these cases. However, the argument does not apply if we are interested in specific questions that would otherwise remain neglected, or if we think that safety techniques will not work anymore once AI systems reach a certain threshold of capability. (It’s unclear to what extent this is the case.)
This shows that how we work on AI depends not only on our predictions of future scenarios, but also on our goals. Personally, I’m mostly interested in suffering-focused AI safety, that is, how to prevent s-risks of advanced AI. This may lead to slightly different strategic conclusions compared to AI safety efforts that focus on loading human values. For instance, it means that fewer people will work on the issues that matter most to me.
A related question is whether strong intelligence enhancement, such as emulations or iterated embryo selection, will become feasible (and be employed) before strong AI is built. In that case, the enhanced minds will likely work on AI safety, too, which might mean that future generations can tackle the problem more effectively (given sufficiently long timelines). In fact, this may be true even without intelligence enhancement because we are nearsighted with respect to time, that is, it is harder to predict and influence events that are further in the future.
It’s not clear whether strong intelligence enhancement technology will be available before advanced AI. But we can view modern tools such as blogs and online forums as a weak form of intelligence enhancement in that they facilitate the exchange of ideas; extrapolating this trend, future generations may be even more “intelligent” in a sense. Of course, if we think that AI may be built unexpectedly soon, then the argument is less relevant.
Uncertainty about AI scenarios
Technical work requires a sufficiently good model of what AI will look like; otherwise we cannot identify viable technical measures. The more uncertain we are about the different parameters of how AI will unfold, the harder it is to influence its technical development. That said, radical uncertainty also affects other approaches to shaping AI, which makes it a general argument against focusing on AI at all. Still, the argument applies more strongly to technical work than to non-technical work.
In a nutshell, AI scenarios inform our strategy via three intermediate variables:
Is the control problem of central importance?
How much (quality-adjusted) technical work will others do anyway?
How certain can we be about how AI will develop?
Technical work seems more promising if we think the control problem is pivotal, if we expect that others will not invest sufficient resources anyway, and if we have a clear picture of what AI will look like.
AI strategy on the movement level
Effective altruists should coordinate their efforts, that is, think in terms of comparative advantages and what the movement should do on the margin rather than just considering individual actions. Applied to the problem of how to best shape AI, this might imply that we should pursue a variety of approaches as a movement rather than committing to any single approach.
Still, my impression is that non-technical work on AI is somewhat neglected in the EA community. (80,000 Hours' guide on AI policy tends to agree.)
My thoughts on AI scenarios
My position on AI scenarios is close to Brian Tomasik's: I lean toward a soft takeoff, relatively long timelines, and distributed, economy-like AI rather than a single actor. We should also question the notion of general (super)intelligence: AI systems will likely achieve superhuman performance in more and more domain-specific tasks, but not across all domains at the same time, which makes it a gradual process rather than an intelligence explosion. Of course, I cannot justify high confidence in these views given that many experts disagree.
Following the analysis of this post, this is reason to be mildly sceptical about whether technical work on the control problem is the best way to shape AI. That said, it’s still a viable option because I might be wrong and because technical work has indirect benefits in that it influences the AI community to take safety concerns more seriously.
More generally, one of the best ways to handle pervasive uncertainty may be to focus on “meta” activities such as increasing the influence of effective altruists in the AI community by building expertise and credibility. This is valuable regardless of one's views on AI scenarios.
My personal belief is that “hard AI takeoff” scenarios are driven mostly by the view that current AI progress largely flows from a single skill, namely “mathematics/programming”. On this view, while AI will continue to develop at disparate rates and achieve superhuman performance in different areas at different times, an ASI takeoff will be driven almost entirely by AI performance in software development: once AI becomes superhuman at this skill, it will rapidly become superhuman at all skills. This seems obvious to me, and I think disagreements with it have to rest largely on hidden difficulties in “software development”, such as understanding and modeling many different systems well enough to develop algorithms specialized for them (which seems almost circularly “AGI-complete”).
What do you make of that objection? (I agree with it. I think programming efficiently and flexibly across problem domains is probably AGI-complete.)
My 2 cents: math/programming is only half the battle. Here's an analogy: you could be the best programmer in the world, but if you don't understand chess, you can't program a computer to beat a human at chess, and if you don't understand quantum physics, you can't program a computer to simulate matter at the atomic scale (well, not using ab initio methods anyway).
In order to get an intelligence explosion, a computer would have to not only have great programming skills, but also really understand intelligence. And intelligence isn't just one thing – it's a bunch of things (creativity, memory, planning, social skills, emotional skills, etc., and these can be subdivided further into different fields like physics, design, social understanding, social manipulation, etc.). I find it hard to believe that the same computer would go from not superhuman to superhuman in almost all of these all at once. Obviously computers already outcompete humans in many of these, but I think even on the more “human” traits, and in areas where computers act more like agents than just tools, it's still more likely to happen in several waves instead of just one takeoff.
Does that mean we could try to control AI by preventing it from knowing anything about programming?
And, on the other side, that any AI which is able to write code should be regarded as extremely dangerous, no matter how low its abilities in other domains?
I think I broadly agree that takeoff is likely to be slow and that there is no slam-dunk argument for trying to make safe superintelligent agents.
However, I think there is room for all sorts of work: anything that can reduce our uncertainty about where AGI is going.
I think AI, as it is, is on slightly the wrong track. If we get on the right track, we will get somewhere a lot quicker than the decades referenced above.
Computers, as they stand, are designed with the idea that a human looks after them and understands their inner workings, at least somewhat. Animals, from the lowly nematode to humans, are not built on that assumption. Current deep learning assumes a human will create the input and output spaces and assign resources to the learning process.
If we could offload the administration of a computer to the computer itself, this would make administering computers cheaper and also allow computer systems to become more complex. Computer systems are limited in complexity by the thing that debugs them.
I have an idea of what this might look like, and if my current paradigm plays out, I think humanity will get the choice between creating separate agents and creating external lobes of our brains. Most likely humanity will pick the external lobes. The external lobes may act in a more economic fashion, but I think they still might have the capability of going bad. Minimising the probability of this is very important.
I think there is also probably a network effect: if we could get altruistically minded people to be the first to have external brains, then we might influence the future by preferentially helping other altruists to get them. This could create social norms among people with external brains.
So I think technical work towards understanding administratively autonomous computers (no matter how intelligent they are) can reduce uncertainty and allow us to understand what choices face us.