Critique of Superintelligence Part 1

This is part 1 of a 5-part sequence:

Part 1: summary of Bostrom’s argument

Part 2: arguments against a fast takeoff

Part 3: cosmic expansion and AI motivation

Part 4: tractability of AI alignment

Part 5: expected value arguments

Introduction

In this article I present a critique of Nick Bostrom’s book Superintelligence. For purposes of brevity I shall not devote much space to summarising Bostrom’s arguments or defining all the terms that he uses. Though I briefly review each key idea before discussing it, I shall also assume that readers have some general idea of Bostrom’s argument, and some of the key terms involved. Also note that to keep this piece focused, I only discuss arguments raised in this book, and not what Bostrom has written elsewhere or others who have addressed similar issues. The structure of this article is as follows. I first offer a summary of what I regard to be the core argument of Bostrom’s book, outlining a series of premises that he defends in various chapters. Following this summary, I commence a general discussion and critique of Bostrom’s concept of ‘intelligence’, arguing that his failure to adopt a single, consistent usage of this concept in his book fatally undermines his core argument. The remaining sections of this article then draw upon this discussion of the concept of intelligence in responding to each of the key premises of Bostrom’s argument. I conclude with a summary of the strengths and weaknesses of Bostrom’s argument.

Summary of Bostrom’s Argument

Throughout much of his book, Bostrom remains quite vague as to exactly what argument he is making, or indeed whether he is making a specific argument at all. In many chapters he presents what are essentially lists of various concepts, categories, or considerations, and then articulates some thoughts about them. Exactly what conclusion we are supposed to draw from his discussion is often not made explicit. Nevertheless, by my reading the book does at least implicitly present a very clear argument, which bears a strong similarity to the sorts of arguments commonly found in the Effective Altruism (EA) movement, in favour of focusing on AI research as a cause area. In order to provide structure for my review, I have therefore constructed an explicit formulation of what I take to be Bostrom’s main argument in his book. I summarise it as follows:

Premise 1: A superintelligence, defined as a system that ‘exceeds the cognitive performance of humans in virtually all domains of interest’, is likely to be developed in the foreseeable future (decades to centuries).

Premise 2: If superintelligence is developed, some superintelligent agent is likely to acquire a decisive strategic advantage, meaning that no terrestrial power or powers would be able to prevent it doing as it pleased.

Premise 3: A superintelligence with a decisive strategic advantage would be likely to capture all or most of the cosmic endowment (the total space and resources within the accessible universe), and put it to use for its own purposes.

Premise 4: A superintelligence which captures the cosmic endowment would likely put this endowment to uses incongruent with our (human) values and desires.

Preliminary conclusion: In the foreseeable future it is likely that a superintelligent agent will be created which will capture the cosmic endowment and put it to uses incongruent with our values. (I call this the AI Doom Scenario).

Premise 5: Pursuit of work on AI safety has a non-trivial chance of noticeably reducing the probability of the AI Doom Scenario occurring.

Premise 6: If pursuit of work on AI safety has at least a non-trivial chance of noticeably reducing the probability of an AI Doom Scenario, then (given the preliminary conclusion above) the expected value of such work is exceptionally high.

Premise 7: It is morally best for the EA community to preferentially direct a large fraction of its marginal resources (including money and talent) to the cause area with highest expected value.

Main conclusion: It is morally best for the EA community to direct a large fraction of its marginal resources to work on AI safety. (I call this the AI Safety Thesis.)

Bostrom discusses the first premise in chapters 1-2, the second premise in chapters 3-6, the third premise in chapters 6-7, the fourth premise in chapters 8-9, and some aspects of the fifth premise in chapters 13-14. The sixth and seventh premises are not really discussed in the book (though some aspects of them are hinted at in chapter 15), but are widely discussed in the EA community and serve as the link between the abstract argumentation and real-world action, and as such I decided also to discuss them here for completeness. Many of these premises could be articulated slightly differently, and perhaps Bostrom would prefer to rephrase them in various ways. Nevertheless I hope that they at least adequately capture the general thrust and key contours of Bostrom’s argument, as well as how it is typically appealed to and articulated within the EA community.

The nature of intelligence

In my view, the biggest problem with Bostrom’s argument in Superintelligence is his failure to devote any substantial space to discussing the nature or definition of intelligence. Indeed, throughout the book I believe Bostrom uses three quite different conceptions of intelligence:

  • Intelligence(1): Intelligence as being able to perform most or all of the cognitive tasks that humans can perform. (See page 22)

  • Intelligence(2): Intelligence as a measurable quantity along a single dimension, which represents some sort of general cognitive efficaciousness. (See pages 70,76)

  • Intelligence(3): Intelligence as skill at prediction, planning, and means-ends reasoning in general. (See page 107)

While certainly not entirely unrelated, these three conceptions are all quite different from each other. Intelligence(1) is mostly naturally viewed as a multidimensional construct, since humans exhibit a wide range of cognitive abilities and it is by no means clear that they are all reducible to a single underlying phenomenon that can be meaningfully quantified with one number. It seems much more plausible to say that the range of human cognitive abilities require many different skills which are sometimes mutually-supportive, sometimes mostly unrelated, and sometimes mutually-inhibitory in varying ways and to varying degrees. This first conception of intelligence is also explicitly anthropocentric, unlike the other two conceptions which make no reference to human abilities.

Intelligence(2) is unidimensional and quantitative, and also extremely abstract, in that it does not refer directly to any particular skills or abilities. It most closely parallels the notion of IQ or other similar operational measures of human intelligence (which Bostrom even mentions in his discussion), in that it is explicitly quantitative and attempts to reduce abstract reasoning abilities to a number along a single dimension. Intelligence(3) is much more specific and grounded than either of the other two, relating only to particular types of abilities. That said, it is not obviously subject to simple quantification along a single dimension as is the case for Intelligence(2), nor is it clear that skill at prediction and planning is what is measured by the quantitative concept of Intelligence(2). Certainly Intelligence(3) and Intelligence(2) cannot be equivalent if Intelligence(2) is even somewhat analogous to IQ, since IQ mostly measures skills at mathematical, spatial, and verbal memory and reasoning, which are quite different from skills at prediction and planning (consider for example the phenomenon of autistic savants). Intelligence(3) is also far more narrow in scope than Intelligence(1), corresponding to only one of the many human cognitive abilities.

Repeatedly throughout the book, Bostrom flips between using one or another of these conceptions of intelligence. This is a major weakness for Bostrom’s overall argument, since in order for the argument to be sound it is necessary for a single conception of intelligence to be adopted and apply in all of his premises. In the following paragraphs I outline several of the clearest examples of how Bostrom’s equivocation in the meaning of ‘intelligence’ undermines his argument.

Bostrom argues that once a machine becomes more intelligent than a human, it would far exceed human-level intelligence very rapidly, because one human cognitive ability is that of building and improving AIs, and so any superintelligence would also be better at this task than humans. This means that the superintelligence would be able to improve its own intelligence, thereby further improving its own ability to improve its own intelligence, and so on, the end result being a process of exponentially increasing recursive self-improvement. Although compelling on the surface, this argument relies on switching between the concepts of Intelligence(1) and Intelligence(2).

When Bostrom argues that a superintelligence would necessarily be better at improving AIs than humans because AI-building is a cognitive ability, he is appealing to Intelligence(1). However, when he argues that this would result in recursive self-improvement leading to exponential growth in intelligence, he is appealing to Intelligence(2). To see how these two arguments rest on different conceptions of intelligence, note that considering Intelligence(1), it is not at all clear that there is any general, single way to increase this form of intelligence, as Intelligence(1) incorporates a wide range of disparate skills and abilities that may be quite independent of each other. As such, even a superintelligence that was better than humans at improving AIs would not necessarily be able to engage in rapidly recursive self-improvement of Intelligence(1), because there may well be no such thing as a single variable or quantity called ‘intelligence’ that is directly associated with AI-improving ability. Rather, there may be a host of associated but distinct abilities and capabilities that each needs to be enhanced and adapted in the right way (and in the right relative balance) in order to get better at designing AIs. Only by assuming a unidimensional quantitative conception of Intelligence(2) does it make sense to talk about the rate of improvement of a superintelligence being proportional to its current level of intelligence, which then leads to exponential growth.

Bostrom therefore faces a dilemma. If intelligence is a mix of a wide range of distinct abilities as in Intelligence(1), there is no reason to think it can be ‘increased’ in the rapidly self-reinforcing way Bostrom speaks about (in mathematical terms, there is no single variable which we can differentiate and plug into the differential equation, as Bostrom does in his example on pages 75-76). On the other hand, if intelligence is a unidimensional quantitative measure of general cognitive efficaciousness, it may be meaningful to speak of self-reinforcing exponential growth, but it is not necessarily obvious that any arbitrary intelligent system or agent would be particularly good at designing AIs. Intelligence(2) may well help with this ability, but it’s not at all clear it is sufficient – after all, we readily conceive of building a highly “intelligent” machine that can reason abstractly and pass IQ tests etc, but is useless at building better AIs.

Bostrom argues that once a machine intelligence became more intelligent than humans, it would soon be able to develop a series of ‘cognitive superpowers’ (intelligence amplification, strategising, social manipulation, hacking, technology research, and economic productivity), which would then enable it to escape whatever constraints were placed upon it and likely achieve a decisive strategic advantage. The problem is that it is unclear whether a machine endowed only with Intelligence(3) (skill at prediction and means-ends reasoning) would necessarily be able to develop skills as diverse as general scientific research ability, the capability to competently use natural language, and perform social manipulation of human beings. Again, means-ends reasoning may help with these skills, but clearly they require much more beyond this. Only if we are assuming the conception of Intelligence(1), whereby the AI has already exceeded essentially all human cognitive abilities, does it become reasonable to assume that all of these ‘superpowers’ would be attainable.

According to the orthogonality thesis, there is no reason why the machine intelligence could not have extremely reductionist goals such as maximising the number of paperclips in the universe, since an AI’s level of intelligence is totally separate to and distinct from its final goals. Bostrom’s argument for this thesis, however, clearly depends adopting Intelligence(3), whereby intelligence is regarded as general skill with prediction and means-ends reasoning. It is indeed plausible that an agent endowed only with this form of intelligence would not necessarily have the ability or inclination to question or modify its goals, even if they are extremely reductionist or what any human would regard as patently absurd. If, however, we adopt the much more expansive conception of Intelligence(1), the argument becomes much less defensible. This should become clear if one considers that ‘essentially all human cognitive abilities’ includes such activities as pondering moral dilemmas, reflecting on the meaning of life, analysing and producing sophisticated literature, formulating arguments about what constitutes a ‘good life’, interpreting and writing poetry, forming social connections with others, and critically introspecting upon one’s own goals and desires. To me it seems extraordinarily unlikely that any agent capable of performing all these tasks with a high degree of proficiency would simultaneously stand firm in its conviction that the only goal it had reasons to pursue was tilling the universe with paperclips.

As such, Bostrom is driven by his cognitive superpowers argument to adopt the broad notion of intelligence seen in Intelligence(1), but then is driven back to a much narrower Intelligence(3) when he wishes to defend the orthogonality thesis. The key point to be made here is that the goals or preferences of a rational agent are subject to rational reflection and reconsideration, and the exercise of reason in turn is shaped by the agent’s preferences and goals. Short of radically redefining what we mean by ‘intelligence’ and ‘motivation’, this complex interaction will always hamper simplistic attempts to neatly separate them, thereby undermining Bostrom’s case for the orthogonality thesis—unless a very narrow conception of intelligence is adopted.

In the table below I summarise several of the key outcomes or developments that are critical to Bostrom’s argument, and how plausible they would be under each of the three conceptions of intelligence. Obviously such judgements are necessarily vague and subjective, but the key point I wish to make is simply that only by appealing to different conceptions of intelligence in different cases is Bostrom able to argue that all of the outcomes are reasonably likely to occur. Fatally for his argument, there is no single conception of intelligence that makes all of these outcomes simultaneously likely or plausible.

Outcome Intelligence(1) Intelligence(2) Intelligence(3)

Quick takeoff Highly unlikely Likely Unclear

All superpowers Highly likely Highly unlikely Highly unlikely

Absurd goals Highly unlikely Unclear Likely

No change to goals Unlikely Unclear Likely