Critique of Superintelligence Part 1

This is part 1 of a 5-part sequence:

Part 1: summary of Bostrom’s argument

Part 2: arguments against a fast takeoff

Part 3: cosmic expansion and AI motivation

Part 4: tractability of AI alignment

Part 5: expected value arguments


In this article I present a critique of Nick Bostrom’s book Superintelligence. For the sake of brevity I shall not devote much space to summarising Bostrom’s arguments or defining all the terms that he uses. Though I briefly review each key idea before discussing it, I shall assume that readers have some general idea of Bostrom’s argument and of the key terms involved. To keep this piece focused, I discuss only arguments raised in this book, and not what Bostrom has written elsewhere or what others have written on similar issues.

The structure of this article is as follows. I first offer a summary of what I regard as the core argument of Bostrom’s book, outlining a series of premises that he defends in various chapters. Following this summary, I commence a general discussion and critique of Bostrom’s concept of ‘intelligence’, arguing that his failure to adopt a single, consistent usage of this concept fatally undermines his core argument. The remaining sections of this article then draw upon this discussion of the concept of intelligence in responding to each of the key premises of Bostrom’s argument. I conclude with a summary of the strengths and weaknesses of Bostrom’s argument.

Sum­mary of Bostrom’s Argument

Through­out much of his book, Bostrom re­mains quite vague as to ex­actly what ar­gu­ment he is mak­ing, or in­deed whether he is mak­ing a spe­cific ar­gu­ment at all. In many chap­ters he pre­sents what are es­sen­tially lists of var­i­ous con­cepts, cat­e­gories, or con­sid­er­a­tions, and then ar­tic­u­lates some thoughts about them. Ex­actly what con­clu­sion we are sup­posed to draw from his dis­cus­sion is of­ten not made ex­plicit. Nev­er­the­less, by my read­ing the book does at least im­plic­itly pre­sent a very clear ar­gu­ment, which bears a strong similar­ity to the sorts of ar­gu­ments com­monly found in the Effec­tive Altru­ism (EA) move­ment, in favour of fo­cus­ing on AI re­search as a cause area. In or­der to provide struc­ture for my re­view, I have there­fore con­structed an ex­plicit for­mu­la­tion of what I take to be Bostrom’s main ar­gu­ment in his book. I sum­marise it as fol­lows:

Premise 1: A su­per­in­tel­li­gence, defined as a sys­tem that ‘ex­ceeds the cog­ni­tive perfor­mance of hu­mans in vir­tu­ally all do­mains of in­ter­est’, is likely to be de­vel­oped in the fore­see­able fu­ture (decades to cen­turies).

Premise 2: If su­per­in­tel­li­gence is de­vel­oped, some su­per­in­tel­li­gent agent is likely to ac­quire a de­ci­sive strate­gic ad­van­tage, mean­ing that no ter­res­trial power or pow­ers would be able to pre­vent it do­ing as it pleased.

Premise 3: A su­per­in­tel­li­gence with a de­ci­sive strate­gic ad­van­tage would be likely to cap­ture all or most of the cos­mic en­dow­ment (the to­tal space and re­sources within the ac­cessible uni­verse), and put it to use for its own pur­poses.

Premise 4: A su­per­in­tel­li­gence which cap­tures the cos­mic en­dow­ment would likely put this en­dow­ment to uses in­con­gru­ent with our (hu­man) val­ues and de­sires.

Pre­limi­nary con­clu­sion: In the fore­see­able fu­ture it is likely that a su­per­in­tel­li­gent agent will be cre­ated which will cap­ture the cos­mic en­dow­ment and put it to uses in­con­gru­ent with our val­ues. (I call this the AI Doom Sce­nario).

Premise 5: Pur­suit of work on AI safety has a non-triv­ial chance of no­tice­ably re­duc­ing the prob­a­bil­ity of the AI Doom Sce­nario oc­cur­ring.

Premise 6: If pur­suit of work on AI safety has at least a non-triv­ial chance of no­tice­ably re­duc­ing the prob­a­bil­ity of an AI Doom Sce­nario, then (given the pre­limi­nary con­clu­sion above) the ex­pected value of such work is ex­cep­tion­ally high.

Premise 7: It is morally best for the EA com­mu­nity to prefer­en­tially di­rect a large frac­tion of its marginal re­sources (in­clud­ing money and tal­ent) to the cause area with high­est ex­pected value.

Main con­clu­sion: It is morally best for the EA com­mu­nity to di­rect a large frac­tion of its marginal re­sources to work on AI safety. (I call this the AI Safety Th­e­sis.)

Bostrom dis­cusses the first premise in chap­ters 1-2, the sec­ond premise in chap­ters 3-6, the third premise in chap­ters 6-7, the fourth premise in chap­ters 8-9, and some as­pects of the fifth premise in chap­ters 13-14. The sixth and sev­enth premises are not re­ally dis­cussed in the book (though some as­pects of them are hinted at in chap­ter 15), but are widely dis­cussed in the EA com­mu­nity and serve as the link be­tween the ab­stract ar­gu­men­ta­tion and real-world ac­tion, and as such I de­cided also to dis­cuss them here for com­plete­ness. Many of these premises could be ar­tic­u­lated slightly differ­ently, and per­haps Bostrom would pre­fer to rephrase them in var­i­ous ways. Nev­er­the­less I hope that they at least ad­e­quately cap­ture the gen­eral thrust and key con­tours of Bostrom’s ar­gu­ment, as well as how it is typ­i­cally ap­pealed to and ar­tic­u­lated within the EA com­mu­nity.

The na­ture of intelligence

In my view, the biggest prob­lem with Bostrom’s ar­gu­ment in Su­per­in­tel­li­gence is his failure to de­vote any sub­stan­tial space to dis­cussing the na­ture or defi­ni­tion of in­tel­li­gence. In­deed, through­out the book I be­lieve Bostrom uses three quite differ­ent con­cep­tions of in­tel­li­gence:

  • In­tel­li­gence(1): In­tel­li­gence as be­ing able to perform most or all of the cog­ni­tive tasks that hu­mans can perform. (See page 22)

  • Intelligence(2): Intelligence as a measurable quantity along a single dimension, which represents some sort of general cognitive efficaciousness. (See pages 70, 76)

  • In­tel­li­gence(3): In­tel­li­gence as skill at pre­dic­tion, plan­ning, and means-ends rea­son­ing in gen­eral. (See page 107)

While certainly not entirely unrelated, these three conceptions are all quite different from each other. Intelligence(1) is most naturally viewed as a multidimensional construct, since humans exhibit a wide range of cognitive abilities and it is by no means clear that they are all reducible to a single underlying phenomenon that can be meaningfully quantified with one number. It seems much more plausible to say that the range of human cognitive abilities requires many different skills which are sometimes mutually supportive, sometimes mostly unrelated, and sometimes mutually inhibitory in varying ways and to varying degrees. This first conception of intelligence is also explicitly anthropocentric, unlike the other two conceptions, which make no reference to human abilities.

In­tel­li­gence(2) is uni­di­men­sional and quan­ti­ta­tive, and also ex­tremely ab­stract, in that it does not re­fer di­rectly to any par­tic­u­lar skills or abil­ities. It most closely par­allels the no­tion of IQ or other similar op­er­a­tional mea­sures of hu­man in­tel­li­gence (which Bostrom even men­tions in his dis­cus­sion), in that it is ex­plic­itly quan­ti­ta­tive and at­tempts to re­duce ab­stract rea­son­ing abil­ities to a num­ber along a sin­gle di­men­sion. In­tel­li­gence(3) is much more spe­cific and grounded than ei­ther of the other two, re­lat­ing only to par­tic­u­lar types of abil­ities. That said, it is not ob­vi­ously sub­ject to sim­ple quan­tifi­ca­tion along a sin­gle di­men­sion as is the case for In­tel­li­gence(2), nor is it clear that skill at pre­dic­tion and plan­ning is what is mea­sured by the quan­ti­ta­tive con­cept of In­tel­li­gence(2). Cer­tainly In­tel­li­gence(3) and In­tel­li­gence(2) can­not be equiv­a­lent if In­tel­li­gence(2) is even some­what analo­gous to IQ, since IQ mostly mea­sures skills at math­e­mat­i­cal, spa­tial, and ver­bal mem­ory and rea­son­ing, which are quite differ­ent from skills at pre­dic­tion and plan­ning (con­sider for ex­am­ple the phe­nomenon of autis­tic sa­vants). In­tel­li­gence(3) is also far more nar­row in scope than In­tel­li­gence(1), cor­re­spond­ing to only one of the many hu­man cog­ni­tive abil­ities.

Repeatedly throughout the book, Bostrom flips between using one or another of these conceptions of intelligence. This is a major weakness for Bostrom’s overall argument, since for the argument to be sound a single conception of intelligence must be adopted and applied consistently across all of his premises. In the following paragraphs I outline several of the clearest examples of how Bostrom’s equivocation in the meaning of ‘intelligence’ undermines his argument.

Bostrom ar­gues that once a ma­chine be­comes more in­tel­li­gent than a hu­man, it would far ex­ceed hu­man-level in­tel­li­gence very rapidly, be­cause one hu­man cog­ni­tive abil­ity is that of build­ing and im­prov­ing AIs, and so any su­per­in­tel­li­gence would also be bet­ter at this task than hu­mans. This means that the su­per­in­tel­li­gence would be able to im­prove its own in­tel­li­gence, thereby fur­ther im­prov­ing its own abil­ity to im­prove its own in­tel­li­gence, and so on, the end re­sult be­ing a pro­cess of ex­po­nen­tially in­creas­ing re­cur­sive self-im­prove­ment. Although com­pel­ling on the sur­face, this ar­gu­ment re­lies on switch­ing be­tween the con­cepts of In­tel­li­gence(1) and In­tel­li­gence(2).

When Bostrom ar­gues that a su­per­in­tel­li­gence would nec­es­sar­ily be bet­ter at im­prov­ing AIs than hu­mans be­cause AI-build­ing is a cog­ni­tive abil­ity, he is ap­peal­ing to In­tel­li­gence(1). How­ever, when he ar­gues that this would re­sult in re­cur­sive self-im­prove­ment lead­ing to ex­po­nen­tial growth in in­tel­li­gence, he is ap­peal­ing to In­tel­li­gence(2). To see how these two ar­gu­ments rest on differ­ent con­cep­tions of in­tel­li­gence, note that con­sid­er­ing In­tel­li­gence(1), it is not at all clear that there is any gen­eral, sin­gle way to in­crease this form of in­tel­li­gence, as In­tel­li­gence(1) in­cor­po­rates a wide range of dis­parate skills and abil­ities that may be quite in­de­pen­dent of each other. As such, even a su­per­in­tel­li­gence that was bet­ter than hu­mans at im­prov­ing AIs would not nec­es­sar­ily be able to en­gage in rapidly re­cur­sive self-im­prove­ment of In­tel­li­gence(1), be­cause there may well be no such thing as a sin­gle vari­able or quan­tity called ‘in­tel­li­gence’ that is di­rectly as­so­ci­ated with AI-im­prov­ing abil­ity. Rather, there may be a host of as­so­ci­ated but dis­tinct abil­ities and ca­pa­bil­ities that each needs to be en­hanced and adapted in the right way (and in the right rel­a­tive bal­ance) in or­der to get bet­ter at de­sign­ing AIs. Only by as­sum­ing a uni­di­men­sional quan­ti­ta­tive con­cep­tion of In­tel­li­gence(2) does it make sense to talk about the rate of im­prove­ment of a su­per­in­tel­li­gence be­ing pro­por­tional to its cur­rent level of in­tel­li­gence, which then leads to ex­po­nen­tial growth.

Bostrom therefore faces a dilemma. If intelligence is a mix of a wide range of distinct abilities as in Intelligence(1), there is no reason to think it can be ‘increased’ in the rapidly self-reinforcing way Bostrom speaks about (in mathematical terms, there is no single variable which we can differentiate and plug into the differential equation, as Bostrom does in his example on pages 75-76). On the other hand, if intelligence is a unidimensional quantitative measure of general cognitive efficaciousness, it may be meaningful to speak of self-reinforcing exponential growth, but it is not obvious that any arbitrary intelligent system or agent would be particularly good at designing AIs. Intelligence(2) may well help with this ability, but it is not at all clear that it is sufficient. After all, we can readily conceive of building a highly “intelligent” machine that can reason abstractly and pass IQ tests, but is useless at building better AIs.
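
The dilemma can be made concrete with a minimal sketch. The symbols I(t), k, a_i and f_i below are my own illustrative assumptions rather than Bostrom’s notation, and the first equation is a deliberately simplified stand-in for the kind of model he discusses on pages 75-76, not a reproduction of it.

```latex
% Intelligence(2): a single scalar I(t) whose growth rate is assumed to be
% proportional to its current level (k > 0 is an assumed constant).
\frac{dI}{dt} = k\,I(t)
\quad\Longrightarrow\quad
I(t) = I(0)\,e^{kt}

% Intelligence(1): a vector of distinct abilities a_1, ..., a_n, each with
% its own (assumed, unspecified) dynamics f_i.
\frac{da_i}{dt} = f_i(a_1, \dots, a_n), \qquad i = 1, \dots, n
```

In the first case exponential growth follows directly from the proportionality assumption. In the second, nothing guarantees that the f_i combine into a single quantity whose growth rate is proportional to its own level, so the exponential solution is not available without further assumptions.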

Bostrom argues that once a machine intelligence became more intelligent than humans, it would soon be able to develop a series of ‘cognitive superpowers’ (intelligence amplification, strategising, social manipulation, hacking, technology research, and economic productivity), which would then enable it to escape whatever constraints were placed upon it and likely achieve a decisive strategic advantage. The problem is that it is unclear whether a machine endowed only with Intelligence(3) (skill at prediction and means-ends reasoning) would necessarily be able to develop skills as diverse as general scientific research ability, competent use of natural language, and social manipulation of human beings. Again, means-ends reasoning may help with these skills, but clearly they require much more than this. Only if we are assuming the conception of Intelligence(1), whereby the AI has already exceeded essentially all human cognitive abilities, does it become reasonable to assume that all of these ‘superpowers’ would be attainable.

According to the orthogonality thesis, there is no reason why the machine intelligence could not have extremely reductionist goals such as maximising the number of paperclips in the universe, since an AI’s level of intelligence is totally separate and distinct from its final goals. Bostrom’s argument for this thesis, however, clearly depends on adopting Intelligence(3), whereby intelligence is regarded as general skill with prediction and means-ends reasoning. It is indeed plausible that an agent endowed only with this form of intelligence would not necessarily have the ability or inclination to question or modify its goals, even if they are extremely reductionist or what any human would regard as patently absurd. If, however, we adopt the much more expansive conception of Intelligence(1), the argument becomes much less defensible. This should become clear if one considers that ‘essentially all human cognitive abilities’ includes such activities as pondering moral dilemmas, reflecting on the meaning of life, analysing and producing sophisticated literature, formulating arguments about what constitutes a ‘good life’, interpreting and writing poetry, forming social connections with others, and critically introspecting upon one’s own goals and desires. To me it seems extraordinarily unlikely that any agent capable of performing all these tasks with a high degree of proficiency would simultaneously stand firm in its conviction that the only goal it had reason to pursue was tiling the universe with paperclips.

As such, Bostrom is driven by his cog­ni­tive su­per­pow­ers ar­gu­ment to adopt the broad no­tion of in­tel­li­gence seen in In­tel­li­gence(1), but then is driven back to a much nar­rower In­tel­li­gence(3) when he wishes to defend the or­thog­o­nal­ity the­sis. The key point to be made here is that the goals or prefer­ences of a ra­tio­nal agent are sub­ject to ra­tio­nal re­flec­tion and re­con­sid­er­a­tion, and the ex­er­cise of rea­son in turn is shaped by the agent’s prefer­ences and goals. Short of rad­i­cally re­defin­ing what we mean by ‘in­tel­li­gence’ and ‘mo­ti­va­tion’, this com­plex in­ter­ac­tion will always ham­per sim­plis­tic at­tempts to neatly sep­a­rate them, thereby un­der­min­ing Bostrom’s case for the or­thog­o­nal­ity the­sis—un­less a very nar­row con­cep­tion of in­tel­li­gence is adopted.

In the table be­low I sum­marise sev­eral of the key out­comes or de­vel­op­ments that are crit­i­cal to Bostrom’s ar­gu­ment, and how plau­si­ble they would be un­der each of the three con­cep­tions of in­tel­li­gence. Ob­vi­ously such judge­ments are nec­es­sar­ily vague and sub­jec­tive, but the key point I wish to make is sim­ply that only by ap­peal­ing to differ­ent con­cep­tions of in­tel­li­gence in differ­ent cases is Bostrom able to ar­gue that all of the out­comes are rea­son­ably likely to oc­cur. Fatally for his ar­gu­ment, there is no sin­gle con­cep­tion of in­tel­li­gence that makes all of these out­comes si­mul­ta­neously likely or plau­si­ble.

| Outcome            | Intelligence(1) | Intelligence(2) | Intelligence(3) |
|--------------------|-----------------|-----------------|-----------------|
| Quick takeoff      | Highly unlikely | Likely          | Unclear         |
| All superpowers    | Highly likely   | Highly unlikely | Highly unlikely |
| Absurd goals       | Highly unlikely | Unclear         | Likely          |
| No change to goals | Unlikely        | Unclear         | Likely          |
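
The claim that no single conception makes every outcome plausible can be checked mechanically against the ratings in the table. The following is a minimal sketch: the ratings are copied from the table, while the names `RATINGS`, `PLAUSIBLE`, `conceptions_supporting_all` and `supporting_conceptions`, and the choice to count only ‘Likely’ and ‘Highly likely’ as plausible, are my own illustrative assumptions.

```python
# Subjective plausibility ratings, copied from the table above.
RATINGS = {
    "Quick takeoff": {
        "Intelligence(1)": "Highly unlikely",
        "Intelligence(2)": "Likely",
        "Intelligence(3)": "Unclear",
    },
    "All superpowers": {
        "Intelligence(1)": "Highly likely",
        "Intelligence(2)": "Highly unlikely",
        "Intelligence(3)": "Highly unlikely",
    },
    "Absurd goals": {
        "Intelligence(1)": "Highly unlikely",
        "Intelligence(2)": "Unclear",
        "Intelligence(3)": "Likely",
    },
    "No change to goals": {
        "Intelligence(1)": "Unlikely",
        "Intelligence(2)": "Unclear",
        "Intelligence(3)": "Likely",
    },
}

# Assumption: only these two ratings count as "plausible enough".
PLAUSIBLE = {"Likely", "Highly likely"}


def conceptions_supporting_all(ratings):
    """Return the conceptions under which every outcome is rated plausible."""
    conceptions = list(next(iter(ratings.values())))
    return [
        c for c in conceptions
        if all(row[c] in PLAUSIBLE for row in ratings.values())
    ]


def supporting_conceptions(ratings):
    """Map each outcome to the conceptions under which it is rated plausible."""
    return {
        outcome: [c for c, rating in row.items() if rating in PLAUSIBLE]
        for outcome, row in ratings.items()
    }


if __name__ == "__main__":
    print(conceptions_supporting_all(RATINGS))
    for outcome, supporters in supporting_conceptions(RATINGS).items():
        print(f"{outcome}: {supporters}")
```

Under these assumptions the first function returns an empty list, while each outcome taken individually is supported by exactly one conception, which is the pattern of equivocation described above. Treating ‘Unclear’ as plausible would not change the first result, since Intelligence(2) and Intelligence(3) are both rated ‘Highly unlikely’ for the superpowers outcome.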