New report on how much computational power it takes to match the human brain (Open Philanthropy)

Link post

Written by Joseph Carlsmith.

Open Philanthropy is interested in when AI systems will be able to perform various tasks that humans can perform (“AI timelines”). To inform our thinking, I investigated what evidence the human brain provides about the computational power sufficient to match its capabilities. I consulted with more than 30 experts, and considered four methods of generating estimates, focusing on floating point operations per second (FLOP/s) as a metric of computational power.

The full report on what I learned is here. This blog post is a medium-depth summary of some context, the approach I took, the methods I examined, and the conclusions I reached. The report’s executive summary is a shorter overview.

In brief, I think it more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And I think it unlikely (<10%) that more than 10^21 FLOP/s is required. (1) But I’m not a neuroscientist, and the science here is very far from settled. (2) I offer a few more specific probabilities, keyed to one specific type of brain model, in the report’s appendix.

For context: the Fugaku supercomputer (~$1 billion) performs ~4×10^17 FLOP/s, and a V100 GPU (~$10,000) performs up to ~10^14 FLOP/s. (3) But even if my best guesses are right, this doesn’t mean we’ll see AI systems as capable as the human brain anytime soon. In particular: actually creating/training such systems (as opposed to building computers that could in principle run them) is a substantial further challenge.

Context

Some classic analyses of AI timelines (notably, by Hans Moravec and Ray Kurzweil) emphasize forecasts about when available computer hardware will be “equivalent,” in some sense (see below for discussion), to the human brain. (4)

Graph schema for classic forecasts. See real examples here and here.

A basic objection to predicting AI timelines on this basis alone is that you need more than hardware to do what the brain does. (5) In particular, you need software to run on your hardware, and creating the right software might be very hard (Moravec and Kurzweil both recognize this, and appeal to further arguments). (6)

In the context of machine learning, we can offer a more specific version of this objection: the hardware required to run an AI system isn’t enough; you also need the hardware required to train it (along with other resources, like data). (7) And training a system requires running it a lot. DeepMind’s AlphaGo Zero, for example, trained on ~5 million games of Go. (8)

Note, though, that depending on what sorts of task-performance will result from what sorts of training, a framework for thinking about AI timelines that incorporated training requirements would start, at least, to incorporate and quantify the difficulty of creating the right software more broadly. (9) This is because training turns computation and data (along with other resources) into software you wouldn’t otherwise know how to code directly.

What’s more, the hardware required to train a system is related to the hardware required to run it. (10) This relationship is central to Open Philanthropy’s interest in the topic of this report, and to an investigation my colleague Ajeya Cotra has been conducting, which draws on my analysis. That investigation focuses on what brain-related FLOP/s estimates, along with other estimates and assumptions, might tell us about when it will be feasible to train different types of AI systems. I don’t discuss this question here, but it’s an important part of the context. And in that context, brain-related hardware estimates play a different role than they do in forecasts like Moravec’s and Kurzweil’s.

Approach

I focus on floating point operations per second (FLOP/s) as a metric of computational power. These are arithmetic operations (addition, subtraction, multiplication, division) performed on a pair of numbers represented in a computer in a format akin to scientific notation. Performing tasks with computers requires resources other than FLOP/s (for example, memory and memory bandwidth), so this focus is narrow (see section 1.4 for more discussion). But FLOP/s are a key input to the investigation of training costs described above; and they’re one important resource more generally.

My aim in the report is to see what evidence the brain provides about what sorts of FLOP/s budgets would be enough to perform any cognitive task that the human brain can perform. (11) Section 1.6 gives more details about the tasks I have in mind.

The project here is related to, but distinct from, directly estimating the minimum FLOP/s sufficient to perform any task the brain can perform. Here’s an analogy. Suppose you want to build a bridge across the local river, and you’re wondering if you have enough bricks. You know of only one such bridge (the “old bridge”), so it’s natural to look there for evidence. If the old bridge is made of bricks, you could count them. If it’s made of something else, like steel, you could try to figure out how many bricks you need to do what a given amount of steel does. If successful, you’ll end up confident that e.g. 100,000 bricks is enough to build such a bridge, and hence that the minimum is less than this. But how much less is still unclear. You studied an example bridge, but you didn’t derive theoretical limits on the efficiency of bridge-building. (12)

The project is also distinct from estimating the FLOP/s “equivalent” to the human brain. As I discuss in the report’s appendix, I think the notion of “the FLOP/s equivalent to the brain” requires clarification: there are a variety of importantly different concepts in the vicinity.

To get a flavor of this, consider the bridge analogy again, but assume that the old bridge is made of steel. What number of bricks would be “equivalent” to the old bridge? The question seems ill-posed. It’s not that bridges can’t be built from bricks. But we need to say more about what we want to know.

I group the salient possible concepts of the “FLOP/s equivalent to the human brain” into four categories, each of which, I argue, has its own problems (see section 7.5 for a summary chart). In the hopes of avoiding some of these problems, I have kept the report’s framework broad. The brain-based FLOP/s budgets I’m interested in don’t need to be uniquely “equivalent” to the brain. Nor need they accommodate any further constraints on the similarity between the brain’s internal dynamics and those of the AI systems under consideration (see section 7.2); or on the training/engineering processes that could create such systems (see section 7.3). The budgets just need to be big enough, in principle, to perform the tasks in question.

Methods

I considered four methods of using the brain to generate FLOP/s budgets. They were:

  1. Estimate the FLOP/s required to model the brain’s low-level mechanisms at a level of detail adequate to replicate task-performance (the “mechanistic method”). (13)

  2. Identify a portion of the brain whose function we can approximate with computers, and then scale up to FLOP/s estimates for the whole brain (the “functional method”).

  3. Use the brain’s energy budget, together with physical limits set by Landauer’s principle, to upper-bound required FLOP/s (the “limit method”).

  4. Use the communication bandwidth in the brain as evidence about its computational capacity (the “communication method”). I discuss this method only briefly.

All these methods must grapple in different ways with the severe limits on our understanding of how the brain processes information – a consistent theme in my conversations with experts. Section 1.5.1 details some of the limits I have in mind. In many cases, central barriers include:

  • we lack the tools to gather the data we need (for example, we can’t reliably measure the input-output transformation a neuron implements during live behavior), (14) and/or

  • we don’t know enough about the tasks that cells or groups of cells are performing to tell how different lower-level mechanisms contribute. (15)

These and other barriers counsel pessimism about the robustness of FLOP/s estimates based on our current neuroscientific understanding (see section 1.2 for further caveats). But the aim here is not to settle the question: it’s to make reasonable best-guesses, using the inconclusive evidence currently available.

I’ll say a few words about each method in turn, and the numbers that result.

The mechanistic method

The mechanistic method attempts to estimate the computation required to model the brain’s biological mechanisms at a level of detail adequate to replicate task-performance. This method receives the most attention in the report, and it’s the one I put most weight on.

Simulating the brain in extreme detail would require enormous amounts of computational power. (16) The central question for the mechanistic method, then, is which details need to be included, and which can be left out or summarized.

The approach I pursue focuses on signaling between cells. Here, the idea is that for a process occurring in a cell to matter to task-performance, it needs to affect the type of signals that cell sends to other cells. Hence, a model of that cell that replicates its signaling behavior (that is, the process of receiving signals, “deciding” what signals to send out, and sending them) would replicate the cell’s role in task-performance, even if it leaves out or summarizes many other processes occurring in the cell. Do that for all the cells in the brain involved in task-performance, and you’ve got a model that can match the brain’s capabilities. (17)

I give a basic overview of the signaling processes in the brain in section 1.5. For the purposes of the mechanistic method, I divide these into three categories:

  1. Standard neuron signaling. This is the form of signaling in the brain that receives the most attention from neuroscientists and textbooks. In brief: cells called neurons signal to each other using electrical impulses called action potentials or spikes. These action potentials travel down a tail-like projection called an axon, which branches off to form connections called synapses with other neurons. When an action potential from one neuron reaches the synapse between that neuron and another, this can cause the first neuron to release chemicals called neurotransmitters, which in turn cause changes in the second neuron that influence whether it fires. These changes can proceed in part via activity in the neuron’s dendrites – tree-like branches that typically receive signals from other neurons. I use the term spike through synapse to refer to the event of a spike arriving at a synapse.

  2. Learning. Experience shapes neural signaling in a manner that improves task-performance and stores task-relevant information. (18) Where not already covered by (1), I bucket the processes involved in this under “learning.” Salient examples include: changes at synapses that occur over time, other changes to the electrical properties of neurons, and the growth and death of neurons and synapses.

  3. Other signaling mechanisms. The brain contains a wide variety of signaling mechanisms (or candidate mechanisms) other than those included in the basic picture of standard neuron signaling. These include other types of chemical signals, other types of cells, synapses that don’t work via neurotransmitter release, local electric fields, and other forms of signaling along the axon. Where not already covered by (1) and (2), I lump all of these, known and unknown, under “other signaling mechanisms.”

Here’s a diagram of the basic framework I use for thinking about what models of these processes need to capture:

Here’s the mechanistic method formula that results:

Total FLOP/s = FLOP/s for standard neuron signaling +
        FLOP/s for learning +
        FLOP/s for other signaling mechanisms

I’m particularly interested in the following argument:

  I. You can capture standard neuron signaling and learning with somewhere between ~10^13-10^17 FLOP/s overall.

  II. This is the bulk of the FLOP/s burden (other signaling mechanisms may be important to task-performance, but they won’t require comparable FLOP/s to capture).

Why think (I)? In brief: there are roughly 10^11 neurons in the brain, and roughly 10^14-10^15 synapses. On the estimates that seem most plausible to me, each neuron spikes about 0.1-1 times per second (this is lower than the rate assumed by many other mechanistic method estimates in the literature), (19) suggesting ~10^13-10^15 spikes through synapses per second overall. (20) So 10^13-10^17 FLOP/s budgets (a sketch of this arithmetic follows the list below):

  • 1-100 FLOP per spike through synapse, which would cover various simple models of the impact of a spike through synapse on the downstream neuron (~1 FLOP per spike through synapse), with two extra orders of magnitude to allow for some possible complexities. (21)

  • 100-1,000,000 FLOP/s per neuron, (22) which covers a variety of simplified models of a neuron’s “decision” about whether to fire (including some that incorporate computation taking place in dendrites) that various arguments suggest would be adequate, and which, at the high end, covers a level of modeling complexity (single-compartment Hodgkin-Huxley models) (23) that I expect many computational neuroscientists to think unnecessary. (24)
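
To make the arithmetic behind (I) concrete, here is a minimal sketch in Python. The parameter values are just the order-of-magnitude figures quoted above, not precise measurements:

    # A sketch of the mechanistic method arithmetic (all parameter values
    # are the rough order-of-magnitude figures quoted in the text).

    neurons = 1e11                    # ~number of neurons in the brain
    synapses = (1e14, 1e15)           # ~number of synapses (low, high)
    spike_rate = (0.1, 1.0)           # average spikes per neuron per second
    flop_per_spike = (1, 100)         # FLOP per spike through synapse
    flop_s_per_neuron = (1e2, 1e6)    # FLOP/s per neuron for firing decisions

    # Spikes through synapses per second: ~1e13-1e15
    spikes = (synapses[0] * spike_rate[0], synapses[1] * spike_rate[1])

    # FLOP/s for synaptic transmission: ~1e13-1e17
    synaptic = (spikes[0] * flop_per_spike[0], spikes[1] * flop_per_spike[1])

    # FLOP/s for neuron firing decisions: ~1e13-1e17
    neuron_models = (neurons * flop_s_per_neuron[0], neurons * flop_s_per_neuron[1])

    print(f"spikes through synapses/s: {spikes[0]:.0e} to {spikes[1]:.0e}")
    print(f"synaptic FLOP/s:           {synaptic[0]:.0e} to {synaptic[1]:.0e}")
    print(f"neuron-model FLOP/s:       {neuron_models[0]:.0e} to {neuron_models[1]:.0e}")

Both line items land in the ~10^13-10^17 FLOP/s range, which is where the overall budget in (I) comes from.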

The FLOP/s budgets for learning are a significant source of uncertainty, but various models of learning in the brain plausibly fall within this range as well; and there are some additional reasons – for example, reasons related to the timescales of processes involved in learning – to think that learning will require fewer FLOP/s than standard neuron signaling. Various experts I spoke to (though not all) were also sympathetic towards (I). (25)

What about the other signaling mechanisms at stake in (II)? Here, the question is not whether these mechanisms matter. The question is whether they meaningfully increase a FLOP/s budget that already covers standard neuron signaling and learning. My best guess is that they don’t. This is mostly because:

  • My impression is that most experts who have formed opinions on the topic (as opposed to remaining agnostic) do not expect these mechanisms to account for the bulk of the brain’s information-processing, even if some play an important role. (26)

  • Relative to standard neuron signaling, each of the mechanisms I consider is some combination of (a) slower, (b) less spatially precise, (c) less common in the brain (or, not substantially more common), or (d) less clearly relevant to task-performance.

Section 2.3 offers an initial examination of a number of these mechanisms in light of considerations like (a)-(d). See section 2.3.7 for a summary chart.

To be clear: many of the questions at stake in these estimates remain very open. The models and assumptions covered by 10^13-10^17 FLOP/s seem to me reasonable defaults given what we know now. But there are also a variety of ways in which these numbers could be too low, or too high.

In particular, numbers larger than 10^17 FLOP/s might be suggested by:

  • Higher-precision temporal dynamics in the brain. (27)

  • Very FLOP/s-intensive deep neural network (DNN) models of neuron behavior (see the discussion in section 2.1.2.2 of Beniaguev et al. (2020) – a model that could suggest that you need ~10^21 FLOP/s for the brain overall).

  • Estimates based on time-steps per relevant variable at synapses, instead of spikes through synapses per second (see discussion here).

  • Larger FLOP/s budgets for processes like dendritic computation and learning. (28)

  • Higher estimates of parameters like synapse count or average firing rate. (29)

  • Background expectations that information-processing in biology will be extremely complex, efficient, and/or ill-suited to replication using digital computer hardware. (30)

Numbers smaller than 10^13 FLOP/s might be suggested by:

  • Noise, redundancy, and low-dimensional behavior amongst neurons, which suggest that modeling individual neurons/synapses might be overkill.

  • Overestimates of FLOP/s capacity that result from applying analogs of the mechanistic method to human-engineered computers.

  • Evolutionary constraints on the brain’s design (e.g., constraints on volume, energy consumption, growth/maintenance requirements, genome size, and speed/reliability of basic elements, as well as an inability to redesign the system from scratch), which suggest the possibility of improvements in efficiency.

Overall, I find the considerations pointing to the adequacy of budgets smaller than 10^13-10^17 FLOP/s more compelling than the considerations pointing to the necessity of larger ones (though it also seems easier, in general, to show that X is sufficient than that X is strictly required – an asymmetry present throughout the report). But the uncertainties in either direction rightly prompt dissatisfaction with the mechanistic method’s robustness.

The functional method

The functional method attempts to identify a portion of the brain whose function we can approximate with artificial systems, and then to scale up to an estimate for the brain as a whole.

Various attempts at this method have been made. I focus on two categories: estimates based on the retina, and estimates based on the visual cortex.

The retina

The retina is a thin layer of neural tissue in the eye. It performs the first stage of visual processing, and sends the results to the rest of the brain via spike patterns in the optic nerve – a bundle of roughly a million axons of neurons called retinal ganglion cells.

Diagram of the retina. From Dowling (2007), unaltered. Licensed under CC BY-SA 3.0.

I consider two types of estimates for the FLOP/s sufficient to replicate retinal function.

  • Hans Moravec estimates 10^9 calculations per second, based on the assumption that the retina’s function is to detect edges and motion. (31) One problem here is that the retina does a lot more than this (for example, it can anticipate motion, it can signal that a predicted stimulus is absent, and it can adapt to different lighting conditions). (32)

  • Recent deep neural networks used to predict ganglion cell firing patterns suggest higher estimates: ~10^13-10^14 FLOP/s (though I’m very uncertain about these numbers, as they depend heavily on the size of the visual input, and on how these models would scale up to a million ganglion cells). (33) These, too, do not yet represent full replications of human retinal computation, but they outperform various other models on natural images. (34)

Moving from the retina to the whole brain introduces further uncertainty. There are a variety of possible ways of scaling up (e.g., based on mass, volume, neurons, synapses, and energy use), which result in scaling factors between 10^3 and 10^6. (35) These factors imply the following ranges for the whole brain (with the arithmetic sketched after this list):

  • Using Moravec’s retina estimate: 10^12-10^15 calculations per second

  • Using DNN retina model estimates: 10^16-10^20 FLOP/s
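
As a quick check on these ranges, here is the scale-up arithmetic as a short Python sketch; the retina estimates and the 10^3-10^6 scaling factors are the figures quoted above, not settled values:

    # Scaling retina estimates to the whole brain (a sketch).

    scale = (1e3, 1e6)           # retina-to-brain scaling factors quoted above
    moravec_retina = 1e9         # Moravec's retina estimate (calculations/s)
    dnn_retina = (1e13, 1e14)    # DNN-based retina estimates (FLOP/s)

    # Moravec-based whole-brain range: ~1e12-1e15 calculations/s
    moravec_brain = (moravec_retina * scale[0], moravec_retina * scale[1])

    # DNN-based whole-brain range: ~1e16-1e20 FLOP/s
    dnn_brain = (dnn_retina[0] * scale[0], dnn_retina[1] * scale[1])

    print(f"Moravec-based: {moravec_brain[0]:.0e} to {moravec_brain[1]:.0e}")
    print(f"DNN-based:     {dnn_brain[0]:.0e} to {dnn_brain[1]:.0e}")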

But there are also differences between the retina and the rest of the brain, which weaken the evidence these numbers provide (for example, the retina is less plastic, more specialized, and subject to unique physical constraints).

Overall, I treat the DNN estimates here as some weak evidence that the mechanistic method range above (10^13-10^17 FLOP/s) is too low (and these could yet underestimate the retina’s complexity, or the complexity of the brain relative to the retina). But as noted, I feel very unsure about the estimates themselves. And it seems plausible to me that the relevant models use many more FLOP/s than are required to automate what ganglion cells do (for example, these models reflect specific implementation choices that haven’t been shown necessary; and Moravec’s estimate, even if incomplete in its coverage of all retinal computation, is much lower – see the end of section 3.1.2 for more discussion).

The visual cortex

A different application of the functional method treats deep neural networks trained on vision tasks as automating some portion of the information-processing in the visual cortex – the region of the brain that receives and begins to process visual signals sent from the retina (via the lateral geniculate nucleus). (36)

Such networks can classify full-color images into 1000 different categories with something like human-level accuracy. (37) What’s more, they can be used as state-of-the-art predictors of neural activity in the visual cortex, and the features they detect bear interesting similarities to ones the visual cortex detects (see section 3.2 for discussion).

Using these networks for functional method estimates, though, introduces at least two types of uncertainty. First, there’s clearly a lot happening in the visual cortex other than image classification of the type these models perform. For example: the visual cortex is involved in motor processing, prediction, and learning. Indeed, the idea that different cortical regions are highly specialized for particular tasks seems to have lost favor in neuroscience. And vision as a whole seems closely tied to, for example, behavioral affordances, 3D models of an environment, and high-level interpretations of what’s significant. (38)

Second, even on the particular task of image classification, available DNN models do not yet clearly match human-level performance. For example:

  • They’re vulnerable to adversarial examples and other types of generalization failures.

  • They typically use smaller inputs than the visual cortex receives.

  • They classify stimuli into a smaller number of categories (indeed, it is unclear to me, conceptually, how to bound the number of categories humans can recognize).

Examples of generalization failures. From Geirhos et al. (2020), Figure 3, p. 8, reprinted with permission, and unaltered. Original caption: “Both human and machine vision generalise, but they generalise very differently. Left: image pairs that belong to the same category for humans, but not for DNNs. Right: image pairs assigned to the same category by a variety of DNNs, but not by humans.”

Suppose we try to forge ahead with a functional method estimate, despite these uncertainties. What results?

An EfficientNet-B2 takes 10^9 FLOP to classify a single image (though it may be possible to use even less than this). (39) Humans can recognize ~ten images per second; running an EfficientNet-B2 at this frequency would require ~10^10 FLOP/s. (40)

I estimate that the primary visual cortex (a large and especially well-studied part of the early visual system, also called V1) contains ~0.3-3% of the brain’s neurons, and that the visual cortex as a whole contains ~1-10% (though if we focused on percentage of volume, mass, energy consumption, or synapses, the relevant percentages might be larger). (41)

We also need to estimate two other parameters, representing the two categories of uncertainty discussed above:

  1. The percentage of the visual cortex’s information-processing capacity that it devotes to tasks analogous to image classification, when it performs them. (42)

  2. The factor increase in FLOP/s required to reach human-level performance on this task (if any), relative to the FLOP/s costs of an EfficientNet-B2 run at 10 Hz.

My estimates for these are very made-up. For (1), I use 1% of V1 as a more conservative estimate, and 10% of the visual cortex as a whole as a more aggressive one, with 1% of the visual cortex as a rough middle. For (2), I use 10× as a low end, and 1000× as a high end, with 100× as a rough middle. See section 3.2.3 for a bit more discussion of these numbers.

Combining these estimates for (1) and (2) yields a range of whole-brain estimates.
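
Here is a minimal sketch of one way the combination can work, assuming the whole-brain estimate scales as (model FLOP/s) × (factor (2)), divided by the fraction of the brain’s information-processing capacity the task represents. The pairings of low/middle/high parameter values are my own illustration; the report may combine them differently:

    # A sketch of the functional method combination (my reconstruction;
    # parameter values are the rough figures quoted in the text).

    efficientnet_10hz = 1e10   # FLOP/s for an EfficientNet-B2 run at 10 Hz

    def brain_flops(human_level_factor, task_fraction, region_share):
        """Whole-brain FLOP/s implied by treating the model as covering
        `task_fraction` of a region that makes up `region_share` of the
        brain, with `human_level_factor` more FLOP/s needed to reach
        human-level performance."""
        region_total = efficientnet_10hz * human_level_factor / task_fraction
        return region_total / region_share

    # Low end: 10x factor, 10% of the visual cortex (~1-10% of neurons)
    print(f"{brain_flops(10, 0.10, 0.10):.0e} to {brain_flops(10, 0.10, 0.01):.0e}")

    # Middle: 100x factor, 1% of the visual cortex
    print(f"{brain_flops(100, 0.01, 0.10):.0e} to {brain_flops(100, 0.01, 0.01):.0e}")

    # High end: 1000x factor, 1% of V1 (~0.3-3% of the brain's neurons)
    print(f"{brain_flops(1000, 0.01, 0.03):.0e} to {brain_flops(1000, 0.01, 0.003):.0e}")

On these pairings, the estimates run from roughly 10^13 FLOP/s at the low end to a few times 10^17 FLOP/s at the high end.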

Overall, I hold these estimates very lightly. The question of how high (2) could go, for example, seems very salient. And the conceptual ambiguities involved in (1) caution against relying on what might appear to be conservative numbers. (43)

Still, I don’t think these estimates are entirely uninformative. For example, it is at least interesting to me that you need to treat a 10 Hz EfficientNet-B2 as running on e.g. ~0.1% of the FLOP/s of a model that would cover ~1% of V1, in order to get whole-brain estimates substantially above 10^17 FLOP/s – the top end of the mechanistic method range I discussed above. This weakly suggests to me that such a range is not way too low.

The limit method

The limit method attempts to upper-bound required FLOP/s by appealing to physical limits.

I focus on limits imposed by “Landauer’s principle,” which specifies the minimum energy costs of erasing bits (see section 4.1.1 for more explanation). Standard FLOP (that is, those performed by human-engineered computers) erase bits, which means that an idealized computer running on the brain’s energy budget (~20W) can only perform so many standard FLOP/s: specifically, ~7×10^21 (~10^21 if we assume 8-bit FLOPs, and ~10^19 if we assume current digital multiplier implementations). (44)
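
For concreteness, here is the Landauer arithmetic as a short sketch, assuming body temperature (~310 K) and the ~20W budget; treating each standard FLOP as erasing at least one bit is the simplifying assumption behind the ~7×10^21 figure:

    import math

    # Landauer's principle: erasing one bit dissipates at least k*T*ln(2) joules.
    k_B = 1.38e-23          # Boltzmann constant, J/K
    T = 310.0               # approximate body temperature, K
    power = 20.0            # the brain's energy budget, W

    joules_per_bit = k_B * T * math.log(2)       # ~3e-21 J per bit erased
    max_bit_erasures = power / joules_per_bit    # ~7e21 bits/s

    print(f"max bit erasures per second: {max_bit_erasures:.1e}")

    # If each standard FLOP erases ~1 bit, an idealized computer on the
    # brain's energy budget is capped at ~7e21 FLOP/s; assuming more bit
    # erasures per FLOP (e.g., 8-bit FLOPs, or current digital multiplier
    # implementations) lowers the cap toward ~1e21 or ~1e19.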

Does this upper-bound the FLOP/s required to match the brain’s task-performance? Not on its own, because the brain need not be performing operations that resemble standard FLOPs. (45) Indeed, in theory, it appears possible to perform arbitrarily complicated computations with very few bit erasures, with manageable increases in computation and memory burden. (46)

Absent a simple upper bound, then, the question is what, if anything, we can say about the ratio between the FLOP/s required to match the brain’s task-performance and the maximum bits per second the brain can erase. Various experts I spoke to about the limit method (though not all) were quite confident that the latter far exceeds the former. (47) They gave various arguments, which I group into:

  • Algorithmic arguments (section 4.2.1), which focus on the bits we should expect the brain’s “algorithm” to erase, per FLOP required to replicate it; and

  • Hardware arguments (section 4.2.2), which focus on the energy we should expect the brain’s hardware to dissipate, per FLOP required to replicate the computation it implements.

Of these, the hardware arguments seem to me stronger (though they also don’t seem to me to rely very directly on Landauer’s principle in particular). Both, though, appeal to general considerations that apply even if more specific assumptions from other methods are mistaken.

Overall, it seems unlikely to me that the required FLOP/s exceeds the bounds suggested by the limit method. This is partly out of deference to various experts; partly because various algorithmic and hardware arguments seem plausible to me (regardless of whether they rely on Landauer’s principle or not); and partly because other methods generally point to lower numbers. But this doesn’t seem like a case of a physical limit imposing a clean upper bound.

The communication method

The communication method attempts to use the communication bandwidth in the brain as evidence about its computational capacity.

Communication bandwidth, here, refers to the speed with which a system can send different amounts of information different distances. This is distinct from the operations per second it can perform (computation). But estimating communication bandwidth might help with computation estimates, because the marginal values of additional computation and communication are related (e.g., with too little communication, your computational units sit idle; with too few computational units, it becomes less useful to move information around).

The basic form of the argument is roughly:

  1. The communication bandwidth in the brain is X.

  2. If the communication bandwidth in the brain is X, then Y FLOP/s is probably enough to match the brain’s task-performance.

I don’t examine attempts to use this method in any detail. But I note some examples in the hopes of inspiring future work.

  • Dr. Paul Christiano, one of Open Philanthropy’s technical advisors, offers a loose estimate of the brain’s communication capacity, and suggests that it looks comparable (indeed, inferior) to the communication profile of a V100 GPU. Perhaps, then, the brain’s computational capacity is comparable (or inferior) to a V100 as well. (48) This would suggest 10^14 FLOP/s or less for the brain (though I think this argument gets more complicated if you also bring in comparisons based on memory and energy consumption).

  • AI Impacts recommends using traversed edges per second (TEPS) – a metric used to assess the communication capabilities of human-engineered computers, which measures the time required to perform a certain type of search through a random graph – to quantify the brain’s communication capacity. (49) Treating spikes through synapses as traversals of an edge, they estimate ~2×10^13-6×10^14 TEPS for the brain. They then examine the ratio of TEPS to FLOP/s in eight top supercomputers, and find a fairly consistent ~500-600 FLOP/s per TEPS. Scaling up from their TEPS estimate for the brain, they get ~10^16-3×10^17 FLOP/s (see the sketch below).
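
For reference, the TEPS extrapolation is simple enough to write down directly (a sketch using AI Impacts’ figures as quoted above):

    # A sketch of AI Impacts' TEPS-based extrapolation.

    teps_brain = (2e13, 6e14)      # estimated TEPS for the brain
    flops_per_teps = (500, 600)    # FLOP/s per TEPS in top supercomputers

    low = teps_brain[0] * flops_per_teps[0]     # ~1e16 FLOP/s
    high = teps_brain[1] * flops_per_teps[1]    # ~3.6e17 FLOP/s

    print(f"{low:.0e} to {high:.0e} FLOP/s")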

I haven’t vetted these estimates. And in general, efforts in this vein face a number of issues (see section 5.2 for examples). But I think they may well prove helpful.

Conclusions

Here’s a chart plotting the different estimates I discussed, along with a few others from the report.

The report’s main estimates. See the conclusion for a list that describes them in more detail, and summarizes my evaluation of each.

As I’ve said, these numbers should be held lightly. They are back-of-the-envelope calculations, offered, in the report, alongside initial discussion of complications and objections. The science here is very far from settled.

Here’s a summary of the main conclusions discussed above:

  • Mechanistic estimates suggesting that 10^13-10^17 FLOP/s would be enough to match the human brain’s task-performance seem plausible to me. Some considerations point to higher numbers; some, to lower numbers. Of these, the latter seem to me stronger.

  • I give less weight to functional method estimates. However, I take estimates based on the visual cortex as some weak evidence that 10^13-10^17 FLOP/s isn’t much too low. Some estimates based on deep neural network models of retinal neurons point to higher numbers, but I take these as even weaker evidence.

  • I think it unlikely that the required number of FLOP/s exceeds the bounds suggested by the limit method. However, I don’t think the method itself is airtight.

  • Communication method estimates may well prove informative, but I haven’t vetted them.

And remember, the minimum adequate budget could be lower than all these estimates. The brain is only one example of a system that performs these tasks.

Overall, I think it more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And I think it unlikely (<10%) that more than 10^21 FLOP/s is required. But there’s no consensus amongst experts.

I offer a few more specific probabilities, keyed to one specific type of brain model, in the appendix. My current best-guess median for the FLOP/s required to run that particular type of model is around 10^15 (recall that none of these numbers are estimates of the FLOP/s uniquely “equivalent” to the brain).

As can be seen from the figure above, the FLOP/s capacities of current computers cover the estimates I find most plausible. However:

  • Task-performance requires resources other than FLOP/s (for example, memory and memory bandwidth).

  • Performing tasks on a particular machine can introduce further overheads and complications.

  • Most importantly, matching the human brain’s task-performance requires actually creating sufficiently capable and computationally efficient AI systems, and this could be extremely (even prohibitively) difficult in practice, even with computers that could run such systems in theory. Indeed, as noted above, the FLOP/s required to run a system that does X can be available even while the resources (including data) required to train it remain substantially out of reach. And what sorts of task-performance will result from what sorts of training is itself a further, knotty question. (50)

So even if my best guesses are correct, this does not imply that we’ll see AI systems as capable as the human brain anytime soon.

Acknowledgements

This project emerged out of Open Philanthropy’s engagement with some arguments suggested by one of our technical advisors, Dario Amodei, in the vein of the mechanistic and functional methods discussed above. However, my discussion should not be treated as representative of Dr. Amodei’s views; the project eventually broadened considerably; and my conclusions are my own. See the end of the executive summary for further acknowledgments, along with a list of experts consulted for the report.

My thanks to Nick Beckstead, Ajeya Cotra, Tom Davidson, Owain Evans, Katja Grace, Holden Karnofsky, Michael Levine, Luke Muehlhauser, Zachary Robinson, David Roodman, Carl Shulman, and Jacob Trefethen for comments on this blog post in particular; and to Eli Nathan for extensive help with the webpage.