It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link]
This is a link post for Beniaguev, D., Segev, I., & London, M. (2021). Single cortical neurons as deep artificial neural networks. Neuron. https://www.sciencedirect.com/science/article/abs/pii/S0896627321005018
Also this article about the paper: https://www.quantamagazine.org/how-computationally-complex-is-a-single-neuron-20210902
My own quick takeaway is that it takes an artificial neural network of 5-8 layers with about 1,000 neurons in total to simulate a single biological neuron of a certain kind, and that without taking this into account, we'd likely underestimate the computational power of animal brains relative to artificial neural networks, possibly by up to about 1000x. Taking it into account may set back AI timelines (based on biological anchors; EDIT: although this particular report does in fact already assume biological neurons are more powerful than artificial ones) or reduce the relative moral weight of artificial sentience. However, there are two important weaknesses that undermine this conclusion (see also irving's comment):
It’s possible much of that supposed additional complexity isn’t useful or is closer to a constant (rather than proportional) overhead that can be ignored as we scale to simulate larger biological neural networks.
We should also try to simulate artificial neurons (and artificial neural networks) with biological neuron models; there could be similar overhead in that direction, too.
From the Quanta article:
They continued increasing the number of layers until they achieved 99% accuracy at the millisecond level between the input and output of the simulated neuron. The deep neural network successfully predicted the behavior of the neuron’s input-output function with at least five — but no more than eight — artificial layers. In most of the networks, that equated to about 1,000 artificial neurons for just one biological neuron.
From the paper:
Highlights
Cortical neurons are well approximated by a deep neural network (DNN) with 5–8 layers
DNN’s depth arises from the interaction between NMDA receptors and dendritic morphology
Dendritic branches can be conceptualized as a set of spatiotemporal pattern detectors
We provide a unified method to assess the computational complexity of any neuron type
Summary
Utilizing recent advances in machine learning, we introduce a systematic approach to characterize neurons’ input/output (I/O) mapping complexity. Deep neural networks (DNNs) were trained to faithfully replicate the I/O function of various biophysical models of cortical neurons at millisecond (spiking) resolution. A temporally convolutional DNN with five to eight layers was required to capture the I/O mapping of a realistic model of a layer 5 cortical pyramidal cell (L5PC). This DNN generalized well when presented with inputs widely outside the training distribution. When NMDA receptors were removed, a much simpler network (fully connected neural network with one hidden layer) was sufficient to fit the model. Analysis of the DNNs’ weight matrices revealed that synaptic integration in dendritic branches could be conceptualized as pattern matching from a set of spatiotemporal templates. This study provides a unified characterization of the computational complexity of single neurons and suggests that cortical networks therefore have a unique architecture, potentially supporting their computational power.
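For concreteness, here is a minimal sketch (my own illustration in PyTorch, not the authors' code) of the kind of temporally convolutional DNN described: a causal stack of 1D convolutions mapping presynaptic spike trains binned at 1 ms to a spike probability and a somatic voltage at each time step. The input count (1278 channels), depth (7 layers), width (128 channels), and kernel length (35 ms) are illustrative guesses at the scale involved, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NeuronSurrogate(nn.Module):
    def __init__(self, n_inputs=1278, channels=128, kernel=35, layers=7):
        super().__init__()
        blocks, in_ch = [], n_inputs
        for _ in range(layers):
            # Causal temporal convolution: pad only on the left so the output at time t
            # depends only on synaptic inputs at times <= t.
            blocks += [nn.ConstantPad1d((kernel - 1, 0), 0.0),
                       nn.Conv1d(in_ch, channels, kernel),
                       nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*blocks)
        self.spike_head = nn.Conv1d(channels, 1, 1)    # per-millisecond spike probability (logit)
        self.voltage_head = nn.Conv1d(channels, 1, 1)  # per-millisecond somatic voltage

    def forward(self, syn_spikes):                     # syn_spikes: (batch, n_inputs, time_ms)
        h = self.body(syn_spikes)
        return torch.sigmoid(self.spike_head(h)), self.voltage_head(h)

model = NeuronSurrogate()
x = torch.zeros(1, 1278, 500)            # 500 ms of input over 1278 synaptic channels
spike_prob, somatic_voltage = model(x)   # both have shape (1, 1, 500)
```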
This paper has at least two significant flaws when used to estimate relative complexity for useful purposes. In the authors’ defense such an estimate wasn’t the main motivation of the paper, but the Quanta article is all about estimation and the paper doesn’t mention the flaws.
Flaw one: no reversed control
Say we have two parameterized model classes A_n and B_n, and ask what values of n are necessary for A_n to approximate B_1 and for B_n to approximate A_1. It is trivial to construct model classes for which n is large in both directions, just because A_1 is a much better algorithm for approximating A_1 than B_1 is, and vice versa. I'm not sure how much this cuts off the 1000 estimate, but it could easily be 10x.
Brief Twitter thread about this: https://twitter.com/geoffreyirving/status/1433487270779174918
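As a toy illustration of the construction above (mine, not from the thread or the paper): take class A to be piecewise-constant "step" models and class B to be polynomials. A single step function is never uniformly approximated well by any polynomial (the error near the jump stays around 0.5), and approximating even the single polynomial f(x) = x with n constant pieces leaves error 1/(2n), so hitting accuracy ε needs n ≈ 1/(2ε) pieces. The blow-up in both directions reflects mismatch between the model classes, not the "computational power" of either one.

```python
import numpy as np

x = np.linspace(0, 1, 10_001)

# Direction 1: approximate one step function (class A) with degree-n polynomials (class B).
step = (x >= 0.5).astype(float)
for deg in (5, 20, 80):
    p = np.polynomial.Chebyshev.fit(x, step, deg)   # least-squares polynomial fit
    print(f"degree-{deg:2d} polynomial vs. one step: max error {np.max(np.abs(p(x) - step)):.2f}")

# Direction 2: approximate the single polynomial f(x) = x (class B) with n-piece constants (class A).
# The best piecewise-constant fit on n equal pieces has max error exactly 1 / (2n).
for n in (5, 20, 80):
    print(f"{n:2d}-piece constant vs. f(x)=x: max error {1 / (2 * n):.4f}")
```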
Flaw two: no scaling w.r.t. multiple neurons
I don’t see any reason to believe the 1000 factor would remain constant as you add more neurons, so that we’re approximating many real neurons with many (more) artificial neurons. In particular, it’s easy to construct model classes where the factor decays to 1 as you add more real neurons. I don’t know how strong this effect is, but again there is no discussion or estimation of it in the paper.
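One hypothetical way the factor could shrink (my own illustration, with made-up numbers): if most of the surrogate network's parameters go to input preprocessing that could be shared across neurons with overlapping inputs, then the per-neuron overhead falls toward the small "private" part as the simulated population grows.

```python
# Made-up cost model: 'shared' parameters could be amortized across many simulated
# neurons, 'private' parameters are genuinely specific to each one.
shared, private = 900, 100
for n_bio in (1, 10, 100, 1000):
    artificial_per_bio = (shared + n_bio * private) / n_bio
    print(f"{n_bio:5d} biological neurons -> ~{artificial_per_bio:6.0f} artificial units each")
```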
Thanks, these are both excellent points. I did hint at the first one, and I specifically came back to this post to mention the second, but you beat me to it. ;)
I’ve edited my post.
EDIT: Also edited again to emphasize the weaknesses.
Yup! That’s where I’d put my money.
It’s a foregone conclusion that a real-world system has tons of complexity that is not related to the useful functions that the system performs. Consider, for example, the silicon transistors that comprise digital chips—”the useful function that they perform” is a little story involving words like “ON” and “OFF”, but “the real-world transistor” needs three equations involving 22 parameters, to a first approximation!
By the same token, my favorite paper on the algorithmic role of dendritic computation has them basically implementing a simple set of ANDs and ORs on incoming signals. It’s quite likely that dendrites do other things too besides what’s in that one paper, but I think that example is suggestive.
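A toy version of that picture (my own sketch, not the cited paper's actual model): treat each dendritic branch as an AND over its inputs and the soma as an OR over branches.

```python
# Hypothetical illustration: each branch ANDs a few presynaptic signals; the soma ORs the branches.
def dendritic_neuron(inputs, branches):
    # inputs: dict of boolean presynaptic signals; branches: list of tuples of input names
    return any(all(inputs[name] for name in branch) for branch in branches)

fires = dendritic_neuron({"a": True, "b": True, "c": False},
                         branches=[("a", "b"), ("b", "c")])
```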
Caveat: I’m mainly thinking of the complexity of understanding the neuronal algorithms involved in “human intelligence” (e.g. common sense, science, language, etc.), which (I claim) are mainly in the cortex and thalamus. I think those algorithms need to be built out of really specific and legible operations, and such operations are unlikely to line up with the full complexity of the input-output behavior of neurons. I think the claim “the useful function that a neuron performs is simpler than the neuron itself” is always true, but it’s very strongly true for “human intelligence” related algorithms, whereas it’s less true in other contexts, including probably some brainstem circuits, and the neurons in microscopic worms. It seems to me that microscopic worms just don’t have enough neurons to not squeeze out useful functionality from every squiggle in their neurons’ input-output relations. And moreover here we’re not talking about massive intricate beautifully-orchestrated learning algorithms, but rather things like “do this behavior a bit less often when the temperature is low” etc. See my post Building brain-inspired AGI is infinitely easier than understanding the brain for more discussion kinda related to this.
Addendum: In the other direction, one could point out that the authors were searching for “an approximation of an approximation of a neuron”, not “an approximation of a neuron”. (insight stolen from here.) Their ground truth was a fancier neuron model, not a real neuron. Even the fancier model is a simplification of real life. For example, if I recall correctly, neurons have been observed to do funny things like store state variables via changes in gene expression. Even the fancier model wouldn’t capture that. As in my parent comment, I think these kinds of things are highly relevant to simulating worms, and not terribly relevant to reverse-engineering the algorithms underlying human intelligence.
This does not seem right to me. I haven’t read the paper yet, so maybe I’m totally misunderstanding things, but...
The bio anchors framework does not envision us achieving AGI/TAI/etc. by simulating the brain, or even by simulating neurons. Instead, it tries to guesstimate how many artificial neurons or parameters we’d need to achieve similar capabilities to the brain, by looking at how many biological neurons or synapses are used in the brain, and then adding a few orders of magnitude of error bars. See the Carlsmith report, especially the conclusion summary diagram. Obviously if we actually wanted to simulate the brain we’d need to do something more sophisticated than just use 1 artificial neuron per biological neuron. For a related post, see this. Anyhow, the point is, this paper seems almost completely irrelevant to the bio anchors framework, because we knew already (and other papers had shown) that if we wanted to simulate a neuron it would take more than just one artificial neuron.
Assuming I’m wrong about point #1, I think the calculation would be more complex than just “1000 artificial neurons needed per biological neuron, so +3 OOMs to bio anchors framework.” Most of the computation in the bio anchors calculation comes from synapses, not neurons. Here’s an attempt at how the revised calculation might go:
Currently Carlsmith’s median estimate is 10^15 flop per second. Ajeya’s report guesses that artificial stuff is crappier than bio stuff and so uses 10^16 flop per second as the median instead, IIRC.
There are 10^11 neurons in the brain, and 10^14-10^15 synapses.
If we assume each neuron requires an 8-layer convolutional DNN with 1000 neurons… how many parameters is that? Let’s say it’s 100,000, correct me if I’m wrong.
So then that would be 100,000 flop per period of neuron-simulation.
I can’t access the paper itself but one of the diagrams says something about one ms of input. So that means maybe that the period length is 1 ms, which means 1000 periods a second, which means 100,000,000 flop per second of neuron-simulation.
This would be a lot more than the cost of simulating the synapses, so we don’t have to bother calculating that.
So our total cost is 10^8 flop per second per neuron times 10^11 neurons = 10^19 flop per second to simulate the brain.
So this means a loose upper bound for the bio anchors framework should be at 10^19, whereas currently Ajeya uses a median of 10^16 with a few OOMs of uncertainty on either side. It also means, insofar as you think my point #1 is wrong and that this paper is the last word on the subject, that the median should maybe be closer to 10^19 as well, though that’s less clear. (Plausibly we’ll be able to find more efficient ways to simulate neurons than the dumb 8-layer NN they tried in this paper, shaving an OOM or so off the cost, bringing us back down toward 10^18...)
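Redoing that arithmetic explicitly (every input below is one of the rough guesses above, not a figure from the paper):

```python
# Back-of-the-envelope version of the estimate above; all numbers are rough guesses.
layers, width = 8, 125                              # "about 1000 neurons" spread over 8 layers
params_per_neuron = (layers - 1) * width * width    # ~1.1e5, i.e. roughly 100,000 parameters
timesteps_per_second = 1_000                        # 1 ms simulation period
neurons_in_brain = 1e11

# Assume ~1 FLOP per parameter per time step (a multiply-add is ~2; order of magnitude only).
flop_per_neuron_per_second = params_per_neuron * timesteps_per_second   # ~1e8
total_flop_per_second = flop_per_neuron_per_second * neurons_in_brain   # ~1e19

print(f"params per simulated neuron: {params_per_neuron:,}")
print(f"FLOP/s per simulated neuron: {flop_per_neuron_per_second:.0e}")
print(f"FLOP/s for the whole brain:  {total_flop_per_second:.0e}")
```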
It’s unclear whether this would lengthen or shorten timelines, I’d like to see the calculations for that. My wild guess is that it would lower the probability of <15 year timelines and also lower the probability of >30 year timelines.
On point 1, my claim is that the paper is evidence that biological neurons are more computationally powerful than artificial ones, not that we’d achieve AGI/TAI by simulating biological brains. I agree that for those who already expected this, the paper wouldn’t be much of an update (well, maybe the actual numbers matter; 1000x seemed pretty high, but is also probably an overestimate).
I also didn’t claim that the timelines based on biological anchors that I linked to would actually be affected by this, since I didn’t know either way whether they made any adjustment for it (I only read summaries and may have skimmed a few parts of the actual report). But that’s a totally reasonable interpretation of what I said, and I should have been more careful to prevent it from being read that way.
What does it mean to say a biological neuron is more computationally powerful than an artificial one? If all it means is that it takes more computation to fully simulate its behavior, then by that standard a leaf falling from a tree is more computationally powerful than my laptop.
(This is a genuine question, not a rhetorical one. I do have some sense of what you are saying but it’s fuzzy in my head and I’m wondering if you have a more precise definition that isn’t just “computation required to simulate.” I suspect that the Carlsmith report I linked may have already answered this question and I forgot what it said.)
I would say a biological neuron can compute more complex functions or a wider variety of functions of its inputs than standard artificial neurons in deep learning (linear combination of inputs followed by a nonlinear real-valued function with one argument), and you could approximate functions of interest with fewer biological neurons than artificial ones. Maybe biological neurons have more (useable) degrees of freedom for the same number of input connections.
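For concreteness, the "standard artificial neuron" referred to here is just the following (a sketch; the contrast is that the paper needed a small multi-layer network of such units to stand in for one biological neuron):

```python
import numpy as np

def standard_artificial_neuron(x, w, b):
    # One linear combination of the inputs followed by one scalar nonlinearity (here ReLU).
    return max(0.0, float(np.dot(w, x) + b))
```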
I think I get it, thanks! (What follows is my understanding, please correct it if wrong.) The idea is something like: A falling leaf is not a computer; it can’t be repurposed to perform many different useful computations. But a neuron can be: depending on the weights of its synapses it can be an AND gate, an OR gate, or various more complicated things. And the paper in the OP is evidence that the range of more complicated useful computations a neuron can do is quite large, which is reason to think that maybe, in the relevant sense, a lot of the brain’s skills have to involve fancy calculations within neurons. (Just because they do doesn’t mean they have to, but if neurons are general-purpose computers capable of doing lots of computations, that seems like stronger evidence than if neurons were more like falling leaves.)
I still haven’t read the paper—does the experiment distinguish between the “it’s a tiny computer” hypothesis vs. the “it’s like a falling leaf—hard to simulate, but not in an interesting way” hypothesis?
Ya, this is what I’m thinking, although “have to” is also a matter of scaling, e.g. a larger brain could accomplish the same with less powerful neurons. There’s also probably a lot of waste in the human brain, even just among the structures most important for reasoning (although the same could end up being true of an AGI/TAI we try to build; we might need a lot of waste before we can prune or make smaller student networks, etc.).
On falling leaves, the authors were just simulating the input and output behaviour of the neurons, not the physics/chemistry/biology (I’m not sure if that’s what you had in mind), but based on the discussion on this post, the 1000x could be very misleading and could mostly go away as you scale to try to simulate a larger biological network, or you could have a similar cost in trying to simulate an artificial neural network with a biological one. They didn’t check for these possibilities (so it could still be in some sense like simulating falling leaves).
Still, 1000x seems high to me for biological neurons not being any more powerful than artificial neurons, although this is pretty much just gut intuition, and I can’t really explain why. Based on the conversations here (with you and others), I think 10x is a reasonable guess.
What I meant by the falling leaf thing:
If we wanted to accurately simulate where a leaf would land when dropped from a certain height and angle, it would require a ton of complex computation. But (one can imagine) it’s not necessary for us to do this; for any practical purpose we can just simplify it to a random distribution centered directly below the leaf with variance v.
Similarly (perhaps) if we want to accurately simulate the input-output behavior of a neuron, maybe we need 8 layers of artificial neurons. But maybe in practice if we just simplified it to “It sums up the strength of all the neurons that fired at it in the last period, and then fires with probability p, where p is an s-curve function of the strength sum...” maybe that would work fine for practical purposes—NOT for purpose of accurately reproducing the human brain’s behavior, but for purposes of building an approximately brain-sized artificial neural net that is able to learn and excel at the same tasks.
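In code, the simplified neuron described above might look something like this (a minimal sketch; the logistic s-curve and the bias term are my own illustrative choices):

```python
import numpy as np

def simplified_neuron(incoming_strengths, bias=-2.0, rng=np.random.default_rng()):
    # Sum the strengths of the inputs that fired in the last period...
    drive = float(np.sum(incoming_strengths)) + bias
    # ...squash through an s-curve to get a firing probability...
    p = 1.0 / (1.0 + np.exp(-drive))
    # ...and fire stochastically with that probability.
    return rng.random() < p
```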
My original point no. 1 was basically that I don’t see how the experiment conducted in this paper is much evidence against the “simplified model would work fine for practical purposes” hypothesis.
Ya, that’s fair. If this is the case, I might say that biological neurons don’t have additional useful degrees of freedom for the same number of inputs. The paper didn’t explicitly test for this either way, although, imo, what they did test is weak Bayesian evidence for biological neurons having more useful degrees of freedom, since if they could be simulated with few artificial neurons, we could pretty much rule that hypothesis out. Maybe this evidence is too weak to update much on, though, especially if you had a prior that simulating biological neurons would be pretty hard even if they had no additional useful degrees of freedom.
Now I think we are on the same page. Nice! I agree that this is weak Bayesian evidence for the reason you mention; if the experiment had discovered that one artificial neuron could adequately simulate one biological neuron, that would basically put an upper bound on things for purposes of the bio anchors framework (cutting off approximately the top half of Ajeya’s distribution over the required size of artificial neural net). Instead they found that you need thousands. But (I would say) this is only weak evidence because, prior to hearing about this experiment, I would have predicted that it would be difficult to accurately simulate a neuron, just as it’s difficult to accurately simulate a falling leaf. Pretty much everything that happens in biology is complicated and hard to simulate.
This makes a lot of sense to me given our limited progress on simulating even very simple animal brains so far, given the huge amount of compute we have nowadays. The only other viable hypothesis I can think of is that people aren’t trying that hard, which doesn’t seem right to me.
What about the hypothesis that simple animal brains haven’t been simulated because they’re hard to scan—we lack a functional map of the neurons—which ones promote or inhibit one another, and other such relations.
Here’s some supporting evidence for it being hard to map:
A mouse brain is about 500x that.
On the other hand, progress with OpenWorm has been kind of slow, despite C. elegans having only 302 neurons and 959 cells in total. Is mapping the bottleneck here?
If interested, here’s some further evidence that it’s just really hard to map: