AI timelines and theoretical understanding of deep learning
I have generally been quite skeptical about the view that we are on the cusp of a revolution that will lead us to artificial general intelligence in the next 50 years or so.
Aside from fundamental limitations of current AI systems, and flaws of extrapolating their remarkable ability at narrow tasks towards more general learning by appealing to “exponential” growth, there is another issue with the discourse on AI that I want to highlight.
One of the primary reasons to believe that AGI will happen in the near to mid-term future comes from predictions of experts working in the field, the majority of whom seem to think that we will have AGI by 2100 at the latest.
While there is every reason to attach credence to their perspective, it should be noted that deep learning, the framework that underpins most recent developments in AI, including language models like BERT and GPT-3 and strategy-game champions like AlphaGo, is notoriously hard to decipher from a theoretical perspective.
It would be a mistake to assume that people who design, develop and deploy these models necessarily understand why they happen to be as successful as they are. This may sound like a rather strange statement to make, but the reality is that despite the incredible pace of progress across various frontiers of AI with deep learning, our knowledge of why it works (the mathematical theory of it) lags behind immensely.
To be clear, I am not at all suggesting that research scientists at Google or DeepMind have no knowledge at all of why the models they design and deploy work. They are certainly guided by various ideas and heuristics when deciding on the loss function, the type of attention mechanism to use, the iterative update to the reward, the overall architecture of the network, etc. However, there are two things to note here: first, a lot of the design is based on experimenting with various functional forms, wiring combinations, convolution structures and parameter choices; second, the fact that there are heuristics and a high-level understanding of what is happening does not imply that there is a first-principles mathematical explanation for it.
There are people who study the theoretical side of deep learning, work towards establishing exact results, and aim to understand why the model training process is so incredibly successful. The progress there has been rather limited, and certainly well behind the state of the art in terms of performance. There are a lot of unusual things about deep learning, among them the fact that core concepts in conventional machine learning (such as overfitting) simply do not seem to apply. For a more technical view on this, watch this amazing talk by Sanjeev Arora where he explains how intriguing deep learning models and their training are.
This should be contrasted with physics, where our understanding of theories is much deeper and more fundamental. There is a very precise mathematical framework to characterize the physics of, say, electrons or quarks, and, at the other end of the spectrum, a model to understand cosmology. There is nothing even remotely comparable to that in deep learning.
Given all this, one should be more skeptical about prediction timelines for a qualitatively superior intelligence from experts in this field. The fact that there are considerable gaps in our understanding would suggest that expert opinion is perhaps guided less by some deeper insight into the learning and generalization process of AI models and more by a higher-level examination of the rapid progress of AI, i.e., their views may be relatively closer to those of a layperson. Couple this with the fact that we have a very limited understanding of human consciousness and how it is related to the electro-physiological properties of the brain. Such limitations make it very hard to predict with any degree of certainty.
Can you clarify what you mean by this? Does “quite skeptical” mean
or
or
or something else?
Language is quite imprecise. Numbers can’t resolve uncertainty in the underlying phenomenon, but they help a lot in clarifying and making the strength of your uncertainty more precise.
I feel like the main reasons you shouldn’t trust forecasts from subject matter experts are something like:
external validity: do experts in ML have good forecasts that outperform a reasonable baseline?
AFAIK this is an open question, probably not enough forecasts have resolved yet?
internal validity: do experts in ML have internally consistent predictions? Do they give similar answers at slightly different times when the evidence that has changed is minimal? Do they give similar answers when not subject to framing effects?
AFAIK they’ve failed miserably
base rates: what’s the general reference class we expect to draw from?
I’m not aware of any situation where subject matter experts not incentivized to have good forecasts do noticeably better than trained amateurs with prior forecasting track records.
So like you and steve2152 I’m at least somewhat skeptical of putting too much faith in expert forecasts.
However, in contrast I feel like a lack of theoretical understanding of current ML can’t be that strong evidence against trusting experts here, for the very simple reason that conservation of expected evidence means this implies that we ought to trust forecasts from experts with a theoretical understanding of their models more. And this seems wrong because (among others) it would’ve been wrong 50 years ago to trust experts on GOFAI for their AI timelines!
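To spell out the conservation-of-expected-evidence step, here is a rough sketch in symbols. The events H ("expert forecasts are trustworthy") and E ("the experts have a theoretical understanding of their models") are my own labels for illustration, and I'm assuming 0 < P(E) < 1:

```latex
\begin{align*}
P(H) &= P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E) \\
\Longrightarrow\quad
P(H \mid E) - P(H) &= \frac{P(\neg E)}{P(E)}\,\bigl(P(H) - P(H \mid \neg E)\bigr)
\end{align*}
```

So if learning that experts lack theoretical understanding lowers your trust (P(H | ¬E) < P(H)), then learning that they have it must raise your trust (P(H | E) > P(H)), and the less common E is, the bigger that upward update has to be.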
Several good points made by Linch, Aryeh and steve2152.
As for making my skepticism more precise in terms of probability, it’s less about me having a clear sense of timeline predictions that are radically different from those who believe that AGI will explode upon us in the next few decades, and more about the fact that I find most justifications and arguments made in favor of a timeline of less than 50 years to be rather unconvincing.
For instance, having studied and used state-of-the-art deep learning models, I am simply not able to understand why we are significantly closer to AGI today than we were in the 1950s. General intelligence requires something qualitatively different from GPT-3 or AlphaGo, and I have seen literally zero evidence that any AI systems comprehend things in a way that is even remotely close to how humans do.
Note that the last point is not a requirement (namely that AI should understand objects, events and relations like humans do) as such for AGI but it does make me skeptical of people who cite these examples as evidence of progress we’ve made towards such a general intelligence.
I have looked at Holden’s post and there are several things that are not clear to me. Here is one: there appears to be a lot of focus on the number of computations, especially in comparison to the human brain, and while I have little doubt that artificial systems will surpass those limitations (if they have not already done so), the real question is decoding the nature of the wiring and the functional form of the relation between the inputs and outputs. Perhaps there is something I am not getting here, but (at least in principle) isn’t there an infinite number of degrees of freedom associated with a continuous function? Even if one argued that we can define an equivalence class of similar functions (made rigorous), does that still not leave us with an extremely large number of possibilities?
If we don’t have convincing evidence in favor of a timeline <50 years, and we also don’t have convincing evidence in favor of a timeline ≥50 years, then we just have to say that this is a question on which we don’t have convincing evidence of anything in particular. But we still have to take whatever evidence we have and make the best decisions we can. ¯\_(ツ)_/¯
(You don’t say this explicitly but your wording kinda implies that ≥50 years is the default, and we need convincing evidence to change our mind away from that default. If so, I would ask why we should take ≥50 years to be the default. Or sorry if I’m putting words in your mouth.)
Lots of ingredients go into AGI, including (1) algorithms, (2) lots of inexpensive chips that can do lots of calculations per second, (3) technology for fast communication between these chips, (4) infrastructure for managing large jobs on compute clusters, (5) frameworks and expertise in parallelizing algorithms, (6) general willingness to spend millions of dollars and roll custom ASICs to run a learning algorithm, (7) coding and debugging tools and optimizing compilers, etc. Even if you believe that you’ve made no progress whatsoever on algorithms since the 1950s, we’ve made massive progress in the other categories. I think that alone puts us “significantly closer to AGI today than we were in the 1950s”: once we get the algorithms, at least everything else will be ready to go, and that wasn’t true in the 1950s, right?
But I would also strongly disagree with the idea that we’ve made no progress whatsoever on algorithms since the 1950s. Even if you think that GPT-3 and AlphaGo have absolutely nothing whatsoever to do with AGI algorithms (which strikes me as an implausibly strong statement, although I would endorse much weaker versions of that statement), that’s far from the only strand of research in AI, let alone neuroscience. For example, there’s a (IMO plausible) argument that PGMs and causal diagrams will be more important to AGI than deep neural networks are. But that would still imply that we’ve learned AGI-relevant things about algorithms since the 1950s. Or as another example, there’s a (IMO misleading) argument that the brain is horrifically complicated and we still have centuries of work ahead of us in understanding how it works. But even people who strongly endorse that claim wouldn’t also say that we’ve made “no progress whatsoever” in understanding brain algorithms since the 1950s.
Sorry if I’m misunderstanding.
I’m a bit confused by this; are you saying that the only possible AGI algorithm is “the exact algorithm that the human brain runs”? The brain is wired up by a finite number of genes, right?
Great points again!
I have only cursorily examined the links you’ve shared (bookmarked them for later) but I hope the central thrust of what I am saying does not depend too strongly on being closely familiar with the contents of those.
A few clarifications are in order. I am really not sure about AGI timelines and that’s why I am reluctant to attach any probability to it. For instance, the only reason I believe that there is less than a 50% chance that we will have AGI in the next 50 years is that we have not seen it yet and it seems rather unlikely to me that the current directions will lead us there. But that is a very weak justification. What I do know is that there has to be some radical qualitative change for artificial agents to go from excelling in narrow tasks to developing general intelligence.
That said, it may seem like nit-picking, but I do want to draw the distinction between “not significant progress” and “no progress at all” towards AGI. Not only am I stating the former, I have no doubt that we have made incredible progress with algorithms in general. I am less convinced about how much those algorithms help us get closer to an AGI. (In hindsight, it may turn out that our current deep learning approaches such as GANs contain path-breaking proto-AGI ideas/principles, but I am unable to see it that way.)
If we consider a scale of 0-100 where 100 represents AGI attainment and 0 is some starting point in the 1950s, I have no clear idea whether the progress we’ve made thus far is close to 5 or 0.5 or even 0.05. I have no strong arguments to justify one or the other because I am way too uncertain about how far the final stage is.
There can also be no question about the other categories of progress that you have highlighted, such as compute power, infrastructure and large datasets; indeed, I see these as central to the remarkable performance we have come to witness with deep learning models.
My perspective is that, while we have made plenty of progress in understanding several processes in the brain (such as signal propagation, the mapping of specific sensory stimuli to neuronal activity, and theories of how brain wiring at birth may have encoded several learning algorithms), these constitute piecemeal knowledge and still seem quite a few strides removed from the bigger question: how do we attain high-level cognition, develop abstract thinking, and become able to reason and solve complex mathematical problems?
I agree that we don’t necessarily have to reproduce the exact wiring or the functional relation in order to create a general intelligence (which is why I mentioned the equivalence classes).
A finite number of genes implies finite steps/information/computation (and that is not disputable, of course), but the number of potential wiring options in the brain and functional forms between input and output is exponentially large. (It is, in principle, infinite if we want to reproduce the exact function, but we both agree that that may not be necessary.) Pure exploratory search may not be feasible, and one may make the case that with appropriate priors and assuming some modular structure of the brain, the search space will reduce considerably, but still, how much of a quantitative grip do we have on this? And how much rests on speculation?
This seems correct and a valid point to keep in mind—but it cuts both ways. It makes sense to reduce your credence when you recognize that expert judgment here is less informed than you originally thought. But by the same token, you should probably reduce your credence in your own forecasts being correct, at least to the extent that they involve inside view arguments like, “deep learning will not scale up all the way because it’s missing xyz.” The correct response in this case will depend on how much your views depend on inside view arguments about deep learning, of course. But I suspect that at least for a lot of people the correct response is to become more agnostic about any timeline forecast, their own included, rather than to think that since the experts aren’t so reliable here, therefore I should just trust my own judgement.
This was my initial reaction, that suspiciousness of existing forecasts can justify very wide error bars but not certainty in >50 year timelines. But then I realized I didn’t understand what probability OP gave to <50 years timelines, which is why I asked a clarifying question first.
Have you read https://www.cold-takes.com/where-ai-forecasting-stands-today/ ?
I do agree that there are many good reasons to think that AI practitioners are not AI forecasting experts, such as the fact that they’re, um, obviously not—they generally have no training in it and have spent almost no time on it, and indeed they give very different answers to seemingly-equivalent timelines questions phrased differently. This is a reason to discount the timelines that come from AI practitioner surveys, in favor of whatever other forecasting methods / heuristics you can come up with. It’s not per se a reason to think “definitely no AGI in the next 50 years”.
Well, maybe I should just ask: What probability would you assign to the statement “50 years from today, we will have AGI”? A couple examples:
If you think the probability is <90%, and your intention here is to argue against people who think it should be >90%, well I would join you in arguing against those people too. This kind of technological forecasting is very hard and we should all be pretty humble & uncertain here. (Incidentally, if this is who you’re arguing against, I bet that you’re arguing against fewer people than you imagine.)
If you think the probability is <10%, and your intention here is to argue against people who think it should be >10%, then that’s quite a different matter, and I would strongly disagree with you, and I would be very curious how you came to be so confident. I mean, a lot can happen in 50 years, right? What’s the argument?