AI timelines and theoretical understanding of deep learning

I have generally been quite skeptical about the view that we are on the cusp of a revolution that will lead us to artificial general intelligence in the next 50 years or so.

Aside from the fundamental limitations of current AI systems, and the flaws in extrapolating their remarkable ability at narrow tasks towards more general learning by appealing to “exponential” growth, there is another issue with the discourse on AI that I want to highlight.

One of the primary reasons to believe that AGI will happen in the near- to mid-term future comes from the predictions of experts working in the field, the majority of whom seem to think that we will have AGI by 2100 at the latest.

While there is every reason to attach credence to their perspective, it should be noted that deep learning, the framework that underpins most recent developments in AI, including language models like BERT and GPT-3 and strategy-game champions like AlphaGo, is notoriously hard to decipher from a theoretical perspective.

It would be a mistake to assume that the people who design, develop and deploy these models necessarily understand why they happen to be as successful as they are. This may sound like a rather strange statement to make, but the reality is that despite the incredible pace of progress across various frontiers of AI with deep learning, our knowledge of why it works (the mathematical theory of it) lags far behind.

To be clear, I am not at all suggesting that research scientists at Google or DeepMind have no knowledge at all of why the models they design and deploy work. They are certainly guided by various ideas and heuristics when deciding on the loss function, the type of attention mechanism to use, the iterative update to the reward, the overall architecture of the network, etc. However, there are two things to note here: first, a lot of the design is based on experimenting with various functional forms, wiring combinations, convolution structures and parameter choices; second, the fact that there are heuristics and a high-level understanding of what is happening does not imply that there is a first-principles mathematical explanation for it.

There are people who study the theoretical side of deep learning, working towards establishing exact results and aiming to understand why the model training process is so incredibly successful. The progress there has been rather limited, and it is certainly well behind the state of the art in terms of performance. There are a lot of unusual things about deep learning, among them the fact that core concepts in conventional machine learning (such as overfitting) simply do not seem to apply. For a more technical view on this, watch this amazing talk by Sanjeev Arora, where he explains how intriguing deep learning models and their training are.

This should be contrasted with physics, where our understanding of theories is much deeper and more fundamental. There is a very precise mathematical framework to characterize the physics of, say, electrons or quarks and, at the other end of the spectrum, a model to understand cosmology. There is nothing even remotely comparable to that in deep learning.

Given all this, one should be more skeptical about the timelines that experts in this field predict for a qualitatively superior intelligence. The fact that there are considerable gaps in our understanding suggests that expert opinion is perhaps guided less by some deeper insight into the learning and generalization process of AI models and more by a higher-level examination of the rapid progress of AI, i.e., their views may be relatively close to those of a layperson. Couple this with the fact that we have a very limited understanding of human consciousness and how it is related to the electrophysiological properties of the brain. Such limitations make it very challenging to predict these timelines with any degree of certainty.