I think this is a poor way to describe a reasonable underlying point. Heavier-than-air flying machines were pursued for centuries, but airplanes appeared almost instantly (on a historical timescale) after the development of engines with sufficient power density. Nonetheless, it would be confusing to say “flying is more about engine power than the right theories of flight”. Both are required.
I agree. A better phrasing could have emphasized that, although both theory and compute are required, in practice the compute part seems to be the crucial bottleneck. The ‘theories’ that drive deep learning are famously pretty shallow, and most progress seems to come from tinkering, scaling, and writing more efficient code. I’m not aware of any major algorithmic contribution that depended on some fundamental result from deep learning theory (though perhaps such contributions happen all the time and I’m just not familiar enough with the field to know).
As you observe, algorithmic progress has been comparable to compute progress (both within and outside of AI). You list three “main explanations” for where algorithmic progress ultimately comes from and observe that only two of them explain the similar rates of progress in algorithms and compute. But both of these draw a causal path from compute to algorithms without considering the (to-me-very-natural) explanation that some third thing is driving them both at a similar rate. There are a lot of options for this third thing! Researcher-to-researcher communication timescales, the growth rate of the economy, the individual learning rate of humans, new tech adoption speed, etc.
I think the alternative theory of a common cause is somewhat plausible, but I don’t see any particular reason to believe in it. If there were a common factor that caused progress in computer hardware and algorithms to proceed at a similar rate, why wouldn’t other technologies that shared that cause grow at similar rates?
Hardware progress has been incredibly fast over the last 70 years—indeed, many people say that the speed of computers is by far the most salient difference between the world in 1973 and 2023. And yet algorithmic progress has apparently been similarly rapid, which seems hard to square with a theory of a general factor that causes innovation to proceed at similar rates. Surely there are shared bottlenecks that slow progress in both areas, but the question is what explains the coincidence in rates.
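To make the “coincidence in rates” concrete, here is a toy calculation (the doubling times are illustrative assumptions, not measurements from this exchange) showing how similar hardware and algorithmic growth rates compound into growth in effective compute:

```python
# Toy illustration with assumed numbers, not measured rates.
# If hardware price-performance and algorithmic efficiency each improve
# exponentially, their product ("effective compute" per dollar) grows at
# the sum of the two rates, which is why similar doubling times matter.
import math

def annual_growth(doubling_time_years: float) -> float:
    """Multiplicative growth per year implied by a given doubling time."""
    return 2 ** (1.0 / doubling_time_years)

hardware_doubling_years = 2.0   # assumed Moore's-law-like doubling time
algorithm_doubling_years = 2.0  # assumed algorithmic-efficiency doubling time

combined = annual_growth(hardware_doubling_years) * annual_growth(algorithm_doubling_years)
combined_doubling_years = math.log(2) / math.log(combined)

print(f"hardware: {annual_growth(hardware_doubling_years):.2f}x per year")
print(f"algorithms: {annual_growth(algorithm_doubling_years):.2f}x per year")
print(f"effective compute: {combined:.2f}x per year, "
      f"doubling every {combined_doubling_years:.1f} years")
```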
Humans have been automating mechanical tasks for many centuries, and information-processing tasks for many decades. Moore’s law, the growth rate of the thing (compute) that you argue drives everything else, has been stated explicitly for almost 58 years (and was presumably applicable for at least a few decades before that). Why are you drawing a distinction between all the information processing that happened in the past and “AI”, which you seem to be taking as a basket of things that have mostly not had a chance to be applied yet (so no data)?
I expect innovation in AI in the future will take a different form than innovation in the past.
When innovating in the past, people generally found a narrow tool or method that improved efficiency in one narrow domain, without being able to substitute for human labor across a wide variety of domains. Occasionally, people stumbled upon general purpose technologies that were unusually useful across a variety of situations, although by and large even these technologies are quite narrow compared to human resourcefulness. By contrast, I think it’s far more plausible that ML foundation models will let us create systems that can substitute for human labor across nearly all domains at once, once a sufficient scale is reached. This would happen because sufficiently capable foundation models can be cheaply fine-tuned to provide a competent worker for nearly any task.
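To give a sense of what “cheaply fine-tuned” could mean in practice, here is a minimal, purely illustrative sketch of adapting a pretrained model to one narrow task; the base model, dataset, and hyperparameters are placeholders I chose, not anything specified in this exchange:

```python
# Minimal fine-tuning sketch (illustrative placeholders throughout).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "distilbert-base-uncased"   # placeholder pretrained model
dataset = load_dataset("imdb")           # placeholder downstream task

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

def tokenize(batch):
    # Convert raw text into model inputs.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    # A small subset is often enough to adapt a capable pretrained model.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```

This is of course a tiny supervised example rather than frontier-scale fine-tuning, but it illustrates the pattern the comment relies on: most of the capability comes from pretraining, and adaptation to a specific task is comparatively cheap.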
The claim that foundation models can broadly substitute for human labor is essentially the theory that there’s something like “general intelligence” which makes human labor so useful across a very large variety of situations compared to physical capital. This is also related to the transfer learning and task-specific bottlenecks that I talked about in Part 2. Given that human population growth seems to have caused a “productivity explosion” in the past (relative to pre-agricultural rates of growth), it seems highly probable that AI could do something similar in the future if it can substitute for human labor.
That said, I’m sympathetic to the model that says future AI innovation will happen without much transfer to other domains, more similar to innovation in the past. This is one reason (among many) that my final probability distribution in the post has a long tail extending many decades into the future.