In fact, algorithmic progress has been found to be roughly as important as compute in explaining performance gains across a variety of domains, such as Mixed-Integer Linear Programming, SAT solvers, and chess engines (Koch et al. 2022, Grace 2013). This is an interesting coincidence that can help shed light on the source of algorithmic progress. From a theoretical perspective, there appear to be at least three main explanations for where algorithmic progress ultimately comes from:
Theoretical insights, which can be quickly adopted to improve performance.
Insights whose adoption is enabled by scale, and which therefore spread only after sufficient hardware progress. This could be because some algorithms don’t work well on slower hardware and only start working well once they’re run at sufficient scale, at which point they can be widely adopted.
Experimentation with new algorithms. For example, it could be that efficiently testing all the reasonable candidates for new algorithms itself requires a lot of compute.
I always feel kind of uneasy about how the term “algorithmic progress” is used. If you find an algorithm with better asymptotics, then apparent progress depends explicitly on the problem size. MILP seems like a nice benchmark because it’s NP-hard in general, but then again most(?) improvements are exploiting structure of special classes of problems. Is that general progress?
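To make the problem-size dependence concrete, here is a minimal sketch with purely hypothetical algorithms (not drawn from the MILP or SAT data cited above) of how the speedup you measure from an asymptotic improvement grows with the size of the benchmark instance:

```python
import math

# Hypothetical example: an "old" O(n^2) algorithm is replaced by a "new"
# O(n log n) algorithm. The speedup you measure, and hence the apparent
# algorithmic progress, depends on the instance size n you benchmark at.
def speedup(n: int) -> float:
    """Ratio of O(n^2) cost to O(n log n) cost at instance size n."""
    return (n ** 2) / (n * math.log2(n))

for n in (10**2, 10**4, 10**6, 10**8):
    print(f"n = {n:>11,}: measured speedup ~ {speedup(n):,.0f}x")
```

At n = 100 the improvement looks like roughly a 15x gain, while at n = 10^8 it looks like a factor of millions, so there isn’t a single well-defined “rate of progress” without fixing the problem size.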
One important factor affecting our ability to measure algorithmic progress is the degree to which algorithmic progress on one task generalizes to other tasks. So far, much of our data on algorithmic progress in machine learning has come from ImageNet. However, there seem to be two ways of making algorithms more efficient on ImageNet. The first is to invent more efficient learning algorithms that apply to tasks in general. The second is to develop task-specific techniques that produce progress only on ImageNet.
We care more about the rate of general algorithmic progress, but measuring progress on any specific narrow task will tend to overestimate it, because such measurements capture task-specific gains on top of the general ones. This consideration is one reason to think that these estimates overstate algorithmic progress in a general sense.
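As a toy illustration of this overestimation (made-up numbers, not estimates from this post): if measured compute efficiency on a single benchmark doubles every 9 months, but only half of those gains (in log space) come from methods that transfer to other tasks, then general algorithmic progress doubles only every 18 months.

```python
# Made-up numbers for illustration only.
measured_doubling_months = 9.0   # hypothetical doubling time of efficiency on one benchmark
transferable_share = 0.5         # hypothetical fraction of the (log) gains that generalize

# Efficiency gains compound multiplicatively, so in log space the general
# component grows at transferable_share times the measured rate, which
# stretches the doubling time by the reciprocal of that share.
general_doubling_months = measured_doubling_months / transferable_share
print(f"General doubling time: {general_doubling_months:.0f} months")  # -> 18 months
```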
I definitely agree that narrow-task estimates likely overstate general progress, but I’m still not sure how to think about this. I have the impression that, typically, some generalizable method makes a problem feasible, at which point focused attention on applying related methods to that problem drives solving it towards being economical. I suppose that in this framework we’d still care more about the generalizable methods, because those fire the starting gun for each automatable task?
I’ve also now put code for Appendix C.3 on GitHub.