You should probably think of Moore’s law as having a logarithmic x-axis as well. It can be transformed into an instance of [Wright’s law](https://en.wikipedia.org/wiki/Experience_curve_effect): the graph of log cost as a function of log aggregate production is a straight line. Moore’s law’s exponential improvement in time is fueled by exponential expansion of demand over time.
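To make the transformation concrete, here is a minimal sketch (with made-up illustrative constants, not real semiconductor data) showing that a Wright’s-law power law in cumulative production, combined with exponentially growing production, yields a Moore’s-law-style exponential cost decline in time:

```python
import math

# Wright's law: unit cost is a power law in cumulative production,
#   cost = c0 * cum_production**(-alpha)
# If cumulative production grows exponentially in time, cum = exp(g * t),
# then cost = c0 * exp(-alpha * g * t): exponential decline in time.

c0, alpha, g = 1.0, 0.4, 0.5  # illustrative constants only


def wright_cost(cum_production: float) -> float:
    """Unit cost under Wright's law (power law in cumulative production)."""
    return c0 * cum_production ** (-alpha)


def cum_production_at(t: float) -> float:
    """Cumulative production growing exponentially in time."""
    return math.exp(g * t)


# Costs at successive times: each step shrinks by the constant factor
# exp(-alpha * g), i.e. log cost is linear in t (Moore's-law form).
costs = [wright_cost(cum_production_at(t)) for t in range(5)]
ratios = [costs[i + 1] / costs[i] for i in range(4)]
```

The constant ratio between successive costs is exactly the point: a straight line in log-cost vs. log-production becomes a straight line in log-cost vs. time once production grows exponentially.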
The proportion of silicon devoted to LLMs has run through a logistic curve and is now a substantial share of production, so it can’t keep growing. But total chip production has been growing exponentially for decades, and it is quite reasonable to expect GPU production to keep growing at that long-run exponential rate, albeit more slowly than the recent exponential.
Comparing AI scaling laws to Wright’s law is an interesting idea. Wright’s law is still a power law rather than logarithmic returns, but it is usefully comparable to both the pretraining and inference scaling behaviours.