In the future I would recommend reading the full comment. Admitting your own lack of knowledge (not having read the comments) and then jumping to “obviously nonsense”, “insulting”, and “Does he really think no one on Metaculus took CS 101?” does not make an amazing first impression of EA. You selected the one snippet where I was discussing a complicated topic (the ease of algorithmic improvements) instead of low-hanging and obviously wrong claims, such as Aschenbrenner seemingly being unable to do basic math (3^3) using his own estimates for compute improvements. I consider this a large misrepresentation of my argument, and I hope you respond to the comment below in good faith.
Anyway, I am crossposting my response from Metaculus, since I responded there at length:
...there is a cavernous gap between:
- we don’t know the lower bound computational complexity
versus
- 100,000x improvement is very much in the realm of possibilities, and
- if you extend this trendline on a log plot, it will happen by 2027, and we should take this seriously (aka there is nothing that makes [the usual fraught issues with extending trendlines](https://xkcd.com/605/) appear here)
I find myself in the former camp. If you question whether a sigmoid curve is likely, there is equally no logical basis to believe that a 100,000x improvement in LLM algorithmic output speed at constant compute (Aschenbrenner’s claim) is likely; the sketch below illustrates why the extrapolation step is doing all the work.
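To make that concrete, here is a minimal sketch with entirely synthetic numbers (mine, not Aschenbrenner’s or Linch’s data): an exponential and a sigmoid that roughly track each other over an observed window diverge by well over an order of magnitude once extrapolated to 2027.

```python
# Synthetic illustration: an exponential trend and a sigmoid (logistic) trend
# that look broadly similar over the observed window give wildly different
# answers when extrapolated to 2027 on a log plot. All numbers are made up.
import numpy as np

def exp_log_gain(t):
    # Straight line on a log10 plot: ~3.2x improvement per year, forever.
    return 0.5 * (t - 2018)

def sig_log_gain(t, L=3.0, k=2/3, t0=2021):
    # Logistic with the same mid-window slope, saturating at 10^L = 1000x total.
    return L / (1 + np.exp(-k * (t - t0)))

for t in (2020, 2022, 2024, 2027):
    print(f"{t}: exponential {10**exp_log_gain(t):>9,.0f}x | "
          f"sigmoid {10**sig_log_gain(t):>7,.0f}x")
```

Both curves are consistent with “rapid recent algorithmic progress”; the enormous-improvement-by-2027 conclusion comes entirely from choosing the exponential.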
Linch’s evidence that 100,000x is likely:
- Moore’s Law happened [which was a hardware miniaturization problem, not strictly an algorithms problem, so doesn’t directly map onto this. But it is evidence that humans are capable of log plot improvement sometimes]
- “You can’t have infinite growth on a finite planet” is false [it is actually true, but we are not utilizing Earth anywhere near fully]
- “Numerical improvements happen all the time, sometimes by OOMs” [without cited evidence]
None of these directly show that a 100,000x improvement in compute or speed is forthcoming for LLMs specifically. They are attempts to map other domains onto LLMs without a clear correspondence, and most domains don’t support trendline extrapolation like this. But I will entertain it, and provide a source to discuss (since they did not): [How Fast Do Algorithms Improve? (2021)](https://ieeexplore.ieee.org/document/9540991)
Some key takeaways:
1. Some algorithms do exhibit better-than-Moore’s-Law improvements when compared to brute force, but the likelihood of this is only ~14% over the entire examined time window (80 years). I would also add, from looking at the plots, that many of these historical improvements happened when computer science was still relatively young (1970s-1990s), and it is not obvious that they remain as common nowadays, with more sophisticated research in computer science. The implied yearly probability is very low (<1%), as you can see in the state diagram at the bottom of the charts in Figure 1 (https://ieeexplore.ieee.org/document/9540991/figures#figures); see the back-of-the-envelope conversion after this list.
2. Moore’s Law has slowed down, at least for CPUs. Although there is still further room in GPUs / parallel compute, the slowdown in CPUs is not a good portent for the multi-decade outlook of continued GPU scaling.
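On the yearly probability in point 1: here is a rough conversion from the ~14%-over-80-years figure to a per-year probability, under the simplifying assumption (mine, not the paper’s model) that each year is an independent trial:

```python
# Back-of-the-envelope: if the chance of a better-than-Moore's-Law improvement
# is ~14% over an 80-year window, the implied per-year probability p under
# independent years solves 1 - (1 - p)**80 = 0.14.
p_window, years = 0.14, 80
p_yearly = 1 - (1 - p_window) ** (1 / years)
print(f"implied yearly probability: {p_yearly:.4%}")  # ~0.19% per year
```

That lands comfortably under 1% per year, consistent with the state diagram in the paper.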
Some other things I would add:
1. LLMs already rest on decades of algorithmic advancements, for example, matrix multiplication. I would be very surprised if any algorithmic advancement could bring matrix multiplication down to the order of O(n^2) with a reasonable constant; it is a deeply researched field of study, and gains in it become harder to reach every year. In theory we have O(n^2.371552), but the constant in front (hidden in big O notation) is infeasibly large (see the rough crossover calculation after this list). Overall this one seems to have hit diminishing returns since 1990:
![](https://upload.wikimedia.org/wikipedia/commons/5/5b/MatrixMultComplexity_svg.svg)
2. There are currently trillions of dollars riding on LLMs, and the current algorithmic improvements are the best we can muster. (Most of the impressive recent results have been compute-driven, not algorithm-driven.) This implies that the problem might actually be very difficult rather than easy.
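On the matrix multiplication point above, here is a toy crossover calculation. The constants are purely illustrative (my numbers, not published figures for any specific galactic algorithm), but even a modest hidden constant pushes the crossover far beyond any practical matrix size:

```python
# Toy crossover estimate: naive matmul costs ~ n**3 operations; a "galactic"
# algorithm costs ~ C * n**2.371552. It only wins once n**3 > C * n**2.371552,
# i.e. n > C ** (1 / (3 - 2.371552)). C here is illustrative; the real hidden
# constants for these algorithms are believed to be far larger.
for C in (1e6, 1e12, 1e20):
    crossover_n = C ** (1 / (3 - 2.371552))
    print(f"C = {C:.0e} -> galactic algorithm wins only for n > {crossover_n:.2e}")
```

Even with a “small” constant of 10^6, the crossover sits around matrices of side ~3.5 billion, far beyond anything storable, which is why these exponent improvements have not translated into practical speedups.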
These two points nudge me in the direction that algorithmic improvement for LLMs might actually be harder than for other algorithms, and therefore lead me to think there is a much-less-than-1% chance of a big-O improvement in any given year. Sure, a priori, ML model improvements have seemed ad hoc to an outside observer, but the fact that we still haven’t done better than ad hoc improvements also implies something about the difficulty of the problem.