I didn’t read all the comments, but Order’s are obvious nonsense, of the “(a+b^n)/n = x, therefore God exists” tier. Eg take this comment:
> > But something like 5 OOMs seems very much in the realm of possibilities; again, that would just require another decade of trend algorithmic efficiencies (not even counting algorithmic gains from unhobbling).
>
> Here he claims that a 100,000x improvement is possible in LLM algorithmic efficiency, given that 10x was possible in a year. This seems unmoored from reality: algorithms cannot infinitely improve; you can derive a mathematical lower bound. You provably cannot do better than Ω(n log n) comparisons for sorting a randomly distributed list. Perhaps he thinks new mathematics or physics will also be discovered before 2027?
This is obviously invalid. The existence of a theoretical complexity lower bound (for which, incidentally, Order doesn’t have numbers) doesn’t mean we are anywhere near it, numerically. Those aren’t even the same level of abstraction! Furthermore, we have clear theoretical proofs for how fast sorting can get, without AFAIK any such theoretical limits for learning. “algorithms cannot infinitely improve” is irrelevant here; it’s the slightly more mathy way to say a deepity like “you can’t have infinite growth on a finite planet,” without actual relevant semantic meaning[1].
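To make the “nowhere near it, numerically” point concrete, here is a minimal sketch (all numbers invented for illustration, not from Order’s comment or any benchmark): even for sorting, where the lower bound is exactly known, a naive quadratic sort sits many OOMs above that bound at realistic input sizes, so the mere existence of a bound says nothing about how much headroom remains.

```python
import math

def comparison_lower_bound(n: int) -> float:
    """Information-theoretic lower bound on comparisons: log2(n!)."""
    return math.lgamma(n + 1) / math.log(2)

def naive_quadratic_comparisons(n: int) -> float:
    """Worst-case comparisons for a naive quadratic sort, ~n^2 / 2."""
    return n * (n - 1) / 2

for n in (10**3, 10**6, 10**9):
    gap = naive_quadratic_comparisons(n) / comparison_lower_bound(n)
    print(f"n = 10^{round(math.log10(n))}: naive comparisons / lower bound = 10^{math.log10(gap):.1f}")

# Approximate output: 10^1.8 at n=10^3, 10^4.4 at n=10^6, 10^7.2 at n=10^9,
# i.e. multiple OOMs of headroom even in a domain where the bound is known exactly.
```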
Numerical improvements happen all the time, sometimes by OOMs. No “new mathematics or physics” required.
Frankly, as a former active user of Metaculus, I feel pretty insulted by his comment. Does he really think no one on Metaculus took CS 101?
It’s probably true that every apparently “exponential” curve becomes a sigmoid eventually, but knowing this fact doesn’t let you time the transition. You need actual object-level arguments and understanding, and even then it’s very, very hard (as people arguing against Moore’s Law, or for “you can’t have infinite growth on a finite planet,” found out).
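A toy illustration of why the sigmoid caveat has no bite on its own (every number below is synthetic, chosen only for intuition): the early portion of a logistic curve is numerically indistinguishable from a pure exponential, so the data available before the bend cannot tell you when the bend comes.

```python
import math

# Synthetic logistic curve; every parameter here is invented for illustration.
CEILING, RATE, MIDPOINT = 1e6, 0.7, 20.0

def logistic(t: float) -> float:
    return CEILING / (1.0 + math.exp(-RATE * (t - MIDPOINT)))

def exponential(t: float) -> float:
    # The exponential you would fit to the early, pre-bend data.
    return CEILING * math.exp(RATE * (t - MIDPOINT))

for t in (5, 10, 15, 25, 40):
    overshoot = exponential(t) / logistic(t)
    print(f"t = {t:>2}: exponential extrapolation overshoots the sigmoid by {overshoot:,.1f}x")

# Before the midpoint (t = 5, 10, 15) the two curves agree to within ~3%;
# after it (t = 25, 40) the extrapolation is off by ~34x and then ~1,200,000x.
```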
To be clear, I also have high error bars on whether traversing 5 OOMs of algorithmic efficiency in the next five years is possible, but that’s because of a) high error bars on diminishing returns to algorithmic gains, and b) a tentative model that most algorithmic gains in the past were driven by compute gains rather than being exogenous to them. Algorithmic improvements in ML seem much more driven by the “f-ck around and find out” paradigm than by deep theoretical or conceptual breakthroughs; if we model experimentation gains as a function of quality-adjusted researchers multiplied by compute multiplied by time, it’s obvious that the compute term is the one that’s growing the fastest (and thus the thing that drives the most algorithmic progress).
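A minimal sketch of that toy model (both growth rates below are invented purely for illustration, not estimates): when experimentation gains scale like researchers × compute × time, whichever factor grows fastest ends up driving almost all of the growth.

```python
import math

# Toy "experimentation gains ~ researchers x compute x time" model.
# Both growth rates are invented for illustration, not estimates.
YEARS = 5
RESEARCHER_GROWTH = 1.2   # quality-adjusted researchers: assume +20%/year
COMPUTE_GROWTH = 3.0      # experiment compute: assume 3x/year

researchers = RESEARCHER_GROWTH ** YEARS   # ~2.5x
compute = COMPUTE_GROWTH ** YEARS          # 243x
combined = researchers * compute           # ~605x

compute_share = math.log(compute) / math.log(combined)
print(f"researchers: {researchers:.1f}x, compute: {compute:.0f}x, combined: {combined:.0f}x")
print(f"compute accounts for ~{compute_share:.0%} of the combined (log) growth")
```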
In the future I would recommend reading the full comment. Admitting your own lack of knowledge (not having read the comments) and then jumping to “obviously nonsense” and “insulting” and “Does he really think no one on Metaculus took CS 101?” is not an amazing first impression of EA. You selected the one snippet where I was discussing a complicated topic (the ease of algorithmic improvements) instead of the low-hanging and obviously wrong claims, like Aschenbrenner seemingly being unable to do basic math (3^3) using his own estimates for compute improvements. I consider this a large misrepresentation of my argument, and I hope that you respond to this forthcoming comment in good faith.
Anyway, I am crossposting my response from Metaculus, since I responded there at length:
...there is a cavernous gap between:
- we don’t know the lower bound computational complexity
versus
- 100,000x improvement is very much in the realm of possibilities, and
- if you extend this trendline on a log plot, it will happen by 2027, and we should take this seriously (i.e. there is nothing here that raises [the usual fraught issues with extending trendlines](https://xkcd.com/605/))
I find myself in the former camp. If you question that a sigmoid curve is likely, there is no logical basis to believe that 100,000x improvement in LLM algorithm output speed at constant compute (Aschenbrenner’s claim) is likely either.
Linch’s evidence to suggest that 100,000x is likely is:
- Moore’s Law happened [which was a hardware miniaturization problem, not strictly an algorithms problem, so it doesn’t directly map onto this. But it is evidence that humans are sometimes capable of sustained improvement along a log plot]
- “You can’t have infinite growth on a finite planet” is false [it is actually true, but we are not utilizing Earth anywhere near fully]
- “Numerical improvements happen all the time, sometimes by OOMs” [without cited evidence]
None of these directly show that a 100,000x improvement in compute or speed is forthcoming for LLMs specifically. They are attempts to map other domains onto LLMs without a clear correspondence. Most domains don’t let you do trendline extrapolation like this. But I will entertain it, and provide a source to discuss (since they did not): [How Fast Do Algorithms Improve? (2021)](https://ieeexplore.ieee.org/document/9540991)
Some key takeaways:
1. Some algorithms do exhibit better-than-Moore’s-Law improvements when compared to brute force, although the likelihood of this is only ~14% over the entire examined time window (80 years). I would also add, from looking at the plots, that many of these historical improvements happened when computer science was still relatively young (1970s-1990s), and it is not obvious that this is as common nowadays with more sophisticated research in computer science. The actual yearly probability is very low (<1%), as you can see in the state diagram at the bottom of the charts in Figure 1 (and in the quick conversion after this list): https://ieeexplore.ieee.org/document/9540991/figures#figures
2. Moore’s Law has slowed down, at least for CPUs. Although there is still further room in GPUs / parallel compute, the slowdown in CPUs is not a good portent for the multi-decade outlook of continued GPU scaling.
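As a quick sanity check on the “<1% yearly” reading (a back-of-envelope conversion of the figures quoted above, not a number reported in the paper): a ~14% chance over an ~80-year window implies a tiny constant annual probability.

```python
# Back-of-envelope: convert "~14% of algorithm families ever did this,
# over an ~80-year window" into an implied constant annual probability.
p_ever, years = 0.14, 80

# Solve 1 - (1 - p_annual)**years = p_ever for p_annual.
p_annual = 1 - (1 - p_ever) ** (1 / years)
print(f"implied annual probability = {p_annual:.2%}")   # ~0.19%
```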
Some other things I would add:
1. LLMs already rest on decades of algorithmic advancements, for example matrix multiplication. I would be very surprised if any algorithmic advancement could bring matrix multiplication down to roughly O(n^2) with a reasonable constant: it is a deeply researched field of study, and gains in it are harder to reach every year. In theory we have O(n^2.371552), but the constant in front (hidden by the big-O notation) is infeasibly large. Overall this one seems to have hit diminishing returns since 1990 (see the chart below and the rough comparison after this list): ![](https://upload.wikimedia.org/wikipedia/commons/5/5b/MatrixMultComplexity_svg.svg)
2. There are currently trillions of dollars riding on LLMs, and the current algorithmic improvements are the best we can muster. (Most of the impressive results recently have been compute-driven, not algorithm-driven.) This implies that the problem might actually be very difficult rather than easy.
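A rough comparison of the matrix multiplication exponents mentioned above (constant factors deliberately ignored; the asymptotically best algorithms are “galactic” and never used in practice), just to show how little the post-Strassen exponent improvements buy at realistic sizes:

```python
# Exponent-only operation counts for multiplying two n x n matrices.
# Constant factors are deliberately ignored; for the best known exponent
# they are so large that the algorithm is never used in practice.
exponents = {
    "naive": 3.0,
    "Strassen (1969)": 2.807,
    "best known (~2.3716)": 2.3716,
}

n = 10_000  # an illustrative, realistically large dense-matrix dimension
for name, exponent in exponents.items():
    speedup_vs_naive = n ** (3.0 - exponent)
    print(f"{name:>22}: ~{speedup_vs_naive:,.0f}x fewer operations than naive")

# naive: 1x, Strassen: ~6x, best known exponent: ~326x. That is, 55 years of
# exponent improvements buy roughly 2 OOMs at this size, even before the hidden
# constants (which more than wipe out the asymptotic gain) are accounted for.
```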
These two points nudge me toward thinking that improving LLM algorithms might actually be harder than improving other algorithms, and therefore toward a much-less-than-1%-per-year chance of a big-O improvement. Sure, a priori, ML model improvements have seemed ad hoc to an outside viewer, but the fact that we still haven’t done better than ad hoc improvements also implies something about the problem’s difficulty.
I appreciate that you replied! I’m sorry if I was rude. I think you’re not engaging with what I actually said in my comment, which is pretty ironic. :)
(eg there are multiple misreadings. I’ve never interacted with you before, so I don’t really know if they’re intentional) (I replied more substantively on Metaculus)