What about factor increase per year, reported alongside a second number to show how the increases compose (e.g. the factor increase per decade)? So “compute has been increasing by 1.4x per year, or roughly 29x per decade” or something like that.
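For concreteness, a minimal sketch of how the two numbers relate, using the illustrative 1.4x/year figure from above (plain Python, just composing the yearly factor over ten years):

```python
# Composing a per-year growth factor into a per-decade factor:
# growing 1.4x per year for 10 years multiplies the quantity by 1.4**10.
per_year = 1.4
per_decade = per_year ** 10
print(f"{per_year}x per year -> {per_decade:.1f}x per decade")  # ~28.9x
```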
The main problem with OOMs is fractional OOMs, like your recent headline of “0.1 OOMs”. Very few people are going to interpret that correctly, whereas they’d do much better with “2 OOMs”.
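To illustrate why the fractional case is easy to misread, here is a small sketch (Python, using the standard definition of an OOM as a factor of 10) converting OOMs into plain multiplicative factors:

```python
# n OOMs corresponds to a multiplicative factor of 10**n.
for ooms in (0.1, 0.5, 1, 2):
    print(f"{ooms} OOMs = {10 ** ooms:.2f}x")
# 0.1 OOMs = 1.26x  <- easy to misread as "10% more" or "1.1x"
# 2 OOMs   = 100.00x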
Factor increase per year is how we now report growth rates by default in the dashboard.
And I agree it will be better interpreted by the public. On the other hand, multiplying factors is harder than adding OOMs, so it’s not as nice for mental arithmetic. And thinking logarithmically puts you in the right frame of mind.
Saying that GPT-4 was trained on 100x more compute than GPT-3 suggests that GPT-4 is 100 times better, whereas I think saying it was trained on 2 OOMs more compute gives you a better picture of the expected improvement.
I might be wrong here.
In any case, it is still a better choice than doubling times.
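For reference, the three conventions discussed here are interchangeable under constant exponential growth; a quick sketch of the conversions (Python, again using the illustrative 1.4x/year value):

```python
import math

factor_per_year = 1.4
ooms_per_year = math.log10(factor_per_year)                    # ~0.146 OOMs/year
doubling_time_years = math.log(2) / math.log(factor_per_year)  # ~2.06 years
print(f"{factor_per_year}x/year = {ooms_per_year:.3f} OOMs/year, "
      f"doubling every {doubling_time_years:.2f} years")
```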