The issue is that FLOPS cannot accurately represent computing power across different computing architectures, in particular between a single fast CPU and a computing cluster. As an example, compare one computer at 100 MFLOPS with a cluster of 1000 computers at 1 MFLOPS each. The latter option has 10 times as many FLOPS, but for a wide variety of computational problems the former will always be much faster. So FLOPS alone don't tell you which option is better; that always depends on how well the problem you want to solve maps onto your hardware.
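To make this concrete, here's a toy sketch using Amdahl's law, with the hypothetical numbers from the example above (100 MFLOPS single machine vs. 1000 × 1 MFLOPS cluster) and an assumed split of the workload into a parallelizable fraction `p` and a serial remainder:

```python
# Toy model: wall-clock time under Amdahl's law.
# Assumptions (illustrative, matching the example above):
# - single machine: 100 MFLOPS
# - cluster: 1000 machines at 1 MFLOPS each
# - workload: 1000 MFLOP of arithmetic, of which a fraction p
#   parallelizes perfectly and (1 - p) must run serially on one node.

def runtime_seconds(total_mflop, p, per_node_mflops, nodes):
    """Serial part runs on one node; parallel part splits across all nodes."""
    serial = (1 - p) * total_mflop / per_node_mflops
    parallel = p * total_mflop / (per_node_mflops * nodes)
    return serial + parallel

total = 1000.0  # MFLOP of work
for p in (0.5, 0.9, 0.99):
    single = runtime_seconds(total, p, per_node_mflops=100.0, nodes=1)
    cluster = runtime_seconds(total, p, per_node_mflops=1.0, nodes=1000)
    print(f"p={p}: single={single:.1f}s  cluster={cluster:.2f}s")
```

Even when 99% of the work parallelizes perfectly, the serial 1% alone takes the cluster 10 seconds, so the cluster with 10× the FLOPS only roughly matches the single machine, and at p=0.5 it is ~50× slower.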
In large-scale computing, the bottleneck is often communication speed across the network. If your calculations don't decompose neatly into largely independent tasks, the machines have to communicate constantly, which slows everything down. Adding more FLOPS (more computers) won't prevent that in the slightest.
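A minimal sketch of that effect, under the assumption that the parallel work proceeds in synchronized steps with a fixed network latency per step (all numbers are made up for illustration):

```python
# Toy model: cluster runtime = compute time + communication time.
# Assumption: the work runs in synchronized steps, and every step
# pays a fixed network-latency cost regardless of node count.

def cluster_runtime(total_mflop, nodes, per_node_mflops, steps, latency_s):
    compute = total_mflop / (per_node_mflops * nodes)  # shrinks with nodes
    communicate = steps * latency_s                    # does not shrink
    return compute + communicate

# 1000 MFLOP on 1000 x 1 MFLOPS nodes, 10,000 sync steps at 1 ms each:
t = cluster_runtime(1000.0, nodes=1000, per_node_mflops=1.0,
                    steps=10_000, latency_s=1e-3)
print(f"{t:.1f}s")  # 1 s of compute, plus 10 s of communication
```

The point of the sketch is the second term: doubling the node count halves the compute term but leaves the communication term untouched, so past some point extra FLOPS buy nothing.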
You cannot extrapolate FLOPS estimates without justifying why communication overhead doesn't render the estimated quantity meaningless on parallel hardware.
I remember looking into communication speed, but unfortunately I can't find the sources I found last time! As I recall, when I checked, the communication figures weren't meaningfully different from the processing-speed figures.
Edit: found it! AI Impacts on TEPS (traversed edges per second): https://aiimpacts.org/brain-performance-in-teps/
Yeah, basically computers are closer in communication speed to a human brain than they are in processing speed. Which makes intuitive sense—they can transfer information at nearly the speed of light, while brains are stuck sending chemical signals in many (all?) cases.
2nd edit: On your earlier point about training time vs. total engineering time... "Most honest" isn't really the issue. It's a question of what you care about—training time illustrates that an AI system, once built, can quickly surpass human-level performance. Then the AI will keep improving, leaving us in the dust (although the applicability of current algorithms to more complex tasks is unclear). Total engineering time would show that these are massive projects which take years to develop... which is also true.