PSA: the terms “compute overhang” and “hardware overhang” have been used in many ways. Today they most often (but far from always) mean how much labs could quickly scale up the size of the largest training run, especially if a ban on large training runs ends. When you see or use either term, make sure everyone knows what it means.
(It will come up often in this debate.)
PSA: if “pause” is not defined but seems to refer to a specific kind of government policy, it most likely means a policy regime that stops training runs that use compute beyond a certain threshold.
Relatedly, there is something like a soft pause, or slowdown, where you slow training runs that use compute beyond a certain threshold, but the threshold moves every year. This could be a pragmatic tweak: compute will likely keep getting cheaper, so a cap that never moves becomes easier and easier for rogue actors to circumvent. This soft-pause idea has been referred to as a “moving bright line” (of a compute cap).
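To make the difference between a fixed compute cap and a moving bright line concrete, here is a minimal Python sketch. The base year, the 1e26 FLOP starting threshold, and the doubling-per-year growth rate are hypothetical values chosen only for illustration, not figures from any actual proposal, and `threshold_flop` and `run_allowed` are made-up helper names. Setting the growth rate to 1.0 recovers a plain pause with a fixed threshold.

```python
# Hypothetical sketch of a "moving bright line" compute cap.
# All numbers (base year, base threshold, growth rate) are illustrative assumptions.

def threshold_flop(year: int, base_year: int = 2024,
                   base_threshold_flop: float = 1e26,
                   annual_growth: float = 2.0) -> float:
    """Return the compute cap (in FLOP) in force for a given year.

    annual_growth = 1.0 gives a fixed cap, i.e. a plain pause threshold.
    """
    if year < base_year:
        raise ValueError("year precedes the start of the policy")
    return base_threshold_flop * annual_growth ** (year - base_year)


def run_allowed(run_flop: float, year: int, **policy) -> bool:
    """A training run is allowed if its total compute is at or below the cap."""
    return run_flop <= threshold_flop(year, **policy)


if __name__ == "__main__":
    run = 3e26  # total training compute of a hypothetical run, in FLOP
    for year in range(2024, 2028):
        fixed = run_allowed(run, year, annual_growth=1.0)
        moving = run_allowed(run, year)
        print(year, f"fixed cap: {fixed}", f"moving cap: {moving}")
```

Under these made-up numbers, the 3e26 FLOP run stays blocked forever under the fixed cap but becomes allowed once the moving threshold passes it.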
PSA: use “FLOP” for compute and “FLOP/s” for compute per second. Avoid “FLOPS” and “FLOPs.”
(Adding to this: “FLOP” is the plural of “FLOP”.)
I’m trying to make “FLOPstacles” happen as a term for the things that mean we can’t just take the peak FLOP/s per GPU and multiply by the number of GPUs, e.g. memory or interconnect bandwidth.
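A rough way to picture FLOPstacles is as the gap between a naive upper bound (peak FLOP/s per GPU × number of GPUs × run time) and the compute a training run actually achieves. The Python sketch below lumps all FLOPstacles into a single `utilization` fraction, similar in spirit to what is often reported as model FLOP utilization (MFU). The peak FLOP/s, cluster size, run length, and 0.4 utilization are hypothetical values for illustration only, and the helper names are made up. The variable names follow the FLOP versus FLOP/s convention above.

```python
# Hypothetical sketch: why peak FLOP/s x GPU count overstates usable compute.
# The utilization figure below is an illustrative assumption, not a measurement.

SECONDS_PER_DAY = 86_400


def naive_training_flop(peak_flop_per_s: float, num_gpus: int, days: float) -> float:
    """Upper bound in FLOP: every GPU runs at its peak FLOP/s for the whole run."""
    return peak_flop_per_s * num_gpus * days * SECONDS_PER_DAY


def effective_training_flop(peak_flop_per_s: float, num_gpus: int, days: float,
                            utilization: float) -> float:
    """Scale the upper bound by the fraction of peak actually achieved.

    `utilization` lumps together all FLOPstacles (memory bandwidth,
    interconnect bandwidth, ...) into one number in (0, 1].
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return naive_training_flop(peak_flop_per_s, num_gpus, days) * utilization


if __name__ == "__main__":
    peak = 1e15      # hypothetical peak FLOP/s per GPU
    gpus = 10_000    # hypothetical cluster size
    days = 90        # hypothetical run length
    naive = naive_training_flop(peak, gpus, days)
    effective = effective_training_flop(peak, gpus, days, utilization=0.4)
    print(f"naive:     {naive:.2e} FLOP")
    print(f"effective: {effective:.2e} FLOP")
```

A single scalar is a deliberate simplification; in practice each FLOPstacle depends on the model, parallelism strategy, and hardware, so a real estimate would need to model them separately.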