> Epoch AI estimates that the compute used in the final training run of GPT-4, the most compute-intensive model to date, was 2e25 FLOP (source).
>
> 2e25 FLOP is 2 * 10^25 FLOP. So, if this estimate is correct, then GPT-4 is already beyond Open Phil's threshold of 10^21 FLOP. Am I wrong?
You appear to be comparing two different things. GPT-4 was trained using roughly 2×10^25 FLOP in total (floating point operations). The Open Phil report's 10^21 figure is a rate, FLOP/s (floating point operations per second): the compute needed to run, not train, a model that matches the human brain. A total and a rate are different units, so the two numbers can't be compared directly.
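To make the unit mismatch concrete, here is a quick back-of-the-envelope check in Python. Only the 2e25 FLOP total and the 10^21 FLOP/s rate come from the sources above; the ~95-day training duration is an assumed, purely illustrative figure, not a confirmed number.

```python
# Total training compute (FLOP) and inference throughput (FLOP/s) are
# different units; comparing them requires picking a time span.

TRAINING_FLOP = 2e25      # Epoch AI's estimate for GPT-4's final training run
BRAIN_RUN_FLOPS = 1e21    # Open Phil's estimated FLOP/s to *run* a brain-equivalent model

# Assumed training duration, purely for illustration (not a confirmed figure).
assumed_training_days = 95
training_seconds = assumed_training_days * 24 * 3600

# Average sustained throughput implied by the total compute and assumed duration.
avg_training_flops = TRAINING_FLOP / training_seconds

print(f"Implied average training throughput: {avg_training_flops:.2e} FLOP/s")
print(f"Open Phil run-time threshold:        {BRAIN_RUN_FLOPS:.2e} FLOP/s")
print(f"Threshold / training throughput:     {BRAIN_RUN_FLOPS / avg_training_flops:.0f}x")
```

Under that assumed duration, the hardware behind GPT-4's training sustained on the order of 2×10^18 FLOP/s, a few hundred times below the 10^21 FLOP/s run-time threshold, so the 2×10^25 FLOP total does not show that the threshold has been crossed.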