A 10-fold increase in GPU count over GPT-5 would require either a 1 to 2.5 GW data center, which doesn't exist and would take years to build, or decentralized training across several data centers. Thus GPT-5 is expected to mark a significant slowdown in scaling runs.
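As a rough back-of-the-envelope check on where the 1 to 2.5 GW figure comes from (the baseline cluster size and per-GPU power draw below are illustrative assumptions, not reported numbers): scaling GPU count 10x while holding all-in power per GPU roughly constant lands in the low-gigawatt range.

```python
# Back-of-the-envelope power estimate for a 10x GPU scale-up.
# All figures below are illustrative assumptions, not reported numbers.

gpt5_gpu_count = 100_000        # assumed GPT-5-scale cluster size
watts_per_gpu_all_in = 1_000    # assumed ~1 kW per GPU including cooling,
                                # networking, and other facility overhead

scaled_gpu_count = 10 * gpt5_gpu_count
facility_power_gw = scaled_gpu_count * watts_per_gpu_all_in / 1e9

print(f"{scaled_gpu_count:,} GPUs -> ~{facility_power_gw:.1f} GW facility")
# 1,000,000 GPUs -> ~1.0 GW facility; with a ~250k-GPU baseline the same
# arithmetic lands near 2.5 GW, matching the 1 to 2.5 GW range above.
```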
Why do you think decentralized training using several data centers will lead to a significant slowdown in scaling runs? Gemini was already trained across multiple data centers.