Executive summary: The scaling of AI training runs is expected to slow significantly after GPT-5, because continuing to scale at the current rate would require unsustainable amounts of power, equivalent to the output of multiple nuclear power plants.
Key points:
Current large data centers consume around 100 MW of power, limiting the number of GPUs that can be supported.
GPT-4 used an estimated 15k to 25k GPUs, requiring 15 to 25 MW of power.
A 10-fold increase in GPUs beyond GPT-5 (on the order of 100 times GPT-4's scale) would require a 1 to 2.5 GW data center, which doesn't exist and would take years to build.
After GPT-5, the focus will shift to improving software efficiency, scaling at inference time, and decentralized training using multiple data centers.
Scaling GPU clusters will be slowed by regulations on land, energy production, and build times, potentially leading to training data centers being built in low-regulation countries.
The total growth rate of effective compute is expected to decrease significantly after GPT-5, from roughly x22/year (or x6.2/year using pre-ChatGPT investment growth values) to roughly x4/year, assuming no efficient decentralized training is developed; a rough illustration of these figures is sketched below.
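The arithmetic behind these key points can be made concrete with a minimal back-of-the-envelope sketch in Python (not part of the original post). The ~1 kW per-GPU draw, the hypothetical GPT-5-scale cluster size, and the three-year horizon are illustrative assumptions only.

```python
# Back-of-the-envelope sketch of the power and growth figures quoted above.
# Assumption: ~1 kW per GPU including datacenter overhead (illustrative only).
GPU_POWER_KW = 1.0

def cluster_power_mw(num_gpus: int) -> float:
    """Total power draw of a training cluster, in megawatts."""
    return num_gpus * GPU_POWER_KW / 1000.0

# GPT-4-scale run: 15k-25k GPUs -> roughly 15-25 MW, as in the key points.
print(cluster_power_mw(15_000), cluster_power_mw(25_000))  # 15.0, 25.0 MW

# A further 10x beyond a hypothetical GPT-5-scale run (itself assumed ~10x GPT-4)
# means on the order of 1.5M-2.5M GPUs, i.e. a data center drawing ~1-2.5 GW.
print(cluster_power_mw(1_500_000) / 1000,
      cluster_power_mw(2_500_000) / 1000)  # 1.5, 2.5 GW

# Effective-compute growth: compounding ~x22/year versus ~x4/year over 3 years.
years = 3
print(22 ** years, 4 ** years)  # 10648 vs 64: a ~166x gap after three years
```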
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.