Riding transformer scaling laws all the way to the end of the internet
What about the limits of data capture? There are still many orders of magnitude more data that could be collected—imagine all the billions of cameras in the world recording video 24⁄7, for a start. Or the limits of data generation? There are already companies creating synthetic data for training ML models.
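To get a feel for "orders of magnitude more data", here is a back-of-envelope sketch. All the inputs (camera count, average bitrate) are illustrative assumptions, not measured figures:

```python
# Fermi estimate of daily video data from always-on cameras.
# All numbers below are rough assumptions for illustration only.
cameras = 1e9          # assumed always-recording cameras (order of magnitude)
seconds_per_day = 24 * 3600
bitrate_mbps = 2       # assumed average compressed video bitrate, Mbit/s

bytes_per_day = cameras * seconds_per_day * bitrate_mbps * 1e6 / 8
print(f"{bytes_per_day:.2e} bytes/day")  # on the order of 1e19 bytes (tens of exabytes)
```

Even with conservative inputs, that is tens of exabytes per day—several orders of magnitude beyond the text corpora used for current frontier training runs.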
and a 10,000,000-fold increase in transistor density.
There’s probably at least another 100-fold hardware overhang in terms of under-utilised compute that could be immediately exploited by AI; much more if all GPUs/TPUs were consolidated for big training runs.
Also, you know those uncanny ads you get that relate to what you were just talking about? Google is likely already capturing more spoken words per day from phone mic recordings than were used in the entirety of the GPT-4 training set (~10^12 words).
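A rough sanity check on that claim. The phone count, words-spoken-per-day figure, and capture fraction below are all hedged assumptions (the ~16,000 words/day number is a commonly cited average), not data about Google's actual systems:

```python
# Fermi estimate: spoken words potentially captured per day via phone mics.
# Every input here is an assumption for illustration only.
phones = 3e9                   # assumed phones with mic access
words_per_person_per_day = 16_000  # commonly cited average spoken words/day
capture_fraction = 0.1         # assumed fraction of speech actually captured

words_per_day = phones * words_per_person_per_day * capture_fraction
print(f"{words_per_day:.1e} words/day")  # ~5e12, exceeding the ~1e12-word estimate
```

Even if the capture fraction were ten times smaller, the daily flow would still be within an order of magnitude of the entire GPT-4 training corpus.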