Riding transformer scaling laws all the way to the end of the internet still only gets you something at most moderately superhuman. This would be civilization-of-immortal-geniuses dangerous, but not angry-alien-god dangerous: MAD is still in effect, for instance. No nanotech, no psychohistory.
In particular, such systems won’t be smart enough to determine a priori whether an alternative architecture can go Foom.
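For a sense of what “the end of the internet” buys you, here is a back-of-the-envelope sketch using the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the token budgets are illustrative assumptions, not measurements of how much usable text actually exists.

```python
# Rough scale of "riding scaling laws to the end of the internet".
# The data budgets below are illustrative assumptions.
data_budgets = [1e13, 1e14, 1e15]   # plausible range for usable web text, in tokens

TOKENS_PER_PARAM = 20               # Chinchilla rule of thumb: compute-optimal D ~ 20 * N

for D in data_budgets:
    N = D / TOKENS_PER_PARAM        # compute-optimal parameter count for that data budget
    flops = 6 * N * D               # standard ~6*N*D estimate of training FLOPs
    print(f"{D:.0e} tokens -> ~{N:.1e} params, ~{flops:.1e} FLOPs")
```

Even the optimistic end of that range is a large but finite target, which is the sense in which the scaling-law path has an end.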
I also expect that Foom candidates will not be many orders of magnitude cheaper to train than mature language models, and that as a result the marginal return on trying to go Foom will be zero. If it happens, it’ll be the result of deliberate effort by an agent with lots and lots of slack to burn, not something that accidentally falls out of market dynamics.
We’ve had ~50 years of software development so far and gone from 0 to GPT-4, and a 10,000,000-fold increase in transistor density over roughly the same period. We might return to 20th-century compute cost improvements for a bit, if things get really, really cheap, but hardware is not going to move anywhere near as fast as software.
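For calibration, and assuming the 10,000,000-fold density increase happened over roughly that same ~50-year window, the implied cadence is the familiar Moore’s-law one:

```python
import math

years = 50           # ~50 years of progress, per the figures above
density_gain = 1e7   # ~10,000,000-fold increase in transistor density

doublings = math.log2(density_gain)                     # ~23.3 doublings
print(f"years per doubling: {years / doublings:.2f}")   # ~2.15 years
```

That is roughly one doubling every two years, which is why even a return to 20th-century cost curves would still look slow next to software iteration.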
> Riding transformer scaling laws all the way to the end of the internet
What about to the limits of data capture? There are still many orders of magnitude more data that could be collected: imagine all the billions of cameras in the world recording video 24/7, for a start. Or the limits of data generation? There are already companies creating synthetic data for training ML models.
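To put a very rough number on “many orders of magnitude” (every figure below is an assumption made purely for the estimate, not a sourced statistic):

```python
# Back-of-the-envelope: always-on camera video vs. a web-scale text corpus.
# All inputs are illustrative assumptions.
cameras = 1e9                   # assume ~1 billion cameras recording 24/7
bytes_per_second = 125_000      # assume ~1 Mbit/s per camera (125 kB/s)
seconds_per_day = 86_400

video_bytes_per_day = cameras * bytes_per_second * seconds_per_day
text_corpus_bytes = 5e13        # assume a ~50 TB web-text training corpus

print(f"video per day: ~{video_bytes_per_day:.1e} bytes")
print(f"vs. text corpus: ~{video_bytes_per_day / text_corpus_bytes:.0e}x per day")
```

Raw video obviously isn’t worth as much per byte as curated text, but the gap in sheer volume is the point.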
> and a 10,000,000-fold increase in transistor density.
There’s probably at least another 100-fold hardware overhang in terms of under-utilised compute that could be immediately exploited by AI; much more if all GPUs/TPUs are consolidated for big training runs.
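That 100-fold figure can be sanity-checked with two admittedly speculative inputs, the accelerator count behind a single frontier training run versus the worldwide installed base; both numbers below are guesses, not reported figures.

```python
# Crude sanity check of a "100-fold hardware overhang"; both inputs are guesses.
gpus_per_frontier_run = 25_000   # assume a frontier run occupies ~25k accelerators
global_accelerators = 5e6        # assume a few million data-centre GPUs/TPUs exist

print(f"consolidation factor: ~{global_accelerators / gpus_per_frontier_run:.0f}x")
```

Interconnect, power and utilisation would eat into that, but the order of magnitude is what matters for the overhang claim.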
Also, you know those uncanny ads you get that are related to what you were just talking about? Google is likely already capturing more spoken words per day from phone mic recordings than were used in the entirety of the GPT-4 training set (~10^12 words).
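Whether or not any of that audio is actually retained, the arithmetic behind the comparison is easy to check; the device count and words-per-day figures below are assumptions, and 10^12 is the training-set size quoted above.

```python
# Plausibility check: spoken words near phone mics per day vs. a ~10^12-word corpus.
phones_listening = 1e9        # assume ~1 billion phones with active mics
words_per_person_day = 5_000  # assume a few thousand audible words per person per day

words_captured_per_day = phones_listening * words_per_person_day
gpt4_training_words = 1e12    # ~10^12 words, per the figure above

print(f"words per day: ~{words_captured_per_day:.0e}")
print(f"multiple of the training set: ~{words_captured_per_day / gpt4_training_words:.0f}x")
```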