Agreed, and I'd add that code models won't be data-constrained, since they can generate their own training data. It's straightforward to write tests automatically, and you can run the generated code to check whether it passes those tests before adding it to your training dataset. As an unfortunate side effect, this process involves constantly and automatically executing code output by a large model, and then feeding it data it generated itself so it can update its weights. Neither is good safety-wise if the model is misaligned and power-seeking.
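To make the loop concrete, here is a minimal sketch of the filter-then-train data pipeline described above. All names (`run_tests`, `build_dataset`, the toy candidates) are hypothetical placeholders I made up for illustration, not any lab's actual pipeline; note that `exec` on candidate code is exactly the "automatically running model output" step flagged as a safety concern.

```python
# Hypothetical sketch: keep only model-generated code that passes auto-written tests,
# then treat the survivors as new training examples.

def run_tests(candidate_src: str, test_src: str) -> bool:
    """Execute a candidate solution against an auto-generated test; True if it passes."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # running unvetted model output (the safety concern)
        exec(test_src, namespace)       # assertions raise if the candidate is wrong
        return True
    except Exception:
        return False

def build_dataset(candidates, test_src):
    """Filter candidates down to the ones that pass; these become training data."""
    return [c for c in candidates if run_tests(c, test_src)]

# Toy example: two "model outputs" for an add() function, one buggy.
test_src = "assert add(2, 3) == 5"
candidates = [
    "def add(a, b):\n    return a + b",  # correct, survives the filter
    "def add(a, b):\n    return a - b",  # fails the test, discarded
]
dataset = build_dataset(candidates, test_src)  # only the correct candidate remains
```

The key property is that the verification signal (does it pass tests?) is cheap and automatic, so the dataset can grow without human labeling.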
I don’t know whether this has been incorporated into a wider timelines analysis yet, since it’s quite recent, but it was a notable update for me given the latest scaling laws, which indicate that data, not parameter count, is the constraining factor. Timelines much shorter than 2043 seem like a live and strategically relevant possibility.