For what it’s worth, I think pre-training alone is probably enough to get us to about 1-3 month time horizons based on a 7 month doubling time, but pre-training data will start to run out in the early 2030s, meaning that you no longer (in the absence of other benchmarks) have very good general proxies of capabilities improvements.
The real issue isn’t the difference between hours and months long tasks, but the difference between months long tasks and century long tasks, which Steve Newman describes well here.
For what it’s worth, I think pre-training alone is probably enough to get us to about 1-3 month time horizons based on a 7 month doubling time, but pre-training data will start to run out in the early 2030s, meaning that you no longer (in the absence of other benchmarks) have very good general proxies of capabilities improvements.
The real issue isn’t the difference between hours and months long tasks, but the difference between months long tasks and century long tasks, which Steve Newman describes well here.