Re Tesla: My best guess is that they still need a 5-20x improvement in reliability to match human level, and I don’t entirely rule out that they’ll manage it with AI4. It’s hard to get good data on this, though. It sounds like they were still finalizing the AI5 chip design until quite recently, and I’m not sure it makes sense to spend training budget on a model that can only run on a handful of cars while they’re still hoping to squeeze more out of AI4. There’s likely an inference overhang here: they’ve spent years scaling training data and compute while model size stayed fixed, way, way past the usual theoretical optimal tradeoff point. Lifting the size constraint will probably yield disproportionate gains.
Re the middle stuff: I think we just disagree on how to weigh the various evidence: failed predictions (based on narrow models, older models, etc.), firsthand reports, and more recent benchmark results.
Re the 90% prediction: by “conservative” I meant that this is roughly my 5th-10th percentile timeline, i.e. near the slow end of my distribution. I’ve heard from a number of SWEs that they’re already basically not writing code, just instructing and reviewing. I’m also uncertain about adoption speed. Among SWEs actually using the latest LLMs and tools, I’d put >50% odds on AI writing 90%+ of their code in the first half of this year.
Where do you get that 5-20x figure from?
I recall Elon Musk once said the goal was to get to an average of one intervention per million miles of driving. I think this is based on the statistic that human drivers average one crash per 500,000 miles.
I believe interventions currently happen more than once per 100 miles on average. If so, and if one intervention per million miles is what Tesla is indeed targeting, then Tesla is more than 10,000x off from its goal.
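The arithmetic behind that gap can be sketched in a few lines. This is a back-of-envelope check only: the three figures are the ones recalled above, not independently verified numbers, and 100 miles is treated as an upper bound on current miles per intervention, so every ratio it prints is a lower bound.

```python
# Back-of-envelope ratios using the figures recalled in the thread (not verified data).
TARGET_MILES_PER_INTERVENTION = 1_000_000  # Musk's stated goal, as recalled
HUMAN_MILES_PER_CRASH = 500_000            # average human crash rate, as recalled
CURRENT_MILES_PER_INTERVENTION = 100       # upper bound: interventions occur more than once per 100 miles


def gap(target_miles: float, current_miles: float) -> float:
    """Factor by which miles-between-events must grow to go from current to target."""
    return target_miles / current_miles


# Because current miles per intervention is *below* 100, the true gaps are larger than these.
print(f"Gap to Tesla's stated target: more than {gap(TARGET_MILES_PER_INTERVENTION, CURRENT_MILES_PER_INTERVENTION):,.0f}x")
print(f"Gap to the human crash rate: more than {gap(HUMAN_MILES_PER_CRASH, CURRENT_MILES_PER_INTERVENTION):,.0f}x")
```

The second ratio compares interventions with crashes, which are not the same kind of event, so it is only a rough reference point rather than a like-for-like comparison.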
There are other ways of comparing the performance of Tesla’s FSD software against average human driving, and they would produce different numbers. But I am skeptical that anyone could use real, credible numbers and conclude that Tesla is currently less than 100x away from human-level driving.
I very much doubt that Hardware 5/AI5 is going to provide what it takes for Tesla to achieve SAE Level 4/5 autonomy at human-level or better performance, or that Tesla will achieve that goal (in any robust, meaningful sense) within the next 2 years. I still think what I said before holds: if it were true, Tesla would have internal evidence of it (or would be capable of obtaining that evidence), and would be incentivized to show it off.
Andrej Karpathy understands this topic better than almost anyone else in the world, and he is clear that he thinks fully autonomous driving is not solved (at Tesla, Waymo, or elsewhere) and that there’s still a long way to go. There’s good reason to listen to Karpathy on this.
I also very much doubt that the best AI models in 2 years will be capable of writing 90% of commercial, production code, let alone that this will happen within six months. I think there’s essentially no chance of this happening in 2026. As far as I can see, there is no good evidence currently available suggesting that this is starting to happen or will be possible soon. Extrapolating from performance on narrow, contrived benchmark tasks to real-world performance is just a mistake, and the evidence about real-world use of AI for coding does not support this.