Right now we have lots of resources that did not exist in 2018: dramatically more compute, better tooling and frameworks like PyTorch and JAX, armies of experts on parallelization, and on and on. The lack of these was a bottleneck in 2018; without it, we presumably would have gotten the LLMs of today years earlier.
I fear this may be pointless nitpicking, but if I'm getting the timeline right, PyTorch's initial alpha release was in September 2016, its initial proper public release was in January 2017, and PyTorch version 1.0 was released in October 2018. I'm much less familiar with JAX, but apparently it was released in December 2018. Maybe you simply intended to say that PyTorch and JAX are better today than they were in 2018. I don't know. This just stuck out to me as I was re-reading your comment just now.
For context, OpenAI published a paper about GPT-1 (or just GPT) in 2018, released GPT-2 in 2019, and released GPT-3 in 2020. (I'm going off the dates on the Wikipedia pages for each model.) GPT-1 apparently used TensorFlow, which was initially released in 2015, the same year OpenAI was founded. TensorFlow had a version 1.0 release in 2017, the year before the GPT-1 paper. (In 2020, OpenAI said in a blog post they would be switching to using PyTorch exclusively.)
Maybe you simply intended to say that PyTorch and JAX are better today than they were in 2018.
Yup! E.g. torch.compile "makes code run up to 2x faster" and came out in PyTorch 2.0 in 2023.
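To make that concrete, here's a minimal sketch of what using it looks like (my own illustration, not from the comment or the PyTorch docs); the actual speedup depends heavily on the model and hardware, so "up to 2x" is a headline figure rather than a guarantee:

```python
# Minimal sketch: torch.compile, added in PyTorch 2.0, wraps a model and
# JIT-compiles it the first time it runs. Model and sizes here are arbitrary.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
compiled_model = torch.compile(model)  # compilation happens lazily, on the first forward pass

x = torch.randn(32, 128)
out = compiled_model(x)  # subsequent calls reuse the compiled kernels
print(out.shape)  # torch.Size([32, 10])
```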
More broadly, what I had in mind was: open-source software for everything to do with large-scale ML training (containerization, distributed training, storing checkpoints, hyperparameter tuning, training data and training environments, orchestration and pipelines, dashboards for monitoring training runs, and on and on) is much more developed now compared to 2018, and even compared to 2022, if I understand correctly (I'm not a practitioner). Sorry for poor wording. :)
Presumably a lot of these are optimised for the current gen-AI paradigm, though. But we're talking about what happens if the current paradigm fails. I'm sure some of it would carry over to a different AI paradigm, but it's also pretty likely there would be other bottlenecks we would have to sort out to get things working.
I feel like what you're saying is the equivalent of pointing out, in 2020, how many optimisations and computing resources had gone into, say, Google search, and then using that as evidence that the big-data processing behind LLMs should surely be instantaneous as well.
Presumably a lot of these are optimised for the current gen-AI paradigm, though. But we're talking about what happens if the current paradigm fails. I'm sure some of it would carry over to a different AI paradigm, but it's also pretty likely there would be other bottlenecks we would have to sort out to get things working.
Yup, some stuff will be useful and other stuff won't. The subset of useful stuff will make future researchers' lives easier and allow them to work faster. For example, here are people using JAX for lots of computations that are not deep learning at all.
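As a toy sketch of the kind of thing I mean (my own example, not one of the linked ones), here's JAX used for a plain numerical computation with no neural network anywhere:

```python
# Toy sketch: a jitted Monte Carlo estimate of pi, i.e., JAX as a fast
# general-purpose array library rather than a deep-learning framework.
import jax
import jax.numpy as jnp

@jax.jit
def estimate_pi(key):
    # Sample a million points in the unit square and count how many land
    # inside the quarter circle of radius 1.
    pts = jax.random.uniform(key, shape=(1_000_000, 2))
    inside = jnp.sum(jnp.sum(pts ** 2, axis=1) <= 1.0)
    return 4.0 * inside / 1_000_000

print(estimate_pi(jax.random.PRNGKey(0)))  # ~3.14
```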
I feel like what you're saying is the equivalent of pointing out, in 2020, how many optimisations and computing resources had gone into, say, Google search, and then using that as evidence that the big-data processing behind LLMs should surely be instantaneous as well.
In like 2010–2015, "big data" and "the cloud" were still pretty hot new things, and people developed a bunch of storage formats, software tools, etc. for distributed data, distributed computing, parallelization, and cloud computing. And yes, I do think that stuff turned out to be useful when deep learning started blowing up (and then LLMs after that), in the sense that ML researchers would have made slower progress (on the margin) if not for all that development. I think Docker and Kubernetes are good examples here. I'm not sure exactly how different the counterfactual would have been, but I do think it made more than zero difference.
Things like Docker containers or cloud VMs, which can in principle be applied to any sort of software or computation, could be helpful for all sorts of applications we can't anticipate. They are very general-purpose. That makes sense to me.
The extent to which things designed for deep learning, such as PyTorch, could be applied to ideas outside deep learning seems much more dubious.
And if we're thinking about ideas that fall within deep learning, but outside what is currently mainstream and popular, then I simply don't know.