I have been wondering the same. However, given that OpenAI’s “AI and Compute” inclusion criteria are also a bit vague, I’m having a hard time determining which of our data points would fulfill their criteria.
In general, I would describe our dataset as matching the same criteria:
“Relatively well known” corresponds to our “lots of citations”.
“Used a lot of compute for their time” corresponds to our dataset once we exclude the outliers from efficient ML models.
There is a recent trend toward efficient ML models that achieve similar performance while using less compute for training and inference (such models are then used, e.g., for deployment on embedded systems or smartphones).
“Gave enough information to estimate the compute”: we likewise rely on estimates made by us or the community based on the information available in the paper. For the source of each estimate, see the note on the corresponding cell in our dataset.
We’re working on gathering more compute data by directly asking researchers (next target: n = 100).
I’d be interested in discussing more precise inclusion criteria. As I say in the post:
Also, it is unclear which models we should base this trend on. The piece AI and Compute also briefly discusses this in its appendix. Given the recent trend of efficient ML models, driven by emerging fields such as Machine Learning on the Edge, I think it might be worthwhile to discuss how to integrate and interpret such models in analyses like this; ignoring them cannot be the answer.
Thanks! What happens to your doubling times if you exclude the outliers from efficient ML models?

The described doubling time of 6.2 months is the result when the outliers are excluded. If one includes all of our models, the doubling time is around 7 months. However, there were only one or two efficient ML models in the sample, so this difference should be interpreted with caution.
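For readers wondering how a number like “6.2 months” arises: a doubling time is typically recovered from the slope of a least-squares fit of log2(compute) against time. The sketch below illustrates this with purely hypothetical data points, not the actual dataset discussed above.

```python
import math

# Hypothetical (year, training compute) pairs for illustration only --
# these are NOT the real data points from the dataset discussed above.
data = [
    (2012.5, 1.0),
    (2014.0, 8.0),
    (2015.5, 60.0),
    (2017.0, 500.0),
]

# Ordinary least-squares fit of log2(compute) against time in years.
xs = [t for t, _ in data]
ys = [math.log2(c) for _, c in data]
n = len(data)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
# Slope of the fit, in doublings per year.
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)

# Convert doublings per year into months per doubling.
doubling_time_months = 12.0 / slope
print(f"Doubling time: {doubling_time_months:.1f} months")
```

Excluding or including a handful of low-compute outliers shifts the fitted slope, which is exactly why the estimate moves between about 6.2 and 7 months.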