The stated doubling time of 6.2 months is the result when the outliers are excluded. If one includes all our models, the doubling time was approximately 7 months. However, there were only one or two efficient ML models in the dataset.
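To illustrate how a few outliers can shift the estimate, here is a minimal sketch of one common way to get a doubling time: fit a straight line to log2(training compute) over time and invert the slope. The data points are synthetic and chosen for round numbers; they are not from our dataset. The plain least-squares fit is also an assumption here, not necessarily the exact method behind the 6.2 and ≈7 month figures.

```python
import math


def doubling_time_months(points):
    """Fit log2(compute) = a + b * t by ordinary least squares and return 1 / b.

    points: list of (t_months, training_compute_flop) pairs.
    """
    xs = [t for t, _ in points]
    ys = [math.log2(c) for _, c in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # doublings per month
    return 1.0 / slope  # months per doubling


# Synthetic, illustrative data (NOT our dataset): compute doubles every 6 months.
trend = [(0, 1e18), (6, 2e18), (12, 4e18), (18, 8e18), (24, 1.6e19)]
# Hypothetical "efficient ML" outlier: recent, but trained with less compute
# than the trend predicts for its date.
outlier = (24, 4e18)

print(round(doubling_time_months(trend), 2))              # 6.0
print(round(doubling_time_months(trend + [outlier]), 2))  # 8.0
```

With only a handful of points, a single low-compute system is enough to stretch the fitted doubling time noticeably, which is why the choice to include or exclude efficient ML models matters.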
lennart
Compute Governance and Conclusions—Transformative AI and Compute [3/4]
For “Semiconductor industry amortize their R&D cost due to slower improvements”, the decreased price comes from the longer innovation cycles, so the R&D investments are spread out over a longer time period. Competition should then drive the price down.
In contrast, “Sale price amortization when improvements are slower” describes the idea that, within the company, the sale price will be amortized over a longer time period, given that obsolescence will occur later.
Those ideas stem from Cotra’s appendices: “Room for improvements to silicon chips in the medium term”.
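The difference between the two mechanisms can be sketched with some back-of-the-envelope arithmetic. All numbers below are hypothetical, and the formulas are simplifying assumptions: R&D cost is spread evenly over every chip sold during one innovation cycle (and competition passes the saving on to buyers), and the buyer amortizes the sale price linearly until the hardware becomes obsolete.

```python
def price_per_chip(unit_cost, rnd_cost, chips_per_year, cycle_years):
    """Mechanism 1 (assumed model): the manufacturer spreads R&D cost over
    all chips sold during one innovation cycle."""
    return unit_cost + rnd_cost / (chips_per_year * cycle_years)


def annual_hardware_cost(sale_price, years_until_obsolete):
    """Mechanism 2 (assumed model): the buyer amortizes the sale price
    linearly over the hardware's useful life."""
    return sale_price / years_until_obsolete


# Hypothetical inputs, for illustration only.
fast = price_per_chip(unit_cost=50, rnd_cost=1e9, chips_per_year=1e7, cycle_years=2)
slow = price_per_chip(unit_cost=50, rnd_cost=1e9, chips_per_year=1e7, cycle_years=4)
print(fast, slow)  # 100.0 75.0

print(annual_hardware_cost(6000, 3), annual_hardware_cost(6000, 6))  # 2000.0 1000.0
```

In the first case the price per chip itself falls; in the second the sale price is unchanged, but the cost per year of use falls because the hardware stays useful for longer.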
Thanks, Sammy. Indeed this is related and very interesting!
Forecasting Compute—Transformative AI and Compute [2/4]
Thanks, I’ve edited it.
Thanks, Michael.
n is the number of ML systems in the analysis at the time of writing (we have added more systems in the meantime). Examples of such systems are GPT-3 and AlphaFold; basically, each system is a row in our dataset.
Right, good point. I’ll add the number of systems for the given time period.
That’s hard to answer. I don’t think OpenAI misinterpreted anything. For the moment, I think it’s probably a mixture of:
the inclusion criteria for the systems on which we base this trend
actually slower doubling times, for reasons we still need to figure out.
Nonetheless, as outlined in Part 1, Section 2.3, I have not interpreted these trends yet, but I’m interested in a discussion and hope to write up my thoughts on this in the future.
I have been wondering the same. However, given that OpenAI’s “AI and Compute” inclusion criteria are also a bit vague, I’m having a hard time determining which of our data points would fulfill their criteria.
In general, I would describe our dataset as matching the same criteria because:
“relatively well known” equals our “lots of citations”.
“used a lot of compute for their time” equals our dataset if we exclude outliers from efficient ML models.
There’s a recent trend toward efficient ML models that achieve similar performance while using less compute for training and inference (such models are then used, e.g., for deployment on embedded systems or smartphones).
“gave enough information to estimate the compute”: We also rely on our own or the community’s estimates, based on the information available in the paper. For the source of each estimate, see the note on the corresponding cell in our dataset.
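As a starting point for discussing more precise inclusion criteria, the three criteria above could be written down as an explicit filter. This is only a hypothetical sketch: the field names, the citation threshold, and the rows are placeholders, not our actual dataset schema or values. Making the rules explicit would also make it easy to report n for any given selection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    # Hypothetical schema: one row of a compute dataset.
    name: str
    citations: int
    training_compute_flop: Optional[float]  # None if no estimate is possible
    efficient_ml: bool  # flagged as an efficient-ML outlier


def meets_criteria(s: System, min_citations: int = 100, exclude_efficient: bool = True) -> bool:
    """Hypothetical encoding of the three criteria; the threshold is a placeholder."""
    well_known = s.citations >= min_citations           # "relatively well known"
    estimable = s.training_compute_flop is not None     # "enough information to estimate"
    high_compute = not (exclude_efficient and s.efficient_ml)  # "a lot of compute for their time"
    return well_known and estimable and high_compute


# Placeholder rows, for illustration only.
rows = [
    System("A", 500, 3e21, False),
    System("B", 40, 1e20, False),   # too few citations
    System("C", 800, None, False),  # no compute estimate possible
    System("D", 300, 1e17, True),   # efficient-ML outlier
]

selected = [s.name for s in rows if meets_criteria(s)]
print(selected, len(selected))  # ['A'] 1
```

Passing `exclude_efficient=False` keeps the efficient-ML outlier (here `['A', 'D']`), which mirrors the choice between the 6.2-month and ≈7-month estimates.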
We’re working on gathering more compute data by directly asking researchers (next target: n = 100).
I’d be interested in discussing more precise inclusion criteria. As I say in the post:
Also, it is unclear on which models we should base this trend. The piece AI and Compute also quickly discusses this in the appendix. Given the recent trend of efficient ML models due to emerging fields such as Machine Learning on the Edge, I think it might be worthwhile discussing how to integrate and interpret such models in analyses like this — ignoring them cannot be the answer.
What is Compute? - Transformative AI and Compute [1/4]
Transformative AI and Compute [Summary]
Also happy to help on a more local level: eazurich.org/join
If you’re not already in contact with EA Zürich, just send us an email and we will get back to you: info@eazurich.org.
Is there any chance to get hold of the materials you used for this workshop?
Thanks, Nuño.