Let CT be the computing power used to train the model. Is the idea that “if you could afford CT to train the model, then you can also afford CT for running models”?
Because that doesn’t seem obvious. What if you used 99% of your budget on training? Then you’d only be able to afford roughly 0.01×CT for running models.
Or is this just an example to show that training costs >> running costs?
Yes, that’s how I understood it as well. If you spend the same amount on inference as you did on training, then you get a hell of a lot of inference.
I would expect he’d also argue that, because companies are willing to spend tons of money on training, they’ll also be willing to spend lots on inference.
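Just to make the arithmetic concrete, here’s a toy calculation; the budget split and the per-run cost below are made-up numbers for illustration, not anything from the post:

```python
# Illustrative arithmetic only; none of these numbers come from the post.
budget = 1.0                  # total compute budget, arbitrary units
train_fraction = 0.99         # suppose 99% of the budget goes to training
C_T = train_fraction * budget        # compute spent on training
inference_budget = budget - C_T      # what's left for running models

# A single inference run is vastly cheaper than training.
# Assume (purely for illustration) one run costs a billionth of C_T:
cost_per_run = C_T / 1e9
runs_affordable = inference_budget / cost_per_run

print(f"inference budget = {inference_budget / C_T:.3f} x C_T")   # ~0.01 x C_T
print(f"affordable runs  = {runs_affordable:.1e}")                # ~1e7 runs
```

Even when training eats 99% of the budget, the leftover 1% still buys an enormous number of inference runs, which I take to be the point of the “if you could afford CT, you can afford a lot of inference” framing.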
Do we know the expected cost for training an AGI? Is that within a single company’s budget?
Nearly impossible to answer. This report by OpenPhil gives it a hell of an effort, but could still be wrong by orders of magnitude. Most fundamentally, the amount of compute necessary for AGI might not be related to the amount of compute used by the human brain, because we don’t know how our algorithmic efficiency compares to the brain’s.
https://www.cold-takes.com/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell/
Yes, the last sentence is exactly correct.
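For intuition on how these estimates get built, and why the algorithmic-efficiency question dominates, here’s a sketch of the simplest “lifetime anchor”-style calculation from that family of methods; the brain FLOP/s figure and the efficiency multipliers are rough illustrative values, not the report’s actual bottom lines:

```python
# Sketch of a "lifetime anchor"-style estimate (illustrative values, not the
# report's bottom lines).
brain_flops_per_sec = 1e15        # a commonly cited rough estimate of brain compute
seconds_of_childhood = 1e9        # roughly 30 years of experience, in seconds
lifetime_anchor = brain_flops_per_sec * seconds_of_childhood   # ~1e24 FLOP

# The crux: how efficient are our algorithms relative to the brain's?
# Sweeping a few orders of magnitude shows how much this one unknown dominates.
for inefficiency in (1e0, 1e3, 1e6, 1e9):
    estimate = lifetime_anchor * inefficiency
    print(f"algorithms {inefficiency:.0e}x less efficient -> ~{estimate:.0e} FLOP to train")
```

The spread across those multipliers is the sense in which the answer could be off by orders of magnitude.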
So the terms of art here are “training” versus “inference”. I don’t have a reference or guide (the relative size of the two isn’t something most people think about, compared to the absolute size of each individually), but if you google those terms and scroll through some papers or posts I think you will see some clear examples.
Just LARPing here. I don’t really know anything about AI or machine learning.
I guess in some deeper sense you are right and (my simulated version of) what Holden has written is imprecise.
We don’t really see many “continuously” updating models, where training continues live during use. So the mundane pattern we see today, where inference (trivially running the trained model’s instructions, often on silicon made specifically for inference) is much cheaper than training, may not apply, for some reason, to the pattern an out-of-control AI uses.
It’s not impossible that, if the system needs to be self-improving, it has to continually provision a large fraction of its training cost, or something like that.
It’s not really clear what the “shape” of this “relative cost curve” would be, or whether this would only hold for a short period of time, and none of this makes the scenario any less dangerous.
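To make the “shape” question concrete, here’s a toy cost model comparing “train once, then run cheaply” against “keep re-provisioning a chunk of training compute while running”; every parameter is an assumption chosen for illustration:

```python
# Toy model of the relative cost curve; all parameters are assumptions.
C_T = 1.0                   # one-off training cost (normalized to 1)
inference_per_step = 1e-6   # per-step cost of ordinary inference, assumed tiny vs. C_T
retrain_fraction = 0.1      # fraction of C_T re-spent per step if continually self-improving
steps = 100                 # how many steps of operation we look at

train_once_total = C_T + steps * inference_per_step
self_improving_total = C_T + steps * (inference_per_step + retrain_fraction * C_T)

print(f"train-once total     = {train_once_total:.2f} x C_T")      # ~1.00 x C_T
print(f"self-improving total = {self_improving_total:.2f} x C_T")  # ~11.00 x C_T
```

Under these assumptions the continually-retraining variant costs roughly an order of magnitude more over 100 steps: much pricier, but not obviously prohibitive, which is consistent with the point that this wouldn’t make the scenario less dangerous.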