If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then experiments should get cheaper to run in proportion to the algorithmic efficiency of cognitive labor. I think that's a better first approximation than assuming experiment costs stay constant, and it might hold in practice, especially when you can target small-scale algorithmic improvements.
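To make the proportionality concrete, here is a minimal sketch (assuming, as my own simplification, a single algorithmic-efficiency factor $f(t)$ that applies to both experiment compute and cognitive-labor compute):

$$C_{\text{exp}}(t) \approx \frac{C_{\text{exp}}(0)}{f(t)}, \qquad C_{\text{cog}}(t) \approx \frac{C_{\text{cog}}(0)}{f(t)} \quad\Rightarrow\quad \frac{C_{\text{exp}}(t)}{C_{\text{cog}}(t)} \approx \text{constant}.$$

That is, the compute cost of a given experiment falls by the same factor that makes cognitive labor cheaper.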
OK I see the model there.
I guess it’s not clear to me whether that should hold if I expect most experiment compute to be ~training and most cognitive labour compute to be ~inference?
However, over time maybe more experiment compute will be ~inference, as experiments shift more towards producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (I definitely don’t have a clear picture of the dynamics here, though.)