The end-to-end training run is not what makes learning slow. It’s the iterative reinforcement learning process of deploying in an environment, gathering data, training on that data, and then redeploying with a new data collection strategy, etc. It’s a mistake, I think, to focus only the narrow task of updating model weights and omit the critical task of iterative data collection (i.e., reinforcement learning).
The end-to-end training run is not what makes learning slow. It’s the iterative reinforcement learning process of deploying in an environment, gathering data, training on that data, and then redeploying with a new data collection strategy, etc. It’s a mistake, I think, to focus only the narrow task of updating model weights and omit the critical task of iterative data collection (i.e., reinforcement learning).