Thanks, yeah I agree overall. Large pre-trained models will be the future, because of the few-shot learning if nothing else.
I think the point I was trying to make, though, is that this paper raises a question, at least to me, as to how well these models can share knowledge between tasks. But I want to stress again I haven’t read it in detail.
In theory, we expect that multi-task models should do better than single-task models because they can share knowledge between tasks. Of course, the model has to be big enough to handle all of the tasks. (In medical imaging, a lot of studies don’t show multi-task models to be better, but I suspect this is because they don’t make the multi-task models big enough.) At first it seemed they were saying that only the robotics tasks showed clear benefits from multi-task training, but now that I read it again it seems they found benefits for some of the other tasks too. They do mention later that transfer across Atari games is challenging.
Another thing I want to point out is that, at least right now, training large models and parallelizing the training over many GPUs/TPUs is really technically challenging. They even ran into hardware problems here, which limited the context window they were able to use. I expect this to change, though, with better GPU/TPU hardware and software infrastructure.