Thank you for the examples! Could you elaborate on the technical example of breaking down a large model into sub-components, then training each sub-components individually, and finally assembling it into a large model? Will such a method realistically be used to train AGI-level systems? I would think that the model needs to be sufficiently large during training to learn highly complex functions. Do you have any resources you could share that indicate that large models can be successfully trained this way?
Thank you for the examples! Could you elaborate on the technical example of breaking down a large model into sub-components, then training each sub-components individually, and finally assembling it into a large model? Will such a method realistically be used to train AGI-level systems? I would think that the model needs to be sufficiently large during training to learn highly complex functions. Do you have any resources you could share that indicate that large models can be successfully trained this way?