Hmmm. You’re focused on the input text.
Maybe you want to focus on the “output” instead, and define some metric[1] relative to that output and your targeted performance of the model?
Focusing on the mix of input data seems like a different question from focusing on the output.
For example, it’s not clear whether a fine-tuning pass over a batch of GiveWell content would shift GPT-3 more or less than a same-size batch of 80k content. It’s also not clear that the length of the input text is a good measure, versus something like the perplexity of the fine-tuning text under the current GPT-3 model. I haven’t fine-tuned a GPT-3 model, though, so I’m not sure.
Although, in some sense, it’s really hard to pin down what this metric would be, beyond something trivial like perplexity. Maybe that difficulty is what you want to avoid?
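To make the “perplexity of the fine-tuning text under the current model” idea concrete, here’s a minimal sketch of the arithmetic. Everything here is illustrative: the function name and the toy log-probabilities are made up, and in practice you’d get per-token log-probabilities from the model itself (e.g. the `logprobs` field the OpenAI completions API can return) rather than hard-coding them.

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(-mean per-token log-probability).

    Lower perplexity means the current model already finds the
    fine-tuning text predictable, so (plausibly) a pass over it
    would shift the model less.
    """
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy per-token log-probs a model might assign to a 3-token sequence.
# These numbers are invented for illustration only.
toy_log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(round(perplexity(toy_log_probs), 3))  # → 4.0
```

On this view, you could compare a GiveWell batch against an 80k batch by computing each batch’s perplexity under the current model before fine-tuning, rather than comparing raw text lengths.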