Surely a big part of the resolution is that GPT-3 is sample-inefficient in total, but sample-efficient on the margin?