I think an easier way to see it is to realise that its loss is uniform over all pieces of text, whereas humans only care about predicting an extreme minority of that text. If you see a sentence like
“It was a rainy day in Nairobi, the capital of_”
...it’s obvious to you that the salient piece of knowledge here is what country Nairobi is the capital of, so that’s how you design your benchmarks for AI performance. But the AI cares equally about predicting ‘capital’ after ‘the’, and ‘rainy’ after ‘It was a’. GPT-2 was already past human level at almost all text except the very selective subset humans put all their optimisation into (e.g. answers to math tests, long-term coherence in stories, etc.).
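To make the "uniform loss" point concrete, here is a minimal Python sketch of the standard next-token objective: the mean cross-entropy over every predicted token, with each token weighted 1/N no matter how much a human cares about it. The per-token probabilities are invented purely for illustration, and treating each word as one token is a simplification (real GPT tokenizers split text into subword pieces).

```python
import math

# Hypothetical probabilities a model might assign to each next token of
# "It was a rainy day in Nairobi, the capital of Kenya".
# Illustrative numbers only; word-level tokens are a simplification.
predictions = [
    ("was", 0.30), ("a", 0.50), ("rainy", 0.05), ("day", 0.60),
    ("in", 0.70), ("Nairobi", 0.001), (",", 0.50), ("the", 0.40),
    ("capital", 0.30), ("of", 0.95), ("Kenya", 0.90),
]

# Per-token cross-entropy (negative log-likelihood, in nats).
losses = [-math.log(p) for _, p in predictions]

# Training loss: a plain average, so every token gets the same weight 1/N.
weight_per_token = 1 / len(losses)
mean_loss = sum(losses) / len(losses)

# A human-centric benchmark might only score the one "salient" token.
salient_index = [tok for tok, _ in predictions].index("Kenya")
salient_loss = losses[salient_index]
salient_share = salient_loss / sum(losses)

print(f"weight per token: {weight_per_token:.3f}")
print(f"mean loss over all tokens: {mean_loss:.3f} nats")
print(f"loss on the salient token only: {salient_loss:.3f} nats")
print(f"salient token's share of total loss: {salient_share:.1%}")
```

Under these made-up numbers, the token a benchmark would actually check is only one of eleven equally weighted terms, and most of the gradient signal comes from tokens no human would think to test.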
And yet GPT-4 rivals us at what we care about.
It’s comparable to a science fiction author who only cares about writing better stories yet ends up rivalling top scientists in every field as an instrumental side quest. Human-centric benchmarks[1] vastly underestimate the objective intelligence and generality of GPTs.
Lessons from Frans de Waal’s ‘Are We Smart Enough to Know How Smart Animals Are?’ seem relevant here.