I agree that it’s best to think of GPT as a predictor, to expect it to think in ways very unlike humans, and to expect it to become much smarter than a human in the limit.
That said, there’s an important further question that isn’t determined by the loss function alone—does the model do its most useful cognition in order to predict what a human would say, or via predicting what a human would say?
To illustrate, we can imagine asking the model to either (i) predict the outcome of a news story, or (ii) predict a human thinking step by step about what will happen next in a news story. To the extent that (ii) is smarter than (i), it indicates that some significant part of the model’s cognitive ability is causally downstream of “predict what a human would say next,” rather than being causally upstream of it. The model has learned to copy useful cognitive steps performed by humans, which produce correct conclusions when executed by the model for the same reasons they produce correct conclusions when executed by humans.
(In fact (i) is smarter than (ii) in some ways, because the model has a lot of tacit knowledge about news stories that humans lack, but (ii) is smarter than (i) in other ways, and in general having models imitate human cognitive steps seems like the most useful way to apply them to most economically relevant tasks.)
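To make the comparison concrete, here is a minimal sketch of the two prompting setups. The story text and prompt wording are placeholders of my own; nothing here depends on a particular model or API.

```python
# A sketch of the two prompting setups, purely to make (i) and (ii) concrete.
# The story text and prompt wording are illustrative placeholders.

story = "..."  # some news story, cut off before the outcome is known

# (i) Ask the model to predict the outcome directly.
prompt_direct = story + "\n\nWhat actually happened next:\n"

# (ii) Ask the model to predict a human reasoning step by step about it.
prompt_via_human = (
    story
    + "\n\nAn analyst reasons step by step about what will happen next:\n"
)

# If completions of prompt_via_human end up systematically more accurate than
# completions of prompt_direct, that suggests a lot of the model's useful
# cognition runs *via* predicting human reasoning steps, rather than purely
# *in order to* predict the next token.
```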
Of course in the limit it’s overdetermined that the model will be smart in order to predict what a human would say, and will have no use for following along with the human’s steps except insofar as this gives it (a tiny bit of) additional compute. But I would expect AI to be transformative well before approaching that limit, so this will remain an empirical question.
“GPT-4 is still not as smart as a human in many ways, but it’s naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.”
I don’t think this is totally meaningful. Getting perfect loss on the task of being GPT-4 is obviously much harder than being a human, and so gradient descent on its loss could produce wildly superhuman systems. But:
Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn’t super meaningful.
To the extent that GPT-4 and humans are both optimizing a loss function, getting a nearly perfect genetic fitness is probably harder than getting a nearly perfect log loss.
Getting a GPT-4 level loss on GPT-4′s task is probably much easier than getting a human-level loss on the human task.
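For concreteness, here is a toy illustration of what the log loss discussed above looks like per token. The numbers are made up, not any real model’s outputs.

```python
import math

# Toy illustration of autoregressive log loss: the average negative log
# probability the model assigned to the tokens that actually occurred.
# The numbers below are made up, not any real model's outputs.

def log_loss(token_probs):
    """Average negative log probability over the observed next tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

weaker_model   = [0.10, 0.30, 0.05, 0.20]  # probabilities given to the true tokens
stronger_model = [0.40, 0.60, 0.25, 0.50]

print(log_loss(weaker_model))    # higher loss
print(log_loss(stronger_model))  # lower loss

# Better models keep pushing this number down, but it is bounded below by the
# irreducible unpredictability of the text itself, and current models are
# nowhere near that floor.
```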
Smaller notes:
The conditional GAN task (given some text, complete it in a way that looks human-like) is even harder than the autoregressive task, so I’m not sure I’d stick with that analogy.
I think that >50% of the time when people talk about “imitation” they mean autoregressive models; GANs and IRL are still less common than behavioral cloning. (Not sure about that.)
I agree that “figure out who to simulate, then simulate them” is probably a bad description of the cognition GPT does, even if a lot of its cognitive ability comes from copying human cognitive processes.
For what it’s worth, I think Eliezer’s post was primarily directed at people who have spent a lot less time thinking about this stuff than you, and that this sentence:
“Getting perfect loss on the task of being GPT-4 is obviously much harder than being a human, and so gradient descent on its loss could produce wildly superhuman systems.”
is the whole point of his post, and is not at all obvious even to very smart people who haven’t spent much time thinking about the problem. I’ve had a few conversations with e.g. skilled Google engineers who have said things like “even if we make really huge neural nets with lots of parameters, they have to cap out at human-level intelligence, since the internet itself is human-level intelligence,” and then I bring up the hash/plaintext example (which I doubt I’d have thought of if I hadn’t already seen Eliezer point it out) and they’re like “oh, you’re right… huh.”
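To make the hash/plaintext example concrete, here is a rough reconstruction of the kind of document it points at. This is my own construction of the setup, not necessarily Eliezer’s exact framing; the details aren’t essential.

```python
import hashlib
import secrets

# A rough reconstruction of the kind of document the hash/plaintext example
# points at (my construction; details aren't essential). Think of pages where
# someone posts a SHA-256 hash to commit to a prediction, then reveals the
# plaintext later.

plaintext = secrets.token_hex(16)  # the secret, revealed later
commitment = hashlib.sha256(plaintext.encode()).hexdigest()

document = (
    f"Commitment (SHA-256): {commitment}\n"
    "... [months later] ...\n"
    f"Revealed plaintext: {plaintext}\n"
)
print(document)

# Every token of this document was produced by ordinary humans doing ordinary
# human things. But a predictor that could reliably fill in the "Revealed
# plaintext" line from the hash alone would have to invert SHA-256, which is
# far beyond human ability. Predicting human-generated text perfectly can
# demand far more than human-level cognition.
```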
I think the point Eliezer’s making in this post is just a very well-fleshed-out version of the hash/plaintext point (and making it clear that the basic concept isn’t just confined to that one narrow example), and is actually pretty significant and non-obvious; it only feels obvious because it has that nice property of simple, good ideas of being “impossible to unsee” once you’ve seen it.
“Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn’t super meaningful.”
I don’t understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn’t near the upper limit?
As an analogy: consider a variant of rock paper scissors where you get to see your opponent’s move in advance—but it’s encrypted with RSA. In some sense this game is much harder than proving Fermat’s last theorem, since playing optimally requires breaking the encryption scheme. But if you train a policy and find that it wins 33% of the time at encrypted rock paper scissors, it’s not super meaningful or interesting to say that the task is super hard, and in the relevant intuitive sense it’s an easier task than proving Fermat’s last theorem.
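To make the analogy concrete, here is a toy simulation. The “encryption” is replaced by an opaque random token the policy can’t use; the point is only that a policy which ignores it still wins about a third of the time.

```python
import random

# Toy simulation of the encrypted rock-paper-scissors analogy. The "encrypted"
# observation is replaced by an opaque random token the policy can't use
# (standing in for an RSA ciphertext of the opponent's move). A policy that
# ignores it and plays uniformly at random still wins about a third of the
# time, even though playing optimally would require breaking the encryption.

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_round(rng):
    opponent_move = rng.choice(MOVES)
    _ciphertext = rng.getrandbits(64)        # stand-in for Enc(opponent_move)
    my_move = rng.choice(MOVES)              # the policy ignores the ciphertext
    return BEATS[my_move] == opponent_move   # True if we win this round

rng = random.Random(0)
rounds = 100_000
wins = sum(play_round(rng) for _ in range(rounds))
print(wins / rounds)  # ~0.33: nowhere near optimal, yet a perfectly sensible policy
```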