In their most straightforward form (“foundation models”), language models are a technology that naturally scales to something in the vicinity of human-level performance (because the training objective is to emulate human outputs), not one that naturally shoots far past human level
i.e. it is a mistake in principle to imagine projecting the GPT-2 → GPT-3 → GPT-4 capability trend out into the far-superhuman range
Surprised to see no pushback on this yet. I don’t think it’s true; I’ve come around to thinking that Eliezer is basically right that the limit of next-token prediction on human-generated text is superintelligence. Now, how that latent ability manifests is a hard question, but it’s there, to be used by the model for its own ends or elicited by humans for ours, or both.
Also worth adding (guessing this point has been made before) that non-human-generated text (e.g. regression outputs from a program) is in the training data, so merely predicting that text gets you superhuman performance in some domains.
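To make that concrete, here’s a toy sketch in Python (purely illustrative; the regression setup and numbers are invented, not drawn from any actual training corpus). The point is that program output like the final printed line can only be predicted token-for-token by, in effect, redoing the computation:

```python
# Hypothetical illustration: "program-generated text in the training corpus".
# To predict the final tokens of a line like the one printed below, a
# next-token predictor must in effect reproduce the computation itself,
# not just imitate a human writer.

import numpy as np

rng = np.random.default_rng(0)

# Fit an ordinary least-squares regression on random data and print the
# coefficients, mimicking the kind of program output that ends up in web text.
X = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.25])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The "document" a language model might see during training:
print(f"OLS coefficients: {w[0]:.4f}, {w[1]:.4f}, {w[2]:.4f}")
# Getting the digits after "OLS coefficients:" right requires (approximately)
# doing the least-squares fit, which is the sense in which low loss on such
# text can imply better-than-unaided-human performance in that domain.
```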
Sorry, I think you’re reading me as saying something like “language models naively scaled up don’t do anything superhuman”? Whereas I’m trying to say something more like “language models naively scaled up break the trend line in the vicinity of human level, because the basic mechanism for improved capabilities that they had been using stops working, so they need to rely on other mechanisms (which probably move a bit slower)”.
If you disagree with that unpacking, I’d be interested to hear why. If you agree with the unpacking but think I’ve done a bad job summarizing it, I’d be interested in any alternate wording you want to propose.
I do discuss the stuff you’re talking about in several places in the doc, especially Sections 2.3, 4.1, and 6.2.