This question is more about ASI, but here goes: if LLMs are trained on human writings, what is the current understanding of how an ASI/AGI could get smarter than humans? Wouldn’t it just asymptotically approach human intelligence levels? A model seems to get smarter by learning more and more from the training set, but the training set itself only knows so much.

I think it’s because predicting exactly what someone will say is harder than merely sounding like them. Eliezer Yudkowsky wrote about it here: https://www.lesswrong.com/posts/nH4c3Q9t9F3nJ7y8W/gpts-are-predictors-not-imitators
I should have clarified that that LW post is the one my question was based on, so here is a more fleshed-out version: because GPTs are trained on human data, and given that humans make mistakes and don’t have a complete understanding of most situations, it seems highly implausible to me that enough information can be extracted from text and images, given the imprecision of language, to make valid predictions about highly complex or abstract topics.
Yudkowsky says of GPT-4:
It is being asked to model what you were thinking—the thoughts in your mind whose shadow is your text output—so as to assign as much probability as possible to your true next word.
How do we know it will be able to extract enough information from the shadow to reconstruct the thoughts? Text carries comparatively little information with which to characterize such a complex system. It reminds me of inverse problems like inverse scattering or CT reconstruction, where the underlying structure is very complex and all you get is a low-dimensional projection of it, which may or may not be enough to recover the original structure. CT scans can find tumors, but they can’t tell you which gene mutated, because they simply don’t have the resolution.
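As a toy illustration of that underdetermination worry, here is a minimal sketch (plain Python, nothing specific to LLMs or to Yudkowsky’s argument): a lossy projection maps many distinct hidden states onto the same observation, so the observation alone cannot identify the state that produced it.

```python
import itertools
from collections import defaultdict

def project(state):
    """A lossy 'measurement': all we observe is the sum of the hidden components."""
    return sum(state)

# Enumerate every hidden state with three components in 0..9.
states = itertools.product(range(10), repeat=3)

# Group the hidden states by the observation they produce.
preimages = defaultdict(list)
for s in states:
    preimages[project(s)].append(s)

# The single observation "13" is consistent with many different hidden states.
print(len(preimages[13]), "distinct hidden states all project to 13")
print(preimages[13][:5], "...")
```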
Yudkowsky gives this as an example in the article:
“Imagine a Mind of a level where it can hear you say ‘morvelkainen blaambla ringa’, and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is ‘mongo’.”
I understand that making that kind of prediction would be evidence of extreme intelligence, but I don’t see how the path to that kind of ability could be built solely from the training data.
Going further, because the training data comes from humans (who, as mentioned, make mistakes and have an incomplete understanding of the world), it seems highly unlikely that the model would be able to produce new concepts in something as exact as, for example, math and science, if its understanding of causality is based solely on predicting something as unpredictable as human behavior, even if it is really good at that. Why should we assume that a model, even a really big one, would converge on understanding the laws of physics well enough to make new discoveries from human data alone? Is the idea even that ASI will come from LLMs? If so, I am very curious to hear the theory for how that would develop, which I am evidently not grasping here.
Yep, that’s a fair argument, and I don’t have a knockdown case that predicting human-generated data will result in great abilities.
One bit of evidence is that people used to be really pessimistic that scaling up imitation would do anything interesting. This paper was a popular knockdown arguing that language models could never understand the physical world, but most of the substantive predictions of that line of thinking have been wrong, and those people have largely retreated to semantic debates about the meaning of “understanding”. Scaling has gone further than many people expected, and it could continue.
Another argument is that pretraining on human data has a ceiling, but RL fine-tuning on downstream objectives becomes much more efficient after pretraining and could allow AI to surpass the human level.
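As a very rough illustration of that intuition (this is just a toy, with best-of-n selection standing in for actual RL fine-tuning and an idealized reward model that is assumed to score true quality directly): even a model that only matches average human quality can be pushed above it once there is a reward to optimize against.

```python
import random

random.seed(0)

def imitator_sample():
    """Quality of one sampled answer; 0.0 is meant to be 'average human' quality."""
    return random.gauss(0.0, 1.0)

def best_of_n(n):
    """Draw n candidates and keep the one the (idealized) reward model scores highest."""
    return max(imitator_sample() for _ in range(n))

trials = 10_000
plain = sum(imitator_sample() for _ in range(trials)) / trials
tuned = sum(best_of_n(8) for _ in range(trials)) / trials

print(f"average quality, plain imitation:     {plain:+.2f}")
print(f"average quality, best-of-8 selection: {tuned:+.2f}")  # clearly higher
```

The real training procedures are different, but the point is the same: the second stage optimizes something other than “match the human distribution”.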
But again, there are plenty of people who think GPT will not scale to superintelligence (Eliezer Yudkowsky, Gary Marcus, Yann LeCun), and it’s hard to predict these things in advance.
In theory, the best way to be a really good next-word predictor is to model humans. Humans, internally, model the world they live in, so a sufficiently powerful model of humans would likely also model the world those humans live in. Further, humans reason, so a really good next-word predictor would be able to predict the next word more accurately by reasoning itself. By the same logic, developing other cognitive abilities, logic and so on, is an optimization win.
All of this lets you predict the correct next word with fewer “neurons”, because it takes fewer neurons to learn how to do logical deduction and memorize some premises than it takes to memorize every possible output that some future prompt may require.
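A toy way to see that compression argument, with three-digit addition standing in for “some future prompt” (the numbers here are purely illustrative):

```python
# Memorization: a lookup table needs one entry per possible prompt,
# e.g. every "a + b = ?" with three-digit operands.
table = {(a, b): a + b for a in range(1000) for b in range(1000)}
print(f"entries memorized: {len(table):,}")  # 1,000,000

# Rule-learning: one short procedure (plus the "premises" of arithmetic)
# covers all of those cases, and generalizes past them.
def add(a: int, b: int) -> int:
    return a + b

print(add(123, 456))        # 579, no stored entry needed
print(add(123456, 654321))  # 777777, outside the table's range entirely
```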
The fact that we train on human data just means that we are training the AI to reason and think critically in the same way we do. Once it has that ability, we can then “scale it up”, which is something humans really struggle with.
An AI that could perfectly predict human text would have a lot of capabilities that humans don’t have. (Note that it is impossible for any AI to perfectly predict human text, but an imperfect text-predictor may have weaker versions of many of the capabilities a perfect predictor would have.) Some examples include:
Ability to predict future events: Lots of text on the internet describes something that happened in the real world. Examples might include the outcome of some sports game, whether a company’s stock goes up or down and by how much, or the result of some study or scientific research. Being able to predict such text would require the AI to have the ability to make strong predictions about complicated things.
Reversibility: There are many tasks that are easy to do in one direction but much harder to do in reverse. Examples include factoring a number (it’s easy to multiply two primes p and q to get a number N=pq, but much harder to figure out p and q when given N), and hash functions (it’s easy to calculate the hash of an input, but almost impossible to recover the original input from the hash); see the short sketch after these examples. An AI trained to do the reverse, more difficult direction of such a task would be incentivized to do things more difficult than humans can do.
Speed: Lots of text on the internet comes from very long and painstaking effort. If an AI can output the same thing a human can, but 100x faster, that is still a significant capability increase over humans.
Volume of knowledge: Available human text spans a wider breadth of subject areas than any single person has expertise in. An AI trained on this text could have a broader set of knowledge than any human—and in fact by some definition this may already be the case with GPT-4. To the extent that making good decisions is helped by having internalized the right information, advanced models may be able to make good decisions that humans are not able to make themselves.
Extrapolation: Modern LLMs can extrapolate to some degree from information provided in their training set. In some domains, this can result in LLMs performing tasks more complicated than any they had previously seen in the training data. It’s possible that, with the appropriate prompt, these models could extrapolate to generate text that would be written by slightly smarter humans.
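Here is the small sketch of the forward/reverse asymmetry promised under the Reversibility example, using only the standard library and deliberately tiny numbers so the brute-force steps finish instantly:

```python
import hashlib

# Forward direction: multiplying two primes is a single cheap operation.
p, q = 10_007, 10_009
N = p * q

# Reverse direction: recovering the factors by trial division takes ~p steps.
def factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return n, 1

print(factor(N))  # (10007, 10009), after roughly ten thousand divisions

# Hashing: the forward direction is one call...
digest = hashlib.sha256(b"1729").hexdigest()

# ...while the reverse direction is brute-force guessing, feasible here only
# because we already know the preimage is a small decimal number.
for guess in range(10_000):
    if hashlib.sha256(str(guess).encode()).hexdigest() == digest:
        print("recovered preimage:", guess)
        break
```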
In addition to the capabilities above, modern LLM training typically consists of two steps: a standard next-word-prediction pretraining step, followed by a reinforcement-learning-based second step. Models trained with reinforcement learning can in principle become even better than models trained with next-token prediction alone.
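To make that two-step picture concrete, here is a cartoon version at toy scale: a count-based bigram model stands in for next-word pretraining, and a crude policy-gradient-style update stands in for the RL step. None of this is how production systems are actually implemented; it only shows how the second stage can push the model past what imitation alone produces.

```python
import math
import random
from collections import defaultdict

random.seed(0)

corpus = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(corpus))

# --- Step 1: "pretraining" = next-word (bigram) counts learned from the human text.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

# Keep the model as logits so the second step can adjust it smoothly.
logits = {w: {v: math.log(counts[w][v] + 0.1) for v in vocab} for w in vocab}

def sample_next(word):
    """Sample the next word from the model's softmax over logits."""
    zs = logits[word]
    m = max(zs.values())
    weights = {v: math.exp(z - m) for v, z in zs.items()}
    r = random.random() * sum(weights.values())
    acc = 0.0
    for v, wgt in weights.items():
        acc += wgt
        if acc >= r:
            return v
    return v

# --- Step 2: "RL fine-tuning" against a reward the human corpus never optimized:
# reward continuations that mention animals.
def reward(word):
    return 1.0 if word in {"cat", "rat"} else 0.0

def animal_rate(samples=1000):
    return sum(reward(sample_next(random.choice(vocab))) for _ in range(samples)) / samples

before = animal_rate()
for _ in range(2000):
    prev = random.choice(vocab)
    nxt = sample_next(prev)
    # Crude policy-gradient-style update (no baseline): boost the logit of a
    # sampled token in proportion to the reward it earned.
    logits[prev][nxt] += 0.5 * reward(nxt)
after = animal_rate()

print(f"rewarded continuations before fine-tuning: {before:.2f}")
print(f"rewarded continuations after fine-tuning:  {after:.2f}")
```

The pretrained model just matches the human corpus; the fine-tuned one ends up producing the rewarded continuations far more often than the corpus itself does, which is the sense in which the second stage is not bounded by imitation.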