An AI that could perfectly predict human text would have a lot of capabilities that humans don’t have. (Note that it is impossible for any AI to perfectly predict human text, but an imperfect text-predictor may have weaker versions of many of the capabilities a perfect predictor would have.) Some examples include:
Ability to predict future events: Lots of text on the internet describes something that happened in the real world. Examples might include the outcome of some sports game, whether a company’s stock goes up or down and by how much, or the result of some study or scientific research. Being able to predict such text would require the AI to have the ability to make strong predictions about complicated things.
Reversibility: There are many tasks that are easy to do in one direction but much harder to do in the reverse direction. Examples include factoring a number (it’s easier to multiply two primes p and q to get a number N=pq than to figure out p and q when given N) and hash functions (it’s easy to calculate the hash of a number, but almost impossible to calculate the original number from the hash); see the sketch after this list. An AI trained to do the reverse, more difficult direction of such a task would be incentivized to do things more difficult than humans could do.
Speed: Lots of text on the internet comes from very long and painstaking effort. If an AI can output the same thing a human can, but 100x faster, that is still a significant capability increase over humans.
Volume of knowledge: Available human text spans a wider breadth of subject areas than any single person has expertise in. An AI trained on this text could have a broader set of knowledge than any human—and in fact by some definition this may already be the case with GPT-4. To the extent that making good decisions is helped by having internalized the right information, advanced models may be able to make good decisions that humans are not able to make themselves.
Extrapolation: Modern LLMs can extrapolate to some degree from information provided in their training set. In some domains, this can result in LLMs performing tasks more complicated than anything they had previously seen in the training data. It’s possible that, with the appropriate prompt, these models would be able to extrapolate to generate the kind of text that slightly smarter humans would produce.
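To make the reversibility point concrete, here is a minimal Python sketch (standard library only, with illustrative numbers chosen for this example) of the easy forward directions of both tasks; the reverse directions have no comparably cheap counterpart.

```python
import hashlib

# Forward direction: multiplying two primes is trivial.
p, q = 1_000_003, 1_000_033        # two small primes, chosen purely for illustration
N = p * q                          # computing N = p*q is instantaneous

# Reverse direction: recovering p and q from N alone requires factoring.
# Even naive trial division is far more work than the multiplication,
# and for cryptographically sized N no efficient method is known.
def naive_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None

print(N, naive_factor(N))

# Forward direction: hashing a value is trivial.
digest = hashlib.sha256(b"some message").hexdigest()
print(digest)
# Reverse direction: there is no function that recovers the original input
# from `digest`; the best known approach is brute-force guessing.
```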
In addition to this, modern LLM training typically consists of two steps: a standard next-word-prediction pretraining step, followed by a second step based on reinforcement learning. Models trained with reinforcement learning can in principle become better than models trained only with next-token prediction.
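For a concrete picture of what the first of those two steps optimizes, here is a minimal sketch of the next-token-prediction objective, written in PyTorch against a hypothetical `model` that maps token ids to per-position logits. This is a generic illustration of the loss, not any particular lab’s training code.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Cross-entropy between the model's predicted next token and the actual next token.

    `model` is assumed to map token ids of shape (batch, seq_len) to logits of
    shape (batch, seq_len, vocab_size); the architecture details don't matter here.
    """
    inputs = tokens[:, :-1]      # every token except the last
    targets = tokens[:, 1:]      # every token except the first (shifted by one)
    logits = model(inputs)       # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

# The second, reinforcement-learning step instead scores the model's sampled
# outputs with a reward signal (e.g. a learned reward model) and updates toward
# higher reward, which is what can push it beyond pure imitation of the training text.
```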