The math example you cited doesn’t seem to be an example of an LLM coming up with a novel idea in math. It just sounds like mathematicians are using an LLM as a search tool. I agree that LLMs are really useful for search, but this is a far cry from an LLM actually coming up with a novel idea itself.
The point you raise about LLMs doing in-context learning is ably discussed in the video I embedded in the post.
“novel idea” means almost nothing to me. A math proof is simply a->b. It doesn’t matter how you figure out a->b. If you can figure it out by reading 16 million papers and clicking them together, that still counts. There are many ways to cook an egg.
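(To make the “clicking a->b results together” picture concrete, here is a minimal Lean sketch, my own illustration rather than anything from the thread: two implications, perhaps pulled from two different papers, compose into a new one, and the proof doesn’t care how the pieces were found.)

```lean
-- Given a result a → b from one source and b → c from another,
-- "clicking them together" yields a → c by simple composition.
example (a b c : Prop) (hab : a → b) (hbc : b → c) : a → c :=
  fun ha => hbc (hab ha)
```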
I don’t think the LLMs in this case are clicking them together. Rather, it seems like the LLMs are being used as a search tool for human mathematicians who are clicking them together.
If you could give the LLM a prompt along the lines of, “Read the mathematics literature and come up with some new proofs based on that,” and it could do it, then I would count that as an LLM successfully coming up with a proof, and with a novel idea.
Based on the tweets you linked to, what seems to be happening is that the LLMs are being used as a search tool like Google Scholar, and it’s the mathematicians coming up with the proofs, not the search engine.
Sure, that’s a fair point. I guess I hope that, after this thread, you would feel at least a little pushed in the direction of thinking that AIs need not take a route similar to the human one in order to automate large amounts of our current work.
LLMs may have some niches in which they enhance productivity, such as serving as an advanced search engine or text-search tool for mathematicians. This is quite different from AGI, and quite different from either:
a) LLMs having a broad impact on productivity across the economy (which would not necessarily amount to AGI but which would be economically significant)
or
b) LLMs fully automating jobs by acting autonomously and doing hierarchical planning over very long time horizons (which is the sort of thing AGI would have to be capable of doing to meet the conventional definition of AGI).
If you want to argue that LLMs will get from their current state, in which they can’t do (a) or (b), to a state in which they can, then I think you have to address my arguments in the post about LLMs’ apparent fundamental weaknesses (e.g. the Tower of Hanoi example seems stark to me) and what I said about the obstacles to scaling LLMs further (e.g. Epoch AI estimates that data may run out around 2028).
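(For readers unfamiliar with why the Tower of Hanoi example is cited as stark: the puzzle has a short, well-known recursive solution, so failures on it look like a breakdown in executing many steps reliably rather than missing knowledge. The sketch below is my own illustration of that classic algorithm, not code from the post.)

```python
def hanoi(n, source, target, spare, moves):
    """Append the optimal move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # move n-1 disks out of the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack the n-1 disks on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 2**10 - 1 = 1023 moves; the sequence grows exponentially in n
```

The algorithm itself is a few lines; the difficulty for an LLM is producing the exponentially long move sequence without a single slip.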