https://x.com/slow_developer/status/1979157947529023997
I would bet a lot of money you are going to see exactly what I described for math in the next two years. The capabilities literally just exploded. It took us like 20 years to start using the lightbulb, but you are expecting results from products that came out in the last few weeks/months.
I can also confidently say, because I am working on a project with doctors, that the work I described for clinical medicine is being tested and happening right now. Its exact usefulness remains to be seen, but people are trying exactly what I described; there will be some lag as people learn how to use the tools best and then distribute their results.
Again, I don’t think most of this stuff was particularly useful with the tools available to us >1 year ago.
>Would an AI system that can’t learn new ideas from one example or a few examples count as AGI?
https://www.anthropic.com/news/skills
You are going to need to be a lot more precise in your definitions, imo, otherwise we are going to talk past each other.
The math example you cited doesn’t seem to be an example of an LLM coming up with a novel idea in math. It just sounds like mathematicians are using an LLM as a search tool. I agree that LLMs are really useful for search, but this is a far cry from an LLM actually coming up with a novel idea itself.
The point you raise about LLMs doing in-context learning is ably discussed in the video I embedded in the post.
“novel idea” means almost nothing to me. A math proof is simply a->b. It doesn’t matter how you figure out a->b. If you can figure it out by reading 16 million papers and clicking them together, that still counts. There are many ways to cook an egg.
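To make the “clicking them together” point concrete, here is a minimal Lean 4 sketch (purely my illustration, not anything an LLM produced): if the literature already contains a → b and b → c, composing them is itself a proof of a new statement a → c.

```lean
-- Purely illustrative: composing two known implications already in the
-- "literature" (h₁ and h₂) yields a proof of a new statement, a → c.
theorem clicked_together {a b c : Prop} (h₁ : a → b) (h₂ : b → c) : a → c :=
  fun ha => h₂ (h₁ ha)
```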
I don’t think the LLMs in this case are clicking them together. Rather, it seems like the LLMs are being used as a search tool for human mathematicians who are clicking them together.
If you could give the LLM a prompt along the lines of, “Read the mathematics literature and come up with some new proofs based on that,” and it could do it, then I would count that as an LLM successfully coming up with a proof, and with a novel idea.
Based on the tweets you linked to, what seems to be happening is that the LLMs are being used as a search tool like Google Scholar, and it’s the mathematicians coming up with the proofs, not the search engine.
Sure, that’s a fair point. I hope, though, that after this thread you feel at least a little pushed in the direction of thinking that AIs need not take a route similar to humans’ in order to automate large amounts of our current work.
LLMs may have some niches in which they enhance productivity, such as by serving as an advanced search engine or text search tool for mathematicians. This is quite different from AGI and quite different from either:
a) LLMs having a broad impact on productivity across the economy (which would not necessarily amount to AGI but which would be economically significant)
or
b) LLMs fully automating jobs by acting autonomously and doing hierarchical planning over very long time horizons (which is the sort of thing AGI would have to be capable of doing to meet the conventional definition of AGI).
If you want to argue that LLMs will get from their current state, where they can’t do (a) or (b), to a state where they will be able to do (a) and/or (b), then I think you have to address my arguments in the post about LLMs’ apparent fundamental weaknesses (e.g. the Tower of Hanoi example seems stark to me) and what I said about the obstacles to scaling LLMs further (e.g. Epoch AI estimates that data may run out around 2028).
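(For reference, and purely as my own illustration: part of why the Tower of Hanoi example reads as stark is that the puzzle itself is algorithmically trivial; the entire move sequence falls out of a few lines of deterministic recursion.)

```python
# Minimal sketch of the textbook recursive Tower of Hanoi solution.
# Names and the choice of 8 disks are illustrative only.
def hanoi(n, source, target, spare, moves):
    """Append to `moves` the sequence that shifts n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 disks on the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top of it

moves = []
hanoi(8, "A", "C", "B", moves)
print(len(moves))  # 2**8 - 1 = 255 moves, generated deterministically
```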