For example, how can AI automate the labour of scientists, philosophers, and journalists if it can’t understand novel ideas?
The bar is much lower because they are 100x faster and 1000x cheaper than me. They open up a bunch of brute-forceable techniques, in the same way that opening https://projecteuler.net/ lets you reproduce many of Euler’s discoveries with little math knowledge, just basic Python and for loops.
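To make the brute-force point concrete, here is a toy Python sketch of that style of move, using the first Project Euler problem (sum the multiples of 3 or 5 below 1000); a plain loop handles it with no number theory:

```python
# Toy sketch: Project Euler problem 1 -- sum every multiple of 3 or 5 below 1000.
# No mathematical insight needed; a for loop just checks each candidate.
total = 0
for n in range(1000):
    if n % 3 == 0 or n % 5 == 0:
        total += n
print(total)  # 233168
```

The domain pipelines below are gesturing at the same move, just with a much bigger pile of pieces to loop over.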
Math → re-read every arXiv paper → translate them all into Lean → aggregate every open, well-specified math problem → use the database of all previous learnings to see if you can chain chunks of previous problems together into a solution (see the toy Lean sketch after these pipelines).
Clinical medicine → re-read every RCT ever done and comprehensively rank intervention effectiveness by disease → find cost data where available and rank the cost/QALY across the whole disease/intervention space (see the toy ranking sketch after these pipelines).
Econometrics → aggregate every natural experiment and instrumental variable ever used in an econometrics paper → think about other use cases for these tools → search whether those other use cases have available data → reapply the general theory of the original paper to the new data (see the toy 2SLS sketch below).
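As a toy illustration of the “chain chunks of previous problems together” step in the math pipeline: at its smallest, the move is composing lemmas that already exist into a statement that doesn’t. The sketch below is plain Lean 4 (no Mathlib); the theorem and its name are invented for illustration.

```lean
-- Two existing "chunks" from the library -- commutativity and associativity of
-- Nat addition -- chained into a statement that is not itself in the library.
theorem chain_example (a b c : Nat) : (a + b) + c = b + (a + c) := by
  rw [Nat.add_comm a b, Nat.add_assoc]
```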
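A toy sketch of the clinical-medicine ranking step, with numbers made up purely for illustration: once effectiveness and cost are aggregated, the ranking itself is just a sort on cost per QALY.

```python
# Hypothetical (disease, intervention, cost per patient in $, QALYs gained) rows.
interventions = [
    ("disease A", "drug X",    1200.0, 0.8),
    ("disease B", "surgery Y", 9000.0, 2.5),
    ("disease C", "program Z",  300.0, 0.1),
]

# Rank by cost-effectiveness: dollars per QALY gained, cheapest first.
for disease, treatment, cost, qalys in sorted(interventions, key=lambda r: r[2] / r[3]):
    print(f"{disease:10s} {treatment:10s} ${cost / qalys:,.0f} per QALY")
```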
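A toy sketch of the econometrics step, on synthetic data invented for illustration: once an instrument has been identified, “reapplying the general theory to the new data” is mechanically just running two-stage least squares again.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                        # instrument (the original paper's natural experiment)
u = rng.normal(size=n)                        # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # outcome; true causal effect of x is 2.0

# Stage 1: fit x on z; Stage 2: regress y on the fitted values.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
beta_2sls = np.polyfit(x_hat, y, 1)[0]        # recovers roughly 2.0
beta_ols = np.polyfit(x, y, 1)[0]             # biased upward by the confounder
print(f"OLS: {beta_ols:.2f}   2SLS: {beta_2sls:.2f}")
```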
I’m not sure if I understand what you’re arguing.
First, why do you think LLMs haven’t already done any of these things?
Second, even if LLMs could do these things, they couldn’t automate all of human labour, and this isn’t an argument that they could. This is an argument that LLMs could do some really useful things, not that they could do all the useful things that human workers do.
Unless, I guess, you think there’s no such thing as an idea so novel that LLMs can’t understand it from existing knowledge; but then this would be equivalent to arguing that LLMs have, or will have, a very high level of data efficiency.
I’m fleshing out Nuno’s point a bit. Basically, AIs have so many systematic advantages in cost, speed, and seamless integration into the digital world that they can afford to be worse than humans at a variety of things and still automate (most/all/some) work, just as a plane doesn’t need to flap its wings. Of course I wasn’t saying I solved automating the economy. I’m just showing you ways in which something lacking some top-level human common sense/IQ/whatever could still replace human workers.
FWIW I basically disagree with every point you made in the summary. This mostly just comes from using these tools every day and getting utility out of them + seeing how fast they are improving + seeing how many different routes there are to improvement (I was quite skeptical a year ago, not so anymore). But I wanted to keep the argument contained and isolate a point of disagreement.
I want to try to separate out a few different ideas because I worry they might get confused together.
Are actual existing LLMs good at discovering novel ideas? No. They haven’t discovered anything useful in any domain yet. They haven’t come up with any interesting new idea in science, math, economics, medicine, or anything.
Could LLMs eventually discover novel ideas in the way you described? I don’t think so. I think you’re saying you think this will happen. Okay, so, why? What are LLMs missing now that they will have in, say, 5 years that will mean they make the jump from zero novel ideas to lots of novel ideas? Is it just scale?
Would an AI system that can’t learn new ideas from one example or a few examples count as AGI? No, I don’t think so.
Would an AI system that can’t learn new ideas from one example or a few examples be able to automate all human labour? No, I don’t think so because this kind of learning is part of many different jobs, such as scientist, philosopher, and journalist, and also taxi driver (per the above point about autonomous vehicles).
I do use ChatGPT every day and find it to be a useful tool for what I use it for, which is mainly a form of search engine. I used ChatGPT when it first launched, as well as GPT-4 when it first launched, and have been following the progress.
Everything is relative to expectations. If I’m judging ChatGPT based on the expectations of a typical consumer tech product, or even a cool AI science experiment, then I do find the progress impressive. On the other hand, if I’m judging ChatGPT as a potential precursor to AGI, I don’t find the progress particularly impressive.
I guess I don’t see the potential routes to improvement that you see. The ones that I’ve seen discussed don’t strike me as promising.
https://x.com/slow_developer/status/1979157947529023997
I would bet a lot of money you are going to see exactly what I described for math in the next two years. The capabilities literally just exploded. It took us something like 20 years to start using the lightbulb, but you are expecting results from products that came out in the last few weeks or months.
I can also confidently say, because I am working on a project with doctors, that the work I described for clinical medicine is being tested and is happening right now. Its exact usefulness remains to be seen, but people are trying exactly what I described; there will be some lag as people learn how to use the tools best and then distribute their results.
Again, I don’t think most of this stuff was particularly useful with the tools that were available more than a year ago.
>Would an AI system that can’t learn new ideas from one example or a few examples count as AGI?
https://www.anthropic.com/news/skills
You are going to need to be a lot more precise in your definitions, IMO; otherwise we are going to talk past each other.
The math example you cited doesn’t seem to be an example of an LLM coming up with a novel idea in math. It just sounds like mathematicians are using an LLM as a search tool. I agree that LLMs are really useful for search, but this is a far cry from an LLM actually coming up with a novel idea itself.
The point you raise about LLMs doing in-context learning is ably discussed in the video I embedded in the post.
“Novel idea” means almost nothing to me. A math proof is simply a → b. It doesn’t matter how you figure out a → b. If you can figure it out by reading 16 million papers and clicking them together, that still counts. There are many ways to cook an egg.
I don’t think the LLMs in this case are clicking them together. Rather, it seems like the LLMs are being used as a search tool for human mathematicians who are clicking them together.
If you could give the LLM a prompt along the lines of, “Read the mathematics literature and come up with some new proofs based on that,” and it could do it, then I would count that as an LLM successfully coming up with a proof, and with a novel idea.
Based on the tweets you linked to, what seems to be happening is that the LLMs are being used as a search tool like Google Scholar, and it’s the mathematicians coming up with the proofs, not the search engine.
Sure, that’s a fair point. I guess I hope that, after this thread, you feel at least a little pushed toward the view that AIs need not take a route similar to humans’ in order to automate large amounts of our current work.
LLMs may have some niches in which they enhance productivity, such as by serving as an advanced search engine or text search tool for mathematicians. This is quite different from AGI, and quite different from either:
a) LLMs having a broad impact on productivity across the economy (which would not necessarily amount to AGI but which would be economically significant)
or
b) LLMs fully automating jobs by acting autonomously and doing hierarchical planning over very long time horizons (which is the sort of thing AGI would have to be capable of doing to meet the conventional definition of AGI).
If you want to argue LLMs will get from their current state where they can’t do (a) or (b) to a state where they will be able to do (a) and/or (b), then I think you have to address my arguments in the post about LLMs’ apparent fundamental weaknesses (e.g. the Tower of Hanoi example seems stark to me) and what I said about the obstacles to scaling LLMs further (e.g. Epoch AI estimates that data may run out around 2028).