I don't think the pay for a person solving problems for Waymos would need to be much higher than taxi drivers' pay, but even if it were 1.5x or 2x, one remote helper overseeing many vehicles would still mean an order-of-magnitude reduction in labor cost. Of course there is additional hardware cost for autonomous vehicles, but I think that can be paid off with a reasonable duty cycle. So then, if you grant that the geofencing is less than a 20x safety advantage, I think there is an economic case for the chimeras, as you say.
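To make the arithmetic explicit, here is a back-of-envelope sketch. The pay multiple and the number of vehicles per remote helper are illustrative assumptions, not actual Waymo figures:

```python
# Back-of-envelope labor cost per vehicle: remote helpers vs. one driver per taxi.
# All inputs are illustrative assumptions, not actual Waymo figures.
driver_cost = 1.0          # normalized annual cost of one taxi driver
pay_multiple = 2.0         # remote helper paid 2x a taxi driver (pessimistic case)
vehicles_per_helper = 20   # one remote helper overseeing 20 vehicles at once

cost_per_vehicle = driver_cost * pay_multiple / vehicles_per_helper
reduction = driver_cost / cost_per_vehicle
print(cost_per_vehicle)  # 0.1 of a driver's cost per vehicle
print(reduction)         # 10.0, i.e. an order-of-magnitude labor-cost reduction
```

Even at double pay, the reduction only disappears if each helper can handle just one or two vehicles, which is the crux of the disagreement below.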
This could be true, but you have to account for other elements of the cost structure. For example, can you improve the ratio of engineers to autonomous vehicles from the current ratio of around 1:1 or 1:2 to something like 1:1000 or 1:10,000?
It seems like Waymo is using methods that scale with engineer labour rather than learning methods that scale with data and compute. So, deploying more vehicles in more areas would require commensurately more engineers, and, beyond being too expensive, there simply are not enough of them on Earth.
As for AGI, I do think that some people conflate the 50%-success time horizon on coding tasks reaching ~one month with AGI, when it really only means a superhuman coder (I think something like a 50% success rate is appropriate for long coding tasks, because humans rarely produce code that works on the first try).
It doesn't mean that, either. METR has found that current frontier AI systems are worse in real-world, practical use cases than not using AI at all.
Automatically gradable benchmarks generally don't seem to have much to do with the ability to do tasks in the real world. So, predicting real-world performance from benchmark performance seems to be an invalid inference.
Anecdotally, what I hear from people who say they find AI coding assistants useful is that it saves them the time it would take to copy and paste code from Stack Exchange. I have never heard anything along the lines of "it came up with a new idea" or "it was creative". Yet this is what human-level coding would require.
However, once you get to a superhuman coder, the logic goes that progress in AI research will dramatically accelerate. In AI 2027, I think there was an assumption that it would take something like 90 years at the current rate of progress to get to AGI, but when things accelerate, it ends up happening in a year or two.
The obvious objection to this as it pertains to the initial advent of AGI, rather than to superintelligence, is that this is a chicken-and-egg problem. If you need AGI to do AGI R&D, AGI can't help you develop AGI because you haven't invented it yet. You would need a sub-human AI that can do tasks that speed up AI research and AI engineering. And that seems dubious. Is automating this kind of work not an AGI-level problem?
You're right it's not fully general, but AI systems are much more general than they were 7 years ago.
I don't know if I believe this. LLMs are impressive, but their scope is fairly narrow. They have memorized most of the important digital/digitized text that's available. Their next-token prediction, plus everything layered on top of it (instruction fine-tuning, reinforcement learning from human feedback (RLHF), and chain-of-thought reasoning), results in some impressive behaviours. But they are extremely brittle. They routinely make errors on very basic tasks.
I think of LLMs as another type of AI system that is proficient in one area, comparable to game-playing RL agents. LLMs are good at many text-related tasks (including math and coding, which are also text), but they aren't able to generalize beyond the text-related tasks they have massive amounts of training data for. They don't do well outside of text-related tasks, they don't do well with novelty, they frequently fail to reason properly, and so on.
So, I'm not sure LLMs are all that much more general than previous systems like MuZero and AlphaZero.
Part of generality, or generalization, is that you should see positive transfer learning, i.e., having skills in some domains should improve the AI system's skills in other domains. But it seems like we see the opposite: negative transfer learning. Training an AI on many diverse, heterogeneous tasks from multiple domains seems to hurt performance. That's narrowness, not generality.
LessWrong is also known for having very short timelines, but I think the last survey indicated a median time for AGI of 2040. So I do think that it is a vocal minority in EA and LW that have median timelines before 2030.
That's very interesting, if you remember correctly. I would be interested in seeing survey data both for LessWrong and for EA.
If you need AGI to do AGI R&D, AGI can't help you develop AGI because you haven't invented it yet. You would need a sub-human AI that can do tasks that speed up AI research and AI engineering. And that seems dubious. Is automating this kind of work not an AGI-level problem?
So, I'm not sure LLMs are all that much more general than previous systems like MuZero and AlphaZero.
I don't think you can have it both ways: A superhuman coder (that is actually competent, which you don't think AI assistants are now) is relatively narrow AI, but would accelerate AI progress. A superhuman AI researcher is more general (which would drastically speed up AI progress), but is not fully general. I would argue that LLMs now are more general than the range of AI researcher tasks (though LLMs are currently not good at all of those tasks), because LLMs can competently discuss philosophy, economics, political science, art, history, engineering, science, etc.
For example, can you improve the ratio of engineers to autonomous vehicles from the current ratio of around 1:1 or 1:2 to something like 1:1000 or 1:10,000?
I'm claiming that they could approach an overall staff-to-vehicle ratio of 1:10 if the number of real-time helpers (who don't have to be engineers) and vehicles were dramatically scaled up, and that's enough for profitability.
I would be interested in seeing survey data both for LessWrong and for EA.
The 2023 LessWrong survey was median 2040 for singularity, and 2030 for "By what year do you think AI will be able to do intellectual tasks that expert humans currently do?". The second question was ambiguous, and some people put it in the past. I haven't seen a similar survey result for EAs, but I expect longer timelines than LW.
I don't think you can have it both ways: A superhuman coder (that is actually competent, which you don't think AI assistants are now) is relatively narrow AI, but would accelerate AI progress. A superhuman AI researcher is more general (which would drastically speed up AI progress), but is not fully general.
I definitely disagree with this. Hopefully what I say below will explain why.
I would argue that LLMs now are more general than the range of AI researcher tasks (though LLMs are currently not good at all of those tasks), because LLMs can competently discuss philosophy, economics, political science, art, history, engineering, science, etc.
The "general" in artificial general intelligence doesn't just refer to having a large repertoire of skills. Generality is about the ability to learn: to generalize beyond what a system has seen in its training data. An artificial general intelligence doesn't just need to have new skills; it needs to be able to acquire new skills, including skills that have never existed in history before, by developing them itself, just as humans do.
If a new video game comes out today, I'm able to play that game and develop a new skill that has never existed before.[1] I will probably get the hang of it in a few minutes, with a few attempts. That's general intelligence.
AlphaStar was not able to figure out how to play StarCraft using pure reinforcement learning. It just got stuck using its builders to attack the enemy, rather than figuring out how to use its builders to make buildings that produce units that attack. To figure out the basics of the game, it needed to do imitation learning on a very large dataset of human play. Then, after imitation learning, to get as good as it did, it needed to do an astronomical amount of self-play, around 60,000 years of playing StarCraft. That's not general intelligence. If you need to copy a large dataset of human examples to acquire a skill, and millennia of training on automatically gradable, relatively short time horizon tasks (which often don't exist in the real world), that's something, and it's even something impressive, but it's not general intelligence.
Let's say you wanted to apply this kind of machine learning to AI R&D. The necessary conditions don't apply. You don't have a large dataset of human examples to train on. You don't have automatically gradable, relatively short time horizon tasks with which to do reinforcement learning. And if the tasks require real world feedback and can't be simulated, you certainly don't have 60,000 years.
I like what the AI researcher François Chollet has to say about this topic in this video from 11:45 to 20:00. He draws the distinction between crystallized behaviours and fluid intelligence, between skills and the ability to learn skills. I think this is important. This is really what the whole topic of AGI is about.
Why have LLMs absorbed practically all text on philosophy, economics, political science, art, history, engineering, science, and so on and not come up with a single novel and correct idea of any note in any of these domains? They are not able to generalize enough to do so. They can generalize or interpolate a little bit beyond their training data, but not very much. It's that generalization ability (which is mostly missing in LLMs) that's the holy grail in AI research.
I'm claiming that they could approach an overall staff-to-vehicle ratio of 1:10 if the number of real-time helpers (who don't have to be engineers) and vehicles were dramatically scaled up, and that's enough for profitability.
There are two concepts here. One is remote human assistance, which Waymo calls fleet response. The other is Waymo's approach to the engineering problem. I was saying that I suspect Waymo's approach to the engineering problem doesn't scale. I think it probably relies on engineers doing too much special-casing that doesn't generalize well when a modest amount of novelty is introduced. So, Waymo currently has something like 1,500 engineers to operate in the comparatively small geofenced areas where it currently operates. If it wanted to expand where it drives to a 10x larger area, would its techniques generalize to that larger area, or would it need to hire commensurately more engineers?
I suspect that Waymo faces the problem of trying to do far too much essentially by hand, just adding incremental fix after fix as problems arise. The ideal would be to, instead, apply machine learning techniques that can learn from data and generalize to new scenarios and new driving conditions. Unfortunately, current machine learning techniques do not seem to be up to that task.
The 2023 LessWrong survey was median 2040 for singularity, and 2030 for "By what year do you think AI will be able to do intellectual tasks that expert humans currently do?". The second question was ambiguous, and some people put it in the past.
Okay, well maybe the play testers and the game developers have developed the skill before me, but then at some point one of them had to be the first person in history to ever acquire the skill of playing that game.
So I do think that it is a vocal minority in EA and LW that have median timelines before 2030.
Now we have some data on AGI timelines for EA (though it was only 34 responses, so of course there could be large sample bias): about 15% expect it by 2030 or sooner.
But 47% (16 out of 34) put their median year no later than 2032 and 68% (23 out of 34) put their median year no later than 2035, so how significant a finding this is depends how much you care about those extra 2-5 years, I guess.
Only 12% (4 out of 34) of respondents to the poll put their median year after 2050. So, overall, respondents overwhelmingly see relatively near-term AGI (within 25 years) as at least 50% likely.
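The cited percentages follow directly from the raw counts; a quick sanity check using the 34-response figure given above:

```python
# Sanity-check the poll percentages quoted above (n = 34 responses).
n = 34
no_later_than_2032 = 16
no_later_than_2035 = 23
after_2050 = 4

pct = lambda k: round(100 * k / n)
print(pct(no_later_than_2032))  # 47
print(pct(no_later_than_2035))  # 68
print(pct(after_2050))          # 12
```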
Thank you. Well, that isn't surprising at all.
Quoting myself:
Now we have some data on AGI timelines for EA (though it was only 34 responses, so of course there could be large sample bias): about 15% expect it by 2030 or sooner.
But 47% (16 out of 34) put their median year no later than 2032 and 68% (23 out of 34) put their median year no later than 2035, so how significant a finding this is depends how much you care about those extra 2-5 years, I guess.
Only 12% (4 out of 34) of respondents to the poll put their median year after 2050. So, overall, respondents overwhelmingly see relatively near-term AGI (within 25 years) as at least 50% likely.