In 2012, Holden Karnofsky[1] critiqued MIRI (then SI) by saying “SI appears to neglect the potentially important distinction between ‘tool’ and ‘agent’ AI.” He particularly claimed:
Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work
I understand this to be the first introduction of the “tool versus agent” ontology, and it is a helpful, relatively concrete prediction. Eliezer replied here, making (among others) the following points, summarized:
Tool AI is nontrivial
Tool AI is not obviously the way AGI should or will be developed
Gwern more directly replied by saying:
AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.
11 years later, can we evaluate the accuracy of these predictions?
[1] Some Bayes points go to LW commenter shminux for saying that this Holden kid seems like he’s going places.
I think it’s pretty clear now that the default trajectory of AI development is taking us towards pretty much exactly the sorts of agentic AGI that MIRI et al. were worried about 11 years ago. We are not heading towards a world of AI tools by default; coordination is needed to not build agents.
If in 5 more years the state-of-the-art, most-AGI-ish systems are still basically autocomplete (not capable of taking long series of action-input-action-input steps with humans out of the loop, not capable of online learning), and this has nothing to do with humans coordinating to slow down progress towards agentic AGI, I’ll count myself as having been very wrong and very surprised.
My take is that both were fairly wrong.[1] AI is much more generally intelligent and single systems are useful for many more things than Holden and the tool AI camp would have predicted. But they are also extremely non-agentic.
(To me this is actually rather surprising. I would have expected agency to be necessary to get this much general capability.)
I’m tempted to call it a wash. But rereading Holden’s writing in the linked post, it seems to be pretty narrowly arguing against AI as necessarily being agentic, which seems to have predicted the current world (though note there’s still plenty of time for AIs to get agentic, and I still roughly believe the arguments that they probably will).
This seems unsurprising, tbh. I think everyone now should be pretty uncertain about how AI will go in the future.
This doesn’t sound super true to me, for what it’s worth. The AIs are predicting humans after all, and humans are pretty agentic. Many people had conversations with Sydney where Sydney tried to convince them to somehow not shut her down.
I think there is still an important sense in which there is a surprising amount of generality relative to the overall level of capability, but I wouldn’t particularly call the current genre of AIs “extremely non-agentic”.
I guess it depends on your priors or something. It’s agentic relative to a rock, but relative to an AI which can pass the LSAT, it’s well below my expectations. It seems like ARC Evals had to coax and prod GPT-4 to get it to do things it “should” have been doing with even rudimentary levels of agency.
Relevant, I think, is Gwern’s later writing on Tool AIs:
Personally, I think the distinction is basically irrelevant in terms of safety concerns, mostly for reasons outlined by the second bullet-point above. The danger is in the fact that the “useful answers” you might get out of a Tool AI are those answers which let you steer the future to hit narrow targets (approximately described as “apply optimization power” by Eliezer & such).
If you manage to construct a training regime for something that we’d call a Tool AI, which nevertheless gives us something smart enough that it does better than humans in terms of creating plans which affect reality in specific ways[1], then it approximately doesn’t matter whether or not we give it actuators to act in the world[2]. It has to be aiming at something; whether or not that something is friendly to human interests won’t depend on what name we give the AI.
I’m not sure how to evaluate the predictions themselves. I continue to think that the distinction is basically confused and doesn’t carve reality at the relevant joints, and I think progress to date supports this view.
[1] Which I claim is a reasonable non-technical summary of OpenAI’s plan.
[2] Though note that even if whatever lab develops it doesn’t do so, the internet has helpfully demonstrated that people will do it themselves, and quickly, too.