Does generality pay? GPT-3 can provide preliminary evidence.

One open question in forecasting AI development is whether individual AI systems will tend to remain highly specialized or become more general over time. In a recent episode of the 80,000 Hours Podcast, Ben Garfinkel says:

Another argument for [AI systems staying specialized] is that it seems like specialized systems often outperform general ones, or it’s often easier to make, let’s say two specialized systems, one which performs task A and one that performs task B pretty well, rather than a single system that does both well. And it seems like this is often the case to some extent in AI research today. It’s easier to create individual systems that can play a single Atari game quite well than it is to create one system that plays all Atari games quite well. And it also seems like it’s a general, maybe economic or biological principle or something like that in lots of current cases. There are benefits from specialization as you get more systems that are interacting. So biologically, when you have a larger organism, cells tend to become more specialized, or economically, as you have a more sophisticated complex economy that does more stuff, it tends to be the case that you have greater specialization in terms of worker’s skills.

The question here is whether it’s more cost-effective to develop and use general or narrow AI systems. That is, we can develop either a suite of narrow AI systems that each perform some service at some level of competence, or a single, more general AI system that performs all of those services at the same levels of competence. Whichever option is cheaper to develop and use is the one society is more likely to adopt.
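The decision rule above can be sketched as a simple cost model. This is a minimal illustration under assumptions that are not in the original argument: every system has a one-off development cost plus a constant cost per query, and the general system matches each narrow system's competence on that system's task, so only cost differs. The names `SystemCost`, `total_cost`, and `cheaper_option` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemCost:
    """Hypothetical cost profile of one AI system, in arbitrary units."""
    development: float     # one-off cost to build and train the system
    cost_per_query: float  # marginal cost of serving one request


def total_cost(system: SystemCost, queries: int) -> float:
    """Development cost plus the cost of serving `queries` requests."""
    return system.development + system.cost_per_query * queries


def cheaper_option(narrow_suite: dict[str, SystemCost],
                   general: SystemCost,
                   queries_per_task: dict[str, int]) -> str:
    """Return "general" or "narrow", whichever is cheaper to develop and use.

    Assumes the general system matches each narrow system's competence on its
    task, so only cost differs. Ties are reported as "narrow".
    """
    # Each narrow system is built separately and serves only its own task.
    narrow_total = sum(total_cost(narrow_suite[task], n)
                       for task, n in queries_per_task.items())
    # The general system is built once and serves every task's queries.
    general_total = total_cost(general, sum(queries_per_task.values()))
    return "general" if general_total < narrow_total else "narrow"
```

In this toy model the general system saves on development (one system instead of many), but if its per-query cost is higher, the narrow suite can win once usage volume is large enough. Which effect dominates is an empirical question about real costs.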

We now have some examples of relatively general AI. For example, GPT-3 is a language model that performs reasonably well on a range of natural language processing (NLP) tasks, even though it is only explicitly trained to predict the next token in a piece of text. The GPT-3 paper reports how GPT-3 compares to state-of-the-art (SOTA) systems on the tasks it was evaluated on. In principle, we could estimate the combined cost of producing those SOTA systems and the cost of producing GPT-3, and compare the two.
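A cost comparison is only like-for-like on tasks where GPT-3 actually reaches the competence of the SOTA system it would replace. A minimal sketch of that filtering step follows; it assumes every metric is "higher is better" and that the two score tables use the same task names and scales. The function name and the optional `tolerance` parameter are hypothetical, and no scores from the paper are used here.

```python
def substitutable_tasks(general_scores: dict[str, float],
                        sota_scores: dict[str, float],
                        tolerance: float = 0.0) -> list[str]:
    """Tasks on which the general model reaches the SOTA score (within `tolerance`).

    Assumes every metric is 'higher is better' and that both dicts use the same
    task names and metric scales. Tasks missing from `general_scores` are skipped.
    """
    return sorted(
        task for task, sota in sota_scores.items()
        if task in general_scores and general_scores[task] >= sota - tolerance
    )
```

Only the SOTA systems for the returned tasks should count toward the "suite of narrow systems" side of the ledger; on the remaining tasks the general model is not yet a substitute.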

But there’s another element to cost-effectiveness: how easy the AI system’s interface is to use. Even if GPT-3 costs slightly more to produce than a suite of specialized AI systems, it might provide more value because its service is easier to consume. For example, the interfaces to GPT-3 and to a specialized machine translation system might look like this:

# GPT-3
do_language_task("Translate English to French: The quick brown fox jumps over the lazy dog.")
# Narrow machine translation service
translate("The quick brown fox jumps over the lazy dog.", source="en", target="fr")
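One way to see why the general interface is more flexible: a narrow, translation-only API can be written as a thin wrapper around a general text-in, text-out interface, while the reverse is not possible. The sketch below models the general system as any function from a prompt string to a completion string; this is a hypothetical stand-in, not GPT-3's actual API. `make_translate` and `LANGUAGE_NAMES` are illustrative names, and the parameters are called `source` and `target` because `from` is a reserved word in Python.

```python
from typing import Callable

# Hypothetical stand-in for a general language model: prompt in, text out.
LanguageModel = Callable[[str], str]

# Illustrative mapping from language codes to names used in the prompt.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German"}


def make_translate(do_language_task: LanguageModel) -> Callable[[str, str, str], str]:
    """Build a narrow translation API on top of a general text interface."""
    def translate(text: str, source: str, target: str) -> str:
        # Unknown language codes raise KeyError; a real API would validate them.
        prompt = (f"Translate {LANGUAGE_NAMES[source]} to "
                  f"{LANGUAGE_NAMES[target]}: {text}")
        return do_language_task(prompt)
    return translate
```

Passing a stub model that returns its prompt unchanged shows that `translate("The quick brown fox jumps over the lazy dog.", "en", "fr")` sends exactly the natural-language instruction from the GPT-3 example to the general system.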

The GPT-3 interface is more elegant, since it lets you specify tasks in natural language instead of learning a separate API for each type of task. It also lets you specify tasks that no one has trained a narrow AI system for, accompanied by a small number of worked examples in the prompt (what the authors call few-shot learning). This ease of use tips the balance toward more general AI systems like GPT-3.
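Few-shot use amounts to assembling a prompt: a task description, a handful of solved examples, and the unsolved query, all placed in the model's context with no weight updates. A minimal sketch follows; the layout is loosely modeled on the illustrations in the GPT-3 paper, while the function name and the `" => "` separator are assumptions for illustration.

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str,
                    separator: str = " => ") -> str:
    """Assemble a few-shot prompt: task description, K solved examples,
    then the unsolved query for the model to complete."""
    lines = [instruction]
    # Each solved example shows the model the desired input/output pattern.
    lines += [f"{source}{separator}{target}" for source, target in examples]
    # The final line ends at the separator so the model supplies the answer.
    lines.append(f"{query}{separator}".rstrip())
    return "\n".join(lines)
```

For example, `few_shot_prompt("Translate English to French:", [("sea otter", "loutre de mer")], "cheese")` yields three lines ending in `cheese =>`. Supporting a new task means writing a new instruction and a few examples, not training and deploying a new system.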

If GPT-3 is cheaper to develop and use than an equivalent suite of specialized NLP systems across a representative basket of NLP tasks, then the market will likely favor more general AI systems for NLP. Either way, such a comparison would offer early evidence about whether society will develop more general AI systems or continue to produce narrow ones.