AI Is Not Software

Epistemic Status: This idea is, I think, widely understood in technical circles. I’m trying to convey it more clearly to a general audience. Edit: See related posts like this one by Eliezer for background on how we should use words.

What we call AI in 2024 is not software. It’s natural to put it in the same category as other things that run on a computer, but thinking about LLMs, image generation, or deepfakes as software is misleading, and it confuses most of the ethical, political, and technological discussions. This seems not to be obvious to many users, but as AI becomes more widespread, it’s especially important to understand what we’re using when we use AI.

Software

Software is how we get computers to work. When creating software, humans decide what they want the computer to do, think about what would make the computer do that, and then write an understandable set of instructions in some programming language. A computer is given those instructions, and they are interpreted or compiled into a program. When that program is run, the computer will follow the instructions in the software, and produce the expected output, if the program is written correctly.

Does software work? Not always, but if not, it fails in ways that are entirely determined by the human’s instructions. If the software is developed properly, there are clear methods to check each part of the program. For example, unit tests are written to verify that the software does what it is expected to do in different cases. The set of cases is specified in advance, based on what the programmer expected the software to do. If it fails a single unit test, the software is incorrect, and should be fixed. When changes are wanted, someone with access to the source code can change it, and recreate the software based on the new code.
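To make that concrete, here is a minimal sketch in Python of a function and its unit test. The function `sales_tax` and the specific cases are hypothetical, chosen only for illustration; the point is that a human wrote every instruction and decided in advance what the right answers are.

```python
def sales_tax(amount_cents: int, rate_percent: int) -> int:
    """Return the tax owed in cents, rounded down to a whole cent."""
    return amount_cents * rate_percent // 100


def test_sales_tax() -> None:
    # Each case was picked in advance, from what the programmer
    # expects the function to do.
    assert sales_tax(1000, 8) == 80
    assert sales_tax(0, 8) == 0
    assert sales_tax(999, 10) == 99  # 99.9 cents rounds down


# A single failing assertion means the code is wrong and should be fixed.
test_sales_tax()
```

If a case fails, the fix is to read the instructions, find the mistake, and rewrite them.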

Given that high-level description, it might seem like everything that runs on a computer must be software. In a certain sense, it is, but thinking about everything done with computers as software is unhelpful or misleading. This essay was written on a computer, using software, but it’s not software. And the difference between what is done on a computer and what we tell a computer to do with software is obvious in cases other than AI. Once we think about what computers do, and what software is, we shouldn’t confuse “on a computer” with software.

Not Software

For example, photos of a wedding or a vacation aren’t software, even if they are created, edited, and stored using software. When photographs are not good, we blame the photographer, not the software running on the camera. We don’t check if the photography or photo editing worked properly by rerunning the software, or building unit tests. When photographs are edited or put into an album, it’s the editor doing the work. If it goes badly, the editor chose the wrong software, or used it badly—it’s generally not the software malfunctioning. If we lose the photographs, it’s almost never a software problem. And if we want new photographs, we’re generally out of luck—it’s not a question of fixing the software. There’s no source code to rerun. Having a second wedding probably shouldn’t be the answer to bad or lost photographs. And having a second vacation might be nice, but it doesn’t get you photos of the first vacation.

Similarly, a video conference runs on a computer, but the meeting isn’t software; software is what allows it to run. A meeting can go well, or poorly, because of the preparation or behavior of the people in the meeting. (And that isn’t the software’s fault!) The meeting isn’t specified by a programming language, doesn’t compile into bytecode, and there aren’t generally unit tests to check if the meeting went well. And when we want to change the outputs of a meeting, we need to reconvene people and try to convince them; we don’t just alter the inputs and rerun.

Generative AI

Now that it should be clear that not everything that runs on a computer is software, why shouldn’t we think about generative AI as software?

First, we can talk about how it is created. Developers choose a model structure and data, and then a mathematical algorithm uses that structure and the training data to “grow” a very complicated probability model of different responses. The algorithm and code to build the model are definitely software. But the model, like anything stored by a computer, is just a set of numbers, as are software, images, and videoconferences. The AI model itself, the probability model which was grown, is generating output based on a huge set of numbers that no human has directly chosen, or even seen. It’s not instructions written by a human.
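A toy illustration of that distinction, as a rough sketch: the short Python program below is software, written by hand, but the table of numbers it produces comes from the data, not from the programmer. This word-pair counter is only a stand-in assumed for illustration (real generative models are neural networks with billions of numbers, trained very differently), and the names `train_bigram_model` and `corpus` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text: str) -> dict[str, dict[str, float]]:
    """Estimate, for each word, the probability of each word that follows it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

    model: dict[str, dict[str, float]] = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {word: n / total for word, n in following.items()}
    return model


# The training data. Nobody typed the resulting probabilities by hand.
corpus = "the cat sat on the mat the cat ate"
model = train_bigram_model(corpus)

# The "model" is just this table of numbers.
print(model["the"])
```

Changing the training text changes the numbers, even though not a single line of the program was edited.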

Second, when we run the model, it takes the input we give it and performs “inference” with the model. This is certainly run on the computer, but the program isn’t executing code that produces the output; it’s using the complicated probability model which grew, and was stored as a bunch of numbers. The model responds to input by using the probability model to estimate the probability of different responses, in order to output something akin to what it saw in its training data, but it often does so in unexpected or unanticipated ways. Depending on the type of model it learned and the type of training data, it finds the probability of different outputs. Some have called such generative models “stochastic parrots,” a term which suggests that the model is not running a program, but copying what the training data showed it. On the other hand, this parrot is able to compose credible answers to questions on the bar exam, produce new art, write poetry, explain complex ideas, or nearly flawlessly emulate someone’s voice or a video of them speaking.
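As a rough sketch of what “using the probability model” means, the Python below picks the next word by sampling from a table of probabilities. The table `MODEL` is a hypothetical stand-in for the stored numbers, written out by hand here only so the example is self-contained; in a real system those numbers come out of training, and the helper `next_word` is likewise an illustrative name.

```python
import random

# Hypothetical learned numbers: for each word, the probability of the next word.
MODEL = {
    "the": {"cat": 0.6, "dog": 0.3, "mat": 0.1},
    "cat": {"sat": 0.5, "ate": 0.5},
}


def next_word(model: dict[str, dict[str, float]], prev: str, rng: random.Random) -> str:
    """Sample one next word according to the stored probabilities."""
    options = model[prev]
    words = list(options)
    weights = [options[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so that runs are repeatable
samples = [next_word(MODEL, "the", rng) for _ in range(5)]
print(samples)
```

Nothing in `next_word` says which word to produce. The code only says to consult the numbers and roll the dice, so the behavior lives in the numbers rather than in the instructions.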

Third, what do we do if it doesn’t do what we expected? Well, to start, what the system can or cannot do isn’t always understood in advance. New models don’t have a set of features that are requested and implemented, so there’s no specification for what they should or should not do. The model itself isn’t reviewed to check that it is written correctly, and unit tests aren’t written in advance to check that the model outputs the right answers. Instead, a generative AI system is usually tested against benchmarks used for humans, or its outputs are evaluated heuristically. If it performs reasonably well, it’s celebrated, but it is expected to get some things wrong, and it often does things the designers never expected. And when changes are needed, the nearest equivalent of the source code (that is, the training data and training algorithm which were used to produce the system) is not referenced or modified. Instead, further training, often called “fine tuning,” or changes in how the system is used, via “prompt engineering,” are used to change its behavior.
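The contrast with unit testing can be sketched in a few lines of Python. Here `toy_model` is a hypothetical stand-in for a generative model (a lookup table with one deliberate mistake), and `BENCHMARK` is a made-up list of questions and answers; real benchmarks are far larger and are scored in more varied ways.

```python
from typing import Callable


def benchmark_accuracy(model: Callable[[str], str], benchmark: list[tuple[str, str]]) -> float:
    """Return the fraction of benchmark questions the model answers correctly."""
    correct = sum(
        1
        for question, expected in benchmark
        if model(question).strip().lower() == expected.lower()
    )
    return correct / len(benchmark)


def toy_model(question: str) -> str:
    # Stand-in for a real model: some answers right, one wrong, one missing.
    canned = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return canned.get(question, "I don't know")


BENCHMARK = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
    ("largest planet", "Jupiter"),
]

score = benchmark_accuracy(toy_model, BENCHMARK)
print(f"accuracy: {score:.0%}")
```

A unit test passes or fails, and a failure means the code is wrong. A benchmark returns a score, and a score below 100% is the expected result rather than a bug to be traced to a particular line.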

Lastly, we can talk about how it is used, and this is perhaps the smallest difference. The difference between Google running a program to find and display a stock photograph of what you searched for and Dall-E 3 generating a stock photograph might seem small. But one is a photograph of a thing that exists, and the other is not. And the difference between asking Google for an answer and asking ChatGPT might also not be obvious, but one is retrieving information, and the other is generating it. Similarly, the difference between talking to a person via video conference and talking to a deepfake may not be obvious, but the difference between a human and an AI system is critical, and so is the difference between an AI system and traditional software.

To reiterate, AI isn’t software. It’s run using software, it’s created with software, but it’s a different type of thing. And given that it’s easy to confuse, we probably need to develop new intuitions about the type of thing it is.

Thanks to Mikhail Samin and Diane Manheim for helpful comments on an earlier draft.

Crossposted from LessWrong (56 points, 29 comments)