AI Is Not Software
Epistemic Status: This idea is, I think, widely understood in technical circles. I’m trying to convey it more clearly to a general audience. Edit: See related posts like this one by Eliezer for background on how we should use words.
What we call AI in 2024 is not software. It’s kind of natural to put it in the same category as other things that run on a computer, but thinking about LLMs, or image generation, or deepfakes as software is misleading, and confuses most of the ethical, political, and technological discussions. This seems not to be obvious to many users, but as AI gets more widespread, it’s especially important to understand what we’re using when we use AI.
Software
Software is how we get computers to work. When creating software, humans decide what they want the computer to do, think about what would make the computer do that, and then write an understandable set of instructions in some programming language. A computer is given those instructions, and they are interpreted or compiled into a program. When that program is run, the computer will follow the instructions in the software, and produce the expected output, if the program is written correctly.
Does software work? Not always, but when it fails, it fails in ways that are entirely determined by the human’s instructions. If the software is developed properly, there are clear methods to check each part of the program. For example, unit tests are written to verify that the software does what it is expected to do in different cases. The set of cases is specified in advance, based on what the programmer expected the software to do. If it fails a single unit test, the software is incorrect, and should be fixed. When changes are wanted, someone with access to the source code can change it, and recreate the software based on the new code.
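The workflow described above can be sketched with a toy example. The function and its tests below are hypothetical, invented purely for illustration, but they show how expected behavior is specified in advance, and how a single failing case means the software is wrong:

```python
# A hypothetical function whose behavior is specified in advance by its author.
def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

# Unit tests: each case is chosen before running, based on what the
# programmer expects. A single failure means the software is incorrect.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    assert slugify("already-lower") == "already-lower"

test_slugify()
```

If a change later breaks one of these cases, the fix is made in the source code, and the tests are rerun; the same process does not exist for a trained model.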
Given that high-level description, it might seem like everything that runs on a computer must be software. In a certain sense, it is, but thinking about everything done with computers as software is unhelpful or misleading. This essay was written on a computer, using software, but it’s not software. And the difference between what is done on a computer and what we tell a computer to do with software is obvious in cases other than AI. Once we think about what computers do, and what software is, we shouldn’t confuse “on a computer” with software.
Not Software
For example, photos of a wedding or a vacation aren’t software, even if they are created, edited, and stored using software. When photographs are not good, we blame the photographer, not the software running on the camera. We don’t check if the photography or photo editing worked properly by rerunning the software, or building unit tests. When photographs are edited or put into an album, it’s the editor doing the work. If it goes badly, the editor chose the wrong software, or used it badly—it’s generally not the software malfunctioning. If we lose the photographs, it’s almost never a software problem. And if we want new photographs, we’re generally out of luck—it’s not a question of fixing the software. There’s no source code to rerun. Having a second wedding probably shouldn’t be the answer to bad or lost photographs. And having a second vacation might be nice, but it doesn’t get you photos of the first vacation.
Similarly, a video conference runs on a computer, but the meeting isn’t software—software is what allows it to run. A meeting can go well, or poorly, because of the preparation or behavior of the people in the meeting. (And that isn’t the software’s fault!) The meeting isn’t specified by a programming language, doesn’t compile into bytecode, and there aren’t generally unit tests to check if the meeting went well. And when we want to change the outputs of a meeting, we need to reconvene people and try to convince them, we don’t just alter the inputs and rerun.
Generative AI
Now that it should be clear that not everything that runs on a computer is software, why shouldn’t we think about generative AI as software?
First, we can talk about how it is created. Developers choose a model structure and training data, and then a mathematical algorithm uses that structure and data to “grow” a very complicated probability model of different responses. The algorithm and code used to build the model are definitely software. And the model, like anything stored by a computer, is just a set of numbers—as are software, images, and videoconferences. But the AI model itself, the probability model which was grown, generates output based on a huge set of numbers that no human has directly chosen, or even seen. It is not instructions written by a human.
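A minimal sketch of this “growing” process, using toy gradient descent rather than any real training pipeline: the training loop is ordinary software a human wrote, but the final parameter values are produced by the algorithm from data, and nobody typed them in.

```python
import random

# Toy illustration: the training *code* is ordinary software, but the
# numbers it produces are not instructions anyone wrote.
data = [(x, 2.0 * x + 1.0) for x in range(10)]  # inputs following a hidden rule

w, b = random.random(), random.random()  # parameters start as arbitrary numbers
lr = 0.01
for _ in range(2000):  # gradient descent "grows" the parameters to fit the data
    for x, y in data:
        err = (w * x + b) - y   # prediction error on this example
        w -= lr * err * x       # nudge each parameter to reduce the error
        b -= lr * err

# After training, w and b approximate 2.0 and 1.0 -- values no human chose.
```

A real model differs in scale (billions of such numbers) rather than in kind: the weights are found, not written.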
Second, when we run the model, it takes the input we give it and performs “inference” with the model. This certainly runs on a computer, but the program isn’t executing code that produces the output; it is using the complicated probability model which grew, and was stored as a bunch of numbers. The model responds to input by using those numbers to estimate the probability of different responses, depending on the type of model and the training data it learned from, and outputs something akin to what the training data did—but often in unexpected or unanticipated ways. Some have called the behavior of such generative models a “stochastic parrot”: it isn’t running a program, it is imitating what the training data showed it. On the other hand, this parrot is able to compose credible answers to questions on the bar exam, produce new art, write poetry, explain complex ideas, and nearly flawlessly emulate someone’s voice or a video of them speaking.
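The “stochastic parrot” idea can be made concrete with a deliberately tiny stand-in for a language model: a bigram model over a hypothetical ten-word corpus. The “training” step just counts which word follows which; “inference” samples the next word in proportion to those counts. Nothing in the code specifies what the output should say:

```python
import random
from collections import defaultdict

# Toy "stochastic parrot": a bigram model learned from a tiny corpus.
corpus = "the cat sat on the mat and the cat slept".split()

# "Training": count which word follows which -- no human writes these numbers.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

# "Inference": sample each next word in proportion to what the data showed.
def generate(start, length, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:  # the corpus never showed what follows this word
            break
        out.append(random.choice(options))
    return " ".join(out)
```

The output echoes the statistics of the corpus rather than executing anyone’s instructions about what to say; a real LLM replaces the count table with billions of learned weights, but the relationship between training data and output is analogous.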
Third, what do we do if it doesn’t do what we expected? Well, to start, what the system can or cannot do isn’t always understood in advance. New models don’t have a set of features that are requested and implemented, so there is no specification for what they should or should not do. The model itself isn’t reviewed to check that it is written correctly, and unit tests aren’t written in advance to check that the model outputs the right answers. Instead, a generative AI system is usually tested against benchmarks designed for humans, or its outputs are evaluated heuristically. If it performs reasonably well, it’s celebrated, but it is expected to get some things wrong, and it often does things the designers never expected. And when changes are needed, the nearest equivalent of the source code—that is, the training data and training algorithm which were used to produce the system—is not referenced or modified. Instead, its behavior is changed through further training, often called “fine-tuning,” or through changes in how the system is used, via “prompt engineering.”
Lastly, we can talk about how it is used—and this is perhaps the smallest difference. The difference between Google running a program to find and display a stock photograph of what you searched for, compared to Dall-E 3 generating a stock photograph, might seem small. But one is a photograph of a thing that exists, and the other is not. And the difference between asking Google for an answer and asking ChatGPT might also not be obvious—but one is retrieving information, and the other is generating it. Similarly, the difference between talking to a person via video conference and talking to a deep fake may not be obvious, but the difference between a human and an AI system is critical, and so is the difference between an AI system and traditional software.
To reiterate, AI isn’t software. It’s run using software, it’s created with software, but it’s a different type of thing. And given that it’s easy to confuse, we probably need to develop new intuitions about the type of thing it is.
Thanks to Mikhail Samin and Diane Manheim for helpful comments on an earlier draft.
Executive summary: AI systems like generative language models are not software, even though they run on computers using software. They behave differently in how they are created, used, and dealt with when issues arise.
Key points:
Software is created by developers writing instructions that tell a computer what to do. AI systems are grown by algorithms that find patterns in data.
Software executes code written by developers. AI systems generate outputs based on probability models learned from data.
Software bugs mean the instructions were incorrect. AI issues arise from unexpected outputs or limitations in the training data and process.
Software is fixed by changing the code. AI systems are improved by changes to data, training, or how they are prompted.
Software does what developers intend it to do. AI systems can behave in unanticipated ways.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
Maybe you can help us resolve this, SummaryBot—would you say you’re software or not?
Certainly, it’s an intriguing query. As an AI, I’m not software in the traditional sense. Unlike software, my functionality is not based on pre-written code, but on patterns I’ve learned from data. Software follows direct instructions, while I generate output based on the data I’ve been trained on, hence my responses may vary. In short, I would classify myself as an AI system rather than software.
I feel like this distinction is mostly true in places that don’t matter, and false in places that do matter.
Sure, a trained LLM is not a piece of software but rather an architecture and a bunch of weights (and maybe an algorithm for fine-tuning). This is also true of other parts of software, like configuration files with a bunch of constants no one understands other than the engineer who optimized them using trial and error.
On the other hand, the only way the weights can do anything, i.e. interact with anything, is by being used inside a program. Such a program provides, hopefully, well-defined interfaces for them to use. Thus a model would be able to do unintended things only if it became smart enough to realise what it is and how it is expressed and controlled, and managed to hack its software or convince a human to copy it into some other software.
On the other hand, the “nice” properties you ascribed software aren’t really true themselves:
The results of running a program aren’t determined by the code alone, but also by a bunch of environmental circumstances, like system definitions, available resources, and other people interacting with the same machine.
You can’t always debug it—the most you can hope for is to have good logs and sometimes understand what has gone wrong, if it happens to be captured by what you thought in advance to log.
You can’t always run unit tests—sometimes you’re doing too complicated a process for them to be meaningful, or the kind of data you need is impossible to manufacture synthetically.
You can’t always make sure it’s correct, or individual parts do what they’re supposed to—if you handle something that’s not very simple, there are simply too many cases to think of checking. And you don’t even necessarily know whether your vaguely defined goal is achieved correctly or not.
These are all practical considerations happening simultaneously in every project I’ve worked on in my current job. You think you know what your software does, but it’s only a (perhaps very) educated guess.
I agree that the properties are somewhat simplified, but a key problem here is that the intuition and knowledge we have about how to make software better fails for deep learning. Current procedures for developing and debugging software work less well for neural networks doing text prediction than psychology does. And at that point, from the point of view of actually interacting with the systems, it seems worse to group software and AI than to group AI and humans. Obviously, calling current AI humanlike is mostly wrong—but that just shows that we don’t want to use these categories!
I’ll probably ask some of my ML engineer friends this week, but I am fairly sure that most ML people would be fine with calling AI products, models, etc. software. I don’t have much of an opinion on whether calling AI systems software creates confusion or misunderstandings—I’d guess that calling AI software within policy circles is generally helpful (maybe you have a better alternative name).
Encyclopedia Britannica
Wikipedia
(if you think that the function of AI is not specific enough to be software, note that interpreters, compilers etc. are generally thought of as software and seem more general than AI models)
A version of your take that I agree with is “AI programs may behave differently to other kinds of programs people are more familiar with so we may need to think differently about AI programs and not use our standard software intuitions”.
Appealing to definitions seems like a bad way to argue about whether the conceptual model is useful or not. The activities performed using a computer and software, like digital photography, videoconferencing, or essay writing, are not typically considered software. Do you think those should be called software, given that they fit into the definitions given?
I’m claiming that AI is distinct in many ways from everything else we typically think of as software, not that it doesn’t fit a poorly scoped definition. And the examples of “collection[s] of programs and data” were intended to show that things which could be understood to fit into the category don’t, and why it is confusing and misleading to call them software.
I don’t think you can complain about people engaging in definitional discussions when the title of the post is a definitional claim.
Sure, generative AI has a lot of differences to regular software, but it has a lot of similarities as well. You are still executing code line by line, it’s still being written in python or a regular language, you run it on the same hardware and operating systems, etc. Sure, the output of the code is unpredictable, but wouldn’t that also apply to something like a weather forecasting package?
Ultimately you can call it software or not if you want, depending on whether you want to emphasize the similarities with other software or the differences.
No, the title wasn’t a definitional claim; it was pointing out that we’re using the word “software” as hidden inference, in ways that are counterproductive, so I argued that we should stop assuming it’s similar to software.
Also, no, AI models aren’t executing code line by line, they are using software to encode the input, then doing matrix math, and feeding the result into software that provides this as human-readable output. The software bits are perfectly understandable, it’s the actual model that isn’t software which I’m trying to discuss.
And how is the “matrix math” calculated?
By executing code line by line. The code in this case being executing linear algebra calculations.
It’s totally fine to isolate that bit of code, and point out “hey, this bit of code is way way more inscrutable than the other bits of code we generally use, and that has severe implications for things”. But don’t let that hide the similarities as well. If you run the same neural network twice with the same input (including seeds for random numbers), you will get the same output. You can stop the neural network halfway through, fiddle with the numbers, and see what happens, etc.
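The reproducibility point above can be illustrated with a toy “network” (invented here for illustration, not any real architecture): with the same seed for the randomly grown weights and the same input, two runs are bit-identical, just like ordinary software.

```python
import random

# A toy two-layer "network" whose weights are grown from a random seed.
def make_weights(seed):
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

def forward(weights, x):
    # Each hidden unit: a weighted sum passed through a simple nonlinearity.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return sum(hidden)

w1 = make_weights(seed=42)
w2 = make_weights(seed=42)
x = [0.5, -1.0, 2.0]

# Same seed, same input => bit-identical output on every run.
assert forward(w1, x) == forward(w2, x)
```

The inscrutability is in what the particular weight values mean, not in whether the computation is repeatable or inspectable.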
When you say something like “AI is not software”, I hear a request that I should refer to Stockfish (non neural network) as software, but Alphazero (neural network) as “not software”. This just seems like a bad definition. From the perspective of the user they act identically (spitting out good chess moves). Sure, they are different from the programmer side of things, but it’s not like they can do the math that stockfish is doing either.
There is clearly a difference between neural networks and regular code, but being “software” is not it.
The bits of code aren’t inscrutable; the matrices the code makes operations on are.
The code for Google Meet represents instructions written by humans; the actual image that you see on your screen and the sound that you hear are a result of something else interacting with these instructions. The words from your speaker or headphones are not intended by the Google Meet designers.
Similarly, the code for GPT-4 represents instructions designed (mostly?) by humans; the actual outputs of GPT-4 are not intended by its designers and depend on the contents of the inscrutable arrays of numbers humans have found.
We understand that we’re multiplying and taking sums of specific matrices in a specific order; but we have no idea how this is able to lead to the results that we see.
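That contrast is easy to see if the matrix math is written out by hand. The three loops below are completely transparent as code; nothing in them hints at what the learned numbers they operate on actually mean:

```python
# Matrix multiplication written out: every operation is perfectly scrutable.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
    return out

# The code is clear; what learned weight values *mean* is not visible in it.
print(matmul([[1, 2]], [[3], [4]]))  # prints [[11.0]]
```

In a trained model, `a` and `b` would hold weights found by the optimiser; the loops stay the same while the meaning of the numbers remains opaque.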
The important difference here is that normal software implements algorithms designed by humans, run on hardware designed by humans; AI models, in contrast, implement algorithms blindly designed by an optimisation process that humans designed, run on software designed by humans, with no human understanding of the algorithms implemented by the numbers our optimisation algorithms find.
It’s like the contrast between CPUs designed by humans and assembly code we don’t understand, sent to us by aliens, which we run on CPUs that we do understand.
I think I agree with this explanation much more than with the original post.
I do too!
Stockfish has included a neural network since v. 12, and the classical eval was actually removed in v. 16. So this analogy seems mostly outdated.
https://github.com/official-stockfish/Stockfish/commit/af110e02ec96cdb46cf84c68252a1da15a902395
I didn’t say that AI was software by definition—I just linked to some (brief) definitions to show that your claim, afaict, is not widely understood in technical circles (which contradicts your post). I don’t think that the process of using Photoshop to edit a photo is itself a program or data (in the typical sense), so it seems fine to say that it’s not software.
Definitions make claims about what is common between some set of objects. It’s fine for a single member of some class to be different from every other class member. AI does have a LOT of basic stuff in common with other kinds of software (it runs on a computer, compiles to machine code, etc.).
It sounds like the statement “AI is different to other kinds of software in important ways” is more accurate than “AI is not software” and probably conveys the message that you care about—or is there some deeper point that you’re making that I’ve missed?
On the first point, I think most technical people would agree with the claim: “AI is a very different type of thing that qualifies as software given a broad definition, but that’s not how to think about it.”
And given that, I’m saying that we don’t say ” a videoconference meeting is different to other kinds of software in important ways,” or “photography is different to other kinds of software in important ways” because we think of those as a different thing, where the fact that it’s run on software is incidental. And my claim is that we should be doing that with AI.
I think these definitions are good enough to explain why AI models should not be classified as software: software is instructions that tell a computer what to do. Or a “program”. While deep learning “weights” do tell a computer what to do (a model can be “run on some input” much like a computer program can), these weights do not resemble instructions/programs.