I feel like this distinction is mostly true in places that don’t matter, and false in places that do matter.
Sure, a trained LLM is not a piece of software but rather an architecture and a bunch of weights (and maybe an algorithm for fine-tuning). This is also true of other parts of software, like configuration files with a bunch of constants no one understands other than the engineer who optimized them using trial and error.
On the other hand, the only way an LLM can do anything, i.e. interact with anything, is by being used inside a program. Such a program hopefully exposes well-defined interfaces for it to use. So it could do unintended things only if it became smart enough to realise what it is and how it is run and controlled, and managed to hack its host software or convince a human to copy it into some other software.
At the same time, the “nice” properties you ascribed to software aren’t really true themselves:
The results of running a program aren’t determined by the code alone, but also by a bunch of environmental circumstances, like system configuration, available resources, and other people interacting with the same machine (the sketch below illustrates this).
You can’t always debug it: the most you can hope for is to have good logs and sometimes understand what went wrong, if it happens to be captured by whatever you thought to log in advance.
You can’t always run unit tests—sometimes you’re doing too complicated a process for them to be meaningful, or the kind of data you need is impossible to manufacture synthetically.
You can’t always make sure it’s correct, or that individual parts do what they’re supposed to: if you’re handling anything that isn’t very simple, there are simply too many cases to think of checking. And you don’t even necessarily know whether your vaguely defined goal has been achieved correctly or not.
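To make the first point concrete, here is a minimal sketch in Python (the names and details are made up for illustration) of a program whose code never changes but whose output still depends on the environment it runs in:

```python
import os
import socket
import time

def greeting() -> str:
    """Build a message from values that look like constants but are environmental inputs."""
    user = os.environ.get("USER", "unknown")  # depends on the account running the program
    host = socket.gethostname()               # depends on the machine it runs on
    hour = time.localtime().tm_hour           # depends on the clock and timezone
    tone = "Good morning" if hour < 12 else "Hello"
    return f"{tone}, {user}@{host}"

if __name__ == "__main__":
    # The same code, run on two machines at two times of day, gives four different results.
    print(greeting())
```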
These practical considerations all come up, simultaneously, in every project I’ve worked on in my current job. You think you know what your software does, but it’s only a (perhaps very) educated guess.
I agree that the properties are somewhat simplified, but a key problem here is that the intuition and knowledge we have about how to make software better fails for deep learning. Current procedures for developing and debugging software work less well on neural networks doing text prediction than psychology does. Given that, from the perspective of actually interacting with these systems, it seems worse to group AI with software than to group AI with humans. Obviously, calling current AI humanlike is mostly wrong as well. But that just shows that we don’t want to use these categories!