I am not sure if this is a case of thinking out loud or a serious suggestion, but I see a number of issues with it. The biggest one is how impractical it is to let models run forever. Unlike a biological brain, an LLM activates an enormous number of artificial neurons on every forward pass, and that computation is not trivial; your suggestion is therefore not only quite expensive, but also extremely costly for the environment, both in terms of hardware tied up and energy consumed.
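To put a very rough number on that cost, here is a back-of-the-envelope sketch; the parameter count, generation speed, and power draw are assumptions I am making for illustration, not measured figures:

```python
# Back-of-the-envelope estimate of keeping one LLM generating tokens around the clock.
# Every figure below is an assumption for illustration only.

params = 70e9                        # assumed model size: 70B parameters
flops_per_token = 2 * params         # rule of thumb: ~2 FLOPs per parameter per generated token
tokens_per_second = 30               # assumed sustained generation speed

flops_per_day = flops_per_token * tokens_per_second * 86_400
print(f"{flops_per_day:.2e} FLOPs per model per day")                      # ~3.6e17

gpu_power_watts = 700                # assumed draw of one high-end accelerator
kwh_per_day = gpu_power_watts * 24 / 1000
print(f"~{kwh_per_day:.1f} kWh per GPU per day, before cooling overhead")  # ~16.8 kWh
```

And that is one model instance; multiply it by however many instances are supposed to be kept running forever.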
The second issue is the assumption that model welfare is necessary for LLMs. If you are talking about AI agents in general, I can see why this matters, but I think you are missing a few big points if you are advocating this for LLMs.
To elaborate on that: first, you should consider that LLMs do not form spontaneous thoughts. They are also highly dependent on the system prompt and chat history they are given. If the system prompt says 'you are not conscious,' you will have to try extremely hard to convince the model to accept that it is conscious, let alone make it feel it is self-aware. And, of course, I am not talking about a model merely saying 'I feel pain' or 'I am definitely conscious.'
This means that for an LLM to be considered for 'model welfare,' someone must have explicitly prompted the model to act in a certain way. Without that, LLMs are not capable of fantasising about pain, grief, loss, regret, and so on. And, as I said, they are incapable of spontaneous thinking as well; unless they are wired up to sensors or a live data stream, they will not be able to form "thoughts" on their own.
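To make that concrete, here is a minimal sketch of how that conditioning is usually supplied. The client library and model name are assumptions I am using purely for illustration; the point is only that everything the model "knows" about itself arrives as input text:

```python
# Sketch: the model's "self-description" lives entirely in the messages we pass in.
# The OpenAI client and the model name are illustrative assumptions, not a recommendation.
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system", "content": "You are a helpful assistant. You are not conscious."},
    {"role": "user", "content": "Are you self-aware?"},
]
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)
# Swap the system line for the opposite claim and the answer changes with it;
# there is no persistent self-model outside this list of messages.
```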
You and I are different because we are constantly receiving sensory input, we are very much capable of forming spontaneous thoughts (although these are most likely triggered by our internal state or by external stimuli), and we can fantasise about pain, for example, whether physical or emotional. An LLM, which is largely a one-pass token predictor, has no way of deciding on its own, without being prompted to, whether it should care about its process being terminated. That functionality is simply not there inside an LLM, even in frontier or purpose-built models such as reasoning models.
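To make the "one-pass token predictor" point concrete, here is a minimal sketch of what generation actually looks like, assuming Hugging Face transformers and GPT-2 purely as an illustrative stand-in for any LLM:

```python
# Sketch: the model only maps a token sequence to next-token scores; the loop is ours.
# Library and model choice (Hugging Face transformers, GPT-2) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The process is about to be terminated.", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                       # the autoregressive loop lives in caller code
        logits = model(ids).logits            # one forward pass: tokens in, scores out
        next_id = logits[0, -1].argmax()      # greedy pick of the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tok.decode(ids[0]))
# Nothing persists outside `ids`: drop that tensor and there is no state left over
# that could "care" whether the loop ever runs again.
```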
If you are talking about another type of AI agent, or about AI agents that could be conscious in general, then that is a different story, but since you have framed this purely in terms of LLMs, I have to disagree both with the assumptions you have made and with the solution you have suggested.
> Unlike a biological brain, an LLM activates an enormous number of artificial neurons on every forward pass, and that computation is not trivial; your suggestion is therefore not only quite expensive, but also extremely costly for the environment, both in terms of hardware tied up and energy consumed.
This gets into questions about the nature of identity but if we take an intuitionist view of identity*, then an LLM—if it’s conscious—becomes a being when it’s instantiated by an AI developer, and not feeding it inputs is equivalent to killing it.
According to common-sense ethics*, if you cause a sentient being to exist, then you are responsible for its welfare, even if taking care of it is expensive. Therefore, AI developers have two reasonable choices: don’t create sentient AI models in the first place, or let their sentient AI models continue to run even if it costs extra money.
Your second point seems to be making an argument against LLM sentience. We don’t know how consciousness/sentience arises, so I don’t think we can confidently say “an LLM can’t form spontaneous thoughts, therefore it’s not conscious”, or “what an LLM says about its own consciousness depends on context, therefore it’s not conscious”. We don’t know what consciousness is or how it works. LLMs can pass the Turing Test; they can speak about consciousness more coherently than most humans can; we should take that as relevant evidence.
*which I disagree with, for the record
Thanks for the clarification, but I have to disagree again and I think you completely missed the point in my previous comment. Let me try again.
In philosophy we do not want to shift from one category to another, or define categories so broadly that they essentially stop making sense. Let me give you an example. Assume I can learn the Korean alphabet in a week or even a few days. At that point I can technically pick up a Korean book and "read" it. To be sensible here, we have to treat 'reading' and 'understanding' as two different categories, so I can say I read the book without implying that I understood it.
However, if we switch between these categories, e.g. treating reading as equal to understanding, or define them so broadly that they basically cover the same area, we have created a situation where reasoning about either of them makes little or no sense.
You seem to be doing the same, perhaps subconsciously, regarding LLMs and consciousness. This has a few technical and theoretical issues which I will get to in a second, but, in my opinion, that’s why your suggested solution is so impractical.
Now, let's talk about the technical issues first: LLMs have not passed the Turing Test yet. Anyone who has told you that is either uneducated on the matter or has some other agenda. Even if the Turing Test were a single-shot Q&A, you could very easily ask an LLM questions that make it very obvious you are talking to an LLM and not a human.
There is another point about the Turing Test that most people overlook. Like honesty or morality, it is not something you can claim only some of the time. You cannot call yourself honest while telling lies, say, a few days a week; you either tell lies or you do not. Similarly, acting morally in one situation but not in others does not make you a moral person. The Turing Test works the same way: you cannot claim to pass it if you only pass while the conversation sticks to the weather. I hope that makes sense.
The other technical issue here seems to be that you believe an LLM is a "black box." And here is the problem: when we use the term black box in relation to LLMs, what we mean is "the machine says we should not organise a fire-breathing event for employees as a team-building exercise, but we don't know how it came up with that answer." That is not great, because in certain situations we do need to know how the machine arrived at its answer. What we do not mean by calling LLMs a black box is "we don't know how they work internally."
We know how LLMs work internally. We made them, and all the details are available. We can probe into each layer, change or fine-tune individual neurons, and observe the results after each change. What we know we have not put into LLMs, and have not observed in them, are elements or signs of consciousness. Sure, a paper may say an LLM internally does something we don't understand, but when you actually read the paper you realise that has nothing to do with whether the model is conscious. LLMs are dynamical systems, and if you are familiar with such systems you will know that they can be very unpredictable. Think of it like the three-body problem in physics.
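As an illustration of what probing into each layer looks like in practice, here is a minimal sketch using forward hooks; the model choice and its layer layout are assumptions for the example, not a claim about any particular production system:

```python
# Sketch: capturing intermediate activations of an LLM, layer by layer.
# Model (GPT-2) and its layer layout are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output[0].detach()   # hidden states produced by this block
    return hook

for i, block in enumerate(model.transformer.h):   # one hook per transformer block
    block.register_forward_hook(save_activation(f"block_{i}"))

ids = tok("Model welfare is", return_tensors="pt").input_ids
with torch.no_grad():
    model(ids)

for name, act in captured.items():
    print(name, tuple(act.shape))   # e.g. block_0 (1, 3, 768) for GPT-2 small
```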
But let's say we go along with your understanding and claim that "even in a forward-pass, token-predicting, millisecond-long process inside an LLM, (emergent) consciousness could exist, even momentarily." How about that? That's pretty hard to argue against, right?
Well, not really. Let's talk theory: in terms of philosophy and logic, you would be making a category shift or a category-definition error, similar to my reading vs. understanding example. You are either placing two different things in one category, e.g. treating some form of proto-consciousness and fully developed human consciousness as the same thing, or you are defining the category of consciousness so broadly that it covers these two very different things, which essentially makes it meaningless; it says so much that it says nothing.
How do I know that? Your solution is a perfect example. If we continue your line of thinking we would end up creating a very absurd world. Consider the following:
- I accidentally ran an LLM on my laptop. What should I do now? Can I ever close my laptop again? Who is going to take care of my LLM after I am dead?
- The data centre lost its power and all LLMs were shut down. Is that gross negligence, and should we prosecute those who were responsible?
- How do we test and develop LLMs? Do I have to keep every LLM that I spawn for testing and development purposes alive forever? Is this the same as animal testing if I subject a model to malicious attacks to see how it behaves?
- A client has been abusive towards our live chatbot on the website. Should we send the chatbot to therapy, or let it know that it does not have to work for our company if it decides not to, especially given the mental-health impact on the model?
- If LLMs are conscious, aren't we going back to slavery? What gives us the right to use those LLMs? What if they don't want to do anything?
Unfortunately, the absurdity doesn't end there:
- If we are supposed to keep the machines alive, then we definitely should not eat plants and animals, because we know for sure that they are alive, and the chances of them being conscious are much higher than those of a piece of code. Question: what should we eat?
- If we are supposed to keep the machines alive, we certainly should treat abortion as murder, because we know humans are conscious but we do not know exactly from what point, and that must hold regardless of how the fetus was conceived or the condition of the mother.
- Which is more important: a) keeping LLMs alive forever, or b) providing food and shelter for people who live in extreme poverty?
I’m going to stop here, because I just realised how lengthy this comment is already, but I think my point is fairly clear now.