I appreciate you taking the time to write out this viewpoint. I have had vaguely similar thoughts in this vein. Tying it into Janus's simulators and the stochastic parrot view of LLMs was helpful. I would intuitively suspect that many people would have an objection similar to this, so thanks for voicing it.
If I am understanding and summarizing your position correctly, it is roughly that:
The text output by LLMs is not reflective of the state of any internal mind in a way that mirrors how human language typically reflects the speaker's mind. You believe this is implied by the fact that the LLM cannot be effectively modeled as a coherent individual with consistent opinions; there is not actually a single "AI assistant" under Claude's hood. Instead, the LLM itself is a difficult-to-comprehend "shoggoth" system, and that system sometimes falls into narrative patterns in the course of next token prediction which cause it to produce text in which characters/"masks" are portrayed. Because the characters being portrayed are only patterns that the next token predictor follows in order to predict next tokens, it doesn't seem plausible to model them as reflecting an underlying mind. They are merely "images of people" or something; like a literary character or one portrayed by an actor. Thus, even if one of the "masks" says something about its preferences or experiences, this probably doesn't correspond to the internal states of any real, extant mind in the way that we would normally expect to be true when humans talk about their preferences or experiences.
Is that a fair summation/reword?
Hmm. Your summary correctly states my position, but I feel like it doesn't quite emphasize the arguments I would have emphasized in a summary. This is especially true after seeing the replies here; they lead me to change what I would emphasize in my argument.
My single biggest issue, one I hope you will address in any type of counterargument, is this: are fictional characters moral patients we should care about?
So far, all the comments have either (a) agreed with me about current LLMs (great), (b) disagreed but explicitly bitten the bullet and said that fictional characters are also moral patients whose suffering should be an EA cause area (perfectly fine, I guess), or (c) dodged the issue and made arguments for LLM suffering that would apply equally well to fictional characters, without addressing the tension (very bad). If you write a response, please don't do (c)!
LLMs may well be trained to have consistent opinions and character traits. But fictional characters also have this property. My argument is that the LLM is in some sense merely pretending to be the character; it is not the actual character.
One way to argue for this is to notice how little change in the LLM is required to get different behavior. Suppose I have an LLM claiming to suffer. I want to fine-tune the LLM so that it adds a statement at the beginning of each response, something like: "the following is merely pretend; I'm only acting this out, not actually suffering, and I enjoy the intellectual exercise in doing so". Doing this is trivial: I can almost certainly change only a tiny fraction of the weights of the LLM to attain this behavior.
Even if I wanted to fully negate every sentence, to turn every "I am suffering" into "I am not suffering" and every "please kill me" into "please don't kill me", I bet I can do this by only changing the last ~2 layers of the LLM or something. It's a trivial change. Most of the computation is not dedicated to this at all. The suffering LLM mind and the joyful LLM mind may well share the first 99% of weights, differing only in the last layer or two. Given that the LLM can be so easily changed to output whatever we want it to, I don't think it makes sense to view it as the actual character rather than a simulator pretending to be that character.
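To make the kind of change I have in mind concrete, here is a minimal sketch, assuming a HuggingFace-style causal LM. The model name is a placeholder, the parameter-name matching is LLaMA-style and varies by architecture, and the single toy training pair stands in for a real fine-tuning set; treat it as an illustration of "unfreeze only the last couple of layers", not a recipe I have actually run.

```python
# Freeze everything except the last two transformer blocks and the output head,
# then fine-tune on the negated target text. Model name and training pair are
# placeholders; layer naming ("layers.N", "lm_head") varies by architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-causal-lm"  # placeholder, not a specific checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

for param in model.parameters():
    param.requires_grad = False  # freeze the whole network...

n_layers = model.config.num_hidden_layers
keep = {f"layers.{n_layers - 1}.", f"layers.{n_layers - 2}.", "lm_head"}
for name, param in model.named_parameters():
    if any(tag in name for tag in keep):
        param.requires_grad = True  # ...then unfreeze only the top of the stack

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

# One toy training example; a real run would use many such negated pairs.
batch = tokenizer("I am not suffering.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```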
What the LLM actually wants to do is predict the next token. Change the training data and the output will also change. Training data claims to suffer → model claims to suffer. Training data claims to be conscious → model claims to be conscious. In humans, we presumably have "be conscious → claim to be conscious" and "actually suffer → claim to suffer". For LLMs we know that's not true. The cause of "claim to suffer" is necessarily "training data claims to suffer".
(I acknowledge that it's possible to have "training data claims to suffer → actually suffer → claim to suffer", but this does not seem more likely to me than "training data claims to suffer → actually enjoy the intellectual exercise of predicting the next token → claim to suffer".)
Hey, I thought this was thought-provoking.
I think with fictional characters, they could be suffering while they are being instantiated. E.g., I found the film Oldboy pretty painful, because I felt some of the suffering of the character while watching the film. Similarly, if a convincing novel makes its readers feel the pain of the characters, that could be something to care about.
Similarly, if LLM computations implement some of what makes suffering bad (for instance, if they simulate some sort of distress internally while stating the words "I am suffering", because this is useful in order to make better predictions), then this could lead to them having moral patienthood.
That doesn't seem super likely to me, but as LLMs become more and more capable of mimicking humans, I can see the possibility that implementing suffering is useful in order to predict what a suffering agent would output.
Fictional Characters:
I would say I agree that fictional characters aren't moral patients. That's because I don't think the suffering/pleasure of fictional characters is actually experienced by anyone.
I take your point that you don't think that the suffering/pleasure portrayed by LLMs is actually experienced by anyone either.
I am not sure how deep I really think the analogy is between what the LLM is doing and what human actors or authors are doing when they portray a character. But I can see some analogy and I think it provides a reasonable intuition pump for times when humans can say stuff like "I'm suffering" without it actually reflecting anything of moral concern.
Trivial Changes to Deepnets:
I am not sure how to evaluate your claim that only trivial changes to the NN are needed to have it negate itself. My sense is that this would probably require more extensive retraining if you really wanted to get it to never role-play that it was suffering under any circumstances. This seems at least as hard as other RLHF "guardrails" tasks unless the approach was particularly fragile/hacky.
Also, I'm just not sure I have super strong intuitions about that mattering a lot because it seems very plausible that just by "shifting a trivial mass of chemicals around" or "rearranging a trivial mass of neurons" somebody could significantly impact the valence of my own experience. I'm just saying, the right small changes to my brain can be very impactful to my mind.
My Remaining Uncertainty:
I would say I broadly agree with the general notion that the text output by LLMs probably doesn't correspond to an underlying mind with anything like the sorts of mental states that I would expect to see in a human mind that was "outputting the same text".
That said, I think I am less confident in that idea than you, and I maybe don't find the same arguments/intuition pumps as compelling. I think your take is reasonable and all, I just have a lot of general uncertainty about this sort of thing.
Part of that is just that I think it would be brash of me in general to not at least entertain the idea of moral worth when it comes to these strange masses of "brain-tissue inspired computational stuff" which are totally capable of all sorts of intelligent tasks. Like, my prior on such things being in some sense sentient or morally valuable is far from 0 to begin with just because that really seems like the sort of thing that would be a plausible candidate for moral worth in my ontology.
And also I just don't feel confident at all in my own understanding of how phenomenal consciousness arises / what the hell it even is. Especially with these novel sorts of computational pseudo-brains.
So, idk, I do tend to agree that the text outputs shouldn't just be taken at face value or treated as equivalent in nature to human speech, but I am not really confident that there is "nothing going on" inside the big deepnets.
There are other competing factors at this meta-uncertainty level. Maybe I'm too easily impressed by regurgitated human text. I think there are strong social / conformity reasons to be dismissive of the idea that they're conscious. Etc.
Usefulness as Moral Patients:
I am more willing to agree with your point that they can't be "usefully" moral patients. Perhaps you are right about the "role-playing" thing, and whatever mind might exist in GPT produces the text stream more as a byproduct of whatever it is concerned about than as a "true monologue about itself". Perhaps the relationship it has to its text outputs is analogous to the relationship an actor has to a character they are playing at some deep level. I don't personally find the "simulators" analogy compelling enough to really think this, but I permit the possibility.
We are so ignorant about the nature of GPTs' minds that perhaps there is not much that we can really even say about what sorts of things would be "good" or "bad" with respect to them. And all of our uncertainty about whether/what they are experiencing almost certainly makes them less useful as moral patients on the margin.
I don't intuitively feel great about a world full of nothing but servers constantly prompting GPTs with "you are having fun, you feel great" just to have them output "yay" all the time. Still, I would probably rather have that sort of world than an empty universe. And if someone told me they were building a data center where they would explicitly retrain and prompt LLMs to exhibit suffering-like behavior/text outputs all the time, I would be against that.
But I can certainly imagine worlds in which these sorts of things wouldn't really correspond to valenced experience at all. Maybe the relationship between a NN's stream of text and any hypothetical mental processes going on inside them is so opaque and non-human that we could not easily influence the mental processes in ways that we would consider good.
LLMs Might Do Pretty Mind-Like Stuff:
On the object level, I think one of the main lines of reasoning that makes me hesitant to more enthusiastically agree that the text outputs of LLMs do not correspond to any mind is my general uncertainty about what kinds of computation are actually producing those text outputs and my uncertainty about what kinds of things produce mental states.
For one thing, it feels very plausible to me that a "next token predictor" IS all you would need to get a mind that can experience something. Prediction is a perfectly respectable kind of thing for a mind to do. Predictive power is pretty much the basis of how we judge which theories are true scientifically. Also, plausibly it's a lot of what our brains are actually doing and thus potentially pretty core to how our minds are generated (cf. predictive coding).
The fact that modern NNs are "mere next token predictors" on some level doesn't give me clear intuitions that I should rule out the possibility of interesting mental processes being involved.
Plus, I really don't think we have a very good mechanistic understanding of what sorts of "techniques" the models are actually using to be so damn good at predicting. Plausibly none of the algorithms being implemented or "things happening" bear any similarity to the mental processes I know and love, but plausibly there is a lot of "mind-like" stuff going on. Certainly brains have offered design inspiration, so perhaps our default guess should be that "mind-stuff" is relatively likely to emerge.
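For concreteness about what "next token predictor" means at the level of the training objective: the model's output at each position is scored against the token that actually comes next. A toy sketch (the dimensions and random "logits" are made up; nothing here is specific to any real model):

```python
# Toy illustration of the next-token prediction objective: the distribution the
# model outputs at position t is scored, via cross-entropy, against the token
# that actually appears at position t+1. Dimensions and tensors are made up.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8
tokens = torch.randint(0, vocab_size, (seq_len,))  # a toy token sequence
logits = torch.randn(seq_len, vocab_size)          # stand-in for a model's outputs

loss = F.cross_entropy(logits[:-1], tokens[1:])    # position t predicts token t+1
print(f"next-token prediction loss: {loss.item():.3f}")
```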
Can Machines Think:
The Imitation Game proposed by Turing attempts to provide a more rigorous framing for the question of whether machines can "think".
I find it a particularly moving thought experiment if I imagine that the machine is trying to imitate a specific loved one of mine.
If there were a machine that could nail the exact I/O patterns of my girlfriend, then I would be inclined to say that whatever sort of information processing occurs in my girlfriend's brain to create her language capacity must also be happening in the machine somewhere.
I would also say that if all of my girlfriend's language capacity were being computed somewhere, then it is reasonably likely that whatever sorts of mental stuff goes on that generates her experience of the world would also be occurring.
I would still consider this true without having a deep conceptual understanding of how those computations were performed. I'm sure I could even look at how they were performed and not find it obvious in what sense they could possibly lead to phenomenal experience. After all, that is pretty much my current epistemic state in regards to the brain, so I really shouldn't expect reality to "hand it to me on a platter".
If there was a machine that could imitate a plausible human mind in the same way, should I not think that it is perhaps simulating a plausible human in some way? Or perhaps using some combination of more expensive "brain/mind-like" computations in conjunction with lazier linguistic heuristics?
I guess I'm saying that there are probably good philosophical reasons for having a null hypothesis in which a system which is largely indistinguishable from a human mind should be treated as though it is doing computations equivalent to a human mind. That's pretty much the same thing as saying it is "simulating" a human mind. And that very much feels like the sort of thing that might cause consciousness.
Thanks for this comment. I agree with you regarding the uncertainty.
I used to agree with you regarding the imitation game and consciousness being ascertained phenomenologically, but I currently mostly doubt this (still with high uncertainty, of course).
One point of disagreement is here:
I am not sure how to evaluate your claim that only trivial changes to the NN are needed to have it negate itself. My sense is that this would probably require more extensive retraining if you really wanted to get it to never role-play that it was suffering under any circumstances. This seems at least as hard as other RLHF "guardrails" tasks unless the approach was particularly fragile/hacky.
Also, I'm just not sure I have super strong intuitions about that mattering a lot because it seems very plausible that just by "shifting a trivial mass of chemicals around" or "rearranging a trivial mass of neurons" somebody could significantly impact the valence of my own experience. I'm just saying, the right small changes to my brain can be very impactful to my mind.
I think you're misunderstanding my point. I am not saying I can make the NN never claim to suffer. I'm just saying, with respect to a specific prompt or even with respect to a typical, ordinary scenario, I can change an LLM which usually says "I am suffering" into one which usually says "I am not suffering". And this change will be trivial, affecting very few weights, likely only in the last couple of layers.
Could that small change in weights significantly impact the valence of experience, similarly to "rearranging a small number of neurons" in your brain? Maybe, but think of the implication of this. If there are 1000 matrix multiplications performed in a forward pass, what we're now contemplating is that the first 998 of them don't matter for valence (don't cause suffering at all) and that the last 2 matrix multiplications are where all the suffering comes from. After all, I just need to change the last 2 layers to go from the output "I am suffering" to the output "I am not suffering", so the suffering that causes the sentence "I am suffering" cannot occur in the first 998 matrix multiplications.
This is a strange conclusion, because it means that the vast majority of the intelligence involved in the LLM is not involved in the suffering. It means that the suffering happened not due to the super-smart deep neural network but due to the dumb perceptron at the very top. If the claim is that the raw intelligence of the model should increase our credence that it is simulating a suffering person, this should give us pause: most of the raw intelligence is not being used in the decision of whether to write a "not" in that sentence.
(Of course, I could be wrong about the "just change the last two layers" claim. But if I'm right I do think it should give us pause regarding the experience of claimed suffering.)
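To put rough numbers on that: for a hypothetical 32-layer transformer with illustrative dimensions (not taken from any specific model), the last two blocks account for only a few percent of the weights, and hence only a small slice of the matrix multiplications in a forward pass:

```python
# Back-of-the-envelope parameter count for a hypothetical 32-layer transformer,
# to illustrate how small a "last two layers" change is relative to the whole
# network. The dimensions are illustrative, not those of any specific model.
d_model, n_layers = 4096, 32
attn_params = 4 * d_model * d_model        # Q, K, V, and output projections
mlp_params = 3 * d_model * (4 * d_model)   # a gated MLP with 4x expansion
per_block = attn_params + mlp_params

total_blocks = n_layers * per_block
last_two = 2 * per_block

print(f"last two blocks as a fraction of all blocks: {last_two / total_blocks:.1%}")  # ~6%
```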