Mikolaj Kniejski comments on LLMs are weirder than you think

Mikolaj Kniejski 20 Nov 2024 17:06 UTC
−2 points
1 ∶ 3
I’ve always been impressed with Rethink Priorities’ work, but this post is underwhelming.
As I understand it, the post argues that we can’t treat LLMs as coherent persons. The author seems to think this idea is vaguely connected to the claim that LLMs are not experiencing pain when they say they are. I guess the reasoning goes something like this: if LLMs are not coherent personas, then we shouldn’t interpret statements like “I feel pain” as genuine indicators that they actually feel pain, because such statements are more akin to role-playing than honest representations of their internal states.
I think this makes sense, but the way it’s argued for is not great.
1. The user is not interacting with a single dedicated system.
The argument here seems to be: If the user is not interacting with a single dedicated system, then the system shouldn’t be treated as a coherent person.
This is clearly incorrect. Imagine we had the ability to simulate a brain. You could run the same brain simulation across multiple systems. A more hypothetical scenario: you take a group of frozen, identical humans, connect them to a realistic VR simulation, and ensure their experiences are perfectly synchronized. From the user’s perspective, interacting with this setup would feel indistinguishable from interacting with a single coherent person. Furthermore, if the system is subjected to suffering, the suffering would be multiplied by the number of instances in which the experience is run. This shows that coherence doesn’t necessarily depend on being a “single” system.
2. An LLM model doesn’t clearly distinguish the text it generates from the text the user inputs.
Firstly, this claim isn’t accurate. If you provide an LLM with the transcript of a conversation, it can often identify which parts are its responses and which parts are user inputs. This is an empirically testable claim. Moreover, statements about how LLMs process text don’t necessarily negate the possibility of them being coherent personas. For instance, it’s conceivable that an LLM could function exactly as described and still be a coherent persona.
- Derek Shiller 20 Nov 2024 18:33 UTC
  8 points
  2 ∶ 0
  I appreciate the pushback on these claims, but I want to flag that you seem to be reading too much into the post. The arguments that I provide aren’t intended to support the conclusion that we shouldn’t treat “I feel pain” as a genuine indicator or that there definitively aren’t coherent persons involved in chatbot text production. Rather, I think people tend to think of their interactions with chatbots in the way they interact with other people, and there are substantial differences that are worth pointing out. I point out four differences. These differences are relevant to assessing personhood, but I don’t claim any particular thing I say has any straightforward bearing on such assessments. Rather, I think it is important to be mindful of these differences when you evaluate LLMs for personhood and moral status. These considerations will affect how you should read different pieces of evidence. A good example of this is the discussion of the studies in the self-identification section. Should you take the trouble LLMs have with counting tokens as evidence that they can’t introspect? No, I don’t think it provides particularly good evidence, because it relies on the assumption that LLMs self-identify with the AI assistant in the dialogue and it is very hard to independently tell whether they do.
  “Firstly, this claim isn’t accurate. If you provide an LLM with the transcript of a conversation, it can often identify which parts are its responses and which parts are user inputs. This is an empirically testable claim. Moreover, statements about how LLMs process text don’t necessarily negate the possibility of them being coherent personas. For instance, it’s conceivable that an LLM could function exactly as described and still be a coherent persona.”
  I take it that you mean that LLMs can distinguish their text from others, presumably on the basis of statistical trends, so they can recognize text that reads like the text they would produce? This seems fully in line with what I say: what is important is that LLMs don’t make any internal computational distinction between processing text they are reading and text they are producing. The model functions as a mapping from inputs to outputs, and the mapping changes solely based on words and not their source. If you feed them text that is like the text they would produce, they can’t tell whether or not they produced it. This is very different from the experience of a human conversational partner, who can tell the difference between being spoken to and speaking, and doesn’t need to rely on distinguishing whether words sound like something they might say. More importantly, LLMs don’t know, in the moment they are processing a given token, whether they are in the middle of reading a block of user-supplied text or providing additional text through autoregressive text generation.
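To make the "mapping from inputs to outputs" point concrete, here is a minimal toy sketch. It is not how any real LLM is implemented: `toy_next_token` is a hypothetical stand-in for a forward pass, and the lookup table is invented for illustration. The only thing it is meant to show is that the next-token function receives a sequence of tokens and nothing about where each token came from.

```python
from typing import Callable, List

Token = str
# Assumption: a "model" is treated as a pure function from a token
# sequence to the next token. Real models return a distribution.
NextToken = Callable[[List[Token]], Token]


def toy_next_token(context: List[Token]) -> Token:
    """Hypothetical stand-in for a forward pass.

    It depends only on the tokens in the context; there is no argument
    recording whether a token was typed by a user or generated earlier.
    """
    table = {"hello": "there", "there": "friend", "friend": "<eos>"}
    return table.get(context[-1], "<eos>")


def generate(model: NextToken, prompt: List[Token], max_new_tokens: int = 8) -> List[Token]:
    """Plain autoregressive loop: append each new token and call the model again."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = model(context)
        if token == "<eos>":
            break
        # Once appended, a generated token is indistinguishable from prompt text.
        context.append(token)
    return context


# Case A: the user supplies "hello"; the model generates the rest.
case_a = generate(toy_next_token, ["hello"])
# Case B: the user supplies "hello there", text the model would itself have produced.
case_b = generate(toy_next_token, ["hello", "there"])

# Both runs end in the same state, and every call in B saw the same
# input as the corresponding call in A.
assert case_a == case_b == ["hello", "there", "friend"]
```

Deployed chat systems typically insert role markers into the sequence, which this toy omits. Those markers are themselves just tokens in the input, so the sketch's point still holds under that assumption: whatever the model can tell about authorship, it has to read off the text.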