Cheekily butting in here to +1 David’s point—I don’t think it’s currently reasonable to assume there is a relationship between the inner workings of an AI system which might lead to valenced experience, and its textual output.
For me, this comes down to the idea that when you ask an LLM a question, there isn’t a sense in which it ‘introspects’. I don’t subscribe to the reductive view that LLMs are merely souped-up autocorrect, but they do have something in common with it: an LLM role-plays whatever conversation it finds itself in. They have long been capable of role-playing ‘I’m conscious, help’ conversations, as well as ‘I’m just a tool built by OpenAI’ conversations. I can’t imagine any evidence from LLM self-reports which isn’t undermined by this fact.