Post summary (feel free to suggest edits!):
Argues that statements by large language models that seem to report their internal life (e.g. ‘I feel scared because I don’t know what to do’) aren’t straightforward evidence either for or against the sentience of that model. As an analogy, parrots are probably sentient and very likely feel pain. But when they say ‘I feel pain’, that doesn’t mean they are in pain.
It might be possible to train systems to report more accurately whether they are sentient, by removing other incentives for saying conscious-sounding things and training them to report their own mental states. However, this could advance dangerous capabilities like situational awareness, and training on self-reflection might also be what ends up making a system sentient.
(This will appear in this week’s forum summary. If you’d like to see more summaries of top EA and LW forum posts, check out the Weekly Summaries series.)
I like it! I think one thing the post itself could have been clearer on is that reports could be indirect evidence for sentience, in that they are evidence of certain capabilities that are themselves evidence of sentience. To give an example (though it’s still abstract): the ability of LLMs to fluently mimic human speech → evidence for capability C → evidence for sentience. You can imagine the same thing for parrots: the ability to say “I’m in pain” → evidence of learning and memory → evidence of sentience. But they aren’t themselves reports of sentience.
So maybe at the beginning: “aren’t strong evidence” or “aren’t straightforward evidence”.
Fixed, thanks!