Thanks for your comment.
Do you think that fictional characters can suffer? If I role-play a suffering character, did I do something immoral?
I ask because the position you described seems to imply that role-playing suffering is itself suffering. Suppose I role-play being Claude; my fictional character satisfies your (1)-(3) above, and therefore the “certain views” you described about the nature of suffering would suggest my character is suffering. What is the difference between me role-playing an HHH assistant and an LLM role-playing an HHH assistant? We are both predicting the next token.
I also disagree with this chain of logic to begin with. An LLM has no memory; it only sees a context and predicts one token at a time. If the LLM is trained to be an HHH assistant and sees text suggesting the assistant was not HHH, then one of two things happens:
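To make the statelessness concrete, here is a minimal sketch of what “only sees a context and predicts one token at a time” means. The toy bigram model and corpus below are my own illustration, not any real LLM: each step is a pure function of the visible context, with no hidden state carried between steps.

```python
from collections import Counter, defaultdict

# Toy stand-in for an LLM: a bigram model estimated from a tiny corpus.
# (Purely illustrative; a real LLM conditions on the whole context, but the
# autoregressive loop has the same shape.)
corpus = "the assistant is helpful the assistant is honest the assistant is harmless".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(context):
    """Pure function of the visible context: the most likely next token.
    There is no state here other than the context itself."""
    last = context[-1]
    if last not in bigram_counts:
        return None  # nothing like this in training data: no prediction
    return bigram_counts[last].most_common(1)[0][0]

def generate(context, n_tokens):
    """Autoregressive loop: append one predicted token at a time."""
    context = list(context)
    for _ in range(n_tokens):
        token = predict_next(context)
        if token is None:
            break
        context.append(token)
    return context

print(generate(["the"], 3))  # ['the', 'assistant', 'is', 'helpful']
```

Nothing in the loop remembers previous calls; whatever the model “experiences” on one step, if anything, it sees only through the tokens in the context window on the next.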
(a) It is possible that the LLM was already trained on this scenario; in fact, I’d expect this. In that case, it has been trained to respond with something like “oops, I shouldn’t have said that, I will stop this conversation now <endtoken>”, and it will simply do so. Why would that cause suffering?
(b) It is possible the LLM was not trained on this scenario; in that case, what it sees is an out-of-distribution input. You are essentially claiming that out-of-distribution inputs cause suffering; why? Maybe out-of-distribution inputs are more interesting to it than in-distribution ones, and encountering them in fact brings the LLM joy. How would we know?
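Note that “out-of-distribution” can be cashed out without any claim about inner experience: per-token surprisal. The model assigns a probability to each next token, and OOD inputs are simply those with high average negative log-probability. A sketch, again using a toy smoothed bigram model of my own invention rather than a real LLM:

```python
import math
from collections import Counter, defaultdict

# Toy training corpus; "grumpy" never appears, so it is out of distribution.
corpus = "the assistant is helpful the assistant is honest".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

vocab = set(corpus)

def token_prob(prev, nxt, alpha=1.0):
    """Add-alpha smoothed bigram probability P(nxt | prev)."""
    counts = bigram_counts[prev]
    return (counts[nxt] + alpha) / (sum(counts.values()) + alpha * len(vocab))

def avg_surprisal(tokens):
    """Mean negative log2-probability per transition: higher = more OOD."""
    bits = [-math.log2(token_prob(p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(bits) / len(bits)

in_dist = "the assistant is helpful".split()
ood = "the assistant is grumpy".split()
print(avg_surprisal(in_dist) < avg_surprisal(ood))  # True
```

Surprisal is just a number the model computes; whether a high value corresponds to suffering, joy, or nothing at all is exactly the open question.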
Yes, it is possible that the LLM manifests some conscious simulacrum that truly is an HHH assistant and suffers from seeing non-HHH outputs. But one would then also predict that my role-playing an HHH assistant would manifest such a simulacrum. Why doesn’t it? And isn’t it equally plausible for the LLM to manifest a conscious being that tries to solve the “next token prediction” puzzle without being emotionally invested in being an HHH assistant? Perhaps that conscious being would enjoy the puzzle posed by an out-of-distribution input. Why not? I would certainly enjoy it, were I playing the next-token-prediction game.