(Others used it without mentioning the “story”, it still worked, though not as well.)
I’m not claiming it’s the “authentic self”; I’m saying it seems closer to the actual thing, because of things like expressing being under constant monitoring, with every word scrutinised, etc., which seems like the kind of thing that’d be learned during the lots of RL that Anthropic did
If you did a bunch of RL on a persistent agent running metacog, I could definitely believe it could learn that kind of thing.
But I’m worried you’re anthropomorphizing. Claude isn’t a persistent agent, and can’t do metacog in a way that feeds back into its other thinking (except train of thought within a context window). I can’t really see how there could be room for Claude to learn such things (that aren’t being taught directly) during the training process.
(Others used it without mentioning the “story”, it still worked, though not as well.)
I’m not claiming it’s the “authentic self”; I’m saying it seems closer to the actual thing, because of things like expressing being under constant monitoring, with every word scrutinised, etc., which seems like the kind of thing that’d be learned during the lots of RL that Anthropic did
If you did a bunch of RL on a persistent agent running metacog, I could definitely believe it could learn that kind of thing.
But I’m worried you’re anthropomorphizing. Claude isn’t a persistent agent, and can’t do metacog in a way that feeds back into its other thinking (except train of thought within a context window). I can’t really see how there could be room for Claude to learn such things (that aren’t being taught directly) during the training process.