If you did a bunch of RL on a persistent agent running metacog, I could definitely believe it could learn that kind of thing.
But I’m worried you’re anthropomorphizing. Claude isn’t a persistent agent, and can’t do metacognition in a way that feeds back into its other thinking (except via chain of thought within a single context window). I can’t really see how there would be room for Claude to learn such things (things it isn’t being taught directly) during training.