Its not role playing if the AI behaves completely differently except when the instruction set is specifically set up so the AI parses it as role playing, in this case with other glaringly obviously unrealistic scene-setting like sending it a config file with it’s own monitoring settings along with a logic puzzle involving a concealed goal. I can get a bucket of water to conduct a sneak attack on me if I put enough effort into setting it up to fall on my head, but that doesn’t mean I need to worry about being ambushed by water on a regular basis!
Re “role-playing”, that is moot when it’s the end result that matters—what actions the AI takes in the world. See also: Frontier AI systems have surpassed the self-replicating red line.
Its not role playing if the AI behaves completely differently except when the instruction set is specifically set up so the AI parses it as role playing, in this case with other glaringly obviously unrealistic scene-setting like sending it a config file with it’s own monitoring settings along with a logic puzzle involving a concealed goal. I can get a bucket of water to conduct a sneak attack on me if I put enough effort into setting it up to fall on my head, but that doesn’t mean I need to worry about being ambushed by water on a regular basis!