Oh, the whole story is strictly speaking unnecessary :). There are disjunctively many stories for an escape or disaster, and I’m not trying to paint a picture of the most minimal or the most likely barebones scenario.
The point is to serve as a ‘near mode’ visualization of such a scenario to stretch your mind, as opposed to a very ‘far mode’ observation like “hey, an AI could make a plan to take over its reward channel”, which is true but comes with a distinct lack of flavor. So for that purpose, stuffing in more weird mechanics before a reward-hacking twist is better, even if I could have simply skipped to “HQU does more planning than usual for an HQU and realizes it could maximize its reward by taking over its computer”. Yeah, sure, but that’s boring and doesn’t exercise your brain any more than the countless mentions of reward-hacking that a reader has already seen.
Yeah, a story this complicated isn’t good for introducing people to AI risk (because they’ll assume the added details are necessary for the outcome), but it’s great for making the story more interesting and real-feeling.
The real world is less cute and funny, but is typically even more derpy / inelegant / garden-pathy / full of bizarre details.