The boring answers
Don’t give your AI system excess compute, ideally enforced at the hardware level. Run it on a small isolated machine, not a 0.1% timeshare on a supercomputer.
Use the coding practices developed by NASA to minimize standard bugs.
Record all random seeds and input data to make everything reproducible.
Put in hard-coded sanity checks between the AI and its output. A robot arm isn’t allowed to move beyond safe limits, enforced by simple code of the form min(AI_OUTPUT, MAXIMUM_ARM_ANGLE).
Keep humans in the loop, checking the outputs.
Hardware minimization of unneeded action space. Suppose you are programming a social media bot, and the AI is quite smart. If a single computer both runs the AI and has a network connection, then an AI that pulls off a rowhammer attack can send arbitrary internet packets. Instead, Computer A runs the AI and sends raw ASCII down a wire to Computer B. Computer B applies every simple check that you would expect to hold in normal operation, and performs a hard shutdown if it spots anything odd (e.g. funny characters, or a post that is too long or too short). Computer A of course has no other connections, has a power supply resistant to modulated energy demand attacks, has a radio and audio noise generator next to it, and sits in a Faraday cage.
Use 2 or 3 redundant computers, to guard against cosmic ray errors.
Thermite on a timer, to slag the whole computer system. This stops the AI from thinking for too long, and makes it harder for it to hide info for future versions of itself.
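A few of the boring answers are simple enough to sketch in code. The Python below is a minimal illustration, not a vetted safety layer: it shows an arm-angle clamp, the kind of check Computer B might run on each post, and a majority vote across redundant machines. The numeric limits, the allowed character set, and the names MIN_ARM_ANGLE, clamp_arm_angle, post_is_sane and majority_vote are all assumptions made up for the example.

```python
import math
import string
from collections import Counter

# Hypothetical limits, chosen only for illustration.
MIN_ARM_ANGLE = -90.0
MAXIMUM_ARM_ANGLE = 90.0
MIN_POST_LEN = 1
MAX_POST_LEN = 280
ALLOWED_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " \n")


def clamp_arm_angle(ai_output: float) -> float:
    """Force the requested angle into the safe range, whatever the AI asked for."""
    if math.isnan(ai_output):
        # NaN would slip through min/max comparisons, so reject it outright.
        raise ValueError("AI output is NaN")
    return max(MIN_ARM_ANGLE, min(ai_output, MAXIMUM_ARM_ANGLE))


def post_is_sane(raw: bytes) -> bool:
    """Computer B's check: pure ASCII, allowed characters only, sensible length."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    if not (MIN_POST_LEN <= len(text) <= MAX_POST_LEN):
        return False
    return all(ch in ALLOWED_CHARS for ch in text)


def majority_vote(results):
    """Return the value a strict majority of redundant machines agree on, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count * 2 > len(results) else None
```

In this sketch, a False from post_is_sane or a None from majority_vote is the signal for the hard shutdown. The shutdown itself is left out, since it should live in hardware rather than in the same program.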
The speculative answers
Homomorphic encryption.
Run it on a quantum computer using amplitude amplification techniques. Any technique for breaking out of the box must then be reversible to work, or use new physics.
Honeypots that look like good hacking targets.
Provide a fairly easy way for any moderately smart AI to wirehead, setting its own utility to maximum (and triggering some alarm or delayed shutdown).
The silly answers
Chuck the AI system into a black hole, set to only start computing after it crosses the event horizon.