Would you rather prove the friendliness of 100 duck-sized horse AIs or one horse-sized duck AI?
One horse-sized duck AI. For one thing, the duck is the ultimate (route) optimization process: you can ride it on land, sea, or air. For another, capabilities scale very nonlinearly in size; the neigh of even 1000 duck-sized horse AIs does not compare to the quack of a single horse-sized duck AI. Most importantly, if you can safely do something with 100 opposite-sized AIs, you can safely do the same thing with one opposite-sized AI.
In all seriousness, though, we don’t generally think in terms of “proving the friendliness” of an AI system. When doing research, we might prove that certain proposals have flaws (for example, see (1)) as a way of eliminating bad ideas in the pursuit of good ones. And given a realistic system, one could likely prove certain high-level statistical features (such as “this component of the system has an error rate that vanishes under thus-and-such assumptions”), though it’s not yet clear how useful those proofs would be. Overall, though, the main challenges in friendly AI seem to be ones of design rather than verification. In other words, the problem is first to figure out what properties an aligned system should possess (rather than how to prove them), and then to design a system that satisfies those properties. What properties would constitute friendliness, and what assumptions about the world do they rely on? I expect that answering these questions in formal detail would get us most of the way towards a design for an aligned AI, even if the only guarantees we can give about the actual system are statistical.
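As a toy illustration of the kind of statistical guarantee meant above (this is just a standard Hoeffding concentration bound, not anything from (1)): suppose a component is evaluated on $n$ i.i.d. test queries, has true error rate $p$, and we observe an empirical error rate $\hat{p}$. Then

$$\Pr\big(|\hat{p} - p| \ge \varepsilon\big) \le 2\exp\!\left(-2n\varepsilon^2\right),$$

a failure probability that vanishes as $n$ grows. But the bound holds only under the i.i.d. assumption; if the deployment distribution differs from the test distribution, it says nothing. That is exactly the sort of “thus-and-such assumption” that would need to be made explicit.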
(1) Soares, Nate, et al. “Corrigibility.” Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.