[Question] What are the challenges and problems with programming law-breaking constraints into AGI?

I’m thinking the objective function could have constraints on the expected number of times the AI breaks the law, or the probability that it breaks the law, e.g.

  • only actions with a probability of breaking any law < 0.0001 are permissible, or

  • only actions for which the expected number of broken laws is < 0.001 are permissible.

There could also be separate constraints for individual laws or groups of laws, and these could depend on the severity of the penalties (see the sketch below).
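To make this concrete, here is a minimal sketch (in Python) of what such a chance-constrained action filter might look like. The predictor `estimate_violation_probs`, the threshold values, and the per-law caps are all hypothetical, and the predictor is of course where nearly all of the difficulty would live:

```python
# Minimal sketch, not a proposal: filter candidate actions by an estimated
# probability of breaking any law, plus optional stricter caps for individual
# laws (e.g. ones with severe penalties). `estimate_violation_probs` is a
# hypothetical model mapping an action to P(action violates law) for each law.
from typing import Callable, Dict, List, Optional


def permissible_actions(
    actions: List[str],
    estimate_violation_probs: Callable[[str], Dict[str, float]],
    any_law_threshold: float = 1e-4,                  # cap on P(break any law)
    per_law_caps: Optional[Dict[str, float]] = None,  # stricter caps for specific laws
) -> List[str]:
    """Return the candidate actions that satisfy the chance constraints."""
    allowed = []
    for action in actions:
        probs = estimate_violation_probs(action)  # law -> P(violation)
        # Union bound: P(break any law) <= sum of the individual probabilities.
        if sum(probs.values()) >= any_law_threshold:
            continue
        if per_law_caps and any(
            probs.get(law, 0.0) >= cap for law, cap in per_law_caps.items()
        ):
            continue
        allowed.append(action)
    return allowed
```

The agent would then pick whichever action in `permissible_actions(...)` scores best on its ordinary objective, rather than treating law-avoidance lexically; this is how the "never do anything" failure mode is meant to be avoided.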

Looser constraints like this seem like they could avoid issues of lexicality, i.e. prioritizing avoidance of law-breaking over everything we actually want the AI to do, since the surest way to avoid breaking the law completely would be to never do anything at all (although we could also have a separate constraint against that).

Of course, the constraints should depend on breaking the law, not just being caught breaking the law, so the AI should predict whether or not it will break the law, not merely whether or not it will be caught breaking the law.

The AI could also predict whether or not it will break laws that don’t exist now but will in the future (possibly even laws created in response to its actions).

What are the challenges and problems with such an approach? Would it be too difficult to capture such constraints? Are laws too imprecise or ambiguous for this? Could we just have the AI consider multiple interpretations of the laws, or try to predict how a human (or a human judge) would interpret the law and apply it to the AI’s actions, given the information the AI has?
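On the ambiguity point, one simple (and admittedly idealized) way to handle multiple interpretations would be to treat the interpretation a judge would adopt as uncertain and marginalize over it. A sketch with purely illustrative names and numbers:

```python
# Sketch: P(violation) = sum over interpretations of
#   P(judge adopts interpretation) * P(action violates law | interpretation).
# The interpretations, weights, and probabilities below are made up.
from typing import Dict


def prob_violation_under_ambiguity(
    interpretation_weights: Dict[str, float],  # interpretation -> P(judge adopts it)
    violation_prob_given: Dict[str, float],    # interpretation -> P(violation | interpretation)
) -> float:
    return sum(
        weight * violation_prob_given.get(interpretation, 0.0)
        for interpretation, weight in interpretation_weights.items()
    )


# Example: a statute with a narrow and a broad plausible reading.
print(prob_violation_under_ambiguity(
    {"narrow reading": 0.7, "broad reading": 0.3},
    {"narrow reading": 0.0001, "broad reading": 0.02},
))  # 0.00607
```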

How much work should the AI spend on estimating the probabilities that it will break laws?

What kinds of cases would it miss, say, given current laws?
