I want (and I suspect you also want) AI systems to have such incentivization.
Not obviously. My point is just that if the AI is aligned with a human principal, and that human principal can be held accountable for the AI’s actions, then that automatically disincentivizes AI systems from breaking the law.
(I’m not particularly opposed to AI systems being disincentivized directly, e.g. by making it possible to hold AI systems accountable for their actions. It just doesn’t seem necessary in a world where we’ve solved alignment.)
I don’t see why (from a societal perspective) we shouldn’t just do that on the actor’s side and not the “police’s” side.
I agree that doing it on the actor’s side is better if you can ensure it for all actors, but you also have to prevent the human principal from simply switching to a different actor that isn’t bound by law.
E.g. if you have a chauffeur who refuses to exceed the speed limit (in a country where the enforced speed limit is effectively 10 mph above the posted one), you fire that chauffeur and find a different one.
(Also, I’m assuming you’re teaching the agent to follow the law via something like case 2 above, where you have it read the law and understand it using its existing abilities, and then train it somehow to not break the law. If you were instead thinking something like case 1, I’d make the second argument that it isn’t likely to work.)