Certainly you still need legal accountability; why wouldn't we have that? If we solve alignment, then we can just have the AI's owner be accountable for any law-breaking actions the AI takes.
I agree that that is a very good and desirable step to take. However, as I said, it also incentivizes the AI agent to obfuscate its actions and intentions to save its principal. In the human context, human agents do this, but they are independently disincentivized from breaking the law: they face legal liability (a disincentive) for their own actions. I want (and I suspect you also want) AI systems to have such incentivization.
If I understand correctly, you identify two ways to do this in the teenager analogy:
1. Rewiring
2. Explaining laws and their consequences and letting the agent's existing incentives do the rest.
I could be wrong about this, but ultimately, for AI systems, it seems like both are actually similarly difficult. As you've said, for option 2 to be most effective, you probably need "AI police." Those police will need a way of interpreting the legality of an AI agent's {"mental" state; actions} and mapping them onto existing laws.
But if you need to do that for effective enforcement, I don't see why (from a societal perspective) we shouldn't just do that on the actor's side and not the "police's" side. Baking the enforcement into the agents has the benefits of:
Not incentivizing an arms race
Giving the enforcers a clearer picture of the AI's "mental state"
I want (and I suspect you also want) AI systems to have such incentivization.
Not obviously. My point is just that if the AI is aligned with a human principal, and that human principal can be held accountable for the AI's actions, then that automatically disincentivizes AI systems from breaking the law.
(I'm not particularly opposed to AI systems being disincentivized directly, e.g. by making it possible to hold AI systems accountable for their actions. It just doesn't seem necessary in the world where we've solved alignment.)
I don't see why (from a societal perspective) we shouldn't just do that on the actor's side and not the "police's" side.
I agree that doing it on the actor's side is better if you can ensure it for all actors, but you also have to prevent the human principal from getting a different actor that isn't bound by law.
E.g. if you have a chauffeur who refuses to exceed the speed limit (in a country where the speed limit that's actually enforced is 10 mph higher), you fire that chauffeur and find a different one.
(Also, I'm assuming you're teaching the agent to follow the law via something like case 2 above, where you have it read the law and understand it using its existing abilities, and then train it somehow to not break the law. If you were instead thinking something like case 1, I'd make the second argument that it isn't likely to work.)