Couldn’t the AI end up misaligned with its owners by accident, even if the owners are aligned with the rest of humanity?
Yes, but as I said earlier, when I talk about enforcement I’m assuming the alignment problem has already been solved; I am not proposing enforcement as a solution to alignment.
If you haven’t solved the alignment problem, enforcement doesn’t help much: you can’t rely on AI-enabled police to catch AI-enabled criminals, since the police AI itself may not be aligned with the police.
The question is whether 1 or 2 is better at aligning the AI in cases where enforcement is impossible or explicitly prevented.
Case 2 assumes that you already have an intelligent agent with motivations, and then tries to deal with that after the fact. I agree this is not going to work for alignment. If for some reason I could only do 1 or 2 for alignment, I would try 1. (But there are in fact a bunch of other things you can do.)