Thank you for the edit, and thank you again for your interest. I'm still not sure what you mean by a person "having access to the ground truth of the universe". There's just no sense I can think of in which this is a requirement for the mentor.
"The system is only safe if the mentor knows what is safe." It's true that if the mentor kills everyone, then the combined mentor-agent system would kill everyone, but surely that fact doesn't weigh against this proposal at all. In any case, and more importantly: a) the agent will not aim to kill everyone regardless of whether the mentor would (Corollary 14), which I think refutes your comment; and b) for no theorem in the paper does the mentor need to know what is safe; for Theorem 11 to be interesting, he just needs to act safely (an important difference for a concept so tricky to articulate!). But I decided these details were beside the point for this post, which is why I only cited Corollary 14 in the OP, not Theorem 11.
Do you have a minute to react to this? Are you satisfied with my response?