The product you are building only gives advice; it doesn’t take actions.
If this would be enough, couldn’t we make a normal AGI and only ask it for advice without giving it the capability to take actions?
For AGI there isn’t much of a distinction between giving advice and taking actions, so this isn’t part of our argument for safety in the long run. But in the time between here and AGI it’s better to focus on supporting reasoning to help us figure out how to manage this precarious situation.
Do I understand correctly that “safety in the long run” is unrelated to what you’re currently doing in any negative way: you don’t think you’re advancing AGI-relevant capabilities (and so there is no need to try to align-or-whatever your forever-well-below-AGI system)?
Please feel free to correct me!
No, it’s that our case for alignment doesn’t rest on “the system is only giving advice” as a step. I sketched the actual case in this comment.