> The challenge isn’t figuring out some complicated, nuanced utility function that “represents human values”; the challenge is getting AIs to do what it says on the tin—to reliably do whatever a human operator tells them to do.
Why do you think this? I infer from what I’ve seen written in other posts and comments that this is a common belief, but I can’t find the reasons behind it.
The fact that there are specific, really difficult problems with aligning ML systems doesn’t mean that the original, really difficult problem of finding and specifying the objectives we want for a superintelligence has been solved.
I hate this framing because it makes it seem like alignment is a purely technical problem that can be solved by a single team, and, as you put it in your other post, that we should just race and win against the bad guys.
I could try to envision what type of AI you are thinking of and how you would use it, but I would prefer that you tell me. So, what would you ask your aligned AGI to do, and how would it interpret that? And how are you so sure that most alignment researchers would ask it the same things you would?