“it seems to me that all AIs (and other technologies) already don’t give us exactly what we want but we don’t call that outer misaligned because they are not “agentic” (enough?)”

Just responding to this part – my sense is that most of the reason current systems don’t do what we want comes down to capabilities failures, not alignment failures. That is, it’s less that the system has been given the wrong goal or is suffering goal misgeneralization, and more that it simply isn’t competent enough.
Wow, I didn’t expect a response. I didn’t know shortforms were that accessible; I thought I was just rambling on my profile. So I should clarify that by “what we actually want” I mean our actual terminal goals (if we have those).
So what I’m saying is that we don’t train AIs, or create any other technology, to pursue our terminal goals; we create them to do other, narrower things (narrow, of course, because they don’t have high capabilities). But the moment we create something that can take over the world, the fact that we didn’t create it to pursue our terminal goals suddenly becomes a problem.
I’m not trying to explain why present technologies fail. My point is that misalignment is not something that appears only with the creation of powerful AIs; that is simply the moment when it becomes a problem, and that’s why such a system has to be created with a different mentality than any other technology.