The idea of ‘alignment’ presupposes that you cannot control the computer, that it has a will of its own, and so you need to ‘align’ it, i.e. incentivise it. But this isn’t the case: we can control these systems.
It’s true that machine learning AIs can generate their own instructions and perform tasks, but we still maintain overall control. We can constrain both inputs and outputs: the ‘intelligent’ machine learning part of the system can be nested within constraints that prevent unwanted outcomes. For instance, ask an AI a question about feeling suicidal and you’ll probably get an answer that was written by a human. That’s what I got last time I checked, and the conversation was abruptly ended.
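The nesting described above can be sketched in a few lines. This is a minimal illustrative example, not any vendor’s actual safeguard: the wrapper, the keyword list, the canned response, and the `model_answer` placeholder are all hypothetical names introduced here for illustration.

```python
# A minimal sketch of nesting an ML model inside hard input/output
# constraints. `model_answer` is a hypothetical stand-in for any
# generative model; the keyword list and canned response are
# illustrative assumptions.

CRISIS_KEYWORDS = {"suicidal", "suicide", "self-harm"}

HUMAN_WRITTEN_RESPONSE = (
    "If you're struggling, please reach out to a crisis line "
    "or someone you trust. This conversation has been ended."
)

def model_answer(prompt: str) -> str:
    # Placeholder for the 'intelligent' machine learning component.
    return f"Model-generated answer to: {prompt}"

def guarded_answer(prompt: str) -> str:
    # Input constraint: intercept flagged prompts before the model runs.
    if any(word in prompt.lower() for word in CRISIS_KEYWORDS):
        return HUMAN_WRITTEN_RESPONSE
    reply = model_answer(prompt)
    # Output constraint: the model's reply is checked on the way out too,
    # so the ML component never speaks to the user unmediated.
    if any(word in reply.lower() for word in CRISIS_KEYWORDS):
        return HUMAN_WRITTEN_RESPONSE
    return reply
```

The point is architectural: the deterministic wrapper, not the learned model, has the last word on what reaches the user.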