This is a fair point, but I’m not sure why it wants to kill humans.
Like, my point here is not just ‘we’ll train it out of its natural tendency to kill humans’; it’s more like ‘if we’re giving it its natural tendencies in the first place, through training, how does it get that one?’ (There are arguments about instrumental convergence and such, but I say some stuff about that in the post.)