Hmm, my guess is that by the time a system might succeed at takeover (i.e. has more than something like a 5% chance of actually disempowering all of humanity permanently), its behavior and thinking will be quite rational. I agree that there will probably be AIs taking reckless action earlier than that, but insofar as an AI is actually posing a risk of takeover, I do expect it to behave pretty rationally overall.
I agree with “pretty rationally overall” with respect to general world modelling, but I think some of the stuff about how it relates to its own values / future selves is a bit of a different magisterium, and it wouldn’t be too surprising if (1) it hadn’t been selected for rationality/competence on this dimension, and (2) the general rationality didn’t really transfer over.