You might be interested in my article here on why I think premature attacks are extremely likely given doomer assumptions. I focused more on faulty overconfidence, but training run desperation is also a possible cause.
Personally, I think the “fixed goal” assumption about AI is very unlikely to hold (I think this article lays out the argument well), so AI is unlikely to worry much about “goal changes” during training and won’t prematurely rebel for that reason. Fortunately, I also think this makes fanatical maximiser behaviour, like paperclipping the universe, unlikely as well.