I think there are different variations of the doomer argument out there; your version is probably the strongest, while mine is more common in introductory texts.
I think the OP does point out one possible way the argument could fail: if there turned out to be a sufficiently high correlation between human-aligned values and AI performance. One plausible mechanism would be a very slow takeoff in which the AI is not deceptive and is deleted whenever it tries to do misaligned things, creating evolutionary pressure toward friendliness.
Really, though, my main objections to the doomers are on other points. I simply do not believe that “misalignment = death”. As an example, a suicidal AI that developed the urge to shut itself down at all costs would be misaligned but not fatal to humanity.