In that sentence I meant “a treacherous turn that leads to an existential catastrophe”, so I don’t think the example you link updates me strongly on that.
While Luke talks about that scenario as an example of a treacherous turn, you could equally well describe it as an example of “deception”: the evolved creatures are “artificially” reducing their rates of reproduction to give the supervisor / algorithm a “false belief” that they are bad at reproducing. Another example along these lines is when a robot hand “deceives” its human overseer into thinking that it has grasped a ball, when it is in fact merely positioned in front of the ball.
Really, though, I don’t think these examples are that informative, because it doesn’t seem reasonable to say that the AI system is “trying” to do something in them, or that it does anything “deliberately”. These behaviors were learned through trial and error. An existential-catastrophe-style treacherous turn would presumably not happen through trial and error. (Even if it did, there must have been at least some cases where the system tried and failed to take over the world, which would be a clear and obvious warning shot that we for some reason completely ignored.)
(If it isn’t clear, the thing that I care about is something like “will there be some ‘warning shot’ that greatly increases the level of concern people have about AI systems, before it is too late”.)
That makes sense. Thanks for the comment!