Note that I’m conditioning on AIs successfully taking over, which is strong evidence against human success at creating desirable AIs.
I don’t think it’s strong evidence, for what it’s worth. I’m also not sure what “AI takeover” means, and I think existing definitions are very ambiguous (would we say Europe took over the world during the age of imperialism? Are smart people currently in control of the world? Have politicians, as a class, taken over the world?). Depending on the definition, I tend to think that AI takeover is either ~inevitable and not inherently bad, or bad but not particularly likely.
This outcomes-based feedback results in selecting AIs for carefully tricking their human overseers in a variety of cases and generally ruthlessly pursuing reward.
Would aliens not also be incentivized to trick us or others? What about other humans? In my opinion, basically all the arguments about AI deception from gradient descent apply in some form to other methods of selecting minds, including evolution by natural selection, cultural learning, and in-lifetime learning. Humans frequently lie to or mislead each other about their motives. For example, if you ask a human what they’d do if they became world dictator, I suspect you’d often get a different answer than the one they’d actually choose if given that power. I think this is essentially the same epistemic position we might occupy with AI.
Also, for a bunch of reasons that I don’t currently feel like elaborating on, I expect humans to anticipate, test for, and circumvent the most egregious forms of AI deception in practice. The most important point here is that I’m not convinced that incentives for deception are much worse for AIs than for other actors in different training regimes (including humans, uplifted dogs, and aliens).
I don’t think it’s strong evidence, for what it’s worth. I’m also not sure what “AI takeover” means, and I think existing definitions are very ambiguous (would we say Europe took over the world during the age of imperialism? Are smart people currently in control of the world? Have politicians, as a class, taken over the world?). Depending on the definition, I tend to think that AI takeover is either ~inevitable and not inherently bad, or bad but not particularly likely.
By “AI takeover”, I mean an autonomous AI coup/revolution, e.g., violating the law and/or subverting the normal mechanisms of power transfer. (It’s somewhat unclear exactly what should count, to be clear, but there are some central examples.) By this definition, it basically always involves subverting the intentions of the AI’s creators, though it may not involve violent conflict.
I don’t think this is super likely; perhaps a 25% chance.
Also, for a bunch of reasons that I don’t currently feel like elaborating on, I expect humans to anticipate, test for, and circumvent the most egregious forms of AI deception in practice. The most important point here is that I’m not convinced that incentives for deception are much worse for AIs than for other actors in different training regimes (including humans, uplifted dogs, and aliens).
I don’t strongly disagree with either of these claims, but this isn’t exactly where my crux lies.
The key thing is “generally ruthlessly pursuing reward”.
The key thing is “generally ruthlessly pursuing reward”.
It depends heavily on what you mean by this, but I’m kinda skeptical of the strong version of ruthless reward seekers, for reasons similar to those given in this post. I think AIs by default might be ruthless in some other senses (since we’ll be applying a lot of selection pressure to them to get good behavior), but I’m not really sure how much weight to put on the fact that AIs will be “ruthless” when evaluating how good they are at being our successors. It’s not clear how that affects my evaluation of how much I’d be OK handing the universe over to them, and my guess is the answer is “not much” (absent more details).
Humans seem pretty ruthless in certain respects too, e.g., about survival or increasing their social status. I’d expect aliens, and potentially uplifted dogs, to be ruthless along some axes too, depending on how we uplifted them.
I’m checking out of this conversation though.
Alright, that’s fine.