I think a main point of disagreement is that I don’t think systems need to be “dangerous maximizers” in the sense you described in order to predictably disempower humanity and then kill everyone. Humans aren’t dangerous maximizers, yet we’ve killed many species of animals, the Neanderthals, and various other human groups (genocide, wars, oppression of populations by governments, etc.). Katja’s scenario sounds plausible to me except for the part where somehow it all turns out OK in the end for humans. :)
Another, related point of disagreement:
> “look, LLMs distill human cognition, much of this cognition implicitly contains plans, human-like value judgements, etc.” I start from a place where I currently believe “future systems have human-like inductive biases” will be a better predictive abstraction than “randomly sample from the space of simplicity-weighted plans”. And … I just don’t currently see the argument for rejecting my current view?
I actually agree that current and future systems will have human-like concepts, human-like inductive biases, etc. -- relative to the space of all possible minds, at least. But their values will be sufficiently alien that humanity will be in deep trouble. (Analogy: Suppose we bred some octopi to be smarter and smarter, in an environment where they were e.g. trained with Pavlovian conditioning + artificial selection to be really good at reading internet text and predicting it, and then eventually writing it also… They would indeed end up a lot more human-like than regular wild octopi. But boy would it be scary if they started getting generally smarter than humans, being integrated deeply into lots of important systems, humans started trusting them a lot, etc.)
Your analogy successfully motivates the “man, I’d really like more people to be thinking about the potentially looming Octopcracy” sentiment, and my intuitions here feel pretty similar to the AI case. I would expect the relevant systems (AIs, von-Neumann-Squidwards, etc.) to inherit human-like properties wrt human cognition (including normative cognition, like plan search), and I’d put a small-but-non-negligible chance on our ending up with extinction (or worse).
On maximizers: to me, the most plausible reason for believing that continued human survival would be unstable in Grace’s story consists in either the emergence of dangerous maximizers or the emergence of related behaviors like rapacious influence-seeking (e.g., Part II of What Failure Looks Like). I agree that maximizers aren’t necessary for human extinction, but they do seem like the most plausible route to ‘human extinction’ rather than ‘something else weird and potentially not great’.
Nice. Well, I guess we just have different intuitions then—for me, the chance of extinction or worse in the Octopcracy case seems a lot bigger than “small but non-negligible” (though I also wouldn’t put it as high as 99%).
Human groups struggle against each other for influence/power/control constantly; why wouldn’t these octopi (or AIs) also seek influence? You don’t need to be an expected utility maximizer to instrumentally converge; humans instrumentally converge all the time.
Oh, also, you might be interested in Joe Carlsmith’s report on power-seeking AI; it has a relatively thorough discussion of the overall argument for risk.