We might actually expect an AGI to be trained to conquer the entire planet, or rather to be trained in many of the abilities needed to do so. For example, we may train it to be good at things like:
Strategic planning
Getting humans to do what it wants effectively
Controlling physical systems
Cybersecurity
Researching new, powerful technologies
Engineering
Running large organizations
Communicating with humans and other AIs
Put differently, I think “taking control over humans” and “running a multinational corporation” (which seems like the sort of thing people will want AIs to be able to do) have lots more overlap than “playing chess” and “having true beliefs about subjects of conspiracies”. I’d be curious to hear if you have thoughts about which specific abilities you expect an AGI would need to have to take control over humanity that it’s unlikely to actually possess?
Hi Erich,
Note humans are also trained on all those abilities, but no single human is trained to be a specialist in all those areas. Likewise for AIs.
Yes, that’s true. Can you spell out for me what you think that implies in a little more detail?
For an agent to conquer the world, I think it would have to be close to the best across all those areas, but I think this is super unlikely based on it being super unlikely for a human to be close to the best across all those areas.
That seems right.
I’m not sure that follows? I would expect improvements on these types of tasks to be highly correlated in general-purpose AIs. I think we’ve seen that with GPT-3 to GPT-4, for example: GPT-4 got better pretty much across the board (excluding the tasks that neither of them can do, and the tasks that GPT-3 could already do perfectly). That is not the case for a human who will typically improve in just one domain or a few domains from one year to the next, depending on where they focus their effort.
Higher IQ in humans is correlated with better performance in all sorts of tasks too, but the probability of finding a single human performing better than 99.9 % of (human or AI) workers in each of the areas you mentioned is still astronomically low. So I do not expect a single AI system to become better than 99.9 % of (human or AI) workers in each of the areas you mentioned. It can still be the case that the AI systems share a baseline common architecture, in the same way that humans share the same underlying biology, but I predict the top performers in each area will still be specialised systems.
Going from GPT-3 to GPT-4 seems more analogous to a human going from 10 to 20 years old. There are improvements across the board during this phase, but specialisation still matters among adults. Likewise, I assume specialisation will matter among frontier AI systems (although I am quite open to a single future AI system being better than all humans at any task). GPT-4 is still far from being better than 99.9 % of (human or AI) workers in the areas you mentioned.
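(As a rough illustration of this correlation point, here is a toy Monte Carlo sketch; the correlation value, number of domains, and population size are assumptions chosen purely for illustration, not figures from the thread.)

```python
# Toy Monte Carlo: how often is one agent in the top 0.1% of *every* domain,
# when per-domain abilities are positively correlated?
# All numbers here (correlation, domain count, population size) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_domains, rho = 1_000_000, 8, 0.6  # assumed values

# Covariance matrix with a common pairwise correlation rho
cov = np.full((n_domains, n_domains), rho)
np.fill_diagonal(cov, 1.0)
abilities = rng.multivariate_normal(np.zeros(n_domains), cov, size=n_agents)

# Top 0.1% threshold in each domain
thresholds = np.quantile(abilities, 0.999, axis=0)
top_in_all = np.all(abilities > thresholds, axis=1)

print(f"Agents in the top 0.1% of all {n_domains} domains: {top_in_all.sum()} "
      f"out of {n_agents:,}")
# With rho = 0.6 this is typically a handful of agents at most; with rho = 0
# the expected count is n_agents * 0.001**8, i.e. essentially zero.
```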
Let me see if I can rephrase your argument, because I’m not sure I get it. As I understand it, you’re saying:
1. In humans, higher IQ means better performance across a variety of tasks. This is analogous to AI, where more compute/parameters/data etc. means better performance across a variety of tasks.
2. AI systems tend to share a common underlying architecture, just as humans share the same basic biology.
3. For humans, when IQ increases, there are improvements across the board, but still specialization, meaning no single human (the one with the highest IQ) will be better than all other humans at all of those things.
4. By analogy: for AIs, when they're scaled up, there are improvements across the board, but (likely) still specialization, meaning no single AI (the one with the most compute/parameters/data/etc.) will be better than all other AIs at all of those things.
Now I’m a bit unsure about whether you’re saying that you find it extremely unlikely that any AI will be vastly better in the areas I mentioned than all humans, or that you find it extremely unlikely that any AI will be vastly better than all humans and all other AIs in those areas.
If you mean 1-4 to suggest that no AI will be better than all humans and other AIs, I’m not sure whether 4 follows from 1-3, but I think that seems plausible at least. But if this is what you mean, I’m not sure what your original comment (“Note humans are also trained on all those abilities, but no single human is trained to be a specialist in all those areas. Likewise for AIs.”) was meant to say in response to my original comment, which was meant as pushback against the view that AGI would be bad at taking over the planet since it wouldn’t be intended for that purpose.
If you mean 1-4 to suggest that no AI will be better than all humans, I don’t think the analogy holds, because the underlying factor (IQ versus AI scale/algorithms) is different. Like, it seems possible that even unspecialized AIs could just sweep past the most intelligent and specialized humans, given enough time.
Thanks for the clarification, Erich! Strongly upvoted.
I think your rephrasing was great.
The latter.
I think a single AI agent would have to be better than the vast majority of agents (including both human and AI agents) to gain control over the world, which I consider extremely unlikely given gains from specialisation.
I agree.
I believe the probability of a rogue (human or AI) agent gaining control over the world mostly depends on its level of capabilities relative to those of the other agents, not on the absolute level of capabilities of the rogue agent. So I mostly worry about concentration of capabilities rather than increases in capabilities per se. In theory, the capabilities of a given group of (human or AI) agents could increase a lot in a short period of time such that capabilities become so concentrated that the group would be in a position to gain control over the world. However, I think this is very unlikely in practice. I guess the annual probability of human extinction over the next 10 years is around 10^-6.
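(For what it’s worth, here is the trivial arithmetic behind reading an annual probability of about 10^-6 as a cumulative risk over ten years, assuming, purely for illustration, that the years are independent.)

```python
# Cumulative extinction probability implied by an annual probability of 1e-6
# over 10 years, assuming (for illustration only) independence across years.
annual_p = 1e-6
years = 10
cumulative_p = 1 - (1 - annual_p) ** years
print(f"{cumulative_p:.2e}")  # ~1.0e-05, since 1 - (1-p)^n ≈ n*p for small p
```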
Thank you for writing this well-argued post; I think it’s important to keep discussing exactly how big P(doom) is. However, and I say this as someone who believes that P(doom) is on the lower end, it would also be good to be clear about what the implications would be for EAs if P(doom) were low. It seems likely that many of the same recommendations (reduce spending on risky AI technologies and increase spending on AI safety) would still hold, at least until we get a clearer idea of the exact nature of AI risks.
Terminological point: It sounds like you’re using the phrase “instrumental convergence” in an unusual way.
I take it the typical idea is just that there are some instrumental goals that an intelligent agent can expect to be useful in the pursuit of a wide range of other goals, whereas you seem to be emphasizing the idea that those instrumental goals would be pursued to extremes destructive of humanity. It seems to me that (1) those two ideas are worth keeping separate, (2) “instrumental convergence” would more accurately label the first idea, and (3) that phrase is in fact usually used to refer to the first idea only.
This occurred to me as I was skimming the post and saw the suggestion that instrumental convergence is not seen in humans, to which my reaction was, “What?! Don’t people like money?”
I agree with this, there are definitely two definitions at play. I think a failure to distinguish between these two definitions is actually a big problem with the AI doom argument, where they end up doing an unintentional motte-and-bailey between the two definitions.
David Thorstad explains it pretty well here. The “people want money” definition is trivial and obviously true, but does not lead to the “doom is inevitable” conclusion. I have a goal of eating food, and money is useful for that purpose, but that doesn’t mean I automatically try to accumulate all the wealth on the planet in order to tile the universe with food.
No, the doomer says, “If that AI doesn’t destroy the world, people will build a more capable one.” Current AIs haven’t destroyed the world. So people are trying to build more capable ones.
There is some weird thing here about people trying to predict trajectories, not endpoints; they get as far as describing, in their story, an AI that doesn’t end the world as we know it, and then they stop, satisfied that they’ve refuted the doomer story. But if the world as we know it continues, somebody builds a more powerful AI.
My point is that the trajectories affect the endpoints. You have fundamentally misunderstood my entire argument.
Say a rogue, flawed AI has recently killed ten million people before being stopped. That results in large amounts of regulation, research, and security changes.
This can have two effects:
Firstly (if AI research isn’t shut down entirely), it makes it more likely that the AI safety problem will be solved, due to increased funding and urgency.
Secondly, it makes the difficulty level of future takeover attempts greater, due to awareness of AI tactics, increased monitoring, security, international agreements, etc.
If the difficulty level increases faster than the AI capabilities can catch up, then humanity wins.
Suppose we end up with a future where every time a rogue AI pops up, there are 1000 equally powerful safe AIs there to kill it in its crib. In this case, scaling up the power levels doesn’t matter: the new, more powerful rogue AI is met by 1000 new, more powerful safe AIs. At no point does it become capable of world domination.
The other possible win condition is that enough death and destruction is wrought by failed AIs that humanity bands together to ban AI entirely, and successfully enforces this ban.
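(To make the “difficulty increases faster than capability” win condition concrete, here is a toy model; the growth rates, defender headcount, and takeover rule are all assumptions made up for illustration, not anything claimed above.)

```python
# Toy model of the "difficulty outpaces capability" win condition described above.
# Growth rates and the takeover rule are illustrative assumptions: a rogue AI
# succeeds only if its capability exceeds the combined defensive capability at
# the moment it goes rogue.
def rogue_ever_wins(rogue_growth: float, defense_growth: float,
                    n_defenders: int = 1000, generations: int = 50) -> bool:
    rogue, defender = 1.0, 1.0
    for _ in range(generations):
        rogue *= rogue_growth
        defender *= defense_growth
        if rogue > n_defenders * defender:  # takeover threshold (assumed)
            return True
    return False

# If defenders improve at least as fast as the rogue, the gap never closes.
print(rogue_ever_wins(rogue_growth=1.5, defense_growth=1.5))   # False
# If the rogue compounds faster, it eventually clears any fixed headcount advantage.
print(rogue_ever_wins(rogue_growth=1.8, defense_growth=1.5))   # True
```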
Great post, titotal!
The link is broken.
Does this actually mean anything? Suppose the weak but non-aligned AI thinks it has a 10% chance of taking over the world if it tries, that it expects new, more powerful AIs to come online soon and prevent it from doing so, and that it consequently reasons it ought to attempt to take over the world immediately rather than wait for those more powerful AIs to arrive and stop it. Then there are two possibilities: either these new AIs will be non-aligned or aligned.
In the first case, it would mean that the (very smart) AI thinks there is a really high chance (>90%?) that non-aligned AIs will take over the world any time now. In this case we are doomed, and getting an early warning shot shouldn’t matter unless we act extremely quickly.
In the second case the AI thinks there is a high chance that very soon we’ll get aligned superhuman AIs. In this case, everything will be well. Most likely we’d already have the technology to prevent the 10% non-aligned AI from doing anything or even existing in the first place.
Seems like this argument shouldn’t make us feel any more or less concerned. I guess it depends on specifics, like whether the AI thinks the AI regulation we impose after seeing other AIs unsuccessfully try to take over the world will make it harder for itself to take over the world, or whether, for example, that regulation only affects new models and not itself (as it presumably has already been trained and deployed). Overall though, it should maybe make you slightly less concerned if you are a super doomer, and slightly more concerned if you are a super AI bloomer.
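(A minimal sketch of the act-now-versus-wait decision being attributed to the weak AI, with probabilities that are purely illustrative assumptions on my part:)

```python
# Minimal sketch of the decision logic attributed to the weak non-aligned AI above.
# All probabilities are illustrative assumptions, not claims from the thread.
p_success_now = 0.10      # assumed chance of takeover if it tries immediately
p_blocked_later = 0.95    # assumed chance that stronger AIs arrive and stop it
p_success_later = 0.50    # assumed chance of takeover if it waits and isn't blocked

expected_if_wait = (1 - p_blocked_later) * p_success_later
print(p_success_now > expected_if_wait)  # True: under these numbers it acts now
```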