Here’s my current four-point argument for AI risk/danger from misaligned AIs.
We are on the path of creating intelligences capable of being better than humans at almost all economically and militarily relevant tasks.
There are strong selection pressures and trends to make these intelligences into goal-seeking minds acting in the real world, rather than disembodied high-IQ pattern-matchers.
Unlike traditional software, we have little ability to know or control what these goal-seeking minds will do, only directional input.
Minds much better than humans at seeking their goals, with goals different enough from our own, may end us all, either as a preventative measure or side effect.
Request for feedback: I’m curious whether there are points that people think I’m critically missing, and/or ways that these arguments would not be convincing to “normal people.”
I have many disagreements, but I’ll focus on one: I think point 2 is in contradiction with points 3 and 4. To put it plainly: the “selection pressures” go away pretty quickly if we don’t have reliable methods of knowing or controlling what the AI will do, or of preventing it from doing noticeably bad stuff. That applies to the obvious cases, like an AI trying to prematurely go Skynet, but it also applies to more mundane things like getting an AI to act reliably more than 99% of the time.
I believe that if we manage to control AI enough to make widespread rollout feasible, then it’s pretty likely we’ve already solved alignment well enough to prevent extinction.
Hmm, right now this seems wrong to me, and also not worth going into in an introductory post. Do you have a sense that your view is commonplace? (e.g. from talking to many people not involved in AI)
Hi Linch. I am open to bets against short AI timelines, or what they supposedly imply, of up to $10k. Do you see any bet we could make that would be good for both of us under our own views, considering that we could otherwise invest our money and that you could take out loans?
The bets I’ve seen you post seem rather disadvantageous to the other side, and I believed so at the time. That’s fine, and good business from your perspective given that you managed to find takers, but it means I’m more pessimistic about finding deals that are good by both of our lights.
Here are a few things you might need to address to convince a skeptic:
Humans currently have access to, maintain, and can shut down or destroy the hardware and infrastructure AI depends on. This is an important advantage.
Ending us all can be risky from an AI’s perspective, because of the risk of shutdown (or of losing, without an adequate replacement, the humans who maintain, extract resources for, and build the infrastructure it depends on).
I’d guess we can make AIs risk-averse (or difference-making risk averse) for whatever goals they do end up with, even if we can’t align them.
Ending us all sounds hard and unlikely. There are many ways we are resilient and ways governments and militaries could respond to a threat of this level.
Two thoughts here, just on persuasiveness. I’m not quite sure what you mean by “normal people,” or whether you want your arguments to be actual arguments or just maximally persuasive.
Show, don’t tell for points 1-3.
For anyone who hasn’t used frontier models intimately but is willing to approach them with an open mind, I’d guess you should just push them to use the models and actually engage mentally with them and their thought traces; even better if you can convince them to use something agentic like CC.
Ask and/or tell stories for point 4.
What can history tell us about what happens when a significantly more tech-savvy/powerful nation encounters another one?
There’s no “right” answer here, though the general arc of history is that significantly more powerful nations capture, kill, etc.
What would it be like to be a native during the various European conquests in the New World (especially ignoring the effects of smallpox/disease, to the extent you can)?
Incan perspective? Mayan?
I especially like Orellana’s first expedition down the Amazon. As far as I can tell, Orellana was not especially bloodthirsty, and had some interest in and respect for the natives, though he was certainly misaligned with them.
Even if Orellana is “less bloodthirsty,” you still don’t want to be a native on that river. You hear fragmented rumors—trade, disease, violence—with no shared narrative; you don’t know what these outsiders want or what their weapons do; you don’t know whether letting them land changes the local equilibrium by enabling alliances with your enemies; and you don’t know whether the boat carries Orellana or someone worse.
Do you trade? Attack? Flee? Coordinate? Any move could be fatal, and the entire situation destabilizes before anyone has to decide “we should exterminate them.”
And in all of these situations you can actually see what happened (approximately), and usually it doesn’t end well.
Why is AI different?
This isn’t rhetorical, and it gives them space to think in a smaller, more structured way that doesn’t force an answer.
I think your list is really great! As someone trying to understand misaligned AI better, these are my arguments:
The difference between a human and an AGI might be greater than the difference between a human and a mushroom.
If the difference is that great, then the difference between a cow and a human will probably not matter much to it. The way humans treat other animals, the planet, and each other makes it hard to see how we could possibly create AI alignment that leaves it willing to spare creatures like us.
If AGI has self-preservation, we are the only creatures that can threaten its existence, which means it might want to make sure we no longer exist, just to be safe.
AGI is something we know nothing about. If it arrived in a spaceship with aliens, we would probably spend enormous resources to make sure it did not threaten our planet. But now we are creating this alien creature ourselves and doing very little to make sure it isn’t a threat to our planet.
I hope my list helps!