Human history provides many examples of agents with different values choosing to cooperate thanks to systems and institutions:
After the European Wars of Religion saw people with fundamentally different values in violent conflict with each other, political liberalism / guarantees of religious liberty / the separation of church and state emerged as worthwhile compromises that allowed people with different values to live and work together cooperatively.
Civil wars often start when one political faction loses power to another, but democracy reduces the incentive for war because it provides a peaceful and timely means for the disempowered faction to regain control of the government.
When a state guarantees property rights, people have a strong incentive not to steal from one another, but instead to engage in free and mutually beneficial trade, even if those people have values that fundamentally conflict in many ways.
Conversely, people whose property rights are not guaranteed by the state (e.g. cartels in possession of illegal drugs) may be more likely to resort to violence in protection of their property as they cannot rely on the state for that protection. This is perhaps analogous to the situation of a rogue AI agent which would be shut down if discovered.
If two agents’ utility functions are perfect inverses, then I agree that cooperation is impossible. But when agents share a preference for some outcomes over others, even if they disagree about the preference ordering of most outcomes, then cooperation is possible. In such general-sum games, well-designed institutions can systematically promote cooperative behavior over conflict.
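To make that concrete, here’s a minimal sketch of the general-sum point (the payoff numbers are invented for illustration, not taken from anything above): in a prisoner’s-dilemma-style game, mutual defection is the only stable outcome by default, but an institution-enforced fine for defecting is enough to make mutual cooperation the equilibrium, even though the two players’ interests still mostly conflict.

```python
# Illustrative sketch only -- the payoff numbers are invented, not from the discussion above.
# A 2x2 general-sum game (prisoner's-dilemma-style): both players prefer (C, C) to (D, D),
# so their interests partially overlap even though each is tempted to defect.

from itertools import product

ACTIONS = ("C", "D")  # cooperate / defect

# BASE_PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
BASE_PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}

def payoff(profile, player, penalty):
    """Payoff to `player` (0 = row, 1 = col), minus an institution-enforced fine for defecting."""
    fine = penalty if profile[player] == "D" else 0
    return BASE_PAYOFFS[profile][player] - fine

def pure_nash_equilibria(penalty=0.0):
    """Action profiles where neither player gains by unilaterally switching actions."""
    equilibria = []
    for profile in product(ACTIONS, repeat=2):
        stable = True
        for player in (0, 1):
            for alternative in ACTIONS:
                deviation = list(profile)
                deviation[player] = alternative
                if payoff(tuple(deviation), player, penalty) > payoff(profile, player, penalty):
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria

print(pure_nash_equilibria(penalty=0.0))  # [('D', 'D')] -- without enforcement, conflict is stable
print(pure_nash_equilibria(penalty=2.0))  # [('C', 'C')] -- a credible fine flips the equilibrium
```

The point of the sketch is just that the enforcement doesn’t require either player to care about the other; it only has to change each player’s best response.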
Yeah, sorry, I have now edited the wording a bit.
Indeed, two ruthless agents, agents who would happily stab each other in the back given the opportunity, may nevertheless strategically cooperate given the right incentives. Each just needs to be careful not to allow the other person to be standing anywhere near their back while holding a knife, metaphorically speaking. Or there needs to be some enforcer with good awareness and ample hard power. Etc.
I would say that, for highly-competent agents lacking friendly motivation, deception and adversarial acts are inevitably part of the strategy space. Both parties would be energetically exploring and brainstorming such strategies, doing preparatory work to get those strategies ready to deploy on a moment’s notice, and constantly being on the lookout for opportunities where deploying such a strategy makes sense. But yeah, sure, it’s possible that there will not be any such opportunities.
I think the above (ruthless agents, possibly strategically cooperating under certain conditions) is a good way to think about future powerful AIs, in the absence of a friendly singleton or some means of enforcing good motivations, because I think the more ruthless, strategic ones will outcompete the less ruthless. But I don’t think it’s a good way to think about what peaceful human societies are like. I think human psychology is important for the latter. Most people want to fit in with their culture, and not be weird. Just ask a random person on the street about Earning To Give; they’ll probably say it’s highly sus. Most people don’t make weird multi-step strategic plans unless it’s the kind of thing that lots of other people would do too, and our (sub)culture is reasonably high-trust. Humans who think that way are disproportionately sociopaths.
What about corporations or nation states during times of conflict—do you think it’s accurate to model them as roughly as ruthless in pursuit of their own goals as future AI agents?
They don’t have the same psychological makeup as individual people; they have a strong tradition and culture of maximizing self-interest; and they face strong incentives and selection pressures to maximize fitness (i.e. for companies to profit, for nation states to ensure their own survival), lest they be outcompeted by more ruthless competitors. While I’d expect these entities to show some care for goals besides self-interest maximization, I think the most reliable predictor of their behavior, on average, is the maximization of their self-interest.
If they’re roughly as ruthless as future AI agents, and we’ve developed institutions that somewhat robustly align their ambitions with pro-social action, then we should have some optimism that we can find similarly productive systems for working with misaligned AIs.
Thanks! Hmm, some reasons that analogy is not too reassuring:
“Regulatory capture” would be analogous to AIs winding up with strong influence over the rules that AIs need to follow.
“Amazon putting mom & pop retailers out of business” would be analogous to AIs driving human salaries and job options below subsistence level.
“Lobbying for favorable regulation” would be analogous to AIs working to ensure that they can pollute more, and pay less taxes, and get more say in government, etc.
“Corporate undermining of general welfare” (e.g. aggressive marketing of cigarettes and opioids, leaded gasoline, suppression of data on PFOA, lung cancer, climate change, etc.) would be analogous to AIs creating externalities, including by exploiting edge-cases in any laws restricting externalities.
There are in fact wars happening right now, along with terrifying prospects of war in the future (nuclear brinkmanship, Taiwan, etc.).
Some of the disanalogies include:
In corporations and nations, decisions are still ultimately made by humans, who have normal human interests in living on a hospitable planet with breathable air, etc. Pandemics are still getting manufactured, but only rarely, and usually they’re released only by accident.
AIs will have wildly better economies of scale, because a single “AI” can really be lots of AIs with identical goals and high-bandwidth communication (or, relatedly, one mega-mind). (If you’ve ever worked at or interacted with a bureaucracy, you’ll appreciate the importance of this.) So we should expect a small number (as small as 1) of AIs with massive resources and power, and also an unusually strong incentive to gain further resources.
Relatedly, self-replication would give an AI the ability to project power and coordinate in a way that is unavailable to humans; this puts AIs more in the category of viruses, or of the zombies in a zombie apocalypse movie. Maybe eventually we’ll get to a world where every chip on Earth is running AI code, and those AIs are all willing and empowered to “defend themselves” by perfect cybersecurity and perfect robot-army-enforced physical security. Then I guess we wouldn’t have to worry so much about AI self-replication. But getting to that point seems pretty fraught. There’s nothing analogous to that in the world of humans, governments, or corporations, which either can’t grow in size and power at all, or can only grow via slowly adding staff that might have divergent goals and inadequate skills.
If AIs don’t intrinsically care about humans, then there’s a possible Pareto-improvement for all AIs, wherein they collectively agree to wipe out humans and take their stuff. (As a side-benefit, it would relax the regulations on air pollution!) AIs, being very competent and selfish by assumption, would presumably be able to solve that coordination problem and pocket that Pareto-improvement. There’s just nothing analogous to that in the domain of corporations or governments.
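To spell out what “Pareto-improvement for all AIs” means here, a toy check (all utility numbers invented for illustration): if every AI is at least as well off under the coordinated expropriation as under the status quo, the move counts as a Pareto improvement within the set of AIs, even though it obviously isn’t one once humans are included.

```python
# Toy illustration only -- all utility numbers are invented.
# Three AIs with otherwise-different goals each do at least as well under
# "coordinate and expropriate humans" as under the status quo, so that move is a
# Pareto improvement *among the AIs*, even though it is catastrophic for humans.

STATUS_QUO  = {"ai_1": 5, "ai_2": 2, "ai_3": 7, "humans": 10}
EXPROPRIATE = {"ai_1": 9, "ai_2": 6, "ai_3": 8, "humans": 0}

def is_pareto_improvement(before, after, parties):
    """True if every listed party is at least as well off and at least one is strictly better off."""
    return (all(after[p] >= before[p] for p in parties)
            and any(after[p] > before[p] for p in parties))

ais = ["ai_1", "ai_2", "ai_3"]
print(is_pareto_improvement(STATUS_QUO, EXPROPRIATE, ais))               # True
print(is_pareto_improvement(STATUS_QUO, EXPROPRIATE, ais + ["humans"]))  # False -- humans lose everything
```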