Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt it, because no one wants to lose out on gains from trade.
I don't really understand this argument: if there is some game-theoretic solution such that all intelligences will avoid conflict, shouldn't we expect that AIs would find and implement it themselves so that they can get the gains from trade?
For this to be an argument for us working on s-risks, I would think you need to show that only some subset of intelligences will avoid conflicts, which means we need to ensure we build only that subset.
I agree with your reasoning here: while I think working on s-risks from AI conflict is a top priority, I wouldn't give Dawn's argument for it. This post gives the main arguments for why some "rational" AIs wouldn't avoid conflicts by default, and some high-level ways we could steer AIs into the subset that would.
Agreed, and thanks for linking the article!
This article, for example, makes the case.
IIRC, one problem is that while there are positive-sum ways to trade, there are multiple such ways and they don't mix. So to agree on something, you first have to agree on the method you will use to reach agreement, and some parties may be at an advantage under one method while others are advantaged under another.
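As a minimal sketch of how positive-sum methods can fail to mix (mine, not from the comment above): two standard bargaining solutions, Nash and Kalai-Smorodinsky, pick different points on the same Pareto frontier. The frontier u2 = 1 - u1^2 and the disagreement point (0, 0) are arbitrary assumptions for illustration; the point is only that agents committed to different notions of a fair split have nothing to converge on by default.

```python
# Toy illustration: two standard bargaining solutions disagree on the same problem.
# Feasible frontier (assumed for illustration): u2 = 1 - u1^2, disagreement point (0, 0).
import numpy as np

u1 = np.linspace(0.0, 1.0, 100_001)  # player 1's utility along the Pareto frontier
u2 = 1.0 - u1**2                     # player 2's utility at each frontier point

# Nash bargaining solution: maximize the product of gains over the disagreement point.
nash = np.argmax(u1 * u2)

# Kalai-Smorodinsky solution: the frontier point where the utilities stand in the
# same ratio as the players' ideal (maximum attainable) utilities, both 1 here.
ks = np.argmin(np.abs(u2 / np.maximum(u1, 1e-12) - u2.max() / u1.max()))

print(f"Nash solution:              u1={u1[nash]:.3f}, u2={u2[nash]:.3f}")
print(f"Kalai-Smorodinsky solution: u1={u1[ks]:.3f}, u2={u2[ks]:.3f}")
# Roughly (0.577, 0.667) vs. (0.618, 0.618): player 2 prefers the Nash split and
# player 1 prefers the KS split, so "just pick the fair outcome" underdetermines the deal.
```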
More empirically, there have been plenty of situations in which groups of smart humans, after long deliberation, have made bad decisions because they thought another party was bluffing, because they thought they could get away with a bluff, because their intended bluff got out of control, and so on.
Tobias Baumann has thought a bit about whether perfectly rational, all-knowing superintelligences might still fail to realize certain gains from trade. I don't think he arrived at a strong conclusion even in that ideal case. (Idealized models of AIs don't ring true to me and are at best helpful for establishing hypothetical limits of sorts, I think.) But in practice even superintelligences will have some uncertainty over whether another agent is lying, is concealing something, might not have something they think it has, etc. Such imperfect knowledge of each other has historically led to a lot of unnecessary bloodshed.
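As a toy illustration of that last point (my own numbers, not Baumann's), here is a standard risk-return bargaining setup in which a perfectly rational challenger, unsure how costly conflict would be for its opponent, prefers an aggressive demand that gets rejected, and thus leads to conflict, with positive probability:

```python
# Toy Fearon-style bargaining model with one-sided uncertainty (all numbers assumed).
# A challenger and a defender split a resource of size 1. If bargaining fails, they
# fight: the challenger wins with probability WIN_PROB and both sides pay a cost.

WIN_PROB = 0.5          # challenger's probability of winning a conflict
COST_CHALLENGER = 0.10  # challenger's cost of fighting
COST_WEAK = 0.30        # defender's cost of fighting if it is a "weak" type
COST_TOUGH = 0.05       # defender's cost of fighting if it is a "tough" type
P_WEAK = 0.5            # challenger's prior that the defender is weak

war_value = WIN_PROB - COST_CHALLENGER  # challenger's expected payoff from conflict

# A defender with fighting cost c accepts a demand x iff 1 - x >= (1 - WIN_PROB) - c,
# i.e. iff x <= WIN_PROB + c, so each type has a largest demand it will tolerate.
safe_demand = WIN_PROB + COST_TOUGH   # accepted by both types, never causes conflict
risky_demand = WIN_PROB + COST_WEAK   # accepted only by the weak type

ev_safe = safe_demand
ev_risky = P_WEAK * risky_demand + (1 - P_WEAK) * war_value

print(f"Expected value of the safe demand:  {ev_safe:.3f}")
print(f"Expected value of the risky demand: {ev_risky:.3f}")
if ev_risky > ev_safe:
    print(f"A rational challenger makes the risky demand, so conflict happens "
          f"with probability {1 - P_WEAK:.0%} despite available gains from trade.")
```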
Another source of problems is the difference between behavior in single-shot and iterated games. An AI might be forced into a situation where it has to allow a smaller s-risk in order to prevent a greater one.
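For the single-shot vs. iterated point, a standard textbook illustration (with the usual prisoner's dilemma payoffs, chosen by me): in a one-shot game defection dominates, but in a repeated game cooperation can be sustained by trigger strategies once the probability of continued interaction is high enough.

```python
# Single-shot vs. iterated prisoner's dilemma (standard textbook payoffs, assumed here).
R, S, T, P = 3, 0, 5, 1  # reward, sucker's payoff, temptation, punishment

# One-shot: defection strictly dominates (T > R and P > S), so both players defect.
assert T > R and P > S

# Iterated with continuation probability delta, facing a grim-trigger opponent:
#   cooperate forever:                           R / (1 - delta)
#   defect once, then face permanent defection:  T + delta * P / (1 - delta)
# Cooperation is sustainable iff delta >= (T - R) / (T - P).
threshold = (T - R) / (T - P)
print(f"Cooperation is sustainable iff the continuation probability >= {threshold:.2f}")

for delta in (0.3, 0.7):
    cooperate = R / (1 - delta)
    defect = T + delta * P / (1 - delta)
    best = "cooperate" if cooperate >= defect else "defect"
    print(f"delta={delta}: cooperate={cooperate:.2f}, defect={defect:.2f} -> {best}")
```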
Folks at CLR have done a ton of research into all the various failure modes, and it's not at all clear to me which constellations of attitudes minimize or maximize s-risk. I've been hypothesizing that the Tit-for-Tat-heavy European culture may (if learned by AIs) lead to fewer but worse suffering catastrophes, whereas the more "Pavlovian" (in the game-theory sense) cultures of South Korea or Australia (iirc?) may cause more but smaller catastrophes.
But that's just as vague a speculation as it sounds. My takeaway is rather that any multipolar scenario will likely lead to tons of small and large bargaining failures, and some of those may involve extreme suffering on an unprecedented scale.
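To make the Tit-for-Tat vs. Pavlov distinction above concrete, here is a toy self-play simulation (my own; the payoffs and noise level are assumptions, and it says nothing about the cultural hypothesis) of both strategies in a noisy iterated prisoner's dilemma. A single slip sends two Tit-for-Tat players into long retaliation echoes, while two Pavlov (win-stay, lose-shift) players re-coordinate on cooperation within a couple of rounds.

```python
# Toy noisy iterated prisoner's dilemma: Tit-for-Tat and Pavlov (win-stay, lose-shift),
# each strategy playing a copy of itself. Payoffs and noise level are assumptions.
import random

R, S, T, P = 3, 0, 5, 1  # standard payoffs for (C,C), (C,D), (D,C), (D,D)
NOISE = 0.05             # probability that an intended move gets flipped
ROUNDS = 200_000
random.seed(0)

def payoff(me, other):
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(me, other)]

def tit_for_tat(my_last, their_last):
    return their_last  # simply copy the opponent's previous move

def pavlov(my_last, their_last):
    # Win-stay, lose-shift: keep the previous move after a good payoff (R or T),
    # switch after a bad one (S or P). This reduces to cooperating exactly when
    # both players made the same move last round.
    return "C" if my_last == their_last else "D"

def self_play(strategy):
    a = b = "C"  # both start by cooperating
    mutual_coop, total = 0, 0
    for _ in range(ROUNDS):
        next_a, next_b = strategy(a, b), strategy(b, a)
        # Implementation noise: each player occasionally plays the opposite move.
        if random.random() < NOISE:
            next_a = "D" if next_a == "C" else "C"
        if random.random() < NOISE:
            next_b = "D" if next_b == "C" else "C"
        a, b = next_a, next_b
        mutual_coop += (a == "C" and b == "C")
        total += payoff(a, b)
    return mutual_coop / ROUNDS, total / ROUNDS

for name, strategy in [("Tit-for-Tat", tit_for_tat), ("Pavlov", pavlov)]:
    coop_rate, avg_payoff = self_play(strategy)
    print(f"{name:12s} vs. itself: mutual cooperation {coop_rate:.2f}, "
          f"mean payoff {avg_payoff:.2f} per round")
# With a little noise, Tit-for-Tat pairs spend much of their time in retaliation
# echoes, while Pavlov pairs recover cooperation about two rounds after a slip.
```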