Yeah, I agree that multipolar dynamics could prevent lock-in from happening in practice.
I do think that “there is a non-trivial probability that a dominant institution will in fact exist”, and also that there’s a non-trivial probability that a multipolar scenario will either
(i) end via all relevant actors agreeing to set up some stable compromise institution(s), or
(ii) itself end up being stable via each actor making themselves stable and their future interactions being very predictable. (E.g. because of an offence–defence balance strongly favouring defence.)
...but arguing for that isn’t really a focus of the doc.
(And also, a large part of why I believe they might happen is that they sound plausible enough, and I haven’t heard great arguments for why we should be confident in some particular alternative. Which is a bit hard to forcefully argue for.)
P.S. A (slightly) concrete scenario I wonder about where we don’t get long-term stability: we get AGI through a DL rather than RL formalism, this AGI doesn’t have a convergent instrumental subgoal of billion-year stability, we don’t know how to give it this goal, and we face strong reasons to deploy it. All of this happens before we know how to build AGIs that can value long-term stability.
Thanks for the reply!
Yup, this is fair! You and I might have to assign some probability to both lock-in and no-lock-in scenarios as of today (2022). [0]
But it does seem useful to distinguish between the following two reasons for assigning that probability:
[1] There is some probability that the actions you and I or other humans take will make a large difference to which long-term future humanity ends up in. We don’t know what these actions are (we’re assuming we can’t just deterministically predict other people’s behaviour), and therefore we’re uncertain which future we end up in.
[2] There is nothing you and I or other humans can do that would make a large difference to the long-term future. We are already headed for certain specific futures; we’re just not smart enough or knowledgeable enough to predict in advance which futures we will end up in. Predicting this doesn’t require deterministically predicting other people’s behaviour; maybe there are simple technical arguments plus institutional incentives that suffice to prove which future we end up in. However, we do not know these arguments yet. The probability (of lock-in and no-lock-in in [0]) comes out of this uncertainty.
[0] = [1] + [2]
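(One way to make that bookkeeping explicit, as a rough sketch rather than anything argued above: treat [1] and [2] as two exhaustive hypotheses about the world and apply the law of total probability,

$$P(\text{lock-in}) = \underbrace{P(\text{lock-in} \mid \text{our actions matter})\,P(\text{our actions matter})}_{\text{from }[1]} + \underbrace{P(\text{lock-in} \mid \text{future already fixed})\,P(\text{future already fixed})}_{\text{from }[2]}$$

with the analogous decomposition for no-lock-in. The conditional labels are my own shorthand for [1] and [2], not terms used above.)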
It doesn’t seem crazy to me to have some probability on [2] being true, even though we don’t actually have a clear argument for why we’re headed for specific futures or which ones they are.
(Yudkowsky, for instance, is basically predicting that we face >50% x-risk no matter what we do or don’t do; maybe he’s right and we’re too dumb to realise it. I have greater than 1% probability on “him being right and us being too dumb to realise it”, even though I don’t really understand why he’s so pessimistic and am just deferring.)