For instance, we might get WBEs only in a hypothetical 2080 but get superintelligent LLMs in 2040, and the people using superintelligent LLMs make the world unrecognisably different by 2042.
I definitely don’t just want to talk about what happens / what’s feasible before the world becomes unrecognisably different. It seems pretty likely to me that lock-in will only become feasible after the world has become extremely strange. (Though this depends a bit on details of how to define “feasible”, and what we count as the start-date of lock-in.)
And I think that advanced civilizations that tried could eventually become very knowledgeable about how to create AI with a wide variety of properties, which is why I feel ok with the assumption that AIs could be made similar to humans in some ways without being WBEs.
(In particular, the arguments in this document are not novel suggestions for how to succeed with alignment in a realistic scenario with limited time! That still seems like a hard problem! Cf. my response to Michael Plant.)
Thanks this makes a lot of sense!
If your report is conditional on a dominant institution existing at the time WBEs are invented, then this claim makes sense! If it is instead asserting that there is a non-trivial probability that a dominant institution will in fact exist at that point, and that WBEs will in fact be invented at some point, then I wonder if that might need to be defended separately.
Some random (non-exhaustive) reasons a dominant institution may not exist:
Due to multipolar race dynamics, the humans of some country correctly decide that appointing non-RL programs as their worthy successors, and giving them all power, is better than waiting for WBEs or world dominance. These successors do not have a convergent instrumental subgoal of maintaining billion-year stability. Other countries respond by appointing their own successors, to avoid fading into irrelevance.
The offence-defence balance shifts in favour of defence, via extinction weapons and highly reliably enforceable threats; hence multiple countries are stable for arbitrarily long without any one dominating. (Dominating may require research into lots of new technologies, which causes value drift, which may be undesirable to a state.)
Yeah, I agree that multipolar dynamics could prevent lock-in from happening in practice.
I do think that “there is a non-trivial probability that a dominant institution will in fact exist”, and also that there’s a non-trivial probability that a multipolar scenario will either
(i) end via all relevant actors agreeing to set up some stable compromise institution(s), or
(ii) itself end up being stable via each actor making themselves stable and their future interactions being very predictable. (E.g. because of an offence-defence balance strongly favouring defence.)
...but arguing for that isn’t really a focus of the doc.
(And also, a large part of why I believe they might happen is that they sound plausible enough, and I haven’t heard great arguments for why we should be confident in some particular alternative. Which is a bit hard to forcefully argue for.)
P.S. A (slightly) concrete scenario where we don't get long-term stability that I wonder about: we get AGI through a DL rather than RL formalism; this AGI doesn't have a convergent instrumental subgoal of billion-year stability, we don't know how to give it this goal, and we face strong reasons to deploy it. All this happens before we know how to build AGIs that can value long-term stability.
Thanks for the reply!
Yup this is fair! You and I might have to assign some probability to both lock-in and no-lock-in scenarios as of today (2022). [0]
But it does seem useful to disambiguate between the following two reasons ([1] and [2]) why we're assigning that probability.
[1] There is some probability that the actions you and I or other humans take will make a large difference to which long-term future humanity ends up in. We don't know what these actions are (we're assuming we can't just deterministically predict other people's behaviour), and therefore we're uncertain which future we end up in.
[2] There is nothing you and I or other humans can do that would make a large difference to the long-term future. We are already headed for certain specific futures; we're just not smart enough or knowledgeable enough to predict in advance which futures we will end up in. Predicting this doesn't require deterministically predicting other people's behaviour: maybe there are simple technical arguments plus institutional incentives that suffice to prove which future we end up in. However, we do not know these arguments yet. The probability (of lock-in and no-lock-in in [0]) comes from this uncertainty.
[0] = [1] + [2]
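(One way to make "[0] = [1] + [2]" slightly more concrete is as a law-of-total-probability split; the numbers below are purely made up for illustration:

P(lock-in) = P(future is contingent on our actions) · P(lock-in | contingent) + P(future is already determined) · P(lock-in | determined)
≈ 0.7 · 0.4 + 0.3 · 0.5 = 0.28 + 0.15 = 0.43

Here the first term corresponds to the uncertainty in [1] and the second term to the uncertainty in [2]; the point is just that the overall credence in [0] can get contributions from both sources.)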
It doesn't seem crazy to me to have some probability on [2] being true, even though we don't actually have a clear argument for why we're headed for specific futures, or which ones they are.
(Yudkowsky, for instance, is basically predicting that we face >50% x-risk no matter what we do or don't do; maybe he's right and we're too dumb to realise it. I have a greater-than-1% probability on "him being right and us being too dumb to realise it", even though I don't really understand why he's so pessimistic and am just deferring.)