You’ve assumed from the get-go that AIs will follow reinforcement-learning-like paradigms similar to humans’ and converge on ontologies for looking at the world similar to humans’. You’ve also assumed these ontologies will be stable: for instance, an RL agent wouldn’t become superintelligent, use reasoning, and then decide to self-modify into something that is not an RL agent.
Something like that, though I would phrase it as relying on the claim that it’s feasible to build AI systems like that, since the piece is about the feasibility of lock-in. And in that context, the claim seems pretty safe to me. (Largely because we know that humans exist.)
You’ve assumed laws of physics as we know them today are constraints on things like computation and space colonization and oversight and alignment processes for other AIs.
Yup, sounds right.
Does this assume a clean separation between two kinds of processes—those that can be predicted and those that can’t?
That’s a good question. I wouldn’t be shocked if something like this was roughly right, even if it’s not exactly right. Let’s imagine the situation from the post, where we have an intelligent observer with some large amount of compute that gets to see the paths of lots of other civilizations built by evolved species. Now let’s imagine a graph where the x-axis has some increasing combination of “compute” and “number of previous examples seen”, and the y-axis has something like “ability to predict important events”. At first, the y-value would probably go up pretty fast with greater x, as the observer gets a better sense of what the distribution of outcomes is. But on our understanding of chaos theory, its ability to predict e.g. the weather years in advance would be limited even at astoundingly large values of compute + knowledge of what the distribution is like. And since chaotic processes affect important real-world events in various ways (e.g. the genes of new humans seem about as random as the weather, and those have huge effects), it seems plausible that our imagined graph would asymptote towards some limit of what’s predictable.
And that’s not even bringing up quantum effects, which are fundamentally unpredictable from our perspective. (With a many-worlds interpretation, they might be predictable in the sense that all of them will happen. But that still lets us make interesting claims about “fractions of Everett branches”, which seems pretty interchangeable with “probabilities of events”.)
In any case, I don’t think this impinges much on the main claims in the doc. (Though if I was convinced that the picture above was wildly wrong, I might want to give a bit of extra thought to what’s the most convenient definition of lock-in.)
But on our understanding of chaos theory, its ability to predict e.g. the weather years in advance would be limited even at astoundingly large values of compute + knowledge of what the distribution is like.
Are there computational-complexity bounds on predicting chaotic processes? I have not studied chaos theory, so I do not actually know. And if not, why are we so confident that there exists no algorithm that sidesteps the need for far too much compute?
Especially when considering that these algorithms may be discovered by arbitrarily superintelligent machines running for billions of years. But I would even be interested in reading a defence of weaker claims like “biological human mathematicians cannot solve chaos theory in the next 200 years, assuming incremental progress in math and assuming no AGI or other fancy stuff takes place in the same 200 years.”
Chaos theory is about systems where tiny deviations in initial conditions cause large deviations in what happens in the future. My impression (though I don’t know much about the field) is that, assuming some model of a system (e.g. the weather), you can prove things about how far ahead you can predict the system given some uncertainty (normally about the initial conditions, though uncertainty brought about by limited compute that forces approximations should work similarly). Whether the weather corresponds to any particular model isn’t really susceptible to proofs, but that question can be tackled by normal science.
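To make the “tiny deviations” point concrete, here’s a toy sketch (just an illustration I’m adding, not anything from the report, and not a complexity-theoretic proof) using the logistic map, a standard textbook example of a chaotic system:

```python
# Toy illustration: sensitive dependence on initial conditions in the
# logistic map x_{n+1} = r * x_n * (1 - x_n) with r = 4, a standard chaotic
# system whose Lyapunov exponent is ln(2); small errors roughly double
# every step until they saturate at order 1.

def logistic_trajectory(x0, r=4.0, steps=60):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-12)  # perturb the initial condition by 1e-12

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: |a - b| = {abs(a[n] - b[n]):.3e}")

# After ~40 steps the 1e-12 perturbation has grown to order 1 and the two
# trajectories are unrelated. Predicting k steps further ahead needs roughly
# 2^k times more precision in x0, so the prediction horizon grows only
# logarithmically with extra precision/compute.
```

That’s not a lower bound on what a clever algorithm could do with a real system like the weather, but it’s the standard picture of why prediction horizons scale so badly with precision and compute.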
This topic also comes up when discussing Ajeya Cotra’s biological anchors (how much compute is required to simulate evolution and create AGI in the first place), which is another reason why I was curious about this topic. If re-running evolution requires simulating the weather and if this is computationally too difficult then re-running evolution may not be a viable path to AGI. (And out of all of the biological anchors, the evolutionary one is the only one that matters imo.) I wonder if it’s worth studying this topic further.
If re-running evolution requires simulating the weather and if this is computationally too difficult then re-running evolution may not be a viable path to AGI.
There are many things that prevent us from literally rerunning human evolution. The evolution anchor is not a proof that we could do exactly what evolution did, but instead an argument that if something as inefficient as evolution spit out human intelligence with that amount of compute, surely humanity could do it if we had a similar amount of compute. Evolution is very inefficient — it has itself been far less optimized than the creatures it produces.
(I’d have more specific objections to the idea that chaos-theory-in-weather in particular would be an issue: I think that a weather-distribution approximated with a different random generation procedure would be as likely to produce human intelligence as a weather distribution generated by Earth’s precise chaotic behavior. But that’s not very relevant, because there would be far bigger differences between Earthly evolution and what-humans-would-do-with-1e40-FLOP than the weather.)
There are many things that prevent us from literally rerunning human evolution. The evolution anchor is not a proof that we could do exactly what evolution did, but instead an argument that if something as inefficient as evolution spit out human intelligence with that amount of compute, surely humanity could do it if we had a similar amount of compute. Evolution is very inefficient — it has itself been far less optimized than the creatures it produces.
Yup, I feel like there are different ways to interpret it; you’ve picked one interpretation, which is fair!
Another way of interpreting it that I found was: “what’s an argument for AI timelines this century that is straightforward and airtight, and doesn’t rely on things like hard-to-convey inside views, lots of deference, or arbitrary ways of setting priors?” Many AI risk people anyway seem to agree that if you’re aiming for accuracy you can’t rely on the anchor as much; at best it’s a sort of upper bound. But if you are aiming for airtight arguments that can convince literally anybody, then biological anchors might be more persuasive than other ways of thinking about AI timelines.
And if you are aiming for airtightness, I wonder if “we can literally re-run evolution and this is how we will do it at a technical level” can be made more airtight than the broader arguments in your first para. [Broader arguments such as: that we can do different things with the compute and still get AGI, that evolution was in fact a “dumb” unoptimised process and not smart in some unknown way, that we as humans can in fact do better than evolution (at finding AGI) because we’re smart, that evolution didn’t get astronomically lucky because of some instantiation choices, etc.]
(I’d have more specific objections to the idea that chaos-theory-in-weather in particular would be an issue: I think that a weather-distribution approximated with a different random generation procedure would be as likely to produce human intelligence as a weather distribution generated by Earth’s precise chaotic behavior. But that’s not very relevant, because there would be far bigger differences between Earthly evolution and what-humans-would-do-with-1e40-FLOP than the weather.)
This is fair! Although I do wonder more broadly, not just restricted to the weather but to tasks in general: is it possible to train/select/evolve RL agents to get to AGI only by training on fast-to-evaluate tasks, or is training on slow-to-evaluate tasks a necessary condition? By fast-to-evaluate I’d just mean that doing a forward pass of the environment is not significantly slower than doing a forward pass of the agent, and that you can in fact spend most of the compute during training on the agent, not the environment.
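To pin down what I mean by that, here’s a toy back-of-the-envelope sketch (the FLOP numbers are made up purely for illustration):

```python
# Toy sketch: what fraction of per-step training compute goes to the agent
# rather than the environment, given hypothetical per-step FLOP costs.
# "Fast-to-evaluate" roughly means this fraction stays close to 1.

def agent_compute_fraction(agent_flops_per_step, env_flops_per_step):
    """Fraction of total per-step FLOPs spent on the agent, not the environment."""
    return agent_flops_per_step / (agent_flops_per_step + env_flops_per_step)

# Fast-to-evaluate task: the environment step is cheap relative to the agent.
print(agent_compute_fraction(agent_flops_per_step=1e12, env_flops_per_step=1e9))   # ~0.999

# Slow-to-evaluate task: a high-fidelity simulated environment (e.g. one that
# has to model weather-like chaotic detail) dominates the training budget.
print(agent_compute_fraction(agent_flops_per_step=1e12, env_flops_per_step=1e15))  # ~0.001
```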
Some of MIRI’s stuff on decision theory does make me wonder if acting in environments that are more complicated* than you as an agent are is a qualitatively different kind of problem from acting in environments that are simpler than you are.
*Ways an environment may be “complicated”: possess more computational complexity than you, contain your perfect clones, contain agents with much higher intelligence than you, contain chaos-theoretic / quantum / physical / chemical stuff necessary for life or intelligent behaviour, be literally incomputable etc.
Something like that, though I would phrase it as relying on the claim that it’s feasible to build AI systems like that, since the piece is about the feasibility of lock-in. And in that context, the claim seems pretty safe to me. (Largely because we know that humans exist.)
The main way I find this claim easier* to buy is if the system literally consists of WBEs (whole brain emulations). There is significant alignment tax to building a system using WBEs, so even if it is possible to build in theory, it may not be possible in practice. For instance we might get WBEs only in hypothetical-2080 but get superintelligent LLMs in 2040, and the people using superintelligent LLMs make the world unrecognisably different by 2042 itself.
*Even then I have doubts I wanna bring up, but I am more convinced by the report in that case.
For instance we might get WBEs only in hypothetical-2080 but get superintelligent LLMs in 2040, and the people using superintelligent LLMs make the world unrecognisably different by 2042 itself.
I definitely don’t just want to talk about what happens / what’s feasible before the world becomes unrecognisably different. It seems pretty likely to me that lock-in will only become feasible after the world has become extremely strange. (Though this depends a bit on details of how to define “feasible”, and what we count as the start-date of lock-in.)
And I think that advanced civilizations that tried could eventually become very knowledgeable about how to create AI with a wide variety of properties, which is why I feel ok with the assumption that AIs could be made similar to humans in some ways without being WBEs.
(In particular, the arguments in this document are not novel suggestions for how to succeed with alignment in a realistic scenario with limited time! That still seems like a hard problem! Cf. my response to Michael Plant.)
If your report is conditional on a dominant institution existing at the time WBEs are invented, then this claim makes sense to make! If it is asserting that there is a non-trivial probability that a dominant institution will in fact exist at this point and that WBEs will in fact be invented at some point, then I wonder if that might need to be separately defended.
Some random (non-exhaustive) reasons a dominant institution may not exist:
Humans of a country appoint non-RL programs as their worthy successors and give them all power. Due to multipolar race dynamics, humans of some country correctly decide this is better than waiting until WBEs or world dominance. These successors do not have convergent instrumental subgoals of retaining billion-year stability. Other countries also respond by similarly appointing their own successors to avoid fading into irrelevance.
Offence-defence balances shift in favour of defence via extinction weapons and highly reliably enforceable threats, hence multiple countries are stable for arbitrarily long without any one dominating. (Dominating may require research into lots of new technologies, which causes value drift that may be undesirable to a state.)
Yeah, I agree that multipolar dynamics could prevent lock-in from happening in practice.
I do think that “there is a non-trivial probability that a dominant institution will in fact exist”, and also that there’s a non-trivial probability that a multipolar scenario will either
(i) end via all relevant actors agreeing to set up some stable compromise institution(s), or
(ii) itself end up being stable via each actor making themselves stable and their future interactions being very predictable. (E.g. because of an offence-defence balance strongly favoring defence.)
...but arguing for that isn’t really a focus of the doc.
(And also, a large part of why I believe they might happen is that they sound plausible enough, and I haven’t heard great arguments for why we should be confident in some particular alternative. Which is a bit hard to forcefully argue for.)
P.S. A (slightly) concrete scenario where we don’t get long-term stability that I wonder about: we get AGI through a DL (not RL) formalism; this AGI doesn’t have a convergent instrumental subgoal of billion-year stability and we don’t know how to give it this goal; and we face strong reasons to deploy it. All this happens before we know how to build AGIs that can value long-term stability.
(And also, a large part of why I believe they might happen is that they sound plausible enough, and I haven’t heard great arguments for why we should be confident in some particular alternative. Which is a bit hard to forcefully argue for.)
Yup, this is fair! You and I might have to assign some probability to both lock-in and no-lock-in scenarios as of today (2022). [0]
But it does seem useful to disambiguate between the following two reasons why we’re assigning that probability:
1. There is some probability that the actions you and I or other humans take will make a large difference to which long-term future humanity ends up in. We don’t know what these actions are (we’re assuming we can’t just deterministically predict other people’s behaviour), and therefore we’re uncertain which future we end up in.
2. There is nothing you and I or other humans can do that would make a large difference to the long-term future. We are already headed for certain specific futures; we’re just not smart enough or knowledgeable enough to predict in advance which futures we will end up in. Predicting this doesn’t require deterministically predicting other people’s behaviour; maybe there are simple technical arguments + institutional incentives that suffice to prove which future we end up in. However, we do not know these arguments yet. The probability (of lock-in and no-lock-in in [0]) is coming out of this uncertainty.
[0] = [1] + [2]
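One way to read that footnote (just a gloss, assuming cases 1 and 2 above are treated as mutually exclusive and exhaustive) is as the law of total probability, and similarly for no-lock-in:

$$
P(\text{lock-in}) \;=\; \underbrace{P(\text{lock-in}\mid \text{case 1})\,P(\text{case 1})}_{[1]} \;+\; \underbrace{P(\text{lock-in}\mid \text{case 2})\,P(\text{case 2})}_{[2]}
$$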
It doesn’t seem crazy to me to have some probability on 2 being true, even though we don’t actually have a clear argument for why we’re headed for specific futures or which ones they are.
(Yudkowsky, for instance, is basically predicting that we face >50% x-risk no matter what we do or do not do; maybe he’s right and we’re too dumb to realise it. I have greater than 1% probability on “him being right and us being too dumb to realise it”, even though I don’t really understand why he’s so pessimistic and am just deferring.)