You’ve assumed from the get-go that AIs will follow reinforcement-learning-like paradigms similar to humans’ and converge on ontologies for looking at the world similar to humans’. You’ve also assumed these ontologies will be stable: for instance, that an RL agent wouldn’t become superintelligent, use reasoning, and then decide to self-modify into something that is not an RL agent.
Something like that, though I would phrase it as relying on the claim that it’s feasible to build AI systems like that, since the piece is about the feasibility of lock-in. And in that context, the claim seems pretty safe to me. (Largely because we know that humans exist.)
You’ve assumed that the laws of physics as we know them today are constraints on things like computation, space colonization, and oversight and alignment processes for other AIs.
Yup, sounds right.
Does this assume a clean separation between two kinds of processes—those that can be predicted and those that can’t?
That’s a good question. I wouldn’t be shocked if something like this was roughly right, even if it’s not exactly right. Let’s imagine the situation from the post, where we have an intelligent observer with some large amount of compute that gets to see the paths of lots of other civilizations built by evolved species. Now let’s imagine a graph where the x-axis has some increasing combination of “compute” and “number of previous examples seen”, and the y-axis has something like “ability to predict important events”. At first, the y-value would probably go up pretty fast with greater x, as the observer gets a better sense of what the distribution of outcomes is. But on our understanding of chaos theory, its ability to predict e.g. the weather years in advance would be limited even at astoundingly large values of compute + knowledge of what the distribution is like. And since chaotic processes affect important real-world events in various ways (e.g. the genes of new humans seem about as random as the weather, and those have huge effects), it seems plausible that our imagined graph would asymptote towards some limit of what’s predictable.
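As a rough illustration of the chaos point, here’s a minimal Python sketch using the logistic map as a stand-in for weather-like dynamics (the specific map, starting values, and step counts are just illustrative, not anything the argument depends on):

```python
# Sensitive dependence on initial conditions: the logistic map with r = 4 is
# fully chaotic, so two trajectories starting a tiny distance apart diverge.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x_a, x_b = 0.400000000, 0.400000001  # initial states differing by 1e-9
for step in range(1, 61):
    x_a, x_b = logistic(x_a), logistic(x_b)
    if step % 10 == 0:
        print(f"step {step:2d}: |gap| = {abs(x_a - x_b):.3e}")

# The gap roughly doubles each step on average, so a 1e-9 uncertainty in the
# initial state grows to order 1 within a few dozen steps. Past that horizon,
# extra compute doesn't buy better predictions unless the initial state is
# known to absurd precision.
```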
And that’s not even bringing up quantum effects, which are fundamentally unpredictable from our perspective. (With a many-worlds interpretation, they might be predictable in the sense that all of them will happen. But that still lets us make interesting claims about “fractions of Everett branches”, which seems pretty interchangeable with “probabilities of events”.)
In any case, I don’t think this impinges much on the main claims in the doc. (Though if I was convinced that the picture above was wildly wrong, I might want to give a bit of extra thought to what’s the most convenient definition of lock-in.)
Thanks!