The Cautiously Compliant Superintelligence
Suppose transformative AI ends up extremely intelligent and reasonably aligned: not in the sense that it cares about 100% of “our values”, whatever that means, but in the sense that it complies with our intentions while limiting side-effects. The way we’ve aligned it, it simply refuses to do anything with tangible near-term consequences that we didn’t explicitly intend.[1]
We readily blame ourselves for any failure modes that arise from using this system. If we tried to tell it “fix the world!”, it would just honestly say that it doesn’t know how to do that, because it can’t solve moral philosophy for us. We can ask it to do things, but we can’t ask it what it thinks we would like to ask it to do.
Such an AI would act as an amplifier of what we intend, but we might not intend the wisest things. So even if it fully understands and complies with our intentions, we’re still prone to long-term disasters we can’t foresee.
This kind of AI seems like a plausible endpoint for alignment. It may simply remain forever infeasible to “encode human morality into an AI” or “adequately define boundary conditions for human morality”, so the rest is up to us.
If this scenario occupies a decent chunk of our probability mass, then these things seem high priority:
Figure out what we would like to ask such an AI to do that we won’t come to regret a hundred years later. Lots of civilizational outcomes are path-dependent, and we could permanently lock ourselves out of better outcomes if we try to become interstellar too early.
(It could still help us with surveillance technologies, and thereby assist us in preventing the development of other AIs or existentially dangerous technologies.)
Space colonisation becomes technologically easy in this scenario, but we might still wish to postpone it to give ourselves time to reflect on what foundational institutions we’d like space governance to start out with. We’d want strong political institutions and treaties in place to make sure that no country is allowed to settle any other planet until some specified time.
We’d want to make sure we have adequate countermeasures against Malthusian forces before we start spreading. Before we spread our current system, we want some sense of what an adequate civilization in a stable equilibrium looks like, and what institutions need to be in place to keep it going for millions of years.
Spreading our current civilization throughout the stars might pose an S-risk (and certainly an x-risk) unless we have robust mechanisms for keeping Malthus (and Moloch more generally) in check.
Given this scenario, if we don’t figure out how to do these things before the era of transformative AI, then TAI will just massively speed up technological advancement with no time for reflection, and countries will defect to grab planetary real estate before we have stable space-governance norms in place.
[1] It’s corrigible, but it does not influence us via auto-induced distributional shift to make it easier to comply with what we ask of it.