Conditional on a Misaligned AGI being exposed to high-impact inputs, it will scale (in aggregate) to the point of permanently disempowering roughly all of humanity
I answered 70% for this question, but the wording doesn't feel quite right. I put >80% that a sufficiently capable misaligned AI would disempower humanity, but the first AGI deployed is likely not to be maximally capable unless takeoff is really fast. It could neither initiate a pivotal act/process nor disempower humanity; then, over the next days to years (depending on takeoff speeds), different systems could become powerful enough to disempower humanity.
One way in which Unaligned AGI might cease to be a risk is if we develop a test for Misalignment, such that Misaligned AGIs are never superficially attractive to deploy. What is your best guess for the year when such a test is invented?
Such a test might not end the acute risk period, because people might not trust the results and could still deploy misaligned AGI. The test would also have to extrapolate into the real world, farther than any currently existing benchmark. It would probably need to rely on transparency tools far in advance of what we have today, and because this region of the transparency tech tree also contains alignment solutions, the development of this test should not be treated as uncorrelated with other alignment solutions.
Even then, I also think there's a good chance this test is very difficult to develop before AGI. The misalignment test and the alignment problem aren't research problems we are likely to solve independently of AGI; they're dramatically sped up by being able to iterate on AI systems and get more than one try at difficult problems.
Also, conditional on aligned ASI being deployed, I expect this test to be developed within a few days. So the question should say "conditional on AGI not being developed".
One way in which Unaligned AGI might cease to be a risk is if we have a method which provably creates Aligned AGIs (âsolving the Alignment Problemâ). What is your best guess for the year when this is first accomplished?
I.e. the year when it becomes possible (not necessarily practical/economic) to build an AGI and know it is definitely Aligned.
Solving the alignment problem doesn't mean we can create a provably aligned AGI. Nate Soares says:
Following Eliezer, I think of an AGI as "safe" if deploying it carries no more than a 50% chance of killing more than a billion people:
When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, "please don't disassemble literally everyone with probability roughly 1" is an overly large ask that we are not on course to get. [...] Practically all of the difficulty is in getting to "less than certainty of killing literally everyone". Trolley problems are not an interesting subproblem in all of this; if there are any survivors, you solved alignment. At this point, I no longer care how it works, I don't care how you got there, I am cause-agnostic about whatever methodology you used, all I am looking at is prospective results, all I want is that we have justifiable cause to believe of a pivotally useful AGI "this will not kill literally everyone".
Notably absent from this definition is any notion of "certainty" or "proof". I doubt we're going to be able to prove much about the relevant AI systems, and pushing for proofs does not seem to me to be a particularly fruitful approach (and never has; the idea that this was a key part of MIRI's strategy is a common misconception about MIRI).