One perspective that I (and, I think, many other people in the AI Safety space) have is that AI Safety people’s “main job”, so to speak, is to safely hand off the reins to our value-aligned, weakly superintelligent AI successors.
This involves:
a) Making sure the transition itself goes smoothly, and
b) Making sure that the first few generations of our superhuman AI successors are value-aligned with goals that we broadly endorse.
This likely means that the details of the first superhuman AIs we build are critically important. We may not be able to, or need to, solve technical alignment or strategy in the general case. What matters most is that our first* successors are broadly aligned with our goals (along with other desiderata).
At least for me, an implicit assumption of this model is that humans will have to hand off the reins anyway, whether by choice or by force. Barring a fast takeoff, it’s hard to imagine that the transition to vastly superhuman AI will primarily be brought about, or even overseen, by humans rather than by nearly-vastly-superhuman AI successors.
Unless we don’t build AGI, of course.
*In reality, this may take several generations. I imagine the first iterations of weakly superhuman AIs will make decisions alongside humans, and we may still wish to maintain some level of human-in-the-loop oversight for a while longer, even after AIs are objectively smarter than us in every way.