How about something like:
AI systems are rapidly becoming more capable.
They could become extremely powerful in the next 10-50 years.
We basically don’t understand how they work (except at high levels of abstraction) or what’s happening inside them, and understanding them gets even harder as they get bigger and/or more general.
We don’t know how to reliably get these systems to do what we want. First, it’s really hard to specify exactly what we want. Second, even if we could, the goals/drives they end up with may not generalize to new environments.
But whatever objectives they do end up aiming for, it seems like they’ll face incentives that conflict with our interests. For example, accruing more power, preserving option value, and avoiding being shut off are generally useful, whatever goal you pursue.
It’s really hard to rigorously test AIs because they (1) are the result of a “blind” optimization process (not a deliberate design), (2) are monolithic (i.e. don’t consist of individually testable components), and (3) may at some point be smarter than us.
There are strong incentives to develop and deploy AI systems. This means powerful AI systems may be deployed even if they aren’t adequately safe/tested.
Of course this is a rough argument, and necessarily leaves out a bunch of detail and nuance.