Slaying the Hydra: toward a new game board for AI

AI Timelines as a Hydra

Think of current timelines as a giant hydra. You can’t see exactly where the head is, and you don’t know whether you’re on the neck of the beast or the body. But you do have some sense of what a hydra is, and of the difficulty of what you’re in for. Wherever the head of the beast is, that’s the part where you get eaten, so you want to kill it before that point. Say you see a head approaching. Perhaps it’s something being created by Facebook AI, perhaps Google; it doesn’t matter. You see an opportunity, and you prevent the apocalypse from happening.

Looks like DeepMind was about to destroy the world, but you somehow prevented it. Maybe you convinced some people at the company to halt training; it doesn’t matter. Congratulations! You’ve cut the head off the hydra! However, you seem to remember reading something about how hydras work, and as everyone is celebrating, you notice something weird happening.

So it looks like you didn’t prevent the apocalypse, just an apocalypse. Maybe Google held off on destroying the world, but now FAIR and OpenAI are in a race of their own. Oh well, now it’s time to figure out which way to turn, and cut the head off again. Looks like FAIR is a little more dangerous and moving a bit faster, so better prevent that one now.

Well, this isn’t working. Now there are three of them. You have two new AI tech companies to deal with, and you still haven’t taken care of that bad OpenAI timeline. Better keep cutting. But it seems the further down you go, the more heads appear.

This is the key problem with finding a good outcome. It’s not the odds of Facebook, or OpenAI, or anyone else creating the head of the hydra; it’s that a head appears whichever path you go down. And the further down you go, the harder it becomes to foresee and cut off all of the many heads.

AI as a Chess Game

AI is an unstable game. Players continuously raise the stakes until, inevitably, someone plays a winning move that ends the game. As time goes on, each player’s set of possible moves grows. Controlling the players becomes increasingly difficult as the game becomes more and more unstable.

The only way stability returns is for the game to end, and the game ends when one player executes a winning move. Given our current lack of control or understanding, that winning move is probably easiest to execute by simply removing the other players. In this case, an AI gains stability over future timelines by, intentionally or not, removing humans from the board, since humans could otherwise create an even more powerful AI. By default, this is by far the easiest way to end the game.

Note that this does not require the AI to intentionally decide to destroy the world, or even to be a full general intelligence with agency, goal-seeking, or coherence; it only requires that destroying the world is the least complex way to end the game. Being the simplest method for ending it also makes it the most probable: the more complex the strategy, the harder it is to execute. Winning moves that don’t remove the other pieces, but instead somehow safely prevent players from making such moves in the future, seem more complicated, and thus less likely to occur before a destructive move is played.

Note that this doesn’t depend on deceptive alignment, lack of interpretability, sharp left turns, or any of the other specific problems people have voiced. It doesn’t even require any specific problem. It’s the way the board is set up. There could hypothetically even be many safe AGIs in this scenario. But if the game keeps being played, eventually someone plays a wrong move. Ten safe, careful moves don’t stop an eleventh move that terminates the game for all other players. And the way I see current trajectories, each move raises the players’ number of potential moves. By potential moves I simply mean that, as the power of AIs scales, the number of possible actions they can take increases. Think of a chess game where more and more pieces become rooks, and then become queens.

Eventually, some of those potential moves will be game-ending moves. For instance, in terms of humanity’s option space, we went from unlocking steam power to unlocking nuclear weapons in just a few moves. These moves will likely come faster and faster, until eventually it turns into a game of blitz. And anyone who has tried playing blitz chess without the experience to handle it knows that you begin making riskier and riskier moves, with a higher and higher likelihood of error as the game continues.
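A toy calculation makes the compounding clear. The per-move risk below is purely illustrative, not an estimate; the point is only that even a small chance of a game-ending move, repeated over many moves, eats away at the odds of survival:

```python
# Minimal sketch: survival odds under repeated risky moves.
# The 1% per-move risk is purely illustrative, not a forecast.
per_move_risk = 0.01  # assumed chance that any single move ends the game

for n_moves in (10, 100, 500):
    survival = (1 - per_move_risk) ** n_moves
    print(f"{n_moves:>3} moves: {survival:.1%} chance no game-ending move has been played")

# 10 moves: ~90.4%, 100 moves: ~36.6%, 500 moves: ~0.7%.
# Ten careful moves don't protect you from the eleventh, or the five-hundredth.
```

And as the game speeds up into blitz, those moves accumulate in less and less time.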

What’s the point of writing this? This framing has led me to believe five separate things.

1. Current Governance plans will likely fail

Human political governance structures cannot adapt at the speed of progress in AI. Scale this up to international treaties, and it becomes even harder. Human governance structures are not fast at incorporating new information, nor at any kind of parallel decision-making. Even staffed with some of the best and brightest in the field, the structure is too rigid to properly control the situation.

2. Most current Alignment plans are insane

Many don’t have any long-term Alignment plans at all. And among those that do, many (not all) plans involve taking one of the pieces on the board, having it go supernova, having it control all the other pieces on the board, and finding a way for it to also be nice to those other pieces. This is not a plan I foresee working. It might be physically possible, but it is not humanly possible. An actual alignment solution sent from the future would probably be a gigantic, multi-trillion-parameter matrix. Thinking that humans will find that path on their own is not realistic. As a single intelligence moves up the staircase of intelligence, humans remain at the bottom, looking up.

3. We need a new board

Instead of continuing to look for a solution on the current board, we could focus on creating a new one. This would likely involve Mechanism Design. The new board could be superior enough to the old one that the older pieces would have to join it to maintain an advantage. On this new board, the rules could be woven into its fabric. Control of the board could be kept out of the hands of rogue actors. And, as a result, a new equilibrium could become the default state. Stability and order could be maintained, even while growth scales. The two sides of AI, Governance and Technical Alignment, could be merged into true technical governance: information incorporated in real time, collective, intelligent decision-making based on actual outcomes, and our values and interests maintained at the core.

4. Other, more dangerous boards could be created

With distributed computing, if we fail to act, we risk new boards being created anyway. These boards might have no safety by default, and could prove nearly unstoppable without a global surveillance state. Resorting to such extremes in order to preserve ourselves is not ideal, but governments might someday find it necessary. And even that might not be enough to stop it.

5. Building a new foundation

Not all the answers have been figured out yet. But I think a lot could be built on this, and we could move toward a new foundation for AI: something greater systems could operate on while still adhering to the foundational network beneath them. Complex, intelligent systems adhering to simple rules that keep everything in place is the foundation of consensus mechanism design. As AI proliferation continues, these systems will fall into more and more hands, and the damage one individual can do with an AI system will keep growing. But the aggregate, the majority of intelligence, could be used to create something far more collectively intelligent than any rogue. This is a form of superintelligence that is much more manageable, monitorable, verifiable, and controllable. It’s the sum of all its parts. And as systems continue to grow in intelligence, the whole will still be able to stay ahead of its parts.
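As a very rough sketch of the kind of simple rule I have in mind (the verifier set, threshold, and function names here are hypothetical illustrations, not a concrete protocol): an action only takes effect if a supermajority of independent verifiers approves it, so no single rogue participant can push it through alone.

```python
from dataclasses import dataclass

@dataclass
class Verifier:
    """One independent participant in the network (hypothetical)."""
    name: str

    def approves(self, action: str) -> bool:
        # Placeholder policy check; a real verifier would run its own evaluation.
        return "forbidden" not in action

def quorum_approves(action: str, verifiers: list[Verifier], threshold: float = 2 / 3) -> bool:
    """Simple consensus rule: the action takes effect only if at least
    `threshold` of the verifiers independently approve it, so a lone
    rogue (or a small minority) cannot force an action through."""
    votes = sum(v.approves(action) for v in verifiers)
    return votes >= threshold * len(verifiers)

verifiers = [Verifier(f"node-{i}") for i in range(7)]
print(quorum_approves("deploy model update", verifiers))         # True: broad agreement
print(quorum_approves("run forbidden training job", verifiers))  # False: rejected by the quorum
```

The specific rule doesn’t matter; what matters is that the enforcement lives in the structure of the network itself rather than in any single actor’s goodwill.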

What this provides is governance. True governance of any AI system. It does not require the UN, the US, or any alliance to implement it. Those organizations could not possibly move fast enough or adapt to every possible change. It may not even need the endorsement of the major AI tech companies. It only has to work.

Conclusion

I don’t think finding a good winning move is realistic, given the game. I think we might find a solution by reexamining the board itself. The current Alignment approach of finding a non-hydra amid a sea of hydra heads does not seem promising. There has been so much focus on the pieces of the game and their nature, but so little attention on the game itself. If we can’t find a winning strategy with the current game, perhaps we need to devote our attention to a superior game board: one that is better than the current one, so that the old pieces will join it willingly, and one that has different rules built in. This is why I think Mechanism Design is so important for Alignment. It’s possible that with new, technically-ingrained rules, we could create a stable equilibrium by default. I have already proposed my first, naive approach to this, and I intend to keep researching better solutions. But this area is severely under-researched, and we need to start rethinking the entire nature of the game.
