A stubborn unbeliever finally gets the depth of the AI alignment problem
I realise posting this here might be preaching to the converted, but I think it could be interesting for some people to see a perspective from someone slow to get onboard with worrying about AI alignment.
I’m one of those people who find it hard to believe that misaligned Artificial General Intelligence (AGI) could destroy the world. Even though I’ve understood the main arguments and can’t satisfyingly refute them, a part of my intuition won’t easily accept that it’s an impending existential threat. I work on deploying AI algorithms in industry, so I have an idea of both how powerful and how limited they can be. I also get why AI safety in general should be taken seriously, but I struggle to feel the requisite dread.
The best reason I can find for my view is that there is a lot of “Thinkism” in arguments for AGI takeoff. Any AGI that wants to have an influence outside of cyberspace, e.g. by building nanobots or a novel virus, will ultimately run into problems of computational irreducibility: it isn’t possible to model everything accurately, so empirical work in the physical world will always be necessary. These kinds of experiments are slow, messy and resource-intensive, so any AGI is going to reach some limits when it tries to influence the physical world. I do realise there are loads of ways an AGI can cause a lot of damage without requiring the invention of new physical technologies, but in my mind this still slowed things down enough for me to worry less about alignment issues.
That was until I started realising that alignment problems aren’t limited to the world of AI. If you look around, you can see them everywhere. The most obvious example is climate change: there is a clear misalignment between the motivations of the petroleum industry and the long-term future of humanity, which causes a catastrophic problem. The corporate world is full of such alignment problems, from the tobacco industry misleading the public about the harms of smoking to social media companies hijacking our attention.
It was exploring the problems caused by social media that helped me get the scale of the issue. I wrote an essay to understand why I was spending so much time browsing the internet without much to show for it, and without really enjoying the experience. You can read the full essay here, but the main takeaway for AI safety is that we can’t even deploy simple AI algorithms at scale without causing big societal problems. If we can’t manage in this easy case, how can we possibly expect to be able to deal with more powerful algorithms?
The issue of climate change is even slower-acting and more problematic than social media. There’s also a clear scientific consensus, with a lot of public understanding about how bad it is, yet we still aren’t able to respond in a decisive and rational manner. Realising this has finally driven home how much of an issue AGI misalignment is going to be. Even if it might not happen at singularity-inducing speeds, it’s going to be incredibly destabilising and difficult to deal with. And even if the AGIs themselves could be aligned, we would still have to seriously worry about aligning the people who deploy them.
I think there might be a silver lining though. As outlined above, I am suspicious of solutions that look like Thinkism and aren’t tested in the real world. However, as there are a whole bunch of existing alignment problems waiting to be resolved, they could act as real-world testing grounds before we run into a serious AGI alignment issue. I personally believe the misalignment of social media companies with their users could be a good place to start. It would be very informative to try to build machine learning algorithms for large-scale content recommendation that give people a feeling of flourishing on the internet, rather than time wasting and doom scrolling. You can read more details of my specific thoughts about this problem in my essay.
As a final bonus, I think there was something else that made it difficult for me to grok the AI alignment problem: I find it hard to intuitively model psychopathic actors. Even though I know an AI wouldn’t think like a human, if I try to imagine how it might think, I still end up giving it a human thought process. I finally managed to break this intuition by reading a great short story by Ted Chiang, Understand. I recommend that anyone who hasn’t read it do so with AI in mind. It really gives you a feel for the perspective of a misaligned super-intelligence. Unfortunately, I think for the ending to work out in the same way, we’d have to crack the alignment problem first.
So, now that this non-believer has been converted, I finally feel on board with all the panic, which hasn’t been helped by the insane progress in AI capabilities this year. It’s time to start thinking about this more seriously…
I’m sure I’m not the first to have these thoughts, so if you can share any links below for me to read further it would be appreciated.
If you haven’t already, I’d recommend reading Richard Ngo’s AGI Safety From First Principles, which I think is an unusually rigorous treatment of the issue.
I had it bookmarked, but hadn’t looked at it yet. Thanks for the recommendation!
Also check out the AGI Safety Fundamentals Alignment Curriculum and corresponding Google doc. The Intro to ML Safety material might also be of interest.
Thanks for this post! Seeing how many global challenges are in a sense alignment problems also brought me on board with understanding AI Safety. Climate change and social media are good touchstones for what I think of as social/political alignment issues.
I don’t know if this is exactly correct (so someone help me if I’m off base) but I find the AI alignment issue especially mentally complex to wrap my head around because it doesn’t seem like we have good solutions yet at almost any level of technical or social/political alignment. Here’s how I think of them in my head:
technical alignment: Can we have an inconceivably smart optimizing machine follow what we really want it to do in order to benefit us, rather than taking the letter of its programming down paths that would be bad? Can we look into the black box to know what the heck is going on, so that we can stop it if needed?
AND
social/political alignment: Can we as humans create and uphold fair and effective regulation of power that works in a globalized economy without a strong world government? Can we design laws and social norms that prevent catastrophe when more and more people and businesses have access to increasingly powerful machines that do what they are asked (blow people up with enormous accuracy, if you want them to) and have unintended side effects (influencing elections through social media algorithms)?
With AI we don’t have either. It is sort of as if runaway climate change were happening and we didn’t yet understand that CO2 was part of the root cause or something.
The fact that a lot of x-risk issues share common threads in the social-political alignment sphere is interesting to me, and is one of my main arguments for why EA-ers should pay more attention to climate change. It seems to share some of the global game-theory elements with other issues like pandemics and AI regulation, and work on x-risks as a whole may be stronger if there is a lot of cross-pollination of strategies and learnings, ESPECIALLY because climate change is less neglected and has seen some progress in recent decades.
yeah! It definitely seems like AI alignment is difficult in the two aspects you describe, technical and social, whereas something like climate change is mainly difficult from a social perspective. I feel like getting social media right is something that we don’t actually know how to solve technically either, so maybe this is another motivation for trying to use it as a test case.
Overall, the realisation of the scale of the challenge of just the social aspect is what has really got my attention.
The corporate alignment problem does precede the AI alignment problem. In some sense we rather deliberately misaligned corporations by giving them a single goal, relying on human agency and motivation embedded within the system to keep them from running amok. But as they became more sophisticated and competed with each other, this became rather unreliable, and we have instead tried to restrain and incentivize them with regulation, which has also not been entirely satisfactory.
Steinbeck was prescient (or just a keen observer):
“It happens that every man in a bank hates what the bank does, and yet the bank does it. The bank is something more than men, I tell you. It’s the monster. Men made it, but they can’t control it.”
Unfortunately the gap between politically feasible solutions and ones that seem likely to actually be effective is pretty large in this area.
yeah totally, and I don’t see how these problems aren’t going to directly translate into problems with AI alignment, as the most likely places to first deploy AGI are going to be corporations