Scrutinizing AI Risk (80K, #81) - v. quick summary
Epistemic status: uncertain about whether this accurately describes Ben’s views. The podcast is great and he’s also doing a very interesting AMA. This is a very complex topic, and I would love to hear lots of different perspectives and have them really fleshed out in detail. The below is my attempt at a quick summary for those short on time.
Core ideas I took away (not that I necessarily agree with all of them)
Brain in a box: the classic Bostrom-Yudkowsky scenario is one in which a superintelligent AGI is developed that is far more capable than anything else people are dealing with, i.e. a brain in a box. But we’d actually expect systems to develop incrementally, so we should have other examples of similar concrete problems to work on along the way.
Intelligence explosion: one of the concepts behind a runaway intelligence explosion is that a system is recursively self-improving, so the AI starts to rewrite its own code or hardware and then gets much better. But many tasks go into system improvement, and even coding requires many different skills, so a system’s ability to improve one of its inputs doesn’t mean that its overall capacity will increase.
Entanglement and capabilities: AI systems have usually become more capable by getting better at giving us what we want, that is, by exploring the space of potential solutions more and more carefully. For example, house-cleaning robots only get better as they learn more about our preferences. Thermostats only get more effective and capable when they get better at moderating the temperature, because the intelligence needed to meet the goal is entangled with the goal itself. This should make us sceptical that systems could be extremely powerful and capable while also having goals that diverge from ours.
Hard to shape the future—if we take these arguments seriously, it also might be the case that AI safety can develop more gradually as a field over coming decades, and that while it’s important, it just might not be as much of a race as some have previously argued. To take something potentially analogous, it’s not clear what someone in the 1500s could have done to influence the industrial revolution, even if they had strong reasons to think it would take off.
Some other points
In the interview, Ben also mentions that there are “multiple salient emerging forms of military technology” which could be of similar importance, giving the example of hypersonic glide vehicles. I’ve considered taking this course in Science and International Security at the War Studies department at KCL, and I’ve uploaded the syllabi for the main units here and here. Other examples are space security and cyber security.
In the 80K podcast with Stuart Russell, Stuart calls out Rob for conflating ML systems with AI systems. To define terms: machine learning systems improve automatically through experience with data, whereas artificial intelligence is a much broader area of research that includes robotics, computer vision, classical search, logical reasoning, and many other areas. Stuart makes the point in the podcast that Google’s self-driving cars mostly use classical search, so looking only at ML gives just part of the picture.
Rohin Shah reviews Ben’s interview favourably in the Alignment Newsletter here.
Rohin also discusses AI safety with Buck Shlegeris here, but I haven’t finished the interview (I found the discussion quite confrontational and switched off).
I’ve also just pulled out the most contentious points here; Ben gives a very rounded and considerate interview, which I’d recommend listening to in full.
My takeaways
I found Ben’s arguments to be very useful and interesting
I agree that working on existential risk involves more than just one technology, and so there could be fruitful work in security studies and power structures, with a great popular example of theoretical work being Destined for War (also see The Vulnerable World Hypothesis). This work seems important and neglected.
While I think Ben’s arguments require responses from people looking into AI, the main idea that humans are not optimally intelligent, and that more advanced technologies could exploit that significantly in the future to produce undesirable outcomes (including human extinction and s-risks), seems plausible to me.
Before listening to this podcast, I’d have put a 10-30% chance on a <6 month hard take-off scenario this century, conditional on AI safety work happening and the world not being radically different from now, but I’d now put it at something like 5-20%, though I’m really not an expert here, so would expect my views to change a lot (immediately after reading Superintelligence I was probably at 50%).
I was also very glad that this perspective was aired, and I hope it leads to more fruitful discussions
At the end of his slides, Ben closes with an important point for the EA community, ‘If we’ve failed to notice important issues with classic arguments until recently, we should also worry about our ability to assess new arguments.’
“To take something potentially analogous, it’s not clear what someone in the 1500s could have done to influence the industrial revolution, even if they had strong reasons to think it would take off.”
Detail—Ben says:
So he’s talking about the start of the Industrial Revolution, not two centuries before.