There should be a public adversarial collaboration on AI x-risk
I think adversarial collaborations are a good way of understanding competing perspectives on an idea, especially if it is polarising or controversial.
The term was first introduced by Daniel Kahneman. The basic idea is that two people with competing perspectives on an issue work together, in good faith, devising experiments and discussions that clarify the disagreement and move them towards a joint belief. (Kahneman uses the word “truth”, but I think “belief” is more justified in this context.)
AI x-risk is a good place to have a public adversarial collaboration
First, the issue is especially polarising. People working on AI safety believe that AI presents one of the greatest challenges to humanity’s survival. On the other hand, AI research organisations, by revealed preference (they’re going full speed ahead on building AI capabilities) and stated preference (see this survey too), appear to think the risk is much lower.
In my opinion, an adversarial collaboration between a top AI safety researcher (someone who works on x-risk from AI) and someone who does not think the x-risks are substantial would have clear benefits.
It would make the lines of disagreement clearer. To me, an outsider in the space, it’s not very clear where exactly people disagree and to what extent. This would clear that up and possibly provide a baseline for future debate.
It would also do a lot to legitimise x-risk concerns if it were co-written by someone respected in the field.
Finally, it would make each side of the debate evaluate the other’s arguments carefully and see their own blind spots better. This would improve the overall epistemic quality of the AI x-risk debate.
How could this go wrong?
The main failure mode is that the parties aren’t writing it in good faith. If either is writing with the purpose of proving the other side wrong, it will fail terribly.
The second failure mode is that both sides’ arguments rest too heavily on thought experiments, making resolution hard because there isn’t much empirical grounding for either position. In Kahneman’s example, even with actual experiments to draw on, the two parties couldn’t reach agreement for eight years. That’s entirely possible here as well.
Other key considerations
Finding the right people from both sides of the debate might be more difficult than I assume. I think there are people who can do it (e.g. Richard Ngo and Jacob Buckman have said they have done it in private, and Boaz Barak and Ben Edelman have published a thoughtful critique, although not an adversarial collaboration), but it may be that they’re too busy or not interested enough.
A similar version has been done before, and this might risk duplicating it. I don’t think that’s a real concern, because that debate was hard to follow and wasn’t explicitly written with the intent of finding a joint belief.
That seems like a terrible attempt at adversarial collaboration, with a bunch of name calling and not much constructive engagement (and thus mostly interesting as a sociological exercise in understanding top AI researcher opinions). I am extremely not concerned about duplicating it!
To me, the main issue with this plan will be finding an AI x-risk skeptic who actually cares enough to seriously engage and do this, and who is competent enough to represent the opposing position well. My prediction is that the vast majority wouldn’t care enough to, and haven’t engaged that much with the arguments.
Boaz Barak seems like a good candidate? Or even Richard Ngo and Jacob Buckman, from the tweet I linked to.
I think adversarial collaborations are very interesting, so I am curious to hear whether anyone has done any work on making this technique scale a bit more, such as writing a good manual for how to do it?
A starting point may be these two posts on an adversarial collaboration contest from 2019: https://slatestarcodex.com/2019/12/09/2019-adversarial-collaboration-entries/ and https://slatestarcodex.com/2020/01/13/2019-adversarial-collaboration-winners/.
There aren’t too many insights relating directly to scaling, however. Important takeaways seem to be (a) it’s a lot of work to coordinate, (b) lots of teams dropped out, and (c) providing a template and perhaps some formatting instructions may be useful.
Whatever people end up doing, I suspect it would be quite valuable if serious effort went into keeping track of the arguments in the debate and making it easy to find the responses to specific points, and the responses to those responses, and so on. As it currently stands, I think a lot of traditional text-based debate formats are prone to failure modes and other inefficiencies.
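To make "responses to responses" a bit more concrete, here is a minimal sketch in Python of how a debate map could store nested replies as a tree of claims rather than a linear thread. The `Argument` class, its `respond` method, and the example claims are all hypothetical illustrations I made up, not an existing tool or anyone's actual position:

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    """One claim in the debate, with the responses made to it."""
    author: str
    claim: str
    responses: list["Argument"] = field(default_factory=list)

    def respond(self, author: str, claim: str) -> "Argument":
        # Attach a reply and return it, so the reply can itself be responded to.
        reply = Argument(author, claim)
        self.responses.append(reply)
        return reply


# Hypothetical usage: a point, a response, and a counter-response.
root = Argument("safety researcher", "Advanced AI poses a substantial existential risk.")
reply = root.respond("skeptic", "Current systems show no sign of dangerous agency.")
reply.respond("safety researcher", "The concern is about capability trends, not current systems.")
```

The specifics don't matter much; the point is just that something with this kind of structure (or an existing argument-mapping tool) would let readers trace any specific point down to the deepest unanswered response, which is exactly what is hard to do in a long text-based exchange.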
Although I definitely think it is good to find a risk skeptic who is willing to engage in such a debate:
I don’t think there will be one person who speaks for all skeptical views (e.g., Erik Larson vs. Yann LeCun vs. Gary Marcus);
I think meaningful progress could be made towards understanding skeptics’ points of view even if no skeptic wants to participate or contribute to a shared debate map/management system, so long as their arguments are publicly available (i.e., someone else could incorporate them on their behalf).