I enjoyed reading your insightful reply! Thanks for sharing, Guillaume. You don’t make any arguments I strongly disagree with, and you’ve added many thoughtful suggestions with caveats. The distinction you make between the two sub-questions is useful.
I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions, because it increases the number of people who take the issue seriously and act proactively, often with limited information.
For example, while I admire many of the people working at PauseAI, I believe there are quite a few worlds in which those initially involved in setting up the group turn out to have had a net-negative impact. Even early on, there were indications that some people were okay with using violence or other radical methods to stop AI (tactics the organizers then banned). But what happens if these tendencies resurface when “shit hits the fan”? To push back on my own thinking, it might still be a good idea to work on PauseAI because of the community diversification argument within AI safety (see footnote two).
I agree that other forms of CB, such as MATS, seem more robust. But even here, I can always find compelling arguments for why I should be clueless about the expected value. For instance, an increased number of AI safety researchers working on solving an alignment problem that might ultimately be unsolvable could create a false sense of security.
I agree with your reasoning, and the way you’ve articulated it is very compelling to me! It seems that the bar this evidence would need to clear is, quite literally, impossible to meet.
I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), and it seems just as valid there.
Would you disagree with this?
Your reply also raises a broader question for me: What criteria must an intervention meet for our determinate credence that its expected value is positive to exceed 50%, thereby justifying work on it?