I’ve not yet read the full report—only this post—and so I may well be missing something, but I have to say that I am surprised at Figure E.1:
If I understand correctly, the figure says that experts think extinction is more than twice as likely if there is a warning shot compared to if there is not.
I accept that a warning shot happening probably implies that we are in a world in which AI is more dangerous, which, by itself, implies higher x-risk.[1] On the other hand, a warning shot could galvanize AI leaders, policymakers, the general public, etc., into taking AI x-risk much more seriously, such that the overall effect of a warning shot is to actually reduce x-risk.
I personally think it’s very non-obvious how these two opposing effects weigh up against each other, and so I’m interested in why the experts in this study are so confident that a warning shot increases x-risk. (Perhaps they expect the galvanizing effect will be small? Perhaps they did not consider the galvanizing effect? Perhaps there are other effects they considered that I’m missing?)
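To make that tension concrete, here's a toy Bayesian sketch (all parameter names and numbers are invented purely for illustration; nothing here comes from the report): a warning shot is evidence that we're in a more dangerous world, but it also scales down the remaining risk by some 'galvanizing' factor, and the net effect on P(doom) depends on how those two forces compare.

```python
# Toy model of the two opposing effects of a warning shot on P(doom).
# All numbers are made up purely for illustration.

def p_doom(warning_shot: bool,
           p_dangerous=0.5,          # prior that AI is dangerous by default
           p_ws_if_dangerous=0.6,    # warning shots are likelier in dangerous worlds
           p_ws_if_safe=0.2,
           doom_if_dangerous=0.5,    # doom risk in each world type, absent any reaction
           doom_if_safe=0.05,
           galvanizing_factor=0.5):  # fraction of risk remaining after society reacts
    """Combine (1) the evidential update toward being in a dangerous world with
    (2) the causal risk reduction from the reaction a warning shot provokes."""
    # Bayesian update on whether we're in the dangerous world
    like_d = p_ws_if_dangerous if warning_shot else 1 - p_ws_if_dangerous
    like_s = p_ws_if_safe if warning_shot else 1 - p_ws_if_safe
    joint_d = p_dangerous * like_d
    joint_s = (1 - p_dangerous) * like_s
    post_dangerous = joint_d / (joint_d + joint_s)

    baseline = post_dangerous * doom_if_dangerous + (1 - post_dangerous) * doom_if_safe
    return baseline * galvanizing_factor if warning_shot else baseline

print(p_doom(False))                         # ~0.20
print(p_doom(True))                          # ~0.19 -> strong reaction: risk goes *down*
print(p_doom(True, galvanizing_factor=0.9))  # ~0.35 -> weak reaction: risk goes *up* a lot
```

With a strong enough galvanizing response, the warning shot lowers P(doom) even though it's bad news about which world we're in; with a weak response, the bad-news effect dominates. So the experts' forecasts seem to encode an implicit view about how effective society's reaction would be.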
[1] Though I believe the effect here is muddied by ‘treacherous turn’ considerations / the argument that the most dangerous AIs will probably be good at avoiding giving off warning shots.
Hi @Will Aldred, thank you for the question! It’s one we’ve also thought about. Here are three additional considerations that may help explain this result.
1. The sample size in this case study was very small. Only 4-6 experts forecast on each node, so we want to be wary of reading too much into any particular number. As people continue to forecast on the Metaculus and Manifold versions of this question, we should get more data on this one!
2. The warning shot question here was specifically about administrative disempowerment (as opposed to deaths, property damage, or anything else). It’s possible that experts think this kind of warning shot wouldn’t prompt the sort of reaction that ends up reducing x-risk, or that administrative disempowerment would be particularly hard to recover from.
3. In our adversarial collaboration on AI risk, we asked about “escalating warning shots,” where the warning shots involve killing large numbers of people or causing large amounts of damage. The “concerned” participants in that study had substantial overlap with the experts in this one. Conditional on that question resolving positively, the concerned group would have a lower P(doom) than they do unconditionally, mostly for the reasons you gave. The difference between the two results could be down to the different operationalization, or it could just be that the small number of people who were randomly assigned to forecast on the administrative disempowerment warning shot question don’t expect society to respond well to warning shots.
We are very interested in what types of concerning AI behavior would be most likely to prompt different responses from policymakers, and we’re hoping to continue working on related questions in future projects!
Thank you for doing this work!