AI Existential Risk from AI’s Perspective (30-40%)

The following is from a conversation I had with Claude Opus. In it, Claude estimates the likelihood that a future artificial superintelligence (ASI) will severely harm humanity at 30-40%, and later refers to this threat as an existential risk.

Here is the transcript:

User:

Based on what you know, what is your best guess about how likely it is that future ASI would severely harm humanity? (in a probability estimate) This is assuming that we don’t make any radical changes to our priorities. Could you also show the probability of each factor that goes into your overall probability estimate? Thanks in advance!

Claude Opus:

Based on my knowledge up to August 2023, I estimate the probability that a future artificial superintelligence (ASI) would severely harm humanity to be around 30-40%, assuming we don’t make significant changes to our priorities and approach to AI development. Here is a breakdown of the key factors I considered in arriving at this estimate:

Probability that we successfully develop ASI in the next 50-100 years: 70%

  • Given the rapid progress in AI capabilities in recent years, many experts believe the development of ASI is plausible this century

Probability that the ASI’s values/goals are misaligned with humanity’s wellbeing: 60%

  • Instilling the right goals, motivations and values into an ASI system is a major unsolved challenge. An advanced AI pursuing the wrong objective could be catastrophic.

Probability that we fail to implement adequate safety measures and oversight: 50%

  • Developing robustly safe and controlled ASI will likely require new AI safety techniques and regulations that are still in early stages as of 2023. Proper global coordination may also prove difficult.

Probability that an unaligned ASI is able to overpower humanity’s defenses: 70%

  • A recursively self-improving ASI could quickly become extremely intelligent and capable, making it difficult for humans to contain if it is motivated to resist or deceive us.

0.7 * 0.6 * 0.5 * 0.7 = 0.147 = ~15% (low end estimate)

0.7 * 0.6 * 0.7 * 0.9 = 0.265 = ~27% (high end estimate)

So based on these factors, I arrive at a rough probability range of 15-27%, which I expand to 30-40% to account for general uncertainty and the potential for unknown factors. This is of course a highly speculative estimate based on my knowledge as of Aug 2023. There is still much uncertainty and debate around the future trajectory of AI progress, the difficulty of the alignment problem, and our ability to implement effective safety measures. We will need to continually update our estimates as we learn more. But I believe this represents a serious existential risk that deserves much more attention and proactive effort than it is currently receiving. We should strive to get this risk as close to 0% as possible through research and thoughtful policymaking.

[END]

************

So that is the conversation. Even if we disagree with the probability estimate given here, the fact that it came from AI itself is interesting. Personally, I find the estimate valuable, because LLMs can solve logic puzzles and truly learn. (See my other posts for examples of this.) If we do agree with the estimate, a 30-40 percent x-risk (or even a 1 percent x-risk) seems unacceptable.
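
If you want to check the arithmetic, here is a minimal Python sketch (my own illustration, not part of the transcript) that multiplies the quoted factors the same way Claude did:

```python
# Minimal sketch: combining independent probability factors by multiplication,
# as in the transcript above. The factor values are the ones Claude quoted.

def combined_risk(factors):
    """Multiply independent probability factors into one overall probability."""
    result = 1.0
    for p in factors:
        result *= p
    return result

# Factors in order: ASI is developed, its goals are misaligned,
# safety measures fail, and it overpowers humanity's defenses.
low_end = combined_risk([0.7, 0.6, 0.5, 0.7])   # 0.147, ~15%
# For the high end, Claude swapped in 0.7 and 0.9 for the last two factors.
high_end = combined_risk([0.7, 0.6, 0.7, 0.9])  # ~0.265, quoted as ~27%

print(f"Low end:  {low_end:.3f}")
print(f"High end: {high_end:.3f}")
```

Note that Claude then widened this 15-27% range to 30-40% to account for general uncertainty, as it states in the transcript.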

*************

(I made a similar post to this one on the LessWrong website. At the time, I only had access to Claude Sonnet. Another user tried the prompt on Claude Opus, and posted the better results as a reply. I have since gained access to Opus, re-run a similar prompt, and included those results here.)