If we assume that no AGI system develops its own goal(s), which I assign a probability of 0.2, then it is also necessary to consider whether any AGI system’s programmed goal(s) still lead to an EC. I assign this a probability of 0.04 because the human(s) who trained the AGI might not have thought through in enough detail the consequences of programming the AGI with a specific goal or set of goals. The paperclip maximizer scenario is a classic example of this. Another scenario is that a nefarious human (or group of humans) deliberately creates and releases an AGI system with a destructive goal (or goals) that no human, including the person or people who released it, can control once it is out in the world.
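To make the arithmetic explicit (a minimal sketch, and only one possible reading, since it assumes the 0.04 is conditional on the 0.2 branch rather than an overall figure), the implied joint probability for this pathway would be:

$$
P(\text{EC via programmed goals}) = \underbrace{0.2}_{P(\text{no AGI develops its own goals})} \times \underbrace{0.04}_{P(\text{EC} \,\mid\, \text{only programmed goals})} = 0.008
$$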
I only see arguments for the 0.04 case, but not for the 0.96 case. Do you have any concrete goals in mind that would not result in an EC?
If I understand correctly, you claim to be 0.96 confident not only that outer alignment will be solved, but also that all AGIs will use some kind of outer alignment solution and that no agent will build an AGI with inadequate alignment. What makes you so confident?
Thank you for your comment and insight. The main reason my forecast for this scenario is not higher is that I think there is a sizable risk of an existential catastrophe unrelated to AGI occurring before the scenario you mention could resolve positively.
I am very open to adjusting my forecast, however. Are there any resources you would recommend that argue for forecasting a higher probability for this scenario relative to other AGI x-risk scenarios? And what are your thoughts on the likelihood of a non-AGI existential catastrophe befalling humanity before an AGI-related one?
Also, please excuse any delay in my response; I will be away from my computer for the next several hours, but I will try to reply to any points you make within the next 24 hours.