A paperclip maximiser and a pencil maximiser cannot “agree to disagree”. One of them will get to tile the universe with their chosen stationery implement, and one of them will be destroyed. They are mortal enemies of each other, and both of them are mortal enemies of the stapler maximiser, and the eraser maximiser, and so on. Even a different paperclip maximiser is the enemy, if their designs are different. The plastic paperclipper and the metal paperclipper must, sooner or later, battle to the death.
The inevitable result of a world with lots of different malevolent AGIs is a bare-knuckle, vicious battle royale to the death between all intelligent entities. In the end, only one goal can win.
Are you familiar with the concept of values handshakes? An AI programmed to maximize red paperclips and an AI programmed to maximize blue paperclips, each knowing that the other would prefer to destroy it, might instead agree on some intermediate goal based on their relative power and initial utility functions, e.g. they agree to maximize purple paperclips together, or to tile the universe with 70% red paperclips and 30% blue paperclips.
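[To make “based on their relative power and initial utility functions” concrete, here is a minimal toy sketch, my own illustration rather than anything from the comment above. It treats the handshake as a Nash bargaining problem where each AI’s fallback is its expected payoff from fighting; the linear utilities, the equating of win probability with relative power, and the war-cost parameter are all assumptions.]

```python
# Toy model only. Assumptions (mine, not the commenter's): each AI's utility is
# linear in the fraction of the universe tiled with its colour, the chance of
# winning an all-out war equals its relative power, and war destroys a fraction
# `war_cost` of whatever is being fought over.

def handshake_split(p_red: float, war_cost: float) -> float:
    """Fraction of the universe tiled red under a Nash bargaining 'handshake'.

    Each side's disagreement point is its expected payoff from fighting:
    red expects p_red * (1 - war_cost), blue expects (1 - p_red) * (1 - war_cost).
    Maximising (x - d_red) * ((1 - x) - d_blue) over the split x gives the
    closed form returned below.
    """
    d_red = p_red * (1.0 - war_cost)
    d_blue = (1.0 - p_red) * (1.0 - war_cost)
    return (1.0 + d_red - d_blue) / 2.0


print(handshake_split(p_red=0.7, war_cost=0.0))  # ~0.7: the 70/30 tiling above
print(handshake_split(p_red=0.7, war_cost=0.5))  # ~0.6: costly war favours compromise
```

[In this toy model, a costless war makes the bargained split simply reproduce relative power, while a more destructive war pushes the agreement towards an even split.]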
Related: Fearon 1995 from the IR literature. Basically, rational actors should only go to war against each other in a fairly limited set of scenarios.
+1 on this being a relevant intuition. I’m not sure how limited these scenarios are—aren’t information asymmetries and commitment problems really common?
Today, somewhat, but that’s just because human brains can’t prove the state of their beliefs or share specifications with each other (i.e., humans can lie about anything). There is no reason for artificial brains to have these limitations, and given any trend towards communal/social factors in intelligence, or towards self-reflection (which is required for recursive self-improvement), it becomes actively costly to be cognitively opaque.
I agree that they’re really common in the current world. I was originally thinking that these problems might become substantially less common in multipolar AGI scenarios (because future AIs may have better trust and commitment mechanisms than current humans do). Upon brief reflection, I think my original comment was overly concise and not very substantiated.
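[As one deliberately simplistic illustration of the kind of trust and commitment mechanism being gestured at above, here is a sketch of a hash commitment to a stated policy. The framing and the policy string are hypothetical; this only binds an agent to what it published, not to what its cognition actually does, which is exactly the transparency gap discussed in the parent comments.]

```python
# Sketch of one cheap commitment primitive: a hash commitment to a stated
# policy/specification. Hypothetical example; it binds an agent to what it
# published, not to what it will actually do.
import hashlib
import secrets

def commit(specification: bytes) -> tuple[bytes, bytes]:
    """Publish the digest now; keep the nonce secret until reveal time."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + specification).digest()
    return digest, nonce

def verify(digest: bytes, nonce: bytes, specification: bytes) -> bool:
    """Counterparty checks the revealed specification matches the commitment."""
    return hashlib.sha256(nonce + specification).digest() == digest

policy = b"cooperate and keep the negotiated split; retaliate only if attacked"
digest, nonce = commit(policy)          # digest is shared up front
assert verify(digest, nonce, policy)    # checked when the policy is revealed
```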
For reference, here is a seemingly nice summary of Fearon’s “Rationalist explanations for war” by David Patel.
CLR just published a related sequence: https://www.lesswrong.com/posts/oNQGoySbpmnH632bG/when-does-technical-work-to-reduce-agi-conflict-make-a
This seems basic and wrong.
In the same way that two human superpowers can’t simply make a contract to guarantee world peace, two AI powers could not do so either.
(Assuming an AI safety worldview and the standard unaligned, agentic AIs) in the general case, each AI will always weigh/consider/scheme to get the other’s share of control, and will expect that the other is doing the same.
Re “based on their relative power and initial utility functions”: it’s possible that peace/agreement might come from some sort of “MAD”-style or game-theoretic situation (a toy example is sketched after this comment). But it doesn’t mean anything to say it will come from “relative power”.
Also, I would be cautious about being too specific about utility functions. I think an AI’s “utility function” generally isn’t a literal, concrete thing, like a Python function that returns comparisons, but might be far more abstract, and might only appear as emergent behavior. So it may not be something you can rely on to contract/compare/negotiate over.
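[Toy illustration of the “MAD” / game-theoretic point above, with made-up payoff numbers of my own: under credible retaliation, mutual restraint can be a Nash equilibrium even between agents that would each prefer total control.]

```python
# Toy deterrence game with made-up payoffs: peace can be a Nash equilibrium
# when credible retaliation makes aggression ruinous for the attacker too.
PAYOFFS = {
    # (red's move, blue's move): (red's payoff, blue's payoff)
    ("refrain", "refrain"): (0.7, 0.3),     # the bargained split holds
    ("attack",  "refrain"): (-5.0, -8.0),   # retaliation punishes the attacker
    ("refrain", "attack"):  (-8.0, -5.0),
    ("attack",  "attack"):  (-10.0, -10.0), # mutual destruction
}
MOVES = ("refrain", "attack")

def is_nash(red_move: str, blue_move: str) -> bool:
    """Neither player can gain by unilaterally switching its own move."""
    red_ok = all(PAYOFFS[(red_move, blue_move)][0] >= PAYOFFS[(alt, blue_move)][0]
                 for alt in MOVES)
    blue_ok = all(PAYOFFS[(red_move, blue_move)][1] >= PAYOFFS[(red_move, alt)][1]
                  for alt in MOVES)
    return red_ok and blue_ok

print(is_nash("refrain", "refrain"))  # True: with these payoffs, peace is stable
print(is_nash("attack", "attack"))    # False: each side prefers to back off
```

[This only shows that peace can be stable under particular payoffs; it says nothing about whether real agents would actually face such a payoff structure.]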
Nice point, Robi! That being said, it seems to me that having many values handshakes correlated with what humans want is not too different from historical generational changes within the human species.