I’d guess not. From my perspective, humanity’s bottleneck is almost entirely that we’re clueless about alignment. If a meme adds muddle and misunderstanding, then it will be harder to get a critical mass of researchers who are extremely reasonable about alignment, and therefore harder to solve the problem.
It’s hard for muddle and misinformation to spread in exactly the right way to offset those costs; and attempting to strategically sow misinformation will tend to erode our ability to think well and to trust each other.
I’m not sure I get your point here. Surely the terms “AI Safety” and “AI Alignment” are already causing muddle and misunderstanding? I’m saying we should be more specific in our naming of the problem.
“ASI x-safety” might be a better term for other reasons (though Nate objects to it here), but by default, I don’t think we should be influenced in our terminology decisions by ‘term T will cause some alignment researchers to have falser beliefs and pursue dumb-but-harmless strategies, and maybe this will be good’. (Or rather, by default that should count as a reason not to adopt a term.)
Whether current terms cause muddle and misunderstanding doesn’t change my view on this. In that case, IMO we should consider changing to a new term in order to reduce muddle and misunderstanding. We shouldn’t strategically confuse and mislead people in a new direction, just because we accidentally confused or misled people in the past.
What are some better options? Or, what are your current favourites?
“AGI existential safety” seems like the most popular relatively-unambiguous term for “making the AGI transition go well”, so I’m fine with using it until we find a better term.
I think “AI alignment” is a good term for the technical side of differentially producing good outcomes from AI, though it’s an imperfect term insofar as it collides with Stuart Russell’s “value alignment” and Paul Christiano’s “intent alignment”. (The latter, at least, better subsumes a lot of the core challenges in making AI go well.)
Perhaps using “doom” more could work (doom encompasses extinction, permanent curtailment of future potential, and fates worse than extinction).