There’s been tons of discussion about this; I think most of the field disagrees, and there are some complications you’re missing.
In the introduction to a paper last year, Issa Rice and I tried to clarify some definitions that explain why “control” isn’t the same as “alignment”:
The term AI safety also encompasses technical approaches that attempt to ensure that humans retain the ability to control such systems or to restrict the capabilities of such systems (AI control), as well as approaches that aim to align systems with human values—that is, the technical problem of figuring out how to design, train, inspect, and test highly capable AI systems such that they produce all and only the outcomes their creators want (AI alignment).
More recently, Rob Bensinger said:

“AGI existential safety” seems like the most popular relatively unambiguous term for “making the AGI transition go well”, so I’m fine with using it until we find a better term.
I think “AI alignment” is a good term for the technical side of differentially producing good outcomes from AI, though it’s an imperfect term insofar as it collides with Stuart Russell’s “value alignment” and Paul Christiano’s “intent alignment”. (The latter, at least, better subsumes a lot of the core challenges in making AI go well.)
Thanks! I hate to be someone who walks into a heated debate and pretends to solve it in one short post, so I hope my post didn’t come off as too authoritative (I just genuinely have never seen debate about the term). I’ll look more into these discussions.