There’s a very wide range of views on this question, from “misalignment risk is essentially made up and incoherent” to “humanity will almost certainly go extinct due to misaligned AI.” Most people’s arguments rely heavily on hard-to-articulate intuitions and assumptions.
My sense is that the disagreements are mostly driven “top-down” by general psychological biases/inclinations toward optimism vs. pessimism, rather than “bottom-up” as the result of independent lower-level disagreements over specific intuitions and assumptions. The reason I think this is that there seems to be a strong correlation between concern about misalignment risk and concern about other kinds of AI risk (i.e., AI-related x-risk). In other words, if the disagreement were “bottom-up”, you’d expect at least some people who are optimistic about misalignment risk to be pessimistic about other kinds of AI risk, such as what I call “human safety problems” (see examples here and here). But in fact I don’t seem to see anyone whose position is something like, “AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying.”
(From my limited observation, optimism/pessimism on AI risk also seems correlated with optimism/pessimism on other topics. It might be interesting to verify this through some systematic method like a survey of researchers.)
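To make the “systematic method” suggestion concrete, here is a minimal sketch of the kind of check such a survey could support, assuming each respondent gives a pessimism estimate for misalignment risk and for other AI-related x-risks. The numbers below are made-up placeholders for illustration, not real survey data.

```python
# Hypothetical sketch: does pessimism about misalignment risk correlate with
# pessimism about other AI-related x-risks across survey respondents?
# All numbers are illustrative placeholders, not real survey results.
from scipy.stats import spearmanr

# Each pair: (pessimism about misalignment risk, pessimism about other AI x-risks)
# for one hypothetical respondent, on a 0-1 scale.
responses = [
    (0.05, 0.02),
    (0.10, 0.10),
    (0.30, 0.20),
    (0.50, 0.40),
    (0.80, 0.60),
]

misalignment = [r[0] for r in responses]
other_risks = [r[1] for r in responses]

# Rank correlation: a high rho would be consistent with the "top-down" story above.
rho, p_value = spearmanr(misalignment, other_risks)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```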
FWIW I know some people who explicitly think this. And I think there are also a bunch of people who think something like “the alignment problem will probably be pretty technically easy, so we should be focusing on the problems arising from humanity sometimes being really bad at technically easy problems”.
Sounds like their positions are not public, since you don’t cite anyone by name? Is there any reason for that?
FWIW, I think my median future includes humanity solving AI alignment but messing up reflection/coordination in some way that makes us lose out on most possible value. I think this means that longtermists should think more about reflection/coordination issues than we currently do. But technical AI alignment seems more tractable than reflection/coordination, so I think it’s probably correct for more total effort to go towards alignment (which is the status quo).
I’m undecided about whether these reflection/coordination issues are best framed as “AI risk” or not. They’ll certainly interact a lot with AI, but we would face similar problems without AI.
FWIW, I haven’t had this impression.
Single data point: In the most recent survey on community opinion on AI risk, I was in at least the 75th percentile for pessimism (for roughly the same reasons Lukas suggests below). But I also seem to be unusually optimistic about alignment.
I haven’t found that this is a really unusual combo: I think I know at least a few other people who are unusually pessimistic about ‘AI going well,’ but also at least moderately optimistic about alignment.
(Caveat that my apparently higher level of pessimism could also be explained by me having a more inclusive conception of “existential risk” than other survey participants.)