What’s the connection you have in mind?
I’m not communicating this well, and I don’t think I can pass an ITT for Yudkowsky, so I might well be wrong even if I could communicate it. But roughly: I think of a lot of these properties as less tightly coupled with general mental ability. So I (and also my toy model of Richard Ngo) think of things like a human-like evolutionary path, human-like neural anatomy, sharper selection for pleasure and pain, etc. as plausibly important for “having internal senses of state like humans,” even in a mind with substantially subhuman mental ability. In a similar way, I (and I think many other EAs) share the intuition that gradient descent can (and, if carefully constructed, will) shape a lot of different internal states that result in outputs humans would consider brilliant, without by default producing human-like motivations like power-seeking.
Whereas I think my toy model of Eliezer would say this is bonkers. Sure, in the abstract you can create minds that can drive a red car but not a blue car, or language models that pass a Turing Test for human speech without having internal emotions, or systems that do superhuman physics without wanting to take over the world, but this is vanishingly unlikely without lots of dedicated effort/a “miracle.” More broadly, Eliezer appears to have a broad model/theory for almost anything important, whereas other people (myself included) are on average more fine with just not having a theory for things (partially because we’re worse at theorizing).