You maintain this pretty well as it walks up through to primates, and then suddenly it takes a sharp left turn and invents its own internal language and a bunch of abstract concepts, and suddenly you find your visualization tools quite lacking for interpreting its abstract mathematical reasoning about topology or whatever.
Empirically speaking, scientists who are trying to understand human brains do spend a lot (most?) of their time looking at nonhuman brains, no?
Is Nate's objection here something like "human neuroscience is not at the level where we deal with 'sharp left turn' stuff, and I expect that once neuroscientists can understand chimpanzee brains very well they will discover that there is in fact a whole other set of problems they need to solve to understand human brains, and that this other set of problems is actually the harder one?"
scientists who are trying to understand human brains do spend a lot (most?) of their time looking at nonhuman brains, no?
My sense is that this is mostly for ethics reasons, rather than representing a strong stance that animal models are the fastest way to make progress on understanding human cognition.
Thanks! That sounds right to me, but I had thought that Nate was making a stronger objection, something like "looking at nonhuman brains is useless because you could have a perfect understanding of a chimpanzee brain but still completely fail to predict human behavior (after a 'sharp left turn')."
Is that wrong? Or is he just saying something like "looking at nonhuman brains is 90% less effective, and given long enough timelines these research projects will pan out; I just don't expect us to have long enough timelines?"
"looking at nonhuman brains is useless because you could have a perfect understanding of a chimpanzee brain but still completely fail to predict human behavior (after a 'sharp left turn')."
Sounds too strong to me. If Nate or Eliezer thought that it would be totally useless to have a perfect understanding of how GPT-3, AlphaZero, and Minerva do their reasoning, then I expect that they'd just say that.
My Nate-model instead says things like:
Current transparency work mostly isn't trying to gain deep mastery of how GPT-3 etc. do their reasoning; and to the extent it's trying, it isn't making meaningful progress.
("Deep mastery of how this system does its reasoning" is the sort of thing that would let us roughly understand what thoughts a chimpanzee is internally thinking at a given time, verify that it's pursuing the right kinds of goals and thinking about all (and only) the right kinds of topics, etc.)
A lot of other alignment research isn't even trying to understand chimpanzee brains, or future human brains, or generalizations that might hold for both chimps and humans; it's just assuming there's no important future chimp-to-human transition it has to worry about.
Once we build the equivalent of "humans", we won't have much time to align them before the tech proliferates and someone accidentally destroys the world. So even if the "understand human cognition" problem turns out to be easier than the "understand chimpanzee cognition" problem in a vacuum, the fact that it's a new problem and we have a lot less time to solve it makes it a lot harder in practice.