You maintain this pretty well as it walks up through to primates, and then suddenly it takes a sharp left turn and invents its own internal language and a bunch of abstract concepts, and suddenly you find your visualization tools quite lacking for interpreting its abstract mathematical reasoning about topology or whatever.
Empirically speaking, scientists who are trying to understand human brains do spend a lot (most?) of their time looking at nonhuman brains, no?
Is Nate's objection here something like "human neuroscience is not at the level where we deal with 'sharp left turn' stuff, and I expect that once neuroscientists can understand chimpanzee brains very well they will discover that there is in fact a whole other set of problems they need to solve to understand human brains, and that this other set of problems is actually the harder one?"
scientists who are trying to understand human brains do spend a lot (most?) of their time looking at nonhuman brains, no?
My sense is that this is mostly for ethics reasons, rather than representing a strong stance that animal models are the fastest way to make progress on understanding human cognition.
Thanks! That sounds right to me, but I had thought that Nate was making a stronger objection, something like "looking at nonhuman brains is useless because you could have a perfect understanding of a chimpanzee brain but still completely fail to predict human behavior (after a 'sharp left turn')."
Is that wrong? Or is he just saying something like "looking at nonhuman brains is 90% less effective, and given long enough timelines these research projects will pan out; I just don't expect us to have long enough timelines?"
"looking at nonhuman brains is useless because you could have a perfect understanding of a chimpanzee brain but still completely fail to predict human behavior (after a 'sharp left turn')."
Sounds too strong to me. If Nate or Eliezer thought that it would be totally useless to have a perfect understanding of how GPT-3, AlphaZero, and Minerva do their reasoning, then I expect that they'd just say that.
My Nate-model instead says things like:
Current transparency work mostly isn't trying to gain deep mastery of how GPT-3 etc. do their reasoning; and to the extent it's trying, it isn't making meaningful progress.
("Deep mastery of how this system does its reasoning" is the sort of thing that would let us roughly understand what thoughts a chimpanzee is internally thinking at a given time, verify that it's pursuing the right kinds of goals and thinking about all (and only) the right kinds of topics, etc.)
A lot of other alignment research isn't even trying to understand chimpanzee brains, or future human brains, or generalizations that might hold for both chimps and humans; it's just assuming there's no important future chimp-to-human transition it has to worry about.
Once we build the equivalent of "humans", we won't have much time to align them before the tech proliferates and someone accidentally destroys the world. So even if the "understand human cognition" problem turns out to be easier than the "understand chimpanzee cognition" problem in a vacuum, the fact that it's a new problem and we have a lot less time to solve it makes it a lot harder in practice.