I agree with the title and basic thesis of this article but I find its argumentation weak.
First, we’ll offer a simple argument that a sufficiently advanced supervised learning algorithm, trained to imitate humans, would very likely not gain total control over humanity (to the point of making everyone defenseless) and then cause or allow human extinction from that position.
No human has ever gained total control over humanity. It would be a very basic mistake to think anyone ever has. Moreover, if they did so, very few humans would accept human extinction. An imitation learner that successfully gained total control over humanity and then allowed human extinction would, on both counts, be an extremely poor imitation of any human, and easily distinguishable from one, whereas an advanced imitation learner will likely imitate humans well.
This basic observation should establish that any conclusion to the contrary should be very surprising, and so a high degree of rigor should be expected from arguments to that effect.
The obvious reason why no human has ever gained total control over humanity is because no human has ever possessed the capability to do so, not because no human would make the choice to do so if given the opportunity. This distinction is absolutely critical, because if humans have historically lacked total control due to insufficient ability rather than unwillingness, then the quoted argument essentially collapses. That’s because we have zero data on what a human would do if they suddenly acquired the power to exert total dominion over the rest of humanity. As a result, it is highly uncertain and speculative to claim that an AI imitating human behavior would refrain from seizing total control if it had that capability.
The authors seem to have overlooked this key distinction in their argument.
It takes no great leap of imagination to envision scenarios in which some individuals, if granted near-omnipotent abilities, would absolutely choose to subjugate the rest of humanity and rule over them in an unconstrained fashion. The primary reason I believe imitation learning is likely safe is that I am skeptical it will imbue AIs with godlike powers in the first place, not because I naively assume humans would nobly refrain from tyranny and oppression if they suddenly acquired such immense capabilities.
Note: Had the authors considered this point and argued that an imitation learner emulating humans would be safe precisely because it would not be very powerful, their argument would have been stronger. However, even if they had made this point, it likely would have provided only relatively weak support for the (perhaps implicit) thesis that building imitation learners is a promising and safe approach to building AIs. There are essentially countless proposals one can make for ensuring AI safety simply by limiting its capabilities. Relying solely on the weakness of an AI system as a safety guarantee seems like an unsound strategy to me in the long run.
Thanks for the comment, Matthew!

My understanding is that the authors are making two points in the passage you quoted:
No human has gained total control over all humanity, so an AI system that did so would not be imitating humans well.
Very few humans would endorse human extinction even if they gained total control over all humanity. Note that a human endorsing human extinction would mean supporting the death of themselves and of their own family and friends.
The obvious reason why no human has ever gained total control over humanity is because no human has ever possessed the capability to do so, not because no human would make the choice to do so if given the opportunity. This distinction is absolutely critical, because if humans have historically lacked total control due to insufficient ability rather than unwillingness, then the quoted argument essentially collapses.
In my mind, very few humans would want to pursue capabilities which are conducive to gaining control over humanity. There are diminishing returns to having more resources. For example, if you give 10 M$ (0.00001 % of global resources) to a random human, they will not have much of a desire to take risks to increase their wealth to 10 T$ (10 % of global resources), which would be helpful for gaining control over humanity. To increase their own happiness and that of their close family and friends, they would do well by investing their newly acquired wealth in exchange-traded funds (ETFs). A good imitator AI would share our disposition of not gaining capabilities beyond a certain point, and therefore (like humans) never get close to having a chance of gaining control over humanity.
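As a toy illustration of this diminishing-returns point (the dollar amounts and probabilities below are made up for the example, not taken from the comment above), log utility makes keeping a safe fortune beat a gamble with vastly higher expected wealth:

```python
import math

def log_utility(wealth):
    """Log utility: a standard model of diminishing returns to wealth."""
    return math.log(wealth)

# Hypothetical choice: keep $10M, or gamble with a 1 % chance of ending
# up with $10T and a 99 % chance of being left with only $10k.
keep = log_utility(10e6)                                        # ~16.1
gamble = 0.01 * log_utility(10e12) + 0.99 * log_utility(10e3)   # ~9.4

# The gamble's expected *wealth* (~$100B) dwarfs $10M, yet under
# diminishing returns the safe option wins comfortably.
print(keep > gamble)  # True
```

This is only a sketch of the disposition being described: an agent with diminishing returns to resources declines exactly the kind of high-variance bets that a takeover attempt would require.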
That’s because we have zero data on what a human would do if they suddenly acquired the power to exert total dominion over the rest of humanity. As a result, it is highly uncertain and speculative to claim that an AI imitating human behavior would refrain from seizing total control if it had that capability.
I think humans usually acquire power fairly gradually. A good imitator AI would be mindful that acquiring power too fast (suddenly fooming) would go very much against what humans usually do.
No human has ever had control over all humanity, so I agree there is a sense in which we have “zero data” about what humans would do under such conditions. Yet, I am still pretty confident that the vast majority of humans would not want to cause human extinction. A desire to be praised by others is a major reason humans like to gain power. There would be no one to praise or be praised given human extinction, so I think very few humans would want it if they suddenly gained control over all humanity.
It takes no great leap of imagination to envision scenarios in which some individuals, if granted near-omnipotent abilities, would absolutely choose to subjugate the rest of humanity and rule over them in an unconstrained fashion.
I do not think this is the best comparison. There would arguably be many imitator AIs, and these would not gain near-omnipotent abilities overnight. I would say both of these factors greatly constrain the level of subjugation. Historically, innovations and new inventions have spread out across the broader economy, so I think there should be a strong prior against a single imitator AI suddenly gaining control over all the other AIs and humans.
The primary reason I believe imitation learning is likely safe is that I am skeptical it will imbue AIs with godlike powers in the first place, not because I naively assume humans would nobly refrain from tyranny and oppression if they suddenly acquired such immense capabilities.
From the 1st part of the sentence, it looks like you agree with what I said above about a good imitator AI sharing our disposition of not gaining capabilities beyond a certain point. As for the 2nd part, I agree there would be a significant risk of tyranny and oppression if a random human suddenly gained control over all humanity, but such a scenario seems very unlikely to me because of what I said above.
Relying solely on the weakness of an AI system as a safety guarantee seems like an unsound strategy to me in the long run.
How long-run are you talking about here? Humans 500 years ago arguably had little control over current humans, but this alone does not imply a high existential risk 500 years ago. As Robin Hanson said:
[...] ignoring AI, what did you expect to happen with your other kinds of descendants?
Did you expect to be able to control their values? Or did you expect to not have any conflicts with them? Did you expect to win all conflicts you might have with them? And almost everybody thinks that, with respect to their squishy bio-human descendants, those descendants would in fact become more powerful than them. They would win conflicts with them, and their values would be different from theirs.

And there might often actually be such conflicts. That’s what everybody expects from their ordinary descendants. And it’s what everybody has seen for many generations. And therefore, it’s what they’ve accepted. They don’t seem to mind that. But when it comes to AI descendants, they change their standards.

They are worried that we shouldn’t have those kind of descendants unless we could make sure they never have a conflict with us, or we’d always win the conflict with them, or they could assure us that their values would never change. So people just hold different standards to the AI descendants than to the other descendants. And my main argument is that they really are your descendants, and the same sort of evolutionary habit that should make you indulgent and supportive of all of your descendants, regardless of how they might differ from you, should apply to your AI descendants. So that’s one line of argument to say, you know, don’t hold them to different standards in the abstract than you hold all your other descendants.

I expect you agree with some of this.
In my mind, very few humans would want to pursue capabilities which are conducive to gaining control over humanity.
This seems false. Plenty of people want wealth and power, which are “conducive to gaining control over [parts of] humanity”. It is true that no single person has ever gotten enough power to actually get control over ALL of humanity, but that’s presumably because of the difficulty of obtaining such a high level of power, rather than because few humans have ever pursued the capabilities that would be conducive towards that goal. Again, this distinction is quite important.
There are diminishing returns to having more resources. For example, if you give 10 M$ (0.00001 % of global resources) to a random human, they will not have much of a desire to take risks to increase their wealth to 10 T$ (10 % of global resources), which would be helpful for gaining control over humanity. To increase their own happiness and that of their close family and friends, they would do well by investing their newly acquired wealth in exchange-traded funds (ETFs). A good imitator AI would share our disposition of not gaining capabilities beyond a certain point, and therefore (like humans) never get close to having a chance of gaining control over humanity.
I agree that a good imitator AI would likely share our disposition towards diminishing marginal returns to resource accumulation. This makes it likely that such AIs would not take very large risks. However, I still think the main reason why no human has ever taken control over humanity is because there was no feasible strategy that any human in the past could have taken to obtain such a high degree of control, rather than because all humans in the past have voluntarily refrained from taking the risks necessary to obtain that degree of control.
In fact, risk-neutral agents that don’t experience diminishing returns to resource consumption will almost surely lose all their wealth eventually through high-risk bets. Therefore, even without this human imitation argument, we shouldn’t be much concerned about risk-neutral agents in most scenarios (including risks from reinforcement learners), since they’re very likely to go bankrupt before they ever get to the point at which they can take over the world. Such agents are only importantly relevant in a very small fraction of worlds.
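A quick sketch of why this happens, using an illustrative toy bet of my own (not from the comment): each wager has positive expected value, so a risk-neutral agent takes all of them, yet ruin is almost certain.

```python
# Toy model: each bet triples wealth with probability 0.5 and wipes it
# out otherwise. Expected wealth grows 1.5x per bet, so a risk-neutral
# agent (no diminishing returns) takes every bet it is offered.
p_win, multiplier = 0.5, 3.0

expected_growth = p_win * multiplier  # 1.5 > 1: every bet looks "good"
survival_after_50 = p_win ** 50       # ~8.9e-16: near-certain bankruptcy

# Expected wealth after 50 bets is up by ~1.5**50 (~6e8)x, but
# essentially all of that expectation sits in the single ~1-in-10^15
# branch where every bet was won.
print(expected_growth, survival_after_50)
```

This is the sense in which such agents are "only importantly relevant in a very small fraction of worlds": the expectation is dominated by vanishingly improbable branches.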
I think humans usually acquire power fairly gradually. A good imitator AI would be mindful that acquiring power too fast (suddenly fooming) would go very much against what humans usually do.
Again, the fact that humans acquire power gradually is more of a function of our abilities than it is a function of our desires. I repeat myself but this is important: these are critical facts to distinguish from each other. “Ability to” and “desire to” are very different features of the situation.
It is very plausible to me that some existing humans would “foom” if they had the ability. But in fact, no human has such an ability, so we don’t see anyone fooming in the real world. This is mainly a reflection of the fact that humans cannot foom, not that they don’t want to foom.
No human has ever had control over all humanity, so I agree there is a sense in which we have “zero data” about what humans would do under such conditions. Yet, I am still pretty confident that the vast majority of humans would not want to cause human extinction.
I am also “pretty confident” about that, but “pretty confident” is a relatively weak statement here. When evaluating this scenario, we are extrapolating into a regime in which we have no direct experience. It is one thing to say that we can be “pretty confident” in our extrapolations (and I agree with that); it is another thing entirely to imply that we have tons of data points directly backing up our prediction, based on thousands of years of historical evidence. We simply do not have that type of (strong) evidence.
I do not think this is the best comparison. There would arguably be many imitator AIs, and these would not gain near-omnipotent abilities overnight. I would say both of these factors greatly constrain the level of subjugation. Historically, innovations and new inventions have spread out across the broader economy, so I think there should be a strong prior against a single imitator AI suddenly gaining control over all the other AIs and humans.
I agree, but this supports my point: I think imitator AIs are safe precisely because they will not have godlike powers. I am simply making the point that this is different from saying they are safe because they have human-like motives. Plenty of things in the world are safe because they are not very powerful. It is completely different if something is safe because its motives are benevolent and pure (even if it’s extremely powerful).
How long-run are you talking about here? Humans 500 years ago arguably had little control over current humans, but this alone does not imply a high existential risk 500 years ago.
I agree with Robin Hanson on this question. However, I think humans will likely become an increasingly small fraction of the world over time, as AIs become a larger part of it. Just as hunter-gatherers are threatened by industrial societies, so too may biological humans one day become threatened by future AIs. Such a situation may not be very morally bad (or deserving the title “existential risk”), because humans are not the only morally important beings in the world. Yet, it is still true that AI carries a great risk to humanity.
Plenty of people want wealth and power, which are “conducive to gaining control over [parts of] humanity”.
Thanks for following up, Matthew.

I agree, but I think very few people want to acquire e.g. 10 T$ of resources without broad consent of others. In addition, if a single AI system expressed such a desire, humans would not want to scale up its capabilities.
I agree with Robin Hanson on this question. However, I think humans will likely become an increasingly small fraction of the world over time, as AIs become a larger part of it. Just as hunter-gatherers are threatened by industrial societies, so too may biological humans one day become threatened by future AIs. Such a situation may not be very morally bad (or deserving the title “existential risk”), because humans are not the only morally important beings in the world. Yet, it is still true that AI carries a great risk to humanity.
I agree biological humans will likely become an increasingly small fraction of the world, but it does not follow that AI carries a great risk to humans[1]. I would not say people born after 1960 carry a great risk to people born before 1960, even though the fraction of global resources controlled by the latter is becoming increasingly small. I would only consider AI a great risk to humans if, in the process of losing control over resources, they were expected to suffer significantly more than they do in their typical lives (which also involve suffering).
I agree, but I think very few people want to acquire e.g. 10 T$ of resources without broad consent of others.
I think I simply disagree with the claim here. I think it’s not true. I think many people would want to acquire $10T without the broad consent of others, if they had the ability to obtain such wealth (and they could actually spend it; here I’m assuming they actually control this quantity of resources and don’t get penalized for having acquired it without the broad consent of others, because that would change the scenario). It may be that fewer than 50% of people have such a desire, but I’d be very surprised if it were <1%, and I’d even be surprised if it were <10%.
I agree biological humans will likely become an increasingly small fraction of the world, but it does not follow that AI carries a great risk to humans[1]. I would not say people born after 1960 carry a great risk to people born before 1960, even though the fraction of global resources controlled by the latter is becoming increasingly small.
I think humans born after 1960 do pose a risk to humans born before 1960 in some ordinary senses. For example, the younger humans could vote to decrease medical spending, which could lead to early death for the older humans. They could also vote to increase taxes on people who have accumulated a lot of wealth, which very disproportionately hurts old people. This is not an implausible risk either; I think these things have broadly happened many times in the past.
That said, I suspect part of the disagreement here is about time scales. In the short and medium term, I agree: I’m not so much worried about AI posing a risk to humanity. I was really only talking about long-term scenarios in my above comment.
[1] You said “risk to humanity” instead of “risk to humans”. I prefer the latter, because “humanity” is sometimes used to include other beings.