I can think of a few scenarios where AGI doesn’t kill us.
AGI does not act as a rational agent. The predicted doom scenarios rely on the AGI acting as a rational agent that maximises a utility function at all costs. This behaviour has not been seen in nature. Instead, all intelligences (natural or artificial) have some degree of laziness, which results in them being less destructive. Assuming the orthogonality thesis is true, this is unlikely to change.
The AGI sees humans as more useful alive than dead, probably because its utility function involves humans somehow. This covers a lot of scenarios, from horrible dystopias where the AGI tortures us constantly to see how we react, all the way to us actually somehow getting alignment right on the first try. It keeps us alive for the same reason we keep our pets alive.
The first A”G”Is are actually just a bunch of narrow AIs in a trenchcoat, and none of them is able to overthrow humanity on its own. A lot of recent advances in AI (including GPT-4) have been propelled by a move away from generality and towards a “mixture of experts” architecture, where complex tasks are split into simpler ones and routed to specialised sub-models (a rough sketch of the routing idea follows after this list). If this scales, one could expect more advanced systems to still not be general enough to act autonomously in a way that overpowers humanity.
AGI can’t self-improve because it runs face-first into the alignment problem! If we can see that creating an intelligence greater than ourselves raises the alignment problem, so can the AGI. An AGI that fears creating something more powerful than itself will not do so, and thus remains at around human level. Such an AGI would not be strong enough to beat all of humanity combined, so it will be smart enough not to try.
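For concreteness, here is a minimal, hypothetical sketch of the “mixture of experts” routing idea from the third scenario: a gating function sends each input to a few narrow experts and mixes their outputs, so no single expert has to be general. Everything here (expert count, dimensions, the linear “experts”) is made up for illustration and is not a claim about how GPT-4 actually works.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, D_IN, D_OUT, TOP_K = 4, 8, 8, 2

# Each "expert" stands in for a narrow specialist; here it is just a linear map.
experts = [rng.normal(size=(D_IN, D_OUT)) for _ in range(N_EXPERTS)]
# The gate decides which experts see a given input (learned in real systems, random here).
gate_weights = rng.normal(size=(D_IN, N_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route x to the TOP_K highest-scoring experts and mix their outputs."""
    logits = x @ gate_weights                      # one relevance score per expert
    chosen = np.argsort(logits)[-TOP_K:]           # indices of the k most relevant experts
    scores = np.exp(logits[chosen] - logits[chosen].max())
    scores /= scores.sum()                         # softmax over the chosen experts only
    return sum(s * (x @ experts[i]) for s, i in zip(scores, chosen))

output = moe_forward(rng.normal(size=D_IN))
print(output.shape)  # (8,)
```

The relevant property for the scenario is that each expert only ever handles the inputs routed to it; the argument above is that a system built this way may stay a collection of specialists rather than a single general agent.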
Species aren’t lazy (those who are—or would be—are outcompeted by those who aren’t).
The pets scenario is basically an existential catastrophe by other means (who wants to be to a human what a pug is to a wolf, a caricature of the original?). And obviously so is the torture/dystopia one (i.e. not an “OK outcome”). What mechanism would allow us to get alignment right on the first try?
This seems like a very unstable equilibrium. All that is needed is for one of the experts to be as good at AI engineering as Ilya Sutskever; it could then get past that bottleneck in short order (via its speed advantage and millions of instances running at once) and foom to ASI.
It would also need to stop all other AGIs that are less cautious, and be ahead of them when self-improvement becomes possible. That seems unlikely given current race dynamics. And even if it does happen, unless the AGI were very well aligned to humanity it still spells doom for us, due to its speed advantage and its different substrate needs (i.e. its ideal operating environment isn’t survivable for us).