A better argument is that the wildness of the next century means our models of the future are untrustworthy, which should make us pretty suspicious of any claim that something is the P = 1 - ε outcome without a watertight case for the proposition.
There doesn’t seem to be such a watertight case for AI takeover. Most threat models[1] rest heavily on the assumption that transformative AI will be single-mindedly optimizing for some (misspecified or mislearned) utility function, as opposed to e.g. following a bunch of contextually-activated policies[2]. While this is plausible, and thus warrants significant effort to prevent, it’s far from clear that this is even the most likely outcome “absent highly specific conditions”, never mind a near certainty.
Yep, I think this reasoning is better, and is closer to why I don’t assign 1-ε probability to doom.
The sad thing is that the remaining uncertainty is much harder to work with. Like, I think most of the worlds where we are fine are worlds where I am deeply confused about a lot of stuff: deeply confused about the drivers of civilization, deeply confused about how to reason well, deeply confused about what I care about and whether AI doom even matters. I find it hard to plan around those worlds.
[1] e.g. Cotra and Ngo et al
[2] as proposed e.g. by shard theory