Owen Cotton-Barratt comments on Owen Cotton-Barratt’s Quick takes

Owen Cotton-Barratt 18 May 2024 21:32 UTC
13 points
Most possible goals for AI systems are concerned with process as well as outcomes.
People talking about possible AI goals sometimes seem to assume something like “most goals are basically about outcomes, not how you get there”. I’m not entirely sure where this idea comes from, and I think it’s wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I’d expect that on most reasonable senses of “most”, process can have a look-in.
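One way to make the dimensionality claim concrete is a toy count. The sketch below assumes a world with a finite number of states and a fixed number of time steps, and treats a goal as an assignment of one real number to each thing it can tell apart. These modelling choices, and the name `goal_space_dimensions`, are illustrative assumptions and not part of the original argument.

```python
from itertools import product


def goal_space_dimensions(num_states: int, horizon: int) -> tuple[int, int]:
    """Count free parameters for outcome-only vs process-aware goals.

    Toy assumption: a trajectory is any sequence of `horizon` states,
    and a goal assigns one real number to each item it distinguishes.
    """
    # Every possible path through the toy world.
    trajectories = list(product(range(num_states), repeat=horizon))

    # An outcome-only goal can only distinguish trajectories by where they end.
    final_states = {trajectory[-1] for trajectory in trajectories}

    # A process-aware goal can distinguish every trajectory from every other.
    return len(final_states), len(trajectories)


outcome_dim, process_dim = goal_space_dimensions(num_states=3, horizon=4)
print(outcome_dim, process_dim)  # 3 81
```

In this toy setting the outcome-only goals form a 3-dimensional subspace of an 81-dimensional space, and the gap grows exponentially with the horizon. Real goal spaces are not this tidy, so this only illustrates the direction of the claim.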
What’s the interaction with instrumental convergence? (I’m asking because vibe-wise it seems like instrumental convergence is associated with an assumption that goals won’t be concerned with process.)
- Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged-to), but many process-concerned goals won’t
- Since instrumental convergence is basically about power-seeking, there’s an evolutionary argument that you should expect the systems which end up with most power to have the power-seeking behaviours
  - I actually think there are a couple of ways for this argument to fail:
    - If at some point you get a singleton, there’s now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton)
    - A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power
    (There are some complications to this I won’t get into here)
  - But even if it doesn’t fail, it pushes towards things which have Omohundro’s basic AI drives (and so pushes away from process-concerned goals which could preclude those), but it doesn’t push all the way to purely outcome-concerned goals
In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect them to end up with something which has incorrect preferences about process, not something that doesn’t care about process.
How could you get to purely outcome-concerned goals? I basically think this should be expected only if someone makes a deliberate choice to aim for that (though it might also be reachable via self-modification; the set of goals that would choose to self-modify to be purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals). Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don’t think they should be treated as a default.