titotal comments on My lab’s small AI safety agenda

titotal 19 Jun 2023 15:30 UTC
7 points
3 ∶ 0
It is interesting to think about the seeming contradiction here. Looking at the von Neumann–Morgenstern theorem you linked earlier, the theorem is about a rational agent choosing between several different options: it says that if the agent’s preferences follow the axioms (no Dutch-booking etc.), you can build a utility function to describe those preferences.
First of all, humans are not rational, and can be Dutch-booked. But even if they were much more rational in their decision-making, I don’t think the average person would suddenly switch into “tile the universe to fulfill a mathematical equation” mode (with the possible exception of some people in EA).
Perhaps the problem is that the utility function describing an entity’s preferences doesn’t need to be constant. Perhaps today I choose to buy Pepsi over Coke because it’s cheaper, but next week I see a good ad for Coke and decide to pay the extra money for the good associations it brings. I don’t think the theorem says anything about that. It seems like the utility function just describes my current preferences, and says nothing about how my preferences change over time.
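A toy sketch of the two ideas above, under loud assumptions: it covers only a strict ranking over three sure options (not the lotteries the actual theorem is about), and the simulated agent mechanically pays to swap whenever it is offered something it prefers. The function names and the drinks example are hypothetical illustrations. It shows that consistent preferences can be described by a utility function, while cyclic preferences cannot and can be exploited.

```python
from itertools import permutations


def utility_from_preferences(options, prefers):
    """Return {option: utility} consistent with `prefers`, or None if impossible.

    `prefers` is a list of (a, b) pairs meaning "a is strictly preferred to b".
    Brute force over all orderings, so only suitable for a handful of options.
    """
    for order in permutations(options):
        rank = {opt: len(order) - i for i, opt in enumerate(order)}
        if all(rank[a] > rank[b] for a, b in prefers):
            return rank
    return None


def money_pump(start, prefers, fee, rounds):
    """Simulate an agent that always pays `fee` to swap to a preferred item.

    Assumption: the agent acts on every stated preference like an automaton.
    """
    holding, paid = start, 0.0
    for _ in range(rounds):
        better = [a for a, b in prefers if b == holding]
        if not better:
            break  # nothing preferred to the current holding, so the agent stops
        holding = better[0]
        paid += fee
    return holding, paid


options = ["coke", "pepsi", "water"]
transitive = [("coke", "pepsi"), ("pepsi", "water"), ("coke", "water")]
cyclic = [("coke", "pepsi"), ("pepsi", "water"), ("water", "coke")]

print(utility_from_preferences(options, transitive))  # a consistent ranking exists
print(utility_from_preferences(options, cyclic))      # None: no utility function fits
print(money_pump("water", transitive, 1.0, 9))        # stops once it holds its favourite
print(money_pump("water", cyclic, 1.0, 9))            # keeps paying every round
```

Note that the utility function here is computed from one fixed list of preferences. Nothing in the sketch constrains how that list might differ next week.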
- Seth Herd 20 Oct 2023 0:06 UTC
  3 points
  1 ∶ 0
  Parent
  From a neuroscience/psychology perspective, I’d say that you are maximizing your future reward. And while that’s not a well-defined thing, it doesn’t matter; if you were highly competent, you’d make a lot of changes to the world according to what tickles you, and those might or might not be good for others, depending on your preferences (reward function). The slight difference between turning the world into one well-defined thing and a bunch of things you like isn’t that important to anyone who doesn’t like what you like.
  This is a broader and more intuitive form of the argument Miles is trying to make precise.
  If you can be Dutch-booked without limit, well, you’re just not competent enough to be a threat; but you’re not going to let that happen, and a superintelligent version of you certainly wouldn’t.
- Jobst Heitzig (EMPO project) 20 Jun 2023 9:37 UTC
  2 points
  0 ∶ 0
  Parent
  I agree.
  Except for one detail: humans who hold preferences that don’t comply with the axioms cannot necessarily be “Dutch-booked” for real. That would require them not only to hold certain preferences but also to always act on those preferences like an automaton; see this nice summary discussion: https://plato.stanford.edu/entries/dutch-book/