Hey Yonatan,
first, please excuse my initially spelling your name incorrectly; I have fixed it now.
Thank you for your encouragement with funding. As it happens, we did apply for funding from several sources and are waiting for their response.
Regarding Rob Miles’ videos on satisficing:
One potential misunderstanding concerns the probability with which the agent is required to reach a certain goal. If I understand him correctly, he assumes that satisficing must imply maximizing the probability that some constraint is met, which would still constitute a form of optimization (namely of that probability). This is why our approach is different: In a Markov Decision Process, the client would for example specify a feasibility interval for the expected value of the return (= long-term discounted sum of rewards according to some reward function that we explicitly do not assume to be a proper measure of utility), and the learning algorithm would seek a policy that makes the expected return fall anywhere within this interval.
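To make this concrete, here is a minimal sketch in Python (my own toy construction, not our actual algorithm), assuming a small tabular MDP with known transitions P[s][a] given as lists of (next_state, prob) pairs and rewards R[s][a]: instead of searching for the return-maximizing policy, it simply accepts any deterministic policy whose expected discounted return lands inside the client-specified interval.

```python
# Minimal sketch (my own toy construction, not the project's actual algorithm).
# Idea: the client fixes an interval [low, high] for the expected discounted
# return; we accept *any* deterministic policy whose return lands inside it,
# rather than searching for the return-maximizing one.
import itertools
import numpy as np

def policy_return(P, R, policy, gamma, start):
    """Expected discounted return of a deterministic `policy` from state `start`.

    P[s][a] is a list of (next_state, prob) pairs, R[s][a] the immediate reward.
    Solves v = r_pi + gamma * P_pi v exactly (small state spaces only).
    """
    n = len(P)
    P_pi = np.zeros((n, n))
    r_pi = np.zeros(n)
    for s in range(n):
        a = policy[s]
        r_pi[s] = R[s][a]
        for s_next, prob in P[s][a]:
            P_pi[s, s_next] += prob
    v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    return v[start]

def find_satisficing_policy(P, R, n_actions, gamma, start, low, high):
    """Return some deterministic policy whose expected return lies in [low, high].

    Brute-force enumeration; the search stops at the first admissible policy,
    so no argmax over returns is ever computed.
    """
    n_states = len(P)
    for actions in itertools.product(range(n_actions), repeat=n_states):
        if low <= policy_return(P, R, actions, gamma, start) <= high:
            return list(actions)
    return None  # the interval may be infeasible
```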
The question of whether an agent somehow necessarily must optimize something is a little philosophical in my view. Of course, given an agent’s behavior, one can always find some function that is maximal for the given behavior. This is a mathematical triviality. But this is not the problem we need to address here. The problem we need to address is that the behavior of the agent might get chosen by the agent or its learning algorithm by maximizing some objective function.
It is all about a paradigm shift: In my view, AI systems should be made to achieve reasonable goals that are well-specified w.r.t. one or more proxy metrics, not to maximize some metric, whatever it happens to be. What would be the reasonable goal for your modified paperclip maximizer?
Regarding “weakness”:
Non-maximizing does not imply weak, let alone “very weak”. I’m not suggesting we build a very weak system at all. In fact, maximizing an imperfect proxy metric will tend to yield a low score on the real utility. Or, to turn this around: the maximum of the actual utility function is more likely to be approached by a policy that does not maximize the proxy metric. We will study this in example environments and report results later this year.
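To illustrate what I mean with a contrived toy example (all numbers are made up and purely hypothetical): the option that maximizes the proxy scores worst on the actual utility, while an option that merely hits a modest proxy target does much better.

```python
# Contrived toy illustration; all numbers are made up for the sake of the example.
options = {          # option: (proxy metric score, "actual" utility we cannot formalize)
    "extreme": (100, -50),   # maximizes the proxy, e.g. via harmful side effects
    "solid":   ( 60,  40),
    "lazy":    ( 10,   5),
}

proxy_maximizer = max(options, key=lambda o: options[o][0])
satisficer = next(o for o, (proxy, _) in options.items() if 50 <= proxy <= 80)

print(proxy_maximizer, options[proxy_maximizer][1])  # extreme -50
print(satisficer, options[satisficer][1])            # solid    40
```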
long-term discounted sum of rewards according to some reward function that we explicitly do not assume to be a proper measure of utility
Isn’t this equivalent to building an agent (agent-2) that DID have that as their utility function?
Ah, you wrote:
The problem we need to address is that the behavior of the agent might get chosen by the agent or its learning algorithm by maximizing some objective function.
I don’t understand this and it seems core to what you’re saying. Could you maybe say it in other words?
When I said “actual utility”, I meant that which we cannot properly formalize (human welfare and other values) and hence cannot teach (or otherwise “give” to) the agent. So no, the agent does not “have” (or otherwise know) this as their utility function in any relevant way.
As I use the term, “maximization” refers to an act, process, or activity (as the ending “-ation” indicates) that actively seeks to find the maximum of some given function. First there is the function to be maximized, then comes the maximization, and in the end one knows the maximum and where it is attained (the argmax).
On the other hand, one might object the following: if we are given a deterministic program P that takes input x and returns output y = P(x), we can of course always construct a mathematical function f that takes a pair (x, y) and returns some number r = f(x, y) such that for each possible input x we have P(x) = argmax_y f(x, y). A trivial choice for such a function is f(x, y) = 1 if y = P(x) and f(x, y) = 0 otherwise. Notice, however, that here the program P is given first, and then we construct a specific function f for this equivalence to hold.
In other words, any deterministic program P is functionally equivalent to another program P’ that takes some input x, maximizes some function f(x,y), and returns the location y of that maximum. But being functionally equivalent to a maximizer is not the same as being a maximizer.
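Here is a tiny sketch of this, with P being an arbitrary toy program I just made up; note that f is constructed only after P is given, and only the wrapper P' actually performs a search:

```python
# Toy sketch: P is an arbitrary deterministic program of my own choosing;
# f is constructed only *after* P is given.
def P(x):
    return (7 * x + 3) % 10          # just some fixed deterministic rule

def f(x, y):
    return 1 if y == P(x) else 0     # the trivial choice described above

def P_prime(x, candidates=range(10)):
    # a genuine maximizer: it searches the candidates for the argmax of f(x, .)
    return max(candidates, key=lambda y: f(x, y))

# P and P' are functionally equivalent, yet only P' performs any maximization.
assert all(P(x) == P_prime(x) for x in range(10))
```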
In the learning agent context: if I give you a learned policy pi that takes a state s and returns an action a = pi(s) (or a distribution over actions), then you might well be able to construct a reward function g that takes a state-action pair (s, a) and returns a reward (or expected reward) r = g(s, a) such that, when I then calculate the corresponding optimal state-action value function Q* for this reward function, it turns out that for all states s we have pi(s) = argmax_a Q*(s, a). This means that pi is the same policy that a learning process would have produced had it searched for the policy that maximizes the long-term discounted sum of rewards according to the reward function g. But it does not mean that pi was actually determined by such a possible optimization procedure: the learning process that produced pi can very well be of a completely different kind than an optimization procedure.
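For concreteness, a sketch of this construction on a small chain MDP of my own choosing, using the trivial reward g(s, a) = 1 if a = pi(s) and 0 otherwise (the analogue of the trivial f above); value iteration then yields a Q* whose greedy policy coincides with the hand-written pi:

```python
# Toy sketch of the construction above, on a 5-state chain MDP of my own choosing.
# Given an arbitrary hand-written policy pi, define g(s, a) = 1 if a == pi(s) else 0,
# compute Q* by value iteration, and check that the greedy policy of Q* equals pi.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
pi = [1, 0, 1, 1, 0]                              # just written down, not learned

def step(s, a):                                   # deterministic toy dynamics
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

def g(s, a):                                      # reward constructed *from* pi
    return 1.0 if a == pi[s] else 0.0

Q = np.zeros((n_states, n_actions))
for _ in range(200):                              # value iteration towards Q*
    Q = np.array([[g(s, a) + gamma * Q[step(s, a)].max()
                   for a in range(n_actions)] for s in range(n_states)])

# The greedy (argmax) policy of Q* reproduces pi, although pi itself was never
# obtained by any optimization procedure.
assert [int(Q[s].argmax()) for s in range(n_states)] == pi
```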