Hey! Can you elaborate a bit more on what you mean by “never optimise” here? It seems like the definition you have is broad enough to render an AI useless:
When saying “optimize” I mean it in the strict mathematical sense: aiming to find an exact or approximate, local or global maximum or minimum of some function. When I mean mere improvement w.r.t. some metric, I just say “improve” rather than “optimize”.
It seems like this definition would apply to anything that uses math to make decisions. If I ask the AI to find me the cheapest flight it can from London to New York tomorrow, will it refuse to answer?
Also, I don’t understand the distinction from “improvement” here. If I try to “improve” the estimate of the cheapest flight, isn’t that the same thing as trying to “optimise” to find an approximate local minimum of cost?
This is difficult to say. I have a relatively clear intuition about what I mean by optimization and what I mean by optimizing behavior. In your example, merely asking for the cheapest flight might be safe, as long as you don’t then automatically book that flight without spending a moment to think about whether taking that one-propeller machine without any safety belts, which you have to pilot yourself, is actually a good idea just because it turned out to be the cheapest. I mostly care about agents that have more agency than just printing text to your screen.
I believe what some people call “AI heaven” can be reached with AI agents that don’t book the cheapest flight, but that book you a flight that costs no more than you specify, takes no longer than you specify, and has at least the safety equipment and other facilities that you specify. In other words: satisficing! Another example: not “find me a job that earns me as much income as possible”, but “find me a job that earns me at least enough income to satisfy all my basic needs and lets me have as much fun from leisure activities as I can squeeze into my lifetime”. And so on…
Regarding “improvement”: Replacing a state s by a state s’ that scores higher on some metric r, so that r(s’) > r(s), is an “improvement w.r.t. r”, not an optimization for r. An optimization would require replacing s by that s’ for which there is no other s″ with r(s″) > r(s’), or some approximate version of this.
One might think that a sequence of improvements must necessarily constitute an optimization, so that my distinction is unimportant. But this is not correct: while any sequence of improvements r(s1) < r(s2) < … must make r(sn) converge to some value r° (at least if r is bounded), this limit value r° will in general be considerably lower than the maximal value r* = max r(s), unless the procedure that selects the improvements is specifically designed to find that maximum, in other words, unless it is an optimization algorithm. Note that optimization is a hard problem in most real-world cases, much harder than just finding some sequence of improvements.
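To make this concrete, here is a small Python sketch (my own illustration, not from the original discussion; the function and variable names are assumptions) of a bounded score r whose maximum is r* = 5, together with a sequence of strict improvements whose limit r° = 1 stays well below r*:

```python
# A bounded score function with maximum r* = 5.
def r(s):
    return min(s, 5.0)

# A sequence of strict improvements: s_n = 1 - 1/n.
states = [1 - 1 / n for n in range(1, 200)]
values = [r(s) for s in states]

# Every step improves r...
assert all(a < b for a, b in zip(values, values[1:]))
# ...yet the values converge to r° = 1, far below r* = 5.
assert values[-1] < 1.0
```

The improvement steps shrink fast enough that the procedure never approaches the maximum, because nothing in it was designed to look for the maximum.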
With regards to your improvements definition, isn’t “continuously improving until you reach a limit which is not necessarily the global limit” just a different way of describing local optimization? It sounds like you’re just describing a hill climber.
I do agree with building a satisficer, as this describes more accurately what the user actually wants! I want a cheap flight, but I wouldn’t be willing to wait 3 days for the program to find the cheapest possible flight that saved me 5 bucks. But on the other hand, if I told it to find me flights under 500 bucks, and it served me up a flight for 499 bucks even though there was another equally good option for 400 bucks, I’d be pretty annoyed.
It seems like some amount of local optimisation is necessary for an AI to be useful.
That depends what you mean by “continuously improving until you reach a limit which is not necessarily the global limit”.
I guess by “continuously” you probably do not mean “in continuous time” but rather “repeatedly, in discrete time steps”? So you imagine a sequence r(s1) < r(s2) < … ? Well, that could converge to anything larger than each of the r(sn). E.g., if r(sn) = 1 − 1/n, it will converge to 1. (It will of course never “reach” 1, since it will always be below 1.) This is completely independent of what the local or global maxima of r are. They can obviously be way larger. For example, if the function is r(s) = s and the sequence is sn = 1 − 1/n, then r(sn) converges to 1 but the maximum of r is infinity. So, as I said before, unless your sequence of improvements is part of an attempt to find a maximum (that is, part of an optimization process), there is no reason to expect that it will converge to some maximum.
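The example in the text can be checked directly in a few lines of Python (variable names are my own, for illustration):

```python
# The sequence from the text: s_n = 1 - 1/n, scored by r(s) = s.
def r(s):
    return s

states = [1 - 1 / n for n in range(1, 1001)]
values = [r(s) for s in states]

# Every step is a strict improvement...
assert all(a < b for a, b in zip(values, values[1:]))
# ...the values converge toward 1 without ever reaching it...
assert 0.998 < values[-1] < 1.0
# ...while r itself is unbounded above: no maximum exists at all.
assert r(10**6) > 1.0
```

The sequence converges to 1 regardless of the shape of r elsewhere, which is exactly why a limit of improvements need not be any kind of maximum.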
Btw., this also shows that if you have two competing satisficers whose only goal is to outperform the other, and who therefore repeatedly improve their reward to be larger than the other agent’s current reward, this does not imply that their rewards will converge to some maximum reward. They can easily be programmed to avoid this by just outperforming the other by an amount of 2**(-n) in the n-th step, so that their rewards converge to the initial reward plus one, rather than to whatever maximum reward might be possible.
Ah, well explained, thank you. Yes, I agree now that you can theoretically improve to a limit without that limit being a local maximum. Although I’m unsure whether the procedure could end up being equivalent in practice to a local maximisation with a modified goal function (say, one that penalises going above “reward + 1” with exponential cost). Maybe something to think about going forward.
Thanks for answering the questions, best of luck with the endeavour!