To counter that, let me emphasize the aspects of AI risk that are not disproven here.
Adding to this list, much of the field thinks a core challenge is making highly capable, agentic AI systems safe. But (ignoring inner alignment issues) severe constraints create safe AI systems that aren’t very capable agents. (For example, if you make an AI that only considers what will happen within a time limit of 1 minute, it probably won’t be very good at long-term planning. Or if you make an AI system that only pursues very small-scale goals, it won’t be able to solve problems that you don’t know how to break up into small-scale goals.) So on its own, this doesn’t seem to solve outer alignment for highly capable agents.
(See e.g. the “2. Competitive” section of this article by Paul Christiano for some more discussion of why a core desideratum for safety solutions is performance competitiveness.)
It’s clear that not every constraint will work for every application, but I reckon every application will have at least some constraints that drastically reduce risk.
I definitely agree that competitiveness is important, but remember that it’s not just about competitiveness on a specific task, but competitiveness at pleasing AI developers. There’s a large incentive for people not to build runaway murder machines! And even if a company doesn’t believe in AI x-risk, it still has to worry about lawsuits, regulations, etc., for lesser accidents. I think the majority of developers can be persuaded or forced to adopt some constraints, as long as they aren’t excessively onerous.
Maybe; I’m not sure, though. Future applications that do long-term, large-scale planning seem hard to constrain much while still letting them do what they’re supposed to do. (Bounded goals—if they’re bounded to small-scale objectives—seem like they’d break large-scale planning, time limits seem like they’d break long-term planning, and as you mention, the “don’t kill people” constraint would be much trickier to implement.)
That’s a fair perspective. One last thing I’ll note is that even seemingly permissive constraints can make a huge difference from the perspective of the AI utility calculus. If I ask it to maximise paperclips, then the upper utility bound is defined by the amount of matter in the universe. Capping utility at a trillion paperclips doesn’t affect us much (too many would flood the market anyway), but it reduces the expected utility of an AI takeover by like 50 orders of magnitude. Putting in a time limit, even if it’s like 100 years, would have the same effect. Seems like a no-brainer.
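As a rough sanity check on that figure, here is a back-of-envelope calculation. The baryonic mass of the observable universe and the mass per paperclip are loose assumptions, so the exact order-of-magnitude count is illustrative rather than precise:

```python
import math

# Loose assumptions, chosen only for a back-of-envelope estimate:
BARYONIC_MASS_KG = 1.5e53   # approximate baryonic mass of the observable universe
PAPERCLIP_MASS_KG = 1e-3    # roughly one gram per paperclip

# Upper bound on paperclips if all matter were converted
uncapped_max = BARYONIC_MASS_KG / PAPERCLIP_MASS_KG

# The proposed utility cap: one trillion paperclips
capped_max = 1e12

# How many orders of magnitude the cap shaves off the maximum utility
reduction = math.log10(uncapped_max / capped_max)
print(f"Uncapped maximum: ~1e{math.log10(uncapped_max):.0f} paperclips")
print(f"Reduction from cap: ~{reduction:.0f} orders of magnitude")
```

Under these assumptions the cap cuts the maximum achievable utility by roughly 40-plus orders of magnitude, in the same ballpark as the figure above; the exact number depends heavily on the assumed masses, but the qualitative point (the cap removes almost all of the expected value of takeover) is robust.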