RobBensinger comments on The academic contribution to AI safety seems large

RobBensinger 31 Jul 2020 16:43 UTC
19 points
I agree with Max’s take. MIRI researchers still look at Alignment for Advanced Machine Learning Systems (AAMLS) problems periodically, but per “Why I am not currently working on the AAMLS agenda,” they mostly haven’t felt the problems are tractable enough right now to warrant a heavy focus.
Nate describes our new work in “2018 Update: Our New Research Directions.”
> Since 2016, actually “about half” of MIRI’s research has been on their ML agenda, apparently to cover the chance of prosaic AGI.
I don’t think any of MIRI’s major research programs, including AAMLS, have been focused on prosaic AI alignment. (I’d be interested to hear if Jessica or others disagree with me.)
Paul introduced prosaic AI alignment (in November 2016) with:
It’s conceivable that we will build “prosaic” AGI, which doesn’t reveal any fundamentally new ideas about the nature of intelligence or turn up any “unknown unknowns.” I think we wouldn’t know how to align such an AGI; moreover, in the process of building it, we wouldn’t necessarily learn anything that would make the alignment problem more approachable. So I think that understanding this case is a natural priority for research on AI alignment.
In contrast, I think of AAMLS as assuming that we’ll need new deep insights into intelligence in order to actually align an AGI system. There’s a large gulf between (1) “Prosaic AGI alignment is feasible” and (2) “AGI may be produced by techniques that are descended from current ML techniques” or (3) “Working with ML concepts and systems can help improve our understanding of AGI alignment”, and I think of AAMLS as assuming some combination of 2 and 3, but not 1. From a post I wrote in July 2016:
[… AAMLS] is intended to help more in scenarios where advanced AI is relatively near and relatively directly descended from contemporary ML techniques, while our agent foundations agenda is more agnostic about when and how advanced AI will be developed.
As we recently wrote, we believe that developing a basic formal theory of highly reliable reasoning and decision-making “could make it possible to get very strong guarantees about the behavior of advanced AI systems — stronger than many currently think is possible, in a time when the most successful machine learning techniques are often poorly understood.” Without such a theory, AI alignment will be a much more difficult task.
The authors of “Concrete problems in AI safety” write that their own focus “is on the empirical study of practical safety problems in modern machine learning systems, which we believe is likely to be robustly useful across a broad variety of potential risks, both short- and long-term.” Their paper discusses a number of the same problems as the [AAMLS] agenda (or closely related ones), but directed more toward building on existing work and finding applications in present-day systems.
Where the agent foundations agenda can be said to follow the principle “start with the least well-understood long-term AI safety problems, since those seem likely to require the most work and are the likeliest to seriously alter our understanding of the overall problem space,” the concrete problems agenda [by Amodei, Olah, Steinhardt, Christiano, Schulman, and Mané] follows the principle “start with the long-term AI safety problems that are most applicable to systems today, since those problems are the easiest to connect to existing work by the AI research community.”
Taylor et al.’s new [AAMLS] agenda is less focused on present-day and near-future systems than “Concrete problems in AI safety,” but is more ML-oriented than the agent foundations agenda.