FWIW, I don’t think (1) or (2) plays a role in why MIRI researchers work on the research they do, and I don’t think they play a role in why people at MIRI think “learning to reason from humans” isn’t likely to be sufficient. The shape of the “HRAD is more promising than act-based agents” claim is more like what Paul Christiano said here:
As far as I can tell, the MIRI view is that my work is aimed at [a] problem which is not possible, not that it is aimed at a problem which is too easy. [...] One part of this is the disagreement about whether the overall approach I’m taking could possibly work, with my position being “something like 50-50” the MIRI position being “obviously not” [...]
There is a broader disagreement about whether any “easy” approach can work, with my position being “you should try the easy approaches extensively before trying to rally the community behind a crazy hard approach” and the MIRI position apparently being something like “we have basically ruled out the easy approaches, but the argument/evidence is really complicated and subtle.”
With a clarification I made in the same thread:
I think Paul’s characterization is right, except I think Nate wouldn’t say “we’ve ruled out all the prima facie easy approaches,” but rather something like “part of the disagreement here is about which approaches are prima facie ‘easy.’” I think his model says that the proposed alternatives to MIRI’s research directions by and large look more difficult than what MIRI’s trying to do, from a naive traditional CS/Econ standpoint. E.g., I expect the average game theorist would find a utility/objective/reward-centered framework much less weird than a recursive intelligence bootstrapping framework. There are then subtle arguments for why intelligence bootstrapping might turn out to be easy, which Nate and co. are skeptical of, but hashing out the full chain of reasoning for why a daring unconventional approach just might turn out to work anyway requires some complicated extra dialoguing. Part of how this is framed depends on what problem categories get the first-pass “this looks really tricky to pull off” label.
Thanks for linking to that conversation—I hadn’t read all of the comments on that post, and I’m glad I got linked back to it.