Question Mark comments on ARC is hiring alignment theory researchers

Question Mark 15 Dec 2021 21:11 UTC
3 points
How does ARC differ from other AI alignment organizations, like MIRI?
- Paul_Christiano 17 Dec 2021 4:56 UTC
  5 points
  Compared to MIRI: We are trying to align AI systems trained using techniques like modern machine learning. We’re looking for solutions that are (i) competitive, i.e. don’t make the resulting AI systems much weaker, (ii) work no matter how far we scale up ML, (iii) work for any plausible situation we can think of, i.e. don’t require empirical assumptions about what kind of thing ML systems end up learning. This forces us to confront many of the same issues as MIRI, though we are doing so in a very different style that you might describe as “algorithm-first” rather than “understanding-first.” You can read a bit about our methodology in “My research methodology” or this section of our ELK writeup.
  I think that most researchers at MIRI don’t think that this goal is achievable, at least not without some kind of philosophical breakthrough. We don’t have the same intuition (perhaps we’re 50-50). Some of the reasons: it looks to us like there are a bunch of possible approaches for making progress, there aren’t really any clear articulations of fundamental obstacles that will cause those approaches to fail, and there is extremely little existing work pursuing plausible worst-case algorithms. Right now it mostly seems like people just have varying intuitions, but searching for a worst-case approach seems like it’s a good deal as long as there’s a reasonable chance it’s possible. (And if we fail we expect to learn something about why.)
  Compared to everyone else: We think of a lot of possible algorithms, but we can virtually always rule them out without doing any experiments. That means we are almost always doing theoretical research with pen and paper. It’s not obvious whether a given algorithm works in practice, but it usually is obvious that there exist plausible situations where it wouldn’t work, and we are searching (optimistically) for something that works in every plausible situation.