Thanks for this, and especially for your last post (I’m viewing this as kind of an appendix-of-examples to the last post, which was one of my favourite pieces from the MIRI-sphere or indeed on AI alignment from anywhere). General themes I want to pick out:
My impression is that there is a surprising dearth of discussion of what the hard parts of alignment actually are, and that this is one of the most important discussions to have given that we don’t have clean agreed articulations of the issues
I thought your last post was one of the most direct attempts to discuss this that I’ve seen, and I’m super into that
I am interested in further understanding “what exactly would constitute a sharp left turn, and will there be one?”
I’m in strong agreement that the field would be healthier if more people were aiming at the central problems, and I think it’s very healthy for you to complain when it seems to you that they’re missing them.
I don’t think everyone should be aiming directly at the central problems, because it may be that we don’t yet know enough to articulate them and make progress there, and it can be a helpful complement to build up knowledge that could later bear on the central problems. I would at least like it, though, if lots of people spent a little time trying to understand the central problems, even if they then give up and say “seems like we can’t articulate them yet” or “I don’t know how to make progress on that” and go back to more limited things they know how to get traction on, while keeping half an eye on the eventual goal and the fact that it isn’t being directly attacked.
I also wanted to clarify that Truthful AI was not trying to solve the hard bit of alignment (I think my coauthors would all agree with this). I basically think it could be good for two reasons:
As a social institution it could put society in a better place to tackle hard challenges (like alignment, assuming we get long enough between building this institution and having to tackle alignment proper).
It could get talented people who wouldn’t otherwise be thinking about alignment to work on truthfulness. And I think that some of the hard bits of truthfulness will overlap with the hard bits of alignment, so it might produce knowledge which is helpful for alignment.
(There was also an exploration of “OK but if we had fully truthful AI maybe that would help with alignment”, but I think that’s more a hypothetical sideshow than a real plan.)
So I think you could berate me for choosing not to work on the hard part of the problem, but I don’t want to accept the charge of missing the point. So why don’t I work on the hard part of the problem? I think:
I don’t actually perceive the hard part of the problem clearly
It feels slippery, and trying to tackle it head-on prematurely seems too liable to result in work that I will later think completely misses the point
But I can perceive the shape of something there (I may or may not end up agreeing with you about its rough contours), so I prefer to think about a variety of things with some bearing on alignment, and periodically check back in to see how much enlightenment I now have about the central things
You could think of me as betting on something like Grothendieck’s rising sea approach to alignment (although of course it’s quite likely I’ll never actually get the shell open)
This is part of what made my taste sensors fire very happily on your posts!
I think there is a web of things which can put us in a position of being “more likely well-equipped to make it through”, and when I see I have traction on some of those, it feels like there’s a real, substantive opportunity cost to just ignoring them
(Laying this out so that you know the basic shape of my thinking, such that if you want to make a case that I should devote time to tackling things more head-on, you’ll know what I need to be moved on.)