I asked an AI safety researcher “Suppose your research project went as well as it could possibly go; how would it make it easier to align powerful AI systems?”, and they said that they hadn’t really thought about that. I think that this makes your work less useful.
This seems like a deeper disagreement than you’re describing. A lot of research in academia (ex: much of math) involves playing with ideas that seem poorly understood, trying to figure out what’s going on. It’s not really goal directed, especially not the kind of goal you can chain back to world improvement, it’s more understanding directed.
For AI safety your view may still be right: one major way I could see the field going wrong is getting really into interesting problems that aren’t useful. But on the other hand it’s also possible that the best path involves highly productive interest-following understanding-building research where most individual projects don’t seem promising from an end to end view. And maybe even where most aren’t useful from an end to end view!
Again, I’m not sure here at all, but I don’t think it’s obvious you’re right.
This comment is a general reply to this whole thread.
Some clarifications:
I don’t think that we should require that people working in AI safety have arguments for their research which are persuasive to anyone else. I’m saying I think they should have arguments which are persuasive to them.
I think that good plans involve doing things like playing around with ideas that excite you, and learning subjects which are only plausibly related if you have a hunch it could be helpful; I do these things a lot myself.
I think there’s a distinction between having an end-to-end story for your solution strategy vs the problem you’re trying to solve—I think it’s much more tractable to choose unusually important problems than to choose unusually effective research strategies.
In most fields, the reason you can pick more socially important problems is that people aren’t trying very hard to do useful work. It’s a little more surprising that you can beat the average in AI safety by trying intentionally to do useful work, but my anecdotal impression is that people who choose what problems to work on based on a model of what problems would be important to solve are still noticeably more effective.
Here’s my summary of my position here:
I think that being goal directed is very helpful to making progress on problems on a week-by-week or month-by-month scale.
I think that within most fields, some directions are much more promising than others, and backchaining is required in order to work on the promising directions. AI safety is a field like this. Math is another—if I decided to try to do good by going into math, I’d end up doing research which was really different from normal mathematicians. I agree with Paul Christiano’s old post about this.
If I wanted to maximize my probability of solving the Riemann hypothesis, I’d probably try to pursue some crazy plan involving weird strengths of mine and my impression of blind spots of the field. However, I don’t think this is actually that relevant, because I think that the important work in AI safety (and most other fields of relevance to EA) is less competitive than solving the Riemann hypothesis, and also a less challenging mathematical problem.
I think that in my experience, people who do the best work on AI safety generally have a clear end-to-end picture of the story for what work they need to do, and people who don’t have such a clear picture rarely do work I’m very excited about. Eg I think Nate Soares and Paul Christiano are both really good AI safety researchers, and both choose their research directions very carefully based on their sense of what problems are important to solve.
Sometimes I talk to people who are skeptical of EA because they have a stronger version of the position you’re presenting here—they think that nothing useful ever comes of people intentionally pursuing research that they think is important, and the right strategy is to pursue what you’re most interested in.
One way of thinking about this is to imagine that there are different problems in a field, and different researchers have different comparative advantages at the problems. In one extreme case, the problems vary wildly in importance, and so the comparative advantage basically doesn’t matter and you should work on what’s most important. In the other extreme, it’s really hard to get a sense of which things are likely to be more useful than other things, and your choices should be dominated by comparative advantage.
(Incidentally, you could also apply this to the more general problem of deciding what to work on as an EA. My personal sense is that the differences in values between different cause areas are big enough to basically dwarf comparative advantage arguments, but within a cause area comparative advantage is the dominant consideration.)
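The two extremes above can be made vivid with a toy simulation. Everything in this sketch is my own assumption (the multiplicative value function, the particular distributions, and the fact that it ignores diminishing returns from many people piling onto one problem); it's only meant to illustrate the qualitative claim: when importance varies over orders of magnitude, working on the most important problem beats following comparative advantage, and when importance is roughly equal, the ranking flips.

```python
import random

def total_value(importances, fits, assignment):
    # Value produced: each researcher i contributes the importance of their
    # chosen problem, scaled by their personal fit for it.
    return sum(importances[p] * fits[i][p] for i, p in enumerate(assignment))

random.seed(0)
n = 100  # number of researchers, and of problems

# Fit (comparative advantage) of each researcher for each problem, varying ~2x.
fits = [[random.uniform(0.5, 1.0) for _ in range(n)] for _ in range(n)]

for label, importances in [
    # Extreme 1: importance varies over three orders of magnitude.
    ("wildly varying importance", [10 ** random.uniform(0, 3) for _ in range(n)]),
    # Extreme 2: importance is roughly equal across problems.
    ("roughly equal importance", [random.uniform(0.9, 1.1) for _ in range(n)]),
]:
    top = max(range(n), key=lambda p: importances[p])
    # Strategy A: everyone works on the single most important problem.
    by_importance = [top] * n
    # Strategy B: each researcher works on their personal best-fit problem.
    by_fit = [max(range(n), key=lambda p: fits[i][p]) for i in range(n)]
    print(f"{label}: importance-first={total_value(importances, fits, by_importance):.0f}, "
          f"fit-first={total_value(importances, fits, by_fit):.0f}")
```

Under the first distribution the importance-first strategy wins by a large factor; under the second, fit-first wins. Where a real field sits between these extremes is exactly the disagreement in this thread.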
I would love to see a high quality investigation of historical examples here.
I mostly share your position, except that I think you would perhaps maximize the probability of solving the Riemann hypothesis by pursuing paths at the frontier of current research instead of starting something new (but I imagine that there are many promising paths currently, which may be the source of the difference).
This Planners vs. Hayekians genre of dilemma seems very important to me, and it might be a crux in my career trajectory, or at least impact possible projects I'm taking on. I intuitively think that this question can be dissolved quite easily, making it obvious when each strategy is better, how parts of the EA world-view influence the answer, and perhaps how this impacts how we think about academic research. There is also a lot of existing literature on this matter, so there might already be a satisfying argument.
If someone here is up to a (possibly adversarial) collaboration on the topic, let’s do it!
The Planners vs. Hayekians dilemma seems related to some of the discussion in Realism about rationality, and especially this crux for Abram Demski and Rohin Shah.
Broadly, two types of strategies in technical AI alignment work are:
1. Build solid mathematical foundations on which to build further knowledge, which would eventually let us reason more clearly about AI alignment.
2. Focus on targeted problems we can see today which are directly related to risks from advanced AI, and do our best to solve these (by heuristics or by tracing back to related mathematical questions).
Borrowing Vanessa’s analogy of understanding the world as a castle, with each floor built on the one underneath representing hierarchically built knowledge: when one wants to build a castle out of unknown materials, with an unknown set of rules for its construction, but with a specific tower top in mind, one can either start by laying the groundwork well or start with some ideas of what could sit directly below the tower top.
Planners start from the tower's top, while Hayekians want to build solid ground and add on as many well-placed floors as they can.
I agree, and also immediately thought of pure mathematics as a counterexample. E.g., if one’s most important goal was to prove the Riemann hypothesis, then I claim (based on my personal experience of doing maths, though e.g. Terence Tao seems to agree) that it’d be a very bad strategy to only do things for which one has an end-to-end story for how they might contribute to a proof of the Riemann hypothesis. This is true especially if one is junior, but I claim it would be true even for a hypothetical person eventually proving the Riemann hypothesis, except maybe in some of the very last stages of actually figuring out the proof.
I think the history of maths also provides some suggestive examples of the dangers of requiring end-to-end stories. E.g., consider some famous open questions in Ancient mathematics that were phrased in the language of geometric constructions with ruler and compass, such as whether it’s possible to ‘square the circle’. That problem was resolved (in the negative, since π is transcendental) about 2,000 years after it was posed, using modern number theory. But if you had insisted that everyone working on it have an end-to-end story for how what they’re doing contributes to solving the problem, I think there would have been a real risk that people would have continued thinking purely in ruler-and-compass terms, and that we would never have developed modern number theory in the first place.
--
The Planners vs. Hayekians distinction seems related. The way I’m understanding Buck is that he thinks that, at least within AI alignment, a Planning strategy is superior to a Hayekian one (i.e. roughly one based on optimizing robust heuristics rather than an end-to-end story).
--
One of the strongest defenses of Buck’s original claim I can think of would appeal specifically to the “preparadigmatic” stage of AI alignment. I.e. roughly the argument would be: sure, perhaps in areas where we know of heuristics that are robustly good to pursue it can sometimes be best to do so; however, the challenge with AI alignment precisely is that we do not know of such heuristics, hence there simply is no good alternative to having an end-to-end story.
I just looked up the proof of Fermat’s Last Theorem, and it came about from Andrew Wiles spotting that someone else had recently proven something which could plausibly be turned into a proof, and then working on it for seven years. This seems like a data point in favor of the end-to-end models approach.
Yes, agree. Though anecdotally my impression is that Wiles is an exception, and that his strategy was seen as quite weird and unusual by his peers.
I think I agree that in general there will almost always be a point at which it’s optimal to switch to a more end-to-end strategy. In Wiles’s case, I don’t think his strategy would have worked if he had switched as an undergraduate, and I don’t think it would have worked if he had lived 50 years earlier (because the conceptual foundations used in the proof had not been developed yet).
This can also be a back-and-forth. E.g. for Fermat’s Last Theorem, perhaps number theorists were justified in taking a more end-to-end approach in the 19th century because there had been little effort using then-modern tools; and indeed, partly stimulated by attempts to prove FLT (which succeeded in some special cases), they developed some of the foundations of classical algebraic number theory. Maybe by then people had understood that the conjecture resists direct proof attempts given then-current conceptual tools, and at that point it would have become more fruitful to spend time on less direct approaches, though still guided by heuristics like “it’s useful to further develop the foundations of this area of maths / our understanding of this kind of mathematical object, because we know of a certain connection to FLT, even though we don’t know exactly how this could help in a proof of FLT”. Then, perhaps in Wiles’s time, it was time again for more end-to-end attempts, etc.
I’m not confident that this is a very accurate history of FLT, but reasonably confident that the rough pattern applies to a lot of maths.
The paper Architecting Discovery by Ed Boyden and Adam Marblestone also discusses how one can methodologically go about producing better scientific tools (which they used for Expansion Microscopy and Optogenetics).
> I think the history of maths also provides some suggestive examples of the dangers of requiring end-to-end stories. E.g., consider some famous open questions in Ancient mathematics that were phrased in the language of geometric constructions with ruler and compass, such as whether it’s possible to ‘square the circle’. It was solved 2,000 years after it was posed using modern number theory. But if you had insisted that everyone working on it has an end-to-end story for how what they’re doing contributes to solving that problem, I think there would have been a real risk that people continue thinking purely in ruler-and-compass terms and we never develop modern number theory in the first place.
I think you’re interpreting me to say that people ought to have an externally validated end-to-end story; I’m actually just saying that they should have an approach which they think might be useful, which is weaker.
Thanks, I think this is a useful clarification. I’m actually not sure if I even clearly distinguished these cases in my thinking when I wrote my previous comments, but I agree the thing you quoted is primarily relevant to when end-to-end stories will be externally validated. (By which I think you mean something like: they would lead to an ‘objective’ solution, e.g. maths proof, if executed without major changes.)
The extent to which we agree depends on what counts as an end-to-end story. For example, consider someone working on ML transparency claiming their research is valuable for AI alignment. My guess is:
If literally everything they can say when queried is “I don’t know how transparency helps with AI alignment, I just saw the term in some list of relevant research directions”, then we both are quite pessimistic about the value of that work.
If they say something like “I’ve made the deliberate decision not to focus on research for which I can fully argue it will be relevant to AI alignment right now. Instead, I just focus on understanding ML transparency as best as I can because I think there are many scenarios in which understanding transparency will be beneficial.”, and then they say something showing they understand longtermist thought on AI risk, then I’m not necessarily pessimistic. I’d think they won’t come up with their own research agenda in the next two years, but depending on the circumstances I might well be optimistic about that person’s impact over their whole career, and I wouldn’t necessarily recommend them to change their approach. I’m not sure what you’d think, but I think initially I read you as being pessimistic in such a case, and this was partly what I was reacting against.
If they give an end-to-end story for how their work fits within AI alignment, then all else equal I consider that to be a good sign. However, depending on the circumstances I might still think the best long-term strategy for that person is to postpone the direct pursuit of that end-to-end story and instead focus on targeted deliberate practice of some of the relevant skills, or at least complement the direct pursuit with such deliberate practice. For example, if someone is very junior, and their story says that mathematical logic is important for their work, I might recommend they grab a logic textbook and work through all the exercises. My guess is we disagree on such cases, but that the disagreement is somewhat gradual; i.e. we both agree about extreme cases, but I’d more often recommend more substantial deliberate practice.
Similar to what you're saying about AI alignment being preparadigmatic: a major reason why trying to prove the Riemann hypothesis head-on would be a bad idea is that people have already been trying to do that for a long time without success. I expect the first people to consider the hypothesis approached it directly, and were reasonable to do so.
Yes, good points. I basically agree. I guess this could provide another argument in favor of Buck’s original view, namely that the AI alignment problem is young and so worth attacking directly. (Though there are differences between attacking a problem directly and having an end-to-end story for how to solve it, which may be worth paying attention to.)
I think your view is also borne out by some examples from the history of maths. For example, the Weil conjectures were posed in 1949, and it took “only” a few decades to prove them. However, some of the key steps were known from the start; it just required a lot of work and innovation to complete them. And so I think it’s fair to characterize the process as a relatively direct, and ultimately successful, attempt to solve a big problem. (Indeed, this is an example of the effect where the targeted pursuit of a specific problem led to a lot of foundational/theoretical innovation with much wider uses.)
For clarity, Terence Tao argues that it is a bad strategy to work on a single famous open problem because one should first skill up, lose some naivety, and gain status within the community, not because it is a worse problem-solving strategy.
My reading is that career/status considerations are only one of at least two major reasons Tao mentions. I agree those may be less relevant in the AI alignment case, and are not centrally a criterion for how good a problem solving strategy is.
However, Tao also appeals to the required “mathematical preparation”, which fits with you mentioning skilling up and losing naivety. I do think these are central criteria for how good a problem solving strategy is. If I want to build a house, it would be a bad strategy to start putting it together with my bare hands; it would be better to first build a hammer and other tools, and understand how to use them. Similarly, it would be better to acquire and understand the relevant tools before attempting to solve a mathematical problem.
I agree with this. Perhaps we are on the same page.
But I think that this is in an important way orthogonal to the Planner vs Hayekian distinction which I think is the more crucial point here.
I’d argue that if one wants to solve a problem, it would be better to have a sort of a roadmap and to learn stuff on the way. I agree that it might be great to choose subproblems if they give you some relevant tools, but there should be a good argument as to why these tools are likely to help. In many cases, I’d expect choosing subproblems which are closer to what you really want to accomplish to help you learn more relevant tools. If you want to get better at climbing stairs, you should practice climbing stairs.
I think having a roadmap, and choosing subproblems as close as possible to the final problem, are often good strategies, perhaps in a large majority of cases.
However, I think there at least three important types of exceptions:
When it’s not possible to identify any clear subproblems or their closeness to the final problem is unclear (perhaps AI alignment is an example, though I think it’s less true today than it was, say, 10 years ago—at least if you buy e.g. Paul Christiano’s broad agenda).
When the close, or even all known, subproblems have resisted solutions for a long time, e.g. Riemann hypothesis.
When one needs tools/subproblems that seem closely related only after having invested a lot of effort investigating them, rather than in advance. E.g. squaring the circle—“if you want to understand constructions with ruler and compass, do a lot of constructions with ruler and compass” was a bad strategy. Though admittedly it’s unclear if one can identify examples of this type in advance unless they are also examples of one of the previous two types.
Also, I of course acknowledge that there are limits to the idea of exploring less closely related subproblems. For example, no matter what mathematical problem you want to solve, I think it would be a very bad strategy to study dung beetles or to become a priest. And to be fair, I think at least in hindsight the idea of studying close subproblems will almost always appear to be correct. To return to the example of squaring the circle: once people had realized that the set of points you can construct with ruler and compass is closed under basic algebraic operations in the complex plane, it was possible and relatively easy to see how certain problems in algebraic number theory were closely related. So the problem was less that it’s intrinsically better to focus on less related subproblems, and more that people didn’t properly understand what would count as helpfully related.
Regarding the first two types, I think that this is practically never the case, and one can always make progress, even if that progress comes from work on analogies or heuristically relevant techniques. The Riemann hypothesis is actually a great example of that; there are many paths currently being pursued to help us understand it better, even if there aren’t any especially promising reductions (not sure if that’s the case). But I guess your point here is that these are distinct markers for how easy it is to make progress.
What is the alternative strategy you are suggesting in those exceptions? Is it to work on problems that are weakly related and the connection is not clear but are more tractable?
If so, I think that two alternative strategies are to just try harder to find something more related or to move to a different project altogether. Of course, this all lies on a continuum so it’s a matter of degree.
This reminds me of Sarah Constantin’s post about the trade-off between output and external direction: https://srconstantin.wordpress.com/2019/07/20/the-costs-of-reliability/
Similar to what you’re saying about AI alignment being preparadigmatic, a major reason why trying to prove the Riemann hypothesis head-on would be a bad idea is that people have already been trying to do that for a long time without success. I expect the first people to consider the hypothesis approached it directly, and were reasonable to do so.
Yes, good points. I basically agree. I guess this could provide another argument in favor of Buck’s original view, namely that the AI alignment problem is young and so worth attacking directly. (Though there are differences between attacking a problem directly and having an end-to-end story for how to solve it, which may be worth paying attention to.)
I think your view is also borne out by some examples from the history of maths. For example, the Weil conjectures were posed in 1949, and it took “only” a few decades to prove them. However, some of the key steps were known from the start; it just required a lot of work and innovation to complete them. And so I think it’s fair to characterize the process as a relatively direct, and ultimately successful, attempt to solve a big problem. (Indeed, this is an example of the effect where the targeted pursuit of a specific problem led to a lot of foundational/theoretical innovation, which has much wider uses.)
To be clear, Terry Tao argues that it is a bad strategy to focus on a single open problem because one should first skill up, lose some naivety, and gain status within the community, not because it is a worse problem-solving strategy.
My reading is that career/status considerations are only one of at least two major reasons Tao mentions. I agree those may be less relevant in the AI alignment case, and are not centrally a criterion for how good a problem solving strategy is.
However, Tao also appeals to the required “mathematical preparation”, which fits with you mentioning skilling up and losing naivety. I do think these are central criteria for how good a problem solving strategy is. If I want to build a house, it would be a bad strategy to start putting it together with my bare hands; it would be better to first build a hammer and other tools, and understand how to use them. Similarly, it would be better to acquire and understand the relevant tools before attempting to solve a mathematical problem.
I agree with this. Perhaps we are on the same page.
But I think that this is in an important way orthogonal to the Planner vs Hayekian distinction which I think is the more crucial point here.
I’d argue that if one wants to solve a problem, it would be better to have a sort of roadmap and to learn things along the way. I agree that it might be great to choose subproblems if they give you some relevant tools, but there should be a good argument for why these tools are likely to help. In many cases, I’d expect choosing subproblems closer to what you really want to accomplish to help you learn more relevant tools. If you want to get better at climbing stairs, you should practice climbing stairs.
I think having a roadmap, and choosing subproblems as close as possible to the final problem, are often good strategies, perhaps in a large majority of cases.
However, I think there are at least three important types of exceptions:
When it’s not possible to identify any clear subproblems or their closeness to the final problem is unclear (perhaps AI alignment is an example, though I think it’s less true today than it was, say, 10 years ago—at least if you buy e.g. Paul Christiano’s broad agenda).
When the close, or even all known, subproblems have resisted solutions for a long time, e.g. Riemann hypothesis.
When one needs tools/subproblems that seem closely related only after having invested a lot of effort investigating them, rather than in advance. E.g. squaring the circle—“if you want to understand constructions with ruler and compass, do a lot of constructions with ruler and compass” was a bad strategy. Though admittedly it’s unclear if one can identify examples of this type in advance unless they are also examples of one of the previous two types.
Also, I of course acknowledge that there are limits to the idea of exploring subproblems that are less closely related. For example, no matter what mathematical problem you want to solve, I think it would be a very bad strategy to study dung beetles or become a priest. And to be fair, I think that at least in hindsight the idea of studying close subproblems will almost always appear correct. To return to the example of squaring the circle: once people had realized that the set of points constructible with ruler and compass is closed under basic algebraic operations in the complex plane, it was relatively easy to see how certain problems in algebraic number theory were closely related. So the problem was not that it’s intrinsically better to focus on less related subproblems, but that people didn’t properly understand what would count as helpfully related.
Regarding the first two types, I think it’s practically never the case that no progress is possible; one can always make progress, even if that progress comes from work on analogies or heuristically relevant techniques. The Riemann hypothesis is actually a great example of this: there are many paths currently being pursued to help us understand it better, even if there aren’t any especially promising reductions (though I’m not sure whether that’s the case). But I guess your point here is that these are distinct markers for how easy it is to make progress.
What is the alternative strategy you are suggesting in those exceptional cases? Is it to work on problems that are only weakly related, where the connection is unclear, but which are more tractable?
If so, I think two alternative strategies are to simply try harder to find something more related, or to move to a different project altogether. Of course, this all lies on a continuum, so it’s a matter of degree.