[Question] How much EA analysis of AI safety as a cause area exists?

richard_ngo Sep 6, 2019, 11:15 AM

94 points

Criticism of effective altruist causes AI alignment Collections and resources Building effective altruism AI safety

AI safety has become a big deal in EA, and so I’m curious about how much “due diligence” on it has been done by the EA community as a whole. Obviously there have been many in-person discussions, but it’s very difficult to evaluate whether these contain new or high-quality content. Probably a better metric is how much work has been done which:

1. Is publicly available;

2. Engages in detail with core arguments for why AI might be dangerous (type A), OR tries to evaluate the credibility of the arguments without directly engaging with them (type B);

3. Was motivated or instigated by EA.

I’m wary of focusing too much on credit assignment, but it seems important to be able to answer a question like “if EA hadn’t ever formed, to what extent would it have been harder for an impartial observer in 2019 to evaluate whether working on AI safety is important?” The clearest evidence would be a substantial body of relevant work produced by people who were employed at EA orgs, funded by EA grants, or convinced to work on AI safety through their involvement with EA. Some such work comes to mind, and I’ve listed it below; what am I missing?

Type A work which meets my criteria above:

A lot of writing by Holden Karnofsky
A lot of writing by Paul Christiano
This sequence by Rohin Shah
These posts by Jeff Kaufman
This agenda by Allan Dafoe
This report by Tom Sittler

Type A work which only partially meets criterion 3 (or which I’m uncertain about):

These two articles by Luke Muehlhauser
This report by Eric Drexler
This blog by Ben Hoffman
AI Impacts

Type B work which meets my criteria above:

Things which don’t meet those criteria:

This 80,000 Hours report (which mentions the arguments, but doesn’t thoroughly evaluate them)
Superintelligence
The AI Foom debate

Edited to add: Wei Dai asked why I didn’t count Nick Bostrom as “part of EA”, and I wrote quite a long answer which explains the motivations behind this question much better than my original post. So I’ve copied most of it below:

The three questions I am ultimately trying to answer are: a) how valuable is it to build up the EA movement? b) how much should I update when I learn that a given belief is a consensus in EA? and c) how much evidence do the opinions of other people provide in favour of AI safety being important?

To answer the first question, assuming that analysis of AI safety as a cause area is valuable, I should focus on contributions by people who were motivated or instigated by the EA movement itself. Here Nick doesn’t count (except insofar as EA made his book come out sooner or better).

To answer the second question, it helps to know whether the focus on AI safety in EA came about because many people did comprehensive due diligence and shared their findings, or whether there wasn’t much investigation and the ubiquity of the belief was driven via an information cascade. For this purpose, I should count work by people to the extent that they or people like them are likely to critically investigate other beliefs that are or will become widespread in EA. Being motivated to investigate AI safety by membership in the EA movement is the best evidence, but for the purpose of answering this question I probably should have used “motivated by the EA movement or motivated by very similar things to what EAs are motivated by”, and should partially count Nick.

To answer the third question, it helps to know whether the people who have become convinced that AI safety is important are a relatively homogenous group who might all have highly correlated biases and hidden motivations, or whether a wide range of people have become convinced. For this purpose, I should count work by people to the extent that they are dissimilar to the transhumanists and rationalists who came up with the original safety arguments, and also to the extent that they rederived the arguments for themselves rather than being influenced by the existing arguments. Here EAs who started off not being inclined towards transhumanism or rationalism at all count the most, and Nick counts very little.


beth Sep 7, 2019, 11:22 AM
13 points
0 ∶ 0

I believe your assessment is correct, and I fear that EA hasn’t done due diligence on AI Safety, especially seeing how much effort and money is being spent on it.
I think there is a severe lack of writing on the side of “AI Safety is ineffective”. A lot of basic arguments haven’t been written down, including some quite low-hanging fruit.
- Anthony DiGiovanni Sep 14, 2019, 3:15 PM
  4 points
  0 ∶ 0
  Parent
  
  While I disagree with his conclusion and support FRI’s approach to reducing AI s-risks, Magnus Vinding’s essay “Why Altruists Should Perhaps Not Prioritize Artificial Intelligence” is one of the most thoughtful EA analyses against prioritizing AI safety I’m aware of. I’d say it fits into the “Type A and meets OP’s criterion” category.
  - Magnus Vinding Sep 15, 2019, 3:24 PM
    15 points
    0 ∶ 0
    Parent
    
    Thanks for sharing and for the kind words. :-)
    I should like to clarify that I also support FRI’s approach to reducing AI s-risks. The issue is more how big a fraction of our resources approaches of this kind deserve relative to other things. My view is that, relatively speaking, we very much underinvest in addressing other risks, by which I roughly mean “risks not stemming primarily from FOOM or sub-optimally written software” (which can still involve AI plenty, of course). I would like to see a greater investment in broad explorative research on s-risk scenarios and how we can reduce them.
    In terms of explaining the (IMO) skewed focus, it seems to me that we mostly think about AI futures in far mode, see https://www.overcomingbias.com/2010/06/near-far-summary.html and https://www.overcomingbias.com/2010/10/the-future-seems-shiny.html. The perhaps most significant way in which this shows is that we intuitively think the future will be determined by a single or a few agents and what they want, as opposed to countless different agents, cooperating and competing with many (for those future agents) non-intentional factors influencing the outcomes.
    I’d argue scenarios of the latter kind are far more likely given not just the history of life and civilization, but also in light of general models of complex systems and innovation (variation and specialization seem essential, and the way these play out is unlikely to conform to a singular will in anything like the neat way far mode would portray it). Indeed, I believe such a scenario would be most likely to emerge even if a single universal AI ancestor took over and copied itself (specialization would be adaptive, and significant uncertainty about the exact information and (sub-)aims possessed by conspecifics would emerge).
    In short, I think we place too much weight on simplistic toy models of the future, in turn neglecting scenarios that don’t conform neatly to these, and the ways these could come about.
    - Wei Dai Sep 15, 2019, 11:25 PM
      9 points
      0 ∶ 0
      Parent
      
      
      as opposed to countless different agents, cooperating and competing with many (for those future agents) non-intentional factors influencing the outcomes.
      
      I think there are good reasons to think this isn’t likely, aside from the possibility of FOOM:
      
      Strategic implications of AIs’ ability to coordinate at low cost, for example by merging
      AGI will drastically increase economies of scale
      - Magnus Vinding Sep 16, 2019, 2:57 PM
        3 points
        0 ∶ 0
        Parent
        
        Interesting posts. Yet I don’t see how they support the claim that what I described is unlikely. In particular, I don’t see how “easy coordination” is in tension with what I wrote.
        To clarify, competition that determines outcomes can readily happen within a framework of shared goals, and as instrumental to some overarching final goal. If the final goal is, say, to maximize economic growth (or if that is an important instrumental goal), this would likely lead to specialization and competition among various agents that try out different things, and which, by the nature of specialization, have imperfect information about what other agents know (not having such specialization would be much less efficient). In this, a future AI economy would resemble ours more than far-mode thinking suggests (this does not necessarily contradict your claim about easier coordination, though).
        A reason I consider what I described likely is not least that I find it more likely that future software systems will consist in a multitude of specialized systems with quite different designs, even in the presence of AGI, as opposed to most everything being done by copies of some singular AGI system. This “one system will take over everything” strikes me as far-mode thinking, and not least unlikely given the history of technology and economic growth. I’ve outlined my view on this in the following e-book (though it’s a bit dated in some ways): https://www.smashwords.com/books/view/655938 (short summary and review by Kaj Sotala: https://kajsotala.fi/2017/01/disjunctive-ai-scenarios-individual-or-collective-takeoff/)
        Wei Dai Sep 17, 2019, 4:53 PM
        4 points
        0 ∶ 0
        Parent
        
        
        A reason I consider what I described likely is not least that I find it more likely that future software systems will consist in a multitude of specialized systems with quite different designs, even in the presence of AGI, as opposed to most everything being done by copies of some singular AGI system.
        
        Can you explain why this is relevant to how much effort we should put into AI alignment research today?
        
        Magnus Vinding Sep 19, 2019, 5:26 PM
        1 point
        0 ∶ 0
        Parent
        
        In brief: the less of a determinant specific AGI structure is of future outcomes, the less relevant/worthy of investment it is.
  - John_Maxwell Sep 19, 2019, 2:49 AM
    3 points
    0 ∶ 0
    Parent
    
    This critique is quite lengthy :-) Is there a summary available?
    - Anthony DiGiovanni Sep 20, 2019, 1:22 PM
      10 points
      0 ∶ 0
      Parent
      
      I’m not aware of such summaries, but I’ll take a stab at it here:
      Even though it’s possible for the expected disvalue of a very improbable outcome to be high if the outcome is sufficiently awful, the relatively large degree of investment in AI safety work by the EA community today would only make sense if the probability of AI-catalyzed GCR were decently high. This Open Phil post for example doesn’t frame this as a “yes it’s extremely unlikely, but the downsides could be massive, so in expectation it’s worth working on” cause; many EAs in general give estimates of a non-negligible probability of very bad AI outcomes. So, accordingly, AI is considered not only a viable cause to work on but indeed one of the top priorities.
      But arguably the scenarios in which AGI becomes a catastrophic threat rely on a conjunction of several improbable assumptions. One of these is that general “intelligence” in the sense of a capacity to achieve goals on a global scale (rather than a capacity merely to solve problems easily representable within e.g. a Markov decision process) is something that computers can develop without a long process of real-world trial and error, or cooperation in the human economy. (If such a process is necessary, then humans should be able to stop potentially dangerous AIs in their tracks before they become too powerful.) The key takeaway from the essay, as far as I found, was that we should be cautious about using one definition of intelligence, i.e. the sort that deep RL algorithms have demonstrated in game settings, as grounds for predicting dangerous outcomes resulting from a much more difficult-to-automate sense of intelligence, namely the ability to achieve goals in physical reality.
      The actual essay is more subtle than this, of course, and I’d definitely encourage people to at least skim it before dismissing the weaker form of the argument I’ve sketched here. But I agree that the AI safety research community has a responsibility to make that connection between current deep learning “intelligence” and intelligence-as-power more explicit; otherwise it’s a big equivocation fallacy.
      Magnus, is this a fair representation?
      - Magnus Vinding Sep 28, 2019, 11:01 AM
        1 point
        0 ∶ 0
        Parent
        
        Thanks for the stab, Anthony. It’s fairly fair. :-)
        Some clarifying points:
        First, I should note that my piece was written from the perspective of suffering-focused ethics.
        Second, I would not say that “investment in AI safety work by the EA community today would only make sense if the probability of AI-catalyzed GCR were decently high”. Even setting aside the question of what “decently high” means, I would note that:
        1) Whether such investments in AI safety make sense depends in part on one’s values. (Though another critique I would make is that “AI safety” is less well-defined than people often seem to think: https://magnusvinding.com/2018/12/14/is-ai-alignment-possible/, but more on this below.)
        2) Even if “the probability of AI-catalyzed GCR” were decently high — say, >2 percent — this would not imply that one should focus on “AI safety” in a standard narrow sense (roughly: constructing the right software), nor that other risks are not greater in expectation (compared to the risks we commonly have in mind when we think of “AI-catalyzed catastrophic risks”).
        You write of “scenarios in which AGI becomes a catastrophic threat”. But a question I would raise is: what does this mean? Do we all have a clear picture of this in our minds? This sounds to me like a rather broad class of scenarios, and a worry I have is that we all have “poorly written software” scenarios in mind, although such scenarios could well comprise a relatively narrow subset of the entire class that is “catastrophic scenarios involving AI”.
        Zooming out, my critique can be crudely summarized as a critique of two significant equivocations that I see doing an exceptional amount of work in many standard arguments for “prioritizing AI”.
        First, there is what we may call the AI safety equivocation (or motte and bailey): people commonly fail to distinguish between 1) a focus on future outcomes controlled by AI and 2) a focus on writing “safe” software. Accepting that we should adopt the former focus by no means implies we should adopt the latter. By (imperfect) analogy, to say that we should focus on future outcomes controlled by humans does not imply that we should focus primarily on writing safe human genomes.
        The second is what we may call the intelligence equivocation, which is the one you described. We operate with two very different senses of the term “intelligence”, namely 1) the ability to achieve goals in general (derived from Legg & Hutter, 2007), and 2) “intelligence” in the much narrower sense of “advanced cognitive abilities”, roughly equivalent to IQ in humans.
        These two are often treated as virtually identical, and we fail to appreciate the rather enormous difference between them, as argued in/evident from books such as The Knowledge Illusion: Why We Never Think Alone, The Ascent of Man, The Evolution of Everything, and The Secret of Our Success. This was also the main point in my Reflections on Intelligence.
        Intelligence₂ lies all in the brain, whereas intelligence₁ includes the brain and so much more, including all the rest of our well-adapted body parts (vocal cords, hands, upright walk; remove just one of these completely in all humans and human civilization is likely gone for good). Not to mention our culture and technology as a whole, which is the level at which our ability to achieve goals at a significant level really emerges: it derives not from any single advanced machine but from our entire economy. A vastly greater toolbox than what intelligence₂ covers.
        Thus, to assume that by boosting intelligence₂ to vastly super-human levels we necessarily get intelligence₁ at a vastly super-human level is a mistake, not least since “human-level intelligence₁” already includes vastly super-human intelligence₂ in many cognitive domains.
Milan Griffes Sep 7, 2019, 4:26 PM
11 points
0 ∶ 0

The Median Group has produced some Type B work: http://mediangroup.org/research (archive)
I believe they’re skeptical of AGI in the near-term.
Also OpenAI’s AI & Compute (archive), and some commentary on LessWrong.
Question Mark Dec 12, 2021, 9:04 AM
1 point
0 ∶ 0

Brian Tomasik wrote this article about the risks of a “near miss” in AI alignment. From a suffering-focused perspective, Tomasik argues that a slightly misaligned AGI could potentially cause far more suffering than an AI that is totally unaligned. He has also argued that there may be a ~38% chance that MIRI is actively harmful.

Wei Dai Sep 7, 2019, 10:50 PM
19 points
0 ∶ 0

Why doesn’t Superintelligence count?
- richard_ngo Sep 8, 2019, 6:20 PM
  6 points
  0 ∶ 0
  Parent
  
  To my knowledge it doesn’t meet the “Was motivated or instigated by EA” criterion, since Nick had been developing those ideas since well before the EA movement started. I guess he might have gotten EA money while writing the book, but even if that’s the case it doesn’t feel like a central example of what I’m interested in.
  - Wei Dai Sep 8, 2019, 6:31 PM
    21 points
    0 ∶ 0
    Parent
    
    That moves the question to, why not count Nick as part of EA itself? :) It seems reasonable to count him given that he wrote Astronomical Waste which seems to be part of EA’s intellectual foundations, and he personally seems to be very interested in doing altruism effectively.
    
    Or maybe you can explain your motivations for writing this post in more detail, which would help me understand how best to interpret it.
    - richard_ngo Sep 9, 2019, 1:36 AM
      8 points
      0 ∶ 0
      Parent
      
      Let me try to answer the latter question (and thanks for pushing me to flesh out my vague ideas more!). One very brief way you could describe the development of AI safety is something like “A few transhumanists came up with some key ideas and wrote many blog posts. The rationalist movement formed from those following these things online, and made further contributions. Then the EA movement formed, and while it was originally focused on causes like global poverty, over time did a bunch of investigative work which led many EAs to become convinced that AI safety matters, and to start working on it, directly or indirectly (or to gain skills with the intent of doing such work).”
      The three questions I am ultimately trying to answer are: a) how valuable is it to build up the EA movement? b) how much should I update when I learn that a given belief is a consensus in EA? and c) how much evidence do the opinions of other people provide in favour of AI safety being important?
      To answer the first question, assuming that analysis of AI safety as a cause area is valuable, I should focus on contributions by people who were motivated or instigated by the EA movement itself. Here Nick doesn’t count (except insofar as EA made his book come out sooner or better).
      To answer the second question, it helps to know whether the focus on AI safety in EA came about because many people did comprehensive due diligence and shared their findings, or whether there wasn’t much investigation and the ubiquity of the belief was driven via an information cascade. For this purpose, I should count work by people to the extent that they or people like them are likely to critically investigate other beliefs that are or will become widespread in EA. Being motivated to investigate AI safety by membership in the EA movement is the best evidence, but for the purpose of answering this question I probably should have used “motivated by the EA movement or motivated by very similar things to what EAs are motivated by”, and should partially count Nick.
      To answer the third question, it helps to know whether the people who have become convinced that AI safety is important are a relatively homogenous group who might all have highly correlated biases and hidden motivations, or whether a wide range of people have become convinced. For this purpose, I should count work by people to the extent that they are dissimilar to the transhumanists and rationalists who came up with the original safety arguments, and also to the extent that they rederived the arguments for themselves rather than being influenced by the existing arguments. Here EAs who started off not being inclined towards transhumanism or rationalism at all count the most, and Nick counts very little.
      Note that Nick is quite an outlier though, so while I’m using him as an illustrative example, I’d prefer engagement on the general points rather than this example in particular.
      - Wei Dai Sep 9, 2019, 5:52 PM
        11 points
        0 ∶ 0
        Parent
        
        
        Then the EA movement formed, and while it was originally focused on causes like global poverty, over time did a bunch of investigative work which led many EAs to become convinced that AI safety matters, and to start working on it directly or indirectly.
        
        Is this a statement that you’re endorsing, or is it part of what you’re questioning? Are you aware of any surveys or any other evidence supporting this? (I’d accept “most people in AI safety that I know started working in it because EA investigative work convinced them that AI safety matters” or something of that nature.)
        
        b) how much should I update when I learn that a given belief is a consensus in EA?
        
        Why are you trying to answer this, instead of “How should I update, given the results of all available investigations into AI safety as a cause area?” In other words, what is the point of dividing such investigations into “EA” and “not EA”, if in the end you just want to update on all of them to arrive at a posterior? Oh, is it because if a non-EA concludes that AI safety is not a worthwhile cause, it might just be because they don’t care much about the far future, so EA investigations are more relevant? But if so, why only “partially count” Nick?
        
        Here EAs who started off not being inclined towards transhumanism or rationalism at all count the most, and Nick counts very little.
        
        For this question then, it seems that Paul Christiano also needs to be discounted (and possibly others as well but I’m not as familiar with them).
        
        richard_ngo Sep 10, 2019, 9:35 AM
        2 points
        0 ∶ 0
        Parent
        
        Are you aware of any surveys or any other evidence supporting this? (I’d accept “most people in AI safety that I know started working in it because EA investigative work convinced them that AI safety matters” or something of that nature.)
        I’m endorsing this, and I’m confused about which part you’re skeptical about. Is it the “many EAs” bit? Obviously the word “many” is pretty fuzzy, and I don’t intend it to be a strong claim. Mentally the numbers I’m thinking of are something like >50 people or >25% of committed (or “core”, whatever that means) EAs. Don’t have a survey to back that up though. Oh, I guess I’m also including people currently studying ML with the intention of doing safety. Will edit to add that.
        Why are you trying to answer this, instead of “How should I update, given the results of all available investigations into AI safety as a cause area?”
        There are other questions that I would like answers to, not related to AI safety, and if I trusted EA consensus, then that would make the process much easier.
        For this question then, it seems that Paul Christiano also needs to be discounted (and possibly others as well but I’m not as familiar with them).
        Indeed, I agree.
        Wei Dai Sep 10, 2019, 4:20 PM
        1 point
        0 ∶ 0
        Parent
        
        
        I’m endorsing this, and I’m confused about which part you’re skeptical about.
        
        I think I interpreted your statement as saying something like most people in AI safety are EAs, because you started with “One very brief way you could describe the development of AI safety” which I guess made me think that maybe you consider this to be the main story of AI safety so far, or you thought other people considered this to be the main story of AI safety so far and you wanted to push against that perception. Sorry for reading too much / the wrong thing into it.
        
        There are other questions that I would like answers to, not related to AI safety, and if I trusted EA consensus, then that would make the process much easier.
        
        Ok I see. But there may not be that much correlation between the trustworthiness of EA consensus on different topics. It could easily be the case that EA has done a lot of good investigations on AI safety but very little or poor quality investigations on other topics. It seems like it wouldn’t be that hard to just look at the actual investigations for each topic, rather than rely on some sense of whether EA consensus is overall trustworthy.
Milan Griffes Sep 7, 2019, 4:21 PM
11 points
0 ∶ 0

See also Scott’s recent review of Drexler’s Reframing Superintelligence.