Really interesting stuff! Another question that could be useful is how much each piece shifted their views on existential risk.
Some of the better-liked pieces are less ardent about the possibility of AI x-risk. The two pieces that are most direct about x-risk might be the unpopular Cotra and Carlsmith essays. I’m open to the idea that gentler introductions to ideas about safety could be more persuasive, but they might also result in people working on topics that are less relevant to existential safety. Hopefully we’ll be able to find or write materials that are both persuasive to the ML community and directly communicate the most pressing concerns about alignment.
Separately, is your sample size 28 for each document? Or did different documents have different numbers of readers? Might be informative to see those individual sample sizes. Especially for a long report like Carlsmith’s, you might think that not many readers put in the hour+ necessary to read it.
Edit: Discussion of this point here: https://www.lesswrong.com/posts/gpk8dARHBi7Mkmzt9/what-ai-safety-materials-do-ml-researchers-find-compelling?commentId=Cxoa577LadGYwC49C#comments
(in response to the technical questions)
Mostly n=28 for each document; some had n=29 or n=30. You can see details in the quantitative section of the Appendix.
The Carlsmith link is to the YouTube talk version, not the full report; we chose materials in part for being pretty short.
Was each piece of writing read by a fresh set of n researchers (i.e., a total of ~30*8 researchers participated)? I understand the alternative to be that the same ~30 researchers read the 8 pieces of writing.
If the latter was the case, the following question interests me: did you specify the order in which they should read the pieces?
I expect somebody making their first contact with AI safety to have a very path-dependent response. For instance, encountering Carlsmith first and encountering Carlsmith last seem likely to produce different effects, and these effects could extend to the researchers’ ratings of the other pieces.
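To make the order-effects worry concrete, here is a minimal sketch (purely hypothetical: made-up ratings, a made-up “fatigue” effect, and invented names, not the study’s actual design or data) of how one could counterbalance reading order by cyclic rotation and then check whether ratings drift with the position at which a piece is read:

```python
# Hypothetical sketch: counterbalance reading order by cyclic rotation so each
# piece appears at every position equally often, then check whether ratings
# drift with position. Ratings here are simulated, not real survey data.
import random
from statistics import mean

READINGS = ["Russell/Bowman", "Steinhardt", "Carlsmith", "Cotra",
            "Christiano", "Gates", "Schulman"]

def rotated_orders(n_participants, items):
    """Participant i starts at item i mod len(items) and cycles through the rest."""
    k = len(items)
    return [[items[(i + j) % k] for j in range(k)] for i in range(n_participants)]

def simulate_ratings(orders, fatigue=0.2):
    """Made-up ratings that drift downward the later a piece is encountered."""
    rows = []
    for order in orders:
        for position, reading in enumerate(order):
            rating = 4.0 - fatigue * position + random.gauss(0, 0.5)
            rows.append((reading, position, rating))
    return rows

orders = rotated_orders(28, READINGS)
rows = simulate_ratings(orders)

# Mean rating by position: a consistent downward trend would suggest order effects.
for pos in range(len(READINGS)):
    scores = [rating for (_, p, rating) in rows if p == pos]
    print(f"position {pos}: mean rating {mean(scores):.2f}")
```

If everyone reads the pieces in the same fixed order, position and piece are confounded, which is why the reading order seems worth asking about.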
Unrelatedly, I’m wondering whether researchers were exposed only to the transcripts of the videos as opposed to the videos themselves.
No, the same set of ~28 researchers read all of the readings.
The order of the readings was indeed specified:
1. Concise overview (Stuart Russell, Sam Bowman; 30 minutes)
2. Different styles of thinking about future AI systems (Jacob Steinhardt; 30 minutes)
3. A more in-depth argument for highly advanced AI being a serious risk (Joe Carlsmith; 30 minutes)
4. A more detailed description of how deep learning models could become dangerously “misaligned” and why this might be difficult to solve with current ML techniques (Ajeya Cotra; 30 minutes)
5. An overview of different research directions (Paul Christiano; 30 minutes)
6. A study of what ML researchers think about these issues (Vael Gates; 45 minutes)
7. Some common misconceptions (John Schulman; 15 minutes)
Researchers had the option to read transcripts where they were available; we said that consuming the content in either form (video or transcript) was fine.