Rohin Shah comments on Alignment Newsletter One Year Retrospective

Rohin Shah 10 Apr 2019 7:03 UTC
1 point
Comment thread for the question: Am I underestimating the risk of causing information cascades? Regardless, how can I mitigate this risk?
- Milan_Griffes 10 Apr 2019 17:30 UTC
  2 points
  Seems okay so far, from my very ill-informed perspective.
  Interesting to think about what governance the newsletter should have in place re: info hazards, confidentiality, etc.
  What did you guys do for GPT-2?
  - Rohin Shah 10 Apr 2019 22:56 UTC
    3 points
    "Interesting to think about what governance the newsletter should have in place re: info hazards, confidentiality, etc."
    Currently we only write about public documents, so I don’t think these concerns arise. I suppose you could imagine that someone writes about something they shouldn’t have and we amplify it, but I suspect this is a rare case and one that should be up to my discretion.
    "What did you guys do for GPT-2?"
    Not sure what specifically you’re asking about here. You can see the relevant newsletter here.
    - Milan_Griffes 10 Apr 2019 23:01 UTC
      3 points
      "… you could imagine that someone writes about something they shouldn’t have and we amplify it, but I suspect this is a rare case and one that should be up to my discretion."
      A crux here is probably how rare a case we think this is.
      From my present vantage, the AI alignment newsletter is becoming a pretty prominent clearinghouse for academic AI alignment research updates. (I wouldn’t be surprised if it were the primary source of such for a sizable portion of newsletter subscribers.)
      To the extent that’s true, the amplification effects seem possibly strong.
      - Rohin Shah 11 Apr 2019 2:00 UTC
        3 points
        "From my present vantage, the AI alignment newsletter is becoming a pretty prominent clearinghouse for academic AI alignment research updates. (I wouldn’t be surprised if it were the primary source of such for a sizable portion of newsletter subscribers.)
        To the extent that’s true, the amplification effects seem possibly strong."
        I agree that’s true and that the amplification effects for AI safety researchers are strong; the amplification effect is much weaker for any other category. My current model is that info hazards are most worrisome when they spread outside the AI safety community.
        On confidentiality, the downsides of the newsletter failing to preserve confidentiality seem sufficiently small that I’m not worried (if you ignore info hazards). Failures of confidentiality seem bad in that they harm your reputation and make it less likely that people are willing to talk to you—it’s similar to the reason you wouldn’t break a promise even if superficially the consequences of the thing you’re doing seem slightly negative. But in the case of the newsletter, we would amplify someone else’s failure to preserve confidentiality, which shouldn’t reflect all that poorly on us. (Obviously if we knew that the information was supposed to be confidential we wouldn’t publish it.)