Hi Yonatan,
Thank you for this! Your comment is definitely readable and helpful. It highlights gaps in my communication and pushes me to think more deeply and explain my ideas better.
I’ve gained two main insights. First, I should be clearer about what I mean when I use terms like “shared language.” Second, I realise that I see EA as a well-functioning aggregator for the wisdom of well-calibrated crowds, and want to see something similar to that for AI Safety Movement building.
Now, let me address your individual points, using the quotes you provided:
Quote 1: “This still leaves open questions like “how do you choose those experts”, for example do you do it based on who has the most upvotes on the forum? (I guess not), or what happens if you choose “experts” who are making the AI situation WORSE and they tell you they mainly need to hire people to help them?”
Response 1: I agree that selecting experts is a challenge, but it seems better to survey credible experts than to exclude that evidence from the decision-making process. Also, the challenge of ‘who to treat as an expert’ applies to EA and decision-making in general. We might later think that some experts were not the best to follow, but it still seems better to pay attention to those who seem expert now than to fall back on making decisions based on personal intuition.
Quote 2: And if you pick experts correctly, then once you talk to one or several of these experts, you might discover a bottleneck that is not at all in having a shared language, but is “they need a LaTeX editor” or “someone needs to brainstorm how to find Nobel Prize winners to work on AI Safety” (I’m just making this up here). My point is, my priors are that they will give surprising answers. [my priors are from user research, and specifically this]. These are my priors for why picking something like “having a shared language” before talking to them is probably not a good idea (though I shared why I think so, so if it doesn’t make sense, totally ignore what I said)
Response 2: I agree that a shared language won’t solve every issue, but uncovering the other issues will itself be valuable for guiding other movement building work. For instance, if we realise we need LaTeX editors more urgently, then I am happy to work on or advocate for that.
Quote 3: “Finding a shared language” pattern matches for me (maybe incorrectly!) to solutions like “let’s make a graph of human knowledge” which almost always fail (and I think when they work they’re unusual). These solutions are.. “far” from the problem. Sorry I’m not so coherent.
Anyway, something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create. Maybe you want a term for “safety washing” as an example [of an example]?
Response 3: Yeah, this makes sense; I realise I haven’t been clear enough. By creating a ‘shared language’, I mainly mean increasing the overlap in how people conceptualize AI Safety movement building and its parts. For instance, if we all shared my understanding, everyone would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contributions and coordination, people helping with operations and communication, and people working on it while doing direct work (e.g., via going to conferences, etc.). This way, when I ask people how they feel about AI Safety movement building, they would all evaluate similar things to me and to each other, rather than drawing on very different private conceptualisations (e.g., that movement building is only about running camps at universities or posting online).
Quote 4: I just spent a few months trying to figure out AI Safety so that I can have some kind of opinion about questions like “who to trust” or “does this research agenda make sense”. This was kind of hard in my experience, but I do think it’s the place to start.
Really, a simple example to keep in mind is that you might be interviewing “experts” who are actively working on things that make the situation worse—this would ruin your entire project. And figuring this out is really hard imo
Response 4: Your approach was, and is, a good starting point for figuring out AI Safety. However, would it have been helpful to know that 85% of researchers recommend research agenda Y or person X? I think it would, in much the same way that I believe knowing GiveWell/80k’s best options for donations/careers, or researcher predictions for AGI, is beneficial for individual decision-making in those realms. I therefore want something similar for AI Safety movement building.
Quote 5: “None of this means ‘we should stop all community building’, but it does point at some annoying complications.”
Response 5: Yes, I agree. To reiterate my earlier point, I think we should address the complications via our own assessment of the situation, but also try to survey and work alongside those who are more expert.
I’ll also just offer a few examples of what I have in mind because you said that it would be helpful:
How we could poll experts: We might survey AI researchers, asking them to predict the outcomes of various research agendas, so that we can assess collective sentiment and add that to the pool of evidence for decision makers (researchers, funders, etc.) to use. A somewhat similar example is this work: Intermediate goals in AI governance survey. (A toy sketch of how such ratings could be aggregated follows these examples.)
Supervision: Future funded AI safety movement building projects could be expected to have one or more expert advisors who reduce the risk of bad outcomes. E.g., X people write about AI safety for the public or act as recruiters, and Y experts who do direct work check the communication to ensure it meets their expectations.
These are just initial ideas that indicate the direction of my thinking, not firm proposals. I have a lot to learn before I have much confidence.
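To make the polling example a little more concrete, here is a minimal sketch in Python (the agenda names, rating scale, and numbers are all hypothetical) of how responses from such a survey could be aggregated into evidence for decision makers:

```python
from statistics import mean, median

# Hypothetical survey data: each expert rates how promising they find each
# research agenda on a 1-7 scale (agenda names and numbers are invented).
responses = {
    "Agenda X": [6, 5, 7, 4, 6],
    "Agenda Y": [3, 2, 4, 3, 5],
}

# Summarise collective sentiment per agenda so it can sit alongside other
# evidence (individual judgement, funder priorities, etc.) rather than
# replace it.
for agenda, ratings in responses.items():
    print(f"{agenda}: mean={mean(ratings):.1f}, median={median(ratings)}, n={len(ratings)}")
```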
Anyway, I hope that some of this was helpful! Would welcome more thoughts and questions but please don’t put yourself under pressure to reply.
I agree that selecting experts is a challenge, but it seems better to survey credible experts than to exclude that evidence from the decision-making process.
I want to point out you didn’t address my intended point of “how to pick experts”. You said you’d survey “credible experts”—who are those? How do you pick them? A more object-level answer would be “by forum karma” (not that I’m saying it’s the best answer, but it is more object-level than saying you’d pick the “credible” ones)
Yonatan:
something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create. Maybe you want a term for “safety washing” as an example [of an example]?
Peter:
would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contributions and coordination, people helping with operations and communication, and people working on it while doing direct work (e.g., via going to conferences, etc.)
Are these examples of things you think might be useful to add to the language of community building, in the way that “safety washing” might be an example of something useful?
If so, it still seems too-meta / too-vague.
Specifically, it fails to address the problem I’m trying to point out of differentiating positive community building from negative community building.
And specifically, I think focusing on a KPI like “increased contributors” is the way that AI Safety community building accidentally becomes net negative.
See my original comment:
I think that having metrics for community building that are not strongly grounded in a “good” theory of change (such as metrics for increasing the number of people in the field in general) has extra risk for that kind of failure mode.
However, would it have been helpful to know that 85% of researchers recommend research agenda Y or person X?
TL;DR: No. (I know this is an annoying unintuitive answer)
I wouldn’t be surprised if 85% of researchers think that it would be a good idea to advance capabilities (or do some research that directly advances capabilities and does not have a “full” safety theory of change), and they’ll give you some reason that sounds very wrong to me. I’m assuming you interview anyone who sees themselves as working on “AI Safety”.
[I don’t actually know if this statistic would be true, but it’s the kind of example of how your survey suggestion might go wrong imo]
Thanks, that’s helpful to know. It’s a surprise to me though! You’re the first person I have discussed this with who didn’t think it would be useful to know which research agendas were more widely supported.
Just to check, would your intuition change if the people being surveyed were only people who had worked at AI organisations, or if you could filter to only see the aggregate ratings from people who you thought were most credible (e.g., these 10 researchers)?
As an aside, I’ll also mention that I think it would be very helpful and interesting if we found that 85% of researchers thought it would be a good idea to advance capabilities (or do some research that directly advances capabilities and does not have a “full” safety theory of change). That would change my mind on a lot of things and probably spark a lot of important debate that wouldn’t otherwise have happened.
Thanks for replying:
I want to point out you didn’t address my intended point of “how to pick experts”. You said you’d survey “credible experts”—who are those? How do you pick them? A more object-level answer would be “by forum karma” (not that I’m saying it’s the best answer, but it is more object-level than saying you’d pick the “credible” ones)
Sorry. Again, these are early ideas, but the credible experts might be people who have published an AI safety paper, received funding to work on AI, and/or worked at an AI organisation, etc. Let me know what you think of that as a sample.
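To make that a little more concrete, here is a toy sketch in Python (the names, fields, and inclusion threshold are all hypothetical) of how criteria like those could be applied to a candidate pool:

```python
# Toy sketch with made-up data: applying the inclusion criteria above
# (published an AI safety paper, received AI-related funding, or worked at
# an AI organisation) to decide who counts as a "credible expert" to survey.
candidates = [
    {"name": "Researcher A", "safety_paper": True, "ai_funding": False, "ai_org": True},
    {"name": "Researcher B", "safety_paper": False, "ai_funding": False, "ai_org": False},
    {"name": "Researcher C", "safety_paper": False, "ai_funding": True, "ai_org": True},
]

def meets_criteria(candidate: dict) -> bool:
    # Anyone meeting at least one criterion is included here; whether one
    # criterion is enough is exactly the kind of choice worth debating.
    return candidate["safety_paper"] or candidate["ai_funding"] or candidate["ai_org"]

survey_pool = [c["name"] for c in candidates if meets_criteria(c)]
print(survey_pool)  # ['Researcher A', 'Researcher C']
```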
Yonatan:
something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create.
Peter:
would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contributions and coordination, people helping with operations and communication, and people working on it while doing direct work (e.g., via going to conferences, etc.)
Are these examples of things you think might be useful to add to the language of community building, in the way that “safety washing” might be an example of something useful?
The bolded terms are broadly examples of things that I want people in the community to conceptualize in similar ways so that we can have better conversations about them (i.e., shared language/understanding). What I mention there and in my posts is just my own understanding, and I’d be happy to revise it or use a better set of shared concepts.
If so, it still seems too-meta / too-vague.
Specifically, it fails to address the problem I’m trying to point out of differentiating positive community building from negative community building.
And specifically, I think focusing on a KPI like “increased contributors” is the way that AI Safety community building accidentally becomes net negative.
See my original comment:
I think that having metrics for community building that are not strongly grounded in a “good” theory of change (such as metrics for increasing the number of people in the field in general) has extra risk for that kind of failure mode.
I agree that the shared language will fail to address the problem of differentiating positive community building from negative community building.
However, I think it is important to have it because we need shared conceptualisations and understanding of key concepts to be able to productively discuss AI safety movement building.
I therefore see it as something that will help us make progress on differentiating what is most likely to be good or bad AI Safety movement building, regardless of whether that happens via 1-1 discussions, surveys of experts, etc.
Maybe it’s useful to draw an analogy to EA? Imagine that someone wanted to understand what sorts of problems people in EA thought were most important so that they could work on and advocate for them. Imagine this is before we had many shared concepts: everyone is talking about doing good in different ways (saving lives, reducing suffering or risk, etc.). This person realises that people seem to have very different understandings of doing good/impact, so they try to develop and introduce some shared conceptualizations, such as cause areas and QALYs. Then they use those concepts to help them explore the community consensus, and use that evidence to make better decisions. That’s sort of what I am trying to do here.
Does that make sense? Does it seem reasonable? Open to more thoughts if you have time and interest.