Some people think community building efforts around AI Safety are net negative; see, for example, the post “Shutting Down the Lightcone Offices”.
I’m not saying they’re right (it seems complicated in a way I don’t know how to solve), but I do think they’re pointing at a real failure mode.
I think that having metrics for community building that are not strongly grounded in a “good” theory of change (such as metrics for increasing the number of people in the field in general) carries extra risk of that kind of failure mode.
Excuse me if I misunderstood what you’re saying—I saw you specifically wanted comments and don’t have any yet, so I’m erring on the side of sharing my first (maybe wrong) thoughts.
Hey Yonatan, thanks for replying, I really appreciate it! Here is a quick response.
I read the comments by Oliver and Ben in “Shutting Down the Lightcone Offices”.
I think that they have very valid concerns about AI Safety Movement Building (pretty sure I linked this piece in my article).
However, I don’t think that the optimal response to such concerns is to stop trying to understand and improve how we do AI Safety Movement building. That seems premature given current evidence.
Instead, I think that the best response here (and everywhere else there is criticism) is to proactively try to understand and address the concerns expressed (if possible).
To expand on and link into what I discuss in my top-level post: when I, a movement builder, read the link above, I think something like this: Oliver/Ben are smarter than I am and more knowledgeable about the AI safety community and its needs. I should therefore be more concerned than I was about the risks of AI Safety movement building.
On the other hand, lots of other people who are similarly smart and knowledgeable are in favour of AI Safety movement building of various types. Maybe Oliver and Ben hold a minority view?
I wonder: Do Oliver/Ben have the same conception of movement building as me or as many of the other people I have talked to? I imagine that they are thinking about the types of movement building which involve largely unsupervised recruitment whereas I am thinking about a wide range of things. Some of these things involve no recruitment (e.g., working on increasing contributions and coordination via resource synthesis), and all are ideally done under the supervision of relevant experts. I doubt that Oliver and Ben think that all types of movement building are bad (probably not given that they work as movement builders).
So all in all, I am not really sure what to do.
This brings me to some of what I am trying to do at the moment, as per the top level post: trying to create, then hopefully use, some sort of shared language to better understand what relevant people think is good/bad AI Safety Movement building, and why, so that I can hopefully make better decisions.
As part of this, I am hoping to persuade people like Oliver/Ben to (i) read something like what I wrote above (so that they understand what I mean by movement building) and then (ii) participate in various survey/discussion activities that will help me and others to understand what sort of movement building activities they are for and against, and why they feel as they do about these options.
Then, when I know all that, I will hopefully have a much improved and more nuanced understanding of who thinks what and why (e.g., that 75% of respondents want more ML engineers with skill X, or think that a scaled-up SERI MATS project in Taiwan would be valuable, or have these contrasting intuitions about a particular option).
I can use that understanding to guide decisions about if/how to do movement building as effectively as possible.
Is that response helpful? Does my plan sound like a bad idea or very unlikely to succeed? Let me know if you have any further questions or thoughts!
Edit: I just wrote this, it’s ~1:30am here, I’m super tired, and I think this was incoherent. Please be extra picky with what you take from my message; if something doesn’t make sense then it’s me, not you. I’m still leaving the comment because it sounds like you really want comments.
—
Hey,
TL;DR: This sounds too meta, I don’t think I understand many important points of your plan, and I think examples would help.
Your plan involves trusting experts, or deferring to them, or polling them, or having them supervise your work, or other things.
1)
This still leaves open questions like “how do you choose those experts”. For example, do you do it based on who has the most upvotes on the forum? (I guess not.) Or what happens if you choose “experts” who are making the AI situation WORSE and they tell you they mainly need to hire people to help them?
2)
And if you pick experts correctly, then once you talk to one or several of these experts, you might discover a bottleneck that is not at all in having a shared language, but is “they need a LaTeX editor” or “someone needs to brainstorm how to find Nobel Prize winners to work on AI Safety” (I’m just making this up here). My point is, my priors are that they will give surprising answers. [My priors are from user research, and specifically this.] These are my priors for why picking something like “having a shared language” before talking to them is probably not a good idea (though I shared why I think so, so if it doesn’t make sense, totally ignore what I said).
3)
Too meta:
“Finding a shared language” pattern matches for me (maybe incorrectly!) to solutions like “let’s make a graph of human knowledge”, which almost always fail (and I think when they work they’re unusual). These solutions are... “far” from the problem. Sorry I’m not so coherent.
Anyway, something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create. Maybe you want a term for “safety washing” as an example [of an example]?
4)
Sharing from myself:
I just spent a few months trying to figure out AI Safety so that I can have some kind of opinion about questions like “who to trust” or “does this research agenda make sense”. This was kind of hard in my experience, but I do think it’s the place to start.
Really, a simple example to keep in mind is that you might be interviewing “experts” who are actively working on things that make the situation worse—this would ruin your entire project. And figuring this out is really hard imo
5)
None of this means “we should stop all community building”, but it does point at some annoying complications
I want to add one more thing: this whole situation, where a large number of possible seemingly-useful actions turn out to be net negative, is SUPER ANNOYING imo. It is absolutely not anything against you, I wish it wasn’t this way, etc.
Ah, and also:
I wish many others would consult about their ideas in public as you’ve done here, and you have my personal appreciation for that, fwiw.
Hi Yonatan, thank you for this! I understand, and I’m not taking any of this personally; I’m enjoying the experience of getting feedback. Your comment is definitely readable and helpful. It highlights gaps in my communication and pushes me to think more deeply and explain my ideas better.
I’ve gained two main insights. First, I should be clearer about what I mean when I use terms like “shared language.” Second, I realise that I see EA as a well-functioning aggregator for the wisdom of well-calibrated crowds, and want to see something similar to that for AI Safety Movement building.
Now, let me address your individual points, using the quotes you provided:
Quote 1: “This still leaves open questions like ‘how do you choose those experts’. For example, do you do it based on who has the most upvotes on the forum? (I guess not.) Or what happens if you choose ‘experts’ who are making the AI situation WORSE and they tell you they mainly need to hire people to help them?”
Response 1: I agree that selecting experts is a challenge, but it seems better to survey credible experts than to exclude that evidence from the decision-making process. Also, the challenge of ‘who to treat as an expert’ applies to EA and decision-making in general. We might later think that some experts were not the best to follow, but it still seems better to pay attention to those who seem expert now, as opposed to the alternative of making decisions based on personal intuitions.
Quote 2: And if you pick experts correctly, then once you talk to one or several of these experts, you might discover a bottleneck that is not at all in having a shared language, but is “they need a LaTeX editor” or “someone needs to brainstorm how to find Nobel Prize winners to work on AI Safety” (I’m just making this up here). My point is, my priors are that they will give surprising answers. [My priors are from user research, and specifically this.] These are my priors for why picking something like “having a shared language” before talking to them is probably not a good idea (though I shared why I think so, so if it doesn’t make sense, totally ignore what I said).
Response 2: I agree—a shared language won’t solve every issue, but uncovering the new issues will actually be valuable for guiding other movement building work. For instance, if we realise we need LaTeX editors more urgently, then I am happy to work on/advocate for that.
Quote 3: “‘Finding a shared language’ pattern matches for me (maybe incorrectly!) to solutions like ‘let’s make a graph of human knowledge’, which almost always fail (and I think when they work they’re unusual). These solutions are... ‘far’ from the problem. Sorry I’m not so coherent.
Anyway, something that might change my mind very quickly is if you’ll give me examples of what ‘language’ you might want to create. Maybe you want a term for ‘safety washing’ as an example [of an example]?”
Response 3: Yeah, this makes sense—I realise I haven’t been clear enough. By creating a ‘shared language,’ I mainly mean increasing the overlap in how people conceptualize AI Safety movement building and its parts. For instance, if we all shared my understanding, everyone would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contribution, and coordination; people helping with operations and communication; and people working on it while doing direct work (e.g., by going to conferences etc.). This way, when I ask people how they feel about AI Safety Movement building, they would all evaluate similar things to me and to each other, rather than very different private conceptualisations (e.g., that movement building is only about running camps at universities or posting online).
Quote 4: I just spent a few months trying to figure out AI Safety so that I can have some kind of opinion about questions like “who to trust” or “does this research agenda make sense”. This was kind of hard in my experience, but I do think it’s the place to start.
Really, a simple example to keep in mind is that you might be interviewing “experts” who are actively working on things that make the situation worse—this would ruin your entire project. And figuring this out is really hard imo
Response 4: Your approach was/is a good starting point for figuring out AI Safety. However, would it have been helpful to know that 85% of researchers recommend research agenda Y or person X? I think it would, in the same way. I base this on believing, for instance, that knowing GiveWell/80k’s best options for donations/careers, or researcher predictions for AGI, is beneficial for individual decision-making in those realms. I therefore want something similar for AI Safety Movement building.
Quote 5: “None of this means ‘we should stop all community building’, but it does point at some annoying complications.”
Response 5: Yes, I agree. To reiterate my earlier point, I think that we should address the complications via self-assessment of the situation, but that we should also try to survey and work alongside those who are more expert.
I’ll also just offer a few examples of what I have in mind because you said that it would be helpful:
How we could poll experts: We might survey AI researchers, asking them to predict the outcomes of various research agendas, so we can assess collective sentiment and add that to the pool of evidence for decision makers (researchers, funders etc.) to use. A somewhat similar example is this work: Intermediate goals in AI governance survey.
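To make the polling idea a bit more concrete, here is a minimal sketch (in Python) of how ratings from such a survey might be aggregated per research agenda. This is only an illustration, not something I have built: the field names, the toy data, and the single “has published safety work” credibility proxy are all made up, and a real survey would need far more care in sampling, question design, and weighting.

```python
# A minimal, purely hypothetical sketch of how responses from an
# "expert poll on research agendas" might be aggregated. All field
# names and data are invented for illustration only.
from dataclasses import dataclass
from collections import defaultdict
from statistics import mean

@dataclass
class Response:
    respondent: str
    has_published_safety_work: bool  # one possible (debatable) credibility proxy
    agenda: str                      # e.g., "interpretability", "agent foundations"
    expected_value: int              # 1 (net negative) .. 5 (very valuable)

def summarise(responses, credible_only=False):
    """Group ratings by agenda and report the mean rating and sample size."""
    by_agenda = defaultdict(list)
    for r in responses:
        if credible_only and not r.has_published_safety_work:
            continue
        by_agenda[r.agenda].append(r.expected_value)
    return {
        agenda: {"mean_rating": round(mean(vals), 2), "n": len(vals)}
        for agenda, vals in by_agenda.items()
    }

if __name__ == "__main__":
    toy_data = [
        Response("A", True, "interpretability", 5),
        Response("B", True, "interpretability", 4),
        Response("C", False, "interpretability", 2),
        Response("D", True, "agent foundations", 3),
    ]
    print(summarise(toy_data))                      # everyone
    print(summarise(toy_data, credible_only=True))  # filtered view
```

The credible_only toggle is only there to show the idea of filtering to a subset of respondents; deciding who counts as credible is, of course, the hard question you raised.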
Supervision: Future funded AI safety movement building projects could be expected to have one or more expert advisors who reduce the risk of bad outcomes. E.g., X people write about AI safety for the public or act as recruiters etc., and Y experts who do direct work check the communications to ensure they meet their expectations.
These are just initial ideas that indicate the direction of my thinking, not necessarily what I expect. I have a lot to learn before I have much confidence.
Anyway, I hope that some of this was helpful! Would welcome more thoughts and questions but please don’t put yourself under pressure to reply.
I agree that selecting experts is a challenge, but it seems better to survey credible experts than to exclude that evidence from the decision-making process.
I want to point out you didn’t address my intended point of “how to pick experts”. You said you’d survey “credible experts”—who are those? How do you pick them? A more object-level answer would be “by forum karma” (not that I’m saying it’s the best answer, but it is more object-level than saying you’d pick the “credible” ones)
Yonatan:
something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create. Maybe you want a term for “safety washing” as an example [of an example]?
Peter:
would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contribution, and coordination; people helping with operations and communication; and people working on it while doing direct work (e.g., by going to conferences etc.)
Are these examples of things you think might be useful to add to the language of community building, in the way that “safety washing” might be an example of something useful?
If so, it still seems too-meta / too-vague.
Specifically, it fails to address the problem I’m trying to point out of differentiating positive community building from negative community building.
And specifically, I think focusing on a KPI like “increased contributors” is the way that AI Safety community building accidentally becomes net negative.
See my original comment:
I think that having metrics for community building that are not strongly grounded in a “good” theory of change (such as metrics for increasing the number of people in the field in general) carries extra risk of that kind of failure mode.
However, would it have been helpful to know that 85% of researchers recommend research agenda Y or person X?
TL;DR: No. (I know this is an annoying unintuitive answer)
I wouldn’t be surprised if 85% of researchers think that it would be a good idea to advance capabilities (or do some research that directly advances capabilities and does not have a “full” safety theory of change), and they’ll give you some reason that sounds very wrong to me. I’m assuming you interview anyone who sees themselves as working on “AI Safety”.
[I don’t actually know if this statistic would be true, but it’s one example of how your survey suggestion might go wrong imo]
Thanks, that’s helpful to know. It’s a surprise to me though! You’re the first person I have discussed this with who didn’t think it would be useful to know which research agendas were more widely supported.
Just to check, would your intuition change if the people being surveyed were only people who had worked at AI organisations, or if you could filter to only see the aggregate ratings from people who you thought were most credible (e.g., these 10 researchers)?
As an aside, I’ll also mention that I think it would be a very helpful and interesting finding if we found that 85% of researchers thought that it would be a good idea to advance capabilities (or do some research that directly advances capabilities and does not have a “full” safety theory of change). That would make me change my mind on a lot of things and probably spark a lot of important debate that probably wouldn’t otherwise have happened.
I want to point out you didn’t address my intended point of “how to pick experts”. You said you’d survey “credible experts”—who are those? How do you pick them? A more object-level answer would be “by forum karma” (not that I’m saying it’s the best answer, but it is more object-level than saying you’d pick the “credible” ones)
Thanks for replying. Sorry; again, these are early ideas, but the credible experts might be people who have published an AI safety paper, received funding to work on AI, and/or worked at a relevant organisation, etc. Let me know what you think of that as a sample.
Yonatan:
something that might change my mind very quickly is if you’ll give me examples of what “language” you might want to create.
Peter:
would conceptualize movement building in a broad sense, e.g., as something involving increased contributors, contribution, and coordination; people helping with operations and communication; and people working on it while doing direct work (e.g., by going to conferences etc.)
Are these examples of things you think might be useful to add to the language of community building, in the way that “safety washing” might be an example of something useful?
The terms in that quote (e.g., contributors, contribution, coordination) are broadly examples of things that I want people in the community to conceptualize in similar ways so that we can have better conversations about them (i.e., shared language/understanding). What I mention there and in my posts is just my own understanding, and I’d be happy to revise it or use a better set of shared concepts.
If so, it still seems too-meta / too-vague.
Specifically, it fails to address the problem I’m trying to point out of differentiating positive community building from negative community building.
And specifically, I think focusing on a KPI like “increased contributors” is the way that AI Safety community building accidentally becomes net negative.
See my original comment:
I think that having metrics for community building that are not strongly grounded in a “good” theory of change (such as metrics for increasing the number of people in the field in general) carries extra risk of that kind of failure mode.
I agree that the shared language will fail to address the problem of differentiating positive community building from negative community building.
However, I think it is important to have it because we need shared conceptualisations and understanding of key concepts to be able to productively discuss AI safety movement building.
I therefore see it as something that will be helpful for making progress on differentiating what is most likely to be good or bad AI Safety Movement building, regardless of whether that happens via 1-1 discussions, surveys of experts, etc.
Maybe it’s useful to draw an analogy to EA? Imagine that someone wanted to understand what sort of problems people in EA thought were most important so that they could work on and advocate for them. Imagine this is before we had many shared concepts: everyone is talking about doing good in different ways (saving lives, reducing suffering or risk, etc.). This person realises that people seem to have very different understandings of doing good/impact etc., so they try to develop and introduce some shared conceptualizations like cause areas and QALYs. Then they use those concepts to help them explore the community consensus and use that evidence to help them make better decisions. That’s sort of what I am trying to do here.
Does that make sense? Does it seem reasonable? Open to more thoughts if you have time and interest.