I appreciate that a ton of work went into this, and the results are interesting. That said, I am skeptical of the value of surveys with low response rates (in this case, 15%), especially when those surveys are likely subject to non-response bias, as I suspect this one is, given: (1) many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey and (2) those researchers would likely have answered the questions on the survey differently. (I do appreciate that the authors took steps to mitigate the risk of non-response bias at the survey level, and did not find evidence of this at the question level.)
I don’t find the “expert surveys tend to have low response rates” defense particularly compelling, given: (1) the loaded nature of the content of the survey (meaning bias is especially likely), (2) the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise), (3) the fact that expert surveys often do have higher response rates (26% is a lot higher than 15%), especially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well, and (4) the possibility that many expert surveys just aren’t very useful.
Given the non-response bias issue, I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey. I recognize that the survey may have value independent of its knowledge value—for instance, I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely. That said, I don’t think we should make a habit of citing work that has methodological issues simply because such results may be compelling to people who won’t dig into them.
Given my aforementioned concerns, I wonder whether the cost of this survey can be justified (am I calculating correctly that $138,000 was spent just compensating participants for taking this survey, and that doesn’t include other costs, like those associated with using the outside firm to compensate participants, researchers’ time, etc?). In light of my concerns about cost and non-response bias, I am wondering whether a better approach would instead be to randomly sample a subset of potential respondents (say, 4,000 people), and offer to compensate them at a much higher rate (e.g., $100), given this strategy could both reduce costs and improve response rates.
Quantitatively how large do you think the non-response bias might be? Do you have some experience or evidence in this area that would help estimate the effect size? I don’t have much to go on, so I’d definitely welcome pointers.
Let’s consider the 40% of people who put a 10% probability on extinction or similarly bad outcomes (which seems like what you are focusing on). Perhaps you are worried about something like: researchers concerned about risk might be 3x more likely to answer the survey than those who aren’t concerned about risk, and so in fact only 20% of people assign a 10% probability, not the 40% suggested by the survey.
Changing from 40% to 20% would be a significant revision of the results, but honestly that’s probably comparable to other sources of error and I’m not sure you should be trying to make that precise an inference.
But more importantly a 3x selection effect seems implausibly large to me. The survey was presented as being about “progress in AI” and there’s not an obvious mechanism for huge selection effects on these questions. I haven’t seen literature that would help estimate the effect size, but based on a general sense of correlation sizes in other domains I’d be pretty surprised by getting a 3x or even 2x selection effect based on this kind of indirect association. (A 2x effect on response rate based on views about risks seems to imply a very serious piranha problem.)
The largest demographic selection effects were that some groups (e.g. academia vs industry, junior vs senior authors) were about 1.5x more likely to fill out the survey. Those small selection effects seem more like what I’d expect and are around where I’d set the prior (so: 40% being concerned might really be 30% or 50%).
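To spell out the inversion behind these numbers, here is a toy response model (my own construction, not part of the survey’s methodology): suppose the “concerned” group responds at some factor times the rate of everyone else; then the observed fraction of concerned respondents can be inverted to recover the true population fraction.

```python
def true_fraction(observed, factor):
    """Invert a simple two-group response model.

    The 'concerned' group is `factor` times as likely to respond as
    everyone else, so the observed fraction satisfies
        observed = factor * p / (factor * p + (1 - p)),
    which solves to the true population fraction p below.
    """
    return observed / (factor + observed * (1 - factor))

# A 3x selection effect would mean an observed 40% corresponds to a
# true fraction of roughly 18% -- about the halving described above.
print(round(true_fraction(0.40, 3.0), 2))      # 0.18

# A 1.5x effect in either direction gives roughly the 30%-50% range.
print(round(true_fraction(0.40, 1.5), 2))      # 0.31
print(round(true_fraction(0.40, 1 / 1.5), 2))  # 0.5
```

This matches the figures in the thread: a 3x effect roughly halves 40%, while 1.5x effects in either direction bound it between about 30% and 50%.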
many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey … the loaded nature of the content of the survey (meaning bias is especially likely),
I think the survey was described as about “progress in AI” (and mostly concerned progress in AI), and this seems like all people saw when deciding to take it. Once people started taking the survey it looks like there was negligible non-response at the question level. You can see the first page of the survey here, which I assume is representative of what people saw when deciding to take the survey.
I’m not sure if this was just a misunderstanding of the way the survey was framed. Or perhaps you think people have seen reporting on the survey in previous years and are aware that the question on risks attracted a lot of public attention, and therefore are much more likely to fill out the survey if they think risk is large? (But I think the mechanism and sign here are kind of unclear.)
especially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well
If compensation is a significant part of why participants take the survey, then I think it lowers the scope for selection bias based on views (though increases the chances that e.g. academics or junior employees are more likely to respond).
I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely
I think it’s dishonest to cite work that you think doesn’t provide evidence. That’s even more true if you think readers won’t review the citations for themselves. In my view the 15% response rate doesn’t undermine the bottom line conclusions very seriously, but if your views about non-response mean the survey isn’t evidence then I think you definitely shouldn’t cite it.
the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise),
I think the goal was to survey researchers in machine learning, and so it was sent to researchers who publish in the top venues in machine learning. I don’t think “expert” was meant to imply that these respondents had e.g. some kind of particular expertise about risk. In fact the preprint emphasizes that very few of the respondents have thought at length about the long-term impacts of AI.
Given my aforementioned concerns, I wonder whether the cost of this survey can be justified
I think it can easily be justified. This survey covers a set of extremely important questions, where policy decisions have trillions of dollars of value at stake and the views of the community of experts are frequently cited in policy discussions.
You didn’t make your concerns about selection bias quantitative, but I’m skeptical about quantitatively how much they decrease the value of information. And even if we think non-response is fatal for some purposes, it doesn’t interfere as much with comparisons across questions (e.g. what tasks do people expect to be accomplished sooner or later, what risks do they take more or less seriously) or for observing how the views of the community change with time.
I think there are many ways in which the survey could be improved, and it would be worth spending additional labor to make those improvements. I agree that sending a survey to a smaller group of recipients with larger compensation could be a good way to measure the effects of non-response bias (and might be more respectful of the research community’s time).
I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey
I think the main takeaway w.r.t. risk is that typical researchers in ML (like most of the public) have not thought about impacts of AI very seriously but their intuitive reaction is that a range of negative outcomes are plausible. They are particularly concerned about some impacts (like misinformation), particularly unconcerned about others (like loss of meaning), and are more ambivalent about others (like loss of control).
I think this kind of “haven’t thought about it” is a much larger complication for interpreting the results of the survey, although I think it’s fine as long as you bear it in mind. (I think ML researchers who have thought about the issue in detail tend if anything to be somewhat more concerned than the survey respondents.)
many AI researchers just don’t seem too concerned about the risks posed by AI
My impressions of academic opinion have been broadly consistent with these survey results. I agree there is large variation and that many AI researchers are extremely skeptical about risk.
I really appreciate your and @Katja_Grace’s thoughtful responses, and wish more of this discussion had made it into the manuscript. (This is a minor thing, but I also didn’t love that the response rate/related concerns were introduced on page 20 [right?], since it’s standard practice—at least in my area—to include a response rate up front, if not in the abstract.) I wish I had more time to respond to the many reasonable points you’ve raised, and will try to come back to this in the next few days if I do have time, but I’ve written up a few thoughts here.
many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey
Note that we didn’t tell them the topic that specifically.
I am wondering whether a better approach would instead be to randomly sample a subset of potential respondents (say, 4,000 people), and offer to compensate them at a much higher rate (e.g., $100).
Tried sending them $100 last year and if anything it lowered the response rate.
If you are inclined to dismiss this based on your premise “many AI researchers just don’t seem too concerned about the risks posed by AI”, I’m curious where you get that view from, and why you think it is a less biased source.
Note that we didn’t tell them the topic that specifically.
I understand that, and think this was the right call. But there seems to be consensus that in general, a response rate below ~70% introduces concerns of non-response bias, and when you’re at 15%—with (imo) good reason to think there would be non-response bias—you really cannot rule this out. (Even basic stuff like: responders probably earn less money than non-responders, and are thus probably younger, work in academia rather than industry, etc.; responders are more likely to be familiar with the prior AI Impacts survey, and all that that entails; and so on.) In short, there is a reason many medical journals have a policy of not publishing surveys with response rates below 60%; e.g., JAMA asks for >60%, less prestigious JAMA journals also ask for >60%, and BMJ asks for >65%. (I cite medical journals because their policies are the ones I’m most familiar with, not because I think there’s something special about medical journals.)
Tried sending them $100 last year and if anything it lowered the response rate.
I find it a bit hard to believe that this lowered response rates (was this statistically significant?), although I would buy that it didn’t increase response rates much, since I seem to remember reading that the effect of compensation on response rates falls off pretty quickly as the amount increases. I also appreciate that you’re studying a high-earning group of experts, making it difficult to incentivize participation. That said, my reaction to this is: determine what the higher-order goals of this kind of project are, and adopt a methodology that aligns with them. I have a hard time believing that at this price point, conducting a survey with a 15% response rate is the optimal methodology.
If you are inclined to dismiss this based on your premise “many AI researchers just don’t seem too concerned about the risks posed by AI”, I’m curious where you get that view from, and why you think it is a less biased source.
My impression stems from conversations I’ve had with two CS professor friends about how concerned the CS community is about the risks posed by AI. For instance, last week, I was discussing the last AI Impacts survey with a CS professor (who has conducted surveys, as have I); I was defending the survey, and they were criticizing it for reasons similar to those outlined above. They said something to the effect of: the AI Impacts survey results do not align with my impression of people’s level of concern based on discussions I’ve had with friends and colleagues in the field. And I took that seriously, because this friend is EA-adjacent; extremely competent, careful, and trustworthy; and themselves sympathetic to concerns about AI risk. (I recognize I’m not giving you enough information for this to be at all worth updating on for you, but I’m just trying to give some context for my own skepticism, since you asked.)
Lastly, as someone immersed in the EA community myself, I think my bias is—if anything—in the direction of wanting to believe these results, but I just don’t think I should update much based on a survey with such a low response rate.
I think this is going to be my last word on the issue, since I suspect we’d need to delve more deeply into the literature on non-response bias/response rates to progress this discussion, and I don’t really have time to do that, but if you/others want to, I would definitely be eager to learn more.
Just to give your final point some context: the average in-depth research project by Rethink Priorities reportedly costs $70K-$100K. So, if this AI Impacts survey cost $138K in participant compensation, plus some additional amount for things like researcher time, then it looks like this survey was two or three times more expensive than the average research project in its approximate reference class.
I haven’t thought hard about whether the costs of EA-funded research make sense in general, but I thought I’d leave this comment so that readers don’t go away thinking that this survey cost like an order of magnitude more than what’s standard.
I am wondering whether a better approach would instead be to randomly sample a subset of potential respondents (say, 4,000 people), and offer to compensate them at a much higher rate (e.g., $100), given this strategy could both reduce costs and improve response rates.
Note that 4,000 * $100 is $400,000 which is higher than the cost you cited above.
FWIW, both of these costs seem pretty small to me.
No, because the response rate wouldn’t be 100%; even if it doubled to 30% (which I doubt it would), the cost would still be lower ($120k).
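Concretely, here is the arithmetic reconciling the $400k and $120k figures (a sketch assuming, as proposed, that only actual respondents are compensated; all figures are from the discussion above):

```python
# Rough cost arithmetic for the proposed smaller-sample design,
# paying only those who actually respond.
sample_size = 4_000
payment = 100           # dollars per respondent
survey_cost = 138_000   # reported compensation for the survey as run

for response_rate in (0.15, 0.30, 1.00):
    cost = sample_size * payment * response_rate
    relation = "below" if cost < survey_cost else "above"
    print(f"{response_rate:.0%} response: ${cost:,.0f} ({relation} actual)")
```

The $400,000 figure assumes a 100% response rate; at any plausible response rate (even a doubled 30%), the proposed design costs less than the $138,000 actually spent.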