I find it unfortunate that people aren’t using a common scale for estimating AI risk, which makes it hard to integrate different people’s estimates, or even to figure out who is relatively more optimistic or pessimistic. For example, here’s you (Tobias):
My inside view puts ~90% probability on successful alignment (by which I mean narrow alignment as defined below). Factoring in the views of other thoughtful people, some of which think alignment is far less likely, that number comes down to ~80%.
Robert Wiblin, based on interviews with Nick Bostrom, an anonymous leading professor of computer science, Jaan Tallinn, Jan Leike, Miles Brundage, Nate Soares, Daniel Dewey:
We estimate that the risk of a serious catastrophe caused by machine intelligence within the next 100 years is between 1 and 10%.
Paul Christiano:
I think there is a >1/3 chance that AI will be solidly superhuman within 20 subjective years, and that in those scenarios alignment destroys maybe 20% of the total value of the future.
It seems to me that Robert’s estimate is low relative to your inside view and Paul’s, since you’re both talking about failures of narrow alignment (“intent alignment” in Paul’s current language), while Robert’s “serious catastrophe caused by machine intelligence” seems much broader. But you update towards much higher risk based on “other thoughtful people”, which makes me think that either your “other thoughtful people” or Robert’s interviewees are not representative, or that I’m confused about who is actually more optimistic or pessimistic. Either way, it seems like there’s some very valuable work to be done in coming up with a standard measure of AI risk and clarifying people’s actual opinions.
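To make the comparison concrete, here is a rough sketch of one candidate common scale, the expected fraction of the future’s value lost to AI risk, with the three estimates above plugged in. The conversions are my own simplifying assumptions (e.g. treating any narrow alignment failure or “serious catastrophe” as forfeiting the whole future), so take the outputs as illustrative only:

    # Rough sketch of a common scale: expected fraction of the future's value
    # lost to AI risk. Input numbers come from the quotes above; the conversion
    # choices (e.g. equating a failure with losing the entire future) are
    # simplifying assumptions for illustration, not anyone's actual model.

    # Tobias: ~90% inside-view / ~80% all-things-considered chance of successful
    # narrow alignment; assume (crudely) that failure forfeits all future value.
    tobias_inside = 1 - 0.90          # 0.10
    tobias_all_considered = 1 - 0.80  # 0.20

    # Paul: >1/3 chance of solidly superhuman AI within 20 subjective years,
    # with alignment destroying ~20% of the future's value in those scenarios.
    paul = (1 / 3) * 0.20             # ~0.067

    # Robert (80,000 Hours): 1-10% risk of a serious catastrophe caused by
    # machine intelligence within 100 years; again assume, crudely, that such
    # a catastrophe costs the whole future.
    robert_low, robert_high = 0.01, 0.10

    print(f"Tobias (inside view):           {tobias_inside:.2f}")
    print(f"Tobias (all things considered): {tobias_all_considered:.2f}")
    print(f"Paul:                           {paul:.3f}")
    print(f"Robert:                         {robert_low:.2f}-{robert_high:.2f}")

The point isn’t these particular numbers; it’s that without agreed conversions like these, the estimates aren’t directly comparable in the first place.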
“Between 1 and 10%” also feels surprisingly low to me for general AI-related catastrophes. I at least would have thought that experts are less optimistic than that.
But pending clarification, I wouldn’t put much weight on this estimate, given that the interviews mentioned in the 80k problem area profile you link to seemed to be about informing the entire problem profile rather than this estimate specifically. So it’s not clear, for example, whether Nick Bostrom, the anonymous leading professor of computer science, Jaan Tallinn, Jan Leike, Miles Brundage, Nate Soares, and Daniel Dewey were each actually asked for an all-things-considered estimate of the risk of an AI-related catastrophe.

Good point, I’ll send a message to Robert Wiblin asking for clarification.
Great point – I agree that it would be valuable to have a common scale.
I’m a bit surprised by the 1-10% estimate. This seems very low, especially given that “serious catastrophe caused by machine intelligence” is broader than narrow alignment failure. If we include possibilities like serious value drift as new technologies emerge, or difficult AI-related cooperation and security problems, or economic dynamics riding roughshod over human values, then I’d put much more than 10% (plausibly more than 50%) on something not going well.
Regarding the “other thoughtful people” in my 80% estimate: I think it’s very unclear who exactly one should update towards. What I had in mind is that many EAs who have thought about this appear not to have high confidence in successful narrow alignment (it’s not even clear that the median is >50%), judging from my impressions from interacting with people (which is obviously not a representative sample). My opinion seemed quite contrarian relative to this, which is why I felt I should be less confident than my inside view suggests, although, as you say, it’s quite hard to pin down what people’s opinions actually are.
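To illustrate, purely hypothetically, what that kind of update could look like if treated as a simple linear pooling of my inside view with an assumed peer estimate (the 0.5 peer median and the 0.75/0.25 weights below are made up, chosen only so that the blend lands near my 80%):

    # Hypothetical illustration of the 90% -> 80% update as linear pooling.
    # The peer median (0.5) and the weights (0.75/0.25) are invented for the
    # example; only the 0.90 inside view and the ~0.80 result come from above.
    inside_view = 0.90   # stated inside-view probability of successful alignment
    peer_median = 0.50   # assumed median among "other thoughtful people"
    w_inside = 0.75      # assumed weight placed on my own inside view

    pooled = w_inside * inside_view + (1 - w_inside) * peer_median
    print(f"pooled estimate: {pooled:.2f}")  # 0.80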
On the other hand, one possible interpretation (but not the only one) of the relatively low level of concern for AI risk among the larger AI community and societal elites is that people are quite optimistic that “we’ll know how to cross that bridge once we get to it”.
I’m a bit surprised by the 1-10% estimate. This seems very low, especially given that “serious catastrophe caused by machine intelligence” is broader than narrow alignment failure.
Yeah, it’s also much lower than my inside view, as well as what I thought a group of such interviewees would say. Aside from Lukas’s explanation, I think maybe (1) the interviewees did not want to appear too alarmist (either personally or on behalf of EA as a whole), or (2) they weren’t reporting their inside views but were instead giving their estimates after updating towards others who have much lower risk estimates. Hopefully Robert Wiblin will see my email at some point and chime in with details of how the 1-10% figure was arrived at.