It can seem strange that people act decisively about speculative things. So the first piece to understand is expected value: if something would be extremely important if it happened, then you can place quite low probability on it and still have warrant to act on it. (This is sometimes accused of being a decision-theory “mugging”, but it isn’t: we’re talking about subjective probabilities in the range of 1%–10%, not infinitesimals like those involved in Pascal’s mugging.)
I think the most defensible outside-view argument is: it could happen soon; it could be dangerous; aligning it could be very hard; and the product of these probabilities is not low enough to ignore. (A toy version of this product is sketched after the list below.)
1. When you survey general AI experts (not just safety or AGI people), they give a very wide distribution of predictions for when we will have human-level AI (HLAI), with a central tendency around a “10% chance of human-level AI… in the 2020s or 2030s”. (This is weak evidence, since technology forecasting is very hard and these surveys are not random samples, but it is still some evidence.)
2. We don’t know what the risk of HLAI being dangerous is, but we have some analogous precedents and suggestive arguments:
* the human precedent for world domination through intelligence / combinatorial generalisation / cunning
* the human precedent for ‘inner optimisers’: evolution was heavily optimising for genetic fitness, but produced a system, us, which optimises for a very different objective (“fun”, “status”, “gratification”, or some bundle of non-fitness things).
* goal space is much larger than the human-friendly part of goal space (suggesting that a random objective will not be human-friendly, which, combined with assumptions about goal maximisation and instrumental drives, implies that most goals could be dangerous).
* there’s a common phenomenon of very stupid ML systems still developing “clever” unintended / hacky / dangerous behaviours
3. We don’t know how hard alignment is, so we don’t know how long it will take to solve. It may involve certain profound philosophical and mathematical questions, which have been worked on by some of the greatest thinkers for a long time. Here’s a nice nontechnical statement of the potential difficulty. Some AI safety researchers are actually quite optimistic about our prospects for solving alignment, even without EA intervention, and work on it to cover things like the “value lock-in” case instead of the x-risk case.
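To make the shape of that argument concrete, here is a toy calculation of the product mentioned at the top. Every number in it is an illustrative assumption chosen for the example, not an estimate from the surveys or arguments above; the point is only that moderate, non-tiny probabilities multiply to something well above the level where expected-value reasoning says to ignore it.

```python
# Toy sketch of the "product of probabilities" argument.
# All three numbers are illustrative assumptions, not claimed estimates.

p_soon = 0.10       # assumed: human-level AI arrives relatively soon
p_dangerous = 0.30  # assumed: conditional on that, it is dangerous by default
p_unsolved = 0.50   # assumed: conditional on that, alignment isn't solved in time

p_catastrophe = p_soon * p_dangerous * p_unsolved
print(f"Illustrative risk: {p_catastrophe:.1%}")  # 1.5%

# A ~1-2% chance of an extremely bad outcome is nothing like the
# infinitesimal probabilities in a Pascal's mugging, so expected-value
# reasoning can still recommend acting on it.
```

Of course the conclusion scales with whichever numbers you plug in; the argument only goes through if each factor stays in the “not low enough to ignore” range.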
Just wanted to note that while I am quoted as being optimistic, I am still working on it specifically to cover the x-risk case and not the value lock-in case. (But certainly some people are working on the value lock-in case.)
(Also I think several people would disagree that I am optimistic, and would instead think I’m too pessimistic, e.g. I get the sense that I would be on the pessimistic side at FHI.)
Also, for posterity, there’s some interesting discussion of that interview with Rohin here.
And some other takes on “Why AI risk might be solved without additional intervention from longtermists” are summarised, and then discussed in the comments, here.
But very much in line with technicalities’ comment, it’s of course totally possible to believe that AI risk will probably be solved without additional intervention from longtermists, and yet still think that serious effort should go into raising that probability further.
Great quote from The Precipice on that general idea, in the context of nuclear weapons:
In 1939, Enrico Fermi told Szilard the chain reaction was but a ‘remote possibility’ [...]
Fermi was asked to clarify the ‘remote possibility’ and ventured ‘ten percent’. Isidor Rabi, who was also present, replied, ‘Ten percent is not a remote possibility if it means that we may die of it. If I have pneumonia and the doctor tells me that there is a remote possibility that I might die, and it’s ten percent, I get excited about it.’