This is just a first impression, but I’m curious about what seems like a crucial point: your beliefs seem to imply extremely high confidence that either general AI won’t happen this century, or that AGI will go ‘well’ by default. I’m very curious what guides your intuition there, or whether there’s some other way that first-pass impression is wrong.
I’m also curious about the analogous arguments as they apply to bio & other plausible x-risks, given what a low overall x-risk credence implies.
The general background worldview that motivates this credence is that predicting the future is very hard, and we have almost no evidence that we can do it well. (Caveat: I don’t think we have great evidence that we can’t do it, either.) When it comes to short-term forecasting, the best strategy is to use reference-class forecasting (‘outside view’ reasoning, often continuing whatever trend has occurred in the past) and to make relatively small adjustments based on inside-view reasoning. In the absence of anything better, I think we should do the same for long-term forecasts too. (Zach Groff is working on a paper making this case in more depth.)
So when I look to predict the next hundred years, say, I think about how the past 100 years have gone (as well as giving consideration to how the last 1,000 years, 10,000 years, etc. have gone). When you ask me how AI will go, as a best guess I continue the centuries-long trend of automation of both physical and intellectual labour. In the particular context of AI, I continue the trend where, within a task or task-category, the jump from significantly sub-human to vastly-greater-than-human performance is rapid (on the order of years), but progress from one category of task to another (e.g. from chess to Go) is rather slow, since different tasks seem to differ from each other by orders of magnitude in how difficult they are to automate. So I expect progress in AI to be gradual.
Then I also expect future AI systems to be narrow rather than general. When I look at the history of technological progress, I almost always see the creation of specific, highly optimised and generally very narrow tools, and very rarely the creation of general-purpose systems like general-purpose factories. And when general-purpose tools are developed, they are worse than narrow tools on any given dimension: a Swiss Army knife is a crappier knife, bottle opener, saw, etc. than any of those tools individually. The development of AI systems to date doesn’t give me any reason to think that AI is different: they’ve been very narrow so far, and when they’ve attempted somewhat more general tasks, like driving a car, progress has been slow and gradual, with major difficulties in dealing with unusual situations.
Finally, I expect the development of any new technology to be safe by default. As an intuition pump: suppose there were some new design of bomb and BAE Systems decided to build it. There were, however, arguments that the new design was unstable, and that if designed badly the bomb would kill everyone in the company, including the designers, the CEO, the board, and all their families. These arguments had been made in the media, and the designers and the company were aware of them. What odds do you put on BAE Systems building the bomb wrong and blowing themselves up? I’d put them very low: certainly less than 1%, and probably less than 0.1%. That would be true even if BAE Systems were in a race with Lockheed Martin to be first to market. People in general really want to avoid dying, so there’s a huge incentive (a willingness-to-pay measured in the trillions of dollars for the USA alone) to ensure that AI doesn’t kill everyone. And when I look at other technological developments, I see society being very risk-averse and almost never taking major risks: a combination of public opinion and regulation means that things go slowly and safely; again, self-driving cars are an example.
For each of these views, I’m very happy to acknowledge that maybe AI is different. And, when we’re talking about what could be the most important event ever, the possibility of some major discontinuity is really worth guarding against. But discontinuity is not my mainline prediction of what will happen.
(Later edit: I worry that the text above might have conveyed the idea that I’m just ignoring the Yudkowsky/Bostrom arguments, which isn’t accurate. Instead, another factor in my change of view was placing less weight on the Y-B arguments because of: (i) finding the arguments that we’ll get discontinuous progress in AI a lot less compelling than I used to (e.g. see here and here); (ii) trying to map the Yudkowsky/Bostrom arguments, which were made before the deep learning paradigm, onto actual progress in machine learning, and finding them hard to fit well. Going into this properly would require a lot more discussion though!)
Finally, I expect the development of any new technology to be safe by default.
The argument you give in this paragraph only makes sense if “safe” is defined as “not killing everyone” or “avoids risks that most people care about”. But what about “safe” as in “not causing differential intellectual progress in a wrong direction, which can lead to increased x-risks in the long run” or “protecting against or at least not causing value drift so that civilization will optimize for the ‘right’ values in the long run, whatever the appropriate meaning of that is”?
If short-term extinction risk (and in general risks that most people care about) is small compared to other kinds of existential risks, it would seem to make sense for longtermists to focus their efforts more on the latter.
(ii) trying to map the Yudkowsky/Bostrom arguments, which were made before the deep learning paradigm, onto actual progress in machine learning, and finding them hard to fit well. Going into this properly would require a lot more discussion though!)
I’d be happy to read more about this point.
If we end up with powerful deep learning models that optimize a given objective extremely well, the main arguments in Superintelligence seem to go through.
(If we end up with powerful deep learning models that do NOT optimize a given objective, it seems to me plausible that x-risks from AI are more severe, rather than less.)
[EDIT: replaced “a specified objective function” with “a given objective”]
Why do his beliefs imply extremely high confidence? Why do the higher estimates from other people not imply that? I’m curious what’s going on here epistemologically.
If you believe “<1% X”, that implies “>99% ¬X”, so you should believe that too. But if you think >99% ¬X seems too confident, then you should modus tollens and moderate your <1% X belief. When other people give e.g. 30% X, that only implies 70% ¬X, which seems more justifiable to me.
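To spell out the arithmetic behind this (purely illustrative; X stands for whatever risk claim is at issue):

$$P(\neg X) = 1 - P(X), \qquad \text{so } P(X) < 0.01 \iff P(\neg X) > 0.99, \qquad \text{while } P(X) = 0.3 \iff P(\neg X) = 0.7.$$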
I use AGI as an example just because, if it happens, it seems more obviously transformative & existential than biorisk, where it’s harder to reason about whether people survive. And because Will’s views seem to diverge quite strongly from average or median predictions in the ML community, not that I’d read all too much into that. Perhaps further, many people in the EA community believe there’s good reason to think those predictions are too conservative if anything, and have arguments for a significant probability of AGI in the next couple of decades, let alone this century.
Since Will’s implied belief is >99% no x-risk this century, this either means AGI won’t happen, or that it has a very high probability of going well (getting or preserving most of the possible value in the future, which seems the most useful definition of existential for EA purposes). That’s at first glance of course; I’m not asking for the whole book, just an intuition for how you get to such high confidence in ¬X, especially when it seems to me there’s some plausible evidence for X.
I disagree with your implicit claim that Will’s views (which I mostly agree with) constitute an extreme degree of confidence. I think it’s a mistake to approach these questions with a 50-50 prior. Instead, we should consider the base rate for “events that are at least as transformative as the industrial revolution”.
That base rate seems pretty low. And that’s not actually what we’re talking about; we’re talking about AGI, a specific future technology. In the absence of further evidence, a prior of <10% on “AGI takeoff this century” seems not unreasonable to me. (You could, of course, believe that there is concrete evidence on AGI that justifies different credences.)
On a different note, I sometimes find the terminology of “no x-risk”, “going well” etc. unhelpful. It seems more useful to me to talk about concrete outcomes and to separate these from normative judgments. For instance, I believe that extinction through AI misalignment is very unlikely. However, I’m quite uncertain whether people in 2019, if you handed them a crystal ball showing what will happen (regarding AI), would generally think that things are “going well”, e.g. because people might disapprove of value drift or influence drift. (The future will plausibly be quite alien to us in many ways.) And finally, in terms of my personal values, the top priority is to avoid risks of astronomical suffering (s-risks), which is another matter altogether. But I wouldn’t equate that with things “going well”, as that’s a normative judgment and I think EA should be as inclusive as possible towards different moral perspectives.
Maybe one source of confusion here is that the word “extreme” can be used either to say that someone’s credence is above (or below) a certain level/number (without any value judgement concerning whether that’s sensible) or to say that it’s implausibly high/low.
One possible conclusion would be to just taboo the word “extreme” in this context.
Agree, tried to add more clarification below. I’ll try to avoid this going forward, maybe unsuccessfully.
Tbh, I mean a bit of both definitions (Will’s views are quite surprising to me, which is why I want to know more), but mostly the former (i.e. stating it’s close to 0% or 100%).
I sometimes find the terminology of “no x-risk”, “going well” etc.
Agree on “going well” being under-defined. I was mostly using that for brevity, but it probably caused more confusion than it’s worth. A definition I might use is “preserves the probability of getting to the best possible futures”, or, even better, increases that probability. Mainly because, from an EA perspective, if we’ve locked in a substantially suboptimal moral situation (even if people are around), we’ve effectively lost most possible value, which I’d call x-risk.
The main point was fairly object-level: Will’s beliefs imply either a near-1% likelihood of AGI in 100 years, or a near-99% likelihood of it “not reducing the probability of the best possible futures”, or some combination like <10% likelihood of AGI in 100 years AND, even if we get it, >90% likelihood of it not negatively influencing the probability of the best possible futures. Any of these sound somewhat implausible to me, so I’m curious about the intuition behind whichever one Will believes.
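To make that decomposition explicit (my own illustrative arithmetic, using the rough numbers above rather than figures Will has given, and treating AI x-risk as requiring AGI):

$$P(\text{AI x-risk this century}) = P(\text{AGI this century}) \times P(\text{most future value lost} \mid \text{AGI}), \qquad \text{e.g. } 0.10 \times 0.10 = 0.01,$$

which is the “<10% AND >90%” combination; holding the product below ~1% forces at least one of the two factors to be very small.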
I think it’s a mistake to approach these questions with a 50-50 prior. Instead, we should consider the base rate for “events that are at least as transformative as the industrial revolution”.
Def agree. Things like this shouldn’t be approached with a 50-50 prior: throw me into another century & I think a <5% likelihood of AGI, the Industrial Revolution, etc. is very reasonable on priors. I just think that probability can shift relatively quickly in response to observations. For the Industrial Revolution, that might be when you’ve already had the agricultural revolution (so a smallish fraction of the population can grow enough food for everyone), you get engines working well & relatively affordably, you have large-scale political stability for a while such that you can interact peacefully with millions of other people, you have proto-capitalism where you can produce/sell things & reasonably expect to make money doing so, etc. At that point, from an inside view, it feels like “we can use machines & spare labor to produce a lot more stuff per person, and we can make lots of money off producing a lot of stuff, so people will start doing that more” is a reasonable position. So those observations would shift me from single digits or less to at least >20% on the Industrial Revolution happening in that century, probably more but discounting for hindsight bias. (I don’t know if this is a useful comparison; I’m just using it since you mentioned it, & it does seem similar in some ways: the base rate is low, but it did eventually happen.)
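As a rough illustration of how quickly that shift works in odds form (my own numbers, chosen only to match the ~5% → >20% move described above):

$$\text{prior odds} = \frac{0.05}{0.95} \approx \frac{1}{19}, \qquad \text{posterior odds} = \frac{0.20}{0.80} = \frac{1}{4}, \qquad \text{required likelihood ratio} \approx \frac{1/4}{1/19} \approx 4.8,$$

i.e. the listed observations together only need to be about five times likelier in a century that goes on to have an industrial revolution than in one that doesn’t.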
For AI, these seem relevant: when you have a plausible physical substrate, have better predictive models of what the brain does (connectionism & its refinements seem plausible & have been fairly successful over the last few decades despite being unpopular initially), start to see how comparably long-evolved mechanisms work & can duplicate some of them, reach super-human performance on some tasks historically considered hard or requiring great intelligence, have that physical substrate reaching scales that seem comparable to the brain, etc.
In any case, this is getting a bit far from my original thought, which was just to ask which of those situations w.r.t. AGI Will believes, & for some intuition as to why.
I’d usually want to modify my definition of “well” to “preserves the probability of getting to the best possible futures AND doesn’t increase the probability of the worst possible futures”, but that’s a bit more verbose.
I agree re value-drift and societal trajectory worries, and do think that work on AI is plausibly a good lever to positively affect them.