Lots! Treat all of the following as ‘things Will casually said in conversation’ rather than ‘Will is dying on this hill’ (I’m worried about how messages travel and transmogrify, and I wouldn’t be surprised if I changed lots of these views again in the near future!). But some things include:
I think existential risk this century is much lower than I used to think — I used to put total risk this century at something like 20%; now I’d put it at less than 1%.
I find ‘takeoff’ scenarios from AI over the next century much less likely than I used to. (Fast takeoff in particular, but even the idea of any sort of ‘takeoff’, understood in terms of moving to a higher growth mode, rather than progress in AI just continuing existing two-century-long trends in automation.) I’m not sure what numbers I’d have put on this previously, but I’d now put medium and fast takeoff (e.g. that in the next century we have a doubling of global GDP in a 6 month period because of progress in AI) at less than 10%.
In general, I think it’s much less likely that we’re at a super-influential time in history; my next blog post will be about this idea.
I’m much more worried about a great power war in my lifetime than I was a couple of years ago. (Because of thinking about the base rate of war, not because of recent events.)
I find (non-extinction) trajectory change more compelling as a way of influencing the long-run future than I used to.
I’m much more sceptical about our current level of understanding of how to influence the long-run future than I was before, and think it’s more likely than I did before that EAs in 50 years will think that EAs of today were badly mistaken.
I’m more interested than I was in getting other people’s incentives right with respect to long-run outcomes, as compared to just trying to aim for good long-run outcomes directly. So for example, I’m more interested in institutional changes than I was, including intergovernmental institutions and design of world government, and space law.
I’m much more sympathetic to the idea of giving later (and potentially much later) than I was before.
On the more philosophical end:
I’m no longer convinced of naturalism as a metaphysical view, where by ‘naturalism’ I mean the view that everything that exists exists in space-time. (So now, e.g., I think that numbers and properties exist, and I no longer see what supports the idea that everything that exists must be spatio-temporal).
I haven’t really worked it through, but I probably have a pretty different take on the right theoretical approach to moral uncertainty than I used to have. (This would take a while to explain, and wouldn’t have major practical implications, but it’s different than the broad view I defend in the book and my PhD.)
This is just a first impression, but I’m curious about what seems a crucial point—that your beliefs seem to imply extremely high confidence that either general AI won’t happen this century, or that AGI will go ‘well’ by default. I’m very curious to see what guides your intuition there, or if there’s some other way that first-pass impression is wrong.
I’m curious about similar arguments applying to bio and other plausible x-risks too, given what’s implied by a low overall x-risk credence.
The general background worldview that motivates this credence is that predicting the future is very hard, and we have almost no evidence that we can do it well. (Caveat: I don’t think we have great evidence that we can’t do it either, though.) When it comes to short-term forecasting, the best strategy is to use reference-class forecasting (‘outside view’ reasoning; often continuing whatever trend has occurred in the past), and make relatively small adjustments based on inside-view reasoning. In the absence of anything better, I think we should do the same for long-term forecasts too. (Zach Groff is working on a paper making this case in more depth.)
So when I look to predict the next hundred years, say, I think about how the past 100 years have gone (as well as giving consideration to how the last 1,000 and 10,000 years, etc., have gone). When you ask me how AI will go, as a best guess I continue the centuries-long trend of automation of both physical and intellectual labour. In the particular context of AI, I continue the trend where, within a task or task-category, the jump from significantly sub-human to vastly-greater-than-human performance is rapid (on the order of years), but progress from one category of task to another (e.g. from chess to Go) is rather slow, as different tasks seem to differ from each other by orders of magnitude in how difficult they are to automate. So I expect progress in AI to be gradual.
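As a toy illustration of this recipe (not part of the original comment; the growth figures and the cap on the inside-view adjustment below are made up), the idea is just: take the historical trend as the default forecast, and let inside-view arguments shift it only within a small bound.

```python
# Toy sketch of outside-view forecasting with a bounded inside-view adjustment.
# The growth figures are hypothetical, purely for illustration.

historical_growth = [0.22, 0.25, 0.28, 0.24, 0.27, 0.26]  # hypothetical per-decade growth rates

def outside_view_forecast(history: list[float]) -> float:
    """Reference-class forecast: continue the historical average."""
    return sum(history) / len(history)

def adjusted_forecast(history: list[float], inside_view_shift: float,
                      max_adjustment: float = 0.05) -> float:
    """Outside view plus an inside-view adjustment, capped so inside-view
    arguments can nudge the forecast but not overturn the base rate."""
    base = outside_view_forecast(history)
    capped_shift = max(-max_adjustment, min(max_adjustment, inside_view_shift))
    return base + capped_shift

base = outside_view_forecast(historical_growth)
forecast = adjusted_forecast(historical_growth, inside_view_shift=0.30)  # bullish inside view, capped
print(f"outside-view base rate: {base:.2f}, adjusted forecast: {forecast:.2f}")
```

The cap is just a stand-in for ‘relatively small adjustments based on inside-view reasoning’; its size is arbitrary here.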
Then I also expect future AI systems to be narrow rather than general. When I look at the history of tech progress, I almost always see the creation of specific, highly optimised and generally very narrow tools, and very rarely the creation of general-purpose systems like general-purpose factories. And in general, when general-purpose tools are developed, they are worse than narrow tools on any given dimension: a Swiss Army knife is a crappier knife, bottle opener, saw, etc. than any of those things individually. The development of AI systems to date doesn’t give me any reason to think that AI is different: they’ve been very narrow so far; and when they’ve attempted to do things that are somewhat more general, like driving a car, progress has been slow and gradual, suffering from major difficulties in dealing with unusual situations.
Finally, I expect the development of any new technology to be safe by default. As an intuition pump: suppose there were some new design of bomb and BAE Systems decided to build it. There were, however, some arguments that the new design was unstable, and that if designed badly the bomb would kill everyone in the company, including the designers, the CEO, the board, and all their families. These arguments had been made in the media, and the designers and the company were aware of them. What odds do you put on BAE Systems building the bomb wrong and blowing themselves up? I’d put it very low — certainly less than 1%, and probably less than 0.1%. That would be true even if BAE Systems were in a race with Lockheed Martin to be the first to market. People in general really want to avoid dying, so there’s a huge incentive (a willingness-to-pay measured in the trillions of dollars for the USA alone) to ensure that AI doesn’t kill everyone. And when I look at other technological developments, I see society being very risk-averse and almost never taking major risks—a combination of public opinion and regulation means that things go slowly and safely; again, self-driving cars are an example.
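A rough back-of-envelope for that willingness-to-pay figure (my own approximation, using a standard-ish US value-of-statistical-life figure of around $10M rather than anything from the original comment):

```python
# Rough back-of-envelope for the willingness-to-pay claim above.
# Figures are approximate: ~330M US population, ~$10M value of a statistical life.

us_population = 330e6
value_of_statistical_life = 10e6   # dollars, rough US regulatory figure
risk_reduction = 0.01              # removing one percentage point of extinction risk

willingness_to_pay = us_population * value_of_statistical_life * risk_reduction
print(f"~${willingness_to_pay / 1e12:.0f} trillion for a 1-point reduction in extinction risk")
```

Even a one-percentage-point reduction comes out in the tens of trillions of dollars on these assumptions, consistent with “measured in the trillions of dollars for the USA alone”.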
For each of these views, I’m very happy to acknowledge that maybe AI is different. And, when we’re talking about what could be the most important event ever, the possibility of some major discontinuity is really worth guarding against. But discontinuity is not my mainline prediction of what will happen.
(Later edit: I worry that the text above might have conveyed the idea that I’m just ignoring the Yudkowsky/Bostrom arguments, which isn’t accurate. Instead, another factor in my change of view was placing less weight on the Y-B arguments because of: (i) finding the arguments that we’ll get discontinuous progress in AI a lot less compelling than I used to (e.g. see here and here); (ii) trying to map the Yudkowsky/Bostrom arguments, which were made before the deep learning paradigm, onto actual progress in machine learning, and finding them hard to fit well. Going into this properly would require a lot more discussion though!)
The argument you give in this paragraph only makes sense if “safe” is defined as “not killing everyone” or “avoids risks that most people care about”. But what about “safe” as in “not causing differential intellectual progress in a wrong direction, which can lead to increased x-risks in the long run” or “protecting against or at least not causing value drift so that civilization will optimize for the ‘right’ values in the long run, whatever the appropriate meaning of that is”?
If short-term extinction risk (and in general risks that most people care about) is small compared to other kinds of existential risks, it would seem to make sense for longtermists to focus their efforts more on the latter.
I agree re value-drift and societal trajectory worries, and do think that work on AI is plausibly a good lever to positively affect them.
I’d be happy to read more about this point (mapping the Yudkowsky/Bostrom arguments onto actual progress in machine learning).
If we end up with powerful deep learning models that optimize a given objective extremely well, the main arguments in Superintelligence seem to go through.
(If we end up with powerful deep learning models that do NOT optimize a given objective, it seems to me plausible that x-risks from AI are more severe, rather than less.)
[EDIT: replaced “a specified objective function” with “a given objective”]
Why do his beliefs imply extremely high confidence? Why do the higher estimates from other people not imply that? I’m curious what’s going on here epistemologically.
If you believe “<1% X”, that implies “>99% ¬X”, so you should believe that too. But if you think >99% ¬X seems too confident, then you should modus tollens and moderate your <1% X belief. When other people give e.g. 30% X, that only implies 70% ¬X, which seems more justifiable to me.
I use AGI as an example just because, if it happens, it seems more obviously transformative & existential than biorisk, where it’s harder to reason about whether people survive. And because Will’s views seem to diverge quite strongly from average or median predictions in the ML community, though I wouldn’t read too much into that. Perhaps further, many people in the EA community believe there’s good reason to think those predictions are too conservative if anything, and have arguments for a significant probability of AGI in the next couple of decades, let alone this century.
Since Will’s implied belief is >99% no x-risk this century, this either means AGI won’t happen, or that it has a very high probability of going well (getting or preserving most of the possible value in the future, which seems the most useful definition of existential for EA purposes). That’s at first glance of course; I’m not asking for the whole book, just an intuition for how you get such high confidence in ¬X, especially when it seems to me there’s some plausible evidence for X.
I disagree with your implicit claim that Will’s views (which I mostly agree with) constitute an extreme degree of confidence. I think it’s a mistake to approach these questions with a 50-50 prior. Instead, we should consider the base rate for “events that are at least as transformative as the industrial revolution”.
That base rate seems pretty low. And that’s not actually what we’re talking about—we’re talking about AGI, a specific future technology. In the absence of further evidence, a prior of <10% on “AGI takeoff this century” seems not unreasonable to me. (You could, of course, believe that there is concrete evidence on AGI to justify different credences.)
On a different note, I sometimes find the terminology of “no x-risk”, “going well” etc. unhelpful. It seems more useful to me to talk about concrete outcomes and separate this from normative judgments. For instance, I believe that extinction through AI misalignment is very unlikely. However, I’m quite uncertain about whether people in 2019, if you handed them a crystal ball that showed what will happen (regarding AI), would generally think that things are “going well”, e.g. because people might disapprove of value drift or influence drift. (The future will plausibly be quite alien to us in many ways.) And finally, in terms of my personal values, the top priority is to avoid risks of astronomical suffering (s-risks), which is another matter altogether. But I wouldn’t equate this with things “going well”, as that’s a normative judgment and I think EA should be as inclusive as possible towards different moral perspectives.
Maybe one source of confusion here is that the word “extreme” can be used either to say that someone’s credence is above (or below) a certain level/number (without any value judgement concerning whether that’s sensible) or to say that it’s implausibly high/low.
One possible conclusion would be to just taboo the word “extreme” in this context.
Agree, tried to add more clarification below. I’ll try to avoid this going forward, maybe unsuccessfully.
Tbh, I mean a bit of both definitions (Will’s views are quite surprising to me, which is why I want to know more), but mostly the former (i.e. stating it’s close to 0% or 100%).
Agree on “going well” being under-defined. I was mostly using that for brevity, but it probably caused more confusion than it was worth. A definition I might use is “preserves the probability of getting to the best possible futures”, or even better if it increases that probability. Mainly because, from an EA perspective, even if people are around, if we’ve locked in a substantially suboptimal moral situation we’ve effectively lost most possible value—which I’d call x-risk.
The main point was fairly object-level—Will’s beliefs imply either a roughly 1% likelihood of AGI within 100 years, or a roughly 99% likelihood of it “not reducing the probability of the best possible futures”, or some combination like <10% likelihood of AGI in 100 years AND, even if we get it, >90% likelihood of it not negatively influencing the probability of the best possible futures. Any of these sound somewhat implausible to me, so I’m curious about the intuition behind whichever one Will believes.
Def agree. Things-like-this shouldn’t be approached with a 50-50 prior—throw me in another century & I think <5% likelihood of AGI, the Industrial Revolution, etc. is very reasonable on priors. I just think that probability can shift relatively quickly in response to observations. For the Industrial Revolution, that might be when you’ve already had the agricultural revolution (so a smallish fraction of the population can grow enough food for everyone), you get engines working well & relatively affordably, you have large-scale political stability for a while such that you can interact peacefully with millions of other people, you have proto-capitalism where you can produce/sell things & reasonably expect to make money doing so, etc. At that point, from an inside view, it feels like “we can use machines & spare labor to produce a lot more stuff per person, and we can make lots of money off producing a lot of stuff, so people will start doing that more” is a reasonable position. So those would shift me from single digits or less to at least >20% on the Industrial Revolution in that century, probably more but discounting for hindsight bias. (I don’t know if this is a useful comparison; I’m just using it since you mentioned it, & it does seem similar in some ways: the base rate is low, but it did eventually happen.)
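As a toy version of that kind of shift (the prior and likelihood ratios below are made up, since the comment gives no explicit numbers), a handful of moderately informative observations is enough to move a ~5% prior above 20%:

```python
# Toy Bayesian update: how a low prior can move above 20% after a few observations.
# The prior and likelihood ratios are illustrative, not taken from the discussion.

def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = 0.05  # e.g. "<5% likelihood of an industrial-revolution-scale transition, on priors"
for likelihood_ratio in [1.7, 1.7, 1.7]:  # each observation ~1.7x likelier if the transition is coming
    p = update(p, likelihood_ratio)
print(f"posterior after three such observations: {p:.2f}")  # ~0.21
```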
For AI, these seem relevant: when you have a plausible physical substrate, have better predictive models for what the brain does (connectionism & refinements seem plausible & have been fairly successful over the last few decades despite being unpopular initially), start to see how comparably long-evolved mechanisms work & duplicate some of them, reach super-human performance on some tasks historically considered hard or to require great intelligence, have physical substrates reaching scales that seem comparable to the brain, etc.
In any case, these are getting a bit far from my original thought, which was just to ask which of those situations w.r.t. AGI Will believes, & some intuition for why.
I’d usually want to modify my definition of “well” to “preserves the probability of getting to the best possible futures AND doesn’t increase the probability of the worst possible futures”, but that’s a bit more verbose.
Very interesting points! I largely agree with your (new) views. Some thoughts:
If you think that extinction risk this century is less than 1%, then in particular, you think that extinction risk from transformative AI is less than 1%. So, for this to be consistent, you have to believe either
a) that it’s unlikely that transformative AI will be developed at all this century,
b) that transformative AI is unlikely to lead to extinction when it is developed, e.g. because it will very likely be aligned in at least a narrow sense. (I wrote up some arguments for this a while ago.)
Which of the two do you believe to what extent? For instance, if you put 10% on transformative AI this century – which is significantly more conservative than “median EA beliefs” – then you’d have to believe that the conditional probability of extinction is less than 10%. (I’m not saying I disagree – in fact, I believe something along these lines myself.)
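To spell out the arithmetic of that consistency constraint (illustrative numbers only; this just divides the overall risk bound by the probability of transformative AI):

```python
# The consistency constraint: P(TAI) * P(extinction | TAI) must stay under the
# overall extinction-risk bound. Numbers are illustrative.

def max_conditional_risk(total_risk_bound: float, p_transformative_ai: float) -> float:
    """Largest P(extinction | TAI) compatible with the overall bound."""
    return min(1.0, total_risk_bound / p_transformative_ai)

total_risk_bound = 0.01  # "less than 1% extinction risk this century"
for p_tai in [0.50, 0.20, 0.10, 0.02]:
    print(f"P(TAI) = {p_tai:.0%} -> P(extinction | TAI) must be <= "
          f"{max_conditional_risk(total_risk_bound, p_tai):.0%}")
```

On these numbers, 10% on transformative AI this century forces the conditional probability of extinction below 10%, as stated above.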
What do you think about the possibility of a growth mode change (i.e. much faster pace of economic growth and probably also social change, comparable to the industrial revolution) for reasons other than AI? I feel that this is somewhat neglected in EA – would you agree with that?
--
I’d also be interested in more details on what these beliefs imply in terms of how we can improve the long-term future. I suppose you are now more sceptical about work on AI safety as the “default” long-termist intervention. But what is the alternative? Do you think we should focus on broad improvements to civilisation, such as better governance, working towards compromise and cooperation rather than conflict / war, or generally trying to make humanity more thoughtful and cautious about new technologies and the long-term future? These are uncontroversially good but not very neglected, and it seems hard to get a lot of leverage in this way. (Then again, maybe there is no way to get extraordinary leverage over the long-term future.)
Also, if we aren’t at a particularly influential point in time regarding AI, then I think that expanding the moral circle, or otherwise advocating for “better” values, may be among the best things we can do. What are your thoughts on that?
Thanks! I’ve read and enjoyed a number of your blog posts, and often found myself in agreement.
See my comment to nonn. I want to avoid putting numbers on those beliefs to avoid anchoring myself; but I find them both very likely—it’s not that one is much more likely than the other. (Where ‘transformative AI not developed this century’ includes ‘AI is not transformative’ in the sense that it doesn’t precipitate a new growth mode in the next century—this is certainly my mainline belief.)
Yes, I’d agree with that. There’s a lot of debate about the causes of the industrial revolution. Very few commentators point to some technological breakthrough as the cause, so it’s striking that people are inclined to point to a technological breakthrough in AI as the cause of the next growth mode transition. Instead, leading theories point to some resource overhang (‘colonies and coal’), or some innovation or change in institutions (more liberal laws and norms in England, or higher wages incentivising automation) or in culture. So perhaps there’s some novel governance system that could drive a higher growth mode, and that’ll be the decisive thing.
I still think that working on AI is ultra-important — in one sense, whether there’s a 1% risk or a 20% risk doesn’t really matter; society is still extremely far from the optimum level of concern. (Similarly: “Is the right carbon tax $50 or $200?” doesn’t really matter.)
For longtermist EAs more narrowly it might matter insofar as I think it makes some other options more competitive than otherwise: especially the idea of long-term investment (whether financial or via movement-building); doing research on longtermist-relevant topics; and, like you say, perhaps doing broader x-risk reduction strategies like preventing war, better governance, trying to improve incentives so that they align better with the long-term, and so on.
Strongly agree. I think it’s helpful to think about it in terms of the degree to which social and economic structures optimise for growth and innovation. Our modern systems (capitalism, liberal democracy) do reward innovation—and maybe that’s what caused the growth mode change—but we’re far away from strongly optimising for it. We care about lots of other things, and whenever there are constraints, we don’t sacrifice everything on the altar of productivity / growth / innovation. And, while you can make money by innovating, the incentive is more about innovations that are marketable in the near term, rather than maximising long-term technological progress. (Compare e.g. an app that lets you book taxis in a more convenient way vs. foundational neuroscience research.)
So, a growth mode change could be triggered by any social change (culture, governance, or something else) resulting in significantly stronger optimisation pressures for long-term innovation.
That said, I don’t really see concrete ways in which this could happen and current trends do not seem to point in this direction. (I’m also not saying this would necessarily be a good thing.)
One thing that moves me towards placing a lot of importance on culture and institutions: We’ve actually had the technology and knowledge to produce greater-than-human intelligence for thousands of years, via selective breeding programs. But it’s never happened, because of taboos and incentives not working out.
People didn’t quite have the relevant knowledge, since they didn’t have sound plant and animal breeding programs or predictions of inheritance.
I’d be super interested in hearing you elaborate more on most of the points! Especially the first two.
Me too! I’m quite surprised by many of them! (Not that I necessarily disagree, just surprised.)
I’d like to vote for more detail on:
Unless the change in importance is fully explained by the relative reprioritization after updating downward on existential risks.
Do I understand you correctly that you’re relatively less worried about existential risks because you think they are less likely to be existential (that civilization will rebound) and not because you think that the typical global catastrophes that we imagine are less likely?
Thanks for these interesting points!
About the first three statements, on existential risks, takeoff scenarios, and how influential our time is: how much of your view is the general wisdom of experts in the corresponding research fields (I’m not sure what this field would be for assessing our influence on the future), and how much is it something like your own internal view?
It depends on who we point to as the experts, which I think there could be disagreement about. If we’re talking about, say, FHI folks, then I’m very clearly in the optimistic tail—others would put much higher probabilities on x-risk, on takeoff scenarios, and on our being at a super-influential time. But note that I think there’s a strong selection effect with respect to who becomes an FHI person, so I don’t simply peer-update to their views. I’d expect that, say, a panel of superforecasters, after being exposed to all the arguments, would be closer to my view than to the median FHI view. If I were wrong about that, I’d change my view. One relevant piece of evidence is that the Metaculus (a community prediction site) algorithm puts the chance of 95%+ of people dead by 2100 at 0.5%, which is in the same ballpark as me.
I think there’s some evidence that Metaculus users, while fairly smart and well-informed, are nowhere near as knowledgeable as a fairly informed EA (perhaps including a typical user of this forum?) on the specific questions around existential and global catastrophic risks.
One example I can point to: for this question on climate change and GCR before 2100 (which has been around since October 2018), a single not-very-informative comment from me was enough to change the community median from 24% to 10%. This suggests to me that Metaculus users did not previously have strong evidence or careful reasoning on this question, or perhaps on GCR-related thinking in general.
Now you might think that actual superforecasters are better, but based on the comments given so far on COVID-19, I’m unimpressed. In particular, the selected comments point to the use of reference classes that EAs and avid Metaculus users had known to be flawed for over a week before the report came out (e.g., using China’s low death toll as evidence that this can easily be replicated in other countries as the default scenario).
Now, COVID-19 is not an existential risk or GCR, but it is an “out of distribution” problem with clear and fast exponential growth, which seems unlike most of the questions superforecasters are known to excel at.
I’d be very interested in hearing more about the views you list under the “more philosophical end” (esp. moral uncertainty) -- either here or on the 80k podcast.