There are some obvious responses to my argument here, like: ‘X seems likely to you because of a conjunction fallacy; we can learn from this test that X isn’t likely, though it’s also not vanishingly improbable.’ If a claim is conjunctive enough, and the conjuncts are individually unlikely enough, then you can obviously study a question for months or years and end up ~95% confident of not-X (e.g., ‘this urn contains seventeen different colors of balls, so I don’t expect the ball I randomly pick to be magenta’).
I worry there’s possibly something rude about responding to a careful analysis by saying ‘this conclusion is just too wrong’, without providing an equally detailed counter-analysis or drilling down on specific premises.
(I’m maybe being especially rude in a context like the EA Forum, where I assume a good number of people don’t share the perspective that AI is worth worrying about even at the ~5% level!)
You mention the Multiple Stages Fallacy (also discussed here, as “the multiple-stage fallacy”), which is my initial guess as to a methodological crux behind our different all-things-considered probabilities.
But the more basic reason why I felt moved to comment here is a general worry that EAs have a track record of low-balling probabilities of AI risk and large-AI-impacts-soon in their public writing. E.g.:
The headline number from Holden Karnofsky’s 2016 Some Background on Our Views Regarding Advanced Artificial Intelligence is “I think there is a nontrivial likelihood (at least 10% with moderate robustness, and at least 1% with high robustness) of transformative AI within the next 20 years.” Only mentioning the lower bound on your probability, and not the upper bound, makes sense from the perspective of ‘this is easier to argue for and is sufficient for the set of actions we’re currently trying to justify’, but it means readers don’t come away knowing what the actual estimates are, and they come away anchored to the lowest number in your range of reasonable predictions.
80,000 Hours’ 2017 high-level summary of AI risk states at the top that “We estimate that the risk of a severe, even existential catastrophe caused by machine intelligence within the next 100 years iis [sic] between 1% and 10%.”
Paul Christiano’s original draft of What Failure Looks Like in 2019 phrased things in ways that caused journalists to conclude that the risk of sudden or violent AI takeover is relatively low.
Back in Sep. 2017, I wrote (based on some private correspondence with researchers):
I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: “The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future.”
80,000 Hours is summarizing a research field where 80+% of specialists think that there’s >10% probability of existential catastrophe from event A; they stick their neck out to say that these 80+% are wrong, and in fact so ostentatiously wrong that their estimate isn’t even in the credible range of estimates, which they assert to be 1-10%; and they seemingly go further by saying this is true for the superset ‘severe catastrophes from A’ and not just for existential catastrophes from A.
If this were a typical technical field, that would be a crazy thing to do in a career summary, especially without flagging that that’s what 80,000 Hours is doing (so readers can decide for themselves how to weight the views of e.g. alignment researchers vs. ML researchers vs. meta-researchers like 80K). You could say that AI is really hard to forecast so it’s harder to reach a confident estimate, but that should widen your range of estimates, not squeeze it all into the 1-10% range. Uncertainty isn’t an argument for optimism.
There are obvious social reasons one might not want to sound alarmist about a GCR, especially a weird/novel GCR. But—speaking here to EAs as a whole, since it’s a lot harder for me to weigh in on whether you’re an instance of this trend than for me to weigh in on whether the trend exists at all—I want to emphasize that there are large potential costs to being more quiet about “high-seeming numbers” than “low-seeming numbers” in this domain, analogous to the costs e.g. of experts trying to play down their worries in the early days of the COVID-19 pandemic. Even if each individual decision seems reasonable at the time, the aggregate effect is a very skewed group awareness of reality.
If you’re still making this claim now, want to bet on it? (We’d first have to operationalize who counts as an “AI safety researcher”.)
I also think it wasn’t true in Sep 2017, but I’m less confident about that, and it’s not as easy to bet on.
(Am e-mailing with Rohin, will report back e.g. if we check this with a survey.)
Results are in this post.
(Continued from comment on the main thread)
I’m understanding your main points/objections in this comment as:
You think the multiple stage fallacy might be the methodological crux behind our disagreement.
You think that >80% of AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would assign >10% probability to existential catastrophe from technical problems with AI (at some point, not necessarily before 2070). So it seems like 80k saying 1-10% reflects a disagreement with the experts, which would be strange in the context of e.g. climate change, and at least worth flagging/separating. (Presumably, something similar would apply to my own estimates.)
You worry that there are social reasons not to sound alarmist about weird/novel GCRs, and that it can feel “conservative” to low-ball rather than high-ball the numbers. But low-balling (and/or focusing on/making salient lower-end numbers) has serious downsides. And you worry that EA folks have a track record of mistakes in this vein.
(as before, let’s call “there will be an existential catastrophe from power-seeking AI before 2070” p).
Re 1 (and 1c, from my response to the main thread): as I discuss in the document, I do think there are questions about multiple-stage fallacies here, though I also think that not decomposing a claim into sub-claims can risk obscuring conjunctiveness (and I don’t see “abandon the practice of decomposing a claim into subclaims” as a solution to this). As an initial step towards addressing some of these worries, I included an appendix that reframes the argument using fewer premises (and also in positive (e.g., “p is false”) vs. negative (“p is true”) forms). Of course, this doesn’t address e.g. the “the conclusion could be true, but some of the premises false” version of the “multiple stage fallacy” worry; but FWIW, I really do think that the premises here capture the majority of my own credence on p, at least.

In particular, the timelines premise is fairly weak, and premises 4-6 are implied by basically any p-like scenario, so it seems like the main contenders for false premises (even while p is true) are 2 (“There will be strong incentives to build APS systems”) and 3 (“It will be much harder to develop APS systems that would be practically PS-aligned if deployed, than to develop APS systems that would be practically PS-misaligned if deployed (even if relevant decision-makers don’t know this), but which are at least superficially attractive to deploy anyway”). Here, I note the scenarios most salient to me in footnote 173, namely: “we might see unintentional deployment of practically PS-misaligned APS systems even if they aren’t superficially attractive to deploy” and “practically PS-misaligned systems might be developed and deployed even absent strong incentives to develop them (for example, simply for the sake of scientific curiosity).” But I don’t see these as constituting more than e.g. 50% of the risk.

If your own probability is driven substantially by scenarios where the premises I list are false, I’d be very curious to hear which ones (setting aside scenarios that aren’t driven by power-seeking, misaligned AI), and how much credence you give them. I’d also be curious, more generally, to hear your more specific disagreements with the probabilities I give to the premises I list. (A stylized numerical sketch of this sort of decomposition, with placeholder numbers, appears at the end of this comment.)
Re: 2, your characterization of the distribution of views amongst AI safety researchers (outside of MIRI) is in some tension with my own evidence; and I consulted with a number of people who fit your description of “specialists”/experts in preparing the document. That said, I’d certainly be interested to see more public data in this respect, especially in a form that breaks down in (rough) quantitative terms the different factors driving the probability in question, as I’ve tried to do in the document. (Off the top of my head, the public estimates most salient to me are Ord (2020) at 10% by 2100, Grace et al. (2017)’s expert survey (5% median, with no target date), and FHI’s (2008) survey (5% on extinction from superintelligent AI by 2100), though we could gather up others from e.g. LW and previous X-risk books.) Importantly, though, and as indicated in my comment on the main thread, I don’t think of the community of AI safety researchers at the orgs you mention as in an epistemic position analogous to e.g. the IPCC, for a variety of reasons (and obviously, there are strong selection effects at work). Less importantly, I also don’t think the technical aspects of this problem are the only factors relevant to assessing risk; at this point I have some feeling of having “heard the main arguments”; and >10% (especially if we don’t restrict to pre-2070 scenarios) is within my “high-low” range mentioned in footnote 178 (e.g., .1%-40%).
Re: 3, I do think that the “conservative” thing to do here is to focus on the higher-end estimates (especially given uncertainty/instability in the numbers), and I may revise to highlight this more in the text. But I think we should distinguish between the project of figuring out “what to focus on”/what’s “appropriately conservative,” and what our actual best-guess probabilities are; and just as there are risks of low-balling for the sake of not looking weird/alarmist, I think there are risks of high-balling for the sake of erring on the side of caution. My aim here has been to do neither; though obviously, it’s hard to eliminate biases (in both directions).
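For concreteness, here is a stylized Python sketch of the sort of decomposition discussed under point 1. The premise labels are rough paraphrases, and all of the probabilities are placeholders rather than the report’s actual estimates; the point is just the structure: conditional premise probabilities multiply down to a headline number, and scenarios where the conclusion holds despite a false premise add some credence back on top.

```python
# Stylized sketch of a conjunctive, multi-premise decomposition.
# Premise labels are rough paraphrases; all probabilities are placeholders,
# not the report's actual estimates. Each value is read as
# P(premise | all earlier premises hold).
premises = [
    ("1. timelines: APS systems become possible and financially feasible", 0.70),
    ("2. strong incentives to build APS systems",                          0.80),
    ("3. much harder to build practically PS-aligned than PS-misaligned",  0.50),
    ("4. misaligned APS systems are deployed and seek power",              0.50),
    ("5. power-seeking scales to permanent disempowerment",                0.50),
    ("6. disempowerment constitutes an existential catastrophe",           0.90),
]

p = 1.0
for label, prob_given_previous in premises:
    p *= prob_given_previous
    print(f"{label:<68s} running product = {p:.3f}")

# Scenarios where the conclusion holds even though some premise above fails
# (e.g. deployment without strong incentives) add credence back on top.
p_other_routes = 0.01  # placeholder
print(f"total ~= {p:.3f} + {p_other_routes:.2f} = {p + p_other_routes:.3f}")
```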
I think I share Robby’s sense that the methodology seems like it will obscure truth.
That said, I have neither your (Joe’s) extensive philosophical background nor your experience of spending substantial time on a report like this, and I am interested in evidence to the contrary.
To me, it seems like you’ve tried to lay out a series of 6 steps of an argument, that you think each very accurately carve the key parts of reality that are relevant, and pondered each step for quite a while.
When I ask myself whether I’ve seen something like this produce great insight, it’s hard. It’s not something I’ve done much myself explicitly. However, I can think of a nearby example where I think this has produced great insight, which is Nick Bostrom’s work. I think (?) Nick spends a lot of his time considering a simple, single key argument, looking at it from lots of perspectives, scrutinizing wording, asking what people from different scientific fields would think of it, poking and prodding and rotating and just exploring it. Through that work, I think he’s been able to find considerations that were very surprising and that invalidated the arguments, and to propose very different arguments instead.
When I think of examples here, I’m imagining that this sort of intellectual work produced the initial arguments about astronomical waste, and arguments since then about unilateralism and the vulnerable world hypothesis. Oh, and also simulation hypothesis (which became a tripartite structure).
I think of Bostrom as trying to consider a single worldview, and find out whether it’s a consistent object. One feeling I have about turning it into a multi-step probabilistic argument is that it does the opposite: it does not try to examine one worldview to find falsehoods, but instead integrates over all the parts of the worldview that Bostrom would scrutinize, to make a single clump of lots of parts of different worldviews. I think Bostrom may have literally never published a six-step argument of the form that you have, where it was meant to hold anything of weight in the paper or book, and also never done so while assigning each step a probability.
To be clear, probabilistic discussions are great. Talking about precisely how strong a piece of evidence is (is it 2:1, 10:1, 100:1?) helps a lot in noticing which hypotheses to even pay attention to. The suspicion I have is that they are fairly different from the kind of cognition Bostrom does when doing this sort of philosophical argumentation that produces simple arguments of world-shattering importance. I suspect you’ve set yourself a harder task than Bostrom ever has (a 6-step argument), and thought you’d made it easier for yourself by making it only probabilistic instead of deductive, whereas in fact this removes most of the tools that Bostrom was able to use to ensure he didn’t take mis-steps.
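To illustrate what those strength-of-evidence ratios are doing, here is a minimal odds-form sketch with made-up numbers: likelihood ratios multiply onto the prior odds, and the posterior odds convert back to a probability.

```python
# Minimal odds-form Bayesian update: prior odds times likelihood ratios.
# All numbers are made up for illustration.

def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)

def prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)

prior_p = 0.05                      # prior probability of the hypothesis
likelihood_ratios = [2, 10, 100]    # strengths of three independent pieces of evidence

posterior_odds = odds(prior_p)
for lr in likelihood_ratios:
    posterior_odds *= lr

print(f"prior = {prior_p:.3f}, posterior = {prob(posterior_odds):.4f}")
# prior odds ~1:19, times 2 * 10 * 100, gives ~105:1, i.e. posterior ~0.99
```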
But I am pretty interested in whether there are examples of great work using your methodology that you were inspired by when writing this up, or great works with nearby methodologies that feel similar to you. I’d be excited to read/discuss some.
I tried to look for writing like this. I think that people do multiple hypothesis testing, like Harry in chapter 86 of HPMOR. There Harry is trying to weigh some different hypotheses against each other to explain his observations. There isn’t really a single train of conditional steps that constitutes the whole hypothesis.
My shoulder-Scott-Alexander is telling me (somewhat similar to my shoulder-Richard-Feynman) that there are a lot of ways to trick myself with numbers, and that I should only do very simple things with them. I looked through some of his posts just now (1, 2, 3, 4, 5).
Here’s an example of a conclusion / belief from Scott’s post Teachers: Much More Than You Wanted to Know:
In summary: teacher quality probably explains 10% of the variation in same-year test scores. A +1 SD better teacher might cause a +0.1 SD year-on-year improvement in test scores. This decays quickly with time and probably disappears entirely after four or five years, though there may also be small lingering effects. It’s hard to rule out the possibility that other factors, like endogenous sorting of students, or students’ genetic potential, contribute to this as an artifact, and most people agree that these sorts of scores combine some signal with a lot of noise. For some reason, even though teachers’ effects on test scores decay very quickly, studies have shown that they have significant impact on earnings as much as 20 or 25 years later, so much so that kindergarten teacher quality can predict thousands of dollars of difference in adult income. This seemingly unbelievable finding has been replicated in quasi-experiments and even in real experiments and is difficult to banish. Since it does not happen through standardized test scores, the most likely explanation is that it involves non-cognitive factors like behavior. I really don’t know whether to believe this and right now I say 50-50 odds that this is a real effect or not – mostly based on low priors rather than on any weakness of the studies themselves. I don’t understand this field very well and place low confidence in anything I have to say about it.
I don’t know any post where Scott says “there’s a particular 6-step argument, and I assign 6 different probabilities to each step, and I trust that outcome number seems basically right”. His conclusions read more like 1 key number with some uncertainty, which never came from a single complex model, but from aggregating loads of little studies and pieces of evidence into a judgment.
I can’t think of a post like this by Scott or Robin or Eliezer or Nick or anyone, but I would be interested in an example that is like this (from other fields or wherever), or that feels similar.
Maybe not ‘insight’, but re: ‘accuracy’, this sort of decomposition is often in the toolbox of better forecasters. I think the longest path I evaluated in a question had 4 steps rather than 6, and I think I’ve seen other forecasters do similar things on occasion. (The general practice of ‘breaking down problems’ to evaluate sub-issues is recommended in Superforecasting, IIRC.)
I guess the story for why this works in geopolitical forecasting is that folks tend to overestimate the chance ‘something happens’, and tend to be underdamped in increasing the likelihood of something based on suggestive antecedents (e.g. the chance of a war given an altercation, etc.). So attending to “even if A, for it to lead to D one should attend to P(B|A), P(C|B), etc.” tends to lead to downwards corrections.
Naturally, you can mess this up. It’s not obvious, though, whether you are at greater risk if you arrange your decomposed considerations conjunctively or disjunctively: “All of A-E must be true for P to be true” ~also means “if any of ¬A-¬E are true, then ¬P”. In natural language and heuristics, I can imagine “Here are several different paths to P, and each of these seems not-too-improbable, so P must be highly likely” could also lead one astray.
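As a toy contrast between the two framings (all numbers invented, and independence assumed purely for illustration):

```python
# Toy contrast between conjunctive and disjunctive decompositions.
# All probabilities are invented for illustration; independence is assumed.
from math import prod

steps = [0.7, 0.7, 0.7, 0.7, 0.7]   # P(A), P(B|A), P(C|B), ...

# Conjunctive framing: every step must hold for P.
p_conjunctive = prod(steps)          # 0.7**5 ~= 0.17

paths = [0.1, 0.15, 0.2]             # several "not-too-improbable" routes to P

# Disjunctive framing: P holds if at least one (independent) path succeeds.
p_disjunctive = 1 - prod(1 - p for p in paths)   # ~= 0.39, hardly "highly likely"

print(f"conjunctive: {p_conjunctive:.2f}, disjunctive: {p_disjunctive:.2f}")
```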
Hi Ben,
A few thoughts on this:
It seems possible that attempting to produce “great insight” or “simple arguments of world-shattering importance” warrants a methodology different from the one I’ve used here. But my aim here is humbler: to formulate and evaluate an existing argument that I and various others take seriously, and that lots of resources are being devoted to; and to come to initial, informal, but still quantitative best-guesses about the premises and conclusion, which people can (hopefully) agree/disagree with at a somewhat fine-grained level—e.g., a level that just giving overall estimates, or just saying e.g. “significant probability,” “high enough to worry about,” etc., can make more difficult to engage on.
In that vein, I think it’s possible you’re over-estimating how robust I take the premises and numbers here to be (I’m thinking here of your comments re: “very accurately carve the key parts of reality that are relevant,” and “trust the outcome number”). As I wrote in response to Rob above, my low-end/high-end range here is .1% to 40% (see footnote 179, previously 178), and in general, I hold the numbers here very lightly (I try to emphasize this in section 8).
FWIW, I think Superintelligence can be pretty readily seen as a multi-step argument (e.g., something like: superintelligence will happen eventually; fast take-off is plausible; if fast-take-off, then a superintelligence will probably get a decisive strategic advantage; alignment will be tricky; misalignment leads to power-seeking; therefore plausible doom). And more broadly, I think that people make arguments with many premises all the time (though sometimes the premises are suppressed). It’s true that people don’t usually assign probabilities to the premises (and Bostrom doesn’t, in Superintelligence—a fact that leaves the implied p(doom) correspondingly ambiguous) -- but I think this is centrally because assigning informal probabilities to claims (whether within a multi-step argument, or in general) just isn’t a very common practice, for reasons not centrally to do with e.g. multi-stage-fallacy type problems. Indeed, I expect I’d prefer a world where people assigned informal, lightly-held probabilities to their premises and conclusions (and formulated their arguments in premise-premise-conclusion form) more frequently.
I’m not sure exactly what you have in mind re: “examining a single worldview to see whether it’s consistent,” but consistency in a strict sense seems too cheap? E.g., “Bob has always been wrong before, but he’ll be right this time”; “Mortimer Snodgrass did it”; etc are all consistent. That said, my sense is that you have something broader in mind—maybe something like “plausible,” “compelling,” “sense-making,” etc. But it seems like these still leave the question of overall probabilities open...
Overall, my sense is that disagreement here is probably more productively focused on the object level—e.g., on the actual probabilities I give to the premises, and/or on pointing out and giving weight to scenarios that the premises don’t cover—rather than on the methodology in the abstract. In particular, I doubt that people who disagree a lot with my bottom line will end up saying: “If I was to do things your way, I’d roughly agree with the probabilities you gave to the premises; I just disagree that you should assign probabilities to premises in a multi-step argument as a way of thinking about issues like this.” Rather, I expect a lot of it comes down to substantive disagreement about the premises at issue (and perhaps, to people assigning significant credence to scenarios that don’t fit these premises, though I don’t feel like I’ve yet heard strong candidates—e.g., ones that seem to me to plausibly account for, say, >2/3rds of the overall X-risk from power-seeking, misaligned AI by 2070 -- in this regard).
Thanks for the thoughtful reply.
I do think I was overestimating how robustly you’re treating your numbers and premises; it seems like you’re holding them all much more lightly than I think I’d been envisioning.
FWIW I am more interested in engaging with some of what you wrote in your other comment than engaging on the specific probability you assign, for some of the reasons I wrote about here.
I think I have more I could say on the methodology, but alas, I’m pretty blocked up with other work atm. It’d be neat to spend more time reading the report and leave more comments here sometime.
This links to A Sketch of Good Communication, not whichever comment you were intending to link :)
Fixed, tah.
Great comment :)
The upshot seems to be that Joe, 80k, the AI researcher survey (2008), Holden-2016 are all at about a 3% estimate of AI risk, whereas AI safety researchers now are at about 30%. The latter is a bit lower (or at least differently distributed) than Rob expected, and seems higher than among Joe’s advisors.
The divergence is big, but pretty explainable, because it concords with the direction that apparent biases point in. For the 3% camp, the credibility of one’s name, brand, or field benefits from making lowball estimates. Whereas the 30% camp is self-selected to have severe concern. And risk perception all-round has increased a bit in the last 5-15 years due to Deep Learning.
Re 80K’s 2017 take on the risk level: You could also say that the AI safety field is crazy and people in it are very wrong, as part of a case for lower risk probabilities. There are some very unhealthy scientific fields out there. Also, technology forecasting is hard. A career-evaluating group could investigate a field like climate change, decide that researchers in the field are very confused about the expected impact of climate change, but still think it’s an important enough problem to warrant sending lots of people to work on the problem. But in that case, I’d still want 80K to explicitly argue that point, and note the disagreement.
I previously complained about this on LessWrong.
I think there is a tenable view that considers an AI catastrophe less likely than what AI safety researchers think but is not committed to anything nearly as strong as the field being “crazy” or people in it being “very wrong”:
We might simply think that people are more likely to work on AI safety if they consider an AI catastrophe more likely. When considering their beliefs as evidence we’d then need to correct for that selection effect.
[ETA: I thought I should maybe add that even the direction of the update doesn’t seem fully clear. It depends on assumptions about the underlying population. E.g. if we think that everyone’s credence is determined by an unbiased but noisy process, then people with high credences will self-select into AI safety because of noise, and we should think the ‘correct’ credence is lower than what they say. On the other hand, if we think that there are differences in how people form their beliefs, then it at least could be the case that some people are simply better at predicting AI catastrophes, or are fast at picking up ‘warning signs’, and if AI risk is in fact high then we would see a ‘vanguard’ of people self-selecting into AI safety early who also will have systematically more accurate beliefs about AI risk than the general population. (A toy simulation of both cases is sketched below.)]
(I am sympathetic to “I’d still want 80K to explicitly argue that point, and note the disagreement.”, though haven’t checked to what extent they might do that elsewhere.)
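To make the bracketed point above concrete, here is a toy simulation of both cases. The population sizes, noise levels, threshold, and “true risk” values are all invented for illustration, not estimates of anything.

```python
# Toy simulation of self-selection into AI safety based on credence.
# All parameters are invented for illustration.
import random

random.seed(0)
N, THRESHOLD = 100_000, 0.15   # population size; credence needed to enter the field

def clip(x):
    """Keep simulated credences inside (0, 1)."""
    return min(max(x, 0.001), 0.999)

# Case 1: everyone estimates the same true risk with unbiased noise.
true_p = 0.05
credences = [clip(random.gauss(true_p, 0.05)) for _ in range(N)]
field = [c for c in credences if c > THRESHOLD]
print(f"case 1: true risk {true_p:.2f}, field average {sum(field)/len(field):.2f}")
# Selection on noise -> the field's average credence overshoots the true risk.

# Case 2: risk is actually high, and a minority ("vanguard") tracks it well.
true_p = 0.30
vanguard = [clip(random.gauss(true_p, 0.05)) for _ in range(N // 10)]
others = [clip(random.gauss(0.02, 0.03)) for _ in range(N - N // 10)]
credences = vanguard + others
field = [c for c in credences if c > THRESHOLD]
print(f"case 2: true risk {true_p:.2f}, population average "
      f"{sum(credences)/len(credences):.2f}, field average {sum(field)/len(field):.2f}")
# Here the self-selected field is roughly right, and the wider population underestimates.
```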
Yeah, I like this correction.
Though in the world where the credible range of estimates is 1-10%, and 80% of the field believed the probability to be >10% (my prediction from upthread), that would start to get into ‘something’s seriously wrong with the field’ territory from my perspective; that’s not a small disagreement.
(I’m assuming here, as I did when I made my original prediction, that they aren’t all clustered around 15% or whatever; rather, I’d have expected a lot of the field to give a much higher probability than 10%.)