Models such as the Carlsmith one, which treat AI x-risk as highly conjunctive (i.e. lots of things need to happen for an AI existential catastrophe), already seem like they’ll bias results towards lower probabilities (see e.g. this section of Nate’s review of the Carlsmith report). I won’t say more on this since I think it’s been discussed several times already.
What I do want to highlight is that the methodology of this post exacerbates that effect. In principle, you can get reasonable results with such a model if you’re aware of the dangers of highly conjunctive models and sufficiently careful in assigning probabilities.[1] That might at least plausibly hold for a single person giving probabilities, who has hopefully thought about how to avoid the multiple stage fallacy and spent a lot of time on their estimates. But if you just survey a lot of people, you’ll very likely get at least a sizable fraction of respondents who, e.g., tend to assign probabilities close to 50% because anything else feels overconfident, or who don’t actually condition enough on the previous steps having happened, even if the question tells them to. (This isn’t really meant as a critique of the people who answered the survey; it’s genuinely hard to give good probabilities for these conjunctive models.) And the way the analysis in this post works, if some people give probabilities that are too low, the overall result will also be very low (see e.g. this comment).
I would strongly guess that if you ran exactly the same type of survey and analysis with a highly disjunctive model (e.g. more along the lines of this one by Nate Soares), you would get much higher probabilities of x-risk. To be clear, that would be just as bad: it would likely be an overestimate!
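As a toy illustration of the mechanism (the number of stages, the ‘true’ probabilities, and the strength of the anchoring below are all invented, and this is not the post’s actual survey or aggregation procedure; it is only a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_stages, anchor = 500, 6, 0.4   # invented numbers for illustration

def surveyed_stage_means(true_p):
    """Simulate respondents whose answers are pulled `anchor` of the way toward 50%."""
    noise = rng.normal(0.0, 0.05, size=(n_respondents, n_stages))
    answers = np.clip((1 - anchor) * true_p + anchor * 0.5 + noise, 0.01, 0.99)
    return answers.mean(axis=0)                  # per-stage average across respondents

# Conjunctive model: every stage must happen, so the stage probabilities multiply.
conj_true = np.full(n_stages, 0.85)
print("conjunctive:", round(conj_true.prod(), 3),
      "-> survey-based:", round(surveyed_stage_means(conj_true).prod(), 3))

# Disjunctive model: any one of several pathways suffices.
disj_true = np.full(n_stages, 0.05)
print("disjunctive:", round(1 - (1 - disj_true).prod(), 3),
      "-> survey-based:", round(1 - (1 - surveyed_stage_means(disj_true)).prod(), 3))
```

The same mild pull toward 50% drags the conjunctive product well below its ‘true’ value and pushes the disjunctive aggregate well above it.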
One related aspect I want to address:
“Most models of AI risk are – at an abstract enough level – more like an elimination tournament than a league, at least based on what has been published on various AI-adjacent forums. The AI needs everything to go its way in order to catastrophically depower humanity.”
There is a lot of disagreement about whether AI risk is conjunctive or disjunctive (or, more realistically, where it is on the spectrum between the two). If I understand you correctly (in section 3.1), you basically found only one model (Carlsmith) that matched your requirements, which happened to be conjunctive. I’m not sure if that’s just randomness, or if there’s a systematic effect where people with more disjunctive models don’t tend to write down arguments in the style “here’s my model, I’ll assign probabilities and then multiply them”.
If we do want to use a methodology like the one in this post, I think we’d need to take uncertainty over the model itself extremely seriously. E.g. we could come up with a bunch of different models, assign weights to them somehow (e.g. by surveying people about how good a model of AI x-risk each one is), and then do the type of analysis you do here for each model separately. At the end, we average the probabilities the models give using those weights. I’m still not a big fan of that approach, but at least it would take into account the fact that there’s a lot of disagreement about the conjunctive vs. disjunctive character of AI risk, and it would also “average out”, to some extent, the biases that each type of model induces.

[1] Though there’s still the issue of disjunctive pathways being completely ignored, and I also think it’s pretty hard to be sufficiently careful.
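To make that model-averaging idea concrete, here is a very rough sketch (the model names, weights, and risk distributions below are all invented for illustration, not estimates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: each candidate model structure gets a survey-derived
# weight for how plausible the structure is, plus samples of the risk it implies.
# All names, weights, and distributions are invented for illustration.
models = {
    "highly conjunctive": {"weight": 0.3, "risk": rng.beta(2, 80, 10_000)},
    "mixed":              {"weight": 0.5, "risk": rng.beta(3, 25, 10_000)},
    "highly disjunctive": {"weight": 0.2, "risk": rng.beta(5, 12, 10_000)},
}

weights = np.array([m["weight"] for m in models.values()])
weights = weights / weights.sum()                 # normalise the survey weights

per_model = np.array([m["risk"].mean() for m in models.values()])
overall = float(weights @ per_model)              # model-averaged risk estimate

for name, risk in zip(models, per_model):
    print(f"{name}: {risk:.3f}")
print(f"model-averaged risk: {overall:.3f}")
```

Even in this toy version, the headline number depends heavily on how much weight the more disjunctive structures get, which is exactly the disagreement a single-model analysis hides.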
My apologies if I wasn’t clear enough in the essay—I think there is a very good case for investigating structural uncertainty, it is just that it would require another essay-length treatment to do a decent job with. I hope to be able to produce such a treatment before the contest deadline (and I’ll publish afterwards anyway if this isn’t possible). This essay implicitly treats the model structure as fixed (except for a tiny nod to the issue in 4.3.3) and parameter uncertainty as the only point of contention, but in reality both the model structural uncertainty and parameter uncertainty will contribute to the overall uncertainty.
Yeah, I totally agree that combining an analysis as detailed as yours with structural uncertainty would be a really big task. My point certainly wasn’t that you hadn’t done “enough work”; this is already a long and impressive write-up.
I will say though that if you agree that model uncertainty would likely lead to substantially higher x-risk estimates, the takeaways in this post are very misleading. E.g.:
“The headline figure from this essay is that I calculate the best estimate of the risk of catastrophe due to out-of-control AGI is approximately 1.6%.”
“analysis of uncertainty reveals that the actual risk of AI Catastrophe is almost an order of magnitude less than most experts think it is”
“the main result I want to communicate is that it is more probable than not that we live in a world where the risk of AGI Catastrophe is <3%.”
I disagree with each of those claims, and I don’t think this post makes a strong enough case to justify them. Maybe the crux is this:
“in reality both the model structural uncertainty and parameter uncertainty will contribute to the overall uncertainty.”
My main point was not that structural uncertainty will increase our overall uncertainty; it was that specifically using a highly conjunctive model will give very biased results compared to considering a broader distribution over models. I’m not sure from your reply whether you agree with that (if not, then the takeaways make more sense, but in that case we do have a substantial disagreement).
I’m not sure we actually disagree about the facts on the ground, but I don’t fully agree with the specifics of what you’re saying (if that makes sense). In a general sense I agree the risk of ‘AI is invented and then something bad happens because of that’ is substantially higher than 1.6%. The specific scenario the Future Fund are interested in for the contest, however, is narrow enough that I don’t think we can say with confidence what an examination of structural uncertainty would do to it. I could think of ways in which a more disjunctive structural model could plausibly even diminish the risk of the specific Future Fund catastrophe scenario, for example models where some of the microdynamics make it easier to misuse AI deliberately. That wouldn’t necessarily change the overall risk of some AI Catastrophe befalling us, but it would be a relevant distinction to make with respect to the Future Fund question, which asks about a specific kind of Catastrophe.
Also, you’re right that the second and third quotes you give are too strong. They should read something like ‘...the actual risk of AI Catastrophe of this particular kind...’: this essay says nothing about AI Catastrophe broadly defined, just the specific kind of catastrophe the Future Fund are interested in. I’ll change that, as it is undesirable imprecision.
Ok, thanks for clarifying! FWIW, everything I said was meant to be specifically about AGI takeover because of misalignment (i.e. excluding misuse), so it does seem we disagree significantly about the probability of that scenario (and about the effect of using less conjunctive models). But it probably doesn’t make sense to get into that discussion too much, since my actual cruxes are mostly on the object level (i.e. to convince me of low AI x-risk, I’d find specific arguments about what’s going to happen and why much more persuasive than survey-based models).
The above comment is irrational and poorly formed. It shows a lack of understanding of basic probability theory, and it conflates two very different types of risk that should not be conflated: the risk of artificial general intelligence (AGI) takeover and the risk of AGI misuse. Takeover risk arises from the possibility that AGI is misaligned with human values, i.e. designed in such a way that it does not act in ways beneficial to humanity; this is a risk because AGI could cause great harm if it is not properly controlled. Misuse risk arises from the possibility that AGI is used in ways that are harmful to humanity, for example to create powerful weapons or to manipulate people. The comment suggests that the probability of AGI takeover is low because it is based on survey-based models, but this is not a valid way to calculate probabilities; probabilities should be based on evidence.
Agreed. I think this post provides a great insight that hasn’t been pointed out before, but it works best for the Carlsmith model, which is unusually conjunctive. Arguments for disjunctive AI risk include Nate Soares here and Kokotajlo and Dai here.
Both of the links you suggest are strong philosophical arguments for ‘disjunctive’ risk, but neither is actually a model schema (although Soares does imply he has such a schema and just hasn’t published it yet). The fact that I only use Carlsmith to model risk is a fair reflection of the state of the literature.
(As an aside, this seems really weird to me: there is almost no community pressure to have people explicitly draw out their model schema in PowerPoint or on a piece of paper or something. This seems like a fundamental first step in communicating about AI Risk, but only Carlsmith has really done it to an actionable level. Am I missing something here? Are community norms in AI Risk very different to community norms in health economics, which is where I usually do my modelling?)
Agreed on that as well. The Carlsmith report is the only quantitative model of AI risk I’m aware of, and it was the right call to do this analysis on it. I think we do have reasonably large error bars on its parameters (though perhaps smaller than an order of magnitude), which means your insight is important.
Why aren’t there more models? My guess is that it’s just very difficult, with lots of overlapping and entangled scenarios that are hard to tease apart. How would you go about constructing an overall x-risk estimate from a list of disjunctive risks? You can’t assume they’re independent events, and generating conditional probabilities for each seems challenging and not necessarily helpful.
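To illustrate why the independence assumption matters, here is a toy Monte Carlo (the per-pathway probabilities and the correlation are invented) comparing ‘multiply the complements’ against pathways that share a common driver but keep the same per-pathway probabilities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_worlds = 200_000
p_paths = np.array([0.03, 0.05, 0.08, 0.04])      # invented per-pathway probabilities
rho = 0.6                                         # invented correlation between pathways

# Naive aggregation: treat the pathways as independent and multiply complements.
p_independent = 1 - np.prod(1 - p_paths)

# Correlated aggregation: a single shared factor (a Gaussian copula) makes the
# pathways tend to occur or not occur together, while each pathway keeps exactly
# its marginal probability p_paths[i].
shared = rng.normal(size=(n_worlds, 1))
idiosyncratic = rng.normal(size=(n_worlds, len(p_paths)))
z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * idiosyncratic
occurs = z < norm.ppf(p_paths)                    # does pathway i happen in this world?
p_correlated = occurs.any(axis=1).mean()          # P(at least one pathway) across worlds

print(f"independent pathways: {p_independent:.3f}")
print(f"correlated pathways:  {p_correlated:.3f}")
```

With positively correlated pathways the bad outcomes cluster in the same worlds, so the chance that at least one occurs comes out below the independence calculation even though each individual pathway is exactly as likely; negatively related pathways would push it the other way.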
Ajeya Cotra’s BioAnchors report is another quantitative model that drives a lot of beliefs about AI timelines. Stephanie Lin won the EA Critique Contest with one critique of it, but I’d be curious whether you have other concerns with it.