(Epistemic status: I have only read parts of the article and skimmed other parts.)
The fundamental thing I am confused about is that the article seems to frequently use probabilities of probabilities (without collapsing these probabilities).
In my worldview, probabilities of probabilities are not a meaningful concept, because they immediately collapse.
Let me explain what I mean by that:
If you assign 40% probability to the statement “there is a 70% probability that Biden will be reelected”
and 60% probability to the statement “there is a 45% probability that Biden will be reelected”, then
you have a 55% probability that Biden will be reelected (because 0.4 · 0.7 + 0.6 · 0.45 = 0.55).
Probabilities of probabilities can be intermediate steps, but they collapse into single probabilities.
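(Purely to spell out the arithmetic, this is just the law of total probability applied to the example above:)

```python
# Collapse a distribution over probabilities into a single probability
# by taking the expectation (law of total probability).
meta_beliefs = [
    (0.40, 0.70),  # 40% credence that P(Biden reelected) = 70%
    (0.60, 0.45),  # 60% credence that P(Biden reelected) = 45%
]
p_reelected = sum(credence * prob for credence, prob in meta_beliefs)
print(p_reelected)  # 0.55
```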
There is one case where this issue directly influences the headline result of 1.6%.
You report intermediate results such as “There is a 13.04% chance we live in a world with low risk from 3% to 7%” (irrelevant side remark: in the context of x-risk, I would consider 5% as very high, not low),
or “There is 7.6% chance that we live in a world with >35% probability of extinction”.
The latter alone should set a lower bound of 2.66% (0.076 * 0.35 = 0.0266) for the probability of extinction!
Taking the geometric mean in this instance seems wrong to me; the mathematically correct thing would be to take the arithmetic mean when aggregating these probabilities.
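(To spell out the lower-bound arithmetic — this uses only the single >35% bucket quoted above, not the full distribution from the article:)

```python
# If with probability 0.076 we are in a world where the extinction risk is at least 0.35,
# then E[risk] >= 0.076 * 0.35, no matter what the remaining probability mass looks like.
p_high_risk_world = 0.076   # "7.6% chance that we live in a world with >35% probability of extinction"
risk_in_that_world = 0.35
lower_bound = p_high_risk_world * risk_in_that_world
print(lower_bound)  # 0.0266, i.e. 2.66%, already above the 1.6% headline figure
```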
I have not read the SDO paper in detail, but I have doubts that the SDO method applies to the present scenario/model of xrisk.
You quote Scott Alexander:
Imagine we knew God flipped a coin. If it came up heads, He made 10 billion alien civilizations. If it came up tails, He made none besides Earth. Using our one parameter [equation], we determine that on average there should be 5 billion alien civilizations. Since we see zero, that’s quite the paradox, isn’t it?
No. In this case the mean is meaningless. It’s not at all surprising that we see zero alien civilizations, it just means the coin must have landed tails.
I note that this quote fits perfectly fine for analysing the supposed Fermi Paradox,
but it fits badly whenever you have uncertainty over probabilities. If God flips a coin to decide whether we have a 3% or a 33% probability of extinction, the result is 18%, and taking the mean is perfectly fine.
I would like to ask the author:
1. What are your probabilities for the questions from the survey?
2. What is the product of these probabilities?
3. Do you agree that multiplying these conditional probabilities is correct under the model, or at least gives a lower bound on the probability of AGI existential catastrophe? Do you agree with the mathematical inequality P(A) ≥ P(A|B)⋅P(B)? (A one-line derivation follows this list.)
4. Is the result from 2. approximately equal to 1.6%, or below 3%?
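(For completeness, the one-line derivation of the inequality in question 3 — nothing beyond the law of total probability:)

$$P(A) = P(A \cap B) + P(A \cap B^{c}) \ge P(A \cap B) = P(A \mid B)\, P(B).$$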
I think if the author accepts 2. + 3. + 4. (which I think they will), they have to give probabilities that are significantly lower than those of many survey respondents.
I do concede that there is an empirical question whether it is better to aggregate survey results about probabilities using the arithmetic mean or the geometric mean; the geometric mean would lead to lower results (closer in line with parts of this analysis) in certain models.
TLDR: I believe the author takes geometric means of probabilities when they should take the arithmetic mean.
Probabilities of probabilities can make sense if you specify what they’re over. Say the first level is the difficulty of the alignment problem, and the second one is our actions. The betting odds on doom collapse, but you can still say meaningful things, e.g. if we think there’s a 50% chance alignment is 1% x-risk and a 50% chance it’s 99% x-risk, then the tractability is probably low either way (e.g. if you think the success curve is logistic in effort).
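(A rough, purely illustrative sketch of the logistic point — the slope of a logistic success curve is proportional to p·(1−p), so marginal effort matters little when the probability is near 0 or 1; the specific curve and values here are my own illustration:)

```python
# Marginal effect of effort on a logistic "success curve" at various x-risk levels.
# The derivative of a logistic at success probability p is proportional to p * (1 - p).
for x_risk in (0.01, 0.27, 0.73, 0.99):
    p_success = 1.0 - x_risk
    marginal = p_success * (1.0 - p_success)   # slope of the logistic at that point (up to a constant)
    print(f"x-risk {x_risk:.0%}: marginal effect of effort ~ {marginal:.3f}")
```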
You are probably right that in some cases probabilities of probabilities can contain further information. On reflection, I probably should not have objected to having probabilities of probabilities, because whether you collapse them immediately or later does not change the probabilities, and I should have focused on the arguments that actually change the probabilities.
That said, I still have trouble parsing “there’s a 50% chance alignment is 1% x-risk and a 50% chance it’s 99% x-risk”, and how it would be different from saying “there’s a 50% chance alignment is 27% x-risk and a 50% chance it’s 73% x-risk”. Can you explain the difference? They feel the same to me. (Maybe you want to gesture at something like “If we expend more thinking effort, we will figure out whether we live in a 1% x-risk world or a 99% x-risk world, but after we figure that out, further thinking will not move our probabilities away from 1% or 99%”, but I am far from sure that this is something you want to express here.)
If you want to make an argument about tractability, in my view that would require a different model, which could then make statements like “X amount of effort would change the probability of catastrophe from 21% to 16%”. Of course, that tractability model could reuse the un-collapsed probabilities from the model for estimating x-risk.
I don’t know if a rough analogy might help, but imagine you just bought a house. The realtor warns you that some houses in this neighbourhood have faulty wiring, and your house might randomly catch fire during the 5 years or so you plan to live in it (that is, there is a 10% or whatever chance per year that the house catches fire). There are certain precautions you might take, like investing in a fire blanket and making sure your emergency exits are always clear, but principally buying very good home insurance, at a very high premium.
Imagine then you meet a builder in a bar and he says, “Oh yes, Smith was a terrible electrician and any house Smith built has faulty wiring, giving it a 50% chance of fire each year. If Smith didn’t do your wiring then it is no more risky than any other house, maybe 1% per year”. You don’t actually live in a house with a 10% risk, you live in a house with a 1% or 50% risk. Each of those houses necessitates a different strategy—in the low-risk house you can basically take no action, and save money on the premium insurance. In the high-risk house you want to basically sell immediately (or replace the wiring completely). One important thing you would want to do straight away is discover whether Smith or Jones built your house, which is irrelevant information in the first situation before you met the builder in the bar, where you implicitly have perfect certainty. You might reason inductively—“I saw a fire this year, so it is highly likely I live in a home that Smith built, so I am going to sell at a loss to avoid the fire which will inevitably happen next year” (compared to the first situation, where you would just reason that you were unlucky).
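(To put rough numbers on that inductive step — the 50/50 prior on “Smith built it” is invented purely for illustration:)

```python
# Bayesian update after seeing one fire, using the per-year fire chances from the story
# (50% if Smith did the wiring, 1% otherwise) and an invented 50/50 prior on Smith.
prior_smith = 0.50
p_fire_given_smith = 0.50
p_fire_given_other = 0.01

p_fire = prior_smith * p_fire_given_smith + (1 - prior_smith) * p_fire_given_other
posterior_smith = prior_smith * p_fire_given_smith / p_fire
print(posterior_smith)  # ~0.98: after one fire, "Smith built it" becomes very likely
```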
I totally agree with your final paragraph—to actually do anything with the information that there is an asymmetrically distributed ex post AI Risk requires a totally different model. This is not an essay about what to actually do about AI Risk. However, hopefully this comment gives a sketch of what might be accomplished when such a model is designed and deployed.
I’m not sure that this responds to the objection. Specifically, I think that we would need to clarify what is meant by ‘risk’ here. It sounds like what you’re imagining is having credences over objective chances. The typical case of that would be not knowing whether a coin was biased or not, where the biased one would have (say) 90% chance of heads, and having a credence about whether the coin is biased. In such a case the hypotheses would be chance-statements, and it does make sense to have credences over them.
However, it’s unclear to me whether we can view either the house example or AGI risk as involving objective chances. The most plausible interpretation of an objective chance usually involves a pretty clear stochastic causal mechanism (and some would limit real chances to quantum events). But if we don’t want to allow talk of objective chances, then all the evidence you receive about Smith’s skills as an electrician, and the probability that he built the house, is just more evidence to conditionalize your credences on, which will leave you with a new final credence over the proposition we ultimately care about: whether your house will burn down. If so, the levels wouldn’t make sense, I think, and you should just multiply through.
I’m not sure how this affects the overall method and argument, but I do wonder whether it would be helpful to be more explicit about what is on the respective axes of the graphs (e.g. the first bar chart), and about what exactly is meant by risk, to avoid the risk of equivocation.
I’m not an AI Risk expert, so any answer I gave to 1 would just be polluting. Let’s say my probabilities are A and B for a two-parameter Carlsmith Model, and those parameters could be 3% or 33% as per your example. The simple mean for A is (3% + 33%)/2 = 18%, and the same for B, so the simple-mean estimate of the combined risk is 0.18 × 0.18 ≈ 3%. The geometric mean is more like 1%.
The most important point I wanted to get across is that the distribution of probabilities can be important in some contexts. If something important happens to our response at a 1% risk threshold, then it is useful to know that the risk is below 1% in 3⁄4 of all possible worlds (i.e. the worlds where A or B is at 3%). In the essay I argue that since strategies for living in a low-risk world are likely to be different from strategies for living in a high-risk world (and both sets of strategies are likely to be different from the optimal strategy if we live in a simple-mean medium-risk world), the distribution is what matters.
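(If it is useful, here is the arithmetic behind those numbers, treating the four (A, B) combinations as equally likely worlds — a sketch of the toy example above rather than the essay’s actual model:)

```python
from itertools import product
from math import prod

values = [0.03, 0.33]                                  # each parameter is 3% or 33%, equally likely
worlds = [a * b for a, b in product(values, values)]   # combined risk in each of the 4 worlds

simple_mean = sum(worlds) / len(worlds)                # ~0.032, i.e. 0.18 * 0.18
odds = [w / (1 - w) for w in worlds]
gmo = prod(odds) ** (1 / len(odds))
geo_mean_of_odds = gmo / (1 + gmo)                     # ~0.010, i.e. roughly 1%
share_below_1_percent = sum(w < 0.01 for w in worlds) / len(worlds)   # 3 of the 4 worlds

print(simple_mean, geo_mean_of_odds, share_below_1_percent)
```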
If we agree about that (which I’m not certain we do—I think possibly you are arguing that you can and should always reduce probabilities-of-probabilities to just probabilities?), then I don’t really have a strong position on your other point about geometric mean of odds vs simple mean. The most actionable summary statistic depends on the context. While I think geometric mean of odds is probably the correct summary statistic for this application, I accept that there’s an argument to be had on the point.
I’m not an AI Risk expert, so any answer I gave to 1 would just be polluting
I can understand if you don’t want to state those probabilities publicly. But then I don’t know how to resolve what feels to me like an inconsistency. I think you have to bite one of these two bullets:
1. Most survey respondents are wrong in (some or most of) their probabilities for the “Conditional on …” questions, and your best guess at (some or most of) these probabilities is much lower.
2. The probability of AGI catastrophe conditional on AGI being invented is much higher than 1.6%.
Which one is it? Or is there a way to avoid both bullets while having consistent beliefs (then I would probably need concrete probabilities to be convinced)?
Hmm… I don’t see a contradiction here. I note you skimmed some of the methods, so it might perhaps help explain the contradiction to read the second half of section 3.3.2?
The bullet I bite is the first—most survey respondents are wrong, because they give point probabilities (which is what I asked for, in fairness) whereas in reality there will be uncertainty over those probabilities. Intuitively we might think that this uncertainty doesn’t matter because it will ‘cancel out’ (i.e. every time one estimate is uncertain in a low direction relative to the truth, another is uncertain in a high direction), but in reality—given specific structural assumptions in the Carlsmith Model—this is not true. In reality, the low-end uncertainty compounds and the high-end uncertainty is neutered, which is why you end up with an asymmetric distribution favouring very low-risk outcomes.
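(A minimal Monte Carlo sketch of that compounding effect — the number of parameters and the per-parameter uncertainty here are invented for illustration and are not the numbers from the essay:)

```python
import random
import statistics

random.seed(0)

# Six conditional probabilities, each independently and equally likely to be 5% or 50%,
# multiplied together as in a Carlsmith-style chain.
def sample_risk(n_params=6):
    risk = 1.0
    for _ in range(n_params):
        risk *= random.choice([0.05, 0.50])
    return risk

samples = [sample_risk() for _ in range(100_000)]
print("mean risk:  ", statistics.mean(samples))    # pulled up by the rare high-end draws
print("median risk:", statistics.median(samples))  # much lower: most sampled worlds end up low-risk
```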
Thanks for biting a bullet; I think I am making progress in understanding your view.
I also realized that part of my “feeling of inconsistency” comes from not having realized that the table in section 3.2 reports the geometric mean of odds rather than the average, and that the average would be lower.
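(As a side note on that aggregation question, a tiny sketch with invented respondent numbers — which of the two aggregates is lower depends on how the responses are spread:)

```python
from math import prod

def geometric_mean_of_odds(probs):
    gmo = prod(p / (1 - p) for p in probs) ** (1 / len(probs))
    return gmo / (1 + gmo)

# Two invented sets of respondent probabilities (not the actual survey data).
for responses in ([0.02, 0.10, 0.30, 0.60, 0.90], [0.70, 0.90, 0.99]):
    arithmetic_mean = sum(responses) / len(responses)
    print(arithmetic_mean, geometric_mean_of_odds(responses))
```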
Let’s say we have a 2-parameter Carlsmith model, where we estimate the probabilities P(A) and P(B|A) in order to get a final estimate of the probability P(A∩B). Let’s say we have uncertainty over our probability estimates, and we estimate P(A) using a random variable X and P(B|A) using a random variable Y. To make the math easier, I am going to assume that X and Y are discrete (I can repeat this for a more general case, e.g. using densities, if requested): we have $k$ possible estimates $a_i$ for P(A), and $p_i := P(X = a_i)$ is the probability that X assigns the value $a_i$ to our estimate of P(A). Similarly, the $b_j$ are estimates for P(B|A) that Y outputs with probability $q_j := P(Y = b_j)$. We also have $\sum_{i=1}^{k} p_i = \sum_{j=1}^{k} q_j = 1$.
Your view seems to be something like “To estimate P(A∩B), we should sample from X and Y, and then compute the geometric mean of odds for our final estimate.”
Sampling from X and Y, we get the value $a_i b_j$ with probability $p_i q_j$, and taking the geometric mean of odds would then result in an estimate of P(A∩B) given, in odds form, by
$$\frac{P(A \cap B)}{1 - P(A \cap B)} = \prod_{i=1}^{k} \prod_{j=1}^{k} \left( \frac{a_i b_j}{1 - a_i b_j} \right)^{p_i q_j}.$$
Whereas my view is “We should first collapse the probabilities by taking the mean, and then multiply”; that is, we first calculate $P(A) = \sum_{i=1}^{k} a_i p_i$ and $P(B \mid A) = \sum_{j=1}^{k} b_j q_j$, for a final formula of
$$P(A \cap B) = \left( \sum_{i=1}^{k} a_i p_i \right) \left( \sum_{j=1}^{k} b_j q_j \right).$$
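(For concreteness, a small numerical comparison of the two formulas with made-up values, reusing the 3%/33% example with k = 2 and equal weights:)

```python
from itertools import product

a, p = [0.03, 0.33], [0.5, 0.5]   # possible values of P(A) and their probabilities p_i
b, q = [0.03, 0.33], [0.5, 0.5]   # possible values of P(B|A) and their probabilities q_j

# Formula 1: geometric mean of odds over the sampled products a_i * b_j.
gmo = 1.0
for (ai, pi), (bj, qj) in product(zip(a, p), zip(b, q)):
    gmo *= (ai * bj / (1 - ai * bj)) ** (pi * qj)
estimate_geo = gmo / (1 + gmo)                          # ~0.010

# Formula 2: collapse each estimate to its mean first, then multiply.
estimate_mean = sum(ai * pi for ai, pi in zip(a, p)) * sum(bj * qj for bj, qj in zip(b, q))
# = 0.18 * 0.18 ~ 0.032

print(estimate_geo, estimate_mean)
```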
And you are also saying “P(A∩B) = P(A)⋅P(B|A) is still true, but the above naive estimates for P(A) and P(B|A) are not good, and should actually be different (and lower than those of typical survey respondents in the case of AI x-risk estimates).” (I can’t derive a precise formula from your comments or my skim of the article, but I don’t think that’s a crucial issue.)
Do I characterize your view roughly right? (Not saying that is your whole view, just parts of it).
I think there are problems with this approach.