Probabilities of probabilities can make sense if you specify what they're over. Say the first level is the difficulty of the alignment problem, and the second is our actions. The betting odds on doom collapse to a single number, but you can still say meaningful things, e.g. if we think there's a 50% chance alignment is 1% x-risk and a 50% chance it's 99% x-risk, then tractability is probably low either way (e.g. if you think the success curve is logistic in effort).
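To make the logistic-tractability point concrete, here is a minimal sketch (the specific difficulty values are my own illustrative numbers, not anything from the comment): with a logistic success-vs-effort curve, a marginal unit of effort barely moves the success probability in either a very easy or a very hard world; only mid-difficulty worlds are tractable.

```python
import math

def p_success(effort, difficulty):
    # Logistic curve: success probability as a function of effort minus difficulty.
    return 1 / (1 + math.exp(-(effort - difficulty)))

effort = 0.0
for difficulty in (-5.0, 0.0, 5.0):  # easy, medium, hard world (illustrative values)
    p = p_success(effort, difficulty)
    gain = p_success(effort + 0.1, difficulty) - p  # effect of a bit more effort
    print(f"difficulty={difficulty:+.0f}: p={p:.3f}, gain from extra effort={gain:.4f}")
# The middle world gains roughly 0.025; the easy and hard worlds gain under 0.001,
# because the logistic derivative p * (1 - p) is tiny when p is near 0 or 1.
```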
You are probably right that in some cases probabilities of probabilities can contain further information. On reflection, I probably should not have objected to having probabilities of probabilities, because whether you collapse them immediately or later does not change the probabilities, and I should have focused on the arguments that actually change the probabilities.
That said, I still have trouble parsing "there's a 50% chance alignment is 1% x-risk and a 50% chance it's 99% x-risk", and how it differs from saying "there's a 50% chance alignment is 27% x-risk and a 50% chance it's 73% x-risk". Can you explain the difference? They feel the same to me. (Maybe you want to gesture at something like "If we expend more thinking effort, we will figure out whether we live in a 1% x-risk world or a 99% x-risk world, but after we figure that out, further thinking will not move our probabilities away from 1% or 99%", but I am far from sure that this is something you want to express here.)
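To put the comparison in concrete terms (numbers taken from the comment): both mixtures do collapse to the same betting odds, so the only structural difference I can see is the spread across worlds, i.e. how far from 50% you could end up once the first-level uncertainty is resolved.

```python
mix_a = [(0.5, 0.01), (0.5, 0.99)]  # 50% chance of a 1% world, 50% of a 99% world
mix_b = [(0.5, 0.27), (0.5, 0.73)]

def collapse(mix):
    # Law of total probability: the single number you would bet at today.
    return sum(w * p for w, p in mix)

def spread(mix):
    # Standard deviation across worlds: how far resolving the first level could move you.
    m = collapse(mix)
    return sum(w * (p - m) ** 2 for w, p in mix) ** 0.5

print(collapse(mix_a), collapse(mix_b))  # both 0.5
print(spread(mix_a), spread(mix_b))      # 0.49 vs 0.23
```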
If you want to make an argument about tractability, in my view that would require a different model, one that could make statements like "X amount of effort would change the probability of catastrophe from 21% to 16%". Of course, that tractability model could reuse the un-collapsed probabilities of the model for estimating x-risk.
I don't know if a rough analogy might help, but imagine you just bought a house. The realtor warns you that some houses in this neighbourhood have faulty wiring, and your house might randomly catch fire during the 5 years or so you plan to live in it (that is, there is a 10% or whatever chance per year that the house catches fire). There are certain precautions you might take, like investing in a fire blanket and making sure your emergency exits are always clear, but principally buying very good home insurance, at a very high premium.
Imagine then you meet a builder in a bar and he says, "Oh yes, Smith was a terrible electrician, and any house Smith built has faulty wiring, giving it a 50% chance of fire each year. If Smith didn't do your wiring then it is no riskier than any other house, maybe 1% per year". You don't actually live in a house with a 10% risk; you live in a house with either a 1% or a 50% risk. Each of those houses necessitates a different strategy: in the low-risk house you can basically take no action, and save money on the premium insurance; in the high-risk house you want to sell immediately (or replace the wiring completely). One important thing you would want to do straight away is find out whether Smith or Jones built your house, which was irrelevant information in the first situation, before you met the builder in the bar, where you implicitly had perfect certainty. You might also reason inductively: "I saw a fire this year, so it is highly likely I live in a home that Smith built, so I am going to sell at a loss to avoid the fire which will inevitably happen next year" (compared to the first situation, where you would just conclude you were unlucky).
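The inductive step in the house story can be worked through with Bayes' rule. The 50%/1% yearly fire rates are from the comment; the 50/50 prior on "Smith built it" is my assumption, since the story does not specify one.

```python
p_smith = 0.5                               # assumed prior that Smith built the house
p_fire_given_smith = 0.5                    # yearly fire chance in a Smith-built house
p_fire_given_jones = 0.01                   # yearly fire chance otherwise

# Before any fire: marginal yearly risk, collapsing over the two hypotheses.
prior_risk = p_smith * p_fire_given_smith + (1 - p_smith) * p_fire_given_jones

# After observing one fire, Bayes' rule updates the builder hypothesis...
posterior_smith = (p_smith * p_fire_given_smith) / prior_risk

# ...and with it, next year's fire risk.
next_year_risk = (posterior_smith * p_fire_given_smith
                  + (1 - posterior_smith) * p_fire_given_jones)

print(f"P(Smith | fire) = {posterior_smith:.3f}")             # ~0.980
print(f"next-year risk: {prior_risk:.3f} -> {next_year_risk:.3f}")  # 0.255 -> 0.490
```

Under the flat "10% per year" model, by contrast, observing a fire teaches you nothing: next year's risk stays 10%, and you can only conclude you were unlucky.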
I totally agree with your final paragraph: to actually do anything with the information that there is an asymmetrically distributed ex post AI Risk requires a totally different model, and this is not an essay about what to actually do about AI Risk. However, hopefully this comment gives a sketch of what might be accomplished when such a model is designed and deployed.
I'm not sure that this responds to the objection. Specifically, I think we would need to clarify what is meant by "risk" here. It sounds like what you're imagining is having credences over objective chances. The typical case of that would be not knowing whether a coin is biased, where the biased coin would have (say) a 90% chance of heads, and having a credence about whether the coin is biased. In such a case the hypotheses would be chance-statements, and it does make sense to have credences over them.
However, it's unclear to me whether we can view either the house example or AGI risk as involving objective chances. The most plausible interpretation of an objective chance usually involves a fairly clear stochastic causal mechanism (and some would limit real chances to quantum events). But if we don't want to allow talk of objective chances, then all the evidence you receive about Smith's electrical skills, and the probability that he built the house, is just more evidence to conditionalize your credences on, which will leave you with a new final credence over the proposition we ultimately care about: whether your house will burn down. If so, the levels wouldn't make sense, I think, and you should just multiply through.
I'm not sure how this affects the overall method and argument, but I do wonder whether it would be helpful to be more explicit about what is on the respective axes of the graphs (e.g. the first bar chart), and about what exactly is meant by risk, to avoid the risk of equivocation.