One reason I might be finding this post uncomfortable is the chart it’s centered around.
The medical information is based on real people who have died recently. It’s a forecast based on counting. We can have a lot of confidence in those numbers.
In contrast, the AI numbers are trying to predict something that’s never happened before. It’s worth trying to predict, but the numbers are very different, and we can’t have much confidence in them, especially for any one particular year.
It feels kind of misleading to try to put these two very different kinds of numbers side by side as if they’re directly comparable.
I fairly strongly disagree with this take on two counts:
The life expectancy numbers are not highly robust. They naively extrapolate the current rate of death in the UK out into the future. This is a pretty dodgy methodology! I’m assuming that medical technology won’t improve, that AI won’t accelerate biotech research, that longevity research doesn’t go anywhere, that we don’t have disasters like a much worse pandemic or nuclear war, that there won’t be new major public health hazards that disproportionately affect young people, that climate change won’t substantially affect life expectancy in the rich world, that there won’t be wars major enough to affect life expectancy in the UK, etc. The one thing we know won’t happen in the future is the status quo. (A rough sketch of what this extrapolation amounts to is included below.)
I agree that this is less dodgy than the AI numbers, but it’s a difference of degree on a continuum, not some ontological divide between legit numbers and non-legit numbers.
Leaving that aside, I think it’s extremely reasonable to compare high confidence and low confidence numbers so long as they’re trying to measure the same thing. The key thing is that low confidence numbers aren’t low confidence in any particular direction (if they were, we’d change to a different estimate). Maybe the AI x-risk numbers are way higher, maybe they’re way lower. They’re definitely noisier, but the numbers mean fundamentally the same thing, and are directly comparable. And comparing numbers like this is part of the process of understanding the implications of your models of the future, even if they are fairly messy and uncertain models.
Of course, it’s totally reasonable to disagree with the models used for these questions and think that, e.g., they have major systematic biases towards exaggerating AI probabilities. That should just give you different numbers to put into this model.
As a concrete example, I’d like governments to be able to compare the risks of a nuclear war to their citizens’ lives against other, more mundane risks, and to figure out cost-effectiveness accordingly. Nuclear wars have never happened in anything remotely comparable to today’s geopolitical climate, so any models here will be inherently uncertain and speculative, but it seems pretty important to be able to answer questions like this regardless.
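To make the “naive extrapolation” in the first count concrete, here is a minimal sketch of the kind of calculation involved: take today’s age-specific annual death rates and compound them forward as if nothing about the world ever changes. The rates used are made-up placeholders, not real UK life-table data.

```python
# Minimal sketch of the "naive extrapolation" behind the natural-causes numbers:
# assume today's age-specific annual death rates hold forever.
# The rates below are hypothetical placeholders, not real UK life-table data.

current_age = 30
annual_death_rate = {age: 0.0005 * 1.1 ** (age - current_age)  # toy, roughly Gompertz-shaped
                     for age in range(current_age, 81)}

def prob_dead_by(age_limit: int) -> float:
    """P(die before age_limit | alive at current_age), holding today's rates fixed."""
    survival = 1.0
    for age in range(current_age, age_limit):
        survival *= 1 - annual_death_rate[age]
    return 1 - survival

print(f"P(die before 50): {prob_dead_by(50):.1%}")
print(f"P(die before 80): {prob_dead_by(80):.1%}")
```

Every assumption listed above (no medical progress, no new catastrophes, no new hazards) is baked into the single step of holding annual_death_rate fixed for fifty years.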
Re 2: the right way to compare high-confidence and low-confidence numbers is to add error bounds. This chart does not do that.
I disagree because I think error bounds over probabilities are less principled than a lot of people assume, and they can add a bunch of false confidence.
More false confidence than not mentioning error ranges at all?
Yes. Quantitative expression of credal resilience is complicated, there isn’t a widely agreed-upon formulation, and a lot of people falsely assume that error bounds on made-up probabilities are more “rigorous” or “objective” than the probabilities themselves.
The issue is that by combining (relatively) high-confidence and low-confidence estimates in your calculation, your resulting numbers should be low-confidence. For example, if your error bounds for AI risk span an order of magnitude in each direction (which is frankly insanely small for something this speculative), then the error bounds on your relative risk estimate would give you a value between 0.6% and 87%. With an error range like this, I don’t think the statement “my most likely reason to die young is AI x-risk” is justified.
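For concreteness, here is a rough sketch of the propagation being described. The central values (roughly 3.7% for dying young from AI x-risk and 5.3% for dying young of natural causes) are inferred from the arithmetic quoted later in this thread rather than read off the chart, so treat them as assumptions.

```python
# Rough sketch: how an order-of-magnitude error bound on the AI estimate
# propagates into the "share of early-death risk attributable to AI" figure.
# Central values are inferred from the arithmetic later in this thread
# (0.37 / (0.37 + 5.3)), i.e. roughly 3.7% AI risk and 5.3% natural causes;
# they are assumptions, not numbers read directly off the chart.

p_ai_central = 0.037  # assumed P(die young from AI x-risk)
p_natural = 0.053     # assumed P(die young from natural causes), held fixed here

def ai_share(p_ai: float) -> float:
    """Fraction of early-death probability attributable to AI."""
    return p_ai / (p_ai + p_natural)

for label, p_ai in [("low  (/10)", p_ai_central / 10),
                    ("central   ", p_ai_central),
                    ("high (x10)", p_ai_central * 10)]:
    print(f"{label}: AI share = {ai_share(p_ai):.1%}")
```

Note that the low end comes out near 6.5% rather than 0.6%, since the denominator shrinks along with the numerator; this is the calculation slip picked up in the nitpick further down.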
Hmm. I agree that these numbers are low confidence. But for the purpose of acting and forming conclusions from this, I’m not sure what you think is a better approach (beyond saying that more resources should be put into becoming more confident, which I broadly agree with).
Do you think I can never make statements like “low confidence proposition X is more likely than high confidence proposition Y”? What would feel like a reasonable criterion for being able to say that kind of thing?
More generally, I’m not actually sure what you’re trying to capture with error bounds—what does it actually mean to say that P(AI X-risk) is in [0.5%, 50%] rather than 5%? What is this a probability distribution over? I’m estimating a probability, not a quantity. I’d be open to the argument that the uncertainty comes from ‘what might I think if I thought about this for much longer’.
I’ll also note that the timeline numbers are a distribution over years, which already implicitly includes a bunch of uncertainty, plus some probability on AI never arriving. Though obviously it could include more. The figure for AI x-risk is a point estimate, which is much dodgier.
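To illustrate that structural point (and not the actual model behind the chart), a distribution over arrival years combined with a single conditional x-risk figure might look something like the sketch below; all of the probabilities are hypothetical placeholders.

```python
# Illustrative sketch only: a distribution over AGI arrival years combined with
# a single point estimate for P(death | AGI arrives). All probabilities are
# hypothetical placeholders, not the figures behind the actual chart.

agi_arrival_pmf = {            # P(AGI arrives in this 5-year bucket)
    2030: 0.15,
    2035: 0.20,
    2040: 0.20,
    2045: 0.15,
    2050: 0.10,
    "never_or_later": 0.20,    # the timeline distribution can put mass on "never"
}
p_death_given_agi = 0.10       # the point-estimate part: no spread of uncertainty

def p_die_from_ai_by(year: int) -> float:
    """P(die from AI x-risk by `year`): timeline uncertainty enters explicitly,
    the conditional risk enters only as a single number."""
    return sum(p * p_death_given_agi
               for bucket, p in agi_arrival_pmf.items()
               if isinstance(bucket, int) and bucket <= year)

print(f"P(die from AI by 2040): {p_die_from_ai_by(2040):.1%}")
print(f"P(die from AI by 2050): {p_die_from_ai_by(2050):.1%}")
```

The asymmetry being pointed at is visible here: the “when” part carries an explicit distribution, including mass on “never”, while the “how bad” part is a bare point estimate.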
And I’ll note again that the natural causes numbers are at best medium confidence, since they assume the status quo continues!
would give you a value between 0.6% and 87%

Nitpick: I think you mean roughly 6%? (0.37/(0.37+5.3) ≈ 0.065.) Obviously this doesn’t change your core point.
Do you think I can never make statements like “low confidence proposition X is more likely than high confidence proposition Y”? What would feel like a reasonable criterion for being able to say that kind of thing?

Honestly, yeah, I think it is weird to definitively state that wildly speculative thing X is more likely than well-known, well-studied thing Y (or, to put it differently, to do so when the error bounds on X are orders of magnitude different from the error bounds on Y). It might help if you provided a counterexample here? I think my objections might be partly about semantics: saying “X is more likely than Y” seems like smuggling certainty into a very uncertain proposition.
what does it actually mean to say that P(AI X-risk) is in [0.5%, 50%] rather than 5%

I think it more accurately conveys the state of knowledge about the situation, which is that you don’t know much at all.
(also, lol, fair point on the calculation error)