Hey, thank you for taking the time to explain your position; I appreciate it. I’m not trying to take a dig at your estimates in particular; this is just part of my broader suspicion that EA as a whole has widespread flaws in estimating unbounded probabilities.
Let’s start from your six questions again:
1. Timelines: By 2070, it will be possible and financially feasible to build APS-AI: systems with advanced capabilities (they outperform humans at tasks important for gaining power), agentic planning (they make plans and then act on them), and strategic awareness (their plans are based on models of the world good enough to overpower humans).
2. Incentives: There will be strong incentives to build and deploy APS-AI.
3. Alignment difficulty: It will be much harder to build APS-AI systems that don’t seek power in unintended ways than to build ones that would seek power but are superficially attractive to deploy.
4. High-impact failures: Some deployed APS-AI systems will seek power in unintended and high-impact ways, collectively causing >$1 trillion in damage.
5. Disempowerment: Some of this power-seeking will, in aggregate, permanently disempower all of humanity.
6. Catastrophe: The disempowerment will constitute an existential catastrophe.
So starting with the problems of decomposition: You’re right that the errors in answers to the six questions will be correlated, so treating them like independent events can lead to compounded error in the final estimate. But you seem to imply that this necessarily causes an underestimation, when to my mind it’s just as likely to cause an overestimation. If there were a systematic bias that caused you to double the estimate at each step, the final figure would overstate the true value by a factor of 2^6 = 64, roughly two orders of magnitude.
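To make that arithmetic concrete, here’s a toy sketch (the six “true” values are invented purely for illustration, not anyone’s actual estimates):

```python
# Toy illustration: if each of six conjunctive estimates is inflated by a factor of 2,
# the headline product is inflated by 2**6 = 64, i.e. roughly two orders of magnitude.
# The "true" probabilities below are invented for the example.
from math import prod

true_probs = [0.3, 0.4, 0.25, 0.35, 0.2, 0.45]
biased_probs = [2 * p for p in true_probs]  # each step doubled (all still below 1 here)

print(prod(biased_probs) / prod(true_probs))  # 64.0
```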
For example, one source of correlation would be in how powerful you expect an AI to actually be. Hypothetically, if there turned out to be serious obstacles in power-seeking AI development that limited it to being “very good” instead of “overpowering”, it would simultaneously lower the odds for steps 1, 2, 4, and 5. (Of course, the opposite would apply if it turned out to be even stronger than expected.)
Anyway, this partially explains why the estimates are similar, but I don’t think it lets you off the hook entirely, as they aren’t perfectly correlated. For example, question 3 is fundamentally about the difficulty of coding goal functions into software, while question 5 is fundamentally about the potential capabilities of AI and the ability of society to resist an AI takeover. It still seems weird that you arrived at pretty much the same probability for both of them.
I think a really cool exercise would be doing a bunch of different decompositions of AI x-risk, making forecasts on them, then trying to reconcile the differences in results.
I would be quite interested in this! I think the way you split it can have a significant effect on what seems intuitive. For example, I believe that it will be very hard to program an AI that is not misaligned in some damaging way, but very easy to program one that is not an x-risk or s-risk threat. This objection doesn’t really jibe with the 6 questions above: it might look like high estimates for questions 1-4, then a suddenly much lower probability for question 5. It seems like the choice of decomposition will depend on what you think the “key questions” are.
I try to be as transparent in my reasoning as possible, though I think open-ended forecasting of this sort will always involve some level of “quantifying vibes”. I strongly disagree that this is a reason to avoid it, but agree it’s a reason not to take it too seriously. If it’s at all reassuring, I’ve written about my track record on quantifying vibes and I think it’s decent overall.
Well done on the impressive forecasting performance! I’m certainly not against forecasting in general, but I do have concerns about forecasting low-probability and unbounded-probability events. I’m not convinced that expertise at forecasting things with lots of data and evidence surrounding them, such as the inflation rate next quarter, will transfer to a question like “what is the probability that the universe is a simulation and will shut down within the next century?” The former seems mostly evidence-based, with a little vibes thrown in, while the latter is almost entirely vibes-based, with only the barest hint of evidence thrown in. I view AI safety as somewhere in between the two.
Anyway, I’ll probably be spinning this off into a whole post, so discussion is highly welcome!
tl;dr I agree with a decent amount of this. I’d guess our disagreements are mainly on the object-level arguments for and against AI risk.
So starting with the problems of decomposition: You’re right that the errors in answers to the six questions will be correlated, so treating them like independent events can lead to compounded error in the final estimate. But you seem to imply that this necessarily causes an underestimation, when to my mind it’s just as likely to cause an overestimation. If there were a systematic bias that caused you to double the estimate at each step, the final figure would overstate the true value by a factor of 2^6 = 64, roughly two orders of magnitude.
I agree that correlations in particular could cause overestimation rather than underestimation, and didn’t mean to imply otherwise.
My primary point above was not about correlations between premises, though; it was about the process of taking a claim we think might warrant substantial credence and splitting it into several steps that are all conjunctive (and then assigning each step a probability, where people are often reluctant to seem overconfident).
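A quick toy illustration of that worry (the 80% per-step cap is an arbitrary number of my own, used only to show the shape of the effect):

```python
# If a forecaster is reluctant to put any single step above ~80% (to avoid seeming
# overconfident), a six-step conjunctive decomposition caps the headline probability
# at 0.8**6, no matter how high their holistic credence in the overall claim is.
cap_per_step = 0.8  # arbitrary illustrative cap, not anyone's actual estimate
print(cap_per_step ** 6)  # ~0.26
```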
Anyway, this partially explains why the estimates are similar, but I don’t think it lets you off the hook entirely, as they aren’t perfectly correlated. For example, question 3 is fundamentally about the difficulty of coding goal functions into software, while question 5 is fundamentally about the potential capabilities of AI and the ability of society to resist an AI takeover. It still seems weird that you arrived at pretty much the same probability for both of them.
I’m a little confused here; it seems like since I gave 6 probabilities, it would on the contrary be surprising if 2 of them weren’t pretty close to each other, even the more uncorrelated ones?
I would be quite interested in this! I think the way you split it can have a significant effect on what seems intuitive. For example, I believe that it will be very hard to program an AI that is not misaligned in some damaging way, but very easy to program one that is not an x-risk or s-risk threat. This objection doesn’t really jibe with the 6 questions above: it might look like high estimates for questions 1-4, then a suddenly much lower probability for question 5. It seems like the choice of decomposition will depend on what you think the “key questions” are.
Thanks for linking your reasoning for thinking it might be easy to create an AI that isn’t an x-risk or s-risk threat. I skimmed it and agree with the top comment: I think the strategy you described would strongly limit the capabilities of the AI system even if (and I think it’s a big if) it succeeded at alignment. I think we need an alignment solution that doesn’t impose as big a capabilities penalty.
I think it’s fine if you have a much lower probability for one of the questions in the decomposition than others! I don’t see what’s inherently wrong with that.
I’m not convinced that expertise at forecasting things with lots of data and evidence surrounding them, such as the inflation rate next quarter, will transfer to a question like “what is the probability that the universe is a simulation and will shut down within the next century?” The former seems mostly evidence-based, with a little vibes thrown in, while the latter is almost entirely vibes-based, with only the barest hint of evidence thrown in. I view AI safety as somewhere in between the two.
I agree with you directionally, but I’d argue that my track record includes many questions that don’t have much data and evidence behind them, especially in comparison to your example about the inflation rate. As a quick example, I did well in the Salk Tournament for SARS-CoV-2 Vaccine R&D. For many of those questions we did have some past data to go on, but not much, and the right choice of reference class was far less clear than in your inflation example.
Anyway, I’ll probably be spinning this off into a whole post, so discussion is highly welcome!
Looking forward to it, let me know if feedback would be useful :)
Yeah, no worries! I think this is helping me figure out what my issue is, which I think is related to what probability ranges are “reasonable”.
I’m a little confused here; it seems like since I gave 6 probabilities, it would on the contrary be surprising if 2 of them weren’t pretty close to each other, even the more uncorrelated ones?
That’s the thing: I do think it’s surprising. When we are talking about speculative events, the range of probabilities should be enormous. If I estimate the odds that Vladimir Putin is killed by a freak meteor strike, the answer is not in the 1-99% range; it’s in the 1 in a billion range. What are the odds that Paraguay becomes a world superpower in the next 50 years? 1 in a million? 1 in a trillion? Conversely, what are the odds that the sun will rise on the Earth in 2070? About as close to 1 as it’s possible for an estimate to get.
When we consider question 5, we are asking about the winner of a speculative war between a society that we are very uncertain about and an AI that we know next to nothing about. In question 3, you are asking for an estimate of the competence of future AI engineers at constraining an as-yet-unknown AI design. In both cases, I see the “reasonable range” of estimates as being extremely broad. I would not be surprised if the “true estimate” (if that’s even a coherent concept) for Q5 was 1 in a billion, or 1 in a thousand, or nearly 1 in 1. This is what strikes me as off: out of all the possible answers for these highly speculative questions in logarithmic space, why would both of them end up in the 75-85% range?
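Here’s a toy model of that intuition. The log-uniform spread (from 1 in a billion up to near 1) is an arbitrary stand-in of my own for “the reasonable range is enormous”, used only to gesture at how surprising a coincidence in a narrow band would be:

```python
# Toy model: if "reasonable" answers to a speculative question were spread log-uniformly
# from 1e-9 to 1, how often would two independent answers both land in the 75-85% band?
# The prior here is an arbitrary illustrative choice, not a real model of these questions.
import random

def draw():
    return 10 ** random.uniform(-9, 0)  # log-uniform over [1e-9, 1)

trials = 1_000_000
hits = 0
for _ in range(trials):
    a, b = draw(), draw()
    if 0.75 <= a <= 0.85 and 0.75 <= b <= 0.85:
        hits += 1

print(hits / trials)  # roughly 3-4 per 100,000, i.e. about 1 in 30,000
```

The choice of prior is doing all the work here, of course; the point is just that agreement in such a narrow band would be a striking coincidence under any prior spread over many orders of magnitude.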
Consider, by contrast, your COVID-19 predictions. These seem to be bounded in a way that the examples above aren’t. There was uncertainty about whether the vaccine would be rolled out in 2021 or 2023, and perhaps you could make a reasonable case for predicting it would take until 2025. But if I gave an answer of “2112 AD”, you would look at me like a crazy person. It seems like the AI estimates are unbounded in their ranges in a way the Metaculus questions aren’t.
This is where the object level and the meta level get kinda hard to untangle. I think if you accept my meta-level reasoning, it also necessitates lowering your object-level estimates. If you try to make the estimates for the individual steps vary more (by putting a 0.1% in there or something), the total probability will end up being as low as that step. But I’m not sure this is necessarily wrong? If your case relies on a chain of at least somewhat independent, unbounded, speculative events, placing odds as high as 40% seems like an error on its face.
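For example (with made-up numbers), a single low step dominates the whole chain:

```python
# Sketch: a conjunctive chain's product can never exceed its smallest factor, so a single
# 0.1% step drags the headline number down to (at most) that level. Numbers are invented.
from math import prod

steps = [0.8, 0.85, 0.75, 0.9, 0.001, 0.8]
print(prod(steps))  # ~0.00037
print(min(steps))   # 0.001 -- an upper bound on the product
```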
The way I think about what range of probabilities is reasonable is mostly by considering reference classes for (a) the object-level prediction being made and (b) the success rate of relatively similar predictions in the past. I agree that, a priori, we’d expect to have little confidence in most claims that feel very speculative, but I think we can get a lot of evidence from considering more specific reference classes.
Let’s take the example of determining whether AI would disempower humanity:
For (a), I think looking at the reference class of “do more intelligent entities disempower less intelligent entities? (past a certain level of intelligence)” is reasonable and would give a high baseline (one could then adjust down from the reference-class forecast based on how strong the considerations are that we will potentially be able to see it coming to some extent, prepare in advance, etc.).
For (b), I think a reasonable reference class would be previous long-term speculative forecasts made by futurists. My read is that these were right about 30-50% of the time.
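Purely to illustrate the shape of the (a) “reference-class baseline, then adjust down” move, here is a sketch; every number and the functional form are placeholders of my own, not my actual estimates:

```python
# Hypothetical sketch of "start from a reference-class base rate, then adjust down for
# mitigating considerations". All numbers are illustrative placeholders.
base_rate = 0.9           # placeholder baseline from the "more intelligent entities
                          # disempower less intelligent ones" reference class
p_see_it_coming = 0.6     # placeholder: chance we notice the problem in time
p_prep_prevents_it = 0.5  # placeholder: chance that preparation, given warning, works

adjusted = base_rate * (1 - p_see_it_coming * p_prep_prevents_it)
print(adjusted)  # 0.63 with these placeholder numbers
```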
In both cases, I see the “reasonable range” of estimates as being extremely broad. I would not be surprised if the “true estimate” (if that’s even a coherent concept) for Q5 was 1 in a billion, or 1 in a thousand, or nearly 1 in 1.
I agree that we shouldn’t be shocked if the “true” estimate for at least one of the questions is very confident, but I don’t think we should be shocked if the “best realistically achievable” estimates for all of them aren’t that confident, where “best realistically achievable” means subject to our very limited time and reasoning capacities.
I think the choice of reference class is itself a major part of the object-level argument. For example, instead of asking “do more intelligent entities disempower less intelligent entities?”, why not ask “does the side of a war that starts off with vastly more weapons, manpower, and resources usually win?” Or “do test subjects usually escape and overpower their captors?” Or “has any intelligent entity ever existed without flaws sufficient to prevent it from achieving world domination?” These reference classes suggest a much lower estimate.
Now, all of these reference classes are flawed in that none of them corresponds one-to-one with the actual situation at hand. But neither does yours! For example, in none of the previous cases of a higher intelligence overpowering a lower one did the lower intelligence have the ability to write the brain of the higher intelligence. Is this a big factor or a small factor? Who knows?
As for (b), I just don’t agree that predictions about the outcome of future AI wars are in a similar class to questions like “will there be manned missions to Mars?” or “predicting the smartphone”.
Anyway, I’m not too interested in going in depth on the object level right now. Ultimately I’ve only barely scratched the surface of the flaws leading to overestimation of AI risk, and it will take time to work through them, so thank you for an illuminating discussion!
I agree that the choice of reference class matters a lot and is non-obvious (and hope I didn’t imply otherwise!).