Does the “deliberate about current evidence” part include thinking a lot about AI alignment to identify new arguments or considerations that other people on Earth may not have thought of, or would that count as new evidence?
It seems like if that wouldn’t count as new evidence, then the team you described might be able to come up with much better forecasts than we have today, and I’d think their final forecast would be more likely to end up much lower or much higher than e.g. your forecast. One consequence of this might then be that your 90% confidence about MacAskill’s misaligned AI takeover credence is too high, even if your 35% point estimate is reasonable.
I intend for identifying new arguments or considerations based on current evidence to be allowed, but I’m more skeptical than you that this would converge that much closer to 0% or 100%. I think there’s a ton of effectively irreducible uncertainty in forecasting something as complex as whether misaligned AI will take over this century.
Thanks for the response! This clarifies what I was wondering about:
I intend for identifying new arguments or considerations based on current evidence to be allowed
I have some more thoughts regarding the following, but want to note up front that no response is necessary—I’m just sharing my thoughts out loud:
I’m more skeptical than you that this would converge that much closer to 0% or 100%. I think there’s a ton of effectively irreducible uncertainty in forecasting something as complex as whether misaligned AI will take over this century.
I agree there’s a ton of irreducible uncertainty here, but, to put it one way, I think there are lots of other strong forecasters who also think this, yet might look at the evidence that humanity has today and come to a significantly different forecast from yours.
Like who is to say that Nate Soares’s and Daniel Kokotajlo’s forecasts are wrong? (Though actually it takes a smaller likelihood ratio for you to update to reach their forecasts than it does for you to reach MacAskill’s forecast.) Presumably they’ve thought of some arguments and considerations that you haven’t read or thought of before. It wouldn’t surprise me if this team, deliberating on humanity’s current evidence for a thousand years, came across those arguments or considerations (or some others) in their process of logical induction (to use a term I learned from MIRI that roughly means updating without new evidence) and ultimately decided on a final forecast very different from yours as a result.
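To make that parenthetical concrete, here’s a minimal sketch of the Bayes-odds arithmetic, using your 35% and MacAskill’s 3% from this discussion and, purely as an illustrative stand-in (their actual numbers aren’t stated here), ~80% for a Soares/Kokotajlo-style forecast:

```python
def likelihood_ratio(prior: float, posterior: float) -> float:
    """Likelihood ratio needed to move a credence from `prior` to `posterior`,
    via Bayes' rule in odds form: posterior_odds = LR * prior_odds."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

your_forecast = 0.35   # your current forecast of misaligned AI takeover
macaskill = 0.03       # MacAskill's credence, per this discussion
high_stand_in = 0.80   # hypothetical Soares/Kokotajlo-style forecast (assumed, for illustration only)

print(likelihood_ratio(your_forecast, macaskill))      # ~0.057, i.e. roughly a 17:1 update against takeover
print(likelihood_ratio(your_forecast, high_stand_in))  # ~7.4, i.e. roughly a 7:1 update toward takeover
```

So reaching the hypothetical ~80% takes roughly a 7x update, whereas reaching 3% takes roughly a 17x update in the other direction.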
Perhaps another way of saying this is that your current forecast may be 35% not because that’s the best forecast that can be made with humanity’s current evidence, given the irreducible uncertainty in the world, but rather because you don’t currently have all of humanity’s current evidence. Perhaps your 35% is more reflective of your own ignorance than the actual amount of irreducible uncertainty in the world.
Reflecting a bit more, I realize I should ask myself what I think the appropriate level of confidence that 3% is too low actually is. Thinking about it further, 90% doesn’t seem that high, even given what I just wrote above. My main reason for thinking it may be too high is that 1000 years is a long time for a team of 100 reasonable people to think about the evidence humanity currently has, and I’d expect such a team to end up with a much better understanding of the actual risk of misaligned AI takeover than anyone alive today has, even without new evidence. And because I feel we’re still in a state of relative ignorance about the risk, it wouldn’t surprise me if, after the 1000 years, they justifiably believed they could be much more confident one way or the other about the amount of risk.
I added this line in response to Lukas pointing out that the researchers could just work on the agendas to get information. As I mentioned in a comment, the line between deliberating and identifying new evidence is fuzzy, but I think it’s better to add a rough clarification than nothing.