The bandit problem is definitely related, although I’m not sure it’s the best way to formulate the situation here. The main issue is that the bandit formulation, here, treats learning about the magnitude of a risk and working to address the risk as the same action—when, in practice, they often come apart.
Here’s a toy model/analogy that feels a bit more like it fits the case, in my mind.
Let’s say there are two types of slot machines: one that has a 0% chance of paying and one that has a 100% chance of paying. Your prior gives you a 90% credence that each machine is non-paying.[1]
Unfortunately: When you pull the lever on either machine, you don’t actually get to see what the payout is. However, there’s some research you can do to try to get a clearer sense of what each machine’s “type” is.
And this research is more tractable in the case of the first machine. For example: Maybe the first machine has identifying information on it, like a model number, which might allow you to (e.g.) call up the manufacturer and ask them. The second machine is just totally nondescript.
The most likely outcome, then, is that you quickly find out that the first slot machine is almost certainly non-paying—but continue to have around a 10% credence that the second machine pays.
In this scenario, you should keep pulling the lever on the second machine. You should also, even as a rational Bayesian, actually be more optimistic about the second machine.
(By analogy, I think we actually should tend to fear speculative existential risks more.)
A more sophisticated version of this scenario would have a continuum of slot machine types and a skewed prior over the likelihood of different types arising.
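For concreteness, here's a minimal Monte Carlo sketch of the two-machine version (the numbers, and the assumption that research perfectly reveals the first machine's type, are purely illustrative):

```python
import random

def simulate(n_worlds=100_000, prior_pay=0.10, seed=0):
    """Monte Carlo sketch of the two-machine toy model.

    Each machine is independently 'paying' with probability prior_pay.
    Research perfectly reveals machine 1's type; machine 2 yields no
    evidence, so the Bayesian posterior on it just stays at the prior.
    """
    rng = random.Random(seed)
    more_optimistic_about_2 = 0
    for _ in range(n_worlds):
        m1_pays = rng.random() < prior_pay
        posterior_m1 = 1.0 if m1_pays else 0.0  # research settles machine 1
        posterior_m2 = prior_pay                # no evidence about machine 2
        if posterior_m2 > posterior_m1:
            more_optimistic_about_2 += 1
    return more_optimistic_about_2 / n_worlds

# In roughly 90% of worlds machine 1 is revealed to be non-paying, so you
# end up more optimistic about the nondescript machine 2 and keep pulling
# its lever.
print(simulate())  # ≈ 0.90
```

The asymmetry in learnability, not any difference in the priors, is what leaves the second machine looking relatively more promising most of the time.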
Interesting, that makes perfect sense. However, if there’s no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I’m not sure your story explains why we end up fixating on the uncertain interventions (AIS research).
Another way to explain why the uncertain risks look big would be that we are unable to stop society from pulling the AI progress lever until we have proven it to be dangerous. Activities that are definitely risky just get stopped! Maybe that’s implicitly how your model gets the desired result.
However, if there’s no correlation between the payoff of an arm and our ability to know it, then we should eventually find an arm that pays off 100% of the time with high probability, pull that arm, and stop worrying about the unknowable one. So I’m not sure your story explains why we end up fixating on the uncertain interventions (AIS research).
The story does require there to be only a very limited number of arms that we initially think have a non-negligible chance of paying. If there are unlimited arms, then one of them should be both paying and easily identifiable.
So the story (in the case of existential risks) is that there are only a very small number of risks that, on the basis of limited argument/evidence, initially seem like they might lead to extinction or irrecoverable collapse by default. Maybe this set looks like: nuclear war, misaligned AI, pandemics, nanotechnology, climate change, overpopulation / resource depletion.
If we’re only talking about a very limited set, like this, then it’s not too surprising that we’d end up most worried about an ambiguous risk.
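As a back-of-the-envelope illustration of why the size of the candidate set matters (the 10% and 50% figures below are made up): if each candidate arm independently has a 10% chance of paying and a 50% chance of being easily researchable, the chance that at least one arm is both grows quickly with the number of arms.

```python
# Illustrative only: each candidate arm pays with probability 0.10 and is
# easily researchable with probability 0.50, independently, so it is both
# with probability 0.05.
def p_some_paying_identifiable_arm(n_arms, p_both=0.10 * 0.50):
    return 1 - (1 - p_both) ** n_arms

for n in (6, 20, 100):
    print(n, round(p_some_paying_identifiable_arm(n), 3))
# 6 0.265    <- with only a handful of candidates, quite plausibly none is
# 20 0.642      both serious and easy to pin down
# 100 0.994  <- with many candidates, you'd expect at least one to be both
```

With only a handful of candidate risks, the "just find an arm that's both paying and identifiable" move doesn't reliably go through.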