Just wanted to note that the use of “worst case” in the mission statement
The fund’s mission is to address worst-case risks (s-risks) from artificial intelligence.
is highly non-intuitive for people with a different axiology. Quoting from the s-risk explanation:
For instance, an event leading to a future containing 10^35 happy individuals and 10^25 unhappy ones would constitute an s-risk.
At least for me, this would be a pretty amazing outcome, and not something which should be prevented.
In this context
We aim to differentially support alignment approaches where the risks are lowest. Work that ensures comparatively benign outcomes in the case of failure is particularly valuable from our perspective
sounds worrisome: do I interpret it correctly that, in the ethical system held by the fund, human extinction is a comparatively benign outcome in comparison with risks like the creation of 10^25 unhappy minds, even if they are offset by a much larger number of happy minds?
At least for me, this would be a pretty amazing outcome, and not something which should be prevented.
Yeah, we’re going to change the part that equates “worst case” with “s-risks”. Your view is common and reflects many ethical perspectives.
We were already thinking about changing the definition of “s-risk” based on similar feedback, to make it more intuitive and cooperative in the way you describe. It probably makes more sense to have it refer to only the few % of scenarios where most of the future’s expected suffering comes from (assuming s-risks are indeed heavy-tailed). These actual worst cases are what we want to focus on with the fund.
do I interpret it correctly that, in the ethical system held by the fund, human extinction is a comparatively benign outcome in comparison with risks like the creation of 10^25 unhappy minds, even if they are offset by a much larger [say 10^10x larger] number of happy minds?
No, that’s incorrect. Insofar as some fund managers hold this view personally (e.g., I do, while Jonas would agree with you that the latter outcome is vastly better), it won’t affect decisions because in any case, we want to avoid doing things that are weakly positive on some plausible moral views and very negative on others. But I can see why you were concerned, and thanks for raising this issue!
I guess to me, the part of the future with 10^25 unhappy individuals sounds like an s-risk. I would imagine an s-outcome could take place in a universe that's still net good. Just because the universe may be net good, though, doesn't mean we shouldn't be concerned with large s-outcomes that may happen.
Yeah. I put it the following way in another post:
Especially when it comes to the prevention of s-risks affecting futures that otherwise contain a lot of happiness, it matters a great deal how the risk in question is being prevented. For instance, if we envision a future that is utopian in many respects except for a small portion of the population suffering because of problem x, it is in the interest of virtually all value systems to solve problem x in highly targeted ways that move probability mass towards even better futures. By contrast, only a few value systems (ones that are strongly or exclusively about reducing suffering/bad things) would consider it overall good if problem x was “solved” in a way that not only prevented the suffering due to problem x, but also prevented all the happiness from the future scenario this suffering was embedded in.
So it’d be totally fine to address all sources of unnecessary suffering (and even “small” s-risks embedded in an otherwise positive future) if there are targeted ways to bring about uncontroversial improvements. :) In practice, it’s sometimes hard to find interventions that are targeted enough because affecting the future is very very difficult and we only have crude levers. Having said that, I think many things that we’re going to support with the fund are actually quite positive for positive-future-oriented value systems as well. So there certainly are some more targeted levers.
There are instances where it does feel justified to me to also move some probability mass away from s-risks towards extinction (or paperclip scenarios), but that should be reserved either for uncontroversially terrible futures, or for those futures where most of the disvalue for downside-focused value systems comes from. I doubt that this includes futures where 10^10x more people are happy than unhappy.
And of course positive-future-oriented EAs face analogous tradeoffs of cooperation with other value systems.
We at CLR are now using a different definition of s-risks.
New definition:
S-risks are risks of events that bring about suffering in cosmically significant amounts. By “significant”, we mean significant relative to expected future suffering.
Note that it may turn out that the amount of suffering that we can influence is dwarfed by suffering that we can’t influence. By “expectation of suffering in the future” we mean “expectation of action-relevant suffering in the future”.
I’m wondering a bit about this definition. One interpretation of it is that you’re saying something like this:
“The expected future suffering is X. The risk that event E occurs is an S-risk if and only if E occurring raises the expected future suffering significantly above X.”
But I think that definition doesn’t work. Suppose that it is almost certain (99.9999999%) that a particular event E will occur, and that it would cause a tremendous amount of suffering. Then the expected future suffering is already very large (if I understand that concept correctly). And, because E is virtually certain to occur, its occurring will not actually bring about suffering in cosmically significant amounts relative to expected future suffering. And yet intuitively this is an S-risk, I’d say.
Another interpretation of the definition is:
“The expected future suffering is X. The risk that event E occurs is an S-risk if and only if the difference in suffering between E occurring and E not occurring is significant relative to X.”
That does take care of that issue, since, by hypothesis, the difference between E occurring and E not occurring is a tremendous amount of suffering.
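To make the contrast explicit, here is one way the two readings could be formalized; the notation is mine, not CLR’s. Write S for total future suffering, X = E[S] for its expectation, and P(E) for the credence that E occurs. Then, roughly:

$$\text{(1)}\quad \mathbb{E}[S \mid E] - X \ \text{ is significant relative to } X$$

$$\text{(2)}\quad \mathbb{E}[S \mid E] - \mathbb{E}[S \mid \neg E] \ \text{ is significant relative to } X$$

Since $X = P(E)\,\mathbb{E}[S \mid E] + (1-P(E))\,\mathbb{E}[S \mid \neg E]$, the quantity in (1) equals $(1-P(E))$ times the quantity in (2), so it shrinks to zero as $P(E)$ approaches 1. That is the near-certain-event problem above in algebraic form, and it is also why reading (2) avoids it.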
Alternatively, you may want to say that the risk that E occurs is an S-risk if and only if its occurring brings about a significant amount of suffering relative to what we expect to occur from other causes. That may be a more intuitive way of thinking about this.
A feature of this definition is that the risk of an event E1 occurring can be an S-risk even if its occurring would cause much less suffering than another event E2 would, provided that E1 is much more likely to occur than E2. But if we increase our credence that E2 will occur, then the risk of E1 occurring will cease to be an S-risk, since it will no longer cause a significant amount of suffering relative to expected future suffering.
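As a toy illustration (all numbers invented for the example): suppose E1 would cause 10^30 units of suffering and we give it credence 0.1, while E2 would cause 10^35 units. Then, ignoring other sources of suffering:

$$P(E_2) = 10^{-6}: \quad X \approx 0.1 \cdot 10^{30} + 10^{-6} \cdot 10^{35} = 2 \times 10^{29}$$

$$P(E_2) = 10^{-2}: \quad X \approx 0.1 \cdot 10^{30} + 10^{-2} \cdot 10^{35} \approx 10^{33}$$

In the first case the 10^30 units E1 would bring about are several times X, so the risk of E1 plausibly counts as an S-risk; in the second case they are about 0.1% of X and no longer qualify, even though nothing about E1 itself has changed.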
I guess that some would find that unintuitive, and would say that whether something is an S-risk shouldn’t depend on us adjusting our credences in independent events in this way. But it depends a bit on what perspective you have.