Let's imagine a toy model where we only care about current people. It's still possible for most of the potential QALYs to be far in the future (if lifetimes are very long, or quality is very high, or both). So in this model, almost all the QALYs come from currently existing people living for a trillion years in an FAI utopia.
So let's suppose there are two causes: AI safety and preventing nuclear war. Nuclear war will happen unless prevented, and will lead to half as many current people reaching ASI (either because it kills them directly, or because it delays the development of ASI). Let the QALYs of (no nukes, FAI) be X.
Case 1) Currently P(FAI) = 0.99, and AI safety research will increase that to P(FAI) = 1. If we work on AI safety, a nuclear war happens and half the people survive to ASI, so we get U = 0.5X. If we work on preventing nuclear war, the AI is probably friendly anyway, so U = 0.99X.
Case 2) Currently P(FAI) = 0, but AI safety research can increase that to P(FAI) = 0.01. Then if we prevent nuclear war, we get practically zero utility, and if we work on AI safety, we get U = 0.005X: a 1% chance of the 50% of survivors living in a post-FAI utopia.
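A quick sketch of that arithmetic (Python, with X normalized to 1; the function and parameter names are just for illustration, not from the original comment):

```python
# The two cases above, with X normalized to 1. All numbers come from the toy
# model in the text; the function name is purely illustrative.

def expected_utility(p_fai: float, nuclear_war: bool) -> float:
    """Expected QALYs as a fraction of X, the QALYs of (no nukes, FAI)."""
    survival_fraction = 0.5 if nuclear_war else 1.0
    return p_fai * survival_fraction

# Case 1: alignment is almost solved (AI safety work takes P(FAI) from 0.99 to 1).
print(expected_utility(p_fai=1.00, nuclear_war=True))   # work on AI safety: 0.5
print(expected_utility(p_fai=0.99, nuclear_war=False))  # prevent nuclear war: 0.99

# Case 2: alignment is nearly hopeless (AI safety work takes P(FAI) from 0 to 0.01).
print(expected_utility(p_fai=0.01, nuclear_war=True))   # work on AI safety: 0.005
print(expected_utility(p_fai=0.00, nuclear_war=False))  # prevent nuclear war: 0.0
```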
Of course, this assumes all utility ultimately resides in a post-ASI utopia, as well as not caring about future people. If you put a substantial fraction of utility on pre-ASI world states (either by being really pessimistic about the chance of alignment, or by applying some form of time discounting so you don't care too much about the far future of existing people either), then the calculation is different.
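As a hypothetical illustration of how the calculation changes (the 30% pre-ASI weight below is made up for the example, not part of the model above):

```python
# Hypothetical extension: put a fraction `pre_asi_weight` of total value on
# pre-ASI world states. The weight is an assumption, purely for illustration.

def eu_with_pre_asi(p_fai: float, nuclear_war: bool, pre_asi_weight: float) -> float:
    survival_fraction = 0.5 if nuclear_war else 1.0
    pre_asi = pre_asi_weight * survival_fraction          # pre-ASI value doesn't need FAI
    post_asi = (1 - pre_asi_weight) * p_fai * survival_fraction
    return pre_asi + post_asi

# With, say, 30% of value placed pre-ASI, Case 2 flips:
print(eu_with_pre_asi(0.01, True, 0.3))   # work on AI safety: ~0.154
print(eu_with_pre_asi(0.00, False, 0.3))  # prevent nuclear war: 0.3
```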
People generally don’t care about their future QALYs in a linear way: a 1/million chance of living 10 million times as long and otherwise dying immediately is very unappealing to most people, and so forth. If you don’t evaluate future QALYs for current people in a way they find acceptable, then you’ll wind up generating recommendations that are contrary to their preferences and which will not be accepted by society at large.
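A toy illustration of that point (the log utility function below is just an assumed stand-in for some concave valuation of lifespan, not a claim about how people actually value it):

```python
import math

# Compare a sure 50 more years against a 1-in-a-million chance of living
# 10 million times as long (500 million years), otherwise dying immediately.

def expected_value(p: float, years: float, value=lambda y: y) -> float:
    return p * value(years)

sure_thing = (1.0, 50)
gamble = (1e-6, 50 * 1e7)

# Linear in QALYs: the gamble looks 10x better than the sure thing.
print(expected_value(*sure_thing))            # 50
print(expected_value(*gamble))                # 500

# Concave (log) valuation: the gamble looks terrible.
print(expected_value(*sure_thing, math.log))  # ~3.9
print(expected_value(*gamble, math.log))      # ~0.00002
```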
This sort of argument shows that person-affecting utilitarianism is a very wacky doctrine (also see this) that doesn't actually sweep away questions about the importance of the future, as some say it does; but it doesn't override normal people's concerns by their own lights.