I, of course, agree!
One additional point, as I’m sure you know, is that you can potentially also affect P(things go really well | AI takeover). And actions that increase ΔP(things go really well | AI takeover) might be quite similar to actions that increase ΔP(things go really well | no AI takeover). If so, that’s an additional argument for those actions compared to actions that affect ΔP(no AI takeover).
Re the formal breakdown, people sometimes miss the BF supplement here, which goes into this in a bit more depth. And here’s an excerpt from a forthcoming paper, “Beyond Existential Risk”, in the context of more precisely defining the “Maxipok” principle. What it gives is very similar to your breakdown, and you might find some of the terms in here useful (apologies that some of the formatting is messed up):
“An action x’s overall impact (ΔEVx) is its increase in expected value relative to baseline. We’ll let C refer to the state of existential catastrophe, and b refer to the baseline action. We’ll define, for any action x: Px = P(¬C | x) and Kx = E[V | ¬C, x]. We can then break overall impact down as follows:
ΔEVx = (Px – Pb)Kb + Px(Kx – Kb)
We call (Px – Pb)Kb the action’s existential impact and Px(Kx – Kb) the action’s trajectory impact. An action’s existential impact is the portion of its expected value (relative to baseline) that comes from changing the probability of existential catastrophe; an action’s trajectory impact is the portion of its expected value that comes from changing the value of the world conditional on no existential catastrophe occurring.
We can illustrate this graphically, where the areas in the graph represent overall expected value, relative to a scenario with a guarantee of catastrophe:
With these in hand, we can then define:
Maxipok (precisified): In the decision situations that are highest-stakes with respect to the longterm future, if an action is near-best on overall impact, then it is close-to-near-best on existential impact.
[1] Here’s the derivation. Given the law of total expectation:
E[V | x] = P(¬C | x) E[V | ¬C, x] + P(C | x) E[V | C, x]
To simplify things (in a way that doesn’t affect our overall argument, and bearing in mind that the “0” is arbitrary), we assume that E[V | C, x] = 0, for all x, so:
E[V | x] = P(¬C | x) E[V | ¬C, x]
And, by our definition of the terms:
P(¬C | x) E[V | ¬C, x] = PxKx
So:
ΔEVx = E[V | x] – E[V | b] = PxKx – PbKb
Then adding (PxKb – PxKb), which equals zero, to this and rearranging gives us:
ΔEVx = (Px – Pb)Kb + Px(Kx – Kb)”
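As a quick numeric sanity check of the decomposition, here is a minimal sketch in Python. The probabilities and values are made-up placeholders purely for illustration, not estimates from the paper; the point is just that existential impact plus trajectory impact equals overall impact.

```python
# Sanity check: ΔEVx = (Px – Pb)Kb + Px(Kx – Kb) = PxKx – PbKb,
# under the paper's simplifying assumption that E[V | C, x] = 0 for all x.

def overall_impact(p_x, k_x, p_b, k_b):
    """ΔEVx = E[V | x] – E[V | b] = PxKx – PbKb."""
    return p_x * k_x - p_b * k_b

def existential_impact(p_x, p_b, k_b):
    """Portion of impact from changing the probability of no catastrophe."""
    return (p_x - p_b) * k_b

def trajectory_impact(p_x, k_x, k_b):
    """Portion of impact from changing value conditional on no catastrophe."""
    return p_x * (k_x - k_b)

# Illustrative placeholder numbers (not real estimates):
p_b, k_b = 0.80, 100.0  # baseline: P(no catastrophe) = 0.8, conditional value 100
p_x, k_x = 0.85, 110.0  # action x improves both the probability and the value

total = overall_impact(p_x, k_x, p_b, k_b)            # 93.5 - 80 = 13.5
parts = (existential_impact(p_x, p_b, k_b)            # 0.05 * 100 = 5.0
         + trajectory_impact(p_x, k_x, k_b))          # 0.85 * 10  = 8.5
assert abs(total - parts) < 1e-9
print(total)  # 13.5
```

The decomposition is exact for any choice of numbers, since it is just the add-and-subtract-PxKb identity from the derivation above.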