People are underrating making the future go well conditioned on no AI takeover.
This deserves a full post, but for now a quick take: in my opinion, P(no AI takeover) = 75%, P(future goes extremely well | no AI takeover) = 20%, and most of the value of the future is in worlds where it goes extremely well (and comparatively little value comes from locking in a world that’s good-but-not-great).
Under this view, an intervention is good insofar as it increases P(no AI takeover) * P(things go really well | no AI takeover). Suppose that a given intervention can change P(no AI takeover) and/or P(things go really well | no AI takeover). Then, to first order (ignoring the product of the two changes), the overall effect of the intervention is proportional to ΔP(no AI takeover) * P(things go really well | no AI takeover) + P(no AI takeover) * ΔP(things go really well | no AI takeover).
Plugging in my numbers, this gives us 0.2 * ΔP(no AI takeover) + 0.75 * ΔP(things go really well | no AI takeover).
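To make the arithmetic concrete, here's a minimal Python sketch; the function name and the one-percentage-point examples are illustrative choices of mine, and the probabilities are just the guesses stated above:
# First-order effect of an intervention on P(no takeover AND future goes
# extremely well), using the guessed probabilities above.
p_no_takeover = 0.75                 # P(no AI takeover)
p_well_given_no_takeover = 0.20      # P(future goes extremely well | no AI takeover)

def intervention_value(delta_p_no_takeover, delta_p_well_given_no_takeover):
    """First-order change in P(no takeover AND future goes extremely well)."""
    return (delta_p_no_takeover * p_well_given_no_takeover
            + p_no_takeover * delta_p_well_given_no_takeover)

# A 1-percentage-point improvement on either factor:
print(intervention_value(0.01, 0.00))   # ≈ 0.002  (via P(no AI takeover))
print(intervention_value(0.00, 0.01))   # ≈ 0.0075 (via P(goes well | no takeover))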
And yet, I think that very little AI safety work focuses on affecting P(things go really well | no AI takeover). Probably Forethought is doing the best work in this space.
(And I don’t think it’s a tractability issue: I think affecting P(things go really well | no AI takeover) is pretty tractable!)
(Of course, if you think P(AI takeover) is 90%, that would probably be a crux.)
I, of course, agree!
One additional point, as I’m sure you know, is that you can potentially also affect P(things go really well | AI takeover). And actions that increase P(things go really well | AI takeover) might be quite similar to actions that increase P(things go really well | no AI takeover). If so, that’s an additional argument for those actions relative to actions that affect P(no AI takeover).
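To spell out that point, here's a small variant of the earlier sketch that includes the takeover branch; p_well_given_takeover and the example delta below are made-up placeholders of mine, not numbers from this thread:
# First-order effect on P(future goes extremely well), counting both branches.
# Shifting probability into "no takeover" simultaneously shifts it out of
# "takeover", hence the (difference of conditionals) term.
p_no_takeover = 0.75
p_well_given_no_takeover = 0.20
p_takeover = 1 - p_no_takeover
p_well_given_takeover = 0.01         # hypothetical, for illustration only

def value_both_branches(d_no_takeover, d_well_no_takeover, d_well_takeover):
    return (d_no_takeover * (p_well_given_no_takeover - p_well_given_takeover)
            + p_no_takeover * d_well_no_takeover
            + p_takeover * d_well_takeover)

print(value_both_branches(0.00, 0.00, 0.01))   # ≈ 0.0025: the takeover branch contributes value too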
Re the formal breakdown, people sometimes miss the BF supplement here, which goes into this in a bit more depth. And here’s an excerpt from a forthcoming paper, “Beyond Existential Risk”, in the context of more precisely defining the “Maxipok” principle. What it gives is very similar to your breakdown, and you might find some of the terms in here useful (apologies that some of the formatting is messed up):
“An action x’s overall impact (ΔEV_x) is its increase in expected value relative to baseline. We’ll let C refer to the state of existential catastrophe, and b refer to the baseline action. We’ll define, for any action x: P_x = P[¬C | x] and K_x = E[V | ¬C, x]. We can then break overall impact down as follows:
ΔEV_x = (P_x − P_b)·K_b + P_x·(K_x − K_b)
We call (P_x − P_b)·K_b the action’s existential impact and P_x·(K_x − K_b) the action’s trajectory impact. An action’s existential impact is the portion of its expected value (relative to baseline) that comes from changing the probability of existential catastrophe; an action’s trajectory impact is the portion of its expected value that comes from changing the value of the world conditional on no existential catastrophe occurring.
We can illustrate this graphically, where the areas in the graph represent overall expected value, relative to a scenario with a guarantee of catastrophe: [graph not reproduced here]
With these in hand, we can then define:
Maxipok (precisified): In the decision situations that are highest-stakes with respect to the longterm future, if an action is near‑best on overall impact, then it is close-to-near‑best on existential impact.
[1] Here’s the derivation. Given the law of total expectation:
E[V | x] = P(¬C | x)·E[V | ¬C, x] + P(C | x)·E[V | C, x]
To simplify things (in a way that doesn’t affect our overall argument, and bearing in mind that the “0” is arbitrary), we assume that E[V | C, x] = 0 for all x, so:
E[V | x] = P(¬C | x)·E[V | ¬C, x]
And, by our definition of the terms:
P(¬C | x)·E[V | ¬C, x] = P_x·K_x
So:
ΔEV_x = E[V | x] − E[V | b] = P_x·K_x − P_b·K_b
Then adding (P_x·K_b − P_x·K_b) to this and rearranging gives us:
ΔEV_x = (P_x − P_b)·K_b + P_x·(K_x − K_b)”
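(As a sanity check on that decomposition, here's a tiny numerical example; the values of P_b, K_b, P_x, K_x below are made up for illustration, not taken from the paper.)
# Check that existential impact + trajectory impact = overall impact,
# i.e. (P_x - P_b)*K_b + P_x*(K_x - K_b) = P_x*K_x - P_b*K_b.
P_b, K_b = 0.80, 100.0   # baseline: P(no catastrophe), E[V | no catastrophe]
P_x, K_x = 0.85, 110.0   # action x: slightly safer and a slightly better world

overall_impact = P_x * K_x - P_b * K_b        # ≈ 13.5
existential_impact = (P_x - P_b) * K_b        # ≈ 5.0  (from changing P(¬C))
trajectory_impact = P_x * (K_x - K_b)         # ≈ 8.5  (from changing value given ¬C)

print(abs(overall_impact - (existential_impact + trajectory_impact)) < 1e-9)   # True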
One of the key issues with “making the future go well” interventions is that what counts as a desirable future varies so much between different humans that the concept requires buying into ethical assumptions many people won’t share, which makes it much less valid as any sort of absolute metric to coordinate around. As Steven Byrnes puts it:
When people make statements that implicitly treat “the value of the future” as being well-defined, e.g. statements like “I define ‘strong utopia’ as: at least 95% of the future’s potential value is realized”, I’m concerned that these statements are less meaningful than they sound.
This level of variability is smaller for preventing bad outcomes, especially outcomes in which we don’t die (though there is still some variability here), because of instrumental convergence: while there are moral views on which dying or suffering isn’t so bad, those views aren’t held by many human beings (in part due to selection effects), so there’s less chance of conflict with other agents.
The other reason is that humans mostly value the same scarce instrumental goods today, but in a world where AI goes well, basically everything except status and identity becomes abundant, and that surfaces latent moral disagreements far more than our current world does.
Re: “And yet, I think that very little AI safety work focuses on affecting P(things go really well | no AI takeover).”
Do you think this sort of work is related to AI safety? It seems to me that it’s more about philosophy (etc.), so I’m wondering what you had in mind.
Yup! Copying over from a LessWrong comment I made:
Roughly speaking, I’m interested in interventions that cause the people making the most important decisions about how advanced AI is used once it’s built to be smart, sane, and selfless. (Huh, that was some convenient alliteration.)
Smart: you need to be able to make really important judgment calls quickly. There will be a bunch of actors lobbying for all sorts of things, and you need to be smart enough to figure out what’s most important.
Sane: smart is not enough. For example, I wouldn’t trust Elon Musk with these decisions, because I think that he’d make rash decisions even though he’s smart, and even if he had humanity’s best interests at heart.
Selfless: even a smart and sane actor could curtail the future if they were selfish and opted to e.g. become world dictator.
And so I’m pretty keen on interventions that make it more likely that smart, sane, and selfless people are in a position to make the most important decisions. This includes things like:
Doing research to figure out the best way to govern advanced AI once it’s developed, and then disseminating those ideas.
Helping to positively shape internal governance at the big AI companies (I don’t have concrete suggestions in this bucket, but like, whatever led to Anthropic having a Long Term Benefit Trust, and whatever could have led to OpenAI’s non-profit board having actual power to fire the CEO).
Helping to staff governments with competent people.
Helping to elect smart, sane, and selfless people to positions in government (see 1, 2).
Hmm, I think if we are in a world where the people in charge of the company that has already built ASI need to be smart/sane/selfless for things to go well, then we’re already in a much worse situation than we should be, and things should have been done differently prior to this point.
I realize this is not a super coherent statement, but I thought about it for a bit and I’m not sure how to express my thoughts more coherently, so I’m just posting this comment as-is.
What interventions are you most excited about? Why? What are they bottlenecked on?