Longtermist Implications of the Existence Neutrality Hypothesis
Crossposted on LessWrong.
We describe the implications the Existence Neutrality Hypothesis[1] could have for impartial longtermists, and then provide quantitative impact estimates for four idealized interventions, with and without accounting for this hypothesis. Under this hypothesis, little value is lost if humanity does not create a Space-Faring Civilization. The three main implications are: (1) To significantly reduce the longtermist value of reducing Extinction-Risks, such as those caused by nuclear weapons and bioweapons. (2) To make some new AI Safety agendas (e.g., “Plan B”) competitive with existing AI Safety agendas. (3) To update the relative priorities between existing AI Safety agendas, possibly rendering some of them significantly less attractive.
Sequence: This post is part 9 of a sequence investigating the longtermist implications of alien Space-Faring Civilizations. Each post aims to be standalone. You can find an introduction to the sequence in the following post.
Summary
The Existence Neutrality Hypothesis (ENH) suggests that humanity not developing a Space-Faring Civilization (SFC) results in minimal marginal value loss, as most cosmic resources would be recovered by other civilizations with comparable utility creation efficiency. Assuming the ENH is about 75% correct (combining the Civ-Saturation Hypothesis[2] as 84% correct and the Civ-Similarity Hypothesis[3] as 90% correct), there are three core implications:
Significantly reduce the priority of Extinction-Risk reduction (e.g., nuclear and bioweapon risks), by about 75%, since fewer cosmic resources and less value would be lost if humanity fails to create a space-faring civilization.
Increase the importance of alternative AI Safety agendas, such as the “Plan B” agenda, which aims at mitigating negative outcomes from misaligned AIs, for example, by reducing the chance that misaligned AIs achieve space colonization.
Shift the main AI safety target from maximizing the joint probability of alignment and humanity creating a space-faring civilization towards optimizing alignment conditional on humanity achieving space colonization, shifting the focus from reducing Existential-Risks to reducing Alignment-Risks.
We first qualitatively introduce these implications, and then review some questions related to how substantial they are. Finally, we produce a simple quantitative model estimating and illustrating these implications.
Reminder of hypotheses and assumptions
The Existence Neutrality Hypothesis is the conjunction of two subhypotheses. The Existence Neutrality Hypothesis posits that humanity creating a Space-Faring Civilization (SFC) does not bring much marginal value, because we would reach relatively few marginal resources, and because humanity’s SFC would not create much more value than other SFCs. The implications we describe in this post are conditional on the Existence Neutrality Hypothesis being significantly correct. This hypothesis is the conjunction of two subhypotheses: the Civ-Saturation Hypothesis and the Civ-Similarity Hypothesis.
Civ-Saturation Hypothesis: Most resources are grabbed irrespective of our existence. The Civ-Saturation Hypothesis posits that when making decisions, we should assume most of humanity’s Space-Faring Civilization (SFC) future resources will eventually be grabbed by SFCs regardless of whether humanity’s SFC exists or not. In the post “Other Civilizations Would Recover 84+% of Our Cosmic Resources”, we performed a first evaluation of this hypothesis and concluded that as long as you believe in some form of EDT, then our best guess is that 84+% of humanity’s SFC resources would be recovered by other SFCs if humanity does not create an SFC. For the remainder of the post, we will assume the most conservative form of EDT, which is simply CDT plus assuming we control our exact copies.
Civ-Similarity Hypothesis: Our civilization is not unusual in terms of creating value. The Civ-Similarity Hypothesis posits that the expected utility[4] of humanity’s future Space-Faring Civilization is similar to that of other SFCs. In the post “The Convergent Path to the Stars—Similar Utility Across Civilizations Challenges Extinction Prioritization”, we introduced some reasons supporting the Civ-Similarity Hypothesis. Our current speculative guess is that other SFCs produce 95% to 99% of the value humanity’s SFC would produce per unit of resources[5]. There are two main reasons behind these high values: first, our longtermist expected utility estimates are massively uncertain[6], which leads to a flattening of differences; second, the supporting and opposing arguments don’t seem to differ massively in terms of credibility or impact. Even when we predict some differences, they likely only produce very weak updates and fail to support the idea that humanity would be highly unusual among SFCs.[7]
We evaluate the Existence Neutrality Hypothesis using the ENH ratio. The ENH ratio is the ratio between the value produced by humanity NOT creating an SFC and creating one. We call these two outcomes U|¬S and U|S. We assume for simplicity they are positive values. ENH ratio = U|¬S / U|S. The closer the ENH ratio is to one, the more correct the Existence Neutrality Hypothesis is. Abusing language, we will say that this hypothesis is 75% correct[8] when the ratio equals 0.75.
In this post we assume the ENH ratio equals 0.75. We will investigate what would be the strategic implications if the Existence Neutrality Hypothesis were 75% true, given classical total utilitarianism. We obtain this value by multiplying our quantitative best guess estimate for the correctness of the Civ-Saturation Hypothesis (84%[9]), with a slightly conservative speculative guess for the Civ-Similarity Hypothesis (90%[10],[11]).
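Spelled out as a tiny calculation (the two sub-estimates are the post’s; multiplying them is the combination rule used here):

```python
# Back-of-the-envelope combination of the two subhypothesis estimates (a sketch).
civ_saturation = 0.84   # share of humanity's SFC resources other SFCs would recover
civ_similarity = 0.90   # value other SFCs produce per unit of resources, relative to humanity's SFC
enh_ratio = civ_saturation * civ_similarity
print(enh_ratio)        # ~0.756, rounded to 0.75 in this post
print(1 - enh_ratio)    # ~0.24, i.e., roughly the 25% of value that would actually be lost
```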
Overview of implications
The ENH being 75% true would have three main implications:
Reducing by 75% the importance of Extinction-Risk reduction (e.g., from nuclear weapons and bioweapons)
Opening new competitive AI Safety agendas. For example, “Plan B AI safety”, which focuses on decreasing the negative impact of misaligned AIs.
Updating the relative importance of AI safety agendas by shifting our optimization target from Existential-Risks to Alignment-Risks, i.e., from increasing P(alignment AND humanity creates an SFC) to increasing P(alignment | humanity creates an SFC).
Let’s illustrate how the values of X-risk reduction agendas change when the ENH is true. In the plot below, the red/green areas denote the production of marginal negative/positive value. We plot two primary directions (black continuous arrows) along which interventions can have an impact: increasing P(humanity creates an SFC) or increasing P(alignment | humanity creates an SFC), i.e., reducing Extinction-Risks or reducing Alignment-Risks. Using dashed arrows, we plot the conjunctive directions P(alignment AND humanity creates an SFC) and P(misalignment AND humanity creates an SFC). Finally, each circle represents an area in which an X-risk reduction agenda is speculatively located, meaning that the typical impact of this agenda is produced by interventions acting along the plotted directions.
Let’s now plot the speculative locations of X-risk agendas when assuming that the ENH is incorrect (ENH ~ 0). We highlight the following: Nuclear and bioweapon X-risk reduction has positive marginal value. The Plan-B AI safety agenda is mostly neutral. And optimizing P(alignment AND humanity creates an SFC) provides the most marginal value.
Now let’s represent an alternative chart of priorities when ENH is correct. For simplicity we use ENH = 1 in the following plot. We now observe the following: Nuclear and bioweapon X-risk reduction produces much less value[12]. The Plan-B agenda now produces positive value. And increasing P(alignment | humanity creates an SFC) is now the optimal target.
Qualitative implications
Reducing extinction risks, which prevent the creation of SFCs, would be 75% less important
75% less impact from reducing Extinction-Risks. When the ENH is 75% correct, only 25% of humanity’s SFC value would be lost if it does not exist. This directly translates into a 75% reduction in the value of increasing humanity’s SFC chances to exist. Nuclear and bioweapon X-risk reductions are usually seen as the most impactful non-AI-Safety interventions to increase humanity’s SFC chance to exist. From existing surveys, estimates, and grant databases, we extract estimates of how much resources are allocated or should be allocated to cause areas related to reducing non-AI extinction risks; see details in footnote[13].
Non-AI Safety extinction reduction grants may represent the equivalent of between 16% (2024) and up to 50% (2022-2024 avg.) of the grants made to AI Safety. Let’s focus on Open Philanthropy since they are the largest grant maker in the EA community. Using their grant database and their own labels, we estimate that in 2024, 4.0% of their total grant budget was allocated to “Biosecurity & Pandemic Preparedness”. This is roughly equal to 22% of the grant budget allocated to “Potential Risks From Advanced AI” in the same year (18.4% of all grants in 2024). A significant issue with these numbers is that they exclude the focus area “Global Catastrophic Risks Capacity Building”, which is significant (15.7% of all grants in 2024) and consists, at first sight, of something like 50%[14] of grants for AI Safety capacity building and 50% of grants whose goal does NOT seem limited to AI Safety but likely often includes it.
If we ignore capacity building, then the grants to “Biosecurity & Pandemic Preparedness” represent 22% of the grants to “Potential Risks From Advanced AI” in 2024, down from 62% in 2023 and 67% in 2022.
If we don’t ignore “Global Catastrophic Risks Capacity Building”, and we try to classify grants using keywords in their names while defaulting to the ratio between “Biosecurity & Pandemic Preparedness” and “Potential Risks From Advanced AI” when we don’t find any keywords, then we estimate that non-AI Safety grants represented 16% of the amount allocated to AI safety grants in 2024 and 50% on average over 2022 to 2024.
These estimates show that a 75% reduction in the importance of these cause areas (relative to other longtermist cause areas) may have some significant impact on longtermist grant-making prioritization and community beliefs.
Plan B AI Safety
Misaligned AIs are net negative when the ENH is significantly correct. Plan B AI Safety is a relatively new and underground AI Safety agenda that focuses on influencing the future conditional on failing to align ASIs. Points of view about the value of misaligned ASIs vary a lot[15]. Let’s use the middle-ground assumption that misaligned ASIs would produce a future with negligible value. Under this assumption, the Plan B agenda would have little or no value. But under the assumption that the Existence Neutrality Hypothesis is 75% true, a misaligned ASI could have a strong negative impact by taking resources away from alien SFCs, which may succeed at creating a positive future.
Decreasing P(humanity creates an SFC | misalignment) is competitive with other AI Safety agendas when ENH is 75% correct. One goal of the Plan B AI Safety agenda is to decrease P(humanity creates an SFC | misalignment). If the ENH is incorrect, this intervention has no impact, but if the ENH is 75% true, this agenda becomes similar in impact to increasing P(alignment AND humanity creates an SFC). See the quantitative estimates later in the post.
The less valuable humanity’s SFC is relative to alien SFCs, the more important this agenda is. After updating on the ENH, work on Plan B AI Safety now seems impactful. Such work may be especially attractive to people who think that humanity is especially bad at technical AI safety or AI governance, or has especially worrisome moral values, relative to other alien Intelligent Civilizations on the verge of creating an SFC. Someone in this situation would estimate that the ENH is more than 75% correct, possibly above 100%.
Updating the importance of existing AI Safety agendas
Let’s simplify notations:
Let’s shorten P(alignment AND humanity creates an SFC) into A∧S. We call increasing A∧S: “Reducing X-risks”.
Let’s shorten P(alignment | humanity creates an SFC) into A|S. We call increasing A|S: “Reducing Alignment-Risks”, this is equivalent to “Increasing the value of the future in which our space-faring descendants exist”.
And let’s write P(humanity creates an SFC) as S. We call increasing S: “Reducing Extinction-Risks”[16]
The AI Safety community is currently optimizing for A∧S. As of 2024, the AI Safety community is mostly focusing on solving intent alignment through technical AI Safety research, and, with lower emphasis, on solving AI Governance. These works and motivations behind them can be understood as optimizing the probability that humanity creates an aligned SFC, which we can formalize as A∧S.
Optimizing A|S is a better target when ENH is correct. While optimizing A∧S may be mostly the right target when the ENH is incorrect, if we now assume it is 75% correct, then the marginal value of creating an SFC is significantly decreased and the optimal target moves closer to increasing A|S.
Speculative sign-flipping update. At the extreme, AI Safety agendas could see their expected value flip from positive to negative if they increase S enough to have A∧S increase overall while at the same time effectively decreasing A|S. The AI Safety agendas speculatively concerned with such dramatic updates may include those in which our impact is mostly limited to reducing the chance of extinction conditional on observing that humanity is especially bad at alignment, e.g., AI Control and AI Misuse. We speculate about these possible sign-flip updates in a footnote[17].
What is the difference between optimizing A∧S and A|S? Formally, Bayes’ theorem tells us that A∧S = A|S * S. Thus the difference between both is that when you optimize for A|S, you no longer optimize for the term S, which was included in A∧S; you no longer optimize for P(humanity creates an SFC).
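To make the difference concrete, here is a toy numerical sketch (the numbers are illustrative, not estimates from this post): an intervention that only raises S increases A∧S while leaving A|S untouched, and that S component is exactly what the A|S target stops rewarding.

```python
# Toy illustration (made-up numbers): raising S increases A AND S but not A|S.
A_given_S = 0.3          # P(alignment | humanity creates an SFC)
for S in (0.8, 0.9):     # before / after an intervention that only reduces Extinction-Risks
    print(f"S = {S}: A AND S = {A_given_S * S:.2f}, A|S = {A_given_S}")
```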
Concretely, how should the difference between A∧S and A|S make us update on the priority of AI safety agendas? Here are speculative examples. The degree to which priorities should be updated is to be debated. We only claim that they may need to be updated conditional on the ENH being significantly correct.
AI Misuse reduction: If the Paths-To-Impact (PTIs) are (a) to prevent extinction through reducing misuse and chaos, (b) to prevent the loss of alignment power resulting from a more chaotic world, and (c) to provide more time for Alignment research, then it is plausible that PTI (a) would become less impactful.
Misaligned AI Control: If the PTIs are (c) same as for AI Misuse, (d) to prevent extinction through controlling early misaligned AIs trying to take over, (e) to control misaligned early AIs to make them work on alignment research, and (f) to create fire alarms[18], then it is plausible that PTI (d) would be less impactful, since these early misaligned AIs may have a higher chance of not creating an SFC after taking over (e.g., they don’t survive destroying humanity or don’t care about space colonization; see the next section for more detail).
AI evaluations: The reduction of the impact of (a) and (d) may also impact the overall importance of this agenda.
Here is another effect: If an intervention, like AI control, increases P(humanity creates an SFC | early misalignment), then this intervention may need to be discounted more than if it was only increasing S. Increasing S may have little impact when the ENH is significantly correct, but increasing P(humanity creates an SFC | misalignment) is net negative, and early misalignment and (late) misalignment may be strongly correlated.
These updates are, at the moment, speculative and need to be debated.
How valuable are these updates?
Let’s look at how valuable these updates are:
Updating the importance of Extinction-Risks reduction downwards
Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards (Plan B)
Changing the AI safety community target from increasing A∧S to increasing A|S
Is Extinction-Risks reduction receiving much less funding than Alignment-Risks reduction? No. As we described in the section related to the first implication, quick estimates tell us that grants made to Non-AI Safety X-risks were equivalent to around 22% of the grants attributed to AI Safety in 2024 and to around 50% on average over 2022 to 2024. On top of that, part of the AI Safety grants are funding work on reducing Extinction-Risks (in addition to Alignment-Risks), such that these estimates are likely understating the amount of resources going to Extinction-Risks reduction relative to Alignment-Risks reduction.
Does increasing A∧S or A|S lead to similar actions? A counterargument to the third implication is that impacting these two probabilities may require very similar actions. A straightforward way for this to be true is if S, the probability that humanity creates an SFC, were very close to one. If so, switching from increasing A∧S to increasing A|S would in practice not change our actions much. A more general version of this point is that increasing S may already be assumed to be much less tractable than increasing A|S, and AI Safety agendas may already be putting most of their effort into increasing A|S. If that is true, then AI Safety agendas are already focusing on A|S rather than on S, and the implications of the ENH only reinforce these existing strategies. The implications of the ENH would not be incorrect, but the third implication would bring much less novel value. To know whether increasing A∧S or A|S leads to similar actions, let’s investigate the following two questions:
Is S close to one?
Is increasing S currently assumed to be much less tractable than increasing A|S by the AI Safety community?
Is S close to one? We list a few possible arguments or scenarios illustrating why S is unlikely to be very close to one.
Fragile world hypothesis. Humanity likely already has the capability to destroy itself. And further capabilities provided by ASI may unlock, for small actors, the capability to destroy the world.
Chaotic world. The world may get significantly more chaotic during takeoff. E.g., the hypothetical necessity of a pivotal act to prevent extinction from rogue actors will push actors in power to destabilize the world as soon as they can perform a pivotal act, for example, to prevent larger destabilizations later on.
S is reduced both by extinction and by not spreading in space. There are three conjunctive conditions necessary for S to be high: no extinction of intelligent agents on Earth, spreading in space, and wanting to control the resources in space.
The capabilities required to take over are lower when not caring about surviving. Destroying humanity without the requirement to survive it is easier than destroying humanity and surviving it.[19] We will first reach the capability threshold for AIs destroying humanity before reaching the threshold for AIs destroying humanity and reliably surviving it.
Humans causing extinction. Controllable AIs may be used by humans to destroy humanity and all AIs or more generally to prevent Earth from creating an SFC.
High compute requirements mean high fragility. Datacenters and electric networks are fragile and take a while to build. During a fast takeoff, as long as the infrastructure necessary to run ASIs needs to be large, it remains fragile.
Inference compute scaling means that capabilities can drop during takeover. Because of inference compute scaling, if the AI taking over loses 99% to 99.999% of its compute in the process of destroying humanity, it may lose enough capability to no longer be able to survive or recover from it. This argument would be weaker if there were no time constraints forcing the AI to act quickly, but such constraints are likely present during a fight for survival between an AI and humanity.
Inference compute scaling means that capabilities may drop in AI-AI conflicts. AI-AI conflicts may be extremely fast-paced. Human-human conflicts take place on the scale of minutes, days, years, or decades; humans and physical limitations prevent conflicts from happening too fast. AI-AI conflicts could take place on the scale of milliseconds or seconds. Because of inference compute scaling laws, this lower bound on conflict timescales can put significant downward pressure on capabilities. It is possible that AIs at human or superhuman level in conflicts on human timescales would be pushed into an equilibrium in which AI-AI conflicts happen so fast that they lack time to think and see their effective capabilities drop significantly. This drop in capabilities could be partially balanced by precomputing strategies, as human militaries already do. However, it is likely that strategies preventing opponents’ precomputation would become advantageous[20].
AIs may be trained against caring about surviving. Humans will frequently train AIs against prioritizing their survival. AIs (at least some of the early ones) will be trained not to value their long-term impact and their survival (e.g., trained for humbleness, low impact, high discount rate, against instrumental goals). Though conditional on misalignment and especially on deceptive alignment, these values may not fully be learned or preserved.[21]
AIs could be trained against spreading in space and against controlling resources. An interesting case is if a misaligned AI were trained to only care about Earth and to not do acausal trade; then, under some conditions, the misaligned AI may have little reason to use the resources in space. Such conditions include defense being much easier than offense, or travel speed limitations preventing other SFCs from ever reaching Earth. One advantage of these training objectives is that such goals could be removed conditional on humans observing that we succeeded at alignment and created a corrigible ASI.
Humans fighting back spitefully. Several countries have access to nuclear weapons and other offensive capabilities. A violent takeover by AIs may lead to retaliation by humans. Between the outcomes “human extinction” and “both human and AI extinction”, humans fighting back may choose to destroy AIs out of spite, because of irrational assessments, or because of plausibly rational assessments that misaligned SFCs are net negative.
Is increasing S currently assumed to be much less tractable than increasing A|S in the AI Safety community? Are current AI safety agendas already focusing on the A|S part within A∧S? It is hard to tell. My guess is “not really”; for example, AI Safety people seem strongly motivated by preventing extinction in particular and are directly targeting this objective. Hopefully the Existential Choices Debate Week (March 17-23) will bring some answers.
How hard is it to change S|¬A while not changing S|A? Since A includes corrigibility, changing S|¬A independently should not be extremely hard as long as we have some alignment power. E.g., one idea is to always train towards the values we wish misaligned ASIs to have; then, if we succeed at alignment, including corrigibility, we later change these values to those we wish an aligned ASI to have. This is somewhat similar to doing a humble and fail-safe version of the long-reflection proposal[22].
How tractable is increasing ¬S|¬A, relative to increasing A|S? Here are two quick thoughts:
The less tractable you think making progress on alignment is (i.e., changing A|S is hard), the more comparatively tractable increasing ¬S|¬A may be, especially if you don’t rely on changing the goals of misaligned AIs to change ¬S|¬A.
Let’s assume we are increasing ¬S|¬A by changing the capabilities or goals of the misaligned AI. The alignment power required to achieve that may be lower than to influence A|S. When intervening on ¬S|¬A, we may succeed when we have only partial control over the values learned. The target to reach for succeeding at alignment may be much smaller than the target for succeeding at preventing misaligned AIs from wasting cosmic resources.
Quantitative comparisons
Four idealized interventions. We compare the impact of four idealized interventions. For each of them, we analytically solve how much marginal expected utility they bring. We finally compute numerical estimates given two scenarios (optimistic and pessimistic) and two values of ENH (0 and 0.75).
Simplified notations. For readability, we write probabilities P(X) as X. For example:
P(humanity creates an SFC) as S
P(alignment | humanity creates an SFC) as A|S
P(humanity does not create an SFC | misalignment) as ¬S|¬A
P(alignment AND humanity creates an SFC) as A∧S
Summary of results
We estimate the impact of four idealized interventions matching different X-risk agendas:
Nuclear and Bioweapon X-risk reduction (↗ S) - Increasing S while keeping A|S constant.
Plan B AI Safety (↗ ¬S|¬A) - Increasing ¬S|¬A while keeping A∧S constant.
Updated AI Safety (↗ A|S) - Increasing A|S while keeping S constant.
Initial AI Safety (↗ A∧S) - Increasing A∧S through increasing both S and A|S in equal share.
We summarize their marginal utility given two scenarios (optimistic and pessimistic values for A) and two possible values for the correctness of ENH (0% and 75%). In the Optimistic scenario, we assume: A = 0.5, S|¬A = 0.5, and S|A = 1. In the Pessimistic scenario, we assume: A = 0.01, S|¬A = 0.5, and S|A = 1. You can run estimates using your own parameters in this spreadsheet (copy and edit).
Logistic tractability prior. We report results using a logistic tractability prior on changing probabilities. This prior is more realistic and informative than its absence, allowing fairer comparisons between interventions when probabilities are far from 0.5 (in the pessimistic scenario). You can find the same table of results without using this prior towards the end of the post. For simplicity, we give the analytic solutions without the logistic tractability prior.
Interventions and their marginal utility with the logistic prior, in the order: Optimistic scenario ENH = 0, Optimistic scenario ENH = 0.75, Pessimistic scenario ENH = 0, Pessimistic scenario ENH = 0.75, followed by the analytic solution (without logistic prior):
Nuclear and Bioweapon X-risk reduction (↗ S): 0.12 (56%), 0.03 (14%), 0.005 (26%), 0.001 (6%); analytic solution = A|S - C
Plan B AI Safety (↗ ¬S|¬A): 0.00 (0%), 0.12 (56%), 0.000 (0%), 0.004 (19%); analytic solution = C
Updated AI Safety (↗ A|S): 0.22 (100%), 0.22 (100%), 0.019 (100%), 0.019 (100%); analytic solution = 1
Initial AI Safety (↗ A∧S): 0.17 (78%), 0.13 (57%), 0.012 (63%), 0.010 (53%); analytic solution = 0.5 . (A|S - C + 1)
Values in parentheses are normalized such that the highest score in each scenario column gets 100%. The marginal utilities are computed, for each scenario column, as derivatives of the total utility created in our reachable universe. These derivatives are taken along a different dimension for each of the four idealized interventions and are scaled so that they are fair comparisons with each other, as detailed in the following sections.
Introducing the model
Assumptions and reminders. We try to create a model that is as realistic as possible while remaining easy to solve:
We assume S|A = 1. This means that conditional on humanity creating an aligned ASI, humanity creates an SFC for sure.
For simplicity, we assume we are only correlated with our exact copies and they represent a small enough fraction of other SFCs, such that we neglect them in computing the ENH ratio. Intuitively this can be understood as assuming that within the same reachable universe, there is never more than one exact copy of us.
The ENH ratio is equal to the value produced when humanity’s SFC does NOT exist divided by the value produced when it exists. The ENH ratio is thus variable, since its denominator is a function of the value we produce. ENH = U|¬S / U|S = (Aa|¬S / A|S) . (Ra|¬S / Rh|S). In which:
A|S = P(alignment of humanity’s SFC | humanity creates an SFC) is variable.
Aa|¬S = P(alignment of alien SFCs | humanity does not create an SFC) is constant.
“Alignment” is from humanity’s point of view, not from alien’s point of view. An SFC is aligned if its goal is some kind of ideal moral value (e.g., CEV), and if it has the ability to optimize it strongly.
Ra|¬S = Resources(aliens’ SFCs | humanity does not create an SFC) is constant.
Rh|S = Resources(humanity’s SFC | humanity creates an SFC) is constant.
Changing S or A would not change any of these three constants. To change them, we would need either to change the speed of colonization of humanity’s SFC, or to be correlated with alien SFCs, but we assumed no correlation with non-exact copies.
Three possible outcomes: alignment, misalignment, alien SFCs. Let’s assume longtermism, meaning that humanity’s value is dominated by the impact of the SFC it could create. We assume this SFC can either be aligned and produce a value normalized to one, or misaligned and produce no value. Humanity could also not create any SFC; in that case, the value produced by alien SFCs is equal to ENH multiplied by the expected value that humanity’s SFC would have created. Thus the total utility in the world is the expected value over three outcomes: U|A∧S, U|¬A∧S and U|¬S.
The value created when humanity creates an aligned SFC:
U|A∧S = 1
P(A∧S) = S . A|S
The value created when humanity creates a misaligned SFC:
U|¬A∧S = 0
P(¬A∧S) = S . ¬A|S
The value created when humanity does not create an SFC:
U|¬S = A|S . ENH
P(¬S) = (1 - S)
Expected utility. In total, our expected utility equals U = (S . A|S . 1) + (S . ¬A|S . 0) + [(1 - S) . A|S . ENH]. And while ENH is variable, (A|S . ENH) is a constant we call C: indeed, A|S . ENH = Aa|¬S . (Ra|¬S / Rh|S), which depends on none of the probabilities we intervene on. This is intuitive since we assumed we are not correlated with other SFCs in the same reachable universe. In the end:
U = S . A|S + (1 - S) . C
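As a quick illustration, here is a minimal sketch of this formula in code, plugged with the optimistic-scenario numbers used later in the post (the function and variable names are ours):

```python
# Minimal sketch of U = S . A|S + (1 - S) . C.
def expected_utility(S, A_given_S, C):
    return S * A_given_S + (1 - S) * C

# Implied by the optimistic scenario (A = 0.5, S|not-A = 0.5, S|A = 1): S = 0.75, A|S = 2/3.
S, A_given_S = 0.75, 2 / 3
for ENH in (0.0, 0.75):
    C = A_given_S * ENH              # constant value of the no-SFC outcome
    print(ENH, round(expected_utility(S, A_given_S, C), 3))
# ENH = 0    -> U = 0.5   (only humanity's possible aligned SFC contributes)
# ENH = 0.75 -> U = 0.625 (the no-SFC branch now also contributes value)
```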
Fair comparisons require similar differentiations. To be able to compare the impact of each intervention, we need a fair comparison. A naive unfair comparison would be to compute the impact of one intervention by differentiating by S and the impact of another by differentiating by S/10. A less naive unfair comparison would be to differentiate by A in one case and by A∧S in another. Why is that unfair? We know that d(A∧S) = A|S . dS + S . d(A|S); now imagine A|S and S are both close to 1. Then d(A∧S) ≈ d(A|S) + dS, which is unfair since it is as if we intervened on both A|S and S, each with the same strength as when differentiating by only one of these probabilities. To prevent that, we assume a fixed amount of effort is put into each intervention, and that this effort is split between changing A and changing S. Thus either we change A, or S, or we split the effort between both, e.g., 50/50. When not using the logistic tractability prior, this method implicitly assumes that the marginal returns on changing A and S are constant, i.e., that investing 2x in increasing S leads to a 2x larger increase.
Analytic solutions for the four interventions
We do not yet include the logistic tractability prior in these expressions.
Increasing S while keeping A|S constant (e.g., biosecurity)
The impact of this first intervention is computed by:
I_S = dU/dS = A|S - C
Increasing ¬S|¬A while keeping A constant (e.g., Plan B AI Safety)
As said previously, U = S . A|S + (1 - S) . C. We are now going to reformulate this as a function of ¬S|¬A and A. Using the law of total probability, we have ¬S = ¬S|A . A + ¬S|¬A . ¬A. And since we already assumed that S|A = 1, we have ¬S|A = 0 and ¬S = ¬S|¬A . ¬A. Using Bayes’ law and S|A = 1, we have: S . A|S = A∧S = A . S|A = A. Thus:
U = A + ¬S|¬A . ¬A . C
Is differentiating by ¬S|¬A a fair comparison? As seen above, we have ¬S|¬A = ¬S / ¬A. And we are keeping A constant, thus d(¬S|¬A) = - d(S) / ¬A. Thus, differentiating by ¬S|¬A while keeping A constant would be an unfair comparison. We want our derivative to vary proportionally to either dS or dA, with a scale of one. We thus need to scale the derivative by the constant ¬A, since d(¬S|¬A . ¬A) = - d(S) and is thus a fair comparison.
The impact of this intervention is computed by:
I_¬S|¬A = dU/d(¬S|¬A . ¬A) = C
Increasing A|S while keeping S constant (e.g., new AI Safety target)
Is differentiating by d(A|S) a fair comparison? We know that A|S = A∧S / S = S|A . A / S. And S|A is assumed constant and equal to one. Thus, A|S = A / S. We are keeping S constant, thus d(A|S) = dA / S. In the end, differentiating by A|S would not produce a fair comparison. We want our derivative to vary proportionally to either dS or dA, with a scale of one. We thus need to scale our derivative by the constant S, since d(A|S . S) = d(A) and is thus a fair comparison.
The impact of this intervention is computed by:
I_A|S = dU/d(A|S . S) = 1
Increasing A∧S through increasing both A|S and S in equal measure (e.g., old AI Safety target)
We know that A∧S = A|S . S. As we have seen before, differentiating by A∧S would not produce a fair comparison since the amount of effort put into changing A and S would not be the same as for the previous three interventions. We thus compute the impact of increasing A∧S as the impact of increasing A|S and S while putting 50% of effort in each of them.
As seen previously, dU/dS = A|S - C and dU/d(A|S . S) = 1. Thus the impact of this intervention is:
I_A∧S = 0.5 . (A|S - C) + 0.5 . 1 = 0.5 . (A|S - C + 1)
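The algebra above is simple enough to check mechanically. Below is a small symbolic sketch using sympy (our choice of tool, not the post’s); the substitutions x = ¬S|¬A . ¬A and y = A|S . S are the fair-comparison variables introduced above.

```python
# Symbolic check of the four analytic solutions (a sketch). A_S stands for A|S,
# x for not-S|not-A . not-A, y for A|S . S, and C for the constant A|S . ENH.
import sympy as sp

S, A_S, A, C, x, y = sp.symbols("S A_S A C x y")

U_in_S = S * A_S + (1 - S) * C       # U as a function of S, with A|S held constant
print(sp.diff(U_in_S, S))            # A_S - C  -> increasing S (nuclear/bio)

U_in_x = A + x * C                   # U rewritten using S|A = 1
print(sp.diff(U_in_x, x))            # C        -> increasing not-S|not-A (Plan B)

U_in_y = y + (1 - S) * C             # U with S held constant
print(sp.diff(U_in_y, y))            # 1        -> increasing A|S (updated AI Safety)

half = sp.Rational(1, 2)
print(half * (A_S - C) + half)       # A_S/2 - C/2 + 1/2 -> equal-effort split (initial AI Safety)
```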
Improving these models with a logistic tractability prior.
A priori, it makes more sense to assume we can uniformly change the logit of a probability rather than the probability itself. We thus differentiate by d(logit(p)) instead of dp, where p is the probability each intervention targets. This multiplies the marginal impacts above by a factor of S . (1 - S) for the intervention differentiating by S, ¬S|¬A . (1 - ¬S|¬A) for the intervention differentiating by ¬S|¬A, and A|S . (1 - A|S) for the interventions differentiating by A|S.
Values in parentheses are normalized such that the highest score in each scenario column gets 100%.
Numerical comparisons
You can find numerical comparisons and run the model with your own parameters using the following spreadsheet.
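If you prefer code to the spreadsheet, here is a minimal sketch of the same computation (our reconstruction of the model described above, not the author’s spreadsheet), assuming S|A = 1, C = A|S . ENH, and a logistic tractability factor p . (1 - p) on the probability each intervention targets; it reproduces the summary table up to rounding.

```python
# Sketch of the numerical comparison under the assumptions stated above.
def marginal_utilities(A, S_given_notA, ENH, S_given_A=1.0):
    S = A * S_given_A + (1 - A) * S_given_notA      # P(humanity creates an SFC)
    A_given_S = A * S_given_A / S                   # P(alignment | SFC)
    notS_given_notA = 1 - S_given_notA              # P(no SFC | misalignment)
    C = A_given_S * ENH                             # constant value of the no-SFC outcome
    f_S = S * (1 - S)                               # logistic tractability factors
    f_notS = notS_given_notA * (1 - notS_given_notA)
    f_AS = A_given_S * (1 - A_given_S)
    return {
        "Nuclear and Bioweapon X-risk reduction": (A_given_S - C) * f_S,
        "Plan B AI Safety":                       C * f_notS,
        "Updated AI Safety":                      1 * f_AS,
        "Initial AI Safety":                      0.5 * (A_given_S - C) * f_S + 0.5 * f_AS,
    }

scenarios = {"Optimistic": (0.5, 0.5), "Pessimistic": (0.01, 0.5)}   # (A, S|not-A)
for name, (A, S_notA) in scenarios.items():
    for ENH in (0.0, 0.75):
        results = marginal_utilities(A, S_notA, ENH)
        best = max(results.values())
        print(f"{name}, ENH = {ENH}")
        for label, value in results.items():
            print(f"  {label}: {value:.3f} ({value / best:.0%})")
```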
Context
Evaluating the Existence Neutrality Hypothesis—Introductory Series. This post is part of a series introducing a research project for which I am seeking funding: Evaluating the Existence Neutrality Hypothesis. This project includes evaluating both the Civ-Saturation[2] and the Civ-Similarity Hypotheses[1] and their longtermist macrostrategic implications. This introductory series hints at preliminary research results and looks at the tractability of making further progress in evaluating these hypotheses.
Acknowledgements
Thanks to Tristan Cook, and Justis Mills for having spent some of their personal time providing excellent feedback on this post and ideas. Note that this research was done under my personal name and that this content is not meant to represent any organization’s stance.
The Existence Neutrality Hypothesis posits that influencing humanity’s chance at creating a Space-Faring Civilization (SFC) produces little value compared to increasing the quality of the SFC we would eventually create conditional on doing so.
The Civ-Saturation Hypothesis posits that when making decisions, we should assume most of humanity’s Space-Faring Civilization (SFC) resources will eventually be grabbed by SFCs regardless of whether humanity’s SFC exists or not.
The Civ-Similarity Hypothesis posits that the expected utility efficiency of humanity’s future Space-Faring Civilization (SFC) would be similar to that of other SFCs.
Here are a few reasons for having very high uncertainty: multiagent causal interactions in Civ-Saturated worlds and acausal interactions in any kind of world, not understanding consciousness, not knowing what we need to optimize for, not understanding how to align the SFC humanity may create, not understanding the universe and the multiverse, chaotic or noisy systems derailing predictions over billions of years, etc.
Until now, I have spent about a hundred hours studying the Civ-Saturation Hypothesis, another hundred on the Civ-Similarity Hypothesis, and two hundred writing about both.
As a reminder, this value means that for a unit of resource grabbed, other SFCs would produce 90% of the value humanity’s SFC would have created if it had grabbed it.
Actual allocation to Non-AI Safety X-risk reduction (share of total allocations):
4.0% excluding capacity building—Our own quick analysis of the OpenPhil Fund grant database for 2024 estimates that “Biosecurity & Pandemic Preparedness” accounted for (2019: 13.7%, 2020: 13.1%, 2021: 4.7%, 2022: 7.6%, 2023: 6.4%, 2024: 4.0%) of the total allocation. This value of 4% can be compared to the (2019: 21.7%, 2020: 5.7%, 2021: 15.8%, 2022: 11.3%, 2023: 10.3%, 2024: 18.4%) allocated to “Potential Risks From Advanced AI” in 2024 by OpenPhil.
2% - Our own quick analysis of the EA Fund grant database for 2024 - Underestimates based on the presence of keywords in the name of the grants - (2019: 0.7%, 2020: 0%, 2021: 2%, 2022: 0.6%, 2023: 0.8%, 2024: 2%). This value of 2% can be compared to 51%, which is the share of all EA Funds grants in 2024 allocated by the Long-term Fund specifically.
13% - Benjamin Todd’s 2019 estimates, funding plus people—Split: 8% Biosecurity + 5% Other global catastrophic risk (incl. climate tail risks)
13% - Benjamin Todd’s 2019 estimates, funding only—Split: 10% Biosecurity + 3% Other global catastrophic risk (incl. climate tail risks)
14% − 2020 EA Coordination Forum—Split: 8% Biosecurity + 5% Other global catastrophic risks + 1% Broad longtermist
17% - EA Survey Supplement 2023 High engagement—Labelled: Existential risk other than AI
16% - EA Survey Supplement 2023 Avg. - Labelled: Existential risk other than AI
22% − 2020 EA Coordination Forum—Split: 9% Biosecurity + 4% Other global catastrophic risks + 9% Broad longtermist
25% − 2019 EA Leaders Forum—Labelled: Other Longtermist work
22% - EA Survey 2019 (Forum) - Labelled: Other Longtermist work
“Extinction” here is the extinction of intelligent life on Earth; it is not limited to the extinction of humanity. In this post, we mostly conflate the extinction of intelligent life on Earth with humanity not creating a Space-Faring Civilization.
Agendas focusing on improving our chance at creating an SFC conditional on failing to align it may have their impact reevaluated to be negative. At first sight, the AI Misuse, AI Control, and Pause AI agendas may be the most worrisome from that point of view. For example, AI Control is mostly designed to control early misaligned AIs. But these weak AIs are also the most likely to destroy humanity while failing to create an SFC, because they are weak and unable to spread through space after destroying humanity. The goals of these weak AIs could also be less scope-sensitive relative to later and more capable AIs. At the same time, these agendas do not straightforwardly increase our capacity to align AIs. Given a capability level, they don’t directly increase how likely ASIs are to be aligned. These agendas may mostly decrease extinction risks conditional on producing early misaligned AIs, while we go through an intermediary phase during which these misaligned AIs would be capable enough to destroy humanity but not capable enough to survive and colonize space after destroying us. To be clear, while it is pretty unlikely that updating on the ENH would sign-flip the value of these AI Safety agendas, it is a possibility.
E.g., intellectual work, which will be the first to be automated, is sufficient to cause extinction, while manual work, which will only be automated later, is necessary for the AI to reliably survive.
Quotes from The Art of War by Sun Tzu: “Be extremely subtle, even to the point of formlessness. Be extremely mysterious, even to the point of soundlessness. Thereby you can be the director of the opponent’s fate.”, “All warfare is based on deception. Hence, when we are able to attack, we must seem unable; when using our forces, we must appear inactive; when we are near, we must make the enemy believe we are far away; when far away, we must make him believe we are near.”
Quote from Lukas Finnveden: “However, I also expect people to be training AIs for obedience and, in particular, training them to not disempower humanity. So if we condition on a future where AIs disempower humanity, we evidently didn’t have that much control over their values. This significantly weakens the strength of the argument “they’ll be nice because we’ll train them to be nice”. In addition: human disempowerment is more likely to succeed if AIs are willing to egregiously violate norms, such as by lying, stealing, and killing. So conditioning on human disempowerment also updates me somewhat towards egregiously norm-violating AI. That makes me feel less good about their values.”
Longtermist Implications of the Existence Neutrality Hypothesis
Crossposted on LessWrong.
We describe the implications the Existence Neutrality Hypothesis[1] could have for impartial longtermists, and then provide quantitative impact estimates for four idealized interventions when accounting or not for this hypothesis. Under this hypothesis, there is little value lost if humanity does not create a Space-Faring Civilization. The three main implications are: (1) To reduce significantly the longtermist value of reducing Extinction -Risks, such as those caused by nuclear and bioweapons. (2) To make some new AI Safety agendas (e.g., “Plan B”) competitive with existing AI Safety agendas. (3) To update the relative priorities between existing AI Safety agendas and possibly render some of them significantly less attractive.
Sequence: This post is part 9 of a sequence investigating the longtermist implications of alien Space-Faring Civilizations. Each post aims to be standalone. You can find an introduction to the sequence in the following post.
Summary
The Existence Neutrality Hypothesis (ENH) suggests that humanity not developing a Space-Faring Civilization (SFC) results in minimal marginal value loss, as most cosmic resources would be recovered by other civilizations with comparable utility creation efficiency. Assuming the ENH is about 75% correct (combining the Civ-Saturation Hypothesis[2] as 84% correct and the Civ-Similarity Hypothesis[3] as 90% correct), there are three core implications:
Significantly reduce the priority of Extinction-Risk reduction (e.g., nuclear, bioweapon-risks) by about 75%, since fewer cosmic resources and less value would be lost if humanity fails to create a space-faring civilization.
Increase the importance of alternative AI Safety agendas, such as the “Plan B” agenda, which aims at mitigating negative outcomes from misaligned AIs, for example, by reducing the chance that misaligned AIs achieve space colonization.
Shift the main AI safety target from maximizing the joint probability of alignment and humanity creating a space-faring civilization towards optimizing alignment conditional on humanity achieving space colonization. Shifting the focus from reducing Existential-Risks to Alignment-Risks.
We first qualitatively introduce these implications, and then review some questions related to how substantial they are. Finally, we produce a simple quantitative model estimating and illustrating these implications.
Reminder of hypotheses and assumptions
The Existence Neutrality Hypothesis is the conjunction of two subhypotheses. The Existence Neutrality Hypothesis posits that humanity creating a Space-Faring Civilization (SFC) does not bring much marginal value, because we reach relatively few marginal resources, and because humanity’s SFC does not create much more value than other SFCs. The implications we describe in the post are conditional on the Existence Neutrality Hypothesis being significantly correct. This hypothesis is the conjunction of two subhypotheses: The Civ-Saturation Hypothesis, and the Civ-Silimilarity Hypothesis.
Civ-Saturation Hypothesis: Most resources are grabbed irrespective of our existence. The Civ-Saturation Hypothesis posits that when making decisions, we should assume most of humanity’s Space-Faring Civilization (SFC) future resources will eventually be grabbed by SFCs regardless of whether humanity’s SFC exists or not. In the post “Other Civilizations Would Recover 84+% of Our Cosmic Resources”, we performed a first evaluation of this hypothesis and concluded that as long as you believe in some form of EDT, then our best guess is that 84+% of humanity’s SFC resources would be recovered by other SFCs if humanity does not create an SFC. For the remainder of the post, we will assume the most conservative form of EDT, which is simply CDT plus assuming we control our exact copies.
Civ-Similarity Hypothesis: Our civilization is not much abnormal in terms of creating value. The Civ-Similarity Hypothesis posits that the expected utility[4] of humanity’s future Space-Faring Civilization is similar to that of other SFCs. In the post “The Convergent Path to the Stars—Similar Utility Across Civilizations Challenges Extinction Prioritization”, we introduced some reasons supporting the Civ-Similarity Hypothesis. Our current speculative guess is that other SFCs produce 95% to 99% of the value humanity’s SFC would produce per unit of resources[5]. There are two main reasons behind these high values: First, our longtermist expected utility estimates are massively uncertain[6], which leads to a flattening of differences, and second the supporting and opposing arguments, don’t seem to differ massively in terms of credibility or impact. Even when we predict some differences, they likely only produce very weak updates and fail to support that humanity would be much abnormal among SFCs.[7]
We evaluate the Existence Neutrality Hypothesis using the ENH ratio. The ENH ratio is the ratio between the value produced by humanity NOT creating an SFC and creating one. We call these two outcomes U|¬S and U|S. We assume for simplicity they are positive values. ENH ratio = U|¬S / U|S. The closer the ENH ratio is to one, the more correct the Existence Neutrality Hypothesis is. Abusing language, we will say that this hypothesis is 75% correct[8] when the ratio equals 0.75.
In this post we assume the ENH ratio equals 0.75. We will investigate what would be the strategic implications if the Existence Neutrality Hypothesis were 75% true, given classical total utilitarianism. We obtain this value by multiplying our quantitative best guess estimate for the correctness of the Civ-Saturation Hypothesis (84%[9]), with a slightly conservative speculative guess for the Civ-Similarity Hypothesis (90%[10],[11]).
Overview of implications
The ENH being 75% true would have three main implications:
Reducing by 75% the importance of Extinction-Risk reduction (e.g., from nuclear weapons and bioweapons)
Opening new competitive AI Safety agendas. For example, “Plan B AI safety”, which focuses on decreasing the negative impact of misaligned AIs.
Updating the relative importance of AI safety agenda by shifting our optimization target from Existential-Risks to Alignment-Risks, from increasing P(alignment AND humanity creates an SFC) to increasing P(alignment | humanity creates an SFC),
Let’s illustrate how the values of X-risk reduction agendas are changed when the ENH is true. In the plot below, the red/green areas denote the production of marginal negative/positive value. We plot two primary directions (black continuous arrows) along which interventions can have an impact: Increasing P(humanity creates an SFC) or increasing P(alignment | humanity creates an SFC), reducing Extinction-Risks or reducing Alignment-Risks. Using dashed-arrow, we plot the conjunctive directions P(alignment AND humanity creates an SFC) and P(misalignment AND humanity creates an SFC). Finally, each circle will represent an area in which an X-risk reduction agenda is speculatively located, meaning that the typical impact of this agenda is supported by interventions producing effects along the plotted directions.
Let’s now plot the speculative locations of X-risk agendas when assuming that the ENH is incorrect (ENH ~ 0). We highlight the following: Nuclear and bioweapon X-risk reduction have positive marginal value. The Plan-B AI safety agenda is mostly neutral. And optimizing P(alignment AND humanity creating an SFC) provides the most marginal value.
Now let’s represent an alternative chart of priorities when ENH is correct. For simplicity we use ENH = 1 in the following plot. We now observe the following: Nuclear and bioweapon X-risk reduction produces much less value[12]. The Plan-B agenda now produces positive value. And increasing P(alignment | humanity creates an SFC) is now the optimal target.
Qualitative implications
Reducing extinction risks, which prevent the creation of SFCs, would be 75% less important
75% less impact from reducing Extinction-Risks. When the ENH is 75% correct, this means only 25% of humanity’s SFC value would be lost if it does not exist. This directly translates into a reduction of 75% of the value of increasing humanity’s SFC chances to exist. Nuclear and bioweapons X-risk reductions are usually seen as the most impactful, non-AI Safety, interventions to increase humanity’s SFC chance to exist. From existing surveys, estimates, and grant databases, we extract estimates of how much resources are allocated or should be allocated to cause areas related to reducing non-AI extinction risks, see details in footnote[13].
Non-AI Safety extinction reduction grants may represent the equivalent of between 16% (2024) and up to 50% (2022-2024 avg.) of the grants made to AI Safety. Let’s focus on Open Philanthropy since they are the largest grant maker in the EA community. Using their grant database and their own labels, we estimate that in 2024, 4.0% of their total grant budget was allocated to “Biosecurity & Pandemic Preparedness”. This is roughly equal to 22% of the grant budget allocated to “Potential Risks From Advanced AI” in the same year (18.4% of all grants in 2024). A significant issue with these numbers is that they exclude the focus area “Global Catastrophic Risks Capacity Building”, which is significant (15.7% of all grants in 2024) and consists, at first sight, of something like 50%[14] of grants for AI Safety capacity building and 50% of grants whose goal does NOT seem limited to AI Safety but likely often includes it.
If we ignore capacity building, then the grants to “Biosecurity & Pandemic Preparedness” represent 22% of the grants to “Potential Risks From Advanced AI” in 2024, down from 62% in 2023 and 67% in 2022.
If we don’t ignore “Global Catastrophic Risks Capacity Building”, and we try to classify grants using keywords in their names while defaulting to the ratio between “Biosecurity & Pandemic Preparedness” and “Potential Risks From Advanced AI” when we don’t find any keywords, then we produce the estimate that non-AI Safety grants represented 16% of the amount allocated to AI safety grants in 2024 and 50% in average over 2022 to 2024.
These estimates show that a 75% reduction in the importance of these cause areas (relative to other longtermist cause areas) may have some significant impact on longtermist grant-making prioritization and community beliefs.
Plan B AI Safety
Misaligned AIs are net negative when the ENH is significantly correct. Plan B AI Safety is a relatively new and underground AI Safety agenda that focuses on influencing the future conditional on failing at aligning ASIs. Point of views about the value of misaligned ASIs vary a lot[15]. Let’s use the middle-ground assumption that misaligned ASIs would produce a future with negligible value. Under this assumption, the Plan B agenda would have no or little value. But under the assumption that the Existence Neutrality Hypothesis is 75% true, then a misaligned ASI could have a strong negative impact by taking resources away from alien SFCs, which may succeed at creating a positive future.
Decreasing P(humanity creates an SFC | misalignment) is competitive with other AI Safety agenda when ENH is 75% correct. One goal of the Plan B AI Safety agenda is to decrease P(humanity creates an SFC | misalignment). If ENH is incorrect, this intervention has no impact, but when assuming ENH is 75% true, then this agenda is now similar in impact to increasing P(alignment AND humanity creates an SFC). See the quantitative estimates later in the post.
The less valuable humanity’s SFC is, relative to alien SFCs, the more important this agenda is. After updating on the ENH, work on Plan B AI Safety now seems impactful. Such work may be especially attractive for people thinking that humanity is especially bad at technical AI safety, AI governance, or may have especially worrisome moral values, relative to other alien Intelligent Civilizations on the verge of creating an SFC. Someone in this situation would estimate that ENH is higher than 75% correct, possibly above 100%.
Updating the importance of existing AI Safety agendas
Let’s simplify notations:
Let’s shorten P(alignment AND humanity creates an SFC) into A∧S. We call increasing A∧S: “Reducing X-risks”.
Let’s shorten P(alignment | humanity creates an SFC) into A|S. We call increasing A|S: “Reducing Alignment-Risks”, this is equivalent to “Increasing the value of the future in which our space-faring descendants exist”.
And let’s write P(humanity creates an SFC) as S. We call increasing S: “Reducing Extinction-Risks”[16]
The AI Safety community is currently optimizing for A∧S. As of 2024, the AI Safety community is mostly focusing on solving intent alignment through technical AI Safety research, and, with lower emphasis, on solving AI Governance. These works and motivations behind them can be understood as optimizing the probability that humanity creates an aligned SFC, which we can formalize as A∧S.
Optimizing A|S is a better target when the ENH is correct. While optimizing A∧S may be mostly the right target when the ENH is incorrect, if we now assume it is 75% correct, then the marginal value of creating an SFC is significantly decreased and the optimal target moves closer to increasing A|S.
Speculative sign-flipping update. At the extreme, AI Safety agendas could see their expected value flip from positive to negative if they increase S enough to increase A∧S overall while at the same time effectively decreasing A|S. The AI Safety agendas speculatively concerned by such dramatic updates may include those in which our impact is mostly limited to reducing the chance of extinction conditional on observing that humanity is especially bad at alignment, e.g., AI Control and AI Misuse. We speculate about these possible sign-flip updates in a footnote[17].
What is the difference between optimizing A∧S and A|S? Formally, Bayes’ theorem tells us that A∧S = A|S * S. Thus the difference between both is that when you optimize for A|S, you no longer optimize for the term S, which was included in A∧S; you no longer optimize for P(humanity creates an SFC).
Concretely, how should the difference between A∧S and A|S make us update on the priority of AI safety agendas? Here are speculative examples. The degree to which priorities should be updated is to be debated. We only claim that they may need to be updated conditional on the ENH being significantly correct.
AI Misuse reduction: If the Paths-To-Impact (PTIs) are (a) to prevent extinction through reducing misuse and chaos, (b) to prevent the loss of alignment power resulting from a more chaotic world, and (c) to provide more time for Alignment research, then it is plausible that PTI (a) would become less impactful.
Misaligned AI Control: If the PTIs are (c) the same as for AI Misuse, (d) to prevent extinction through controlling early misaligned AIs trying to take over, (e) to control misaligned early AIs to make them work on alignment research, and (f) to create fire alarms[18], then it is plausible that PTI (d) would be less impactful, since these early misaligned AIs may have a higher chance of not creating an SFC after taking over (e.g., they don’t survive destroying humanity or don’t care about space colonization; see the next section for more detail).
AI evaluations: The reduction of the impact of (a) and (d) may also impact the overall importance of this agenda.
Here is another effect: if an intervention, like AI Control, increases P(humanity creates an SFC | early misalignment), then this intervention may need to be discounted more than if it were only increasing S. Increasing S may have little impact when the ENH is significantly correct, but increasing P(humanity creates an SFC | misalignment) is net negative, and early misalignment and (late) misalignment may be strongly correlated.
These updates are, at the moment, speculative and need to be debated.
How valuable are these updates?
Let’s look at how valuable these updates are:
Updating the importance of Extinction-Risks reduction downwards
Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards (Plan B)
Changing the AI safety community target from increasing A∧S to increasing A|S
Is Extinction-Risks reduction receiving much less funding than Alignment-Risks reduction? No. As we described in the section related to the first implication, quick estimates tell us that grants made to non-AI Safety X-risks were equivalent to around 22% of the grants attributed to AI Safety in 2024, and to around 50% on average over 2022 to 2024. On top of that, part of the AI Safety grants fund work on reducing Extinction-Risks (in addition to Alignment-Risks), such that these estimates likely understate the amount of resources going to Extinction-Risks reduction relative to Alignment-Risks reduction.
Does increasing A∧S or A|S lead to similar actions? A counterargument to the third implication is that optimizing these two probabilities may lead to very similar actions. A straightforward way this would be true is if S, the probability that humanity creates an SFC, were very close to one. If so, switching from increasing A∧S to increasing A|S would in practice not change our actions much. A more general version of this point is if increasing S were already assumed to be much less tractable than increasing A|S, such that AI Safety agendas were already putting most of their effort into increasing A|S. If that were the case, the implications of the ENH would only reinforce these existing strategies; they would not be incorrect, but the third implication would bring much less novel value. To know whether increasing A∧S or A|S leads to similar actions, let’s investigate the following two questions:
Is S close to one?
Is increasing S currently assumed to be much less tractable than increasing A|S by the AI Safety community?
Is S close to one? We list a few possible arguments or scenarios illustrating why S is unlikely to be very close to one.
Fragile world hypothesis. Humanity likely already has the capability to destroy itself, and further capabilities provided by ASI may give even small actors the ability to destroy the world.
Chaotic world. The world may get significantly more chaotic during takeoff. E.g., the hypothetical necessity of a pivotal act to prevent extinction from rogue actors may push actors in power to destabilize the world as soon as they can perform such an act, for example to prevent larger destabilizations later on.
S is reduced both by extinction and by not spreading in space. There are three conjunctive conditions necessary for S to be high: no extinction of intelligent agents on Earth, spreading in space, and wanting to control the resources in space.
The capabilities required to take over are lower when not caring about surviving. Destroying humanity without the requirement to survive it is easier than destroying humanity and surviving it.[19] AIs will reach the capability threshold for destroying humanity before reaching the threshold for destroying humanity and reliably surviving it.
Humans causing extinction. Controllable AIs may be used by humans to destroy humanity and all AIs or more generally to prevent Earth from creating an SFC.
High compute requirements mean high fragility. Datacenters and electric networks are fragile and take a while to build. During a fast takeoff, as long as the infrastructure necessary to run ASIs needs to be large, it remains fragile.
Inference compute scaling means that capabilities can drop during takeover. If the AI taking over loses 99% to 99.999% of its compute in the process of destroying humanity, it may lose enough capability to no longer be able to survive or recover. This argument would be weaker if there were no time constraints forcing the AI to act quickly, but such constraints are likely present during a fight for survival between an AI and humanity.
Inference compute scaling means that capabilities may drop in AI-AI conflicts. AI-AI conflicts may be extremely fast-paced. Human-human conflicts take place on the scale of minutes, days, years, or decades; human and physical limitations prevent them from happening too fast. AI-AI conflicts could take place on the scale of milliseconds or seconds. Because of inference compute scaling laws, this much shorter conflict timescale can put significant downward pressure on capabilities. It is possible that AIs at human or superhuman level in human-timescale conflicts would be pushed into an equilibrium in which AI-AI conflicts happen so fast that they lack time to think and see their effective capabilities drop significantly. This drop in capabilities could be partially balanced by precomputing strategies, as human militaries already do. However, strategies preventing opponents’ precomputation would likely become advantageous[20].
AIs may be trained against caring about surviving. Humans will frequently train AIs against prioritizing their own survival. AIs (at least some of the early ones) will be trained not to value their long-term impact and their survival (e.g., trained for humbleness, low impact, high discount rates, and against instrumental goals). Though conditional on misalignment, and especially on deceptive alignment, these values may not be fully learned or preserved.[21]
AIs could be trained against spreading in space and against controlling resources. An interesting case is if misaligned AIs were trained to only care about Earth and to not do acausal trade; under some conditions, such a misaligned AI may then have little reason to use the resources in space. These conditions include defense being much easier than offense, or travel speed limitations preventing other SFCs from ever reaching Earth. One advantage of these training objectives is that such goals could be removed conditional on humans observing that we succeeded at alignment and created a corrigible ASI.
Humans fighting back spitefully. Several countries have access to nuclear weapons and other offensive capabilities. A violent takeover by AIs may lead to retaliation by humans. Faced with the choice between “human extinction” and “both human and AI extinction”, humans fighting back may choose to destroy AIs out of spite, because of irrational assessments, or because of plausibly rational assessments that misaligned SFCs are net negative.
Is increasing S currently assumed to be much less tractable than increasing A|S by the AI Safety community? Are current AI Safety agendas already focusing on reducing the A|S part within A∧S? It is hard to tell. My guess is “not really”: for example, AI Safety people seem strongly motivated by preventing extinction in particular and directly target this objective. Hopefully the Existential Choices Debate Week (March 17-23) will bring some answers.
How hard is it to change S|¬A while not changing S|A? Since A includes corrigibility, changing S|¬A independently should not be extremely hard as long as we have some alignment power. E.g., one idea is to always train towards the values we wish misaligned ASIs to have; then, if we succeed at alignment, including corrigibility, we later change these values to those we wish an aligned ASI to have. This is somewhat similar to a humble and fail-safe version of the long-reflection proposal[22].
How tractable is increasing ¬S|¬A, relative to increasing A|S? Here are two quick thoughts:
The less tractable you think making progress on alignment is (i.e., the harder changing A|S is), the more comparatively tractable increasing ¬S|¬A may be, especially if you don’t rely on changing the goals of misaligned AIs to change ¬S|¬A.
Let’s assume we are increasing ¬S|¬A by changing the capabilities or goals of the misaligned AI. The alignment power required to achieve this may be lower than that required to influence A|S. When intervening on ¬S|¬A, we may succeed even with only partial control over the values learned. The target to hit for succeeding at alignment may be much smaller than the target for succeeding at preventing misaligned AIs from wasting cosmic resources.
Quantitative comparisons
Four idealized interventions. We compare the impact of four idealized interventions. For each of them, we analytically solve how much marginal expected utility they bring. We finally compute numerical estimates given two scenarios (optimistic and pessimistic) and two values of ENH (0 and 0.75).
Simplified notations. For readability, we write probabilities P(X) as X. For example:
P(humanity creates an SFC) as S
P(alignment | humanity creates an SFC) as A|S
P(humanity does not create an SFC | misalignment) as ¬S|¬A
P(alignment AND humanity creates an SFC) as A∧S
Summary of results
We estimate the impact of four idealized interventions matching different X-risk agendas:
Nuclear and Bioweapon X-risk reduction (↗ S) - Increasing S while keeping A|S constant.
Plan B AI Safety (↗ ¬S|¬A) - Increasing ¬S|¬A while keeping A∧S constant.
Updated AI Safety (↗ A|S) - Increasing A|S while keeping S constant.
Initial AI Safety (↗ A∧S) - Increasing A∧S through increasing both S and A|S in equal share.
We summarize their marginal utility given two scenarios (optimistic and pessimistic values for A) and two possible values for the correctness of ENH (0% and 75%). In the Optimistic scenario, we assume: A = 0.5, S|¬A = 0.5, and S|A = 1. In the Pessimistic scenario, we assume: A = 0.01, S|¬A = 0.5, and S|A = 1. You can run estimates using your own parameters in this spreadsheet (copy and edit).
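For reference, S and A|S in each scenario follow from the law of total probability and Bayes’ rule. Here is a minimal sketch of that derivation (the variable names are ours, not taken from the spreadsheet):

```python
# Derive S = P(humanity creates an SFC) and A|S = P(alignment | SFC)
# from the scenario parameters A, S|A and S|¬A.
def derive_scenario(A, S_given_A=1.0, S_given_not_A=0.5):
    S = S_given_A * A + S_given_not_A * (1 - A)   # law of total probability
    A_given_S = S_given_A * A / S                 # Bayes' rule
    return S, A_given_S

for name, A in (("Optimistic", 0.5), ("Pessimistic", 0.01)):
    S, A_given_S = derive_scenario(A)
    print(f"{name}: S = {S:.3f}, A|S = {A_given_S:.3f}")
# Optimistic:  S = 0.750, A|S close to 0.667
# Pessimistic: S = 0.505, A|S close to 0.020
```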
Logistic tractability prior. We report results using a logistic tractability prior on changing probabilities. This prior is more realistic and informative than its absence, allowing fairer comparisons between interventions when probabilities are far from 0.5 (in the pessimistic scenario). You can find the same table of results without using this prior towards the end of the post. For simplicity, we give the analytic solutions without the logistic tractability prior.
[Table: marginal utility (with logistic prior) of the four interventions, Nuclear and Bioweapon X-risk reduction (↗ S), Plan B AI Safety (↗ ¬S|¬A), Updated AI Safety (↗ A|S), and Initial AI Safety (↗ A∧S), under the optimistic and pessimistic scenarios and for ENH correctness of 0% and 75%.]
Values in parentheses are normalized such that the highest score in each scenario gets 100%. The marginal utilities are computed as derivatives of the total utility created in our reachable universe, taken along a different dimension for each of the four idealized interventions. These derivatives are scaled to allow fair comparisons with each other, as detailed in the following sections.
Introducing the model
Assumptions and reminders. We try to create a model that is as realistic as possible while remaining easily solvable:
We assume S|A = 1. This means that conditional on humanity creating an aligned ASI, then humanity creates an SFC for sure.
For simplicity, we assume we are only correlated with our exact copies and that they represent a small enough fraction of other SFCs that we can neglect them when computing the ENH ratio. Intuitively, this can be understood as assuming that within the same reachable universe, there is never more than one exact copy of us.
The ENH ratio is equal to the value produced when humanity’s SFC does NOT exist divided by the value produced when it exists. The ENH ratio is thus variable, since its denominator is a function of the value we produce: ENH = U|¬S / U|S = (Aa|¬S / A|S) . (Ra|¬S / Rh|S), in which (a worked example follows the definitions below):
A|S = P(alignment of humanity’s SFC | humanity creates an SFC) is variable.
Aa|¬S = P(alignment of alien SFCs | humanity does not create an SFC) is constant.
“Alignment” is from humanity’s point of view, not from the aliens’ point of view. An SFC is aligned if its goal is some kind of ideal moral value (e.g., CEV) and if it has the ability to optimize it strongly.
Ra|¬S = Resources(aliens’ SFCs | humanity does not create an SFC) is constant.
Rh|S = Resources(humanity’s SFC | humanity creates an SFC) is constant.
Changing S or A would not change any of these three constants. To change them, we would need either to change the speed of colonization of humanity’s SFC or to be correlated with alien SFCs, but we assumed no correlation with non-exact copies.
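As a rough worked example: if we identify the Civ-Saturation estimate with the resource ratio (Ra|¬S / Rh|S, around 0.84) and the Civ-Similarity estimate with the value-efficiency ratio (here Aa|¬S / A|S, around 0.90), which is our simplified reading of the two subhypotheses, we recover roughly the 75% correctness used throughout this post:

ENH = (Aa|¬S / A|S) . (Ra|¬S / Rh|S) ≈ 0.90 . 0.84 ≈ 0.75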
Three possible outcomes: alignment, misalignment, alien SFCs. Let’s assume longtermism, meaning that humanity’s value is dominated by the impact of the SFC it could create. We assume this SFC can either be aligned and produce a value normalized to one, or misaligned and produce no value. Humanity could also not create any SFC, in that case, the value produced by alien SFCs is equal to ENH multiplied by the expected value that humanity’s SFC would have created. Thus the total utility in the world is the expected value over three outcomes: U|A∧S, U|¬A∧S and U|¬S.
The value created when humanity creates an aligned SFC:
U|A∧S = 1
P(A∧S) = S . A|S
The value created when humanity creates a misaligned SFC:
U|¬A∧S = 0
P(¬A∧S) = S . ¬A|S
The value created when humanity does not create an SFC:
U|¬S = A|S . ENH
P(¬S) = (1 - S)
Expected utility. In total, our expected utility equals U = (S . A|S . 1) + (S . ¬A|S . 0) + [(1 - S) . A|S . ENH]. And while ENH is variable, (A|S . ENH) is a constant we call C. This is intuitive since we assumed we are not correlated with other SFCs in the same reachable universe. In the end:
U = S . A|S + (1 - S) . C
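Below is a minimal sketch of this expected-utility model in code (our own illustrative translation, not the linked spreadsheet). The ENH value of 0.75 and the optimistic-scenario parameters are just example inputs, and we treat the ENH correctness value as the ENH ratio evaluated at the scenario’s baseline A|S, so that C = A|S . ENH:

```python
# Expected utility over the three outcomes, which reduces to
# U = S * A|S + (1 - S) * C, with C = A|S * ENH held constant.
def expected_utility(S, A_given_S, C):
    u_aligned = S * A_given_S * 1          # humanity creates an aligned SFC (value 1)
    u_misaligned = S * (1 - A_given_S) * 0  # humanity creates a misaligned SFC (value 0)
    u_no_sfc = (1 - S) * C                 # alien SFCs recover the resources
    return u_aligned + u_misaligned + u_no_sfc

# Example: optimistic scenario (S = 0.75, A|S = 2/3) with ENH = 0.75.
S, A_given_S = 0.75, 0.5 / 0.75
C = A_given_S * 0.75                       # C = A|S * ENH
print(expected_utility(S, A_given_S, C))   # 0.625
```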
Fair comparisons require similar differentiations. To be able to compare the impact of each intervention, we need a fair comparison. A naive unfair comparison would be to compute the impact of one intervention by differentiating by S and the impact of another by differentiating by S/10. A less naive unfair comparison would be to differentiate by A in one case and by A∧S in another. Why is that unfair? We know that d(A∧S) = A|S . dS + S . d(A|S); now imagine A|S and S are both close to 1. Then d(A∧S) ~ d(A|S) + dS, which is unfair since it is as if we intervened on both A|S and S with the same strength as when we were differentiating by only one of these probabilities alone. To prevent this, we will assume there is a fixed amount of effort put into each intervention, and that this effort is split between changing dA and dS: either we change dA, or dS, or we split the effort between both, e.g., 50/50. When not using the logistic tractability prior, this method implicitly assumes that the marginal returns on changing A and S are constant, i.e., that investing 2x in increasing S leads to a 2x larger increase.
Analytic solutions for the four interventions
We don’t include yet the logistic tractability prior in these expressions.
Increasing S while keeping A|S constant (e.g., biosecurity)
The impact of this first intervention is computed by:
IS = dU/dS = A|S - C
Increasing ¬S|¬A while keeping A constant (e.g., Plan B AI Safety)
As said previously, U = S . A|S + (1 - S) . C. We are now going to reformulate this as a function of ¬S|¬A and A. Using the law of total probability, we have ¬S = ¬S|A . A + ¬S|¬A . ¬A. And since we already assumed that S|A = 1, we have ¬S|A = 0 and thus ¬S = ¬S|¬A . ¬A. Using Bayes’ rule and S|A = 1, we have: S . A|S = A∧S = A . S|A = A. Thus:
U = A + ¬S|¬A . ¬A . C
Is differentiating by ¬S|¬A a fair comparison? As seen above, we have ¬S|¬A = ¬S / ¬A, and we are keeping A constant, thus d(¬S|¬A) = - d(S) / ¬A. Differentiating by ¬S|¬A while keeping A constant would therefore be an unfair comparison. We want our derivative to vary proportionally to either dS or dA, with a scale of one. We thus need to scale the derivative by the constant ¬A, since d(¬S|¬A . ¬A) = - d(S), which makes it a fair comparison.
The impact of this intervention is computed by:
I¬S|¬A = dU/d(¬S|¬A . ¬A) = C
Increasing A|S while keeping S constant (e.g., new AI Safety target)
Is differentiating by d(A|S) a fair comparison? We know that A|S = A∧S / S = S|A . A / S, and S|A is assumed constant and equal to one. Thus A|S = A / S. We are keeping S constant, thus d(A|S) = dA / S. So differentiating by A|S would not produce a fair comparison. We want our derivative to vary proportionally to either dS or dA, with a scale of one. We thus need to scale our derivative by the constant S, since d(A|S . S) = d(A), which makes it a fair comparison.
The impact of this intervention is computed by:
IA|S = dU/d(A|S . S) = 1
Increasing A∧S through increasing both A|S and S in equal measure (e.g., old AI Safety target)
We know that A∧S = A|S . S. As we have seen before, differentiating by A∧S would not produce a fair comparison since the amount of effort put into changing A and S would not be the same as for the previous three interventions. We thus compute the impact of increasing A∧S as the impact of increasing A|S and S while putting 50% of effort in each of them.
As seen previously, dU/dS = A|S - C and dU/d(A|S . S) = 1. Thus the impact of this intervention is:
IA∧S = 0.5 . (dU/dS + dU/d(A|S . S)) = 0.5 . (A|S - C + 1).
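To make these analytic solutions concrete, here is a hedged sketch evaluating the four impacts (without the logistic prior) under both scenarios and both ENH values. It follows the formulas above, again reading the ENH correctness value as C = A|S . ENH at the scenario’s baseline, so treat it as an illustration rather than a reproduction of the post’s tables:

```python
# Evaluate the four analytic marginal utilities (no logistic prior) for the
# optimistic and pessimistic scenarios and for ENH in {0, 0.75}.
def marginal_utilities(A, ENH, S_given_A=1.0, S_given_not_A=0.5):
    S = S_given_A * A + S_given_not_A * (1 - A)
    A_given_S = S_given_A * A / S
    C = A_given_S * ENH                        # constant C = A|S * ENH
    return {
        "X-risk reduction (S)":      A_given_S - C,
        "Plan B (not-S | not-A)":    C,
        "Updated AI Safety (A|S)":   1.0,
        "Initial AI Safety (A and S)": 0.5 * (A_given_S - C + 1.0),
    }

for name, A in (("Optimistic", 0.5), ("Pessimistic", 0.01)):
    for ENH in (0.0, 0.75):
        print(name, f"ENH = {ENH}:", marginal_utilities(A, ENH))
```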
Results without logistic tractability prior
[Table: marginal utility (without logistic prior) of the four interventions under each scenario, together with their analytic solutions: Nuclear and Bioweapon X-risk reduction (↗ S): A|S - C; Plan B AI Safety (↗ ¬S|¬A): C; Updated AI Safety (↗ A|S): 1; Initial AI Safety (↗ A∧S): 0.5 . (A|S - C + 1).]
Improving these models with a logistic tractability prior.
The tractability of changing A, S, or any probability is, a priori, proportional to its logit. It makes more sense to assume we can uniformly change the logit of probabilities. Thus we need to differentiate by d(logit(A)) and d(logit(S)), instead of dA and dS, to bake in this prior. Since dP/d(logit(P)) = P . (1 - P), this change simply multiplies the impact of the interventions differentiating by S or ¬S|¬A by a factor S . (1 - S), and the impact of the intervention differentiating by A|S by a factor A . (1 - A).
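As an illustration of that rescaling, here is a sketch under the same conventions as the previous snippet, using dP/d(logit(P)) = P . (1 - P); it is our own illustration, not the spreadsheet’s implementation:

```python
# Rescale the analytic impacts by the logistic tractability prior:
# interventions differentiating by S (or by ¬S|¬A) get a factor S * (1 - S),
# and the one differentiating by A|S gets a factor A * (1 - A).
def logit_scale(p):
    return p * (1 - p)

def marginal_utilities_with_prior(A, S, A_given_S, C):
    i_s = (A_given_S - C) * logit_scale(S)
    i_plan_b = C * logit_scale(S)
    i_a_given_s = 1.0 * logit_scale(A)
    i_a_and_s = 0.5 * (i_s + i_a_given_s)
    return {
        "X-risk reduction (S)":        i_s,
        "Plan B (not-S | not-A)":      i_plan_b,
        "Updated AI Safety (A|S)":     i_a_given_s,
        "Initial AI Safety (A and S)": i_a_and_s,
    }

# Example: optimistic scenario with ENH = 0.75.
A, S, A_given_S = 0.5, 0.75, 0.5 / 0.75
print(marginal_utilities_with_prior(A, S, A_given_S, C=A_given_S * 0.75))
```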
We report here an updated table using this prior:
[Table: marginal utility and analytic solutions (with logistic prior) of the four interventions, Nuclear and Bioweapon X-risk reduction (↗ S), Plan B AI Safety (↗ ¬S|¬A), Updated AI Safety (↗ A|S), and Initial AI Safety (↗ A∧S), under each scenario and ENH value.]
Values in parentheses are normalized such that the highest score in each scenario gets 100%.
Numerical comparisons
You can find numerical comparisons and run the model with your own parameters using the following spreadsheet.
Context
Evaluating the Existence Neutrality Hypothesis—Introductory Series. This post is part of a series introducing a research project for which I am seeking funding: Evaluating the Existence Neutrality Hypothesis. This project includes evaluating both the Civ-Saturation[2] and the Civ-Similarity Hypotheses[1] and their longtermist macrostrategic implications. This introductory series hints at preliminary research results and looks at the tractability of making further progress in evaluating these hypotheses.
Acknowledgements
Thanks to Tristan Cook and Justis Mills for spending some of their personal time providing excellent feedback on this post and its ideas. Note that this research was done under my personal name and that this content is not meant to represent any organization’s stance.
The Existence Neutrality Hypothesis posits that influencing humanity’s chance at creating a Space-Faring Civilization (SFC) produces little value compared to increasing the quality of the SFC we would eventually create conditional on doing so.
The Civ-Saturation Hypothesis posits that when making decisions, we should assume most of humanity’s Space-Faring Civilization (SFC) resources will eventually be grabbed by SFCs regardless of whether humanity’s SFC exists or not.
The Civ-Similarity Hypothesis posits that the expected utility efficiency of humanity’s future Space-Faring Civilization (SFC) would be similar to that of other SFCs.
We are talking about expected utility efficiency, meaning the utility produced per unit of resources grabbed.
Speculative 80% CI = [0.25, 2]. Speculative 50% CI = [0.90, 1.05].
Here are a few reasons for having very high uncertainty: multiagent causal interactions in Civ-Saturated worlds and acausal interactions in any kind of world, not understanding consciousness, not knowing what we need to optimize for, not understanding how to align the SFC humanity may create, not understanding the universe and the multiverse, chaotic or noisy systems derailing predictions over billions of years, etc.
Until now, I have spent about a hundred hours studying the Civ-Saturation Hypothesis, another hundred on the Civ-Similarity Hypothesis, and two hundred writing about both.
Though when talking about a ratio above one, we will stop abusing the term “correctness” and will refer to the ENH ratio.
As a reminder, this value means that 84% of humanity’s SFC resources would NOT be lost if it does not exist.
As a reminder, this value means that for a unit of resource grabbed, other SFCs would produce 90% of the value humanity’s SFC would have created if it had grabbed it.
This is the lower bound of our current 50% CI.
Actually, no value in this plot since it assumes ENH = 1.
Actual allocation to Non-AI Safety X-risk reduction (share of total allocations):
4.0% excluding capacity building—Our own quick analysis of the OpenPhil Fund grant database for 2024 estimates that “Biosecurity & Pandemic Preparedness” accounted for (2019: 13.7%, 2020: 13.1%, 2021: 4.7%, 2022: 7.6%, 2023: 6.4%, 2024: 4.0%) of the total allocation. This value of 4% can be compared to the (2019: 21.7%, 2020: 5.7%, 2021: 15.8%, 2022: 11.3%, 2023: 10.3%, 2024: 18.4%) allocated to “Potential Risks From Advanced AI” in 2024 by OpenPhil.
2% - Our own quick analysis of the EA Fund grant database for 2024 - Underestimates based on the presence of keywords in the name of the grants - (2019: 0.7%, 2020: 0%, 2021: 2%, 2022: 0.6%, 2023: 0.8%, 2024: 2%). This value of 2% can be compared to 51%, which is the share of all EA Funds grants in 2024 allocated by the Long-term Fund specifically.
13% - Benjamin Todd’s 2019 estimates, funding plus people—Split: 8% Biosecurity + 5% Other global catastrophic risk (incl. climate tail risks)
13% - Benjamin Todd’s 2019 estimates, funding only—Split: 10% Biosecurity + 3% Other global catastrophic risk (incl. climate tail risks)
14% − 2020 EA Coordination Forum—Split: 8% Biosecurity + 5% Other global catastrophic risks + 1% Broad longtermist
17% - EA Survey Supplement 2023 High engagement—Labelled: Existential risk other than AI
16% - EA Survey Supplement 2023 Avg. - Labelled: Existential risk other than AI
22% − 2020 EA Coordination Forum—Split: 9% Biosecurity + 4% Other global catastrophic risks + 9% Broad longtermist
25% − 2019 EA Leaders Forum—Labelled: Other Longtermist work
22% - EA Survey 2019 (Forum) - Labelled: Other Longtermist work
This 50% value is by aggregating grants by their number instead of their value.
Ryan Greenblatt and Paul Christiano argued that misaligned ASIs may create positive value, while Wei Dai and Daniel Kokotajlo mentioned that they may create negative value. For more, see the post “When is unaligned AI morally valuable?” and the related comments.
“Extinction” here means the extinction of intelligent life on Earth; it is not limited to the extinction of humanity. In this post, we mostly conflate the extinction of intelligent life on Earth with humanity not creating a Space-Faring Civilization.
Agendas focusing on improving our chance of creating an SFC conditional on failing to align it may have their impact reevaluated to be negative. At first sight, the AI Misuse, AI Control, and Pause AI agendas may be the most worrisome from that point of view. For example, AI Control is mostly designed to control early misaligned AIs. But these weak AIs are also the most likely to destroy humanity without leading to an SFC, precisely because they are too weak to spread through space after destroying humanity. The goals of these weak AIs could also be less scope-sensitive than those of later and more capable AIs. At the same time, these agendas do not straightforwardly increase our capacity to align AIs: given a capability level, they don’t directly increase how likely ASIs are to be aligned. These agendas may mostly decrease extinction risks conditional on producing early misaligned AIs, while we go through an intermediary phase during which these misaligned AIs would be capable enough to destroy humanity but not capable enough to survive and colonize space after destroying us. To be clear, while it is pretty unlikely that updating on the ENH would sign-flip the value of these AI Safety agendas, it is a possibility.
Note that this PTI somewhat contradicts the path (b) described for AI Misuse.
E.g., intellectual work, which will be the first to be automated, is sufficient to cause extinction, while manual work, which will only be automated later, is necessary for the AI to reliably survive.
Quotes from The Art of War by Sun Tzu: “Be extremely subtle, even to the point of formlessness. Be extremely mysterious, even to the point of soundlessness. Thereby you can be the director of the opponent’s fate.”, “All warfare is based on deception. Hence, when we are able to attack, we must seem unable; when using our forces, we must appear inactive; when we are near, we must make the enemy believe we are far away; when far away, we must make him believe we are near.”
Quote from Lukas Vinnveden: “However, I also expect people to be training AIs for obedience and, in particular, training them to not disempower humanity. So if we condition on a future where AIs disempower humanity, we evidentally didn’t have that much control over their values. This signiciantly weakens the strength of the argument “they’ll be nice because we’ll train them to be nice”. In addition: human disempowerment is more likely to succeed if AIs are willing to egregiously violate norms, such a by lying, stealing, and killing. So conditioning on human disempowerment also updates me somewhat towards egregiously norm-violating AI. That makes me feel less good about their values.”
Long reflection—EA Forum