Longtermism and Computational Complexity
Introduction
This post argues that longtermist effective altruism may not be action-guiding. This is bad news for a theory of how we ought to live. The crux of my argument is as follows: determining which long-term risks we ought to try to prevent or mitigate requires us to solve problems that may be computationally intractable for finite agents such as ourselves. I take my argument to establish that: i) the longtermist effective altruist helps themselves to the unjustified assumption that we are capable of making inferences with high computational complexity, and ii) this raises difficult challenges for the longtermist effective altruist program that have been under-appreciated to date. Responding to these challenges may require significant rethinking of the conceptual foundations of the longtermist effective altruist movement. In what follows, I’ll lay out my argument in more detail, consider some potential counterarguments to my central claims, and respond to those counterarguments. I’ll then offer some recommendations in light of my conclusions.
My Argument in Ordinary Language
In this section, I use ordinary language to state my argument that longtermist effective altruism is not action-guiding. In a later section, I will also give a more mathematical version of the same argument. I take both arguments to be valid and sound, although the more mathematical version allows the premises to be stated more precisely.
The ordinary-language version of the argument begins with the following premise, which I take to be a core tenet of longtermist effective altruism as it is defended by, among others, Ord (2020) and MacAskill (2022).
Prevent Possible Harms: For any event e that might occur in the future (including the very far future), there is a sufficiently large reduction in utility such that, if e’s occurrence would lead to said reduction, then we ought to take costly steps now to make e less likely, or mitigate the negative impacts of e.
For example, if a potential climate catastrophe could cause many vertebrate species to become extinct, then we ought to take steps to reduce the probability of said catastrophe, even if doing so requires us to incur significant costs.
The ethos expressed in Prevent Possible Harms is sometimes known as “fanaticism,” since it recommends that we should potentially incur very high costs now in order to lower the already-microscopic probability of some far-future event. The charge of fanaticism is sometimes seen as an objection to longtermist effective altruism, but I follow MacAskill and Greaves (2021, Sec. 8) in holding that these charges do not stick. If Prevent Possible Harms could feasibly guide our actions now, and also recommends fanaticism, then I think that we ought to be fanatics. Thus, my aim here isn’t to object to the content of Prevent Possible Harms per se. Rather, I’ll object here to the assumption that Prevent Possible Harms is capable of informing how actual human agents ought to live their lives.
The second premise in my argument to this end is the following:
Don’t Prevent Impossible Harms: For any event e that will not occur in the future, there is no sufficiently large reduction in utility such that, if e’s occurrence would lead to said reduction, then we ought to take costly steps now to make e less likely, or mitigate the negative impacts of e.
While it is not discussed as frequently in the longtermist and effective altruist literature, Don’t Prevent Impossible Harms is a truism for most theories of how we ought to act: if an event won’t occur, then we shouldn’t take steps to make it less likely, or to mitigate its negative impacts. Moreover, Don’t Prevent Impossible Harms follows from the idea that “ought implies can” (Kant, 1781); if e won’t occur, then it’s not possible for us to make it any less likely, or to mitigate negative outcomes that occur because e occurs, and so we cannot be compelled to attempt to do so. To illustrate, if Venus were to suddenly deviate from its orbit tomorrow and collide with Earth, this would presumably lead to a very large aggregate reduction in utility on Earth. But Venus won’t do that, and so no matter how large this hypothetical reduction in utility might be, we shouldn’t take any costly steps now to prevent this hypothetical event from occurring.
My third and final premise describes a key limitation on our ability to predict whether an arbitrary future event will occur.
No Efficient Algorithm: There might not be an efficient algorithm that can take as input: i) a theory of the relationships between all possible events, and ii) any event e such that e’s occurrence would lead to a significant reduction in utility, and return the correct answer as to whether or not, according to the theory, e could occur.
In what follows, I will define the terms “theory” and “efficient algorithm” more precisely. For now, I note that No Efficient Algorithm is derived from an ordinary-language statement of a famous result in computational complexity theory due to Cooper (1990).
Equipped with these three premises, we are able to state the following proposition:
Proposition 1: If Prevent Possible Harms, Don’t Prevent Impossible Harms, and No Efficient Algorithm are all true, then there might not be an efficient algorithm that can take as input: i) a theory of the relationships between all possible events, and ii) any event e such that e’s occurrence would lead to a significant reduction in utility, and return the correct answer as to whether or not, according to the theory, we ought to take costly steps to prevent e or mitigate the negative impacts of e.
This proposition follows immediately from the definitions of its three antecedent conditions. It is on the basis of this result that I conclude that any theory committed to Prevent Possible Harms, which I take to include most versions of longtermist effective altruism, may be limited in its ability to be action-guiding. If there isn’t an efficient algorithm that tells us whether or not we ought to take steps to prevent or mitigate the effects of any potentially catastrophic event, then it seems that Prevent Possible Harms does not give us any practical advice as to how we ought to live our lives. The key assumption here is that if a maxim is meant to be generally action-guiding, then we need to be able to efficiently determine, for any given case, whether or not it recommends taking a particular action.
In what follows, I’ll unpack this argument in more mathematical detail. But first, I’ll need to provide some necessary technical background on computational complexity theory, probabilistic inference, and the computational complexity of probabilistic inference.
Technical Background
Computational Complexity
My discussion of computational complexity makes use of Turing’s (1936) now-famous theoretical framework defining a machine that computes the outputs of functions from their inputs. I won’t go into the details of Turing machines here, as they ultimately aren’t necessary for any of the arguments that I will give below. I’ll also use the term ‘Turing machine’ as a shorthand for ‘deterministic Turing machine.’
Computational complexity theory is a field of mathematics that studies how difficult it is for a computer (e.g., a Turing machine) to solve problems. A problem, for our purposes here, is a pair $(I, f)$, where $f$ is a function $f: I \to O$ from a set of possible inputs $I$ to a set of outputs $O$. A Turing machine solves a problem if, given any input $i$ in $I$, it eventually outputs $f(i)$ and then halts. Note that some problems cannot be solved by a Turing machine (e.g., if the function $f$ is not computable). We suppose that it is possible to measure the size of any input in $I$; that is, there is a function that tells us how large any input $i$ is. For example, if $I$ is a set of bit strings, then the size of $i$ might be the number of bits in the string. We also suppose that, for any problem $(I, f)$ and any Turing machine that solves it, we can measure the maximum amount of time that the machine takes to solve the problem on inputs of any given size.
Equipped with these definitions, we can now define some complexity classes, which are sets of problems such that a problem’s membership in the set depends on how difficult that problem is to solve efficiently. The first is the complexity class $P$:
The Complexity Class $P$: A problem $(I, f)$ is in $P$ if and only if there is a Turing machine such that, given any input $i$ in $I$, the Turing machine outputs $f(i)$ and then halts, in an amount of time that is a polynomial function of the size of $i$.
Intuitively, if a problem is in $P$ and we use a Turing machine to find the value $f(i)$ for an input $i$ with size $n$, then the Turing machine will take some amount of time $t$ to find the correct answer and then halt. The specific amount of time $t$ will be bounded by a polynomial function of $n$ (e.g., a quadratic, cubic, or quartic function). This means that if we feed the same Turing machine some larger input $i'$, that Turing machine will take longer to return the value $f(i')$, but the difference between the time that it takes to compute $f(i')$, as compared to $f(i)$, will not be astronomical, since the time it takes to solve a problem in $P$ grows only polynomially with the size of the input.
When a problem is in $P$, there is a precise sense in which there is an “efficient algorithm” for solving that problem. A Turing machine can solve the problem in such a way that as the size of the input increases, the time it takes to solve the problem does not undergo a quantitative explosion, but instead increases at a manageable rate. A good intuition pump is to think of the time it might take you to read a 400-page book versus an 800-page book written in a similar style. If the 400-page book takes ten hours of active reading to complete, then the 800-page book might take twenty hours, or thirty hours, but it won’t take 1,000 hours of active reading. This is because reading is a task that humans, in general, are able to do efficiently.
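To make the contrast concrete, here is a minimal Python sketch (my own illustration, not part of the formal definitions above; the function names are hypothetical) comparing how a polynomial time budget and an exponential time budget grow as the input size increases.

```python
# Illustrative only: compare how polynomial vs. exponential time budgets
# grow as the input size n increases.

def polynomial_steps(n: int, degree: int = 3) -> int:
    """Steps taken by a hypothetical algorithm running in time n^degree."""
    return n ** degree

def exponential_steps(n: int) -> int:
    """Steps taken by a hypothetical algorithm running in time 2^n."""
    return 2 ** n

for n in [10, 20, 40, 80]:
    print(n, polynomial_steps(n), exponential_steps(n))
# Doubling n multiplies the cubic count by 8, but squares the exponential
# count -- the "quantitative explosion" that membership in P rules out.
```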
A second important class of problems is the set of problems in $NP$. These are problems whose solutions can be verified in polynomial time by a Turing machine. We define this complexity class more precisely as follows:
The Complexity Class $NP$: A problem $(I, f)$ is in $NP$ if and only if there is a problem $(I \times H, g)$ in $P$, where $H$ is a set of possible “hints” and, for any $i \in I$, $f(i) = 1$ if and only if $g(i, h) = 1$ for some $h \in H$.
Intuitively, a problem is in $NP$ if and only if, for any input–output pair, one can be given a “hint” that allows one to verify whether the output given is the correct one for that input, in an amount of time that scales polynomially with the size of the input. For instance, there is no known algorithm that solves the general problem of determining whether a finite set of integers has a subset that sums to zero in time polynomial in the size of the set. But if we are given a pair $(N, M)$ containing the set of integers $N$ and a subset $M \subseteq N$, then we can check, in time polynomial in the size of $M$, whether or not the elements of $M$ sum to zero, by simply finding the sum of the elements of $M$. This allows us to verify that $N$ contains a subset of integers that sums to zero, if indeed it does. The set $M$ is the “hint” that allows us to verify our answer.
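As a concrete illustration of this solve/verify asymmetry, here is a short Python sketch of my own using the subset-sum example just described; the function names and the example set are hypothetical.

```python
from itertools import chain, combinations

def verify_hint(N: list[int], M: list[int]) -> bool:
    """Verification: given the 'hint' M, checking it is fast --
    time roughly linear in the size of M."""
    return len(M) > 0 and set(M) <= set(N) and sum(M) == 0

def solve_by_search(N: list[int]) -> list[int] | None:
    """Solution without a hint: brute-force search over all non-empty
    subsets (2^|N| - 1 of them); no polynomial-time method is known."""
    subsets = chain.from_iterable(combinations(N, k) for k in range(1, len(N) + 1))
    for subset in subsets:
        if sum(subset) == 0:
            return list(subset)
    return None

N = [3, -7, 1, 6, -2, 9]
M = solve_by_search(N)        # exponential-time search for a "hint"
if M is not None:
    print(M, verify_hint(N, M))   # verification itself is fast
```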
Clearly, $P \subseteq NP$. But one of the most beguiling open questions in mathematics is whether or not $P = NP$ (resolving this question currently comes with a $1 million prize). Most professionals working on the problem believe that $P \neq NP$ (see Rosenberger 2012), such that there are problems that can be verified, but not solved, in polynomial time.
The final class of problems that we will consider is the set of $NP$-Hard problems:
The $NP$-Hard Problems: A problem is $NP$-Hard if and only if, were it to be in $P$, then every problem in $NP$ would be in $P$.
Intuitively, one can think of $NP$-Hard problems as problems such that, if we could somehow solve them in polynomial time, then it would act as a key to unlock our ability to solve all of the problems in $NP$ in polynomial time. Crucially, if it is indeed the case that $P \neq NP$, then no $NP$-Hard problem can be solved in polynomial time.
In what follows, I’ll present the result, which I take to be crucial to my argument that longtermist effective altruism may not be action-guiding, that probabilistic inference is $NP$-Hard. But first, I’ll have to say a little more about what I mean by “probabilistic inference” in this context.
Probabilistic Inference
In this section, I’ll present some formal details on how I’m understanding probabilistic inference, as well as formalize the concept of a “theory,” as the term is used in my argument above.
A probability space is a triple $(\Omega, \mathcal{F}, P)$, where $\Omega$ is a set of all possible worlds, $\mathcal{F}$ is a set of subsets of $\Omega$ closed under union, intersection, and complement, and $P$ is a probability distribution such that $P(\Omega) = 1$, $P(\emptyset) = 0$, and for any disjoint sets $A$ and $B$ in $\mathcal{F}$, $P(A \cup B) = P(A) + P(B)$. Intuitively, since $\Omega$ is a set of possible worlds, the elements of $\mathcal{F}$ are possible events, to which $P$ assigns probabilities.
A binary random variable is a function $V: \Omega \to \{0, 1\}$. A binary random variable $V$ is measurable with respect to a probability space $(\Omega, \mathcal{F}, P)$ if $\{\omega \in \Omega : V(\omega) = 1\} \in \mathcal{F}$ and $\{\omega \in \Omega : V(\omega) = 0\} \in \mathcal{F}$. For our purposes, we’ll treat any binary random variable $V$ as corresponding to a proposition in a natural language, with $V = 1$ if the proposition is true and $V = 0$ if it is false. We can assign a probability to the event that $V = v$, where $v \in \{0, 1\}$, using the formula $P(V = v) = P(\{\omega \in \Omega : V(\omega) = v\})$.
A propositional network is a pair $(\mathcal{V}, E)$, in which $\mathcal{V}$ is a set of binary random variables and $E$ is a set of edges, or ordered pairs, relating those variables. If $E$ contains an ordered pair $(V', V)$, we say that $V'$ is a parent of $V$. One can visualize a propositional network as a directed graph in which the nodes are binary random variables, connected by directed arrows indicating parent–child relationships.
A belief network is a pair consisting of a propositional network $(\mathcal{V}, E)$ and a probability space $(\Omega, \mathcal{F}, P)$ such that:
All variables in $\mathcal{V}$ are measurable with respect to $(\Omega, \mathcal{F}, P)$.
For any variable $V$ in $\mathcal{V}$ and any $v \in \{0, 1\}$, the probability $P(V = v)$ is given by the equation $P(V = v) = \sum_{\mathbf{pa}} P(V = v \mid \mathbf{Pa}(V) = \mathbf{pa})\, P(\mathbf{Pa}(V) = \mathbf{pa})$, where $\mathbf{Pa}(V)$ is the set containing all and only $V$’s parents in the belief network, and each $\mathbf{pa}$ is a possible assignment of truth values to all variables in $\mathbf{Pa}(V)$.
Thus, in a belief network, the probability of the truth or falsehood of some proposition is determined solely by the conditional distribution over the corresponding propositional variable, given its parents, and the prior distribution over each of those parents. A belief network formalizes the intuitive idea of a “theory,” in that it specifies how, and the extent to which, the truth of each proposition in some salient set depends on the truth of the other propositions in that set. As such, it also provides a blueprint for making probabilistic inferences, in that it allows us to determine, based on which propositions we take to be true, what our degree of belief in the truth of other propositions should be. A belief network is a special case of what is also called a Bayesian network (see Pearl, 1988).
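To make this concrete, here is a small Python sketch of my own (the toy network and variable names are hypothetical, not drawn from the post) that computes $P(V = 1)$ for a node by summing over truth-value assignments to its parents, in the manner of the equation above.

```python
from itertools import product

# A toy belief network over binary variables. For each variable we store
# its parents and a conditional probability table (CPT) giving
# P(variable = 1 | assignment to parents). Root nodes have empty parent tuples.
network = {
    "A": {"parents": (), "cpt": {(): 0.3}},
    "B": {"parents": (), "cpt": {(): 0.6}},
    "C": {"parents": ("A", "B"),
          "cpt": {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}},
}

def prior(name: str) -> float:
    """P(name = 1), computed by summing over parent assignments.
    (Assumes the parents are independent root nodes, as in this toy example.)"""
    parents = network[name]["parents"]
    cpt = network[name]["cpt"]
    total = 0.0
    for assignment in product((0, 1), repeat=len(parents)):
        p_parents = 1.0
        for parent, value in zip(parents, assignment):
            p1 = prior(parent)
            p_parents *= p1 if value == 1 else 1 - p1
        total += cpt[assignment] * p_parents
    return total

print(prior("C"))  # P(C = 1) under the toy network
```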
The Computational Complexity of Probabilistic Inference
Consider the following problem, which Cooper (1990) calls “PIBNETD” (an abbreviation for “Probabilistic Inference in Belief NETworks—Decision”):
PIBNETD: For any belief network and any variable $V$ within that belief network, output $1$ if $P(V = 1) > 0$ and output $0$ otherwise.
More formally, PIBNETD is the problem $(I, f)$, where $I$ is the set of all triples $(N, S, V)$ such that the pair $(N, S)$ is a belief network and $V$ is a variable in that network. The problem is solved by outputting $f(N, S, V) = 1$ if and only if $P(V = 1) > 0$, with $f(N, S, V) = 0$ otherwise.
A Turing machine that solves this problem is able to determine, for any probabilistic theory of the relationships between propositions, whether or not any proposition expressible in that theory has positive probability of being true.
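To see what a naive solution looks like, here is a brute-force sketch of my own (continuing the hypothetical toy `network` from the earlier sketch) that decides PIBNETD by summing over every joint assignment. It is correct, but it takes time exponential in the number of variables, which is exactly the kind of cost an efficient algorithm would need to avoid.

```python
from itertools import product  # `network` is the toy example defined above

def joint_probability(assignment: dict) -> float:
    """Probability of one full truth-value assignment, using the
    belief-network factorization into conditional probability tables."""
    p = 1.0
    for name, spec in network.items():
        parent_values = tuple(assignment[parent] for parent in spec["parents"])
        p1 = spec["cpt"][parent_values]
        p *= p1 if assignment[name] == 1 else 1 - p1
    return p

def pibnetd_bruteforce(target: str) -> int:
    """Output 1 iff P(target = 1) > 0, by summing the joint probability
    over all 2^n assignments with target = 1 -- exponential in n."""
    names = list(network)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[target] == 1:
            total += joint_probability(assignment)
    return 1 if total > 0 else 0

print(pibnetd_bruteforce("C"))
```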
As foreshadowed above, Cooper (1990) proves the following:
Proposition 2: PIBNETD is $NP$-Hard, where the size of the input is given by the number of variables in the belief network.
This comes with the immediate implication that if $P \neq NP$, then there is no algorithm that can take as input any belief network and return an answer as to whether a given proposition has positive probability, while always outputting that answer in time polynomial in the number of variables in the belief network. This important result is central to the mathematical argument, given in the next section, that longtermist effective altruism may not be action-guiding.
My Argument in More Mathematical Language
I begin by re-writing Prevent Possible Harms in terms of belief networks and probabilities.
Prevent Possible Harms*: Let $B$ be a belief network representing some agent or group of agents’ beliefs about the probabilistic relationships between some set of propositions. Let $u$ be a function such that $u(V)$ represents the change in aggregate utility that is realized if $V = 1$ rather than $V = 0$. If $P(V = 1) > 0$, then there is a real number $r < 0$ such that, if $u(V) \leq r$, then those agents ought to take costly steps now to make it less likely that $V = 1$, or to mitigate the negative impacts of the event(s) that occur if $V = 1$.
To illustrate, suppose that, according to our best theory of AI development (represented as a belief network), there is positive probability of an AI-driven human extinction event in the next hundred years. Presumably, the change in utility that is realized by this extinction event occurring, rather than not occurring, is sufficiently negative that the event’s having positive probability warrants taking costly action now to make this event less likely, or to mitigate its potential negative effects.
Next, I’ll provide an analogous re-writing of Don’t Prevent Impossible Harms:
Don’t Prevent Impossible Harms*: Let $B$ be a belief network representing some agent or group of agents’ beliefs about the probabilistic relationships between some set of propositions. Let $u$ be a function such that $u(V)$ represents the change in aggregate utility that is realized if $V = 1$ rather than $V = 0$. If $P(V = 1) = 0$, then there is no real number $r$ such that, if $u(V) \leq r$, then those agents ought to take costly steps now to make it less likely that $V = 1$, or to mitigate the negative impacts of the event(s) that occur if $V = 1$.
This formalizes the idea that if an event has zero probability of occurring, then no matter how bad it would be if that event were to occur, we are not compelled to take costly steps now to lower the probability of that event occurring in the future.
I now define the following problem:
PIBNETD-Harms: For any belief network and any variable $V$ within that belief network such that $u(V) \leq r$ for some utility function $u$ and threshold $r < 0$, output $1$ if $P(V = 1) > 0$ and output $0$ otherwise.
PIBNETD-Harms is just PIBNETD restricted to those propositional variables whose truth would lead to a sufficiently negative outcome. From this definition, the following corollary immediately follows:
Corollary 3: If Prevent Possible Harms* and Don’t Prevent Impossible Harms* are both true, then we can determine, for any belief network and any variable $V$ within that belief network such that $u(V) \leq r$ for some utility function $u$ and threshold $r < 0$, whether we ought to take steps now to make it less likely that $V = 1$, or to mitigate the negative impacts of the event(s) that occur if $V = 1$, only if we can solve PIBNETD-Harms.
Corollary 3 establishes the conditions under which Prevent Possible Harms* and Don’t Prevent Impossible Harms* are mutually action guiding. Namely, if we can solve PIBNETD-Harms, then these two tenets of longtermist action can help us decide what to do now. Unfortunately, as I’ll argue below, we have reason to suspect that we cannot solve PIBNETD-Harms efficiently.
To this end, I’ll introduce a new premise, with no analog in the ordinary-language argument:
Independence of Bad Outcomes*: For any belief network $B$, any function $u$, and any real number $r < 0$, there is no Turing machine that takes as input the belief network $B$ and the real number $r$, outputs the proper subset of variables $\mathcal{V}' \subset \mathcal{V}$ such that if $u(V) \leq r$ then $V \in \mathcal{V}'$, and halts.
In other words, there is no way for any finite intelligent being to determine, using only the structural and probabilistic properties of a belief network, which proper subset of the propositions represented in the network are such that their truth could amount to a very bad outcome. Under this assumption, the badness of a proposition is independent of its structural and probabilistic relationships to other propositions. As an intuition pump, suppose that you were able to view a large spreadsheet of propositions, each labelled only with a number. You also have a conditional probability table showing you the probabilistic relationships between all these propositions. It is hard to see how, given just this information, you could identify the proper subset of propositions that are such that, if they were true, things could be especially bad. The structural and probabilistic information contained in the spreadsheet does not, on its own, tell us anything about how we value the truth or falsehood of the propositions depicted.
With this new premise in hand, I can state the following proposition:
Proposition 4: If Independence of Bad Outcomes* is true, then PIBNETD-Harms is $NP$-Hard, where the size of the input is given by the number of variables in the belief network.
Proof. If it were the case that, for any given belief network, any function $u$, and any real number $r < 0$, only certain variables could be such that $u(V) \leq r$, then it would be the case that a Turing machine could take the belief network and the number $r$ as input and output the relevant subset of variables that could be such that $u(V) \leq r$. Thus, the truth of Independence of Bad Outcomes* entails that for any belief network, any function $u$, and any real number $r < 0$, any variable $V$ can be such that $u(V) \leq r$. Thus, solving PIBNETD-Harms requires us to solve the problem of determining, for any belief network and any variable $V$ within that belief network, whether $P(V = 1) > 0$. This is just the problem PIBNETD, which is known to be $NP$-Hard, where the size of the input is given by the number of variables in the belief network. And so PIBNETD-Harms is also $NP$-Hard when the size of the input is measured in the same way.
This proposition leads directly to the following corollary:
Corollary 5: If $P \neq NP$, then PIBNETD-Harms cannot be solved in time polynomial in the number of variables in the belief network.
Which, in conjunction with Corollary 3, leads to the key claim of this post:
Corollary 6: If $P \neq NP$, and if Prevent Possible Harms* and Don’t Prevent Impossible Harms* are both true, then there is no algorithm that takes as input any belief network and any variable $V$ within that belief network such that $u(V) \leq r$ for some utility function $u$ and threshold $r < 0$, and determines, in time polynomial in the number of variables in the belief network, whether we ought to take steps now to make it less likely that $V = 1$, or to mitigate the negative impacts of the event(s) that occur if $V = 1$.
It is on the basis of this corollary that I conclude that Prevent Possible Harms* and Don’t Prevent Impossible Harms*, and their ordinary-language counterparts, may not be generally action-guiding. My inclusion of the qualifier “may” reflects that my conclusion ultimately depends on the conjecture that $P \neq NP$. But the fact that this conjecture may well be true (indeed, a single polynomial-time algorithm that solves an $NP$-Hard problem would establish that $P = NP$, and yet no such algorithm has ever been found) should give us serious pause about whether we have a general ability to separate those potential catastrophes that have zero probability of occurring from those that have positive probability of occurring. This is bad news for a worldview that seeks to tell us what we ought to do, and which insists that extreme measures may need to be taken to prevent low-probability events with potentially catastrophic effects.
In the next section, I’ll consider some counterarguments to the claims that I have presented here, and respond to each.
Counterarguments and Responses
Counterargument 1: Salient Variables Only
Counterargument: The results given above show that no general-purpose algorithm can efficiently receive as input any probabilistic theory and any proposition in that theory, and then tell us whether that proposition has positive probability of being true. But this is more than the longtermist effective altruist needs. Rather, the longtermist effective altruist just needs a special-purpose algorithm to tell them whether or not, on a given probabilistic theory, a few salient propositions (e.g., that a catastrophic climate change event, nuclear war, or AI-driven extinction event will occur) have positive probability of being true. Nothing in the argument given above rules out the possibility of such an algorithm. Moreover, we have independent reasons for believing that each of these events has positive probability, and so it seems that such a special-purpose algorithm already exists.
Response: This response assumes that when we attempt to solve PIBNETD-Harms, the fact that a propositional variable $V$ is such that $u(V) \leq r$ (and so is salient from the perspective of avoiding significant harms) will entail that the same variable is also one such that, however it appears in a belief network, we will be able to efficiently determine whether or not it has positive probability. It would be fantastic if this were true, but I see no reason to believe that it is. In general, we may be able to design special-purpose algorithms that efficiently calculate the probability distribution over a variable when that variable occupies a special position in a belief network; namely, when it is in a sparse region of that network, without too many edges connecting variables (see Pearl 1986). But as expressed in Independence of Bad Outcomes*, it seems plausible that there is no way of inferring, just from the structural and probabilistic properties of a belief network, what the moral valence of a given propositional variable is. By the same token, it would be strange if the special badness of a propositional variable’s being true implied that the variable in question must occupy a special position within a belief network.
That said, this response is correct to point out that there are some catastrophic events that longtermist effective altruists tell us we ought to take costly steps to avoid for which we have good reason to assign positive probability. Take the possibility of a catastrophe due to compounding effects of anthropogenic climate change. Here, we have an abundance of well-defined models of Earth’s climate, each supported by empirical data. While making high-resolution predictions of the behavior of the climate on 100-year timescales is not possible (see Frigg et al., 2013), these models are in agreement that there is positive probability of extremely harmful climate change that warrants very costly action now (Hoegh-Guldberg et al., 2019). What is noteworthy about this case is not that the possibility of catastrophic climate change is a particularly salient variable, but rather that on the much narrower set of belief networks that are supported by existing climate data, we are able to efficiently reason that a climate catastrophe has positive probability of occurring.
However, I will also note that the imperative to avoid and mitigate the possibility of catastrophic climate change is not uniquely highlighted by longtermist effective altruists. Indeed, we have good evidence that we are already experiencing significant negative impacts of climate change (Letchner 2021), such that there is nothing especially longtermist about taking steps now to reduce climate change. The threat of climate change is also a concern that is highlighted by many organizations that are not explicitly effective altruist (e.g., the United Nations or the Democratic Party in the U.S.). Much the same could be said about the existential risks due to nuclear war; there are models, well-supported by data, that demonstrate the consequences of multiple detonations of the kinds of nuclear weapons that we know for a fact are currently deployable. That these weapons are deployable now means that there is nothing “longtermist” about wanting to avoid nuclear war. Moreover, nuclear de-proliferation is a concern of many organizations that are not explicitly effective altruist (indeed, many of the same ones concerned with climate change).
On the other hand, consider a putative existential risk like the possibility of malevolent artificial general intelligence. The possibility that the creation of greater-than-human intelligence could pose an existential risk to humanity was first seriously theorized in a thought experiment due to Good (1966), and has been an animating cause of many longtermist effective altruists since at least the publication of Bostrom’s Superintelligence (2014). However, unlike in the case of climate change, the proposition that we have data-supported models that entail that there is a positive probability of a catastrophe brought about by artificial intelligence is much less plausible. Here is Ord (2020), discussing his motivation for entertaining the possibility of an AI-driven catastrophic event, one that we ought to take costly action to prevent:
The most plausible existential risk would come from success in AI researchers’ grand ambition of creating agents with a general intelligence that surpasses our own. But how likely is that to happen, and when? In 2016, a detailed survey was conducted of more than 300 top researchers in machine learning. Asked when an AI system would be “able to accomplish every task better and more cheaply than human workers,” on average they estimated a 50 percent chance of this happening by 2061 and a 10 percent chance of it happening as soon as 2025 (pp. 139-140).
In light of the results presented above, we should be very skeptical of the epistemic value of individuals’ estimates of the probability of any event, especially when those estimates are not transparently based on a data-supported model. We know that, in general, Turing machines cannot take an arbitrary probabilistic theory of how their environment works and efficiently compute whether some proposition in that theory has positive probability of being true. As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers.
Counterargument 2: Salient Belief Networks Only
Counterargument: Along the same lines of the previous counterargument, it is not the case that longtermist effective altruists need a general-purpose efficient algorithm for telling them whether any probabilistic theory assigns positive probability to certain catastrophic events. Rather, one only needs an algorithm that works on those belief networks that are well-supported by our best scientific theories of natural systems. The success of science to date suggests that these kinds of efficient algorithms do exist, and can be used to detect positive-probability disaster events like catastrophic climate change or an AI-driven extinction event.
Response: Many of the disaster events that are most salient in longtermist effective altruist discourse are mesoscale phenomena. Their occurrence is governed by the dynamics of objects larger than the particles whose behavior is very well-predicted by quantum mechanics, and smaller than the astronomical bodies whose behavior is similarly well-predicted by general relativity. It is precisely at this mid-size range that science has a harder time making accurate predictions and even generating good models. Thus, we are not in a position now to tell which belief networks relating propositions about mesoscale phenomena are well-supported by our best science.
Again, there are cases where we do have well-supported scientific models that assign positive probability to catastrophic events like climate change. And in the case of nuclear war, we have historical evidence of the destruction that nuclear weapons can cause. We can and should heed the recommendations of those models. But for more esoteric putative existential risks like an AI-driven extinction event, it is far less clear that we have any scientific basis for restricting ourselves to any class of belief networks when representing the relevant aspects of our environment. Indeed, we still do not know whether humanity-threatening superintelligence is even physically possible. Without an answer to this question, we lack a basis for restricting ourselves to any one class of belief networks for determining whether such an event has positive probability of occurring. Lacking such a basis, and also lacking any efficient algorithm for determining whether such an event has positive probability according to any input belief network, we must, I conclude, accept that our epistemic position with respect to whether such events have positive probability of occurring is extremely limited.
Counterargument 3: Limits on Harms and Costs
Counterargument: It may be that there is a finite lower bound $b < 0$ on the function $u$, such that for any propositional variable, there is only so much negative change in utility that can result from that proposition’s being true. Moreover, there may also be a finite lower bound $c$ (expressed as a positive real number) on the cost of taking any action to lower the probability of any event. Under these circumstances, the longtermist effective altruist does not need to compute whether $P(V = 1) > 0$ for any $V$, but must instead be able to compute whether $P(V = 1) > c/|b|$. If this inequality does not hold, then $P(V = 1)\,|b| \leq c$, and so the magnitude of the greatest possible decrease in expected utility associated with the possibility that $V = 1$ is less than or equal to the minimal cost that we can incur to reduce the probability that $V = 1$, such that we should be either indifferent or averse to incurring these costs now to reduce the probability that $V = 1$ in the future. If the inequality does hold, then the cost of acting now to reduce the probability that $V = 1$ may be worth incurring.
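On the reconstruction of the bounds adopted here ($b$ a finite lower bound on $u$, $c$ a minimal positive cost), the comparison driving this counterargument can be written out as follows; this is my own sketch of the implicit expected-value reasoning, not a formula from the original.

```latex
% Act only when the largest expected harm from V = 1 exceeds the minimal cost of acting
% (my reconstruction of the counterargument's implicit comparison):
\[
  P(V = 1)\,\lvert b \rvert \;>\; c
  \quad\Longleftrightarrow\quad
  P(V = 1) \;>\; \frac{c}{\lvert b \rvert}.
\]
```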
Response: Efficiently determining whether $P(V = 1) > c/|b|$ for any given $c$ and $b$ is still an $NP$-Hard problem. To show this, we first define a problem:
PIBNETD-THRESHOLD: For some fixed threshold $t$ with $0 < t < 0.5$, any belief network, and any variable $V$ within that belief network, output $1$ if $P(V = 1) > t$ and output $0$ otherwise.
We then prove the proposition:
Proposition 7: PIBNETD-THRESHOLD is $NP$-Hard for any such $t$ when the size of the input is defined in terms of the number of variables in the belief network.
Proof. Shimony (1994) shows that for any threshold $t'$ with $0 < t' < 1$, the following problem is $NP$-Hard when the size of the input is defined in terms of the number of variables in the belief network:
MAPBNETD (Maximum A-posteriori Probability in a Belief NETwork—Decision): For any belief network, output $1$ if there is an assignment of values $\mathbf{v}$ to all variables $\mathcal{V}$ in the belief network such that $P(\mathcal{V} = \mathbf{v}) > t'$, and output $0$ otherwise.
Suppose that we could solve PIBNETD-THRESHOLD in time polynomial in the size of the input belief network. Then for the input variable $V$, we could determine in polynomial time either that $P(V = 1) > t$ or that $P(V = 1) \leq t$ (since the algorithm’s output settles which of these holds). Let $\mathbf{v}$ be an assignment of values to the variables $\mathcal{V} \setminus \{V\}$. If we could determine in polynomial time either that $P(V = 1) > t$ or that $P(V = 1) \leq t$, then we would also know either that $\sum_{\mathbf{v}} P(V = 1, \mathcal{V} \setminus \{V\} = \mathbf{v}) > t$ or that $\sum_{\mathbf{v}} P(V = 1, \mathcal{V} \setminus \{V\} = \mathbf{v}) \leq t$. This would entail in turn that there exists a $\mathbf{v}$ such that either $P(V = 1, \mathcal{V} \setminus \{V\} = \mathbf{v}) > t/2^{n}$ or $P(V = 1, \mathcal{V} \setminus \{V\} = \mathbf{v}) \leq t$, where $n$ is the cardinality of $\mathcal{V} \setminus \{V\}$. This would provide a polynomial-time solution to MAPBNETD for $t' = t/2^{n}$. This shows that PIBNETD-THRESHOLD is $NP$-Hard.
Even if there is a lower bound on the probability of a catastrophic event such that the longtermist effective altruist will recommend taking costly steps now to lower the probability of that event, it can be safely assumed that this lower bound is less than $.5$. Thus, this result shows that restrictions on the badness of any outcome, or on the minimal costs we can incur to lower the probability of some outcome occurring, won’t get the longtermist effective altruist out of their epistemic predicament.
Counterargument 4: Why Not Approximate?
Counterargument: Dagum and Luby (1993) show that if one is able to introduce some randomness into an algorithmic procedure, then one can use simulation techniques to estimate the value of the probability $P(V = 1)$ for any variable $V$ in any belief network to accuracy level $\epsilon$, with failure probability $\delta$. That is, we can run an algorithm that takes as input a belief network and a variable, and outputs a value $Z$ such that the probability that $Z$ is in the interval $[P(V = 1) - \epsilon, P(V = 1) + \epsilon]$ is at least $1 - \delta$. This algorithm runs in time polynomial in the size of the belief network, $1/\epsilon$, and $1/\delta$. Thus, we can efficiently find an arbitrarily tight bound on an estimate of $P(V = 1)$ with minimal failure probability. This is good enough for the longtermist effective altruist to make action-guiding predictions about what will happen in the future.
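For concreteness, here is a minimal forward-sampling sketch of my own (not Dagum and Luby’s actual algorithm) in the spirit of what this counterargument describes; it reuses the hypothetical toy `network` from the earlier sketches and uses a standard Hoeffding-style sample size for the additive-error guarantee.

```python
import math
import random

def sample_once() -> dict:
    """Draw one joint sample by sampling each variable given its parents
    (assumes `network` lists variables in a topologically sorted order)."""
    values = {}
    for name, spec in network.items():
        parent_values = tuple(values[parent] for parent in spec["parents"])
        values[name] = 1 if random.random() < spec["cpt"][parent_values] else 0
    return values

def estimate(target: str, epsilon: float, delta: float) -> float:
    """Estimate P(target = 1) to within +/- epsilon, with failure
    probability at most delta, using a Hoeffding-style sample size."""
    n = math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
    hits = sum(sample_once()[target] for _ in range(n))
    return hits / n

print(estimate("C", epsilon=0.01, delta=0.05))
```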
Response: In the same paper, Dagum and Luby note that the algorithm described in this counterargument only works if we do not allow ourselves to condition on any events in our belief network. That is, they also show that there is no similarly efficient estimation procedure that takes as input any belief network, any variable $V$ in that network, and any assignment of values $\mathbf{v}$ to some subset of variables $\mathcal{V}'$ within the network, and returns an estimate of $P(V = 1 \mid \mathcal{V}' = \mathbf{v})$ with accuracy level $\epsilon$ and failure probability $\delta$. I take it that longtermist effective altruists do not only want to estimate the unconditional values of probabilities, but also want to be able to adopt a particular probabilistic theory of the world, make observations about that world, and use those observations to accurately update their estimates of the probability of future events. Dagum and Luby’s result shows that this cannot be done efficiently in the general case.
In a later paper, Dagum and Luby (1997) do provide an efficient sampling algorithm for solving the estimation problem described immediately above. However, that algorithm adopts the assumption that there is no assignment of values $\mathbf{v}$ to variables in the belief network such that $P(\mathcal{V} = \mathbf{v}) = 0$. That is, every conjunction of propositions is assigned positive probability of being true and positive probability of being false. This means that if the longtermist effective altruist wants to use such an algorithm in a way that is action-guiding, they will first have to assume the existence of the kinds of limits on harms and costs discussed above. This may be controversial in its own right. But even if we can make this assumption, lower bounds on any sampling algorithm for estimating probabilities mean that the extra effort needed to lower margins of error to acceptable rates can still be computationally expensive.
Canetti et al. (1994) prove that in the best-case scenario, the number of samples needed by a sampling algorithm for estimating a probability is lower-bounded by a constant multiple of $\ln(1/\delta)/\epsilon^{2}$, where $\delta$ is again the probability of failure and $\epsilon$ is the margin for error. In general, this means we can make efficient gains in the accuracy of our inferences. Holding $\delta$ fixed, if it takes approximately 1 minute to generate an estimate with a margin for error of $.1$, then achieving a margin for error of $.05$ will take four minutes. However, as we push $\epsilon$ toward zero, efficiency gains become more expensive. Reducing the margin for error to $.0002$ would require us to run the same algorithm for approximately 174 days. When deciding whether to expend significant resources to reduce the probability of a low-probability event, tight margins for error and low failure rates will be required, rendering statistical inference using sampling algorithms increasingly intractable, even if we are willing to make the assumption that all events have positive probability.
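As a quick sanity check on these numbers (the specific margins for error are my reconstruction of the worked example, and the $1/\epsilon^{2}$ scaling holds $\delta$ fixed), here is a tiny Python calculation:

```python
# Back-of-the-envelope check of the 1/epsilon^2 scaling described above.
# The baseline figures are illustrative assumptions, not measured values.
baseline_epsilon, baseline_minutes = 0.1, 1.0

def minutes_needed(target_epsilon: float) -> float:
    """Runtime at fixed delta scales with 1 / epsilon^2."""
    return baseline_minutes * (baseline_epsilon / target_epsilon) ** 2

print(minutes_needed(0.05))                  # 4.0 minutes
print(minutes_needed(0.0002) / (60 * 24))    # roughly 173.6 days
```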
Counterargument 5: Possibility vs. Probability
Counterargument: The longtermist effective altruist can reject the move, implicit in the restatement of Prevent Possible Harms and Don’t Prevent Impossible Harms as Prevent Possible Harms* and Don’t Prevent Impossible Harms*, that ‘possible’ and ‘impossible’ are equivalent to ‘has positive probability’ and ‘has probability zero.’ To illustrate, suppose that one believes that a bus is equally likely to arrive at any time between three and four o’clock, and that the set of times between three and four o’clock can be represented using the set of real numbers between zero and one. On this set-up, the bus has probability zero of arriving at any specific time, and yet (let’s assume), there is a specific time that the bus arrives. (See Williamson (2007) for further arguments that probability-zero events can occur.) Moreover, suppose that it would be very bad for the bus to arrive at exactly 3:30, and perfectly fine for it to arrive at all other times. Even though this event has zero probability, it seems that one would be justified in incurring some costs to prevent the bus from possibly arriving at exactly 3:30 (e.g., by blocking the road to the bus stop). By the same token, if a possible future event is bad enough, then we may still want to incur costs now, even if that event currently has probability zero.
Response: My only response here is to say that while this is a deeply interesting counterargument from a theoretical perspective, the conceptual problem that it points to is very challenging. For the longtermist effective altruist to stand behind this counterargument, they would need to provide a decision theory that recommends, in a systematic way, taking costly action to avoid the potential bad consequences of some zero-probability events but not others, while simultaneously allowing us to rank the choice-worthiness of actions whose expected value depends on a probability distribution over positive-probability events. From a philosophical standpoint, this amounts to a deeply interesting project, but one that would need to be developed significantly further before it constitutes a serious response to the challenge that I have presented here.
Recommendations
Good-Old-Fashioned-EA
In Singer’s (1972) drowning-child thought experiment, the reader is asked whether the cost of ruining one’s nice new clothes could ever override the moral imperative to save a child drowning in a shallow pool. The answer is clearly ‘no.’ Singer then argues via analogy that the globally affluent ought to incur far greater costs than they currently do to improve the lot of the global poor. This argument has had a massive influence on the effective altruist movement, and in my opinion, it gets things mostly right. Importantly, it does not depend in any way on our predictive capacities. In the thought experiment, the child is drowning right in front of us, such that there is no reason not to assign significant credence to the possibility that they will drown. We can also be confident that we would be able to save the child, should we jump into the pool after them. Thus, the idea that we should do more to efficiently and effectively improve the lot of those living more precarious lives is not subject to any of the objections put forward in this post.
In light of this, one recommendation to take away from this piece is that the effective altruist movement should be wary of aligning itself too strongly with the view that the most important thing we can be doing now is offsetting the risk of future catastrophic events. Indeed, the thrust of my argument has been that intellectual humility requires deep skepticism about whether we are currently in an epistemic position to accurately assess what steps we should be taking now to help those who will exist after we are long gone. Instead, I’d advocate a return to the roots of effective altruism as a movement dedicated to doing more and doing better to help those who are in danger now. Note that this is all consistent with an effective altruist movement that dedicates significant resources and attention to risks like anthropogenic climate change and nuclear war, both of which pose a clear danger to those of us living now, and to those who will live in the near future.
Models, not Forecasts
As described above, longtermist effective altruists occasionally appeal to expert testimony to estimate the probability of far-future events for which we lack well-supported empirical models. One also finds support in some pockets of the effective altruist movement for efforts like Metaculus, which aim to crowdsource forecasts of future events that may lead to significant catastrophes. As argued above, I think that the provable $NP$-Hardness of probabilistic inference in belief networks supports deep skepticism about the value of these kinds of predictions.
By contrast, where we have empirically supported, demonstrably tractable mathematical models that assign positive probability to some future event, we should take that event much more seriously as a real possibility. This leads to the recommendation that effective altruists should devote fewer resources to forecasting and more to modeling. If effective altruists suspect that a given complex system may produce a catastrophic event, the way to confirm this suspicion is to begin modeling that system, comparing the model with available data, refining that model, and using the model to tractably produce increasingly reliable estimates of the probabilities of these potential catastrophes. Briefly, to the extent that there is a longtermist effective altruist program, that program should be significantly dedicated to the tractable probabilistic modeling of complex systems.
Computational Capabilities
My conclusions in this post have a conditional form: if $P \neq NP$, then the kind of inferential capacities needed for longtermist effective altruism to be action-guiding may not be achievable by beings such as ourselves. But if $P = NP$, then nothing that I have argued here applies, and the longtermist effective altruist’s epistemic situation may be far better than I have made it out to be. Moreover, the only evidence that I have presented for the claim that $P \neq NP$ is exactly the kind of expert testimony and survey data toward which I have advocated skepticism. So, there is a case to be made that the longtermist effective altruist actually has a lot to gain by focusing at least some of their effort on settling the question of whether $P = NP$. If this question were settled in the affirmative, it could open up incredible opportunities for our ability to forecast and prevent catastrophic events.
There are at least two streams of work that would be effective in achieving this end. The first is technical work aimed at settling the question from a theoretical perspective (i.e., proving a theorem that settles the question one way or the other). The second is work aimed at designing novel algorithms (recall that if a single algorithm solves an $NP$-Hard problem in polynomial time, then $P = NP$ and the floodgates are open). However, improving our algorithms in this way would also raise many of the AI-security risks that I have argued we currently lack the ability to accurately forecast. To this end, it is likely a positive feature of current effective altruist practice that funding for computational research goes to organizations, like the Machine Intelligence Research Institute, that place safety and alignment at the heart of their mission. This holds even if, as I argue above, we have little reason now to take predictions regarding the possibility of an AI-driven extinction event seriously.
Conclusion
Nothing that I have said here should be taken as an argument that it is wrong, in itself, to extend moral concern to those persons who will exist in the future, even the distant future. Rather, my conclusion is that we have good reasons to adopt significant epistemic humility with respect to the risks those future people might face. These reasons, which are based on results in computational complexity theory, have been under-explored to date. This leads me to the conclusion that our moral concern with future people, justifiable as it may be, nevertheless fails to be action-guiding in some crucial cases.
I’m sorry and I might well be missing something, but it seems to me that the main argument in the post is “here is a particular problem with having one’s actions guided by models which are so messy as to make calculating probabilities of events impossible”. But such models are not used by anyone to make predictions anyway, because they are computationally intractable (a forecaster can’t generally use such models to compute probabilities, unless they are secretly in possession of something like a polynomial-time SAT-solver, or something even more powerful)! So since no forecaster is operating with such a model, it seems to me that the post’s main argument says nothing about whether what forecasters are actually doing makes sense or not.
The only part of the post that I understand as saying something about [whether [what [people actually assigning probabilities to potential catastrophic events] are doing] makes sense or not] is the response to counterargument 2.
It seems to me that ~all salient belief networks, at least in the sense of belief networks used by people in practice to come up with probabilities in this context, are computationally tractable. (I.e., coming up with probabilities given the model has to be computationally tractable, otherwise people would not be using the model to come up with probabilities.) So it seems to me that the (~only) crucial question here is whether we have reason to think that these particular tractable belief networks would make decent predictions – I understand that this is addressed a little bit in the response to counterargument 2. I don’t find that argument convincing at all, but also this is not the main point I’d like to make with this comment. My main point is that it does not look like the main thrust of this post contributes anything to understanding of this ~only crucial question.
So it currently seems to me that the number of paragraphs in this post which address whether longtermist reasoning can be action-guiding is approximately upper-bounded by 2. Again, I feel bad about being harsh here. To say something positive: the stuff about computational complexity is cool on its own! My meta-level guess is that I’m missing something crucial...
+1, See also this: https://www.cold-takes.com/the-track-record-of-futurists-seems-fine/ for forecasting track records which are a bit longer term.
Thanks for reading and for the very thoughtful comment. I take issue with the idea that human forecasters can’t be using models in which Bayesian inference is intractable on the grounds that they demonstrably are able to infer the probabilistic consequences of their prior beliefs. I suspect that we can do this in some well-understood systems (e.g., a tennis player might have a near-Bayes-optimal mental model of what will happen under a lot of different salient possibilities when they return a serve). But it also seems like a salient possibility to me that when someone says “I think the probability of X event is positive,” they’re actually not in coherence with all of their other beliefs about the world, were we to systematically elucidate all of those beliefs.
What are your thoughts on forecasting techniques for open-ended/subjective questions that have demonstrably good track records?
See e.g. https://www.cold-takes.com/prediction-track-records-i-know-of/ (or my personal track record)
(Edited to add: I agree that no one has a perfectly coherent model of the world because we’re all flawed humans, but that doesn’t mean we don’t have any coherence or prediction ability)
My kind of quick off-the-cuff theory of good forecasting is that you’re probably running something like a good Monte Carlo algorithm in your head as you simulate outcomes. That’s great if you’re willing to assign all events positive probability (a good idea when forecasting something like an election). But that assumption begs the question against Don’t Prevent Impossible Harms. And, as I note in the article, getting very high precision can still be computationally expensive.
Sorry, I had just read your comment and not the post previously. I’ve now read the section “My Argument in Ordinary Language” and skimmed a few other portions; I don’t think I would be able to understand the technical details very quickly.
My new question is (sorry if I missed the answer to this somewhere): Why can’t I just say that all possible events have positive probability, and our task is to figure out which ones are higher and worth paying attention to and which ones are very very low (and such not worth worrying about)? Isn’t the idea that we should have nonzero probability on any event occurring a core tenet of Bayesian epistemology? Do you disagree with Bayesian epistemology, or am I missing something (totally possible)?
I guess the worry then is that you’re drawn into fanaticism: in principle, any positive probability event, however small that probability is, can be bad enough to justify taking extremely costly measures now to ameliorate it.
I’d also say that assigning all events positive probability can’t be a part of bayesianism in general if we want to allow for a continuum of possible events (e.g., as many possible events as there are real numbers).
I do think the best way out for the position I’m arguing against is something like: assume all events have positive probability, set an upper bound on the badness of events and the costliness of ameliorating them (to avoid fanaticism) and then hope you can run simulations that give you a tight margin for error with low failure probability.
Ah yeah, I think I’m probably more sympathetic to (the maybe unfortunately named) fanaticism than you are, see e.g. In Defense of Fanaticism. Honestly the thing that makes me worry about it the most by far is infinite ethics.
Yeah I’m confused about how to think about this; I’d be interested to hear from an expert on this topic on what the Bayesian view here is. I did some searching but couldn’t find anything in the literature within a few minutes.
My intuition says that it’s fine as long as the number of possible events isn’t a “bigger infinity” than real numbers in the same way uncountable infinities are larger than countable infinities? But not sure.
So I’m actually fine with fanaticism in principle if we allow some events to have probability zero. But if every event in our possibility space has positive probability, then I worry that you’ll just throw ever-more resources at preventing ever-lower probability catastrophes.
On probability zero events and Bayesianism in the case where the sample space is a continuum, Easwaran is a great source (this is long but worth it, sec. 1.3.3 and sec. 2 are the key parts): https://philpapers.org/archive/EASCP.pdf
On a way to defuse the fanaticism problem, I’ve actually written a post on it, showing why a noise floor is the most useful way to solve the problem.
Here’s the post, called EV Maximization for Humans:
https://forum.effectivealtruism.org/posts/qSnjYwsAFeQv2nGnX/ev-maximization-for-humans
I don’t see why this is an issue. It seems like a good thing to prevent catastrophes as long as it’s more cost-effective to do so than non-catastrophe-preventing interventions. If the catastrophe is low enough probability then we should pursue other interventions instead.
Thanks for linking. I read through Section 1.3.3 and thought it was interesting.
I thought of an argument that you might be wrong about the disanalogy you claim between longtermist forecasting and election forecasting that you mention above. You write in previous comments:
and
Let’s consider the election case. At a high level of abstraction, either Candidate A or Candidate B will win. We should obviously assign positive probability to both A and B winning. But with a more fine-grained view, I could say there’s an infinite continuum of possible outcomes that the state of the world will be in: e.g. where Candidate A is standing at 10 PM on election night might have an infinite number of possibilities with enough precision. The key is that we’re collapsing an infinite number of possible worlds into a finite number of possible worlds, similar to how the probability of a random point on a circle being in any exact spot is 0, but the chance of it ending up on the right half is 0.5.
I claim that the same thing is happening with catastrophic risk forecasting (a representative instance of longtermist forecasting). Let’s take AI risk: there are 2 states the world could be in 2100, one without AI having caused extinction and one with AI having caused extinction. I claim that similarly to the election forecasting examples, it would be absurd not to assign nonzero probabilities to both states. Similar to the election forecasting example, this is collapsing infinitely many states of the world into just two categories, and this should obviously lead to nonzero credence in both categories.
My guess is that the key error you’re making in your argument is that you’re considering very narrow events e, while longtermists actually care about classes of many events which obviously in aggregate demand positive probability. In this respect forecasting catastrophic risks is the same as forecasting elections (though obviously there are other differences, such as the methods we use to come up with probabilities)!
Let me know if I’m missing something here.
I think the contrast with elections is an important and interesting one. I’ll start by saying that being able to coarse-grain the set of all possible worlds into two possibilities doesn’t mean we should assign both possibilities positive probability. Consider the set of all possible sequences of infinite coin tosses. We can coarse-grain those sequences into two sets: the ones where finitely many coins land heads, and the ones where infinitely many coins land heads. But, assuming we’re actually going to toss infinitely many coins, and assuming each coin is fair, the first set of sequences has probability zero and the second set has probability one.
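For completeness, here is a quick sketch (my own filling-in of the standard measure-theoretic argument) of why the first set has probability zero:

```latex
% For i.i.d. fair tosses, let F = {only finitely many heads} and
% A_n = {no heads after toss n}. Then F is the countable union of the A_n, and
\[
  P(A_n) \;=\; \lim_{k \to \infty} \left(\tfrac{1}{2}\right)^{k} \;=\; 0,
  \qquad\text{so}\qquad
  P(F) \;\le\; \sum_{n=1}^{\infty} P(A_n) \;=\; 0 .
\]
```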
In the election case, we have a good understanding of the mechanism by which elections are (hopefully) won. In this simple case with a plurality rule, we just want to know which candidate will get the most votes. So we can define probability distributions over the possible number of votes cast, and probability distributions over possible distributions of those votes to different candidates (where vote distributions are likely conditional on overall turnout), and coarse-grain those various vote distributions into the possibility of each candidate winning. This is a simple case, and no doubt real-world election models have many more parameters, but my point is that we understand the relevant possibility space and how it relates to our outcomes of interest fairly well. I don’t think we have anything like this understanding in the AGI case.
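To make that picture concrete, here is a minimal sketch of the kind of model I have in mind; the turnout and vote-share parameters are purely illustrative, not real polling data:

```python
# A toy plurality-election model: put distributions over turnout and vote shares,
# then coarse-grain the simulated fine-grained outcomes into "A wins" vs "B wins".
# All parameter values are illustrative, not real polling data.
import random

def simulate_election(n_sims: int = 100_000) -> float:
    a_wins = 0
    for _ in range(n_sims):
        turnout = random.gauss(1_000_000, 50_000)   # distribution over total votes cast
        share_a = random.betavariate(52, 48)        # distribution over A's vote share
        votes_a = turnout * share_a
        votes_b = turnout - votes_a
        if votes_a > votes_b:                       # coarse-grain: which candidate gets the most votes?
            a_wins += 1
    return a_wins / n_sims

print(f"P(A wins) ~ {simulate_election():.3f}")
```

The point is just that the coarse-graining step (“who gets the most votes?”) is licensed by an explicit, if simplified, model of the underlying possibility space.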
Great, I think we’ve gotten to the crux. I agree we have a much worse understanding in the AGI case, but I think we easily have enough understanding to assign positive probabilities, and likely substantial ones. I agree more detailed models are ideal, but in some cases they’re impractical and you have to do the best you can with the evidence you have. Also, this is a matter of degree rather than binary, and I think people often take explicit models too literally/seriously and don’t account enough for model uncertainty, e.g. putting too much faith in oversimplified economic models, or underestimating how much explicit climate models might be missing in the way of tail risks or unknown unknowns.
I’d be extremely curious to get your take on why AGI forecasting is so different from the long-term speculative forecasts in the piece Nuno linked above, of which many turned out to be true.
I don’t have a fully-formed opinion here, but for now I’ll just note that the task that the examined futurists are implicitly given is very different from assigning a probability distribution to a variable based on parameters. Rather, the implicit task is to say some stuff that you think will happen. Then we’re judging whether those things happen. But I’m not sure how to translate the output from the task into action. (E.g., Asimov says X will happen, and so we should do Y.)
Agree that these are different; I think they aren’t different enough to come anywhere close to meaning that longtermism can’t be action-guiding though!
Would love to hear more from you when you’ve had a chance to form more of an opinion :)
Edit: also, it seems like one could mostly refute this objection just by finding cases where someone acted with the intention of affecting the future 10-20 years out (a horizon many people give some weight to in their AGI timelines), and the action had the intended effect? This seems trivial.
(By the way, I’m aware that it doesn’t quite make canonical sense to speak of the “tractability” of computing probabilities in a single belief network. But I don’t think this meaningfully detracts from the point above. (And I suspect that translating the result in this post from math into ordinary language would involve a similar equivocation.))
I agree that we should be especially careful not to fool ourselves that we have worked out a way to positively affect the future. But I’m overall not convinced by this argument. (Thanks for writing it, though!)
I can’t quite crisply say why I’m not convinced. But as a start, why is this argument restricted just to longtermist EA? Wouldn’t these problems, if they exist, also make it intractable to say whether (for example) the outcome intended by a nearterm focused intervention has positive probability? The argument seems to prove too much.
So I think we should be skeptical of any claims that some event has positive probability when the event hasn’t happened before and we don’t have a well-worked-out model of the process that would produce that sort of event. It just strikes me that these features are more typical of longer-term predictions.
I agree we should be skeptical! (Although I am open to believing such events are possible if there seem to be good reasons to think so.)
But while the intractability stuff is kind of interesting, I don’t think it actually says much about how skeptical we should be of different claims in practice.
I think if someone tells you that a potentially catastrophic event has positive probability, then the general intractability of probabilistic inference is a good reason to demand a demonstrably tractable model of the system that generates the event before incurring massive costs. Otherwise, this person is just saying: look, I’ve got some beliefs about the world, and I’m able to infer from those beliefs that this event that’s never happened before has positive probability. My response is that this just isn’t the sort of thing we can do in the general case; we can only do it for specific classes of models. Hence my recommendation for more science and less forecasting in EA.
Thanks for clarifying! I agree that if someone just tells me (say) what they think the probability of AI causing an existential catastrophe is without telling me why, I shouldn’t update my beliefs much, and I should ask for their reasons. Ideally, they’d have compelling reasons for their beliefs.
That said, I think I might be slightly more in favour of forecasting being useful than you. I think that my own credence in (say) AI existential risk should be an input into how I make decisions, but that I should be pretty careful about where that credence has come from.
I think we’ve arrived at a nice place then! Thank you so much for reading!
Thanks for a brilliant post! I really enjoyed it. And in particular, as someone unfamiliar with the computational complexity stuff, your explanation of that part was great!
I have a few thoughts/questions, most of them minor. I’ll try to order them from most to least important.
1. The recommendation for Good-Old-Fashioned-EA
If I’m understanding the argument correctly, it seems to imply that real-world agents can’t assign fully coherent probability distributions over Σ in general. So, if we want to compare actions by their prospects of outcomes, we just can’t do so. (By any plausible decision theory, not just expected value theory.) The same goes for the action of saving a drowning child—we can’t give the full prospect of how that’s going to turn out. And, at least on moral theories that say we should sometimes promote the good (impartially wrt time, etc), consequentialist theories especially, it seems that it’s going to be NP-hard to say whether it’s better to save the child or not save the child. (cf Greaves’ suggestion wrt cluelessness that we’re more clueless about the effects of near-term interventions than those of long-term interventions) So, why is it that the argument doesn’t undermine those near-term interventions too, at least if we do them on ‘promoting-the-good’ grounds?
2. Broader applications
On a similar note, I wonder if there are much broader applications of this argument than just longtermism (or even for promoting the good in general). Non-consequentialist views (both those that sometimes recommend against promoting the good and those that place no weight on promoting the good) are affected by uncertainty too. Some rule-absolutist theories in particular can have their verdicts swayed by extremely low-probability propositions—some versions say that if an action has any non-zero probability of killing someone, you ought not do it. (Interesting discussion here and here) And plausible versions of views that recognise a harm-benefit asymmetry run into similar problems of many low-probability risks of harm (see this paper). Given that, just how much of conventional moral reasoning do you think your argument undermines?
(FWIW, I think this is a really neat line of objection against moral theories!)
3. Characterising longtermism
The definition of Prevent Possible Harms seemed a bit unusual. In fact, it sounds like it might violate Ought Implies Can just by itself. I can imagine there being some event e that might occur in the future, for which there’s no possible way we could make that e less likely or mitigate its impacts.
On a similar note, I think most longtermist EAs probably wouldn’t sign up to that version of PPH. Even when e can be made less likely or less harmful, they wouldn’t want to say we should take costly steps to prevent such an e regardless of how costly those steps are, and regardless of how much they’d affect e’s probability/harms.
Also, how much more complicated would it be to run the argument with the more standard definition of “deontic strong longtermism” from p26 of Greaves & MacAskill? (Or even just their definition of “axiological strong longtermism” on p3?)
Related: the line “a worldview that seeks to tell us what we ought to do, and which insists that extreme measures may need to be taken to prevent low-probability events with potentially catastrophic effects” seems like a bit of a mischaracterization. A purely consequentialist longtermist might endorse taking extreme measures, but G&M’s definition is compatible with having absolute rules against doing awful things—it allows that we should only do what’s best for the long term in decision situations where we don’t need to do awful things to achieve it, or even just in decisions of which charity to donate to. (And in What We Owe The Future, Will explicitly advocates against doing things that commonsense morality says are wrong.)
4. Longtermism the idea vs. what longtermists do in practice
On your response to the first counterargument (”...the imperative to avoid and mitigate the possibility of catastrophic climate change is not uniquely highlighted by longtermist effective altruists...Indeed, we have good evidence that we are already experiencing significant negative impacts of climate change (Letchner 2021), such that there is nothing especially longtermist about taking steps now to reduce climate change...” etc), this doesn’t seem like an objection to longtermism actually being true (at least as Greaves & MacAskill define it). It sounds like potentially a great objection to working on AI risk or causes with even more speculative evidence bases (some wild suggestions here). But for it to be ex ante better for the far future to work on climate change seems perfectly consistent with the basic principle of longtermism; it just means that a lot of self-proclaimed longtermists aren’t actually doing what longtermism recommends.
5. What sort of probabilities?
One thing I wasn’t clear on was what sort of probabilities you had in mind.
If they’re objective chances: The probabilities of lots of things will just be 0 or 1, perhaps including the proposition about AI risk. And objective chances already don’t seem action-guiding—there are plenty of decision situations where agents just won’t have any clue what the objective chances are (unless they’re running all sorts of quantum measurements).
If they’re subjective credences: It seems pretty easy for agents to figure out the probability of, say, AI catastrophe. They just need to introspect about how confident they are that it will/won’t happen. But then I think (but am unsure) that the basic problem you identify is that it would take way too much computation (more than any human could ever do) to figure out if those credences are actually coherent with all of the agent’s other credences. And, if they’re not, you might think that all possible decision theories just break down. Which is worrying! But it seems like, if we can put together a decision theory for incoherent probability distributions / bounded agents, then the problem could be overcome, maybe?
If they’re evidential probabilities (of the Williamson sort, relative to the agent’s evidence): These seem like the best candidate for being the normatively relevant sort of probabilities. And, if that’s what you have in mind, then it makes sense that agents can’t do all the computation necessary to work out what all the evidential probabilities are (which maybe isn’t a new point—it seems pretty widely recognised that doing Bayesian updating on everything would be way too hard for human agents).
6. “for all”
I think you’ve mostly answered this with the first counterargument, but I’ll ask anyway.
In the definitions of No Efficient Algorithm, PIBNETD-Harms, Independence of Bad Outcomes, and the statement of Dagum & Luby’s result, I was confused about the quantifiers. Why are we interested in the computational difficulty of this for any value of δ, for any belief network, for any proposition/variable V, and (for estimation) for any assignment of w to variables? Why not just the actual value of δ, the agent’s actual belief network, and the actual propositions whose non-zero probability we’re trying to establish? I don’t quite understand how general this needs to be to say something very specific like “There’s a non-zero probability that a pandemic will wipe out humanity”.
Here’s my more general confusion, I think: I don’t quite intuitively understand why it’s computationally hard to look up the probability of something if you’ve already got the full probability distribution over possible outcomes. Is it basically that, to do so, we have to evaluate Δu(V) across lots and lots of different possible states? Or is it the difficulty of thinking up every possible way the proposition could be true and every possible way it could be false and checking the probability of each of those? (Apologies for the dumb question!)
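(To make my confusion concrete, here is my rough picture of the setup, on the standard story about a belief network over binary variables X_1, ..., X_n; this may of course be exactly where I’m going wrong:)

```latex
% The network stores only the local conditional tables P(x_i \mid \mathrm{pa}(x_i)).
% Reading off the marginal of interest means summing over every joint assignment with V = 1:
P(V = 1) \;=\; \sum_{\substack{x_1, \dots, x_n \\ \text{consistent with } v = 1}} \;\prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big),
% a sum with up to 2^{n-1} terms when the n variables are binary.
```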
7. Biting the fanaticism bullet
(Getting into the fairly minor comments now)
I don’t think you need to bite the fanaticism bullet for your argument. At least if I’m roughly understanding the argument, it doesn’t require that we care about all propositions with non-zero probability, no matter how low their probability. Your response to the 3rd counterargument seems to get at this: we can just worry about propositions with absolute harms/benefits below some bound (and, I’m guessing, with probabilities above some bound) and we still have an NP-hard problem to solve. Is this right?
This is mainly a dialectical thing. I agree that fanaticism has good arguments behind it, but still many decision theorists would reject it and so would most longtermist EAs. It’d be a shame to give them the impression that, because of that, they don’t need to worry about this result!
8. Measuring computation time
I was confused by this: “In general, this means we can make efficient gains in the accuracy of our inferences. Setting δ=10^−4, if it takes approximately 1 minute to generate an estimate with a margin for error of ϵ=.05, then achieving a margin for error of ϵ=.025 will take four minutes.”
To be able to give computation times like 1 minute, do you have a particular machine in mind? And can you make the general point that “the time it takes goes up by a factor of 4 if we reduce the margin of error from x to y”?
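(My rough guess at where the factor of 4 comes from, assuming a sampling-based estimator in the spirit of Dagum and Luby, whose running time scales roughly as 1/ϵ²; the Hoeffding-style bound below is just an illustration, not the exact bound from the post:)

```python
# Rough illustration of the scaling: a Hoeffding-style bound says a Monte Carlo
# estimate of a probability with margin of error eps and failure probability delta
# needs about ln(2/delta) / (2 * eps**2) samples, so halving eps quadruples the work.
# This is an illustrative bound, not the exact one behind Dagum and Luby's algorithm.
import math

def samples_needed(eps: float, delta: float) -> int:
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

delta = 1e-4
n_coarse = samples_needed(0.05, delta)      # margin of error 0.05
n_fine = samples_needed(0.025, delta)       # margin of error 0.025
print(n_coarse, n_fine, n_fine / n_coarse)  # the ratio comes out at roughly 4
```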
Typos/phrasing
In the definition of Don’t Prevent Impossible Harms, I initially misread “For any event e that will not occur in the future” as being about what will actually happen, as against what it’s possible/impossible will happen. Maybe change the phrasing?
On the Ought Implies Can point, specifically “Moreover, Don’t Prevent Impossible Harms follows from the idea that “ought implies can” (Kant, 1781); if e won’t occur, then it’s not possible for us to make it any less likely, or to mitigate negative outcomes that occur because e occurs, and so we cannot be compelled to attempt to do so. To illustrate, if Venus were to suddenly deviate from its orbit tomorrow and collide with Earth, this would presumably lead to a very large aggregate reduction in utility on Earth. But Venus won’t do that...”: Ought Implies Can implies the version of Don’t Prevent Impossible Harms that you give (put in terms of reducing the probability), but it doesn’t imply that we shouldn’t prevent such harms. After all, if Venus is definitely not going to do that, then any action we take might (arguably) be said to ‘prevent’ it!
When you say “If P(V=1)>0, then there is real number δ such that, if Δu(V)<δ, then those agents ought to take costly steps now to make it less likely that V=1” (and mention Δu(V) elsewhere), shouldn’t the inequality run the other way? Δu(V) is a measure of the difference in value, and only if that difference is great enough should agents take costly steps.
Typo: “Dagum and Luby’s result shows that this cannot be done in efficiently in the general case.”
Thanks again for the post!
Loved this post—reminds me a lot of intractability critiques of central economic planning, except now applied to consequentialism writ large.
I’d be curious if you think a weaker version of the “Prevent Possible Harms” principle would solve the issue—perhaps “Prevent Computably Possible Harms” and “Don’t Prevent Computably Impossible Harms”? Seems possibly related to debates around normative externalism and the extent to which we need our beliefs to be “objective” to be justified.
Yes, I think you’re spot on: my thinking is more externalist, while a lot of longtermist reasoning has a distinctly internalist flavor. But spelling all that out will take even more work!
Hey, your theoretical arguments seem right, though I’m not sure how much they bite in practice.
I think that one possible answer would be to not take a theory of the relationships between all possible events, but instead to take a simpler, more manageable set, and output a not-absolutely-certain recommendation.
As an analogy for something similar, it’s usually quite intractable to learn the full causal structure of some event, but we can do some forecasting with reference to, for instance, base rates.
For example, if I’m trying to predict whether Lukashenko will continue being the president of Belarus, I can’t learn literally all the relevant facts, so instead I will start out by looking at the average reign of similar dictators, and assume that the map from a reduced set of observables to survivability is similar (and then make further adjustments based on facts about the current context). This doesn’t seem like an intractable operation.
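A toy version of that base-rate move, with reign lengths invented purely for illustration:

```python
# Toy base-rate forecast: estimate the probability that a leader who has already ruled
# `current` years stays in power another `horizon` years, using a reference class of
# reign lengths. The reign lengths below are invented for illustration, not real data.
reference_reigns = [5, 9, 12, 17, 22, 26, 31, 38, 44]  # hypothetical years in power

def survival_probability(current: float, horizon: float) -> float:
    comparable = [r for r in reference_reigns if r >= current]    # lasted at least this long already
    if not comparable:
        return 0.0
    survived = [r for r in comparable if r >= current + horizon]  # ...and lasted `horizon` more years
    return len(survived) / len(comparable)

print(survival_probability(current=28, horizon=5))  # crude base rate, before context adjustments
```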
Some further notes below. But overall I’d tend to think that something got lost in the mathematization, and I’m fairly confident that at least one of the constraints doesn’t hold in practice.
- OpenPhilanthropy’s “hits based giving” approach seems like it doesn’t fall prey to your argument, because they are willing to ignore the “Don’t Prevent Impossible Harms” constraint.
- Still seems like you can get cheaper algorithms if you accept approximate recommendations.
> If there isn’t an efficient algorithm that tells us whether not we ought to take steps to prevent or mitigate the effects of any potentially catastrophic event, then it seems that Prevent Possible Harms does not give us any practical advice as to how we ought to live our lives.
> The key assumption here is that if a maxim is meant to be generally action-guiding, then we need to be able to efficiently determine, for any given case, whether or not it recommends taking a particular action.
Seems like this proves too much, per your note on climate change later in the post.
> PIBNETD: For any belief network and variable V within that belief network, output Yes if P(V=1) > 0 and output No otherwise.
Oh wow, this is much weaker than what I was expecting. I was expecting your formalization to also allow for approximate results. E.g., something like:
> PIBNETD: For any belief network and variable V within that belief network, output some approximation of P(V | N)
Also, my sense is that many belief networks may be sparse.
Re: Counterarguments. Yeah, this answers some questions, but not enough.
Note the contrast between:
> What is noteworthy about this case is not that the possibility of catastrophic climate change is a particularly salient variable, but rather that on the much narrower set of belief networks that are supported by existing climate data, we are able to efficiently reason that a climate catastrophe has positive probability of occurring.
> However, unlike in the case of climate change, the proposition that we have data-supported models that entail that there is a positive probability of a catastrophe brought about by artificial intelligence is much less plausible
We could still have fairly sparse belief networks based on subjective probability assessments.
> In light of the results presented above, we should be very skeptical of the epistemic value of individuals’ estimates of the probability of any event, especially when those estimates are not transparently based on a data-supported model. We know that in general, Turing machines cannot take an arbitrary probabilistic theory of how their environment works and accurately compute whether some proposition in that network has positive probability of being true. As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers.
Seems not true. E.g., under a materialist position, you could scan the brains of extremely excellent people (von Neumann, Mandela, etc.) and run them faster. This seems like it provides a proof of concept.
> we must, I conclude, accept that our epistemic position with respect to whether such events have positive probability of occurring is extremely limited.
If you have uncertainty about whether something has a 0% probability or not, you probably wouldn’t assign it a 0% probability. Or am I missing something?
***
Nitpick:
> In other words, there is no way for any finite intelligent being to determine, using only structural and probabilistic properties of a belief network, which proper subset of propositions represented in the network are such that their truth could amount to a very bad outcome. Under this assumption, the badness of a proposition is independent of its structural or probabilistic relationships to other propositions. As an intuition pump, suppose that you were able to view a large spreadsheet of propositions, each labelled just with a number. You also have a conditional probability table showing you the probabilistic relationships between all these propositions. It is hard to see how, given just this information, you could identify a proper subset of propositions such that, if they were true, things could be especially bad. The structural and probabilistic information contained in the spreadsheet does not, on its own, tell us anything about how we value the truth or falsehood of the propositions depicted.
This may not be true given intelligent agents trying to steer away from bad outcomes.
For what it’s worth, I don’t think this is true (unless I’m misinterpreting!). Preferring low-probability, high-expected value gambles doesn’t require preferring gambles with probability 0 of success.
Well, what you are saying is true if you are certain that they are 0 probability. But not if you are willing to take bets which, in hindsight, you will realize had 0 probability of occurring.
Ah, I think we’ve got different notions of probability in mind: the subjective credence of the agent (OpenPhil grantmakers) versus something like the objective chances of the thing actually happening, irrespective of anyone’s beliefs.
Yeah, I think that if you stare at the second one, it doesn’t seem that decision relevant. E.g., a coin which is either heads or tails is 100% heads with 50% probability and 100% tails with 50% probability.
And if some important decision depended on whether it was heads or tails, you might not be able to wait and find out.
Thank you so much for your careful engagement with this piece! There’s a lot to respond to here, but just for starters:
You can certainly design a sparse belief network wherein Bayesian inference is tractable and one node corresponds to the possibility of an AI apocalypse. But I don’t see how such a network would justify the credences that you derive from it, to the point that you would be willing to make a costly bet now on such an apocalypse being possible. Intelligence, and interactions between intelligent creatures, strikes me as an extremely complex system that requires elaborate, careful modeling before we can make meaningful predictions.
Scanning von Neumann’s brain and speeding it up to do Bayesian inference could maybe establish a more efficient baseline for the speed of inference. But that doesn’t change the fact that unless P=NP, the time it takes the super-vonNeumann-brain to do inference will still grow exponentially in the size of the input belief network.
I don’t think ‘we ought to assign X positive probability’ follows from ‘it is practically impossible to know whether X has positive probability.’ That said, I also don’t have a well-worked-out theory of how reasoning under uncertainty should countenance practical limits on said reasoning.
I don’t understand the final nitpick. You have just a Bayesian network and the associated conditional probability distribution. How do you thereby determine which nodes correspond to potential catastrophes? In general it seems that a utility function over outcomes just contains information that can’t be extracted from the probability function over those same outcomes.
Btw, what do you think about/are you familiar with work on logical induction?
I love that work! And I think this fits in nicely with another comment that you make below about the principle of indifference. The problem, as I see it, is that you have an agent who adopts some credences and a belief structure that defines a full distribution over a set of propositions. It’s either consistent or inconsistent with that distribution to assign some variable X a strictly positive probability. But, let’s suppose, a Turing machine can’t determine that in polynomial time. As I understand Garrabrant et al., I’m fine to pick any credence I like, since logical inconsistencies are only a problem if they allow you to be Dutch booked in polynomial time. As a way of thinking about reasoning under logical uncertainty, it’s ingenious. But once we start thinking about our personal probabilities as guides to what we ought to do, I get nervous. Note that just as I’m free to assign X a strictly positive probability distribution under Garrabrant’s criterion, I’m also free to assign it a distribution that allows for probability zero (even if that ends up being inconsistent, by stipulation I can’t be dutch-booked in polynomial time). One could imagine a precautionary principle that says, in these cases, to always pick a strictly positive probability distribution. But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
I don’t have the scheme off the top of my head, but this doesn’t seem right. If you assign probability 0, you would take any odds, and so I could make a lot of money when you eventually shift to a non-zero probability.
Right, but then that seems like a different objection, e.g., a reluctance to take Pascal’s wager-type deals, or some preference related to your risk aversion, or some objection to expected value calculations under not-particularly-resilient low probabilities. But then that feels more like the true objection, not the computational complexity part. Would you say that’s a fair characterization?
I do think that the issues with Pascal’s wager-type deals are compounded by the possibility that the positive probability you assign to the relevant outcome might be inconsistent with other beliefs you have (and settling the question of consistency is computationally intractable). In the classic Pascal’s wager, there’s no worry about internal inconsistency in your credences.
How about this gripe: You’ve shown that in theory, for an arbitrary set of probability assignments, it’s very difficult to compute implications.
But the landscape of probabilities in the real world is not an arbitrary set, and we’d expect it to have much more structure.
Thoughts?
This is the issue I was trying to address in counterargument 2.
Yeah, I disagree with this. If it is practically impossible to know whether X, then per the principle of ignorance we can assign 50% to X and 50% to not X.
In light of my earlier comment about logical induction, I think this case is different from the classical use-case for the principle of ignorance, where we have n possibilities that we know nothing about, and so we assign each probability 1/n. Here, we have a set of commitments that we know entails that there is either a strictly positive or an extreme, delta-function-like distribution over some variable X, but we don’t know which. So if we apply the principle of ignorance to those two possibilities, we end up assigning equal higher-order credence to the normative proposition that we ought to assign a strictly positive distribution over X and to the proposition that we ought to assign a delta-function distribution over X. If our final credal distribution over X is a blend of these two distributions, then we end up with a strictly positive credal distribution over X. But now we’ve arrived at a conclusion that we stipulated might be inconsistent with our other epistemic commitments! If nothing else, this shows that applying indifference reasoning here is much more involved than in the classic case. Garrabrant wants to say, I think, that this reasoning could be fine as long as the inconsistency that it potentially leads to can’t be exploited in polynomial time. But then see my other worries about this kind of reasoning in my response above.
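Spelled out, writing p₊ for the strictly positive candidate distribution and p₀ for the delta-function candidate, the blend is:

```latex
% Equal higher-order credence in the two candidate first-order distributions:
P(X = 1) \;=\; \tfrac{1}{2}\, p_{+}(X = 1) \;+\; \tfrac{1}{2}\, p_{0}(X = 1)
         \;=\; \tfrac{1}{2}\, p_{+}(X = 1) \;>\; 0.
```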
2. The von Neumann point was a response to “As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers.” I.e., copying von Neumann’s brain is something that I’d consider possible, just very difficult. But it provides a proof of existence.
Thanks for clarifying that! I think there are a few reasons to be wary of whole brain emulation as a route to super-intelligence (see this from Mandelbaum: https://philpapers.org/rec/MANEAM-4). Now I’m aware that if whole brain emulation isn’t possible, then some of the computationalist assumptions in my post (namely, that the same limits on Turing machines apply to humans) seem less plausible. But I think there are at least two ways out. One is to suppose that computation in the human brain is sub-neural, and so brain emulation will still leave out important facets of human cognition. Another is to say that whole brain emulation may still be plausible, but that there are speed limits on the computations that the brain does that prevent the kind of speeding up that you imagine. Here, work on the thermodynamics of computation is relevant.
But, in any event (and I suspect this is a fundamental disagreement between me and many longtermists) I’m wary of the argumentative move from mere conceivability to physical possibility. We know so little about the physics of intelligence. The idea of emulating a brain and then speeding it up may turn out to be similar to the idea of getting something to move at the speed of light, and then speeding it up a bit more. It sounds fine as a thought experiment, but it turns out it’s physically incoherent. On the other hand, whole brain emulation plus speed-ups may be perfectly physically coherent. But my sense is we just don’t know.
PS: I think that I’m mostly in “trying to poke holes” mode, and I’ll take a bit longer to come to a view about whether this is actually true in practice.
Your argument in favour of epistemic humility seems compelling. I wonder, does it leave room for a Mitigate Possible Harms principle that promotes resilience rather than attempting to predict specific scenarios?
I think so! At the societal level, we can certainly do a lot more to make our world resilient without making specific predictions.