I do think that the issues with Pascal's-wager-type deals are compounded by the possibility that the positive probability you assign to the relevant outcome might be inconsistent with other beliefs you have, and that settling the question of consistency is computationally intractable. In the classic Pascal's wager, there's no worry about internal inconsistency in your credences.
Yes, I think you're spot on: my approach is more externalist, while a lot of longtermist reasoning has a distinctly internalist flavor. But spelling all that out will take even more work!
Thanks for clarifying that! I think there are a few reasons to be wary of whole brain emulation as a route to super-intelligence (see this from Mandelbaum: https://philpapers.org/rec/MANEAM-4). Now, I'm aware that if whole brain emulation isn't possible, then some of the computationalist assumptions in my post (namely, that the same limits on Turing machines apply to humans) seem less plausible. But I think there are at least two ways out. One is to suppose that computation in the human brain is sub-neural, so that neural-level brain emulation would still leave out important facets of human cognition. Another is to say that whole brain emulation may still be possible, but that there are speed limits on the computations the brain does that prevent the kind of speeding up you imagine. Here, work on the thermodynamics of computation is relevant.
But, in any event (and I suspect this is a fundamental disagreement between me and many longtermists), I'm wary of the argumentative move from mere conceivability to physical possibility. We know so little about the physics of intelligence. The idea of emulating a brain and then speeding it up may turn out to be like the idea of getting something to move at the speed of light and then speeding it up a bit more: it sounds fine as a thought experiment, but it turns out to be physically incoherent. On the other hand, whole brain emulation plus speed-ups may be perfectly physically coherent. But my sense is that we just don't know.
In light of my earlier comment about logical induction, I think this case is different from the classical use-case for the principle of ignorance, where we have n possibilities that we know nothing about, and so we assign each probability 1/n. Here, we have a set of commitments that we know entails either a strictly positive or an extreme, delta-function-like distribution over some variable X, but we don't know which. So if we apply the principle of ignorance to those two possibilities, we end up assigning equal higher-order credence to the normative proposition that we ought to assign a strictly positive distribution over X and to the proposition that we ought to assign a delta-function distribution over X. If our final credal distribution over X is a blend of these two distributions, then we end up with a strictly positive credal distribution over X. But now we've arrived at a conclusion that, by stipulation, might be inconsistent with our other epistemic commitments! If nothing else, this shows that applying indifference reasoning here is much more involved than in the classic case. Garrabrant wants to say, I think, that this reasoning could be fine as long as the inconsistency it potentially leads to can't be exploited in polynomial time. But then see my other worries about this kind of reasoning in my response above.
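Just to make the blending step explicit (my notation; assume for simplicity that X is discrete, that f is the strictly positive candidate distribution, and that the delta-like candidate is a point mass at some value x_0):

$$
p(x) \;=\; \tfrac{1}{2}\,\mathbf{1}[x = x_0] \;+\; \tfrac{1}{2}\,f(x), \qquad f(x) > 0 \ \text{for all } x,
$$

so p(x) > 0 for every x. The mixture inherits strict positivity from its positive component, which is how the indifference move delivers exactly the conclusion that, by stipulation, might conflict with our other commitments.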
This is the issue I was trying to address in counterargument 2.
I love that work! And I think this fits in nicely with another comment that you make below about the principle of indifference. The problem, as I see it, is that you have an agent who adopts some credences and a belief structure that defines a full distribution over a set of propositions. It's either consistent or inconsistent with that distribution to assign some variable X a strictly positive probability. But, let's suppose, a Turing machine can't determine which in polynomial time. As I understand Garrabrant et al., I'm fine to pick any credence I like, since logical inconsistencies are only a problem if they allow you to be Dutch-booked in polynomial time. As a way of thinking about reasoning under logical uncertainty, it's ingenious. But once we start thinking about our personal probabilities as guides to what we ought to do, I get nervous. Note that just as I'm free to assign X a strictly positive probability distribution under Garrabrant's criterion, I'm also free to assign it a distribution that allows for probability zero (even if that ends up being inconsistent, by stipulation I can't be Dutch-booked in polynomial time). One could imagine a precautionary principle that says, in these cases, to always pick a strictly positive probability distribution. But then, again, I worry that once we assign positive probability to all of these conceivable events that we can't figure out much about, we open the floodgates to an ever-more-extreme apportionment of resources to ever-lower-probability catastrophes.
I don't have a fully formed opinion here, but for now I'll just note that the task the examined futurists are implicitly given is very different from assigning a probability distribution to a variable based on parameters. Rather, the implicit task is to say some stuff that you think will happen, and then we judge whether those things in fact happened. But I'm not sure how to translate the output of that task into action. (E.g., Asimov says X will happen, and so we should do Y.)
I think the contrast with elections is an important and interesting one. I'll start by saying that being able to coarse-grain the set of all possible worlds into two possibilities doesn't mean we should assign both possibilities positive probability. Consider the set of all possible sequences of infinite coin tosses. We can coarse-grain those sequences into two sets: the ones where finitely many coins land heads, and the ones where infinitely many coins land heads. But, assuming we're actually going to toss infinitely many coins, and assuming each coin is fair, the first set of sequences has probability zero and the second set has probability one.
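For concreteness, the standard calculation behind that claim: let A_n be the event that no toss after toss n lands heads, so that "finitely many heads" is the union of the A_n. For a fair coin,

$$
P(A_n) \;=\; \lim_{m \to \infty} \left(\tfrac{1}{2}\right)^{m} \;=\; 0
\quad\Longrightarrow\quad
P(\text{finitely many heads}) \;\le\; \sum_{n=1}^{\infty} P(A_n) \;=\; 0,
$$

so the "infinitely many heads" cell of the coarse-graining gets probability one, even though both cells are genuinely possible.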
In the election case, we have a good understanding of the mechanism by which elections are (hopefully) won. In this simple case with a plurality rule, we just want to know which candidate will get the most votes. So we can define probability distributions over the possible number of votes cast, and probability distributions over possible distributions of those votes to different candidates (where vote distributions are likely conditional on overall turnout), and coarse-grain those various vote distributions into the possibility of each candidate winning. This is a simple case, and no doubt real-world election models have many more parameters, but my point is that we understand the relevant possibility space and how it relates to our outcomes of interest fairly well. I don’t think we have anything like this understanding in the AGI case.
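Just to make the election side concrete, here's a toy version of the kind of model I have in mind; every parameter is hypothetical and purely illustrative, and real models are far richer:

```python
import numpy as np

rng = np.random.default_rng(0)
N_SIMS = 100_000

# Hypothetical parameters for a two-candidate plurality race.
TURNOUT_MEAN, TURNOUT_SD = 1_000_000, 50_000   # distribution over total votes cast
SHARE_ALPHA, SHARE_BETA = 520, 480             # Beta distribution over A's vote share

wins_a = 0
for _ in range(N_SIMS):
    turnout = max(1, int(rng.normal(TURNOUT_MEAN, TURNOUT_SD)))  # sample overall turnout
    share_a = rng.beta(SHARE_ALPHA, SHARE_BETA)                  # sample A's share, conditional on turnout model
    votes_a = rng.binomial(turnout, share_a)                     # realized votes for A
    wins_a += votes_a > turnout - votes_a                        # plurality rule: most votes wins

print(f"P(candidate A wins) ≈ {wins_a / N_SIMS:.3f}")
```

The point is that every sampling step corresponds to a mechanism we understand reasonably well (turnout, vote shares conditional on turnout, the plurality rule), and coarse-graining the simulated vote counts into "A wins" vs. "B wins" is then straightforward. I don't see what the analogous mechanism-level model would be in the AGI case.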
I think so! At the societal level, we can certainly do a lot more to make our world resilient without making specific predictions.
Thank you so much for your careful engagement with this piece! There’s a lot to respond to here, but just for starters:
You can certainly design a sparse belief network in which Bayesian inference is tractable and one node corresponds to the possibility of an AI apocalypse. But I don't see how such a network would justify the credences you derive from it, to the point that you would be willing to make a costly bet now on such an apocalypse being possible. Intelligence, and the interactions between intelligent creatures, strike me as an extremely complex system that requires elaborate, careful modeling before we can make meaningful predictions.
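To illustrate the first point with a deliberately crude, hypothetical sketch (the structure and the numbers are mine, made up on the spot): inference over a two-node network is trivially tractable, but the output is only as credible as the parameters you typed in.

```python
# Hypothetical two-node network: AGI_ARRIVES -> APOCALYPSE.
# Exact inference here is one line of arithmetic, so tractability is not
# the issue; the issue is that nothing justifies these numbers.
P_AGI = 0.10                 # P(AGI arrives) -- stipulated, not argued for
P_APOC_GIVEN_AGI = 0.30      # P(apocalypse | AGI arrives)
P_APOC_GIVEN_NO_AGI = 0.001  # P(apocalypse | no AGI)

p_apocalypse = P_AGI * P_APOC_GIVEN_AGI + (1 - P_AGI) * P_APOC_GIVEN_NO_AGI
print(f"P(apocalypse) = {p_apocalypse:.4f}")  # prints 0.0309 -- but so what?
```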
Scanning von Neumann's brain and speeding it up to do Bayesian inference could perhaps establish a more efficient baseline for the speed of inference. But that doesn't change the fact that, unless P=NP, the time it takes the sped-up von Neumann brain to do inference will still grow exponentially in the size of the input belief network.
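To put the same point arithmetically (taking, as a stand-in, worst-case exact inference that scales like c·2^n in the number of nodes n, and an emulation that runs s times faster than a biological brain):

$$
t_{\text{emulation}}(n) \;=\; \frac{c \, 2^{n}}{s} \;=\; c \, 2^{\,n - \log_2 s},
$$

so the speed-up buys you only about log_2 s extra nodes before the running time is back where it started; the exponential growth is untouched.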
I don't think 'we ought to assign X positive probability' follows from 'it is practically impossible to know whether X has positive probability.' That said, I also don't have a well-worked-out theory of how reasoning under uncertainty should countenance practical limits on said reasoning.
I don't understand the final nitpick. You have just a Bayesian network and the associated conditional probability distributions. How do you thereby determine which nodes correspond to potential catastrophes? In general, it seems that a utility function over outcomes just contains information that can't be extracted from the probability function over those same outcomes.
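A toy illustration of that last point (the numbers are mine and purely hypothetical): two agents can share exactly the same probability function and still disagree about which node marks a potential catastrophe, because that classification lives in the utilities.

$$
P(\omega) = 0.01 \ \text{for both agents}, \qquad u_1(\omega) = -10^{9}, \qquad u_2(\omega) = -10.
$$

For the first agent, ω is a catastrophe worth spending heavily to prevent; for the second, it's a nuisance. Nothing in the shared probability function adjudicates between them.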
So I'm actually fine with fanaticism in principle, if we allow some events to have probability zero. But if every event in our possibility space has positive probability, then I worry that you'll just throw ever more resources at preventing ever-lower-probability catastrophes.
On probability zero events and Bayesianism in the case where the sample space is a continuum, Easwaran is a great source (this is long but worth it, sec. 1.3.3 and sec. 2 are the key parts): https://philpapers.org/archive/EASCP.pdf
I guess the worry then is that you’re drawn into fanaticism: in principle, any positive probability event, however small that probability is, can be bad enough to justify taking extremely costly measures now to ameliorate it.
I'd also say that assigning all events positive probability can't be a part of Bayesianism in general if we want to allow for a continuum of possible events (e.g., as many possible events as there are real numbers).
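The standard example, spelled out: for a uniform draw from the unit interval,

$$
X \sim \mathrm{Uniform}[0,1]: \qquad P(X = x) = 0 \ \text{for every } x \in [0,1], \qquad P(a \le X \le b) = b - a \ \text{for } 0 \le a \le b \le 1.
$$

Every particular value is a genuinely possible outcome, yet each gets probability zero; a Bayesianism that demanded positive probability for every possible event couldn't model this case at all.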
I do think the best way out for the position I'm arguing against is something like: assume all events have positive probability, set an upper bound on the badness of events and on the costliness of ameliorating them (to avoid fanaticism), and then hope you can run simulations that give you a tight margin of error with low failure probability.
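To give a rough sense of what "a tight margin of error with low failure probability" costs, here's a sketch that sizes a Monte Carlo run using the standard Hoeffding bound (the bound is textbook; the target numbers are hypothetical):

```python
import math

def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Samples needed so that the empirical mean of an i.i.d. quantity
    bounded in [0, 1] lands within epsilon of the true value with
    probability at least 1 - delta (by Hoeffding's inequality)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

# Hypothetical targets: estimate a probability to within +/- 0.001,
# tolerating a one-in-a-million chance of missing that margin.
print(hoeffding_sample_size(1e-3, 1e-6))  # roughly 7.3 million samples
```

The required number of samples grows like 1/ε² as the margin shrinks, which is the sense in which very high precision gets computationally expensive.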
My quick, off-the-cuff theory of good forecasting is that you're probably running something like a good Monte Carlo algorithm in your head as you simulate outcomes. That's great if you're willing to assign all events positive probability (a good idea when forecasting something like an election). But that assumption begs the question against Don't Prevent Impossible Harms. And, as I note in the article, getting very high precision can still be computationally expensive.
Thanks for reading and for the very thoughtful comment. I take issue with the idea that, because human forecasters are demonstrably able to infer the probabilistic consequences of their prior beliefs, they can't be using models on which Bayesian inference is intractable. I suspect that we can do this in some well-understood systems (e.g., a tennis player might have a near-Bayes-optimal mental model of what will happen under a lot of different salient possibilities when they return a serve). But it also seems like a salient possibility to me that when someone says "I think the probability of X event is positive," they're actually not in coherence with all of their other beliefs about the world, were we to systematically elucidate all of those beliefs.
I think we’ve arrived at a nice place then! Thank you so much for reading!
I think if someone tells you that a potentially catastrophic event has positive probability, then the general intractability of probabilistic inference is a good reason to demand a demonstrably tractable model of the system that generates the event before incurring massive costs. Otherwise, this person is just saying: look, I've got some beliefs about the world, and I'm able to infer from those beliefs that this event that's never happened before has positive probability. My response is that this just isn't the sort of thing we can do in the general case; we can only do it for specific classes of models. Thus my recommendation for more science and less forecasting in EA.
So I think we should be skeptical of any claims that some event has positive probability when the event hasn’t happened before and we don’t have a well-worked-out model of the process that would produce that sort of event. It just strikes me that these features are more typical of longer-term predictions.
Even though you disagreed with my post, I was touched to see that it was one of the “top” posts that you disagreed with :). However, I’m really struggling to see the connection between my argument and Deutsch’s views on AI and universal explainers. There’s nothing in the piece that you link to about complexity classes or efficiency limits on algorithms.