Thank you so much for your careful engagement with this piece! There’s a lot to respond to here, but just for starters:
You can certainly design a sparse belief network wherein Bayesian inference is tractable and one node corresponds to the possibility of an AI apocalypse. But I don’t see how such a network would justify the credences that you derive from it, to the point that you would be willing to make a costly bet now on such an apocalypse being possible. Intelligence, and interactions between intelligent creatures, strikes me as an extremely complex system that requires elaborate, careful modeling before we can make meaningful predictions.
Scanning von Neumann’s brain and speeding it up to do Bayesian inference could maybe establish a more efficient baseline for the speed of inference. But that doesn’t change the fact that, unless P=NP, the worst-case time it takes the super-vonNeumann-brain to do inference will still grow faster than any polynomial in the size of the input belief network.
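To make the scaling point concrete, here is a minimal sketch (an illustration, not anything taken from the post): brute-force inference by enumeration over a toy chain of n binary variables. The function names `chain_joint` and `marginal_of_last` and the parameter values are assumptions made up for this example. A chain is actually an easy case that admits efficient exact algorithms; enumeration is used only to show how the naive cost doubles with every added variable, which a constant-factor hardware speed-up does not change.

```python
from itertools import product


def chain_joint(assignment, p_first=0.3, p_stay=0.9):
    """Joint probability of a binary chain X1 -> X2 -> ... -> Xn.

    p_first is P(X1 = 1) and p_stay is P(X_{i+1} = X_i).
    Both values are made up for illustration.
    """
    prob = p_first if assignment[0] == 1 else 1.0 - p_first
    for prev, cur in zip(assignment, assignment[1:]):
        prob *= p_stay if cur == prev else 1.0 - p_stay
    return prob


def marginal_of_last(n):
    """Return (P(Xn = 1), number of joint terms visited) by full enumeration."""
    total, terms = 0.0, 0
    for assignment in product((0, 1), repeat=n):
        terms += 1  # one term per complete assignment, so 2**n in total
        if assignment[-1] == 1:
            total += chain_joint(assignment)
    return total, terms


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        p, terms = marginal_of_last(n)
        print(f"n={n:2d}  terms={terms:6d}  P(Xn=1)={p:.4f}")
```

Running the enumerator on a machine that is a thousand times faster divides every row's runtime by a thousand, but the number of terms still doubles with each extra node.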
I don’t think ‘we ought to assign X positive probability’ follows from ‘it is practically impossible to know whether X has positive probability.’ That said, I also don’t have a well-worked-out theory of how reasoning under uncertainty should countenance practical limits on said reasoning.
I don’t understand the final nitpick. You have just a Bayesian network and the associated conditional probability distribution. How do you thereby determine which nodes correspond to potential catastrophes? In general it seems that a utility function over outcomes just contains information that can’t be extracted from the probability function over those same outcomes.
Btw, what do you think about/are you familiar with work on logical induction?
We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities. We show that our algorithm, an instance of what we call a logical inductor, satisfies a number of intuitive desiderata, including: (1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time; (2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom; and (3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference.
I love that work! And I think this fits in nicely with another comment that you make below about the principle of indifference. The problem, as I see it, is that you have an agent who adopts some credences and a belief structure that defines a full distribution over a set of propositions. It’s either consistent or inconsistent with that distribution to assign some variable X a strictly positive probability. But, let’s suppose, a Turing machine can’t determine that in polynomial time. As I understand Garrabrant et al., I’m fine to pick any credence I like, since logical inconsistencies are only a problem if they allow you to be Dutch booked in polynomial time. As a way of thinking about reasoning under logical uncertainty, it’s ingenious. But once we start thinking about our personal probabilities as guides to what we ought to do, I get nervous. Note that just as I’m free to assign X a strictly positive probability distribution under Garrabrant’s criterion, I’m also free to assign it a distribution that allows for probability zero (even if that ends up being inconsistent, by stipulation I can’t be Dutch booked in polynomial time). One could imagine a precautionary principle that says, in these cases, to always pick a strictly positive probability distribution. But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
I don’t have the scheme off the top of my head, but this doesn’t seem right. If you assign probability 0, you would take any odds, and so I could make a lot of money when you eventually shift to a non-zero probability.
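One way such a scheme might go, as a rough sketch under stated assumptions rather than the scheme the comment has in mind: suppose the agent treats credence times payout as the fair price of a ticket that pays out if X occurs, sells such a ticket at its current fair price, and buys it back at its new fair price after revising its credence. The names `fair_price` and `loss_after_revision` and all the numbers are hypothetical.

```python
def fair_price(credence, payout):
    """Price at which the agent regards a ticket paying `payout` if X as fair."""
    return credence * payout


def loss_after_revision(old_credence, new_credence, payout):
    """Net cash the agent loses by selling at the old price and buying back at the new one.

    After the buy-back the agent holds no ticket, so the loss does not
    depend on whether X actually occurs.
    """
    received = fair_price(old_credence, payout)
    paid = fair_price(new_credence, payout)
    return paid - received


if __name__ == "__main__":
    # With an old credence of exactly 0 the ticket is given away for free,
    # so the loss scales linearly with the payout the counterparty picks.
    for payout in (100, 10_000, 1_000_000):
        print(payout, loss_after_revision(0.0, 0.01, payout))
```

Because a credence of zero makes the agent indifferent to the size of the payout, the counterparty can scale the loss up as far as they like, provided the later shift to a non-zero credence actually happens.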
But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
Right, but then that seems like a different objection, e.g., a reluctance to take Pascal’s wager-type deals, or some preference related to your risk aversion, or some objection to expected value calculations under not-particularly-resilient low probabilities. But then that feels more like the true objection, not the computational complexity part. Would you say that’s a fair characterization?
I do think that the issues with Pascal’s wager-type deals are compounded by the possibility that the positive probability you assign to the relevant outcome might be inconsistent with other beliefs you have (and settling the question of consistency is computationally intractable). In the classic Pascal’s wager, there’s no worry about internal inconsistency in your credences.
I don’t think ‘we ought to assign X positive probability’ follows from ‘it is practically impossible to know whether X has positive probability.’
Yeah, I disagree with this. If it is practically impossible to know whether X, then per the principle of ignorance we can assign 50% to X and 50% to not X.
In light of my earlier comment about logical induction, I think this case is different from the classical use-case for the principle of ignorance, where we have n possibilities that we know nothing about, and so we assign each probability 1/n. Here, we have a set of commitments that we know entails that there is either a strictly positive or an extreme, delta-function-like distribution over some variable X, but we don’t know which. So if we apply the principle of ignorance to those two possibilities, we end up assigning equal higher-order credence to the normative proposition that we ought to assign a strictly positive distribution over X and to the proposition that we ought to assign a delta-function distribution over X. If our final credal distribution over X is a blend of these two distributions, then we end up with a strictly positive credal distribution over X. But now we’ve arrived at a conclusion that we stipulated might be inconsistent with our other epistemic commitments! If nothing else, this shows that applying indifference reasoning here is much more involved than in the classic case. Garrabrant wants to say, I think, that this reasoning could be fine as long as the inconsistency that it potentially leads to can’t be exploited in polynomial time. But then see my other worries about this kind of reasoning in my response above.
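The blending step can be checked with a small numerical sketch. The three-outcome variable, the particular distributions, and the helper name `mix` are assumptions chosen purely for illustration; the 50/50 weight stands in for the equal higher-order credences described above.

```python
def mix(dist_a, dist_b, weight_a=0.5):
    """Convex combination of two distributions over the same outcomes."""
    outcomes = set(dist_a) | set(dist_b)
    return {
        o: weight_a * dist_a.get(o, 0.0) + (1.0 - weight_a) * dist_b.get(o, 0.0)
        for o in outcomes
    }


# Hypothetical candidate distributions over a three-valued X.
strictly_positive = {"x1": 0.2, "x2": 0.5, "x3": 0.3}
delta = {"x1": 0.0, "x2": 1.0, "x3": 0.0}  # all mass on a single outcome

blend = mix(strictly_positive, delta)

if __name__ == "__main__":
    for outcome in sorted(blend):
        print(outcome, blend[outcome])
    print("strictly positive:", all(p > 0 for p in blend.values()))
```

Any mixing weight strictly between 0 and 1 gives the same qualitative result: every outcome that the strictly positive component supports keeps some mass, so the blend can never reproduce the delta-function option.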
2. The von Neumann point was a response to “As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers.” That is, copying von Neumann’s brain is something that I’d consider possible, just very difficult. But it provides a proof of existence.
Thanks for clarifying that! I think there are a few reasons to be wary of whole brain emulation as a route to super-intelligence (see this from Mandelbaum: https://philpapers.org/rec/MANEAM-4). Now I’m aware that if whole brain emulation isn’t possible, then some of the computationalist assumptions in my post (namely, that the same limits on Turing machines apply to humans) seem less plausible. But I think there are at least two ways out. One is to suppose that computation in the human brain is sub-neural, and so brain emulation will still leave out important facets of human cognition. Another is to say that whole brain emulation may still be plausible, but that there are speed limits on the computations that the brain does that prevent the kind of speeding up that you imagine. Here, work on the thermodynamics of computation is relevant.
But, in any event (and I suspect this is a fundamental disagreement between me and many longtermists) I’m wary of the argumentative move from mere conceivability to physical possibility. We know so little about the physics of intelligence. The idea of emulating a brain and then speeding it up may turn out to be similar to the idea of getting something to move at the speed of light, and then speeding it up a bit more. It sounds fine as a thought experiment, but it turns out it’s physically incoherent. On the other hand, whole brain emulation plus speed-ups may be perfectly physically coherent. But my sense is we just don’t know.
How about this gripe: You’ve shown that in theory, for an arbitrary set of probability assignments, it’s very difficult to compute implications.
But the landscape of probabilities in the real world is not an arbitrary set, and we’d expect it to have much more structure.
Thoughts?
This is the issue I was trying to address in counterargument 2.