Some notes on a second read below. But overall I’d tend to think that something got lost in the mathematization, and I’m fairly confident that at least one of the constraints doesn’t hold in practice.
- OpenPhilanthropy’s “hits based giving” approach seems like it doesn’t fall prey to your argument, because they are willing to ignore the “Don’t Prevent Impossible Harms” constraint.
- Still seems like you can get cheaper algorithms if you accept approximate recommendations.
> If there isn’t an efficient algorithm that tells us whether not we ought to take steps to prevent or mitigate the effects of any potentially catastrophic event, then it seems that Prevent Possible Harms does not give us any practical advice as to how we ought to live our lives.
> The key assumption here is that if a maxim is meant to be generally action-guiding, then we need to be able to efficiently determine, for any given case, whether or not it recommends taking a particular action.
Seems like this proves too much, per your note on climate change later in the post.
> PIBNETD: For any belief network and variable V within that belief network, output Yes if P(V = 1) > 0 and output No otherwise.
Oh wow, this is much weaker than what I was expecting. I was expecting your formalization to also allow for approximate results. E.g., something like:
> PIBNETD: For any belief network and variable V within that belief network, output some approximation of P(V | N)
Also, my sense is that many belief networks may be sparse.
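To make the decision problem concrete, here is a toy sketch in Python (the network, the numbers, and the function names are all made up for illustration, not anything from the post). It answers the PIBNETD question for one variable of a three-node chain by brute-force enumeration, which is exactly the approach that blows up exponentially as nodes are added; the sparsity point is that on tree-like networks, message passing can answer the same question in polynomial time.

```python
# Toy, hypothetical belief network: a 3-node chain A -> B -> C with binary variables.
from itertools import product

p_a = {0: 0.7, 1: 0.3}                      # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},         # P(B | A)
               1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 1.0, 1: 0.0},         # P(C | B); note P(C=1 | B=0) = 0
               1: {0: 0.4, 1: 0.6}}

def prob_c_is_1() -> float:
    """P(C = 1), computed by summing over all joint assignments of A and B."""
    return sum(p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][1]
               for a, b in product((0, 1), repeat=2))

def pibnetd_for_c() -> bool:
    """The PIBNETD question for C: is P(C = 1) strictly positive?"""
    return prob_c_is_1() > 0.0

print(prob_c_is_1())    # 0.186
print(pibnetd_for_c())  # True
```

On this tiny chain the enumeration is harmless, but it touches 2^(n-1) terms as the chain grows, whereas belief propagation on the same chain only needs work linear in n; that is why the “many real belief networks are sparse” observation matters.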
Re: Counterarguments. Yeah, this answers some questions, but not enough.
Note the contrast between:
> What is noteworthy about this case is not that the possibility of catastrophic climate change is a particularly salient variable, but rather that on the much narrower set of belief networks that are supported by existing climate data, we are able to efficiently reason that a climate catastrophe has positive probability of occurring.
> However, unlike in the case of climate change, the proposition that we have data-supported models that entail that there is a positive probability of a catastrophe brought about by artificial intelligence is much less plausible
We could still have fairly sparse belief networks based on subjective probability assessments.
> In light of the results presented above, we should be very skeptical of the epistemic value of individuals’ estimates of the probability of any event, especially when those estimates are not transparently based on a data-supported model. We know that in general, Turing machines cannot take an arbitrary probabilistic theory of how their environment works and accurately compute whether some proposition in that network has positive probability of being true. As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers.
Seems not true. E.g., under a materialistic position, you could scan the brains of extremely excellent people (von Neumann, Mandela, etc.) and run them faster. This seems like it provides a proof of concept.
> we must, I conclude, accept that our epistemic position with respect to whether such events have positive probability of occurring is extremely limited.
If you have uncertainty about whether something has a 0% probability or not, you probably wouldn’t assign it a 0% probability. Or am I missing something?
***
Nitpick:
> In other words, there is no way for any finite intelligent being to determine, using only structural and probabilistic properties of a belief network, which proper subset of propositions represented in the network are such that their truth could amount to a very bad outcome. Under this assumption, the badness of a proposition is independent of its structural or probabilistic relationships to other propositions. As an intuition pump, suppose that you were able to view a large spreadsheet of propositions, each labelled just with a number. You also have a conditional probability table showing you the probabilistic relationships between all these propositions. It is hard to see how, given just this information, you could identify a proper subset of propositions that are such that, if they were true, things could be especially bad. The structural and probabilistic information contained in the spreadsheet does not, on its own, tell us anything about how we value the truth or falsehood of the propositions depicted.
This may not be true given intelligent agents trying to steer away from bad outcomes.
> OpenPhilanthropy’s “hits based giving” approach seems like it doesn’t fall prey to your argument, because they are willing to ignore the “Don’t Prevent Impossible Harms” constraint.
For what it’s worth, I don’t think this is true (unless I’m misinterpreting!). Preferring low-probability, high-expected value gambles doesn’t require preferring gambles with probability 0 of success.
Well, what you are saying is true if you are certain that they are 0 probability. But not if you are willing to take bets which, in hindsight, you will realize had 0 probability of occurring.
Ah, I think we’ve got different notions of probability in mind: the subjective credence of the agent (OpenPhil grantmakers) versus something like the objective chances of the thing actually happening, irrespective of anyone’s beliefs.
> something like the objective chances of the thing actually happening, irrespective of anyone’s beliefs
Yeah, I think that if you stare at the second one, it doesn’t seem that decision relevant. E.g., a coin which is either heads or tails is 100% heads with 50% probability and 100% tails with 50% probability.
And if some important decision depended on whether it was heads or tails, you might not be able to wait and find out.
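Spelled out (my arithmetic, just to make the point explicit): the objective chance is either 1 or 0, but averaging over your 50/50 higher-order uncertainty about which one it is recovers the ordinary, decision-relevant credence.

```latex
% The chance is 1 (already heads) or 0 (already tails); mixing over the
% higher-order 50/50 uncertainty gives back the usual subjective credence:
P(\text{heads}) = 0.5 \cdot 1 + 0.5 \cdot 0 = 0.5
```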
Thank you so much for your careful engagement with this piece! There’s a lot to respond to here, but just for starters:
You can certainly design a sparse belief network wherein Bayesian inference is tractable and one node corresponds to the possibility of an AI apocalypse. But I don’t see how such a network would justify the credences that you derive from it, to the point that you would be willing to make a costly bet now on such an apocalypse being possible. Intelligence, and interactions between intelligent creatures, strikes me as an extremely complex system that requires elaborate, careful modeling before we can make meaningful predictions.
Scanning von Neumann’s brain and speeding it up to do Bayesian inference could maybe establish a more efficient baseline for the speed of inference. But that doesn’t change the fact that unless P=NP, the time it takes the super-vonNeumann-brain to do inference will still grow exponentially in the size of the input belief network.
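As a back-of-the-envelope illustration (my numbers, assuming worst-case runtime on the order of 2^n for an n-node network): even a millionfold speedup from the emulation only buys about twenty extra nodes before the super-vonNeumann-brain is back where an ordinary brain started.

```latex
% Illustrative only: if worst-case inference time is c * 2^n, a 10^6x speedup
% is absorbed by roughly log2(10^6) ~ 20 additional nodes:
\frac{c \cdot 2^{\,n + 20}}{10^{6}} \approx c \cdot 2^{\,n},
\qquad \text{since } 2^{20} = 1{,}048{,}576 \approx 10^{6}.
```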
I don’t think ‘we ought to assign X positive probability’ follows from ‘it is practically impossible to know whether X has positive probability.’ That said, I also don’t have a well-worked-out theory of how reasoning under uncertainty should countenance practical limits on said reasoning.
I don’t understand the final nitpick. You have just a Bayesian network and the associated conditional probability distribution. How do you thereby determine which nodes correspond to potential catastrophes? In general it seems that a utility function over outcomes just contains information that can’t be extracted from the probability function over those same outcomes.
Btw, what do you think about/are you familiar with work on logical induction?
> We present a computable algorithm that assigns probabilities to every logical statement in a given formal language, and refines those probabilities over time. For instance, if the language is Peano arithmetic, it assigns probabilities to all arithmetical statements, including claims about the twin prime conjecture, the outputs of long-running computations, and its own probabilities. We show that our algorithm, an instance of what we call a logical inductor, satisfies a number of intuitive desiderata, including: (1) it learns to predict patterns of truth and falsehood in logical statements, often long before having the resources to evaluate the statements, so long as the patterns can be written down in polynomial time; (2) it learns to use appropriate statistical summaries to predict sequences of statements whose truth values appear pseudorandom; and (3) it learns to have accurate beliefs about its own current beliefs, in a manner that avoids the standard paradoxes of self-reference
I love that work! And I think this fits in nicely with another comment that you make below about the principle of indifference. The problem, as I see it, is that you have an agent who adopts some credences and a belief structure that defines a full distribution over a set of propositions. It’s either consistent or inconsistent with that distribution to assign some variable X a strictly positive probability. But, let’s suppose, a Turing machine can’t determine that in polynomial time. As I understand Garrabrant et al., I’m fine to pick any credence I like, since logical inconsistencies are only a problem if they allow you to be Dutch-booked in polynomial time. As a way of thinking about reasoning under logical uncertainty, it’s ingenious. But once we start thinking about our personal probabilities as guides to what we ought to do, I get nervous.

Note that just as I’m free to assign X a strictly positive probability distribution under Garrabrant’s criterion, I’m also free to assign it a distribution that allows for probability zero (even if that ends up being inconsistent, by stipulation I can’t be Dutch-booked in polynomial time). One could imagine a precautionary principle that says, in these cases, to always pick a strictly positive probability distribution. But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
> But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
I don’t have the scheme off the top of my head, but this doesn’t seem right. If you assign probability 0, you would take any odds, and so I could make a lot of money when you eventually shift to a non-zero probability.
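One scheme of the kind I have in mind (my reconstruction, not something from the post or from Garrabrant et al.) is a Dutch book over time:

```latex
% Hypothetical exploitation scheme: at t0 you assign P(X) = 0, so you will sell a
% ticket paying $1 if X for any price eps > 0 (you regard the payout as impossible).
% At t1 you revise to P(X) = p > 0 and are willing to buy the same ticket back for $p.
% The counterparty's profit per ticket is guaranteed, whether or not X ever happens:
\text{profit} = p - \varepsilon > 0 \quad \text{for any } 0 < \varepsilon < p.
```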
> But then again I’m worried that once we allow for all these conceivable events that we can’t figure out much about to have positive probability, we’re opening the floodgates for an ever-more-extreme apportionment of resources to lower-and-lower probability catastrophes.
Right, but then that seems like a different objection, e.g., a reluctance to take Pascal’s wager-type deals, or some preference related to your risk aversion, or some objection to expected value calculations under not-particularly-resilient low probabilities. But then that feels more like the true objection, not the computational complexity part. Would you say that’s a fair characterization?
I do think that the issues with Pascal’s wager-type deals are compounded by the possibility that the positive probability you assign to the relevant outcome might be inconsistent with other beliefs you have (and settling the question of consistency is computationally intractable). In the classic Pascal’s wager, there’s no worry about internal inconsistency in your credences.
> I don’t think ‘we ought to assign X positive probability’ follows from ‘it is practically impossible to know whether X has positive probability.’
Yeah, I disagree with this. If it is practically impossible to know whether X, then per the principle of ignorance we can assign 50% to X and 50% to not X.
In light of my earlier comment about logical induction, I think this case is different from the classical use-case for the principle of ignorance, where we have n possibilities that we know nothing about, and so we assign each probability 1/n. Here, we have a set of commitments that we know entails that there is either a strictly positive or an extreme, delta-function-like distribution over some variable X, but we don’t know which. So if we apply the principle of ignorance to those two possibilities, we end up assigning equal higher-order credence to the normative proposition that we ought to assign a strictly positive distribution over X and to the proposition that we ought to assign a delta-function distribution over X. If our final credal distribution over X is a blend of these two distributions, then we end up with a strictly positive credal distribution over X. But now we’ve arrived at a conclusion that we stipulated might be inconsistent with our other epistemic commitments! If nothing else, this shows that applying indifference reasoning here is much more involved than in the classic case. Garrabrant wants to say, I think, that this reasoning could be fine as long as the inconsistency that it potentially leads to can’t be exploited in polynomial time. But then see my other worries about this kind of reasoning in my response above.
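To make the blending step explicit (my notation; ε stands in for whatever small positive value the strictly positive distribution assigns):

```latex
% Indifference over the two candidate distributions: P_delta puts probability 0
% on X = 1, P_+ puts some eps > 0 on it. The 50/50 blend is strictly positive:
P(X = 1) = \tfrac{1}{2}\, P_{\delta}(X = 1) + \tfrac{1}{2}\, P_{+}(X = 1)
         = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2}\, \varepsilon
         = \tfrac{\varepsilon}{2} > 0.
```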
2. The von Neumann point was a response to “As such, we have no reason to think that any human being, even a top ML researcher, can use their best attempt at a theory of how intelligence and computing work to compute whether it is even possible for an AI system to accomplish every task better and more cheaply than human workers”, i.e., copying von Neumann’s brain is something that I’d consider possible, just very difficult. But it provides a proof of existence.
Thanks for clarifying that! I think there are a few reasons to be wary of whole brain emulation as a route to super-intelligence (see this from Mandelbaum: https://philpapers.org/rec/MANEAM-4). Now I’m aware that if whole brain emulation isn’t possible, then some of the computationalist assumptions in my post (namely, that the same limits on Turing machines apply to humans) seem less plausible. But I think there are at least two ways out. One is to suppose that computation in the human brain is sub-neural, and so brain emulation will still leave out important facets of human cognition. Another is to say that whole brain emulation may still be plausible, but that there are speed limits on the computations that the brain does that prevent the kind of speeding up that you imagine. Here, work on the thermodynamics of computation is relevant.
But, in any event (and I suspect this is a fundamental disagreement between me and many longtermists) I’m wary of the argumentative move from mere conceivability to physical possibility. We know so little about the physics of intelligence. The idea of emulating a brain and then speeding it up may turn out to be similar to the idea of getting something to move at the speed of light, and then speeding it up a bit more. It sounds fine as a thought experiment, but it turns out it’s physically incoherent. On the other hand, whole brain emulation plus speed-ups may be perfectly physically coherent. But my sense is we just don’t know.
How about this gripe: You’ve shown that in theory, for an arbitrary set of probability assignments, it’s very difficult to compute implications.
But the landscape of probabilities in the real world is not an arbitrary set, and we’d expect it to have much more structure.
Thoughts?
This is the issue I was trying to address in counterargument 2.
PS: I think that I’m mostly in “trying to poke holes” mode, and I’ll take a bit longer to come to a view about whether this is actually true in practice.