Hi Toby,
Thanks so much for this very clear response, it was a very satisfying read, and there’s a lot for me to chew on. And thanks for locating the point of disagreement — prior to this post, I would have guessed that the biggest difference between me and some others was on the weight placed on the arguments for the Time of Perils and Value Lock-In views, rather than on the choice of prior. But it seems that that’s not true, and that’s very helpful to know. If so, it suggests (advertisement to the Forum!) that further work on prior-setting in EA contexts is very high-value.
I agree with you that, under uncertainty over how to set the prior, my choice of prior will get washed out by models on which our distinctive features are important, because we’re clearly so distinctive in some particular ways (namely, that we’re so early on in civilisation, that the current population is so small, etc). I characterised these as outside-view arguments, but I’d understand if someone wanted to characterise them as prior-setting instead.
I also agree that there’s a strong case for making the prior over persons (or person-years) rather than centuries. In your discussion, you go via number of persons (or person-years) per century to the comparative importance of centuries. What I’d be inclined to do is just change the claim under consideration to: “I am among the (say) 100,000 most influential people ever”. This means we still take into account the fact that, though more populous centuries are more likely to be influential, they are also harder to influence in virtue of their larger population. If we frame the core claim in terms of being among the most influential people, rather than being at the most influential time, the core claim seems even more striking to me. (E.g. a uniform prior over the first 100 billion people would give a prior of 1 in 1 million of being in the 100,000 most influential people ever. Though of course, there would also be an extra outside-view argument for moving from this prior, which is that not many people are trying to influence the long-run future.)
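To make the arithmetic in that parenthetical explicit (no assumptions beyond the stated numbers):

```python
# Uniform prior over the first 100 billion people: the chance of being
# among the 100,000 most influential is just the ratio of the two counts.
total_people = 100e9   # first 100 billion people
top_k = 100_000        # "most influential" cutoff

print(top_k / total_people)  # 1e-06, i.e. 1 in 1 million
```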
However, I don’t currently feel attracted to your way of setting up the prior. In what follows I’ll just focus on the case of a values lock-in event, and for simplicity I’ll just use the standard Laplacean prior rather than your suggestion of a Jeffreys prior.
In significant part my lack of attraction is because the claims — that (i) there’s a point in time where almost everything about the fate of the universe gets decided; (ii) that point is basically now; (iii) almost no-one sees this apart from us (where ‘us’ is a very small fraction of the world) — seem extraordinary to me, and I feel I need extraordinary evidence in order to have high credence in them. My prior-setting discussion was one way of cashing out why these seem extraordinary. If there’s some way of setting priors such that claims (i)-(iii) aren’t so extraordinary after all, I feel like a rabbit is being pulled out of a hat.
Then I have some specific worries with the Laplacean approach (which I *think* would apply to the Jeffreys prior too, but I’m yet to figure out what a Fisher information matrix is, so I don’t totally back myself here).
But before I mention the worries, I’ll note that it seems to me that you and I are currently talking about priors over different propositions. You seem to be considering the propositions ‘there is a lock-in event this century’ or ‘there is an extinction event this century’; I’m considering the propositions ‘I am at the most influential time ever’ or ‘I am one of the most influential people ever’. As is well known, if you use principle-of-indifference-style reasoning over a number of different propositions, you can end up with inconsistent probability assignments. So, at best, one should use such reasoning in a very restricted way.
The reason I like applying the restricted principle of indifference to my propositions (‘are we at the most important time?’ or ‘are we one of the most influential people ever?’) is that:
(i) I know the frequency of occurrence of ‘most influential person’, for each possible total population of civilisation (past, present and future). Namely, it occurs once out of the total population. So I can look at each possible population size for the future, look at my credence in each possible population occurring, and in each case know the frequency of being the most influential person (or, more naturally, of being in the 100,000 most influential people). (See the sketch just after this list.)
(ii) it’s the most relevant proposition for the question of what I should do. (e.g. Perhaps it’s likely that there’s a lock-in event, but we can’t do anything about it and future people could, so we should save for a later date.)
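Here is a minimal sketch of the calculation in (i); the population sizes and credences are toy numbers of mine, purely for illustration:

```python
# Frequency-based prior on "I am among the K most influential people
# ever": average K / N over one's credences in total populations N.
K = 100_000

# Hypothetical credences over the total population of civilisation
# (past, present and future) -- toy numbers, not considered estimates.
credences = {
    1e11: 0.3,   # ~100 billion people ever (short future)
    1e13: 0.4,   # ~10 trillion
    1e15: 0.3,   # ~1 quadrillion (big future)
}

prior = sum(c * K / n for n, c in credences.items())
print(prior)  # ~3e-07 with these toy numbers
```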
Anyway, on to my worries about the Laplacean (and Jeffreys) prior.
First, the Laplacean prior seems to get the wrong answer for lots of similar predicates. Consider the claims: “I am the most beautiful person ever” or “I am the strongest person ever”, rather than “I am the most important person ever”. If we used the Laplacean prior in the way you suggest for these claims, the first person would assign 50% credence to being the strongest person ever, even if they knew that there was probably going to be billions of people to come. This doesn’t seem right to me.
Second, it also seems very sensitive to our choice of start date. If the proposition in question is ‘there will be a lock-in event this century’, I’d get a very different prior depending on whether I chose to begin counting from: (i) the dawn of the information age; (ii) the beginning of the industrial revolution; (iii) the start of civilisation; (iv) the origin of *Homo sapiens*; (v) the origin of the genus *Homo*; (vi) the origin of mammals, etc.
Of course, the uniform prior faces a similar issue, but I think it handles it gracefully. E.g. on priors, I should think it’s 1 in 5 million that I’m the funniest person in Scotland; 1 in 65 million that I’m the funniest person in Britain; and 1 in 7.5 billion that I’m the funniest person in the world. Similarly for whether I’m the most influential person of the post-industrial era, the post-agricultural era, etc.
Third, the Laplacean prior doesn’t add up to 1 across all people. For example, suppose you’re the first person and you know that there will be 3 people. Then, on the Laplacean prior, the total probability for being the most influential person ever is ½ + ½(⅓) + ½(⅔)(¼) = ¾. But I know that someone has to be the most influential person ever. This suggests the Laplacean prior is the wrong prior choice for the proposition I’m considering, whereas the simple frequency approach gets it right.
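To spell that sum out (reading the Laplacean prior as a first-occurrence process, with the chance of occurrence at each step given by the rule of succession):

```python
from fractions import Fraction

# With 3 people total: P(person 1 is it) = 1/2; P(person 2 is it) =
# P(not person 1) * 1/3; P(person 3 is it) = P(not 1) * P(not 2) * 1/4.
p1 = Fraction(1, 2)
p2 = Fraction(1, 2) * Fraction(1, 3)
p3 = Fraction(1, 2) * Fraction(2, 3) * Fraction(1, 4)

print(p1 + p2 + p3)  # 3/4 -- the missing 1/4 is mass on "it never
                     # occurs", which is incoherent when someone must be
                     # the most influential. Uniform (1/3 each) sums to 1.
```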
So even if one feels skeptical of the uniform prior, I think the Laplacean way of prior-setting isn’t a better alternative. In general: I’m sympathetic to having a model where early people are more likely to be more influential, but a model which is uniform over orders of magnitude seems too extreme to me.
(As a final thought: Doesn’t this form of prior-setting also suffer from the problem of there being too many hypotheses? E.g. consider the propositions:
A—There will be a value lock-in event this century
B—There will be a lock-in of hedonistic utilitarian values this century
C—There will be a lock-in of preference utilitarian values this century
D—There will be a lock-in of Kantian values this century
E—There will be a lock-in of fascist values this century
On the Laplacean approach, these would all get the same probability assignment, which seems inconsistent. And then just by stacking priors over particular lock-in events, we can get the result that it’s overwhelmingly likely that there’s some lock-in event this century. I’ve put this comment in parentheses, though, as I feel *even less* confident about this worry than about the others listed above.)
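To put a toy number on the stacking worry (my own illustration: it treats A–E as independent, which they can’t strictly be, since B–E each entail A, but it shows the direction of the problem):

```python
# If each of the five lock-in propositions independently got the
# Laplacean first-period probability of 1/2, the chance of *some*
# lock-in this century would already be:
p_some_lockin = 1 - (1 - 1 / 2) ** 5
print(p_some_lockin)  # 0.96875 -- and it grows further with every
                      # extra way of carving up "a lock-in event"
```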
Thanks for this very thorough reply. There are so many strands here that I can’t really hope to do justice to them all, but I’ll make a few observations.
1) There are two versions of my argument. The weak/vague one is that a uniform prior is wrong and the real prior should decay over time, such that you can’t make your extreme claim from priors. The strong/precise one is that it should decay as 1/n^2, in line with a version of LLS (the Laplacean law of succession); see the first sketch after point 4. The latter is more meant as an illustration. It is my go-to default for things like this, but my main point here is the weaker one. It seems that you agree that it should decay, and that the main question now is whether it does so fast enough to make your prior-based points moot. I’m not quite sure how to resolve that. But I note that from this position, we can’t reach your argument that from priors this is way too unlikely for our evidence to overturn (and we also can’t reach my statement of the opposite of that).
2) I wouldn’t use the LLS prior for arbitrary superlative properties where you fix the total population. I’d use it only if the population over time was radically unknown (so that the first person is much more likely to be strongest than the thousandth, because there probably won’t be a thousand) or where there is a strong time dependency such that it happening at one time rules out later times.
3) You are right that I am appealing to some structural properties beyond mere superlatives, such as extinction or other permanent lock-in. This is because these things happening in a century would be sufficient for that century to have a decent chance of being the most influential (technically this still depends on the influenceability of the event, but I think most people would grant that, conditional on next century being the end of humanity, it is no longer surprising at all if this or next century were the most influential). So I think that your prior-setting approach proves too much, telling us that there is almost no chance of extinction or permanent lock-in next century (even after updating on evidence). This feels fishy, a bit like Bostrom’s ‘presumptuous philosopher’ example. I think it looks even more fishy in your worked example, where the prior is low precisely because of an assumption about how long we will last without extinction: especially as that assumption is compatible with, say, a 50% chance of extinction in the next century. (I don’t think this is a knockdown blow, but I’m trying to indicate the part of your argument I think would be most likely to fall, and roughly why.)
4) I agree there is an issue to do with too many hypotheses, and a related issue of what the first timescale is on which to apply a 1/2 chance of the event occurring. I think these can be dealt with together. You modify the raw LLS prior by some other kind of prior you have for each particular type of event (which you need to have, since some are sub-events of others and rationality requires you to assign lower probability to them). You could operationalise this by asking over what time frame you’d expect a 1/2 chance of that event occurring. Then LLS isn’t acting as an indifference principle, but rather just as a way of keeping track of how to update your ur-prior in light of how many time periods have elapsed without the event occurring. I think this should work out somewhat similarly, just with a stretched PDF that still decays as 1/n^2, but am not sure; see the second sketch below. There may be a literature on this.
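Two sketches to go with the above. First, on the 1/n^2 shape from point 1, reading LLS as a first-occurrence distribution:

```python
from fractions import Fraction

# Under LLS, P(no occurrence in the first n periods) = 1/(n+1), so
# P(first occurrence exactly at period n) = 1/n - 1/(n+1)
#                                         = 1/(n*(n+1)), roughly 1/n^2.
def lls_first_occurrence(n):
    return Fraction(1, n * (n + 1))

for n in [1, 2, 3, 10, 100]:
    print(n, lls_first_occurrence(n))  # 1/2, 1/6, 1/12, 1/110, 1/10100

# And the whole distribution sums to 1 in the limit (telescoping sum):
print(sum(lls_first_occurrence(n) for n in range(1, 10_001)))  # 10000/10001
```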
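Second, one concrete way the stretching in point 4 could work (a sketch under my own assumption of a Beta(1, s) prior on the per-period chance; I’m not claiming this is the only or the standard choice):

```python
from fractions import Fraction

# Beta(1, s) prior on the per-period chance of the event, where s is the
# number of periods within which you'd expect a 1/2 chance of occurrence.
# Then P(no occurrence in first n periods) = s / (s + n), and
#      P(first occurrence at period n)     = s / ((s + n - 1) * (s + n)).
# For s = 1 this is plain LLS, 1/(n*(n+1)); larger s stretches the same
# curve, and the tail still decays as 1/n^2.
def stretched_lls(n, s):
    return Fraction(s, (s + n - 1) * (s + n))

# Check the half-life property: a 1/2 chance within the first s periods.
for s in [1, 10, 100]:
    print(s, sum(stretched_lls(n, s) for n in range(1, s + 1)))  # 1/2 each
```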
I appreciate your explicitly laying out issues with the Laplace prior! I found this helpful.
The approach to picking a prior here which I feel least uneasy about is something like: “take a simplicity-weighted average over different generating processes for distributions of hinginess over time”. This gives a mixture with some weight on uniform (very simple), some weight on monotonically-increasing and monotonically-decreasing functions (also quite simple), some weight on single-peaked and single-troughed functions (disproportionately with the peak or trough close to one end), and so on…
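As a toy illustration of the kind of mixture I have in mind (the particular shapes and weights below are placeholders of mine, not considered choices, and each generating process is collapsed into a distribution over which century is most hingey):

```python
import numpy as np

N = 1000                              # centuries under consideration
t = np.arange(1, N + 1, dtype=float)

def normalise(x):
    return x / x.sum()

# (weight, distribution over the location of the most hingey century)
components = [
    (0.4, normalise(np.ones(N))),              # uniform
    (0.2, normalise(t)),                       # monotonically increasing
    (0.2, normalise(1.0 / t)),                 # monotonically decreasing
    (0.1, normalise(np.exp(-t / 50.0))),       # peak near the start
    (0.1, normalise(np.exp((t - N) / 50.0))),  # peak near the end
]

mixture = sum(w * p for w, p in components)
print(mixture[:10].sum())  # prior mass on the first 10 of 1000 centuries
```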
If we assume a big future and you just told me the number of people in each generation, I think my prior might be something like 20% that the most hingey moment was in the past, 1% that it was in the next 10 centuries, and the rest after that. After I notice that hinginess is about influence, and that causality gives a time asymmetry favouring early times, I think I might update to >50% that it was in the past, and 2% that it would be in the next 10 centuries.
(I might start with some similar prior about when the strongest person lives; but once I begin to understand something about strength, the generating mechanisms which suggest that the strongest people would come early, with everything diminishing thereafter, seem very implausible, so I would update down a lot on that.)
I’m sympathetic to the mixture-of-simple-priors approach, and I value simplicity a great deal. However, I don’t think that the uniform prior up to an arbitrary end point is the simplest, as your comment appears to suggest. E.g. I don’t see how it is simpler than an exponential distribution with an arbitrary mean (which is the maximum-entropy prior over R+ conditional on a finite mean). I’m not sure whether there is a maximum-entropy prior over R+ without the finite-mean assumption, but 1/x^2 looks right to me for that.
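For reference, the standard maximum-entropy fact being appealed to here: over R+ with a fixed (but arbitrary) mean $\mu$, entropy is maximised by the exponential density,

$$\arg\max_{p}\left\{-\int_0^\infty p(x)\ln p(x)\,dx \;\middle|\; \int_0^\infty p(x)\,dx = 1,\ \int_0^\infty x\,p(x)\,dx = \mu\right\} = \frac{1}{\mu}\,e^{-x/\mu}.$$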
Also, re having a distribution that increases over a fixed time interval, giving a peak at the end: I agree that this kind of thing is simple, but note that since we are actually very uncertain about when that interval ends, that peak gets very smeared out. Enough so that I don’t think there is a peak at the end at all when the distribution is denominated in years (rather than centiles through human history or something). That said, it could turn into a peak in the middle, depending on the nature of one’s distribution over durations.
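A quick toy simulation of the smearing (my own construction: hinginess rising linearly to a peak at an uncertain end date T, with a log-uniform distribution over T; the specific range is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Duration T of the interval: log-uniform between 1,000 and 1,000,000
# years (placeholder range). Within each sampled world, the "most hingey
# year" has the increasing-to-the-end density 2t/T^2 on [0, T], whose
# inverse CDF is T * sqrt(u).
T = np.exp(rng.uniform(np.log(1e3), np.log(1e6), size=n))
peak_year = T * np.sqrt(rng.uniform(size=n))

# Denominated in years, the marginal density of the most hingey year has
# no peak at the end: the per-world end peaks get smeared out, leaving
# an interior mode near the shortest plausible duration.
hist, edges = np.histogram(peak_year, bins=np.geomspace(1.0, 1e6, 60))
density = hist / np.diff(edges)
print(edges[np.argmax(density)])  # ~1e3: a peak "in the middle"
```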