How much do you worry that MIRI’s default non-disclosure policy is going to hinder MIRI’s ability to do good research, because it won’t be able to get as much external criticism?
Suppose you find out that Buck-in-2040 thinks that the work you’re currently doing is a big mistake (in a way that should have been clear to you now). What are your best guesses about what his reasons are?
What’s the biggest misconception people have about current technical AI alignment work? What’s the biggest misconception people have about MIRI?
Thanks Greg—I really enjoyed this post.
I don’t think that this is what you’re saying, but I think if someone drew the lesson from your post that, when reality is underpowered, there’s no point in doing research into the question, that would be a mistake.
When I look at tiny-n sample sizes for important questions (e.g.: “How have new ideas made major changes to the focus of academic economics?” or “Why have social movements collapsed in the past?”), I generally don’t feel at all like I’m trying to get a p < 0.05; it feels more like hypothesis generation. So when I find out that Kahneman and Tversky spent 5 years honing the article Prospect Theory into a form that could be published in an economics journal, I think “wow, ok, maybe that’s the sort of time investment that we should be thinking of”. Or when I see social movements collapse because of in-fighting (e.g. pre-Copenhagen UK climate movement), or romantic disputes between leaders (e.g. Objectivism), then—insofar as we just want to take all the easy wins to mitigate catastrophic risks to the EA community—I know that this risk is something to think about and focus on for EA.
For these sorts of areas, the right approach seems to be granular qualitative research—trying to really understand in depth what happened in some other circumstance, and then thinking through what lessons it entails for the circumstance you’re interested in. I think that, as a matter of fact, EA does this quite a lot when relevant. (E.g. Grace on Szilard, or existing EA discussion of previous social movements). So I think this gives us extra reason to push against the idea that “EA-style analysis” = “quant-y RCT-esque analysis” rather than “whatever research methods are most appropriate to the field at hand”. But even on qualitative research I think the “EA mindset” can be quite distinctive—certainly I think, for example, that a Bayesian-heavy approach to historical questions, often addressing counterfactual questions, and looking at those issues that are most interesting from an EA perspective (e.g. how modern-day values would be different if Christianity had never taken off), would be really quite different from almost all existing historical research.
Sorry - ‘or otherwise lost’ qualifier was meant to be a catch-all for any way of the investment losing its value, including (bad) value-drift.
I think there’s a decent case for (some) EAs doing better at avoiding this than e.g. typical foundations:
If you have precise values (e.g. classical utilitarianism) then it’s easier to transmit those values across time—you can write your values down clearly as part of the constitution of the foundation, and it’s easier to find and identify younger people to take over the fund who also endorse those values. In contrast, for other foundations, the ultimate aims of the foundation are often not clear, and too dependent on a particular empirical situation (e.g. Benjamin Franklin’s funds were to ‘to provide loans for apprentices to start their businesses’ (!!)).
If you take a lot of time carefully choosing who your successors are (and those people take a lot of time over who their successors are).
Then to reduce appropriation, one could spread the funds across many different countries and different people who share your values. (Again, easier if you endorse a set of values that are legible and non-idiosyncratic.)
It might still be true that the chance of the fund becoming valueless gets large over time (if, e.g. there’s a 1% risk of it losing its value per year), but the size of the resources available also increases exponentially over time in those worlds where it doesn’t lose its value.
A caveat: there are also tricky questions about when ‘value drift’ is a bad thing rather than the future fund owners just having a better understanding of the right thing to do than the founders did, which often seems to be true for long-lasting foundations.
I think you might be misunderstanding what I was referring to. An example of what I mean: Suppose Jane is deciding whether to work for Deepmind on the AI safety team. She’s unsure whether this speeds up or slows down AI development; her credence is imprecise, represented by the interval [0.4, 0.6]. She’s confident, let’s say, that speeding up AI development is bad. Because there’s some precisification of her credences on which taking the job is good, and some on which taking the job is bad, then if she uses a Liberal decision rule (= it is permissible for you to perform any action that is permissible according to at least one of the credence functions in your set), it’s permissible for her to take the job or not take the job.
The issue is that, if you have imprecise credences and a Liberal decision rule, and are a longtermist, then almost all serious contenders for actions are permissible.
So the neartermist would need to have some way of saying (i) we can carve out the definitely-good part of the action, which is better than not-doing the action on all precisifications of the credence; (ii) we can ignore the other parts of the action (e.g. the flow-through effects) that are good on some precisifications and bad on some precisifications. It seems hard to make that theoretically justified, but I think it matches how people actually think, so at least has some common-sense motivation.
But you could do it if you could argue for a pseudodominance principle that says: “If there’s some interval of time t_i over which action x does more expected good than action y on all precisifications of one’s credence function, and there’s no interval of time t_j at which action y does more expected good than action x on all precisifications of one’s credence function, then you should choose x over y”.
(In contrast, it seems you thought I was referring to AI vs some other putative great longtermist intervention. I agree that plausible longtermist rivals to AI and bio are thin on the ground.)
Yeah, I think I messed up this bit. I should have used the harmonic mean rather than the arithmetic mean when averaging over possibilities of how many people will be in the future. Doing this brings the chance of being among the most influential person ever close to the chance of being the most influential person ever in a small-population universe. But then we get the issue that being the most influential person ever in a small-population universe is much less important than being the most influential person in a big-population universe. And it’s only the latter that we care about.
So what I really should have said (in my too-glib argument) is: for simplicity, just assume a high-population future, since those are the action-relevant futures if you’re a longtermist. Then take a uniform prior over all times (or all people) in that high-population future. So my claim is: “In the action-relevant worlds, the frequency of ‘most important time’ (or ‘most important person’) is extremely low, and so should be our prior.”
Thanks for these links. I’m not sure if your comment was meant to be a criticism of the argument, though? If so: I’m saying “prior is low, and there is a healthy false positive rate, so don’t have high posterior.” You’re pointing out that there’s a healthy false negative rate too — but that won’t cause me to have a high posterior?
And, if you think that every generation is increasing in influentialness, that’s a good argument for thinking that future generations will be more influential and we should therefore save.
There were a couple of recurring questions, so I’ve addressed them here.
What’s the point of this discussion — isn’t passing on resources to the future too hard to be worth considering? Won’t the money be stolen, or used by people with worse values?
In brief: Yes, losing what you’ve invested is a risk, but (at least for relatively small donors) it’s outweighed by investment returns.
Longer: The concept of ‘influentialness of a time’ is the same as the cost-effectiveness (from a longtermist perspective) of the best opportunities accessible to longtermists at a time. Suppose I think that the best opportunities in, say, 100 years are as good as the best opportunities now. Then, if I have a small amount of money, I can get (say) at least a 2% return per year on those funds. But I shouldn’t think that the chance of my funds being appropriated (or otherwise lost) is as high as 2% per year. So the expected amount of good I do is greater by saving.
So if you think that hingeyness (as I’ve defined it) is about the same in 100 years as it is now, or greater, then there’s a strong case for investing for 100 years before spending the money.
(Caveat that once we consider larger amounts of money, diminishing returns for expenditure becomes an issue, and chance of appropriation increases.)
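As a rough sketch of the arithmetic here (the 2% return is from the argument above; the 1% annual loss rate and 100-year horizon are illustrative assumptions):

```python
# Expected value of saving for 100 years, with a 2% annual return and a
# 1% annual chance of the funds being appropriated or otherwise lost.
years = 100
annual_return = 0.02
annual_loss_risk = 0.01

p_survives = (1 - annual_loss_risk) ** years       # ~0.37
value_if_survives = (1 + annual_return) ** years   # ~7.2x

expected_multiple = p_survives * value_if_survives
print(expected_multiple)  # ~2.65
```

As long as the return rate exceeds the loss rate, the expected multiple is greater than 1, so (holding hingeyness constant) saving beats spending now in expectation.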
What’s your view on anthropics? Isn’t that relevant here?
I’ve been trying to make claims that aren’t sensitive to tricky issues in anthropic reasoning. The claim that, if there are n people ordered in terms of some relation F (like ‘more important than’), the prior probability that you are the most-F (‘most important’) person is 1/n doesn’t distinguish between anthropic principles, because I’ve already conditioned on the number of people in the world. So I think anthropic principles aren’t directly relevant for the argument I’ve made, though obviously they are relevant more generally.
I don’t think I agree with this, unless one is able to make a comparative claim about the importance (from a longtermist perspective) of these events relative to future events’ importance—which is exactly what I’m questioning.
I do think that weighting earlier generations more heavily is correct, though; I don’t feel that much turns on whether one construes this as prior choice or an update from one’s prior.
Given this, if one had a hyperprior over different possible Beta distributions, shouldn’t 2000 centuries of no event occurring cause one to update quite hard against the (0.5, 0.5) or (1, 1) hyperparameters, and in favour of a prior that was massively skewed towards the per-century probability of no-lock-in-event being very low?
(And noting that, depending exactly on how the proposition is specified, I think we can be very confident that it hasn’t happened yet. E.g. if the proposition under consideration was ‘a values lock-in event occurs such that everyone after this point has the same values’.)
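A sketch of the update in question, using the conjugate Beta form (Beta(1,1) is the uniform/Laplacean prior; Beta(0.5,0.5) is Jeffreys; the 2000-century figure is from the argument above):

```python
# Posterior per-century chance of a lock-in event, after observing ~2000
# centuries of civilisation with no such event, under two standard priors.
def posterior_mean(alpha, beta, event_free_centuries):
    # Beta(alpha, beta) prior updated on that many "no event" observations
    return alpha / (alpha + beta + event_free_centuries)

centuries = 2000
laplace = posterior_mean(1.0, 1.0, centuries)   # Beta(1,1): ~0.0005 per century
jeffreys = posterior_mean(0.5, 0.5, centuries)  # Beta(0.5,0.5): ~0.00025 per century
```

Even before adding a hyperprior over Beta distributions, the long event-free run already drives the posterior per-century probability very low; a hyperprior would let the data push further still towards the low-probability hypotheses.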
Thanks so much for this very clear response, it was a very satisfying read, and there’s a lot for me to chew on. And thanks for locating the point of disagreement — prior to this post, I would have guessed that the biggest difference between me and some others was on the weight placed on the arguments for the Time of Perils and Value Lock-In views, rather than on the choice of prior. But it seems that that’s not true, and that’s very helpful to know. If so, it suggests (advertisement to the Forum!) that further work on prior-setting in EA contexts is very high-value.
I agree with you that under uncertainty over how to set the prior, because we’re clearly so distinctive in some particular ways (namely, that we’re so early on in civilisation, that the current population is so small, etc), my choice of prior will get washed out by models on which those distinctive features are important; I characterised these as outside-view arguments, but I’d understand if someone wanted to characterise that as prior-setting instead.
I also agree that there’s a strong case for making the prior over persons (or person-years) rather than centuries. In your discussion, you go via number of persons (or person-years) per century to the comparative importance of centuries. What I’d be inclined to do is just change the claim under consideration to: “I am among the (say) 100,000 most influential people ever”. This means we still take into account the fact that, though more populous centuries are more likely to be influential, they are also harder to influence in virtue of their larger population. If we frame the core claim in terms of being among the most influential people, rather than being at the most influential time, the core claim seems even more striking to me. (E.g. a uniform prior over the first 100 billion people would give a prior of 1 in 1 million of being in the 100,000 most influential people ever. Though of course, there would also be an extra outside-view argument for moving from this prior, which is that not many people are trying to influence the long-run future.)
However, I don’t currently feel attracted to your way of setting up the prior. In what follows I’ll just focus on the case of a values lock-in event, and for simplicity I’ll just use the standard Laplacean prior rather than your suggestion of a Jeffreys prior.
In significant part my lack of attraction is because the claims — that (i) there’s a point in time where almost everything about the fate of the universe gets decided; (ii) that point is basically now; (iii) almost no-one sees this apart from us (where ‘us’ is a very small fraction of the world) — seem extraordinary to me, and I feel I need extraordinary evidence in order to have high credence in them. My prior-setting discussion was one way of cashing out why these seem extraordinary. If there’s some way of setting priors such that claims (i)-(iii) aren’t so extraordinary after all, I feel like a rabbit is being pulled out of a hat.
Then I have some specific worries with the Laplacean approach (which I *think* would apply to the Jeffreys prior, too, but I’m yet to figure out what a Fisher information matrix is, so I don’t totally back myself here).
But before I mention the worries, I’ll note that it seems to me that you and I are currently talking about priors over different propositions. You seem to be considering the propositions, ‘there is a lock-in event this century’ or ‘there is an extinction event this century’; I’m considering the proposition ‘I am at the most influential time ever’ or ‘I am one of the most influential people ever.’ As is well-known, when it comes to using principle-of-indifference-esque reasoning, if you use that reasoning over a number of different propositions then you can end up with inconsistent probability assignments. So, at best, one should use such reasoning in a very restricted way.
The reason I like thinking about my proposition (‘are we at the most important time?’ or ‘are we one of the most influential people ever?’) for the restricted principle of indifference, is that:
(i) I know the frequency of occurrence of ‘most influential person’, for each possible total population of civilization (past, present and future). Namely, it occurs once out of the total population. So I can look at each possible population size for the future, look at my credence in each possible population occurring, and in each case know the frequency of being the most influential person (or, more naturally, in the 100,000 most influential people).
(ii) it’s the most relevant proposition for the question of what I should do. (e.g. Perhaps it’s likely that there’s a lock-in event, but we can’t do anything about it and future people could, so we should save for a later date.)
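Point (i) can be sketched as a mixture over possible future population sizes (the scenario populations and credences below are invented purely for illustration):

```python
# Prior on "I am among the 100,000 most influential people ever", computed by
# mixing the frequency 100,000/N over possible total populations N.
scenarios = {1e11: 0.5, 1e13: 0.3, 1e15: 0.2}  # total population -> credence
k = 1e5  # size of the "most influential" reference class

prior = sum(credence * min(1.0, k / n) for n, credence in scenarios.items())
print(prior)  # ~5e-7
```

As with the harmonic-mean point, the result is dominated by the smallest-population scenario in the mixture.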
Anyway, the worries about Laplacean (and Jeffreys) prior.
First, the Laplacean prior seems to get the wrong answer for lots of similar predicates. Consider the claims: “I am the most beautiful person ever” or “I am the strongest person ever”, rather than “I am the most important person ever”. If we used the Laplacean prior in the way you suggest for these claims, the first person would assign 50% credence to being the strongest person ever, even if they knew that there was probably going to be billions of people to come. This doesn’t seem right to me.
Second, it also seems very sensitive to our choice of start date. If the proposition under question is, ‘there will be a lock-in event this century’, I’d get a very different prior depending on whether I chose to begin counting from: (i) the dawn of the information age; (ii) the beginning of the industrial revolution; (iii) the start of civilisation; (iv) the origin of homo sapiens; (v) the origin of the genus homo; (vi) the origin of mammals, etc.
Of course, the uniform prior has something similar, but I think it handles the issue gracefully. e.g. On priors, I should think it’s 1 in 5 million likely that I’m the funniest person in Scotland; 1 in 65 million that I’m the funniest person in Britain, 1 in 7.5 billion that I’m the funniest person in the world. Similarly, with whether I’m the most influential person in the post-industrial era, the post-agricultural era, etc.
Third, the Laplacean prior doesn’t add up to 1 across all people. For example, suppose you’re the first person and you know that there will be 3 people. Then, on the Laplacean prior, the total probability for being the most influential person ever is ½ + ½(⅓) + ½(⅔)(¼) = ¾. But I know that someone has to be the most influential person ever. This suggests the Laplacean prior is the wrong prior choice for the proposition I’m considering, whereas the simple frequency approach gets it right.
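The ¾ figure can be checked directly; this sketch applies the Laplacean rule person-by-person (person k, after k−1 non-occurrences, assigns probability 1/(k+1) to being the most influential):

```python
# Total probability that *someone* among n people is "the most influential ever",
# if each person in turn applies the Laplacean prior to themselves.
def laplace_total(n):
    total = 0.0
    p_no_one_yet = 1.0
    for k in range(1, n + 1):
        p_k = 1.0 / (k + 1)          # Laplacean probability for person k
        total += p_no_one_yet * p_k  # ...weighted by no earlier person qualifying
        p_no_one_yet *= 1.0 - p_k
    return total

print(laplace_total(3))  # ~0.75, i.e. the ½ + ½(⅓) + ½(⅔)(¼) = ¾ above
```

The total comes to ¾ rather than 1, whereas the simple frequency prior (1/n each, for known n) sums to exactly 1 by construction.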
So even if one feels skeptical of the uniform prior, I think the Laplacean way of prior-setting isn’t a better alternative. In general: I’m sympathetic to having a model where early people are more likely to be more influential, but a model which is uniform over orders of magnitude seems too extreme to me.
(As a final thought: Doesn’t this form of prior-setting also suffer from the problem of there being too many hypotheses? E.g. consider the propositions:
A—There will be a value lock-in event this century
B—There will be a lock-in of hedonistic utilitarian values this century
C—There will be a lock-in of preference utilitarian values this century
D—There will be a lock-in of Kantian values this century
E—There will be a lock-in of fascist values this century
On the Laplacean approach, these would all get the same probability assignment—which seems inconsistent. And then just by stacking priors over particular lock-in events, we can get a probability that it’s overwhelmingly likely that there’s some lock-in event this century. I’ve put this comment in parentheses, though, as I feel *even less* confident about my worry here than my other worries listed.)
The way I’d think about it is that we should be uncertain about how justifiably confident people can be that they’re at the HoH. If our current credence in HoH is low, then the chance that it might be justifiably much higher in the future should be the significant consideration. At least if we put aside simulation worries, I can imagine evidence which would lead me to have high confidence that I’m at the HoH.
E.g., the prior is (say) 1/million this decade, but if the evidence suggests it is 1%, perhaps we should drop everything to work on it, if we don’t expect our credence to be this high again for another millennium.
I think if that were one’s credences, what you say makes sense. But it seems hard for me to imagine a (realistic) situation where I think that there’s a 1% chance of HoH this decade, but I’m confident that the chance will be much, much lower than that for all of the next 99 decades.
For what it’s worth, my intuition is that pursuing a mixed strategy is best; some people aiming for impact now, in case now is a hinge, and some people aiming for impact in many many years, at some future hinge moment.
So I would say both the population and pre-emption (by earlier stabilization) factors intensely favor earlier eras in per resource hingeyness, constrained by the era having any significant lock-in opportunities and the presence of longtermists.
I think this is a really important comment; I see I didn’t put these considerations into the outside-view arguments, but I should have done, as they make for powerful arguments.
The factors you mention are analogous to the parameters that go into the Ramsey model for discounting: (i) a pure rate of time preference, which can account for risk of pre-emption; (ii) a term to account for there being more (and, presumably, richer) future agents and some sort of diminishing returns as a function of how many future agents (or total resources) there are. Then given uncertainty about these parameters, in the long run the scenarios that dominate the EV calculation are where there’s been no pre-emption and the future population is not that high. e.g. There’s been some great societal catastrophe and we’re rebuilding civilization from just a few million people. If we think the inverse relationship between population size and hingeyness is very strong, then maybe we should be saving for such a possible scenario; that’s the hinge moment.
For the later scenarios here you’re dealing with much larger populations. If the plausibility of important lock-in is similar for solar colonization and intergalactic colonization eras, but the population of the latter is billions of times greater, it doesn’t seem to be at all an option that it could be the most HoH period on a per resource unit basis.
I agree that other things being equal a time with a smaller population (or: smaller total resources) seems likelier to be a more influential time. But ‘doesn’t seem to be at all an option’ seems overstated to me.
Simple case: consider a world where there just aren’t options to influence the very long-run future. (Agents can make short-run perturbations but can’t affect long-run trajectories; some sort of historical determinism is true). Then the most influential time is just when we have the best knowledge of how to turn resources into short-run utility, which is presumably far in the future.
Or, more importantly, where hingeyness is essentially 0 up until a certain point far in the future. If our ability to positively influence the very long-run future were no better than a dart-throwing chimp’s until we’ve got computers the size of solar systems, then the most influential times would also involve very high populations.
More generally, per-resource hingeyness increases with:
Availability of pivotal moments one can influence, and their pivotality
Knowledge / understanding of how to positively influence the long-run future
And hingeyness decreases with:
Level of expenditure on long-term influence
Chance of being pre-empted already
If knowledge or availability of pivotal moments at a time is 0, then hingeyness at the time is 0, and lower populations can’t outweigh that.
I think this overstates the case. Diminishing returns to expenditures in a particular time favor a nonzero disbursement rate (e.g. with logarithmic returns to spending at a given time, 10x HoH levels would drive a 10x expenditure for a given period)
Sorry, I wasn’t meaning we should be entirely punting to the future, and in case it’s not clear from my post my actual all-things-considered view is that longtermist EAs should be endorsing a mixed strategy of some significant proportion of effort spent on near-term longtermist activities and some proportion of effort spent on long-term longtermist activities.
I do agree that, at the moment, EA is mainly investing (e.g. because of Open Phil and because of human capital and because much actual expenditure is field-building-y, as you say). But it seems like at the moment that’s primarily because of management constraints and weirdness of borrowing-to-give (etc), rather than a principled plan to spread giving out over some (possibly very long) time period. Certainly the vibe in the air is ‘expenditure (of money or labour) now is super important, we should really be focusing on that’.
(I also don’t think that diminishing returns is entirely true: there are fixed costs and economies of scale when trying to do most things in the world, so I expect s-curves in general. If so, that would favour a lumpier disbursement schedule.)
I would note that the creation of numerous simulations of HoH-type periods doesn’t reduce the total impact of the actual HoH folk
Agree that it might well be that even though one has a very low credence in HoH, one should still act in the same way. (e.g. because if one is not at HoH, one is a sim, and your actions don’t have much impact).
The sim-arg could still cause you to change your actions, though. It’s somewhat plausible to me, for example, that the chance of being a sim if you’re at the very most momentous time is 1000x higher than the chance of being a sim if you’re at the 20th most hingey time, but the most hingey time is not 1000x more hingey than the 20th most hingey time. In which case the hypothesis that you’re at the 20th most hingey time has a greater relative importance than it had before.
I agree we are learning more about how to effectively exert resources to affect the future, but if your definition is concerned with the effect of a marginal increment of resources (rather than the total capacity of an era), then you need to wrestle with the issue of diminishing returns.
I agree with this, though if we’re unsure about how many resources will be put towards longtermist causes in the future, then the expected value of saving will come to be dominated by the scenario where very few resources are devoted to it. (As happens in the Ramsey model for discounting if one includes uncertainty over future growth rates and the possibility of catastrophe.) This consideration gets stronger if one thinks the diminishing marginal returns curve is very steep.
E.g. perhaps in 150 years’ time, EA and Open Phil and longtermist concern will be dust; in which case those who saved for the future (and ensured that there would be at least some sufficiently likeminded people to pass their resources onto) will have an outsized return. And perhaps returns diminish really steeply, so that what matters is guaranteeing that there are at least some longtermists around. If the outsized return in this scenario is large enough, then even a low probability of this scenario might be the dominant consideration.
Founding fields like AI safety or population ethics is much better on a per capita basis than expanding them by 1% after they have developed more.
Strongly agree, though by induction it seems we should think there will be more such fields in the future.
The longtermist of 1600 would indeed have mostly ‘invested’ in building a movement and eventually in things like financial assets when movement-building returns fell below financial returns, but they also should have made concrete interventions like causing the leveraged growth of institutions like science and the Enlightenment that looked to have a fair chance of contributing to HoH scenarios over the coming centuries, and those could have paid off.
You might think the counterfactual is unfair here, but I wouldn’t regard it as accessible to someone in 1600 to know that they could make contributions to science and the Enlightenment as a good way of influencing the long-run future.
This is analogous to the general point in financial markets that assets classes with systematically high returns only have them before those returns are widely agreed on to be valuable and accessible...
A world in which everyone has shared correct values and strong knowledge of how to improve things is one in which marginal longtermist resources are gilding the lily.
Though if we’re really clueless right now (perhaps not much better than the person in 1600) then perhaps that’s the best we can do.
And it would seem that the really high-value scenario is where (i) knowledge is very high but (ii) concern for the very long-run future is very low (but not nonexistent, allowing for resources to be passed onto those times.)
In terms of the financial analogy, that would be like how someone with strange preferences, who gets extraordinary utility from eating bread and potatoes, gets a much higher return (when measured in utility gained) from a regular salary than other people would.
And in general I’m more inclined to believe stories of us having extraordinary impact if that primarily results from a difference in what we care about compared with others, rather than from having greater insight.
I will say, though: the argument “we’re at an unusual period where longtermist (/impartial consequentialish) concern is very low but not nonexistent” as a reason for now being a particularly influential time seems pretty good to me, and wasn’t one that I included in my list of arguments in favour of HoH.
To talk about what they would have been one needs to consider a counterfactual in which we anachronistically introduce at least some minimal version of longtermist altruism, and what one includes in that intervention will affect the result one extracts from the exercise.
I agree there’s a tricky issue of how exactly one constructs the counterfactual. The definition I’m using is trying to get it as close as possible to a counterfactual we really face: how much to spend now vs how much to pass resources onto future altruists. I’d be interested if others thought of very different approaches. It’s possible that I’m trying to pack too much into the concept of ‘most influential’, or that this concept should be kept separate from the idea of moving resources around to different times.
I feel that involving the anachronistic insertion of a longtermist altruist into the past, if anything, makes my argument harder to make, though. If I can’t guarantee that the past person I’m giving resources to would even be a longtermist, that makes me less inclined to give them resources. And if I include the possibility that longtermism might be wrong and that the future-person that I pass resources onto will recognise this, that’s (at least some) argument to me in favour of passing on resources. (Caveat subjectivist meta-ethics, possibility of future people’s morality going wayward, etc.)
I would dispute this. Possibilities of AGI and global disaster were discussed by pioneers like Turing, von Neumann, Good, Minsky and others from the founding of the field of AI.
Thanks, I’ve updated on this since writing the post and think my original claim was at least too strong, and probably just wrong. I don’t currently have a good sense of, say, if I were living in the 1950s, how likely I would be to figure out AI as the thing, rather than focus on something else that turned out not to be as important (e.g. the focus on nanotech by the Foresight Institute (a group of idealistic futurists) in the late 80s could be a relevant example).