I am a Research Fellow at Forethought. Before that I was an analyst at Longview Philanthropy, a Research Scholar at FHI, and assistant to Toby Ord. Philosophy at Cambridge before that.
I also do a podcast about EA called Hear This Idea.
Thanks James!
What advice would you have for someone who wanted to work there as a researcher?
Some things I appreciate in my colleagues: having some discernment for which questions or ideas are most important, rather than just conceptually interesting but not urgent; being able to contribute to group conversations by driving at cruxes, asking naive questions, avoiding the impulse to sound clever for its own sake, and spotting and entertaining “big if true” hypotheses; and being able to clearly communicate ideas in areas where you often don’t have an especially deep literature to draw on.
Do you think it’s important to have a strong understanding of how LLMs work?
I think it’s important to understand the fundamentals of how AI works, including some of the theory. I don’t think it’s important to have deep technical knowledge of LLMs, unless you can see why those details could end up being relevant for macrostrategy.
On your second question, many of those points seem good to me. I’ll single out “Locking-in one’s values” since I’ve been thinking about it recently. It seems to me that some people roughly think that great futures are futures which resemble our own (or which carry on our values) in many particular ways. In particular, maybe great futures are futures which are recognisably human in their values. Inhuman futures, like futures where AI successors call the shots, might just seem empty of what we today care about; even if they involve a lot of moral reflection and nothing morally offensive from a human perspective. We could call this a “humanity forever” view.
On the other hand, some people roughly think that great futures are necessarily futures which are radically different from humanity today, including in the values which guide it, and perhaps the kind of actors living there. See Dan Faggella on the “Worthy Successor” idea (and here), which I see as one version of this view.
Both these views care about preventing obvious catastrophes from AGI, but it seems to me like they might end up disagreeing quite profoundly on what should come next. It’s possible that there is opportunity for trade and compromise between the two views, but in any case this strikes me as a potentially important difference in approach to post-AGI futures.
To me, the ideas presented in the series (strong self-modification, modification of descendants, selection of beliefs by evolutionary pressures) indicate that we should expect future humans to be very different from us, and that, as a result, we should expect the future to be neutral in expectation.
Firstly, you’re right that the series doesn’t discuss negative futures, but I should say that’s not because Will or I think they are worth ignoring, or very unlikely in absolute terms. We didn’t discuss them more just so we could make a more focused argument about how to think about making good futures even better.
I think your point (quoted) touches on the difference I mentioned above between “humanity forever” views and views which are more open to change in values. I think it’s coherent to take a view such as:
You want to value whatever is ultimately valuable. You’re unsure what that is, but you trust the processes which guide the future to converge on it;
You want to value whatever you would value under some idealised process of reflection, and you think the processes which guide the future will emulate idealised reflection on your own values closely enough;
You value roughly what you currently value. But you’re scope-insensitive: in order to think we’ve reached a great future, you just need your neck of the woods to be how you want it, and the rest of the future to avoid things you think are morally repugnant. You expect almost the entire future not to be guided by what you value; but you’re confident you can get the things you need in order to count the future as great, and you’re confident the rest of the future will avoid the morally repugnant (perhaps through trade);
Similar to above, but what you personally value is cheap by the lights of other value systems which guide the future, and vice-versa. So you are confident you can secure a great future by your lights through trade.
Better Futures argues that these views may be less tenable than they first appear, but I think they’re not totally doomed.
Additionally, I would point out a potential “missing mood” in the framing we adopt of cardinally quantifying the value of the future as a fraction of the value of the best feasible future. This suggests that futures which are only, say, 10^-5 of the value of the best feasible future are barren, hollow, ‘neutral’. But this would be a mistake: potentially our own world, even with all the harm and pain removed, is achieving a tiny fraction of what a great future could achieve. So we might imagine (as Better Futures points out) a “common-sense eutopia” which is radically better than the world today, but still only a fraction as good as things could get. That could be true, but it doesn’t undermine the value of such a future, which would still (by stipulation) be wildly better than the world today! All the joy and freedom and discovery and so on, in this near-zero world, would be entirely real and could dwarf all the good we have achieved and enjoyed so far.
Currently, there are a wide range of ideas about how a post-AGI future will go and what features it will contain. To me, this strongly indicates that we should expect the post-AGI future could go in a very broad range of ways and that we should prepare for the many different ways it could go.
Maybe I’m misreading but I don’t think it follows from uncertainty about how things go that many different things will actually happen. For example, if you’re uncertain who wins a political election, you don’t infer that everyone wins and shares power.
At the same time, I get the sense that Forethought has a very specific vision about how a post-AGI future will go (there will be an intelligence explosion, tools for epistemics will be beneficial, we might begin acquiring resources in other solar systems, small sets of actors could use AGI in malicious ways.) I’m wondering how do you decide what ideas you think are likely, and do you guys have any measures in place to ensure you’re receiving criticism of your ideas so you don’t create an epistemic bubble?
I’m in a few minds about this, so I’ll just list some reactions:
You say the Forethought vision is “very specific”, and then you list some claims (e.g. “small sets of actors could use AGI in malicious ways”) which seem… surprisingly anodyne? Or in particular it doesn’t strike me as egregious or unusual to put a decent amount of credence in those claims being true. I think that’s all you need to take them seriously and work on them. Indeed I don’t myself feel extremely confident in any one of them.
I think there is a way to do criticism in a performative way, where you invite people you know to disagree, for reasons you are already familiar with. I don’t think that is totally useless, because performing these dialogues in public can be useful for other people to decide what they think.
On the other hand, I think the best kind of outside criticism for the sake of throwing out bad ideas often isn’t very flashy, and can look like outside experts telling you “this isn’t really how [my domain of expertise] works, so [ABC] seems confused but [XYZ] seems plausible”.
From my perspective there is quite a lot of internal disagreement, including between broad worldviews, although that’s relative.
Speaking personally, I worry a bit that there are components of the implicit shared Forethought worldview which are tricky to pin down from the outside, and thus more likely to influence research decisions in an unscrutinised way. I do think this is a generic problem, and think the most useful place from which to notice and communicate these implicit beliefs is straddling being enough of an insider to have context, and enough of an outsider to see alternatives.
On the other hand I think you do at some point just need to pick some assumptions and some worldview and work within it to make any progress at all. In my experience simply pointing out that those assumptions could be wrong is often less valuable than proposing more fleshed-out alternative assumptions and worldviews, which themselves can be criticised and so on…
I’m wondering, if you think it’s valuable for additional people to work in the field, why do you think this?
We are running a research programme on space at Forethought right now, which I guess reflects a view that it does seem worth investigating more. I don’t think the central case for space runs through the hope for binding international treaties, because I agree that we shouldn’t expect them to hold. I think there are a few other reasons to want to investigate space. One is that the space economy could be somewhat relevant for the course of AGI development, for example if orbital data centres are a big deal, or because of the role of sensing satellites in peace and security.
Another is that most of the physical stuff is in space. At some point it seems likely to me, if the human project continues at all, that most of the important stuff will also eventually be in space. AGI + automated manufacturing + rapid R&D progress suggests that expanding into space could happen in the time span of decades rather than centuries or millennia; and that seems generically worth planning for. And it seems like there are some policy levers which don’t route through international treaties.
To be clear I don’t currently think that space governance should be the next big cause in EA or anything like that.
It seems like longtermism is an unhelpful idea since it requires people to believe that our actions could persist for millions of years.
This feels like a slightly odd sentence construction, because you seem to be saying that longtermism is unhelpful because it requires people to believe one of its central claims. I agree it’s contentious, and I’m certainly not confident that the effects of our actions could persist for millions of years, but it seems plausible enough that the anticipated long-term effects of our actions should meaningfully weigh on what we prioritise, at least where you can tell a story about how your decisions could have some systematic long-run effects.
It also seems like the idea has been somewhat harmful to EA as a movement since people can always point out that some of the founders of the movement are focused on helping people millions of years from now, which sounds pretty crazy.
I do think that is plausible. Although, to state the obvious, there is a difference between which ideas have good or bad PR effects when you say them out loud, and which ideas are actually true or important. So questions about communicating longtermist ideas are, naturally, different from the question of whether longtermist ideas are worth taking seriously as ideas.
And then, I also want to say: the full-on version of longtermism — that the very long-run effects of our actions are overwhelmingly important for what we prioritise — just doesn’t feel especially necessary for working on most or even all of the topics that Forethought is focused on. There is a far more common-sense and mundane reason to focus on them, which is that they could matter enormously within our own lifetimes! Another way of putting that is that when trying to prioritise between possible focuses within Forethought, my personal view is that longtermism is rarely a crux. Maybe my colleagues disagree with that; obviously I’m not speaking on their behalf.
In “How To Make The Future Better,” MacAskill argues that we should make AIs encourage humans to be good people and use them as a source of moral reflection. This seems like it could be deeply problematic if moral sense theory is true but AIs lack a moral sense. Do you agree with this?
I’m not sure I’m entirely following your points but I don’t see a strong reason why AIs or non-human entities could not in principle engage in genuine moral reasoning in the same way that humans do. Maybe instead the AIs will do something which superficially resembles real moral reasoning, but which is closer to just telling humans what they want to hear.
I do think that is not a crazy thing to worry about because it is much easier to train some skill where an uncontroversial and abundant source of ground truth data exists. Moral reasoning is not one of those domains because people often don’t agree on what good moral reasoning looks like. So I think there is much work to be done on that front although I’m not sure that answers your question.
Thanks again for your questions!
A point I’m skeptical of is that trying to preserve key information is likely to make much of a difference. I find it hard to imagine civilisation floundering after a catastrophic setback because it lacked the key insights we’d achieved so far about how to recover tech and do AI alignment and stuff.
On timescale and storage media, I’d guess we’re talking about less than a century to recover (since you’re assuming a setback in tech progress of 100 years). That’s enough time for hard drives to keep working, especially specialist hardware designed to survive in extreme conditions or be ultra-reliable. We also have books, which are very cheap to make and easy to read.
On AI specifically, my sense is that the most important algorithmic insights are really very compressible — they could fit into a small book, if you’re prepared to do a lot of grunt work figuring out how to implement them.
We also have the ability to rebuild institutions while being able to see how previous attempts failed or succeeded, effectively getting ‘unstuck’ from sticky incentives which maintain the existing institutional order. Which is one factor suggesting a re-roll wouldn’t be so bad.
Elaborating on the last paragraph: when considering the value of the set-back society, we’re conditioning on the fact that it got set back. On one hand, (as you say) this could be evidence that society was (up to the point of catastrophe) more liberal and decentralised than it could have been, since many global catastrophes are less likely to occur under the control of a world government. Since I think the future looks brighter if society is more liberal at the dawn of AGI, that’s evidence the current “run” is worth preserving over the next roll we’d get; even if we’re absolutely confident civilisation would survive another run after being set back (assuming a catastrophe would re-roll the dice on how well things are going). That’s not saying anything about whether the world is currently looking surprisingly liberal — just that interventions to prevent pre-AGI catastrophes plausibly move probability mass from liberal/decentralised civilisations to illiberal/centralised ones. And maybe that’s the main effect of preventing pre-AGI catastrophes.
Thanks for writing this! Some personal thoughts:
I have a fair amount of research latitude, and I’m working at an org with a broad and flexible remit to try to identify and work on the most important questions. This makes the Hamming question — what are the most important questions in your field and why aren’t you working on them — hard to avoid! This is uncomfortable, because if you don’t feel like you’re doing useful work, you’re out of excuses. But it’s also very motivating.
There is an ‘agenda’ in the sense that there’s a list of questions and directions with some consensus that someone at Forethought should work on them. But there’s a palpable sense that the bottleneck to progress isn’t just more researchers to shlep on with writing up ideas, so much as more people with crisp, opinionated takes about what’s important, who can defend their views in good faith.
One possible drawback is that Forethought is not a place where you learn well-scoped skills or knowledge by default, because as a researcher you are not being trained for a particular career track (like junior → senior SWE) or taught a course (like doing a PhD). But there is support and time for self-directed learning, and I’ve learned a lot of tacit knowledge about how to do this kind of research especially from the more senior researchers.
I would personally appreciate people applying with research or industry expertise in fields like law, law and economics, physics, polsci, and ML itself. You should not hold off on applying because you don’t feel like you belong to the LessWrong/EA/AI safety sphere, and I’d worry about Forethought culture becoming too insular in that respect (currently it’s not much of a concern).
If you’re considering applying, I recorded a podcast with Mia Taylor, who recently joined as a researcher!
Yeah, fair! I guess there’s a broad understanding of utilitarianism, which is “the sum of any monotone or non-decreasing transformation of utilities”, and a narrower understanding, which is “the sum of utilities”. But I want to say that prioritarianism (a version of the former) is an alternative to utilitarianism, not a variant. Not actually sure what prioritarians would say. Also not really an important point to argue about.
Glad to have highlighted the cardinality point!
And that meant we could hammer out more of the details in the comments, which seems appropriate.
Agree! As I say, I feel much clearer now on your position.
Thanks for all this! I agree that something like Nash is appealing for a bunch of reasons. Not least because it’s Pareto efficient, so doesn’t screw people over for the greater good, which feels more politically legitimate. It is also principled, in that it doesn’t require some social planner to decide how to weigh people’s preferences or wellbeing.
My sense, though you know much more about all this, is that Nash bargaining is not well described as a variant of utilitarianism, though I guess it’s a grey area.
Maybe I’m realising now that a lot of the action in your argument is not in arguing for the values which guide the future to be democratically chosen, but rather in thinking through which kinds of democratic mechanisms are best. Where plain old majority rule seems very unappealing, but more granular approaches which give more weight to those who care most about a given issue look much better. And (here we agree) this is especially important if you think that the wrong kind of popular future, such as a homogenous majority-determined future, could fall far short of the best future.
I really am thinking this as more like “what values should we be aiming at to guide the future” and being fairly agnostic on mechanism
Thanks, super useful! And makes me much more clear on the argument / sympathetic to it.
Of course, all the labs say things about democratic inputs into AI model specs and are trying stuff to this end, and these could be deeper or shallower.
Ah, that’s a good point! And then I guess both of us are in some kind of agreement that this kind of stuff (deliberate structured initiatives to inject some democracy into the models) ends up majorly determining outcomes from AGI.
>Re: what deep democracy, I like Nash bargaining!
I find this somewhat confusing, because elsewhere you say the kind of deep democracy you are imagining “is basically equivalent to enlightened preference utilitarianism”. But the Nash solution is basically never the utility-maximising solution! You can even set things up so that the Nash solution captures an arbitrarily small fraction of the total welfare available between the disagreement point and the utility-maximising optimum. I do think there is an interesting and important question of how good, in practice, the Nash outcome is compared to the utility-maximising outcome, maybe it’s great ¯\_(ツ)_/¯
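To make that concrete, here is a toy construction of my own (not from the original post, and the numbers are arbitrary) showing the gap can be made as large as you like: one agent can convert a shared budget into utility far more efficiently than everyone else, so the utilitarian optimum gives that agent everything, while the Nash solution splits the budget evenly.

```python
# Toy example (my own construction): n agents split a unit budget.
# Agent 1 turns budget share into utility at rate M; everyone else at rate 1.
# The disagreement point is zero utility for everyone.
M, n = 1000.0, 100

# Utilitarian optimum: give the whole budget to agent 1.
utilitarian_total = M * 1.0

# Nash bargaining solution: maximise the product of utilities,
#   (M * x_1) * x_2 * ... * x_n   subject to   sum(x_i) = 1.
# The constant M doesn't affect the argmax, so the product is maximised at
# equal shares x_i = 1/n (by the AM-GM inequality).
shares = [1.0 / n] * n
nash_total = M * shares[0] + sum(shares[1:])

print(f"utilitarian total welfare: {utilitarian_total:.1f}")
print(f"Nash total welfare:        {nash_total:.2f}")
print(f"Nash as a share of the utilitarian optimum: {nash_total / utilitarian_total:.1%}")
```

With M = 1000 and n = 100 the Nash outcome realises roughly 1% of the utilitarian total, and the ratio shrinks like 1/n as you add more agents.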
Thanks for the post! Sure seems like an important topic. Unnecessarily grumpy thoughts to follow —
First, I feel unsure what you’re imagining this ‘deep democracy’ thing is an answer to. You write, “If you want the long-term future to go well by the lights of a certain value function[…]” Who is “you” here?
An individual? If so, it seems clear to me that the best way to bring about things I care about, holding fixed others’ strategies, is indeed to get my AI(s) to pursue things I care about, and it’s clearly not best or desirable from almost any individual’s point of view (certainly not an equilibrium) to get their AI(s) to pursue some kind of democratic conglomeration of everybody’s interests.
An AI company? If so, I imagine (at least by default) that the companies are just trying to serve demand with more and better models, complying with regulation, and maybe proactively adding in side constraints and various revenue-sacrificing behaviours for prosocial or reputational reasons. It’s not obvious to me that there comes a point where the executives sit down to discuss: “what set of values should we align AI to?”, where the candidate answers are various (pseudo-)moral views.
In the case where we’re imagining an AI company choosing the target of alignment, I wonder if my confusion comes from starting from different assumptions about how ‘unitary’ their product is. One outcome, which I currently view as a default, is just an extrapolation of the status quo today: companies train base models, then those base models are post-trained and broken up into various specialised variants, which can in turn be fine-tuned for specific use cases, and all eventually served for a million different uses. On this picture, it’s not clear how much influence an AI company can have on the moral vision of the models they ultimately serve. The main reason is just that the vast majority of what these AIs are doing, in my mind, is just helpful or economically useful tasks based on (direct or delegated) specific human instructions, not following a grand impartial moral plan. And if the models are too eager to break ranks and pursue an abstract moral vision, people won’t use them. This is what runs through my head when people talk about “the AGI” — what is that?
Of course, there are some reasons for thinking the AI landscape will be more unitary, and this picture could be wrong. Maybe a corporate monopoly, maybe a centralised (state-led) project, maybe a coup. Let’s consider the extreme case where “you” are a lab exec, you hold total power over the world through some single alignable AI system, and you face the decision of what to tell it to do. Here I’d zoom in on the part where you say it would be “uncooperative, undemocratic, coercive” to implement your values. One rejoinder here is to make the point that, AI aside, you should (i) be morally uncertain, (ii) be interested in figuring out what’s good through deliberation, and (iii) care about other people. So if the hegemon-leader had a reasonable moral view, directly implementing it through an AI hegemon doesn’t strike me as obviously worse in expectation than ceding influence. If the hegemon-leader has a moral view which is obviously bad, then I don’t think it’s very interesting that a democratic thing seems better.
In any case, I agree that there is a strong case against the hegemon-leader directly implementing their vision of what’s good. But (at a gut level) this is, as you say, because it would be illegitimate, uncooperative, etc., not to mention practically likely to fail (we haven’t tested global totalitarianism, but most new political experiments fail). And I think democratic-but-hegemonic AI probably fares pretty badly by those lights, too, compared to actually just ceding power or not becoming a hegemon in the first place?
I do feel like I’m being unfair or missing something here. Maybe another reading is: look, anarchy is bad, especially when everyone has an army of AIs to carry out their bidding, and everyone is rich enough to start caring about scary, scope-sensitive, ideologically-motivated outcomes. The result is a bunch of winner-take-all conflict, and general destructive competition. So we need some governance system (national and/or global?) which curbs this destruction, but also does the best job possible aggregating people’s values. And this is what “deep democracy” should be doing.
The part of this I agree with is that, as far as we have voting systems, they could be much improved post-AGI, in a million ways. Thumbs up to people imagining what those tools and approaches could look like, to make democratic political procedures more flexible, effective, rich in information, great at finding win-win compromises, and so on.
But there’s a part that feels underspecified, and a part that I’m more sceptical of. The part that feels underspecified is what “deep democracy” actually is. The way you’ve phrased it, and I’m being a bit unfair here, is close to being good by definition (“deeply capturing and being responsive to every single person’s values” — I mean, sure!) I expect this is one of those cases where, once forced to actually specify the system, you make salient the fact that any particular system has to make tradeoffs (cf Arrow, though that’s a bit overblown).
The part I’m more sceptical of is that the anarchic alternative is chaotic and destructive, and the best way to aggregate preferences is via setting up some centralised monopoly on force, and figuring out what centralised process it follows. Consider the international ~anarchy. War is a really unattractive option, even for neighboring expansionary states, so in theory (and often in practice) compromise is virtually always preferred to war. And that’s the hard (fully anarchic) case — smaller-scale conflict is avoided because of criminal and civil laws which make it very not worth it.
Finally, I’d suggest that, in a sense, we have a way to allocate resources and efforts towards what people want in a granular, deep, preference-aggregating way: trade. My sense is to think about this as (even today) the main means by which society is arranged to make everyone better-off; and then consider cases where centralised processes (like voting) are necessary or valuable. One example, which seems potentially very important, is if something like “moral worth” becomes even more divorced from wealth. Of course in designing a democratic process you don’t have perfectly neutral ground to stand on; you have to make a call on who gets to partake. But you can give more of a voice to people who are otherwise practically totally disenfranchised because they lack resources; while the ultra-wealthy otherwise would dominate outcomes. That’s already an issue and could become a bigger issue, but does suggest an alternative answer to the question you’re asking, which is (potentially a lot of) wealth redistribution.
If value is a power law or similarly distributed, then you have a high chance of at least capturing some of the stuff that is astronomically more valuable than everything else, rather than losing out on this stuff entirely.
Of course there are a bunch of methods in social choice which do this, like quadratic voting; though it’s notable that most electoral democracies are not good examples, and anything like “go with the majority while protecting the rights of the minority” seems apt to highly underrate cases where some voters think that a particular issue is astronomically higher in stakes than others think. But this is also a case where I don’t think there’s an especially neutral, non-theory-laden approach for deciding how and when to give people’s views more weight because (intuitively) they think some issue is higher stakes. Then again, I think this is a general problem, not a specific issue with designing democratic methods.
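As a minimal sketch I’m adding here (the voter numbers and budgets are hypothetical, not from the thread), the basic quadratic voting mechanic works like this: each voter has a budget of voice credits, and casting v votes on an issue costs v² credits, so a minority that sees an issue as far higher stakes can outweigh a lukewarm majority.

```python
import math

def votes_bought(credits_spent: float) -> float:
    """Under quadratic voting, v votes cost v**2 credits, so spending c credits buys sqrt(c) votes."""
    return math.sqrt(credits_spent)

# Hypothetical numbers: a lukewarm majority of 60 voters each spends 1 credit
# against a proposal; a minority of 40 voters who see the issue as far higher
# stakes each spends a full 100-credit budget in favour.
against   = sum(votes_bought(1)   for _ in range(60))   # 60 * 1  = 60 votes
in_favour = sum(votes_bought(100) for _ in range(40))   # 40 * 10 = 400 votes

print(f"against: {against:.0f} votes, in favour: {in_favour:.0f} votes")
# Simple majority rule would reject the proposal 60-40; quadratic voting lets
# the high-stakes minority prevail, because intensity of preference counts.
```

The quadratic cost is what makes expressing ever-stronger preferences progressively more expensive, which is exactly the intensity information that plain majority rule throws away.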
Ok sorry for the chaotically written thoughts, I think they probably understate how much I’m a fan of this line of thinking. And your comments in reply to this were clarifying and made me realise I was in fact a bit confused on a couple points. Thanks again for writing.
as people get richer and happier and wiser and so forth, they just have more time and interest and mental/emotional capacity to think carefully about ethics and act accordingly
I think that’s right as far as it goes, but it’s worth considering the limiting behaviour with wealth, which is (as the ultra-wealthy show) presumably that the activity of thinking about ethics still competes with other “leisure time” activities, and behaviour is still sensitive to non-moral incentives, like social competition. But also note that the ways in which it becomes cheaper to help others as society gets richer are going to tend to be the ways in which it becomes cheaper for others to help themselves (or ways in which people’s lives just get better without much altruism). That’s not always true, like in the case of animals.
Good point about persuasion. I guess one way of saying that back, is that (i) if the “right” or just “less bad” moral views are on average the most persuasive views, and (ii) at least some people are generating them, then they will win out. One worry is that (i) isn’t true, because other bad views are more memetically fit, even in a society of people with access to very good abstract reasoning abilities.
Thanks for the kind words, Rafael!
I can see how the island analogy is confusing:
It suggests that the task of society is to reach some very particular kind of state(s), and otherwise it (presumably) flounders.
One way that’s inaccurate is that you can ask how well things are going on the ship, before it reaches the island, or after it misses the island.
It’s also unclear that the landscape of possible futures looks like a relatively discrete region of “success” and its complement, a region of failure.
Finally, the value of the expedition, whether or not it reaches the island, might depend on the course which the ship took. In jargon, the value of society at a time might be stateful with respect to its history, or the value over time might not be time-separable. In words, it’s about the journey…
So if that’s what you had in mind, I agree.
And then as I read you, the worry is something like: “Better Futures argues that great futures are likely to be difficult to reach. So even if great futures are somehow many times better than mediocre futures, shouldn’t we plan to make mediocre futures slightly better (or less bad), rather than throw a Hail Mary at the best futures?”
I wonder if there are two versions of this worry. The first might be: “the strategy which maximises the chance of passing some (or any) ‘great’ threshold of value is meaningfully worse, in expected value terms, than other strategies”. In particular it could be that the max(p(great future)) strategy involves doing common-sensically bad stuff with a slim chance of paying off, and which otherwise does harm. For example, strategies which involve massively centralising power and then hoping that whoever holds all the power makes the right decisions.
I think some version of this worry is decently plausible. But I don’t think anyone thinks that you absolutely should take whatever course maximises the chance of a great future. Rather, I think the rough idea in Better Futures is that, as a heuristic, it seems generally worthwhile to choose actions in light of whether they make great futures more likely. This is similar to Bostrom’s “maxipok” principle, which Will and Guive Assadi argue against here. But both principles are derived from asking “what heuristic seems like it does a good job at approximating trying to take the max-EV option in a more granular way”; rather than suggesting a direct alternative. So the question is whether it’s a good approximation of max-EV, rather than a good alternative to it.
The second version of the worry is something like: “on the Better Futures worldview, the max-EV strategy may well involve a Hail Mary with a slim chance of a great future, and otherwise a very likely outcome which is mediocre or bad. And this seems fanatical, or otherwise wrong.”
Here again I would be pretty sympathetic, and it might be useful to distinguish which actions are EV-maximising from which actions are right. If you are a consequentialist (with caveats and asterisks) you think the right action is the EV-maximising one. But we don’t try to argue that the right action is always EV-maximising and vice-versa (and IIRC there is a footnote trying to make this clear-ish). As with other cases of fanaticism, you could think that the plan which results in the most expected value is not the right plan!
Practically speaking, my hope is that aiming for truly great futures, and just trying to improve incrementally on ‘default’ futures, recommend quite similar and compatible courses. For example, it seems like power concentration looks pretty bad for both, in practice.