Re “Oxford EAs”—Toby Ord is presumably a paradigm of that. In the Great AI Timelines Scare of 2017, I spent some time looking into timelines. His median, then, was 15 years, which has held up pretty well. (And his x-risk probability, as stated in The Precipice, was 10%.)
I think I was wrong in my views on timelines then. But people shouldn’t assume I’m a stand-in for the views of “Oxford EAs”.
I agree—this is a great point. Thanks, Simon!
You’re right that the magnitude of rerun risk from alignment should be lower than the probability of misaligned AI doom. However, worlds in which AI takeover is very likely but we can’t change that, or in which it’s very unlikely and we can’t change that, aren’t the interesting worlds from the perspective of taking action. (Owen and Fin have a post on this topic that should be coming out fairly soon.) So, if we’re taking this consideration into account, it should also discount the value of work to reduce misalignment risk today, too.
(Another upshot: biorisk seems more like chance than uncertainty, so biorisk becomes comparatively more important than you’d have thought before taking this consideration into account.)
Okay, looking at the spectrum again, it still seems to me like I’ve labelled them correctly? Maybe I’m missing something. It’s optimistic if we can retain knowledge of how to align AGI, because then we can just use that knowledge later and we don’t face the same magnitude of risk from misaligned AI.
I agree with this. One way of seeing that is to ask: how many doublings of energy consumption can civilisation have before it needs to move beyond the solar system? The answer is about 40 doublings. Which, depending on your views on just how fast explosive industrial expansion goes, could be a pretty long time, e.g. decades.
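As a rough back-of-the-envelope check on the “about 40 doublings” figure (the input numbers here are my own illustrative estimates, not from the comment): current world power consumption is on the order of 2×10^13 W and the Sun’s total output is about 3.8×10^26 W, so the number of doublings before hitting the solar-system ceiling is roughly log2 of their ratio.

```python
import math

# Illustrative estimates (not from the original comment):
current_power_w = 2e13    # ~20 TW, rough present-day world power consumption
solar_output_w = 3.8e26   # total luminosity of the Sun

doublings = math.log2(solar_output_w / current_power_w)
print(f"Doublings before exceeding total solar output: {doublings:.1f}")
# -> roughly 44, i.e. on the order of 40 doublings
```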
I do think that non-existential-level catastrophes are a big deal even despite the rerun risk consideration, because I expect the civilisation that comes back from such a catastrophe to be on a worse values trajectory than the one we have today. In particular, because the world today is unusually democratic and liberal, I expect a re-roll of history to result in less democracy than we have today at the current technological level. However, other people have pushed me on that, and I don’t feel like the case here is very strong. There are also obvious reasons why one might be biased towards having that view.
In contrast, the problem of having to rerun the time of perils is very crisp. It doesn’t seem to me like a disputable upshot at the moment, which puts it in a different category of consideration at least — one that everybody should be on board with.
I’m also genuinely unsure whether non-existential level catastrophe increases or decreases the chance of future existential level catastrophes. One argument that people have made that I don’t put that much stock in is that future generations after the catastrophe would remember it and therefore be more likely to take action to reduce future catastrophes. I don’t find that compelling because I don’t think that the Spanish flu made us more prepared against Covid-19, for example. Let alone that the plagues of Justinian prepared us against Covid-19. However, I’m not seeing other strong arguments in this vein, either.
One more clarification for forum users: I have tendonitis, and so I’m voice-dictating all of my comments, so they might read oddly!
Thanks, that’s a good catch. Really, in the simple model the relevant point in time for the first run should be when the alignment challenge has been solved, even for superintelligence. But that’s before “reasonably good global governance”.
Of course, there’s an issue that this is trying to model alignment as a binary thing for simplicity, even though really, if a catastrophe came when half of the alignment challenge had been solved, that would still be a really big deal, for similar reasons to those in the paper.

One additional comment is that this sort of “concepts moving around” issue is one of the things that I’ve found most annoying about AI, and it happens quite a lot. You need to try to uproot these issues from the text, and this was a case of me missing one.
I think it’s not at all a dumb idea, and I talk about this in Section 6.4. It actually feels like an activity you could do at very low cost that might have very high value per unit cost.
The second way in which this post is an experiment is that it’s an example of what I’ve been calling AI-enhanced writing. The experiment here is to see how much more productive I can be in the research and writing process by relying very heavily on AI assistance — trying to use AI rather than myself wherever I can possibly do so. In this case, I went from having the basic idea to having this draft in about a day of work.
I’d be very interested in people’s comments on how apparent it is that AI was used so extensively in drafting this piece — in particular if there are examples of AI slop that you can find in the text and that I missed.
The first way in which this post is an experiment is that it’s work-in-progress that I’m presenting at a Forethought Research progress meeting. The experiment is just to publish it as a draft and then have the comments that I would normally receive as Google Doc comments on this forum post instead. The hope is that, by doing this, more people can get up to speed with Forethought research earlier than they otherwise would, and we can also get more feedback and thoughts at an earlier stage from a wider diversity of people.
I’d welcome takes from Forumites on how valuable or not this was.
I, of course, agree!
One additional point, as I’m sure you know, is that potentially you can also affect P(things go really well | AI takeover). And actions that increase P(things go really well | AI takeover) might be quite similar to actions that increase P(things go really well | no AI takeover). If so, that’s an additional argument for those actions compared to actions that increase P(no AI takeover).
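A minimal toy sketch of that comparison, with probabilities and values made up purely for illustration (they’re mine, not part of the original comment): an action that improves outcomes in both branches gets credit from each branch, weighted by that branch’s probability, and can compete with an action that only shifts the takeover probability.

```python
# Toy expected-value sketch; all probabilities and values are hypothetical.
p_takeover = 0.3
v_takeover, v_no_takeover = 10.0, 100.0   # value conditional on each branch

def ev(p_t, v_t, v_nt):
    return p_t * v_t + (1 - p_t) * v_nt

baseline = ev(p_takeover, v_takeover, v_no_takeover)

# An action that improves outcomes in *both* branches:
both_branches = ev(p_takeover, v_takeover + 5.0, v_no_takeover + 5.0)

# An action that only reduces the probability of takeover:
lower_takeover = ev(p_takeover - 0.05, v_takeover, v_no_takeover)

print(both_branches - baseline)   # ≈ 5.0
print(lower_takeover - baseline)  # ≈ 0.05 * (100 - 10) = 4.5
```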
Re the formal breakdown, people sometimes miss the BF supplement here which goes into this in a bit more depth. And here’s an excerpt from a forthcoming paper, “Beyond Existential Risk”, in the context of more precisely defining the “Maxipok” principle. What it gives is very similar to your breakdown, and you might find some of the terms in here useful (apologies that some of the formatting is messed up):
“An action x’s overall impact (ΔEV_x) is its increase in expected value relative to baseline. We’ll let C refer to the state of existential catastrophe, and b refer to the baseline action. We’ll define, for any action x: P_x = P[¬C | x] and K_x = E[V | ¬C, x]. We can then break overall impact down as follows:

ΔEV_x = (P_x – P_b)K_b + P_x(K_x – K_b)

We call (P_x – P_b)K_b the action’s existential impact and P_x(K_x – K_b) the action’s trajectory impact. An action’s existential impact is the portion of its expected value (relative to baseline) that comes from changing the probability of existential catastrophe; an action’s trajectory impact is the portion of its expected value that comes from changing the value of the world conditional on no existential catastrophe occurring.

We can illustrate this graphically, where the areas in the graph represent overall expected value, relative to a scenario with a guarantee of catastrophe.

With these in hand, we can then define:

Maxipok (precisified): In the decision situations that are highest-stakes with respect to the longterm future, if an action is near-best on overall impact, then it is close-to-near-best on existential impact.

[1] Here’s the derivation. Given the law of total expectation:

E[V | x] = P(¬C | x)E[V | ¬C, x] + P(C | x)E[V | C, x]

To simplify things (in a way that doesn’t affect our overall argument, and bearing in mind that the “0” is arbitrary), we assume that E[V | C, x] = 0, for all x, so:

E[V | x] = P(¬C | x)E[V | ¬C, x]

And, by our definition of the terms:

P(¬C | x)E[V | ¬C, x] = P_x K_x

So:

ΔEV_x = E[V | x] – E[V | b] = P_x K_x – P_b K_b

Then adding (P_x K_b – P_x K_b) to this and rearranging gives us:

ΔEV_x = (P_x – P_b)K_b + P_x(K_x – K_b)”
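To make the decomposition concrete, here’s a minimal numerical sketch (the probabilities and values are made up purely for illustration, not taken from the paper):

```python
# Hypothetical numbers, purely for illustration.
P_b, K_b = 0.80, 100.0   # baseline: P(no catastrophe), value given no catastrophe
P_x, K_x = 0.85, 105.0   # the same quantities after taking action x

overall_impact     = P_x * K_x - P_b * K_b   # ΔEV_x
existential_impact = (P_x - P_b) * K_b       # from changing P(¬C)
trajectory_impact  = P_x * (K_x - K_b)       # from changing value conditional on ¬C

print(overall_impact, existential_impact, trajectory_impact)
# ≈ 9.25 = 5.0 + 4.25, matching ΔEV_x = (P_x – P_b)K_b + P_x(K_x – K_b)
```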
Thanks! I agree strongly with that.
(Also, thank you for doing this analysis, it’s great stuff!)
😢
Rutger Bregman isn’t on the Forum, but sent me this message and gave me permission to share:
Great piece! I strongly agree with your point about PR. EA should just be EA, like the Quakers just had to be Quakers and Peter Singer should just be Peter Singer.
Of course EA had to learn big lessons from the FTX saga. But those were moral and practical lessons so that the movement could be proud of itself again. Not PR-lessons. The best people are drawn to EA not because it’s the coolest thing on campus, but because it’s a magnet for the most morally serious + the smartest people.
As you know, I think EA is at its best when it’s really effective altruism (“I deeply care about all the bad stuff in the world, desperately want to make a difference, so I gotta think really fcking hard about how I can make the biggest possible difference”) and not altruistic rationalism (“I’m super smart, and I might as well do a lot of good with it”).
This ideal version of EA won’t appeal to all super talented people of course, but that’s fine. Other people can build other movements for that. (It’s what we’re trying to do at The School for Moral Ambition.)
Argh, thanks for catching that! Edited now.
If this perspective involves a strong belief that AI will not change the world much, then IMO that’s just one of the (few?) things that are ~fully out of scope for Forethought.
I disagree with this. There would need to be some other reason why they should work at Forethought rather than elsewhere, but there are plausible answers to that — e.g. they work on space governance, or they want to write up why they think AI won’t change the world much and engage with the counterarguments.
I can’t speak to the “AI as a normal technology” people in particular, but a shortlist I created of people I’d be very excited about includes someone who just doesn’t buy at all that AI will drive an intelligence explosion or explosive growth.
I think there are lots of types of people where it wouldn’t be a great fit, though. E.g. continental philosophers; at least some of the “sociotechnical” AI folks; more mainstream academics who are focused on academic publishing. And if you’re just focused on AI alignment, probably you’ll get more at a different org than you would at Forethought.
More generally, I’m particularly keen on situations where V(X, Forethought team) is much greater than V(X) + V(Forethought team), either because there are synergies between X and the team, or because X is currently unable to do the most valuable work they could in any of the other jobs they could be in.
Thanks for writing this, Lizka!
Some misc comments from me:
I have the worry that people will see Forethought as “the Will MacAskill org”, at least to some extent, and therefore think you’ve got to share my worldview to join. So I want to discourage that impression! There’s lots of healthy disagreement within the team, and we try to actively encourage disagreement. (Salient examples include disagreement around: AI takeover risk; whether the better futures perspective is totally off-base or not; moral realism / antirealism; how much and what work can get punted until a later date; AI moratoria / pauses; whether deals with AIs make sense; rights for AIs; gradual disempowerment).
I think from the outside it’s probably not transparent just how involved some research affiliates or other collaborators are, in particular Toby Ord, Owen Cotton-Barratt, and Lukas Finnveden.
I’d in particular be really excited for people who are deep in the empirical nitty-gritty — think AI2027 and the deepest criticisms of that; or gwern; or Carl Shulman; or Vaclav Smil. This is something I wish I had more skill and practice in, and I think it’s generally a bit of a gap in the team.
While at Forethought, I’ve been happier in my work than I have been in any other job. That’s a mix of: getting a lot of freedom to just focus on making intellectual progress rather than various forms of jumping through hoops; the (importance)*(intrinsic interestingness) of the subject matter; the quality of the team; the balance of work ethic and compassion among people — it really feels like everyone has each other’s back; and things just working and generally being low-drama.
In my memory, the main impetus was that a couple of leading AI safety ML researchers started making the case for 5-year timelines. They were broadly qualitatively correct and remarkably insightful (promoting the scaling-first worldview), but obviously quantitatively too aggressive. And AlphaGo and AlphaZero had freaked people out, too.
A lot of other people at the time (including close advisers to OP folks) had 10-20yr timelines. My subjective impression was that people in the OP orbit generally had more aggressive timelines than Ajeya’s report did.