Benjamin Hilton
Benjamin was a research analyst at 80,000 Hours. Before joining 80,000 Hours, he worked for the UK Government and did some economics and physics research.
I agree that I’d love to see more work on this! (And I agree that the last story I talk about, of a very fast takeoff AI system with particularly advanced capabilities, seems unlikely to me—although others disagree, and think this “worst case” is also the most likely outcome.)
It’s worth noting again, though, that any particular story is unlikely to be correct. We’re trying to forecast the future, and good ways of forecasting should feel uncertain at the end, because we don’t know what the future will hold. Also, good work on this will (in my opinion) give us ideas about what many possible scenarios will look like. This sort of work (e.g. the first half of this article, rather than the second) often feels less concrete, but is, I think, more likely to be correct, and can inform actions that target many possible scenarios rather than one single unlikely event.
All that said, I’m excited to see work like OpenPhil’s nearcasting project which I find particularly clarifying and which will, I hope, improve our ability to prevent a catastrophe.
That particular story, in which I write “one day, every single person in the world suddenly dies”, is about a fast takeoff self-improvement scenario. In such scenarios, a sudden takeover is exactly what we should expect to occur, and the intermediate steps set out by Holden and others don’t apply to such scenarios. Any guessing about what sort of advanced technology would do this necessarily makes the scenario less likely, and I think such guesses (e.g. “hypnodrones”) are extremely likely to be false and aren’t useful or informative.
For what it’s worth, I personally agree that slow takeoff scenarios like those described by Holden (or indeed those I discuss in the rest of this article) are far more likely. That’s why I focus on the many different ways in which an AI could take over, rather than on any particular failure story. And, as I discuss, any particular combination of steps is necessarily less likely than the claim that any or all of these capabilities could be used.
But a significant fraction of people working on AI existential safety disagree with both of us, and think that a story which literally claims that a sufficiently advanced system will suddenly kill all humans is the most likely way for this catastrophe to play out! That’s why I also included a story which doesn’t explain these intermediate steps, even though my inside view is that this is less likely to occur.
Yeah, it’s a good question! Some thoughts:
- I’m being quite strict with my definitions. I’m only counting people working directly on AI safety. So, for example, I wouldn’t count the time I spent writing this profile on AI (or the time of anyone else who works at 80k, for that matter). (Note: I do think lots of relevant work is done by people who don’t directly work on it.) I’m also not counting people who think of themselves as on an AI safety career path and are, at the moment, skilling up rather than working directly on the problem. There are some ambiguities, e.g. does the ops team of an AI org count as working on safety? In general, though, these ambiguities seem much smaller than the error in the data itself.
- AI safety is hugely neglected outside EA (which is a key reason why it seems so useful to work on). This isn’t a big surprise, and may in large part be a result of the fact that it used to be even more neglected: anything started as an AI safety org is likely to have been started by EAs, and so is also seen as an EA org. That makes AI safety a subset of EA rather than the other way round.
- Also, I’m looking at AI existential safety rather than broader AI ethics or AI safety issues. The focus on x-risk (combined with reasons to think that lots of work on non-existential AI safety isn’t that relevant, as compared with e.g. biosecurity, where lots of policy work is relevant to both major and existential pandemics) makes it even more likely that this count is just looking at a strict subset of EAs.
- There are, I think, up to around 10 thousand engaged EAs, of whom maybe 1-2 thousand are focused on longtermism or x-risk. So we’re looking at 10% of these people working full-time on AI x-risk! That seems like a pretty high proportion to me, given the many causes in the wider EA (not even longtermist) community.
- So in many ways the question of “why are so few people working on AI safety after 10 years?” is similar to “why are there so few EAs after 10 years?”, which is a pretty complicated question. But it seems to me like EA is way, way, way bigger and more influential than I would ever have expected in 2012!
- There are also some other bottlenecks (notably mentoring capacity). The field was nearly non-existent 10 years ago, with very few senior people to help others enter it – and it’s (rightly) a very technical field, focused on theoretical and practical computer science / ML. Even now, it’s far from clear to me what proportion of their time those 300 people should be spending on mentoring.
I’d also like to highlight the footnote alongside this number: “There’s a lot of subjective judgement in the estimate (e.g. “does it seem like this research agenda is about AI safety in particular?”), and it could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. My 90% confidence interval would range from around 100 people to around 1,500 people.”
Hi Gideon,
I wrote the 80,000 Hours problem profile on climate change. Thank you so much for this feedback! I’m genuinely really grateful to see such engagement with the things I write—and criticism is always a welcome contribution to making sure that I’m saying the right things.
Just to be clear, when I said “we think it’s potentially harmful to do work that could advance solar geoengineering”, I meant that (with a fair degree of uncertainty) it could be harmful to do work that advances the technology (which I think you agree with), not that all research around the topic seems bad! It definitely seems plausible that some research on the topic might be good, but I was trying to recommend the very best things to do to mitigate climate change. My reviewers pretty much all agreed that, partly as a result of potential harmful effects, SRM research doesn’t seem like it would be one of those very best things, and so they suggested that we stop recommending working in the area. In large part I’m deferring to this consensus among the reviewers.
Hope that helps!
Benjamin
I think these are all great points! We should definitely worry about negative effects of work intended to do good.
That said, here are two other places where maybe we have differing intuitions:
- You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign.
- It seems hard to conclude that the counterfactual where any one or more of “no work on AI safety / no interpretability work / no robustness work / no forecasting work” were true is in fact a world with less x-risk from AI overall. That is, while I can see there are potential negative effects of these things, when I truly try to imagine the counterfactual, the overall impact seems likely positive to me.
Of course, intuitions like these are much less concrete than actually trying to evaluate the claims, and I agree it seems extremely important for people evaluating or doing anything in AI safety to ensure they’re doing positive work overall.
Ah thanks :) Fixed.
Yes, there was!
This is a great story! Good motivational content.
But I do think, in general, a mindset of “only I can do this” is inaccurate and has costs. There are plenty of other people, and other communities, in the world attempting to do good, and often succeeding. I think EAs have been responsible for only a small fraction of the success in reducing global poverty over the last few decades, for example.
Here are a few costs that seem plausible to me:
- Knowing when and why others will do things significantly changes estimates of the marginal value of acting. For example, if you are starting a new project, it’s reasonably likely that even if you have a completely new idea, other people are in similar epistemic situations to you and will soon stumble upon the same idea. So to estimate your counterfactual impact, you might want to estimate how much earlier something will occur because you made it occur, rather than purely the impact of the thing occurring (there’s a rough sketch of this after the list). More generally, neglectedness is a key part of estimating your marginal impact, and estimating neglectedness relies heavily on an understanding of what others are focusing on; usually at least a few people are doing things in a similar space to you.
- Knowing when and why others will do things also affects strategic considerations. The fact that few non-EAs are working in many of the places where we now try to do good is a result of our attempts to find neglected areas. But, especially in the case of x-risk, we can expect others to begin to do good work in these areas as time progresses (see e.g. AI discussions around warning shots). The extent to which this is the case affects what is valuable to do now.
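To make the first point a bit more concrete, here’s a rough sketch with entirely made-up numbers (the function names and figures are purely illustrative, not anything from the article):

```python
# Toy comparison of two ways of crediting yourself for a project:
# (a) naively taking the project's full value, vs.
# (b) taking only the value of making it happen earlier than it
#     otherwise would have (because someone else would do it later).

def naive_impact(value_per_year: float, project_lifetime_years: float) -> float:
    """Credit yourself with the project's entire value over its lifetime."""
    return value_per_year * project_lifetime_years

def counterfactual_impact(value_per_year: float, years_accelerated: float) -> float:
    """Credit yourself only with the value of the years you moved the project forward."""
    return value_per_year * years_accelerated

value_per_year = 100          # hypothetical "units of good" per year
project_lifetime_years = 20   # how long the project produces value
years_accelerated = 2         # how much earlier it happens because of you

print(naive_impact(value_per_year, project_lifetime_years))      # 2000
print(counterfactual_impact(value_per_year, years_accelerated))  # 200
```

The gap between those two numbers is exactly the kind of thing that knowing what others are (or will be) working on changes.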
This does seem to be an important dynamic.
Here are a few reasons this might be wrong (both sound vaguely plausible to me):
1. If someone being convinced of a different, non-weird version of an argument makes it easier to convince them of the actual argument, you end up with more people working on the important stuff overall.
2. If you can make things sound less weird without actually changing the content of what you’re saying, you don’t get this downside. (This might be pretty hard to do, though.)
(1) is particularly important if you think this “non-weird to weird” approach will appeal to a set of people who wouldn’t otherwise end up agreeing with your arguments. That would mean it has a high counterfactual impact, even if some of those people do work that, while still good, is ultimately far less relevant to x-risk reduction. This is even more true if you think that few of the people who would have just listened to your weirder-sounding arguments in the first place will get “stuck” at the non-weird stuff and, as a result, never do useful things.
That’s not the intention, thanks for pointing this out!
To clarify, by “route”, I mean gaining experience in this space through working in engineering roles directly related to AI. Where those roles aren’t specifically focused on safety, it’s important to try to consider any downside risk that could result from advancing general AI capabilities (this will, in general, vary a lot across roles and can be very difficult to estimate).
A bit of both—but you’re right, I primarily meant “secure” (as I expect this is where engineers have something specific to contribute).
I’m curious about the ethical decisions you’ve made in this report. What’s your justification for evaluating current lives lost? I’d be far more interested in cause-X research that considers a variety of worldviews, e.g. a number of different ways of evaluating the medium- or long-term consequences of interventions.