How much does work in AI safety help the world?
Using a method of Owen’s, we made an interactive tool to estimate the probability that joining the AI safety research community would actually avert existential catastrophe.
Though I’m posting this and it’s written from my point of view, most of the writing and reasoning here is Owen’s—I’ll take the blame for any misunderstandings or mis-statements of his argument :)
There’s been some discussion lately about whether we can make estimates of how likely efforts to mitigate existential risk from AI are to succeed and about what reasonable estimates of that probability might be. In a recent conversation between the two of us, I mentioned to Owen that I didn’t have a good way to estimate the probability that joining the AI safety research community would actually avert existential catastrophe. Though it would be hard to be certain about this probability, it would be nice to have a principled back-of-the-envelope method for approximating it. Owen actually has a rough method based on the one he used in his article Allocating risk mitigation across time, but he never spelled it out.
We thought that this would be best presented interactively; since the EA Forum doesn’t allow JavaScript in posts, we put the tool on the Global Priorities Project website.
You can use the tool here to make your own estimate!
(Assuming that you’ve gone to the site and made your own estimate...)
So, what does this mean? Obviously this is quite a crude method, and some of the variables you have to estimate are themselves quite tricky to get a handle on, but we think they’re more approachable than trying to estimate the whole thing directly, and we expect the answer to be within a few orders of magnitude of the true value.
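For concreteness, here’s a minimal sketch of how estimates like these might be combined, assuming (purely for illustration) that one extra researcher gets credit for a 1/N share of the risk reduction a full doubling of the community would bring. The variable names and numbers below are placeholders, not the exact formula the tool uses:

```python
# A rough sketch only -- not necessarily the exact formula behind the tool.
# All numbers are placeholders; substitute your own estimates.

total_ai_risk = 0.10                 # total existential risk from highly capable AI systems
fraction_averted_by_doubling = 0.10  # share of the bad scenarios a doubled safety community would avert
eventual_community_size = 1000       # eventual size of the AI safety research community
your_relative_contribution = 1.0     # your output relative to an average member of that community

# Treat one extra researcher as a 1/N share of a doubling of the community,
# and credit them with that share of the risk the doubling would avert.
p_avert = (total_ai_risk
           * fraction_averted_by_doubling
           * your_relative_contribution
           / eventual_community_size)

print(f"Estimated chance your career averts catastrophe: {p_avert:.2e} (about 1 in {1 / p_avert:,.0f})")
```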
How big does the number have to be to imply that a career in AI safety research is one of the best things to do? One natural answer is to multiply it out by the number of lives we expect could exist in the future. I think this is understandable, and worth doing as a check to see whether the whole thing is dominated by focusing on the present, but it’s not the end of the story. There are fewer than 10 billion people alive today, and collectively it seems like we may have a large amount of influence over the future. Therefore if your estimate of the chance that a career in AI safety averts catastrophe looks much worse than 1 in 10 billion, it seems likely that there are other more promising ways to use your share of that influence productively. These might eventually help by influencing AI safety in another way, or through a totally different mechanism. The method we’ve used here could also be modified to estimate the value of joining communities working on other existential risks, or of other interventions that change the eventual size or productivity of the AI safety research community, for example through outreach, funding, or field-steering work.
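To make those checks concrete, here is the same kind of arithmetic for the two comparisons above, again with placeholder numbers (the figure for future lives is just an arbitrary stand-in for whatever you expect):

```python
# Placeholder numbers for the two sanity checks described above.
p_avert = 1e-5                # your per-career estimate from the calculator
expected_future_lives = 1e15  # stand-in for however many future lives you expect are at stake
people_alive_today = 1e10     # a round upper bound: fewer than 10 billion people alive today

# Check 1: multiply out by the number of future lives.
print(f"Expected future lives saved: {p_avert * expected_future_lives:,.0f}")

# Check 2: compare against an equal per-person share of humanity's influence.
# If your estimate is much worse than 1 in 10 billion, other uses of your share
# of that influence probably look more promising.
fair_share = 1 / people_alive_today
print("Clears the 1-in-10-billion bar" if p_avert > fair_share
      else "Falls below the 1-in-10-billion bar")
```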
Cool idea and initiative to make such a calculator :) Although it doesn’t quite reflect how I make estimates myself (I might make a more complicated calculator of my own at some point that does).
The way I see it, the work that is done now will be the most valuable per person, and the number of people working on this towards the end may not be so indicative (nine women cannot make a baby in a month, etc.).
Agree that we are missing some things here. My guess is that this one is not too large (less than an order of magnitude), in significant part because we’re just using ‘eventual size’ as a convenient proxy, and increases to the size there seem likely to be highly correlated with increases at intermediate times.
That said, it’s great to look for things we may be missing, and see if we can reach consensus about which could change the answer crucially.
Thanks! :) After our conversation Owen jumped right into the write-up, and I pitched in with the JavaScript—it was fun to just charge ahead and execute a small idea like this.
It’s true that this calculator doesn’t take field-steering or paradigm-defining effects of early research into account, nor the question of inherently serial vs. parallelizable work. These might be interesting to incorporate into a future model, at some risk of over-complicating what will always be a pretty rough estimate.
To actually contribute something positive now: I think it would be cool to get other people’s estimates for these inputs, along with their reasoning.
A while ago MIRI had a go at something like this: The Uncertain Future.
Now instead of being confused about “the probability that joining the AI safety research community would actually avert existential catastrophe”, I’m confused about “the total existential risk associated with developing highly capable AI systems” and “[w]hat percentage of the bad scenarios should we expect [doubling the AI safety community] to avert”...
Yeah, I think these are still pretty hard and uncertain, but I find it substantially easier to have a direct intuition of what kind of range is reasonable than I do for the original question.
I agree. I’m sorry for coming across as rude with this comment; I do think the framework provides a meaningful reduction of the problem.
“[T]he total existential risk associated with developing highly capable AI systems” means the chance of an existential catastrophe caused by AI, and “[w]hat percentage of the bad scenarios should we expect [doubling the AI safety community] to avert” means how much that risk would be reduced if the AI safety effort were doubled. Or are you having difficulty estimating the probabilities?
Yeah, I meant that.
Does it just seem impossible to you? Can you not think of any related questions, or related problems?
It does seem quite hard, but I admittedly haven’t thought about it very much. I imagine it’s not something I’d be generally good at estimating.
Well it’s important to have an estimate; have you looked at others’ estimates?
Have you heard the aphorism ‘curiosity seeks to annihilate itself’? These are a set of physical and anthropological questions that it’s important to narrow our uncertainty on, so saying ‘I haven’t so far envisaged how evidence could be brought to bear on these questions’ is depressing!
Hey Ryan, I appreciate you helping out, but I’m finding you to be quite condescending. Maybe you weren’t aware that’s how you’re coming across to me? It’s not that I don’t want your help, of course!
I know that I could spend a lot of time attempting to reduce my uncertainty by investigating and I have done some, but I only have so much time!
And yes, I have read the LW sequences. ;)
Sorry, I’d redrafted it to try to avoid that.
From my perspective, the previous posts read like someone cheerfully breaking conversational/intellectual norms by saying ‘your question confuses me’, without indicating what you tried / how to help you, making it hard to respond productively.
Yes, you’re right. I’m sorry about that.
It looks like you accidentally dropped a word; I don’t know what this sentence is supposed to be saying.
Thanks! Going to fix. It was supposed to say “by the time we develop those...”
The AI sabotage begins.
Cool idea, thanks – one usability comment:
“Existential risk” doesn’t have units of percentages in my mind. I would phrase this as “probability that Earth-based life goes extinct” or something.