# How much does work in AI safety help the world? Probability distribution version (Oxford Prioritisation Project)

*By Tom Sittler*

*2017-04-26*

*We’re centralising all discussion on the Effective Altruism forum. To discuss this post, please comment here.*

The Global Priorities Project (GPP) has a model quantifying the impact of adding a researcher to the field of AI safety. Quoting from GPP:

> There’s been some discussion lately about whether we can make estimates of how likely efforts to mitigate existential risk from AI are to succeed and about what reasonable estimates of that probability might be. In a recent conversation between the two of us, Daniel mentioned that he didn’t have a good way to estimate the probability that joining the AI safety research community would actually avert existential catastrophe. Though it would be hard to be certain about this probability, it would be nice to have a principled back-of-the-envelope method for approximating it. Owen actually has a rough method based on the one he used in his article Allocating risk mitigation across time, but he never spelled it out.

I found this model (moderately) useful and turned it into a Guesstimate model, which you can view here. I'm not publishing my inputs, so as not to anchor people, but you can write to me privately and I’ll share them with you.

Have other people found this model useful? Why, or why not? What would be your inputs into the model?

This is interesting. I’m strongly in favor of having rough models like this in general. Thanks for sharing!

Edit suggestions:

STI says “what percent of bad scenarios should we expect this to avert”, but the formula uses it as a fraction. Probably best to keep the formula and change the wording.

Would help to clarify that TXR is a probability of X-risk. (This is clear after a little thought/inspection, but might as well make it as easy to use as possible.)

Quick thoughts:

It might be helpful to talk in terms of research-years rather than researchers.

It’s slightly strange that the model assumes 1-P(xrisk) is linear in researchers, but then only estimates the coefficient from TXR x STI/(2 x SOT), when (1-TXR)/SOT should also be an estimate. It does make sense that risk would be “more nonlinear” for lower n_researchers, though.
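To make the gap between those two coefficient estimates concrete, here is a minimal sketch. It assumes SOT is the size of the safety research community when AI is developed, and it reads the second estimate as a straight line from zero safety at zero researchers to 1 - TXR at SOT researchers. Both readings are assumptions on my part, the function names are made up for illustration, and the input values are hypothetical rather than anyone's actual estimates.

```python
def slope_from_doubling(txr: float, sti: float, sot: float) -> float:
    """Per-researcher coefficient as quoted in the comment: TXR * STI / (2 * SOT).

    txr: probability of existential risk from AI
    sti: fraction (not percent) of bad scenarios averted
    sot: assumed here to be the community size when AI is developed
    """
    return txr * sti / (2 * sot)


def slope_from_origin(txr: float, sot: float) -> float:
    """Alternative coefficient (1 - TXR) / SOT.

    Assumes 1 - P(x-risk) grows linearly from 0 at zero researchers
    to 1 - TXR at SOT researchers.
    """
    return (1 - txr) / sot


# Hypothetical inputs, chosen only to show how far apart the two slopes can be.
txr, sti, sot = 0.1, 0.2, 1000
print("from doubling question:", slope_from_doubling(txr, sti, sot))
print("from straight line through origin:", slope_from_origin(txr, sot))
```

With these illustrative numbers the second slope is roughly 90 times the first, which is one way of seeing that a single straight line cannot fit both the region near zero researchers and the region near SOT.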

A clear problem with this model is that AFAICT, it assumes that (i) the size of the research community working on safety when AI is developed is independent of (ii) the degree to which adding a researcher now will change the total number of researchers.

Both (i) and (ii) can vary by orders of magnitude, at least on my model, but are very correlated, because they depend on timelines. This means I get an oddly high chance of averting existential risk. If the questions were combined into “by what fraction will the community be enlarged by adding an extra person”, then I think my chance of averting existential risk would come out much lower.

Yes, I think this is a significant concern with this version of the model (somewhat less so with the original cruder version using something like medians, but that version also fails to pick up on legitimate effects of “what if these variables are all in the tails”). Combining the variables as you suggest is the easiest way to patch it. More complex would be to add in explicit time-dependency.
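The effect discussed in this exchange can be illustrated with a small simulation. The sketch below is not the Guesstimate model itself. It assumes the marginal impact scales with the ratio (ii)/(i), that both quantities are log-normal, and that they are positively correlated through timelines. All parameter values and function names are hypothetical.

```python
import math
import random


def exact_mean_ratio(sigma_d: float, sigma_n: float, rho: float) -> float:
    """Exact mean of D/N when (log D, log N) is bivariate normal with zero means.

    log(D/N) is normal with variance sigma_d^2 + sigma_n^2 - 2*rho*sigma_d*sigma_n,
    and the mean of a log-normal is exp(mu + variance / 2).
    """
    var = sigma_d**2 + sigma_n**2 - 2 * rho * sigma_d * sigma_n
    return math.exp(var / 2)


def simulated_mean_ratio(sigma_d: float, sigma_n: float, rho: float,
                         n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean, as a Guesstimate-style model would compute it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        log_n = sigma_n * z1  # (i): log community size
        # (ii): log of added researchers, correlated with log_n via rho
        log_d = sigma_d * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        total += math.exp(log_d - log_n)
    return total / n_samples


# Hypothetical spreads: each input varies by about one natural-log unit.
independent = exact_mean_ratio(1.0, 1.0, rho=0.0)
correlated = exact_mean_ratio(1.0, 1.0, rho=0.9)
print("mean ratio, inputs treated as independent:", independent)
print("mean ratio, inputs strongly correlated:", correlated)
```

Under these assumptions, treating the inputs as independent inflates the expected ratio, because the tails where (ii) is large and (i) is small get more weight than they would if the two moved together. Estimating the ratio directly as a single input, as suggested above, sidesteps the problem.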

When I visit the model page I see errors about improper syntax. (I assume this is because it’s publicly editable and someone accidentally messed up the syntax?)