Sjlver comments on Preventing an AI-related catastrophe—Problem profile

Sjlver 5 Sep 2022 19:11 UTC
Thanks for pointing out these two places!

> You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign.

Work on AI drives AI risk. This is not equally true of all AI work, but the overall correlation is clear. There are good arguments that AI will not be aligned by default, and that current methods can produce bad outcomes if naively scaled up. These are cited in your problem profile. With that in mind, I would not say that I'm confident that AI work is net-negative… but the risk of negative outcomes is too large for me to feel comfortable.

> It seems hard to conclude that the counterfactual where any one or more of "no work on AI safety / no interpretability work / no robustness work / no forecasting work" were true is in fact a world with less x-risk from AI overall.

A world with more interpretability / robustness work is a world where powerful AI arrives faster (maybe good, maybe bad, certainly risky). I am echoing section 2 of the problem profile, which argues that the sheer speed of AI advances is cause for concern. Moreover, because interpretability and robustness work advances AI, traditional AI companies are likely to pursue such work even without an 80,000 Hours problem profile. This could be an opportunity for 80,000 Hours to direct people toward work that is even more central to safety.

As you say, these are currently just intuitions, not concretely evaluated claims. It’s completely OK if you don’t put much weight on them. Nevertheless, I think these are real concerns shared by others (e.g. Alexander Berger, Michael Nielsen, Kerry Vaughan), and I would appreciate a brief discussion, FAQ entry, or similar in the problem profile.

And now I’ll stop bothering you :) Thanks for having written the problem profile. It’s really nice work overall.