> I think these are all great points! We should definitely worry about negative effects of work intended to do good.
>
> That said, here are two other places where maybe we have differing intuitions:
>
> You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign.
>
> It seems hard to conclude that the counterfactual where any one or more of “no work on AI safety / no interpretability work / no robustness work / no forecasting work” were true is in fact a world with less x-risk from AI overall. That is, while I can see there are potential negative effects of these things, when I truly try to imagine the counterfactual, the overall impact seems likely positive to me.
>
> Of course, intuitions like these are much less concrete than actually trying to evaluate the claims, and I agree it seems extremely important for people evaluating or doing anything in AI safety to ensure they’re doing positive work overall.
Thanks for pointing out these two places!

> You seem much more confident than I am that work on AI that is unrelated to AI safety is in fact negative in sign.
Work on AI drives AI risk. This is not equally true of all AI work, but the overall correlation is clear. There are good arguments that AI will not be aligned by default, and that current methods can produce bad outcomes if naively scaled up; both are cited in your problem profile. With that in mind, I would not say that I’m confident that AI work is net-negative… but the risk of negative outcomes is too large for me to feel comfortable.
> It seems hard to conclude that the counterfactual where any one or more of “no work on AI safety / no interpretability work / no robustness work / no forecasting work” were true is in fact a world with less x-risk from AI overall.
A world with more interpretability / robustness work is a world where powerful AI arrives faster (maybe good, maybe bad, certainly risky). I am echoing section 2 of the problem profile, which argues that the sheer speed of AI advances is cause for concern. Moreover, because interpretability and robustness work advances AI, traditional AI companies are likely to pursue such work even without an 80000hours problem profile. This could be an opportunity for 80000hours to direct people to work that is even more central to safety.
As you say, these are currently just intuitions, not concretely evaluated claims. It’s completely OK if you don’t put much weight on them. Nevertheless, I think these are real concerns shared by others (e.g. Alexander Berger, Michael Nielsen, Kerry Vaughan), and I would appreciate a brief discussion, FAQ entry, or similar in the problem profile.
And now I’ll stop bothering you :) Thanks for having written the problem profile. It’s really nice work overall.