It’s happened a few times at our local meetup (South Bay EA) that we get someone new who says something like “okay I’m a fairly good ML student who wants to decide on a research direction for AI Safety.” In the past we’ve given fairly generic advice like “listen to this 80k podcast on AI Safety” or “apply to AIRCS”. One of our attendees went on to join OpenAI’s safety team after this advice, and gave us some attribution for it. While this probably makes folks a little better off, it feels like we could do better for them.
If you had to give someone more concrete object-level advice on how to get started in AI safety, what would you tell them?
I’m a fairly good ML student who wants to decide on a research direction for AI Safety.
I’m not actually sure whether it’s a good idea for ML students to try to work on AI safety. I am pretty skeptical of most of the research done by pretty good ML students who try to make their research relevant to AI safety. It usually feels to me like their work ends up not contributing to one of the core difficulties, and I think that they might have been better off if they’d instead spent their effort trying to become really good at ML, with the goal of working on AI safety later.
I don’t have much better advice for how to get started on AI safety; I think the “recommend applying to AIRCS, and point at 80K and maybe the Alignment Newsletter” path is pretty reasonable.
I think that they might have been better off if they’d instead spent their effort trying to become really good at ML, with the goal of working on AI safety later.
I’m broadly sympathetic to this, but I also want to note that there are some research directions in mainstream ML which do seem significantly more valuable than average. For example, I’m pretty excited about people getting really good at interpretability, so that they have an intuitive understanding of what’s actually going on inside our models (particularly RL agents), even if they have no specific plans about how to apply this to safety.