Hi, I’m Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don’t initially know what the user wants.
I’m particularly interested in big-picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.
It is more like this stronger claim.
I might not use “inherently” here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can take over. The “natural” test is to run the AI until it has enough power to easily take over, at which point you observe whether it takes over, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short-horizon.