The evaluations project at the Alignment Research Center is looking to hire a generalist technical researcher and a webdev-focused engineer. We’re a new team at ARC building capability evaluations (and in the future, alignment evaluations) for advanced ML models. The goals of the project are to improve our understanding of what alignment danger is going to look like, understand how far away we are from dangerous AI, and create metrics that labs can make commitments around (e.g. ‘If you hit capability threshold X, don’t train a larger model until you’ve hit alignment threshold Y’). We’re also still hiring for model interaction contractors, and we may be taking SERI MATS fellows.