I agree with this post. I’ve been reading many more papers since first entering this field because I’ve become increasingly convinced of the value of treating alignment as an engineering problem and pulling insights from the literature. I’ve also been thinking more about how to update the classic Yudkowsky and Bostrom alignment arguments for the current paradigm. In this respect, I applaud Quintin Pope for his work.
This week, I will send off a grant proposal to continue my work in alignment. I’d be grateful if you could look at the proposal and offer some critique; an outside view like yours would be valuable feedback.
Current short summary: “This project comprises two main interrelated components: accelerating AI alignment research by integrating large language models (LLMs) into a research system, and conducting direct work on alignment with a focus on interpretability and steering the training process towards aligned AI. The ‘accelerating alignment’ agenda aims to impact both conceptual and empirical aspects of alignment research, with the ambitious long-term goal of providing a massive speed-up and unlocking breakthroughs in the field. The project also includes work in interpretability (using LLMs for interpreting models; auto-interpretability), understanding agency in the current deep learning paradigm, and designing a robustly aligned training process. The tools built will integrate seamlessly into the larger alignment ecosystem. The project serves as a testing ground for potentially building an organization focused on using language models to accelerate alignment work.”
Please send me a DM if you’d like to give feedback!