Thanks, this is really useful. (I will try to go through this course as well.)
I’m not sure the talk has it quite right, though. My take is that, on the most popular definitions of alignment and capabilities, the two are partly the same concept, depending on which intentions we are meant to be aligning with. So it’s not the case that there is an ‘alignment externality’ of a capabilities improvement, but rather that some alignment improvements are capabilities improvements, by definition.
Dan Hendrycks’ lecture on “Safety-Capabilities Balance” might be helpful here.