I see no meaningful path to impact on safety working as an AI lab researcher
This is a very strong statement. I’m not following technical alignment research that closely, but my general sense is that exciting work is being done. I just wrote this comment advertising a line of research which strikes me as particularly promising.
I noticed the other day that the people who are particularly grim about AI alignment also don’t seem to be engaging much with contemporary technical alignment research. That missing intersection seems suspicious. I’m interested in any counterexamples that come to mind.
My subjective sense is there’s a good chance we lose because all the necessary insights to build aligned AI were lying around but just didn’t get sufficiently developed or implemented. This seems especially true for techniques like gradient routing, which would need to be baked into a big, expensive training run (a rough sketch of the idea is below).
(I’m also interested in arguments for why unlearning won’t work. I’ve thought about this a fair amount, and it seems to me that sufficiently good unlearning kind of just one-shots AI safety, as elaborated in the comment I linked.)
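To make the “baked in” point concrete, here’s a rough toy sketch of the core move in gradient routing: mask gradients per batch so that updates from designated data only ever touch a designated slice of parameters. Everything in this sketch (the tiny model, the chosen parameter region, the routing rule) is invented for illustration and is not the actual setup from the paper; it’s only meant to show why the intervention has to happen during training rather than being bolted on afterward.

```python
# Illustrative sketch of gradient-routing-style training: updates from
# "flagged" data are confined to a designated slice of parameters by
# zeroing gradients outside that slice before the optimizer step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Designate the first layer as the region that flagged data may update.
flagged_region = {id(p) for p in model[0].parameters()}

def train_step(x, y, is_flagged: bool):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    for p in model.parameters():
        if p.grad is None:
            continue
        # Route: flagged batches only update the designated region;
        # ordinary batches only update everything else.
        in_region = id(p) in flagged_region
        if is_flagged != in_region:
            p.grad.zero_()
    opt.step()
    return loss.item()

# Random data standing in for flagged / ordinary batches.
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
train_step(x, y, is_flagged=True)   # gradients confined to model[0]
train_step(x, y, is_flagged=False)  # gradients confined to the rest
```

Because the routing decision is applied at every optimizer step, you can’t retrofit it onto a model that has already been trained; it has to be part of the original run.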
Here’s our crux:
For both theoretical and empirical reasons, I would assign a probability as low as 5% to there being alignment insights just lying around that could protect us at the superintelligence capability level and that don’t require us to slow or stop development to implement in time.
I don’t see a lot of technical safety people engaging in advocacy, either? It’s not like they tried advocacy first and then decided on technical safety. Maybe you should question their epistemology.
My impression is that so far most of the impactful “public advocacy” work has been done by “technical safety” people. Some notable examples include Yoshua Bengio, Dan Hendrycks, Ian Hogarth, and Geoffrey Hinton.
Yeah good point. I thought Ebenezer was referring to more run-of-the-mill community members.