I think you make an important point that I'm inclined to agree with.
Most of the discourse, theories, intuitions, and thought experiments about AI alignment were formed either before the popularization of deep learning (which started circa 2012) or before the people talking and writing about AI alignment started really caring about deep learning.
In or around 2017, I had an exchange with Eliezer Yudkowsky in an EA-related or AI-related Facebook group where he said he didn't think deep learning would lead to AGI and thought symbolic AI would instead. Clearly, at some point since then, he changed his mind.
For example, in his 2023 TED Talk, he said he thinks deep learning is on the cusp of producing AGI. (That wasn't the first time, but it was a notable instance, and one where he was especially clear about what he thought.)
I haven't been able to find anywhere where Eliezer talks about changing his mind or explains why he did. It would probably be helpful if he did.
All the pre-deep learning (or pre-caring-about-deep-learning) ideas about alignment have been carried into the ChatGPT era. I've seen a little bit of discourse about this, but only a little. It seems strange that ideas about AI itself would change so much over the last 13 years while ideas about alignment would apparently change so little.
If there are good reasons why those older ideas about alignment should still apply to deep learning-based systems, I haven't seen much discussion about that, either. You would think there would be more discussion.
My hunch is that AI alignment theory could probably benefit from starting with a fresh sheet of paper. I suspect there is promise in the approach of starting from scratch in 2025 without trying to build on or continue from older ideas and without trying to be deferential toward older work.
I suspect there would also be benefit in getting out of the EA/Alignment Forum/LessWrong/rationalist bubble.
I agree with the "fresh sheet of paper." Reading the alignment faking paper and the current alignment challenges has been way more informative than reading Yudkowsky.
I think these circles have granted him too many Bayes points for predicting the alignment problem when the technical details of his alignment problems basically don't apply to deep learning, as you said.