Thank you for explaining more. In that case, I can understand why you’d want to spend more time thinking about AI safety.
I suspect that much of the reason that “understanding the argument is so hard” is because there isn’t a definitive argument—just a collection of fuzzy arguments and intuitions. The intuitions seem very, well, intuitive to many people, and so they become convinced. But if you don’t share these intuitions, then hearing about them doesn’t convince you. I also have an (academic) ML background, and I personally find some topics (like mesa-optimization) to be incredibly difficult to reason about.
I think that generating more concrete arguments and objections would be very useful for the field, and I encourage you to write up any thoughts that you have in that direction!
(Also, a minor disclaimer that I suppose I should have included earlier: I provided technical feedback on a draft of TAP, and much of the “AGI safety” section focuses on my team’s work. I still think that it’s a good concrete introduction to the field, because of how specific and well-cited it is, but I also am probably somewhat biased.)