I’d mainly point to relatively introductory / high-level resources like the “Alignment research field guide” and “Risks from learned optimization,” if you haven’t read them. I’m more confident in the relevance of the methodology and problem statements than of existing attempts to make inroads on the problem.
There’s a lot of good high-level content on Arbital (https://arbital.com/explore/ai_alignment/), but it’s not very organized and a decent amount of it is in draft form.