Wonderful! This will make me feel (slightly) less stupid for asking very basic stuff. I actually had 3 or so in mind, so I might write a couple of comments.
Most pressing: what is the consensus on the tractability of the alignment problem? Have there been any promising signs of progress? I’ve mostly heard Yudkowsky portray the situation in terms so bleak that, even if one were to accept his arguments, the best thing to do would be nothing at all and just enjoy life while it lasts.
I’d say alignment research is not going very well! There have been successes in areas that help products get to market (e.g. RLHF) and on problems of academic interest that leave key problems unsolved (e.g. adversarial robustness), but there are several “core problems” that have not seen much progress over the years.
A good overview of this topic: https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/