I have a very uninformed view on the relative Alignment and Capabilities contributions of things like RLHF. My intuition is that RLHF is net positive for alignment, but I'm almost entirely uninformed on that. If anyone's written a summary on where they think these grey-area research areas lie, I'd be interested to read it. Scott's recent post was not a bad entry into the genre, but it obviously just worked at a very high level.