So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it were possible to find techniques I was confident in without more theoretical progress. Our current plan is to collaborate with theory researchers elsewhere.
In this comment, I mentioned the simple model of “labs align their AGI if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.” The kind of applied alignment work we’re doing is targeted at reducing the cost of using these techniques, rather than at increasing the pressure. We’re hoping to make it cheaper and easier for capabilities labs to apply alignment techniques that they’re already fairly motivated to use, e.g. by ensuring that these techniques have been tried out in miniature, so that labs feel pretty optimistic that the practical kinks have been worked out and there are people who have implemented the techniques before who can help them.
Organizations grow and change over time, and I wouldn’t be shocked to hear that Redwood eventually ended up engaging in various kinds of efforts to get capabilities labs to put more work into alignment. We don’t currently have plans to do so.
Do you hope for your techniques to be useful enough to AGI research that labs adopt them anyway?
That would be great, and seems plausible.
Do you want to heavily evangelize your techniques in publications/the press/etc.?
I don’t imagine wanting to heavily evangelize techniques in the press. I think that getting prominent publications about alignment research is probably useful.