What might be an example of a “much better weird, theory-motivated alignment research” project, as mentioned in your intro doc? (It might be hard to say at this point, but perhaps you could point to something in that direction?)
I think the best examples would be attempts to practically implement schemes that seem theoretically doable and potentially helpful, but are quite complicated to carry out in practice; for example, imitative generalization or the two-head proposal here. I can imagine it might be quite hard to get industry labs to put in the effort of getting imitative generalization working in practice, so doing that work (which labs could perhaps then adopt) might have a lot of impact.