Yepp, that seems right. I do think this is a risk, but I also think it’s often overplayed in EA spaces. E.g. I’ve recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What’s the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it’s not actually clear that this by itself should make a difference, because even if they’re pessimistic they should “play to their outs” (i.e. focus effort on the scenarios where success is still possible), and “interpretability becomes much better” seems like one of the main ways that pessimists could turn out to be wrong.
The main case I see for being so concerned about capability infohazards as to stop interpretability research is if you’re pessimistic about alignment but optimistic about governance. But I think that governance will still rely on e.g. a deep understanding of the systems involved. I’m pretty skeptical about strategies which only work if everything is shut down (and Scenario 2 is one attempt to gesture at why).