Hey Jan, thanks for your comment.
I published this post on LessWrong as well, and someone made the exact same point as you. However, their tone was unproductive and condescending, and it was clear they weren’t trying to converse. It’s good to know there’s an alternative platform where people actually want to have constructive discussions.
I’m aware of this possibility. I was aware of it even before writing the post; it was one item on the list of potential issues I noted. I have ideas on how to navigate it, and it may be the subject of a subsequent post.
Great, I would be keen to read your next post! Especially because I think that the ability of attackers to remove many kinds of safeguards is a fundamental challenge in open-source safety.