Evan Hubinger (he/him/his) (evanjhub@gmail.com)
Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.
Previously: MIRI, OpenAI
See: “Why I’m joining Anthropic”
Selected work:
I don’t think this and didn’t say it. If you have any quotes from the post that you think say this, I’d be happy to edit it to be more clear, but from my perspective it feels like you’re inventing a straw man to be mad at rather than actually engaging with what I said.
I think that, for the most part, you should be drawing your ethical boundaries in a way that is logically prior to learning about these sorts of facts. Otherwise it’s very hard to cooperate with you, for example.
It isn’t intended as a rhetorical question. I am being quite sincere there, though rereading it, I see how you could be confused. I just edited that section to the following: