Hmm, I think if smart EA/Rat types get “corrupted” in general, they’ll present as thoughtful people with reasons that are hard to dismiss quickly when questioned by EAs. I get the vague sense that your evidence bar for “corruption” is going to be too high to be useful in most worlds where there’s a lot of corruption.
(that’s not to say that EAs/Rats/etc. who join labs/start wildly profitable companies speeding up AI progress have been “corrupted”; I just think that if they were, it would present pretty similarly to how it has, and it’s hard to get lots of easy-to-share evidence)
I don’t disagree, it’s more that this feels a bit like privileging the hypothesis? I think the modal reason I’ve heard from people who did capabilities work and now regret it is something like “I knew I was misaligned with leadership but I thought leaving would be even worse.”
If, for some reason, Anthropic asked me how to prevent people from regretting working for them, I would focus much more on “have a thing for people to do once they realize their colleague is corrupt” instead of “have a more nuanced way of telling if their colleague is corrupt.”