Huh. If I had a bright idea for AI Safety, I'd share it and expect to get status/credit for doing so.
The idea of hiding any bright alignment research ideas I came up with didn't occur to me.
I'm under the impression that, because of common-sense morals (i.e. I wouldn't deliberately sabotage things for the chance to play hero), selfishly motivated EAs like me don't behave particularly differently in common scenarios.
There are scenarios where my selfishness would show, but they're very, very narrow states of the world and unlikely to materialise (highly contrived and confined to thought-experiment land). In the real world, I don't expect it to be relevant. Ditto for concerns about superrational behaviour: the kind of superrational coordination that's possible for purely altruistically motivated EAs but isn't possible with me is behaviour I don't expect to actually manifest in the real world.
Yeah, the example above of choosing not to get promoted or not to receive funding is a more realistic scenario.
I agree these situations are somewhat rare in practice.
Re. AI Safety, my point was that these situations are especially rare there (among people who agree it's a problem, which is a matter of states of knowledge anyway, not of goals).
Thanks for this post, I think it's a good discussion.