Epistemic status: 2am ramble.
It’s about trust, although it definitely varies in importance from situation to situation. There’s very strong trust between people who have strong shared knowledge that they are all utilitarian. Establishing that is where the “purity tests” get their value.
Here’s a little example.
Let’s say you had some private information about a problem or solution that the EA community hadn’t yet worked on, and the following choice:
A) Reveal it to the community, with near certainty that the problem will be solved at least as well as if you solved it yourself (you might still be the one to solve it), but only a little recognition for being the person who started the thread of investigation.
B) Work on or think about the solution yourself for some time first, which gives you a significantly higher likelihood of getting credit for the solution, with few or no personal repercussions.
(B) is strictly worse than (A) from a utilitarian perspective.
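Here is a minimal sketch of that trade-off with invented numbers (none of them come from the example itself): the probabilities and values are hypothetical, but they show how option A can maximise expected impartial impact while option B maximises expected personal credit.

```python
# Hypothetical expected-value sketch of the A/B choice above.
# All numbers are invented purely for illustration.

options = {
    "A: share with the community": {"p_solved": 0.95, "p_you_get_credit": 0.10},
    "B: work on it alone first":   {"p_solved": 0.70, "p_you_get_credit": 0.60},
}

IMPACT_IF_SOLVED = 100.0  # impartial (utilitarian) value of a solution, arbitrary units
VALUE_OF_CREDIT = 5.0     # personal value of the recognition, arbitrary units

for name, o in options.items():
    expected_impact = o["p_solved"] * IMPACT_IF_SOLVED          # what a utilitarian cares about
    expected_credit = o["p_you_get_credit"] * VALUE_OF_CREDIT   # what self-interest cares about
    print(f"{name}: expected impact = {expected_impact:.0f}, "
          f"expected personal credit = {expected_credit:.1f}")
```

Under any numbers of this shape, the impartial ranking and the personal ranking come apart, which is what the example is trying to show.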
Which would you do? In almost all industries/communities/whatever, people do B. Many EAs (me included) like to imagine we can be a community where people do A, even though it is personally bad to do A. There’s a lot of kinda-decision-theory-y stuff that becomes possible between people who know the others will take (A)-like options.
For X-risk reduction (well, direct work at least), it’s much less important than in other EA areas, because there aren’t as many ways for these situations to come up: anyone who groks (near-term) X-risk knows it’s in their own greater interest to increase progress on the solution rather than to receive the recognition.
For other areas, though, personal and altruistic interests aren’t aligned, so these situations are gonna come up.
I personally wouldn’t call anyone “bad”; it’s an unhealthy way to think. I prefer that people be honest about their motivations, and big respect to you for doing so.
I agree that high-trust networks are valuable (and therefore important to build or preserve). However, I think that trustworthiness is quite disconnected from how people think of their life goals (whether they’re utilitarian/altruistic or self-oriented). Instead, I think the way to build high-trust networks is by getting to know people well and paying attention to the specifics.
For instance, we can envision “selfish” people who are nice to others, but also utilitarians who want to sabotage others over TAI timeline disagreements or disagreements about population ethics. Similarly, we can envision “selfish” people who are transparent about their motivations, aware of their weaknesses, etc., but also utilitarians who are deluded. (E.g., a utilitarian may keep a project idea secret because it doesn’t even occur to them that others might be a better fit – they may think they excel at everything, and lack trust in others / not want them to have influence.)
I think it’s bad to have social norms that punish people who admit they have self-oriented goals. I think this implicitly reinforces a culture where claiming to be fully utilitarian gives you a trustworthiness benefit – but that’s the type of thing that “bad actors” would exploit.
Huh. If I had a bright idea for AI Safety, I’d share it and expect to get status/credit for doing so.
The idea of hiding any bright alignment research ideas I came up with didn’t occur to me.
I’m under the impression that, because of common-sense morals (e.g. I wouldn’t deliberately sabotage a project to get the chance to play hero), selfishly motivated EAs like me don’t behave particularly differently in common scenarios.
There are scenarios where my selfishness would be highlighted, but they’re very, very narrow and unlikely to materialise in the real world (highly contrived, thought-experiment-land stuff), so in practice I don’t expect it to be relevant. Ditto for concerns about superrational behaviour: the kind of superrational coordination that’s possible between purely altruistically motivated EAs but isn’t possible with me is behaviour I don’t expect to actually manifest in the real world.
Yeah, the example above of choosing not to get promoted or not to receive funding is a more realistic scenario.
I agree these situations are somewhat rare in practice.
Re: AI Safety, my point was that these situations are especially rare there (among people who agree it’s a problem, which is about states of knowledge anyway, not about goals).
Thanks for this post; I think it’s a good discussion.