Thanks!
I haven’t engaged much with the psychodynamic literature, or only indirectly (some therapy modalities like CFT or ST are quite eclectic and thus reference various psychodynamic concepts), but perhaps @Clare_Diane has. Is there any specific construct, paper/book, or test that you have in mind here?
I’m not familiar with the SWAP but it looks very interesting (though Clare may know it), thanks for mentioning it! As you most likely know, there even exists a National Security Edition developed in collaboration with the US government.
David_Althaus
I just realized that in this (old) 80k podcast episode[1], Holden makes similar points and argues that aligned AI could be bad.
My sense is that Holden alludes to both malevolence (“really bad values, [...] we shouldn’t assume that person is going to end up being nice”) and ideological fanaticism (“create minds that [...] stick to those beliefs and try to shape the world around those beliefs”, [...] “This is the religion I follow. This is what I believe in. [...] And I am creating an AI to help me promote that religion, not to help me question it or revise it or make it better.”).
Longer quotes below (emphasis added):

Holden: “The other part — if we do align the AI, we’re fine — I disagree with much more strongly. [...] if you just assume that you have a world of very capable AIs, that are doing exactly what humans want them to do, that’s very scary. [...]
Certainly, there’s the fact that because of the speed at which things move, you could end up with whoever kind of leads the way on AI, or is least cautious, having a lot of power — and that could be someone really bad. And I don’t think we should assume that just because that if you had some head of state that has really bad values, I don’t think we should assume that that person is going to end up being nice after they become wealthy, or powerful, or transhuman, or mind uploaded, or whatever — I don’t think there’s really any reason to think we should assume that.
And then I think there’s just a bunch of other things that, if things are moving fast, we could end up in a really bad state. Like, are we going to come up with decent frameworks for making sure that the digital minds are not mistreated? Are we going to come up with decent frameworks for how to ensure that as we get the ability to create whatever minds we want, we’re using that to create minds that help us seek the truth, instead of create minds that have whatever beliefs we want them to have, stick to those beliefs and try to shape the world around those beliefs? I think Carl Shulman put it as, “Are we going to have AI that makes us wiser or more powerfully insane?”
[...] I think even if we threw out the misalignment problem, we’d have a lot of work to do — and I think a lot of these issues are actually not getting enough attention.”
Rob Wiblin: Yeah. I think something that might be going on there is a bit of equivocation in the word “alignment.” You can imagine some people might mean by “creating an aligned AI,” it’s like an AI that goes and does what you tell it to — like a good employee or something. Whereas other people mean that it’s following the correct ideal values and behaviours, and is going to work to generate the best outcome. And these are really quite separate things, very far apart.
Holden Karnofsky: Yeah. Well, the second one, I just don’t even know if that’s a thing. I don’t even really know what it’s supposed to do. I mean, there’s something a little bit in between, which is like, you can have an AI that you ask it to do something, and it does what you would have told it to do if you had been more informed, and if you knew everything it knows. That’s the central idea of alignment that I tend to think of, but I think that still has all the problems I’m talking about. Just some humans seriously do intend to do things that are really nasty, and seriously do not intend — in any way, even if they knew more — to make the world as nice as we would like it to be.
And some humans really do intend and really do mean and really will want to say, you know, “Right now, I have these values” — let’s say, “This is the religion I follow. This is what I believe in. This is what I care about. And I am creating an AI to help me promote that religion, not to help me question it or revise it or make it better.” So yeah, I think that middle one does not make it safe. There might be some extreme versions, like, an AI that just figures out what’s objectively best for the world and does that or something. I’m just like, I don’t know why we would think that would even be a thing to aim for. That’s not the alignment problem that I’m interested in having solved.
- ^
I’m one of those bad EAs who don’t listen to all 80k episodes as soon as they come out.
- ^
Thanks Mike. I agree that the alliance is fortunately rather loose in the sense that most of these countries share no ideology. (In fact, some of them should arguably be ideological enemies, e.g., Islamic theocrats in Iran and Maoist communists in China).
But I worry that this alliance is held together by hatred (or ressentiment more generally) of Western secular democratic principles, for ideological and (geo-)political reasons. Hatred can be an extremely powerful and unifying force. (Many political/ideological movements are arguably primarily defined, united, and motivated by what they hate, e.g., Nazism by the hatred of Jews, communism by the hatred of capitalists, racism by the hatred of other ethnicities, Democrats by hatred of Trump and racists, Republicans by hatred of the woke and communists, etc.)
So I worry that as long as Western democracies continue to influence international affairs, this alliance will continue to exist. And I certainly hope that Western democracies will remain powerful, and worry that the world (and the future) will become a worse place if they don’t.
Another disagreement may relate to tractability, i.e., how easy it is to contribute:
For example, we mentioned above that the three ways totalitarian regimes have been brought down in the past are through war, resistance movements, and the deaths of dictators. Most of the people reading this article probably aren’t in a position to influence any of those forces (and even if they could, it would be seriously risky to do so, to say the least!).
Most EAs may not be able to work directly on these topics, but there are various options that allow you to contribute indirectly:
- working in (foreign) policy or politics, or working on financial reforms that make money laundering harder for autocratic states like Russia (again, cf. Autocracy Inc.)
- becoming a journalist and writing about such topics (e.g., doing investigative journalism on the corruption in autocratic regimes), generally moving the discussion towards more important topics and away from currently trendy but less important topics
- working at think tanks that protect democratic institutions (Stephen Clare lists several)
- working on AI governance (e.g., info sec, export controls) to prevent autocratic regimes from gaining access to advanced AI (again, Stephen Clare already lists this area)
- probably several more career paths that we haven’t thought of
In general, it doesn’t seem harder to have an impactful career in this area than in, say, AI risk. Depending on your background and skills, it may even be a lot easier; e.g., in order to do valuable work on AI policy, you often need to understand both policy/politics and technical fields like computer science & machine learning. Of course, the area is arguably more crowded (though AI is becoming more crowded every day).
I just read Stephen Clare’s excellent 80k article about the risks of stable totalitarianism.
I’ve been interested in this area for some time (though my focus is somewhat different) and I’m really glad more people are working on this.
In the article, Stephen puts the probability that a totalitarian regime will control the world indefinitely at about 1 in 30,000. My probability on a totalitarian regime controlling a non-trivial fraction of humanity’s future is considerably higher (though I haven’t thought much about this).

One point of disagreement may be the following. Stephen writes:
There’s also the fact that the rise of a stable totalitarian superpower would be bad for everyone else in the world. That means that most other countries are strongly incentivized to work against this problem.
This is not clear to me. Stephen most likely understands the relevant topics far better than I do, but I worry that autocratic regimes often cooperate. This has happened historically—e.g., Nazi Germany, fascist Italy, and Imperial Japan—and also seems to be happening today. My sense is that Russia, China, Venezuela, Iran, and North Korea have formed some type of loose alliance, at least to some extent (see also Anne Applebaum’s Autocracy Inc.). Perhaps this doesn’t apply to strictly totalitarian regimes (though it did apply to Germany, Italy, and Japan in the 1940s).
Autocratic regimes control a non-trivial fraction (perhaps 20-25%?) of world GDP. A naive extrapolation could thus suggest that some type of coalition of autocratic regimes will control 20-25% of humanity’s future (assuming these regimes won’t reform themselves).
Depending on the offense-defense balance (and depending on how people trade off reducing suffering/injustice against other values such as national sovereignty, non-interference, isolationism, personal costs to themselves, etc.), this arrangement may very well persist.
It’s unclear how much suffering such regimes would create—perhaps fairly little; e.g., in China, ignoring political prisoners, the Uyghurs, etc., most people are probably doing fairly well (though a lot of people in, say, Iran aren’t doing too well; see more below). But it’s not super unlikely that there would exist enormous amounts of suffering.
So, even though I agree that it’s very unlikely that a totalitarian regime will control all or even the majority of humanity’s future, it seems considerably more likely to me (perhaps even more than 1%) that a totalitarian regime—or a regime that follows some type of fanatical ideology—will control a non-trivial fraction of the universe and cause astronomical amounts of suffering indefinitely. (E.g., religious fanatics often have extremely retributive tendencies and may value the suffering of dissidents or non-believers. In a pilot, I found that 22% of religious participants at least tentatively agreed with the statement “if hell didn’t exist, we should create hell in order to punish all the sinners”. Senior officials in Iran have ordered the rape of female prisoners so that they would end up in hell, or at least be prevented from going to heaven (IHRDC, 2011; IranWire, 2023). One might argue that religious fanatics (with access to AGI) will surely change their irrational beliefs once it’s clear they are wrong. Maybe. But I don’t find it implausible that at least some people (and especially religious or political fanatics) will decide that giving up their beliefs is the greatest possible evil, and choose to use their AGIs to align reality with their beliefs, rather than vice versa.)
To be clear, all of this is much more important from a s-risk focused perspective than from an upside-focused perspective.
Thanks for this[1], I’ve been interested in this area for some time as well.
Two organizations / researchers in this area that I’d like to highlight (and get others’ views on) are Protect Democracy (the executive director is actually a GiveDirectly donor) and Lee Drutman—see e.g. his 2020 book Breaking the Two-Party Doom Loop: The Case for Multiparty Democracy in America. For a shorter summary, see Drutman’s Vox piece (though Drutman has since become less enthusiastic about ranked choice voting and more excited about fusion voting).
I’d be excited for someone to write up a really high-quality report on how to best reduce polarization / political dysfunction / democratic backsliding in the US and identify promising grants in this area (if anyone is interested, feel free to contact me as I’m potentially interested in making grants in this area (though I cannot promise anything, obviously)).
- ^
ETA (July 25th). Only managed to fully read the post now. I also think that the post is a little bit too partisan. My sense is that Trump and his supporters are clearly the main threat to US democracy and much worse than the Democrats/left. However, the Democrats/left also have some radicals, and some (parts of) cultural and elite institutions promote illiberal “woke” ideology and extreme identity politics (e.g., DiAngelo’s white fragility) that give fuel to Trump and his base (see e.g. Urban (2023), Hughes (2024), Bowles (2024), or McWhorter (2021)). I wish they would stop doing that. It’s also not helpful to brand everyone who is concerned about illegal immigration and Islam as racist and Islamophobic. I think there are legitimate concerns here (especially regarding radical Islam), and telling people that they are bigoted if they have any concerns will drive some of them towards Trump.
- ^
Thanks.
I guess I agree with the gist of your comment. I’m very worried about extremist / fanatical ideologies, but more on this below.

“because every ideology is dangerous”
I guess it depends on how you define “ideology”. Let’s say “a system of ideas and ideals”. Then it seems evident that some ideologies are less dangerous than others and some seem actually beneficial (e.g., secular humanism, the Enlightenment, or EA). (Arguably, the scientific method itself is an ideology.)
I’d argue that ideologies are dangerous if they are fanatical and extreme. The main characteristics of such fanatical ideologies include dogmatism (extreme irrationality and epistemic & moral certainty), having a dualistic/Manichean worldview that views in-group members as good and everyone who disagrees as irredeemably evil, advocating for the use of violence and unwillingness to compromise, blindly following authoritarian leaders or scriptures (which is necessary since debate, evidence and reason are not allowed), and promising utopia or heaven. Of course, all of this is a continuum. (There is much more that could be said here; I’m working on a post on the subject.)

“The reason why some autocratic rulers were no malevolent such as Marcus Aurelius, Atatürk, and others is because they followed no ideology. [...] Stoicism was a physicalist philosophy, a realist belief system.”
Sounds like an ideology to me but ok. :)
Yes, I think investigative journalism (and especially Kelsey Piper’s work on Altman & OpenAI) is immensely valuable.
In general, I’ve become more pessimistic about technology-centric / “galaxy-brained” interventions in this area and more optimistic about “down-to-earth” interventions like, for example, investigative journalism, encouraging whistleblowing (e.g., setting up prizes or funding legal costs), or perhaps psychoeducation / workshops on how to detect malevolent traits and what to do when this happens (which requires, in part, courage / the ability to endure social conflict and being socially savvy—arguably not something that most EAs excel in).
I’m excited about work in this area.
Somewhat related may also be this recent paper by Costello and colleagues, who found that engaging in a dialogue with GPT-4 durably decreased conspiracy beliefs (HT Lucius).
Perhaps social scientists can help with research on how to best design LLMs to improve people’s epistemics; or to make sure that interacting with LLMs at least doesn’t worsen people’s epistemics.
Great comment.
Will says that, usually, most fraudsters aren’t just “bad apples” or doing “cost-benefit analysis” on their risk of being punished. Rather, they fail to “conceptualise what they’re doing as fraud”.
I agree with your analysis, but I think Will also sets up a false dichotomy. One’s inability to conceptualize or realize that one’s actions are wrong is itself a sign of being a bad apple. To simplify a bit, at one end of the “high integrity to really bad” continuum, you have morally scrupulous people who constantly wonder whether their actions are wrong. At the other end of the continuum, you have pathological narcissists whose self-image/internal monologue is so out of whack with reality that they cannot even conceive of themselves doing anything wrong. That doesn’t make them great people. If anything, it makes them scarier.
Generally, the internal monologue of the most dangerous types of terrible people (think Hitler, Stalin, Mao, etc.) doesn’t go like “I’m so evil and just love to hurt everyone, hahahaha”. My best guess is that, in most cases, it goes more like “I’m the messiah, I’m so great and I’m the only one who can save the world. Everyone who disagrees with me is stupid and/or evil and I have every right to get rid of them.” [1]
Of course, there are people whose internal monologues are more straightforwardly evil/selfish (though even here lots of self-delusion is probably going on) but they usually end up being serial killers or the like, not running countries.
Also, later, when Will talks about bad apples, he mentions that “typical cases of fraud [come] from people who are very successful, actually very well admired”, which again suggests that “bad apples” are not very successful or not very well admired. Well, again, many terrible people were extremely successful and admired. Like, you know, Hitler, Stalin, Mao, etc.

“Nor am I implying that improved governance is not a part of the solution.”
Yep, I agree. In fact, the whole character vs. governance thing seems like another false dichotomy to me. You want to have good governance structures but the people in relevant positions of influence should also know a little bit about how to evaluate character.
- ^
In general, bad character is compatible with genuine moral convictions. Hitler, for example, was vegetarian for moral reasons and “used vivid and gruesome descriptions of animal suffering and slaughter at the dinner table to try to dissuade his colleagues from eating meat”. (Fraudster/bad apple vs. person with genuine convictions is another false dichotomy that people keep setting up.)
- ^
Thanks Anthony!
Regarding 2: I’m totally no expert but it seems to me that there are other ways of influencing the preferences/dispositions of AI—e.g., i) penalizing, say, malevolent or fanatical reasoning/behavior/attitudes (e.g., by telling RLHF raters to specifically look out for such properties and penalize them), or ii) similarly amending the principles and rules of constitutional AI.
Great post, thanks for writing!
I like the idea of trying to shape the “personalities” of AIs.
Is there a reason to only focus on spite here instead of also trying to make AI personalities less malevolent in general? Malevolent/dark traits, at least in humans, often come together and thus arguably constitute a type of personality (also, spitefulness correlates fairly highly with most other dark traits). (Cf. the dark factor of personality.) I guess we don’t fully understand why these traits seem to cluster together in humans but I think we can’t rule out that they will also cluster together in AIs.
Another undesirable (personality? epistemic?) trait or property (in both AIs and humans) that I’m worried about is ideological fanaticism/extremism (see especially footnote 4 of the link for what I mean by that).
My sense is that ideological fanaticism is arguably:
- the opposite of wisdom: terrible epistemics, anti-corrigible.
- very hard to cooperate with (very “fussy” in your terminology): very conflict-seeking, unwilling to compromise, extremely non-pluralistic, arguably scoring very low on “having something to lose” (perhaps partly due to the mistaken belief that history/God is on the fanatics’ side and thus even death is not the end).
- often accompanied by hatred of the outgroup and excessive retributivism (or spite).
It’s unclear if this framing is helpful but I find it interesting that ideological fanaticism seems to encompass most of the undesirable attributes that you outline in this post.[1] So it may be a useful umbrella term for many of the things we don’t want to see in AIs (or the humans controlling AIs).
- ^
Also, it sure seems as though ideological fanaticism was responsible for many historical atrocities and we may worry that the future will resemble the past.
For example, it could hypothetically turn out, just as a brute empirical fact, that the most effective way of aligning AIs is to treat them terribly in some way, e.g. by brainwashing them or subjecting them to painful stimuli.
Yes, agree. (For this and other reasons, I’m supportive of projects like, e.g., NYU MEP.)
I also agree that there are no strong reasons to think that technological progress improves people’s morality.
As you write, my main reason for worrying more about agential s-risks is that the greater the technological power of agents, the more their intrinsic preferences matter for what the universe will look like. To put it differently, actors whose terminal goals put some positive value on suffering (e.g., due to sadism, retributivism, or other weird fanatical beliefs) would deliberately aim to arrange matter in such a way that it contains more suffering—this seems extremely worrisome if they have access to advanced technology.
Altruists would also have a much harder time trading with such actors, whereas purely selfish actors (who don’t put positive value on suffering) could plausibly engage in mutually beneficial trades (e.g., they use (slightly) less efficient AI training/alignment methods which contain much less suffering, and altruists give them some of their resources in return).

“But at the very least, incidental s-risks seem plausibly quite bad in expectation regardless.”
Yeah, despite what I have written above, I probably worry more about incidental s-risks than the average s-risk reducer.
Existential risks from within?
(Unimportant discussion of probably useless and confused terminology.)

I sometimes use terms like “inner existential risks” to refer to risk factors like malevolence and fanaticism. Inner existential risks primarily arise from “within the human heart”—that is, they are primarily related to the values, goals and/or beliefs of (some) humans.
My sense is that most x-risk discourse focuses on outer existential risks, that is, x-risks which primarily arise from outside the human mind. These could be physical or natural processes (asteroids, lethal pathogens) or technological processes that once originated in the human mind but are now out of their control (e.g., AI, nuclear weapons, engineered pandemics).
Of course, most people already believe that the most worrisome existential risks are anthropogenic, that is, caused by humans. One could argue that, say, AI and engineered pandemics are actually inner existential risks because they arose from within the human mind. I agree that the distinction between inner and outer existential risks is not super clear. Still, it seems to me that the distinction between inner and outer existential risks captures something vaguely real and may serve as some kind of intuition pump.
Then there is the related issue of more external or structural risk factors, like political or economic systems. These are systems invented by human minds and which in turn are shaping human minds and values. I will conveniently ignore this further complication.
Other potential terms for inner existential risks could be intraanthropic, idioanthropic, or psychogenic (existential) risks.
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism
(Alternative title: Some bad human values may corrupt a long reflection[1])
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia.
Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
“And now, of course, the question arises: how different, exactly, are human hearts from each other? And in particular: are they sufficiently different that, when they foom, and even “on reflection,” they don’t end up pointing in exactly the same direction? After all, Yudkowsky said, above, that in order for the future to be non-trivially “of worth,” human hearts have to be in the driver’s seat. But even setting aside the insult, here, to the dolphins, bonobos, nearest grabby aliens, and so on – still, that’s only to specify a necessary condition. Presumably, though, it’s not a sufficient condition? Presumably some human hearts would be bad drivers, too? Like, I dunno, Stalin?”
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good, or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driver’s seat.
As of now, I’m most worried about malevolent personality traits and fanatical ideologies.[3]
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we are going to use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults.
See this footnote[4] for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I’m most concerned about malevolence and ideological fanaticism for two reasons.
Deliberately resisting reflection and idealization
First, malevolence—if reflectively endorsed[6]—and fanatical ideologies deliberately resist being changed and would thus plausibly resist idealization even during a long reflection. The most central characteristic of fanatical ideologies is arguably that they explicitly forbid criticism, questioning, and belief change and view doubters and disagreement as evil.
Putting positive value on creating harm
Second, malevolence and ideological fanaticism would not only result in the future not being as good as it possibly could be—they might actively steer the future in bad directions and, for instance, result in astronomical amounts of suffering.
The preferences of malevolent humans (e.g., sadists) may be such that they intrinsically enjoy inflicting suffering on others. Similarly, many fanatical ideologies sympathize with excessive retributivism and often demonize the outgroup. Enabled by future technology, preferences for inflicting suffering on the outgroup may result in enormous disvalue—cf. concentration camps, the Gulag, or hell[7].
In the future, I hope to write more about all of this, especially long-term risks from ideological fanaticism.
Thanks to Pablo and Ruairi for comments and valuable discussions.
- ^
“Human misalignment” is arguably a confusing (and perhaps confused) term. But it sounds more sophisticated than “bad human values”.
- ^
For example, Matthew Barnett in “AI alignment shouldn’t be conflated with AI moral achievement”, Geoffrey Miller in “AI alignment with humans… but with which humans?”, lc in “Aligned AI is dual use technology”. Pablo Stafforini has called this the “third alignment problem”. And of course, Yudkowsky’s concept of CEV is meant to address these issues.
- ^
These factors may not be clearly separable. Some humans may be more attracted to fanatical ideologies due to their psychological traits and malevolent humans are often leading fanatical ideologies. Also, believing and following a fanatical ideology may not be good for your heart.
- ^
Below are some typical characteristics (I’m no expert in this area):
Unquestioning belief, absolute certainty and rigid adherence. The principles and beliefs of the ideology are seen as absolute truth and questioning or critical examination is forbidden.
Inflexibility and refusal to compromise.
Intolerance and hostility towards dissent. Anyone who disagrees or challenges the ideology is seen as evil; as enemies, traitors, or heretics.
Ingroup superiority and outgroup demonization. The in-group is viewed as superior, chosen, or enlightened. The out-group is often demonized and blamed for the world’s problems.
Authoritarianism. Fanatical ideologies often endorse (or even require) a strong, centralized authority to enforce their principles and suppress opposition, potentially culminating in dictatorship or totalitarianism.
Militancy and willingness to use violence.
Utopian vision. Many fanatical ideologies are driven by a vision of a perfect future or afterlife which can only be achieved through strict adherence to the ideology. This utopian vision often justifies extreme measures in the present.
Use of propaganda and censorship.
- ^
For example, Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That actually seems plausible to me; however, I’m not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn’t strong enough to outweigh the selfish benefits of eating meat. However, I’m relatively hopeful that future technology will make such tradeoffs much less costly.
- ^
Some people (how many?) with elevated malevolent traits don’t reflectively endorse their malevolent urges and would change them if they could. However, some of them do reflectively endorse their malevolent preferences and view empathy as weakness.
- ^
Some quotes from famous Christian theologians:
Thomas Aquinas: “the blessed will rejoice in the punishment of the wicked.” “In order that the happiness of the saints may be more delightful to them and that they may render more copious thanks to God for it, they are allowed to see perfectly the sufferings of the damned”.
Samuel Hopkins: “Should the fire of this eternal punishment cease, it would in a great measure obscure the light of heaven, and put an end to a great part of the happiness and glory of the blessed.”
Jonathan Edwards: “The sight of hell torments will exalt the happiness of the saints forever.”
- ^
I’m not aware of quantitative estimates of omnicidal actors. Personally, I’m less interested in omnicidal actors and more interested in actors that would decrease the quality of the long-term future if they had substantial levels of influence. This is partly because the latter category is plausibly much larger (e.g., Hitler, Mao, and Stalin wouldn’t have wanted to destroy the world but would have been bad news regardless).
FWIW, I’ve done a few pilots on how common various “s-risk conducive” values and attitudes are (e.g., extreme retributivism, sadism, wanting to create hell) and may publish these results at some point.
Thanks for this post! I’ve been thinking about similar issues.
One thing that may be worth emphasizing is that there are large and systematic interindividual differences in idea attractiveness—different people or groups probably find different ideas attractive.
For example, for non-EA altruists, the idea of “working in a soup kitchen” is probably much more attractive than it is for EAs because it gives you warm fuzzies (due to direct personal contact with the people you are helping), is not that effortful, and so on. Sure, it’s not very cost-effective, but this is not something that non-EAs take much into account. In contrast, the idea of “earning-to-give” may be extremely unattractive to non-EAs because it might involve working at a job that you don’t feel passionate about, you might be disliked by all your left-leaning friends, and so on. For EAs, the reverse is true (though earning-to-give may still be somewhat unattractive, just not that unattractive).
In fact, in an important sense, one primary reason for starting the EA movement was the realization of schlep blindness in the world at large—certain ideas (earning to give, donate to the global poor or animal charities) were unattractive / uninspiring / weird but seemed to do (much) more good than the attractive ideas (helping locally, becoming a doctor, volunteering at a dog shelter, etc.).
Of course, it’s wise to ask ourselves whether we EAs share certain characteristics that would lead us to find a certain type of idea more attractive than others. As you write in a comment, it’s fair to say that most of us are quite nerdy (interested in science, math, philosophy, intellectual activities) and we might thus be overly attracted to pursue careers that primarily involve such work (e.g., quantitative research, broadly speaking). On the other hand, most people don’t like quantitative research, so you could also argue that quantitative research is neglected! (And that certainly seems to be true sometimes, e.g., GiveWell does great work relating to global poverty.)
EA should prioritise ideas that sound like no fun
I see where you’re coming from but I also think it would be unwise to ignore your passions and personal fit. If you, say, really love math and are good at it, it’s plausible that you should try to use your talents somehow! And if you’re really bad at something and find it incredibly boring and repulsive, that’s a pretty strong reason not to work on it yourself. Of course, we need to be careful not to make general judgments based on these personal considerations (and this plausibly happens sometimes subconsciously), and we need to be mindful of imitating the behavior of high-status folks of our tribe, etc.
We could zoom out even more and ask ourselves whether we as EAs might be drawn to certain worldviews/philosophies more than others are.
What type of worldviews might EAs be attracted to? I can’t speak for others but personally, I think that I’ve been too attracted to worldviews according to which I can have (way) more impact than the average do-gooder. This is probably because I derive much of my meaning and self-worth in life from how much good I believe I do. If I can change the world only a tiny amount even if I try really, really hard and am willing to bite all the bullets, that makes me feel pretty powerless, insignificant, and depressed—there is so much suffering in the world and I can do almost nothing in the big scheme of things? Very sad.
In contrast, “silver-bullet worldviews” according to which I can have super large amounts of impact because I galaxy-brained my way into finding very potent, clever, neglected levers that will change the world-trajectory—that feels pretty good. It makes me feel like I’m doing something useful, like my life has meaning. More cynically, you could say it’s all just vanity and makes me feel special and important. “I’m not like all those other schmucks who feel content with voting every four years and donating every now and then. Those sheeple. I’m helping way more. But no big deal, of course! I’m just that altruistic.”
To be clear, I think something in the middle is probably true. Most likely, you can have more (expected) impact than the average do-gooder if you really try and reflect hard and really optimize for this. But in the (distant) past, following the reasoning of people like Anna Salamon (2010) [to be clear: I respect and like Anna a lot], I thought this might buy you a factor of a million, whereas now I think it might only buy you a factor of 100 or something. As usual, Brian argued for this a long time ago. However, a factor of 100 is still a lot, and most importantly, the absolute good you can do is what ultimately matters, not the relative amount of good—even if you only save one life in your whole life, that really, really matters.
Also, to be clear, I do believe that many interventions in longtermist causes like AI alignment plausibly do (a great deal) more good than most “standard” approaches to doing good. I just think that the difference is considerably smaller than I previously believed, mostly for reasons related to cluelessness.
For me personally, the main take-away is something like this: Because of my desperate desire to have ever-more impact, I’ve stayed on the train to crazy town for too long and was too hesitant to walk a few stops back. The stop I’m now in is still pretty far away from where I started (and many non-EAs would think it uncomfortably far away from normal town) but my best guess is that it’s a good place to be in.
(Lastly, there are also biases against “silver-bullet worldviews”. I’ve been thinking about writing a whole post about this topic at some point.)
Thanks, Geoffrey, great points.
I agree that people should adopt advocacy styles that fit them and that the best tactics depend on the situation. What (arguably) matters most is making good arguments and raising the epistemic quality of (online) discourse. This requires participation and if people want/need to use disagreeable rhetoric in order to do that, I don’t want to stop them!
Admittedly, it’s hypocritical of me to champion kindness while staying on the sidelines and not participating in, say, Twitter discussions. (I appreciate your engagement there!) Reading and responding to countless poor and obnoxious arguments is already challenging enough, even without the additional constraint of always having to be nice and considerate.
Your point about the evolutionary advantages of different personality traits is interesting. However (you obviously know this already), just because some trait or behavior increased inclusive fitness in the EEA doesn’t mean it increases global welfare today. One particularly relevant example may be dark tetrad traits, which actually correlate negatively with Agreeableness (apologies for injecting my hobbyhorse into this discussion :) ).
Generally, it may be important to unpack different notions of being “disagreeable”. For example, this could mean, say, straw-manning or being (passive-)aggressive. These behaviors are often infuriating and detrimental to epistemics so I (usually) don’t like this type of disagreeableness. On the other hand, you could also characterize, say, Stefan Schubert as being “disagreeable”. Well, I’m a big fan of this type of “disagreeableness”! :)
Thanks for writing this post!
You write:
While this is somewhat compelling, this may not be enough to warrant such a restriction of our search area. Many of the actors we should be concerned about, for our work here, might have very low levels of such traits. And features such as spite and unforgivingness might also deserve attention (see Clifton et al. 2022).
I wanted to note that the term ‘malevolence’ wasn’t meant to exclude traits such as spite or unforgivingness. See for example the introduction which explicitly mentions spite (emphasis mine):
This suggests the existence of a general factor of human malevolence[2]: the Dark Factor of Personality (Moshagen et al., 2018)—[...] characterized by egoism, lack of empathy[3] and guilt, Machiavellianism, moral disengagement, narcissism, psychopathy, sadism, and spitefulness.
So to be clear, I encourage others to explore other traits!
Though I’d keep in mind that there exist moderate to large correlations between most of these “bad” traits such that for most new traits we can come up with, there will exist substantial positive correlations with other dark traits we already considered. (In general, I found it helpful to view the various “bad” traits not as completely separate, orthogonal traits that have nothing to do with each other but also as “[...] specific manifestations of a general, basic dispositional behavioral tendency [...] to maximize one’s individual utility— disregarding, accepting, or malevolently provoking disutility for others—, accompanied by beliefs that serve as justifications” (Moshagen et al., 2018).)
Given this, I’m probably more skeptical that there exist many actors who are, say, very spiteful but exhibit no other “dark” traits—but there are probably some!
That being said, I’m also wary of going too far in the direction of “whatever bad trait, it’s all the same, who cares” and losing conceptual clarity and rigor. :)
Thanks for writing this! I also voluntarily reduced my salary for several years (and lived partly off my savings) and had been meaning to write about this for some time but never got around to it. It’s always been somewhat puzzling why this isn’t more common. While it probably shouldn’t become a norm for the reasons you outline, my sense is that more EAs should consider this option (though I may be underestimating how common it is already).
I agree with all the downsides you list, but I could imagine there are also further upsides to voluntary salary reduction. For example, it can signal your commitment both to your organization and to taking altruistic ideas seriously—following the logic where it leads, even when that means doing unconventional things. This might inspire others.
I also worry that we might be biased to overestimate the downsides of voluntary salary reductions: Donating creates tangible satisfaction—the concrete act of giving, the tax receipt, the social recognition, etc. Taking a lower salary offers none of these psychological rewards and can even feel like a loss in status and recognition.