Thanks Anthony!
Regarding 2: I’m no expert at all, but it seems to me that there are other ways of influencing the preferences/dispositions of AI—e.g., i) penalizing, say, malevolent or fanatical reasoning/behavior/attitudes (e.g., by telling RLHF raters to specifically look out for such properties and penalize them), or ii) similarly amending the principles and rules of constitutional AI.
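To gesture at what suggestion (i) might mean mechanically: a minimal, purely hypothetical sketch (not any lab’s actual pipeline), where the reward signal is shaped downward whenever a rater—or a classifier standing in for raters—flags malevolent/fanatical content. The function name, threshold, and penalty weight are all illustrative assumptions.

```python
# Hypothetical sketch of reward shaping against flagged malevolent content.
# All names and numbers here are illustrative, not from any real RLHF setup.

def shaped_reward(base_reward: float, malevolence_score: float,
                  penalty_weight: float = 5.0, threshold: float = 0.5) -> float:
    """Subtract a penalty when the flagged malevolence score crosses a threshold."""
    if malevolence_score >= threshold:
        return base_reward - penalty_weight * malevolence_score
    return base_reward

# A benign completion keeps its reward; a flagged one is pushed down.
assert shaped_reward(1.0, 0.1) == 1.0
assert shaped_reward(1.0, 0.8) == 1.0 - 5.0 * 0.8  # i.e., -3.0
```

The same idea could, in principle, be expressed as an added principle in a constitutional-AI-style critique/revision loop rather than as a numeric penalty.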
David_Althaus
Great post, thanks for writing!
I like the idea of trying to shape the “personalities” of AIs.
Is there a reason to only focus on spite here instead of also trying to make AI personalities less malevolent in general? Malevolent/dark traits, at least in humans, often come together and thus arguably constitute a type of personality (also, spitefulness correlates fairly highly with most other dark traits). (Cf. the dark factor of personality.) I guess we don’t fully understand why these traits seem to cluster together in humans but I think we can’t rule out that they will also cluster together in AIs.
Another undesirable (personality? epistemic?) trait or property (in both AIs and humans) that I’m worried about is ideological fanaticism/extremism (see especially footnote 4 of the link for what I mean by that).
My sense is that ideological fanaticism is arguably:
the opposite of wisdom: terrible epistemics, anti-corrigibility.
very hard to cooperate with (very “fussy” in your terminology): very conflict-seeking, unwilling to compromise, extremely non-pluralistic, and arguably scoring very low on “having something to lose” (perhaps partly due to the mistaken belief that history/God is on the fanatics’ side and thus even death is not the end).
often accompanied by hatred of the outgroup and excessive retributivism (or spite).
It’s unclear if this framing is helpful but I find it interesting that ideological fanaticism seems to encompass most of the undesirable attributes that you outline in this post.[1] So it may be a useful umbrella term for many of the things we don’t want to see in AIs (or the humans controlling AIs).
- ^
Also, it sure seems as though ideological fanaticism was responsible for many historical atrocities and we may worry that the future will resemble the past.
For example, it could hypothetically turn out, just as a brute empirical fact, that the most effective way of aligning AIs is to treat them terribly in some way, e.g. by brainwashing them or subjecting them to painful stimuli.
Yes, agree. (For this and other reasons, I’m supportive of projects like, e.g., NYU MEP.)
I also agree that there are no strong reasons to think that technological progress improves people’s morality.
As you write, my main reason for worrying more about agential s-risks is that the greater the technological power of agents, the more their intrinsic preferences matter for what the universe will look like. To put it differently, actors whose terminal goals put some positive value on suffering (e.g., due to sadism, retributivism, or other weird fanatical beliefs) would deliberately aim to arrange matter in such a way that it contains more suffering—this seems extremely worrisome if they have access to advanced technology.
Altruists would also have a much harder time trading with such actors, whereas purely selfish actors (who don’t put positive value on suffering) could plausibly engage in mutually beneficial trades (e.g., they use (slightly) less efficient AI training/alignment methods which contain much less suffering and altruists give them some of their resources in return). But at the very least, incidental s-risks seem plausibly quite bad in expectation regardless.
Yeah, despite what I have written above, I probably worry more about incidental s-risks than the average s-risk reducer.
Existential risks from within?
(Unimportant discussion of probably useless and confused terminology.)
I sometimes use terms like “inner existential risks” to refer to risk factors like malevolence and fanaticism. Inner existential risks primarily arise from “within the human heart”—that is, they are primarily related to the values, goals, and/or beliefs of (some) humans.
My sense is that most x-risk discourse focuses on outer existential risks, that is, x-risks which primarily arise from outside the human mind. These could be physical or natural processes (asteroids, lethal pathogens) or technological processes that originated in the human mind but are now out of human control (e.g., AI, nuclear weapons, engineered pandemics).
Of course, most people already believe that the most worrisome existential risks are anthropogenic, that is, caused by humans. One could argue that, say, AI and engineered pandemics are actually inner existential risks because they arose from within the human mind. I agree that the distinction between inner and outer existential risks is not super clear. Still, it seems to me that the distinction between inner and outer existential risks captures something vaguely real and may serve as some kind of intuition pump.
Then there is the related issue of more external or structural risk factors, like political or economic systems. These are systems invented by human minds which in turn shape human minds and values. I will conveniently ignore this further complication.
Other potential terms for inner existential risks could be intra-anthropic, idioanthropic, or psychogenic (existential) risks.
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism
(Alternative title: Some bad human values may corrupt a long reflection[1])
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia.
Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
“And now, of course, the question arises: how different, exactly, are human hearts from each other? And in particular: are they sufficiently different that, when they foom, and even “on reflection,” they don’t end up pointing in exactly the same direction? After all, Yudkowsky said, above, that in order for the future to be non-trivially “of worth,” human hearts have to be in the driver’s seat. But even setting aside the insult, here, to the dolphins, bonobos, nearest grabby aliens, and so on – still, that’s only to specify a necessary condition. Presumably, though, it’s not a sufficient condition? Presumably some human hearts would be bad drivers, too? Like, I dunno, Stalin?”
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good, or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and to coordinate our efforts to keep them out of the driver’s seat.
As of now, I’m most worried about malevolent personality traits and fanatical ideologies.[3]
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we are going to use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults.
See this footnote[4] for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I’m most concerned about malevolence and ideological fanaticism for two reasons.
Deliberately resisting reflection and idealization
First, malevolence—if reflectively endorsed[6]—and fanatical ideologies deliberately resist being changed and would thus plausibly resist idealization even during a long reflection. The most central characteristic of fanatical ideologies is arguably that they explicitly forbid criticism, questioning, and belief change and view doubters and disagreement as evil.
Putting positive value on creating harm
Second, malevolence and ideological fanaticism would not only result in the future not being as good as it could possibly be—they might actively steer the future in bad directions and, for instance, result in astronomical amounts of suffering.
The preferences of malevolent humans (e.g., sadists) may be such that they intrinsically enjoy inflicting suffering on others. Similarly, many fanatical ideologies sympathize with excessive retributivism and often demonize the outgroup. Enabled by future technology, preferences for inflicting suffering on the outgroup may result in enormous disvalue—cf. concentration camps, the Gulag, or hell[7].
In the future, I hope to write more about all of this, especially long-term risks from ideological fanaticism.
Thanks to Pablo and Ruairi for comments and valuable discussions.
- ^
“Human misalignment” is arguably a confusing (and perhaps confused) term. But it sounds more sophisticated than “bad human values”.
- ^
For example, Matthew Barnett in “AI alignment shouldn’t be conflated with AI moral achievement”, Geoffrey Miller in “AI alignment with humans… but with which humans?”, lc in “Aligned AI is dual use technology”. Pablo Stafforini has called this the “third alignment problem”. And of course, Yudkowsky’s concept of CEV is meant to address these issues.
- ^
These factors may not be clearly separable. Some humans may be more attracted to fanatical ideologies due to their psychological traits, and malevolent humans often lead fanatical movements. Also, believing and following a fanatical ideology may not be good for your heart.
- ^
Below are some typical characteristics (I’m no expert in this area):
Unquestioning belief, absolute certainty and rigid adherence. The principles and beliefs of the ideology are seen as absolute truth and questioning or critical examination is forbidden.
Inflexibility and refusal to compromise.
Intolerance and hostility towards dissent. Anyone who disagrees or challenges the ideology is seen as evil; as enemies, traitors, or heretics.
Ingroup superiority and outgroup demonization. The in-group is viewed as superior, chosen, or enlightened. The out-group is often demonized and blamed for the world’s problems.
Authoritarianism. Fanatical ideologies often endorse (or even require) a strong, centralized authority to enforce their principles and suppress opposition, potentially culminating in dictatorship or totalitarianism.
Militancy and willingness to use violence.
Utopian vision. Many fanatical ideologies are driven by a vision of a perfect future or afterlife which can only be achieved through strict adherence to the ideology. This utopian vision often justifies extreme measures in the present.
Use of propaganda and censorship.
- ^
For example, Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That even seems plausible to me; however, I’m not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn’t strong enough to outweigh the selfish benefits of eating meat. However, I’m relatively hopeful that future technology will make such tradeoffs much less costly.
- ^
Some people (how many?) with elevated malevolent traits don’t reflectively endorse their malevolent urges and would change them if they could. However, some of them do reflectively endorse their malevolent preferences and view empathy as weakness.
- ^
Some quotes from famous Christian theologians:
Thomas Aquinas: “the blessed will rejoice in the punishment of the wicked.” “In order that the happiness of the saints may be more delightful to them and that they may render more copious thanks to God for it, they are allowed to see perfectly the sufferings of the damned”.
Samuel Hopkins: “Should the fire of this eternal punishment cease, it would in a great measure obscure the light of heaven, and put an end to a great part of the happiness and glory of the blessed.”
Jonathan Edwards: “The sight of hell torments will exalt the happiness of the saints forever.”
- ^
I’m not aware of quantitative estimates of omnicidal actors. Personally, I’m less interested in omnicidal actors and more interested in actors that would decrease the quality of the long-term future if they had substantial levels of influence. This is partly because the latter category is plausibly much larger (e.g., Hitler, Mao, and Stalin wouldn’t have wanted to destroy the world but would have been bad news regardless).
FWIW, I’ve done a few pilots on how common various “s-risk conducive” values and attitudes are (e.g., extreme retributivism, sadism, wanting to create hell) and may publish these results at some point.
Thanks for this post! I’ve been thinking about similar issues.
One thing that may be worth emphasizing is that there are large and systematic interindividual differences in idea attractiveness—different people or groups probably find different ideas attractive.
For example, for non-EA altruists, the idea of “work in soup kitchen” is probably much more attractive than for EAs because it gives you warm fuzzies (due to direct personal contact with the people you are helping), is not that effortful, and so on. Sure, it’s not very cost-effective but this is not something that non-EAs take much into account. In contrast, the idea of “earning-to-give” may be extremely unattractive to non-EAs because it might involve working at a job that you don’t feel passionate about, you might be disliked by all your left-leaning friends, and so on. For EAs, the reverse is true (though earning-to-give may still be somewhat unattractive but not that unattractive).
In fact, in an important sense, one primary reason for starting the EA movement was the realization of schlep blindness in the world at large—certain ideas (earning to give, donate to the global poor or animal charities) were unattractive / uninspiring / weird but seemed to do (much) more good than the attractive ideas (helping locally, becoming a doctor, volunteering at a dog shelter, etc.).
Of course, it’s wise to ask ourselves whether we EAs share certain characteristics that would lead us to find a certain type of idea more attractive than others. As you write in a comment, it’s fair to say that most of us are quite nerdy (interested in science, math, philosophy, intellectual activities) and we might thus be overly attracted to pursuing careers that primarily involve such work (e.g., quantitative research, broadly speaking). On the other hand, most people don’t like quantitative research, so you could also argue that quantitative research is neglected! (And that certainly seems to be true sometimes, e.g., GiveWell does great work relating to global poverty.)
EA should prioritise ideas that sound like no fun
I see where you’re coming from, but I also think it would be unwise to ignore your passions and personal fit. If you, say, really love math and are good at it, it’s plausible that you should try to use your talents somehow! And if you’re really bad at something and find it incredibly boring and repulsive, that’s a pretty strong reason not to work on it yourself. Of course, we need to be careful not to make general judgments based on these personal considerations (and this plausibly happens sometimes subconsciously), and we need to be mindful of imitating the behavior of high-status folks of our tribe, etc.
We could zoom out even more and ask ourselves whether we as EAs might find certain worldviews/philosophies more attractive than others.
What type of worldviews might EAs be attracted to? I can’t speak for others, but personally, I think that I’ve been too attracted to worldviews according to which I can have (way) more impact than the average do-gooder. This is probably because I derive much of my meaning and self-worth in life from how much good I believe I do. If I can change the world only a tiny amount even if I try really, really hard and am willing to bite all the bullets, that makes me feel pretty powerless, insignificant, and depressed—there is so much suffering in the world and I can do almost nothing in the big scheme of things? Very sad.
In contrast, “silver-bullet worldviews” according to which I can have super large amounts of impact because I galaxy-brained my way into finding very potent, clever, neglected levers that will change the world-trajectory—that feels pretty good. It makes me feel like I’m doing something useful, like my life has meaning. More cynically, you could say it’s all just vanity and makes me feel special and important. “I’m not like all those other schmucks who just feel content with voting every four years and donating every now and then. Those sheeple. I’m helping way more. But no big deal, of course! I’m just that altruistic.”
To be clear, I think probably something in the middle is true. Most likely, you can have more (expected) impact than the average do-gooder if you really try, reflect hard, and really optimize for this. But in the (distant) past, following the reasoning of people like Anna Salamon (2010) [to be clear: I respect and like Anna a lot], I thought this might buy you a factor of a million, whereas now I think it might only buy you a factor of 100 or so. As usual, Brian argued for this a long time ago. However, a factor of 100 is still a lot, and most importantly, the absolute good you can do is what ultimately matters, not the relative amount of good, and even if you only save one life in your whole life, that really, really matters.
Also, to be clear, I do believe that many interventions of most longtermist causes like AI alignment plausibly do (a great deal) more good than most “standard” approaches to doing good. I just think that the difference is considerably smaller than I previously believed, mostly for reasons related to cluelessness.
For me personally, the main take-away is something like this: Because of my desperate desire to have ever-more impact, I’ve stayed on the train to crazy town for too long and was too hesitant to walk a few stops back. The stop I’m now in is still pretty far away from where I started (and many non-EAs would think it uncomfortably far away from normal town) but my best guess is that it’s a good place to be in.
(Lastly, there are also biases against “silver-bullet worldviews”. I’ve been thinking about writing a whole post about this topic at some point.)
Thanks, Geoffrey, great points.
I agree that people should adopt advocacy styles that fit them and that the best tactics depend on the situation. What (arguably) matters most is making good arguments and raising the epistemic quality of (online) discourse. This requires participation and if people want/need to use disagreeable rhetoric in order to do that, I don’t want to stop them!
Admittedly, it’s hypocritical of me to champion kindness while staying on the sidelines and not participating in, say, Twitter discussions. (I appreciate your engagement there!) Reading and responding to countless poor and obnoxious arguments is already challenging enough, even without the additional constraint of always having to be nice and considerate.
Your point about the evolutionary advantages of different personality traits is interesting. However, (you obviously know this already) just because some trait or behavior used to increase inclusive fitness in the EEA doesn’t mean it increases global welfare today. One particularly relevant example may be dark tetrad traits which actually negatively correlate with Agreeableness (apologies for injecting my hobbyhorse into this discussion :) ).
Generally, it may be important to unpack different notions of being “disagreeable”. For example, this could mean, say, straw-manning or being (passive-)aggressive. These behaviors are often infuriating and detrimental to epistemics so I (usually) don’t like this type of disagreeableness. On the other hand, you could also characterize, say, Stefan Schubert as being “disagreeable”. Well, I’m a big fan of this type of “disagreeableness”! :)
Thanks for writing this post!
You write:
While this is somewhat compelling, this may not be enough to warrant such a restriction of our search area. Many of the actors we should be concerned about, for our work here, might have very low levels of such traits. And features such as spite and unforgivingness might also deserve attention (see Clifton et al. 2022).
I wanted to note that the term ‘malevolence’ wasn’t meant to exclude traits such as spite or unforgivingness. See for example the introduction which explicitly mentions spite (emphasis mine):
This suggests the existence of a general factor of human malevolence[2]: the Dark Factor of Personality (Moshagen et al., 2018)—[...] characterized by egoism, lack of empathy[3] and guilt, Machiavellianism, moral disengagement, narcissism, psychopathy, sadism, and spitefulness.
So to be clear, I encourage others to explore other traits!
Though I’d keep in mind that there exist moderate to large correlations between most of these “bad” traits such that for most new traits we can come up with, there will exist substantial positive correlations with other dark traits we already considered. (In general, I found it helpful to view the various “bad” traits not as completely separate, orthogonal traits that have nothing to do with each other but also as “[...] specific manifestations of a general, basic dispositional behavioral tendency [...] to maximize one’s individual utility— disregarding, accepting, or malevolently provoking disutility for others—, accompanied by beliefs that serve as justifications” (Moshagen et al., 2018).)
Given this, I’m probably more skeptical that there exist many actors who are, say, very spiteful but exhibit no other “dark” traits—but there are probably some!
That being said, I’m also wary of going too far in the direction of “whatever bad trait, it’s all the same, who cares” and losing conceptual clarity and rigor. :)
Definitely!
Thanks, makes sense!
I agree that confrontational/hostile tactics have their place and can be effective (under certain circumstances they are even necessary). I also agree that there are several plausible positive radical flank effects. Overall, I’d still guess that, say, PETA’s efforts are net negative—though it’s definitely not clear to me and I’m by no means an expert on this topic. It would be great to have more research on this topic.[1]
I also think we should reconceptualize what the AI companies are doing as hostile, aggressive, and reckless. EA is too much in a frame where the AI companies are just doing their legitimate jobs, and we are the ones that want this onerous favor of making sure their work doesn’t kill everyone on earth.
Yeah, I’m sympathetic to such concerns. I sometimes worry about being biased against the more “dirty and tedious” work of trying to slow down AI or public AI safety advocacy. For example, the fact that it took us more than ten years to seriously consider the option of “slowing down AI” seems perhaps a bit puzzling. One possible explanation is that some of us have had a bias towards doing intellectually interesting AI alignment research rather than low-status, boring work on regulation and advocacy. To be clear, there were of course also many good reasons to not consider such options earlier (such as a complete lack of public support). (Also, AI alignment research (generally speaking) is great, of course!)
It still seems possible to me that one can convey strong messages like “(some) AI companies are doing something reckless and unreasonable” while being nice and considerate, similarly to how Martin Luther King very clearly condemned racism without being (overly) hostile.
Again, though, one amazing thing about not having explored outside game much in AI Safety is that we have the luxury of pushing the Overton window with even the most bland advocacy.
Agreed. :)
- ^
For example, present participants with (hypothetical) i) confrontational and ii) considerate AI pause protest scenarios/messages and measure resulting changes in beliefs and attitudes. I think Rethink Priorities has already done some work in this vein.
Great post, I agree with most of it!
Overall, I’m in favor of more (well-executed) public advocacy à la AI Pause (though I do worry a lot about various backfire risks (also, I wonder whether a message like “AI slow” may be better)), and I commend you for taking the initiative despite it (I imagine) being kinda uncomfortable or even scary at times!
(ETA: I’ve become even more uncertain about all of this. I might still be slightly in favor of (well-executed) AI Pause public advocacy but would probably prefer emphasizing messages like conditional AI Pause or AI Slow, and yeah, it all really depends greatly on the execution.)
The inside-outside game spectrum seems very useful. We might want to keep in mind another (admittedly obvious) spectrum, ranging from hostile/confrontational to nice/considerate/cooperative.
Two points in your post made me wonder whether you view the outside-game as necessarily being more on the hostile/confrontational end of the spectrum:
1) As an example for outside-game you list “moralistic, confrontational advocacy” (emphasis mine).
2) You also write (emphasis mine):
Funnily enough, even though animal advocates do radical stunts, you do not hear this fear expressed much in animal advocacy. If anything, in my experience, the existence of radical vegans can make it easier for “the reasonable ones” to gain access to institutions.
This implicitly characterizes the outside game in terms of radical stunts and radical, “unreasonable” people.
However, my sense is that outside-game interventions (hereafter: activism or public advocacy) can differ enormously on the hostility vs. considerateness dimension, even while holding other effects (such as efficacy) constant.
The obvious example is Martin Luther King’s activism, perhaps most succinctly characterized by his famous “I have a Dream” speech which was non-confrontational and emphasized themes of cooperation, respect, and even camaraderie.[1] (In fact, King was criticized by others for being too compromising.[2]) On the hostile/confrontational side of the spectrum you had people like Malcolm X, or the Black Panther Party.[3] In the field of animal advocacy, you have organizations like PETA on the confrontational end of the spectrum and, say, Mercy for Animals on the more considerate side.
As you probably have guessed, I prefer considerate activism over more confrontational activism. For example, my guess is that King and Mercy for Animals have done much more good for African Americans and animals, respectively, than Malcolm X and PETA.
(As an aside and to be super clear, I didn’t want to suggest that you or AI Pause is or will be disrespectful/hostile and, say, throw paper clips at Meta employees! :P )
A couple of weak arguments in favor of considerate/cooperative public advocacy over confrontational/hostile advocacy:
Taking a more confrontational tone makes everyone more emotional and tense, which probably decreases truth-seeking, scout-mindset, and the general epistemic quality of discourse. It also makes people more aggressive, might escalate conflict, and can entrench dangerous emotional and behavioral patterns such as spite, retaliation, or even (threats of) violence. It may also help bring about a climate where the most outrage-inducing message spreads the fastest. Last, since this is EA, here’s the obligatory option value argument: it seems easier to go from a more considerate to a more confrontational stance than vice versa.
As an aside, (and contrary to what you write in the above quote), I often have heard the fear expressed that the actions of radical vegans will backfire. I’ve certainly witnessed that people were much less receptive to my animal welfare arguments because they’ve had bad experiences with “unreasonable” vegans who e.g. yelled expletives at them.[4] I think you can also see this reflected in the general public where vegans don’t have a great reputation, partly based on the aggressive actions of a few confrontational and hostile vegans or vegan organizations like PETA.
Political science research (e.g., Simpson et al., 2018) also seems to suggest that nonviolent protests are better than violent protests. (Of course, I’m not trying to imply that you were arguing for violent protests, in fact, you repeatedly say (in other places) that you’re organizing a nonviolent protest!) Importantly, the Simpson et al. paper suggests that violent protests make the protester side appear unreasonable and that this is the mechanism that causes the public to support this side less. It seems plausible to me that more confrontational and hostile public activism, even if it’s nonviolent, is more likely to appear unreasonable (especially when it comes to movements that might seem a bit fringe and which don’t yet have a long history of broad public support).
In general, I worry that increasing hostility/conflict, in particular in the field of AI, may be a risk factor for x-risk and especially s-risks. Of course, many others have written about the value of compromise/being nice and the dangers of unnecessary hostility, e.g., Schubert & Cotton-Barratt (2017), Tomasik (many examples, most relevant 2015), and Baumann (here and here).
Needless to say, there are risks to being too nice/considerate, but I think they are outweighed by the benefits, though it obviously depends on the specifics. (As you imply in your post, it’s probably also true that all public protests, by their very nature, are more confrontational than silently working out compromises behind closed doors. Still, my guess is that certain forms of public advocacy can score fairly high on the considerateness dimension while still being effective.)
To summarize, it may be valuable to emphasize considerateness (alongside other desiderata such as good epistemics) as a core part of the AI Pause movement’s memetic fabric, to minimize the probability that it will become more hostile in the future, since we will probably have only limited memetic control over the movement once it gets big. This may also amount to pulling the rope sideways, in the sense that public advocacy against AI risk may be somewhat overdetermined (?) but we are perhaps at an inflection point where we can shape its overall tone / stance on the confrontational vs. considerate spectrum.
- ^
Examples: “former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood” and “little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers”.
- ^
From Wikipedia: “Some Black leaders later criticized the speech (along with the rest of the march) as too compromising. Malcolm X later wrote in his autobiography: ‘Who ever heard of angry revolutionaries swinging their bare feet together with their oppressor in lily pad pools, with gospels and guitars and “I have a dream” speeches?’”
- ^
To be fair, their different tactics were probably also the result of more extreme religious and political beliefs.
- ^
I should note that I probably have much less experience with animal advocacy than you.
- ^
Sorry, yeah, I didn’t make my reasoning fully transparent.
One worry is that most private investigations won’t create common knowledge/won’t be shared widely enough to prevent the targets of these investigations from participating in a community, even when that would be appropriate. It’s just difficult, and has many drawbacks, to share a private investigation with every possible EA organization, EAGx organizer, podcast host, community builder, etc.
My understanding is that this has actually happened to some extent in the case of NonLinear and in other somewhat similar cases (though I may be wrong!).
But you’re right, if private investigations are sufficiently compelling and sufficiently widely shared they will have almost the same effects. Though at some point, you may also wonder how different very widely shared private investigations are from public investigations. In some sense, the latter may be more fair because the person can read the accusations and defend themselves. (Also, frequent widely shared private investigations might contribute even more to a climate of fear, paranoia and witch hunts than public investigations.)
ETA: Just to be clear, I also agree that public investigations should be more of a “last resort” measure and not be taken lightly. I guess we disagree about where to draw this line.
- More negative press for EA (which I haven’t seen yet)
- Reducing morale of EA people in general, causing lower productivity or even people leaving the movement.
My sense is that these two can easily go the other way.
If you try to keep all your worries about bad actors a secret you basically count on their bad actions never becoming public. But if they do become public at a later date (which seems fairly likely because bad actors usually don’t become more wise and sane with age, and, if they aren’t opposed, they get more resources and thus more opportunities to create harm and scandals), then the resulting PR fallout is even bigger. I mean, in the case of SBF, it would have been good for the EA brand if there were more public complaints about SBF early on and then EAs could refer to them and say “see, we didn’t fully trust him, we weren’t blindly promoting him”.
Keeping silent about bad actors can easily decrease morale because many people who interacted with bad actors will have become distrustful of them and worry about the average character/integrity of EAs. Then they see these bad actors giving talks at EAGs, going on podcast interviews, and so on. That can easily give rise to thoughts/emotions like “man, EA is just not my tribe anymore, they just give a podium to whomever is somewhat productive, doesn’t matter if they’re good people or not.”
Thanks for your kind words, I’m glad to hear that you found the post helpful! :)
Thanks for pointing this out! I corrected both links.
Appendix B: Really random musings
The burnout-surrender-recovery cycle
Here’s a vicious cycle I’ve observed in myself. (I wasn’t aware of this for a long time.)
I) For whatever reason (often after a period of trying to accomplish something but failing), I feel terrible about myself. I might have thoughts like: “I’m a total failure. I haven’t done anything impactful in my life. I’m stupid and have no talents. I made idiotic mistakes with my career choice. I’m also really unproductive and have zero energy, so there is no hope for me anymore.”
II) Usually, I don’t really believe these thoughts and resist them. But during particularly bad periods, I just fully surrender to them. I give up. There is nothing I can do. Yes, I had no impact and won’t have any impact going forward. Yes, my life has been a waste. But whatever, fuck it, there is nothing I can do. It’s game over, I fucked up beyond redemption.
III) Paradoxically perhaps, a sense of surrender, acceptance and relief sets in. Whatever, it’s over, I can stop beating myself up. I can just do whatever, nothing matters, it’s all pointless anyways. I might watch movies, play video games or listen to music and don’t even think about doing something “productive”.
This period might last anywhere from a few hours to several days. More intense thoughts and feelings (outlined in I and II) usually lead to longer and more “committed” periods of nihilistic surrender.
IV) Weirdly enough, at some point, I end up feeling much more refreshed, motivated, and creative.
V) Of course, I now start working again and push myself hard, partly because I try to make up for all the time I wasted. Predictably, since I haven’t changed my relationship to and motivations for having impact on a more fundamental level, sooner or later I start feeling burnt out again.
“Nihilism vacations”
One theory for why a period of increased motivation follows step III) is that I actually spend some time fully committing, without guilt or doubt (because I have given up) to resting and relaxing which allows me to recover more fully.
Paradoxically, only hitting “rock bottom” allowed me to do this. Before that, I might spend weeks only able to work a few hours a day before being too exhausted to continue, yet unable to fully commit to resting because I felt guilty about not having worked more. This “half-assed resting” didn’t allow me to recover fully, so the next day I could again work only a few hours, certainly not enough to feel like I deserved to rest and have fun, which perpetuated the cycle.
A related vicious circle (cf. the Matthew effect)
Generally speaking, if you obsess about having impact, the less impact you have, the more guilty and worthless you feel and the more you crave impact as this would (temporarily) relieve your feelings of worthlessness and guilt. Consequently, the less impact you have, the harder it is to stop thinking about EA, relax, rest, and recover. Due to this inability to take a break and recover, you might be stuck in a state of low mood, low creativity, and low motivation.
In contrast, the more impact you have, the easier it is to take a break from thinking about impact and just have fun and rest. You feel you deserve it, after all, you already did so much! This care-free attitude allows you to recharge your batteries and come up with new creative ideas and generally be more productive. You also feel much more comfortable taking big risks.
It’s perhaps a bit similar to how if you are poor, have an exhausting minimum-wage job, and are living from paycheck to paycheck, you don’t have the money or energy to take a break in order to find more well-paying jobs or invest in your education or skills to get better paying jobs later.
All of this is also reminiscent of Zvi’s notion of Slack.
More on individual differences in resting needs
When setting standards for productivity/effort, it’s easy to ignore individual differences and compare yourself with the most committed and hard-working EAs. In our early twenties, we lived in an EA office and noticed that some people were working 12 hours almost every day, even on weekends. We tried to match that by pushing ourselves very hard, failed, and felt lazy and selfish in comparison.
Nowadays, we believe that some people can simply work more than others, due to a variety of factors, including ones outside of one’s control. You aren’t lazy, selfish or otherwise mistaken if you can’t work as much as the most hard-working EAs. We know there are enormous individual differences in many traits that feed into productivity (defined here roughly as average hours worked per week). These traits include how much sleep you need, sleep quality, mental health (ADHD-symptoms, depression, etc.), hedonic set point, physical endurance, immune functioning, and so on. This variation seems at least partly attributable to genetics. It’s thus plausible that there will also be large interindividual differences in productivity and that some of the variance will be due to factors outside one’s control. [1]
Also, keep in mind that your ability to work hard might decrease with age, for various reasons. Don’t make the mistake of basing your standards for productive daily hours partly on what you could do when you were young (especially if you forget to factor in that the work you’re doing now might be much more difficult and less rewarding than what you did when you were younger).
More on healthy vs. unhealthy impact obsession and ‘hardcoreness’
A reader wrote that the post seems to occasionally alternate between two contradictory views on the distinction between unhealthy and healthy impact focus:
1) Continuum view: There’s a continuum that goes from ‘doesn’t care at all about doing good’ to ‘is totally obsessed with impact’. The ‘healthy + moral’ good region is towards the obsessive end, e.g. 80-95%. But people who are, say, above the 95th percentile are (usually) unhealthy.
2) Orthogonality view: Impact focus and mental health are orthogonal. People can be 100% impact-focused but still healthy, as long as they are enjoying themselves and feel motivated by passion.
Here is my rambly reply:
As a matter of human biology / physiology / psychology, you simply cannot always think about maximizing impartial impact or take actions to this effect. Humans are biological creatures with a variety of psychological and physiological needs.
The most obvious, boring example is that you need to sleep, shower, eat, have shelter, and you need to spend some time optimizing for these things. But these examples are probably uninteresting.
Let’s take a more concrete example: Julia Wise, widely considered (rightfully so) to be one of the most altruistic and dedicated EAs. Some time ago, Julia wanted to have children. At first she tried denying herself this wish because she thought that having children is not optimal for impartial impact, all else equal (I think this is most likely correct, at least in her case).
Of course, all else is not equal. Julia realized that not having children would make her miserable and she allowed herself to have children. In one sense, Julia isn’t maximally hardcore because she is not a utilitarian robot that only cares about impartial impact and nothing else. No, she also wants to have children, simply as a desire in itself, regardless of how much impact this has.
However, in order to maximize her impact, it actually makes total sense for Julia to have children. If Julia doesn’t have children, she will be sad or even depressed. If she is sad, she is less productive, happy, motivated and creative, and thus less able to do her job well and ultimately has less impact. Thus, given her psychological desires and constraints, having children actually allows her to do more good. (See also this comment by Carl Shulman which is already mentioned in the post.)
Regarding the orthogonality view: Yes, there is a lot of truth to this. If you’re able to think about having impact and doing impactful work from a more approach-motivation oriented perspective (e.g., “this is cool and enjoyable and I’m doing meaningful work”), you can probably work more hours every week, in the long term, than if you’re coming from a “this is my moral obligation, I need to do this, I just force myself to do this whether I enjoy it or not, and if I don’t, I’m an evil person who is directly responsible for poor children starving to death” perspective.
So clearly the continuum view is not fully accurate.
But there are two problems here:
i) I’m skeptical that anyone is able to always 100% feel great about the impactful work they are doing, at least in the long-run. Sometimes doing the most impactful thing will be difficult and you won’t enjoy it. (See also beware surprising and suspicious convergence.)
Now, for some people, this unenjoyable fraction might be very small. For example, take an AI alignment researcher who has loved math and machine learning since they were young. They won the impact passion lottery. But even such people will have a few hours per week that they don’t enjoy, e.g., necessary admin work, meetings, strategic planning, and so on. For others, a much larger portion of their work may be something they don’t enjoy doing. (This may be particularly the case during periods of crisis.) Of course, if you really hate 90% of the work you’re doing, you are probably not going to last long.
ii) But even if you really love your impactful work, it’s probably good to rest and not at all think about work, at least from time to time. Doing work, even enjoyable work, involves being in the “drive system” and you sometimes probably need to be in the “rest and digest system”. But perhaps some people really need to do this only for like 30 min a day, while others need several hours a day, there are probably large interindividual differences here. (This post by Andrew Critch talks about this issue a bit.)
To sum up, I basically don’t think any existing human is maximally hardcore if you define this as “having the drives of an AI programmed with a purely utilitarian utility function”. I mean, we’re all humans and our motivations were shaped by evolution to a large extent. We have desires for belonging, acceptance, friendship, love, and so on. (Lots of individual differences here again, of course. Some people really want children, some people don’t mind not having them, some people are introverts and don’t mind being alone most of the time, other people need to talk a lot with others else they feel lonely, and so on). But I doubt there is any human who has literally none of these human desires, to any extent. That would be an evolutionary miracle.
Another important and related issue is self-deception. Imagine an EA who really wants to think of themselves as maximally altruistic and hardcore, and as having lots of impact. What if their intrinsic interests and talents actually aren’t perfectly aligned with the interests and talents optimal for having impact (and the prior is that there is at least some mismatch)? Well, one way out would be to rationalize that the activities they tend to enjoy are also the most impactful ones. (To be clear, I think in most cases this happens subconsciously, not least because we may not know our intrinsic interests very well.) That’s another reason for dispelling the “stigma” associated with not being 100% hardcore.
Anyways, to summarize, I’d say that neither the continuum view nor the orthogonality view is a fully accurate description. There is a lot of truth to the orthogonality view in that changing your attitudes toward and relationship with having impact probably can substantially improve your mental health without reducing your impact (or while even improving it). However, there is also some truth to the continuum view because you probably sometimes need to take a break from thinking about impact; otherwise you risk becoming overwhelmed and overexerted. Another point against the orthogonality view, which is probably somewhat controversial: sometimes the most impactful action will be unpleasant and you may need to sacrifice some of your happiness or mental health (temporarily!), especially in moments of crisis, if you really want to maximize impact. (I could give examples.) But doing this too often can easily backfire, leaving you burnt out and risking long-lasting psychological damage that severely reduces your ability to have impact going forward. So, as a general rule, you want to do this very rarely.
- ^
Another personal example: There are people who can play video games for more than 24 hours, or even much longer (with occasional short breaks). The longest time I (David) was ever able to play was ~16 hours at a LAN party when I was 13. After that time, I was exhausted and needed to rest and sleep, even though I really wanted to play more and keep up with my friends. My 13-year old self was very fit and very motivated to play Diablo 2. My sense is that factors outside my control (e.g., genetics) largely contributed to this discrepancy between me and my friends who could play for much longer. It seems plausible that some of these factors are at least partly responsible for me not being able to work more in general.
Appendix A: Impact obsession and other mental health conditions, thoughts on etiology
How EA can reinforce impact obsession and perfectionism
There are several ways in which EA’s philosophy and culture can reinforce perfectionism and impact obsession. (Note that almost none of the following is meant as a critique of EA or as a call for change. Most of it is unfortunate but also unavoidable.)
EA emphasizes consequentialism (rightly so): Your intentions don’t really matter, it’s the actual consequences of your actions, i.e. your achievements that matter, not whether you’re a good person in some virtue-ethical or deontological sense. Generally, other EAs pay a lot of attention to your achievements and praise you if you did something impactful. All of this reinforces basing your self-worth on your achievements (in the domain of EA), which is essentially the core dynamic of both clinical perfectionism and impact obsession.
Conceptually, EA is about doing the most good, not just doing a bit of good. EA discourse emphasizes heavy-tailed distributions, hits-based approaches, and how “EA superstars” can have much, much more impact than the average EA. This can reinforce perfectionist tendencies like “I have to be great, otherwise I’m a failure”.
Also, making mistakes in EA can be very costly, as the stakes can be very high. Some of your mistakes really could lead (indirectly, counterfactually speaking) to many beings suffering or dying. It’s not like your mistakes might just cost you a promotion. This high cost of mistakes can also reinforce perfectionist tendencies.
Lastly, many EAs—including our past selves—tend to believe that we should indeed aim for the minimum of self-care necessary. These sentiments can lead those who are vulnerable to perfectionistic impact obsession to dismiss the negative consequences of overexerting themselves to reach their demanding standards.
Impact obsession, scrupulosity and OCD
Interestingly, impact obsession seems very related to moral scrupulosity (as briefly discussed in footnote 5). This excellent post by Holly Elmore makes several interesting observations on the relationship between scrupulosity and EA, and what risks to be aware of. Particularly noteworthy is the observation that scrupulosity is characterized by an “excessive sense of personal responsibility [...or] “hyper-responsibility” (in the context of OCD), or the dysfunctional attitude of omnipotence.” A heightened sense of responsibility is arguably another core characteristic of impact obsession. (See also the influential concept of “heroic responsibility”.)
Scrupulosity is usually seen as a form of obsessive-compulsive disorder (OCD), and clinical perfectionism also shares similarities with OCD (Egan et al., 2016, ch.3.; Limburg et al., 2017). Taken together, this suggests that impact obsession shares some characteristics with OCD (which is part of the reason for why we settled on the term ‘impact obsession’) and may benefit from similar treatment approaches, at least to some extent.
More on differences and similarities between impact obsession and perfectionism
Generally speaking, clinical perfectionism tends to be more like a personality trait that shows up in several domains of life. In contrast, many impact-obsessed EAs are not perfectionistic in other areas of life. Impact obsession seems more related to one’s values and overall meaning, whereas clinical perfectionism is usually motivated by the fear of not wanting to feel incompetent or worthless. (That being said, unhealthy impact obsession often is motivated by these fears as well!)
Typical clinical perfectionism is often accompanied by more severe cognitive biases like overgeneralizing, catastrophizing, all-or-nothing thinking, selective attention, and so on. These biases might appear among those with impact obsession but usually in much less pronounced and rigid form. Generally, the patients described in most CBT books on clinical perfectionism are often quite irrational, rigid, and inflexible. EAs might read these books and think “well, I’m certainly not like these people” and conclude that they don’t have anything like clinical perfectionism. This may be true but they might still benefit from working on their unhealthy impact obsessive tendencies. Preventing this failure mode is another reason for having written this post.
Why do some people develop (unhealthy) impact obsession?
Our sense is that those who think in highly systematic, low-decoupling, non-compartmentalizing ways are more likely to develop impact obsession. In a sense, impact obsession is a logical consequence of deeply internalizing ideas and relevant thought experiments like utilitarianism/consequentialism + impartiality, astronomical stakes, the drowning child, and so on.
Speculatively, people who crave some form of objective, cosmic meaning and purpose (but who cannot find it elsewhere because, e.g., they don’t believe in God) might also be more prone to developing unhealthy impact obsession.
Some other idiosyncratic traits seem more common among those with unhealthy impact obsession, such as being disgusted with hypocrisy and other human flaws (especially when reflecting on their evolutionary roots) and a strong desire to not be like this (cf. Hanson’s “smart sincere syndrome”), i.e., wanting to be a hero. Moral scrupulosity and other maladaptive schemas like “unrelenting standards”, “approval seeking”, seem also more common among those with unhealthy impact obsession.
Great comment.
I agree with your analysis but I think Will also sets up a false dichotomy. One’s inability to conceptualize or realize that one’s actions are wrong is itself a sign of being a bad apple. To simplify a bit, on the one end of the spectrum of the “high integrity to really bad continuum”, you have morally scrupulous people who constantly wonder whether their actions are wrong. On the other end of the continuum, you have pathological narcissists whose self-image/internal monologue is so out of whack with reality that they cannot even conceive of themselves doing anything wrong. That doesn’t make them great people. If anything, it makes them more scary.
Generally, the internal monologue of the most dangerous types of terrible people (think Hitler, Stalin, Mao, etc.) doesn’t go like “I’m so evil and just love to hurt everyone, hahahaha”. My best guess is that, in most cases, it goes more like “I’m the messiah, I’m so great and I’m the only one who can save the world. Everyone who disagrees with me is stupid and/or evil and I have every right to get rid of them.” [1]
Of course, there are people whose internal monologues are more straightforwardly evil/selfish (though even here lots of self-delusion is probably going on) but they usually end up being serial killers or the like, not running countries.
Also, later when Will talks about bad apples, he mentions that “typical cases of fraud [come] from people who are very successful, actually very well admired”, which again suggests that “bad apples” are not very successful or not very well admired. Well, again, many terrible people were extremely successful and admired. Like, you know, Hitler, Stalin, Mao, etc.
Yep, I agree. In fact, the whole character vs. governance thing seems like another false dichotomy to me. You want to have good governance structures but the people in relevant positions of influence should also know a little bit about how to evaluate character.
In general, bad character is compatible with genuine moral convictions. Hitler, for example, was vegetarian for moral reasons and “used vivid and gruesome descriptions of animal suffering and slaughter at the dinner table to try to dissuade his colleagues from eating meat”. (Fraudster/bad apple vs. person with genuine convictions is another false dichotomy that people keep setting up.)