I wanted to push back on this because most commenters seem to agree with you. I disagree that the writing style on the EA forum, as a whole, is bad. Of course, some people here are not the best writers and their writing isn’t always easy to parse. Some would definitely benefit from trying to make their writing easier to understand.
For context, I’m also a non-native English speaker and during high school, my performance in English (and other languages) was fairly mediocre.
But as a whole, I think there are few posts and comments that are overly complex. In fact, I personally really like the nuanced writing style of most content on the EA forum. Also, criticizing the tendency to “overly intellectualize” seems a bit dangerous to me. I’m afraid that if you go down this route, you risk shutting down discussions on complex issues and creating a more Twitter-like culture of shoehorning complex topics into simplistic tidbits. I’m sure this is not what you want but I worry that this will be an unintended side effect. (FWIW, in the example thread you give, no comment seemed overly complex to me.)
Of course, in the end, this is just my impression and different people have different preferences. It’s probably not possible to satisfy everyone.
As Astrid Wilde noted on Twitter, there is a distinct possibility that the causality of the situation may have run the other way, with SBF as a conman taking advantage of the EA community’s high-trust environment to boost himself.
What makes this implausible to me is that SBF had been involved in EA since very early on (~2013 or earlier?). Back then, there was no money, power, or fame to speak of, so why join this fringe movement?
To be clear, I think there were multiple causal factors, and believing in EA probably explained a lot less variance than SBF’s idiosyncratic character traits (e.g., plausibly dark triad traits, especially narcissism and Machiavellianism, which entail an immense lust for power and fame, disregard of common-sense morality such as not lying, etc.). I mean, there are like 10,000 EAs and I don’t know of anyone who has committed a serious crime because of EA.
A hypothetical human with the same personality traits as SBF but who doesn’t believe in EA plausibly would have done pretty shady things as well. Unfortunately, it is possible that EA motivated SBF to amass even more power and money than he otherwise would have. Also, EA provided SBF with a network of highly competent, motivated, and trusting people.
I think conceiving of SBF as someone who totally did not believe in EA principles and did everything just for money and power is simplistic and false.
We will obviously never know the precise contents of SBF’s internal monologue. But it is conceivable that he thought he was doing everything only because of EA. Everyone is the hero of their own story.
But you cannot blindly trust your internal monologue. SBF’s actions were probably shaped to a large extent by subconscious motivations. My best guess is that SBF might have been somewhat aware that he had ulterior motives but thought this was okay and that he had it all under control (“no one is perfect”, “I’ll use my desires for power as additional motivational fuel”, etc.). Though it’s also possible that he thought of himself as practically a saint.
Great comment.
Will says that, usually, most fraudsters aren’t just “bad apples” or doing “cost-benefit analysis” on their risk of being punished. Rather, they fail to “conceptualise what they’re doing as fraud”.
I agree with your analysis but I think Will also sets up a false dichotomy. One’s inability to conceptualize or realize that one’s actions are wrong is itself a sign of being a bad apple. To simplify a bit, on one end of the “high integrity to really bad” continuum, you have morally scrupulous people who constantly wonder whether their actions are wrong. On the other end of the continuum, you have pathological narcissists whose self-image/internal monologue is so out of whack with reality that they cannot even conceive of themselves doing anything wrong. That doesn’t make them great people. If anything, it makes them scarier.
Generally, the internal monologue of the most dangerous types of terrible people (think Hitler, Stalin, Mao, etc.) doesn’t go like “I’m so evil and just love to hurt everyone, hahahaha”. My best guess is that, in most cases, it goes more like “I’m the messiah, I’m so great and I’m the only one who can save the world. Everyone who disagrees with me is stupid and/or evil and I have every right to get rid of them.”[1]
Of course, there are people whose internal monologues are more straightforwardly evil/selfish (though even here lots of self-delusion is probably going on) but they usually end up being serial killers or the like, not running countries.
Also, later, when Will talks about bad apples, he mentions that “typical cases of fraud [come] from people who are very successful, actually very well admired”, which again suggests that “bad apples” are not very successful or not very well admired. Well, again, many terrible people were extremely successful and admired. Like, you know, Hitler, Stalin, Mao, etc.
Nor am I implying that improved governance is not a part of the solution.
Yep, I agree. In fact, the whole character vs. governance thing seems like another false dichotomy to me. You want to have good governance structures but the people in relevant positions of influence should also know a little bit about how to evaluate character.
- ^
In general, bad character is compatible with genuine moral convictions. Hitler, for example, was a vegetarian for moral reasons and “used vivid and gruesome descriptions of animal suffering and slaughter at the dinner table to try to dissuade his colleagues from eating meat”. (Fraudster/bad apple vs. person with genuine convictions is another false dichotomy that people keep setting up.)
I’d be careful with this sort of advice.
All I have to offer is my personal story which is of course very little evidence (though I’ve seen similar stories play out among many of my friends and colleagues). I’ll first talk about my experience with stimulants and then about my experience with pushing myself to work more.
In my mid-twenties, I basically followed the same sort of logic and started taking amphetamines (very reasonable dosages, not more than twice a week). To every skeptic, I told, among other things, the story of Paul Erdős, who took amphetamines practically every day for decades and was perhaps the most productive mathematician ever. (In fact, Erdős was so productive that his friends created the widely used Erdős number, which “describes a person’s degree of separation from Erdős himself, based on their collaboration with him, or with another who has their own Erdős number.”)
Well, for me personally, this worked for a few months, after which side effects slowly started to show up (low mood, fatigue, insomnia, etc.). It was so gradual that I needed another few months to attribute them to the amphetamines. It was also very much unlike how I expected side effects and withdrawal to manifest. It’s (for many people) not this immediate thing that happens after the first time you take amphetamines. It’s usually more gradual than that. Several of my friends also experimented with amphetamines for their productivity-enhancing effects; for none of them did it turn out positively.
That being said, I’ve had somewhat better experiences with Ritalin (though I still often wonder whether it was net positive). And I had even better experiences with MAO inhibitors like tranylcypromine (which still have lots of side effects). (I might write more on this in the future.)
Regarding pushing yourself to work more: I think in my case this backfired extremely hard and caused me to burn out. In my twenties, I always had this suspicion that talk about burnout was wildly exaggerated, perhaps especially by lazy or selfish people who want to work less. (I was pretty stupid.) In my experience, especially if you are a researcher, pushing yourself hard doesn’t work that well, at least not in the long term. Your best ideas are probably two orders of magnitude more important than your average idea. If you start caring primarily about how many hours you work, you risk working on ideas that are easy to write about or execute. Coming up with good ideas often doesn’t look like work at all. You might just be reading for your own pleasure or out of curiosity. That’s at least how it worked for me. (As an example, I had most of the core ideas for this article after I had reduced my working hours substantially.)
I guess if you have (very) short timelines and are generally quite young, healthy, and robust, I’d be much more optimistic about such strategies. And as the case of e.g. Erdős shows, outliers do exist.
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism
(Alternative title: Some bad human values may corrupt a long reflection[1])
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia.
Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
“And now, of course, the question arises: how different, exactly, are human hearts from each other? And in particular: are they sufficiently different that, when they foom, and even “on reflection,” they don’t end up pointing in exactly the same direction? After all, Yudkowsky said, above, that in order for the future to be non-trivially “of worth,” human hearts have to be in the driver’s seat. But even setting aside the insult, here, to the dolphins, bonobos, nearest grabby aliens, and so on – still, that’s only to specify a necessary condition. Presumably, though, it’s not a sufficient condition? Presumably some human hearts would be bad drivers, too? Like, I dunno, Stalin?”
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good, or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driver’s seat.
As of now, I’m most worried about malevolent personality traits and fanatical ideologies.[3]
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we end up using, it should capture ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults.
See this footnote[4] for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I’m most concerned about malevolence and ideological fanaticism for two reasons.
Deliberately resisting reflection and idealization
First, malevolence—if reflectively endorsed[6]—and fanatical ideologies deliberately resist being changed and would thus plausibly resist idealization even during a long reflection. The most central characteristic of fanatical ideologies is arguably that they explicitly forbid criticism, questioning, and belief change and view doubters and disagreement as evil.
Putting positive value on creating harm
Second, malevolence and ideological fanaticism would not only result in the future not being as good as it possibly could—they might actively steer the future in bad directions and, for instance, result in astronomical amounts of suffering.
The preferences of malevolent humans (e.g., sadists) may be such that they intrinsically enjoy inflicting suffering on others. Similarly, many fanatical ideologies sympathize with excessive retributivism and often demonize the outgroup. Enabled by future technology, preferences for inflicting suffering on the outgroup may result in enormous disvalue—cf. concentration camps, the Gulag, or hell[7].
In the future, I hope to write more about all of this, especially long-term risks from ideological fanaticism.
Thanks to Pablo and Ruairi for comments and valuable discussions.
- ^
“Human misalignment” is arguably a confusing (and perhaps confused) term. But it sounds more sophisticated than “bad human values”.
- ^
For example, Matthew Barnett in “AI alignment shouldn’t be conflated with AI moral achievement”, Geoffrey Miller in “AI alignment with humans… but with which humans?”, lc in “Aligned AI is dual use technology”. Pablo Stafforini has called this the “third alignment problem”. And of course, Yudkowsky’s concept of CEV is meant to address these issues.
- ^
These factors may not be clearly separable. Some humans may be more attracted to fanatical ideologies due to their psychological traits, and malevolent humans often lead fanatical ideologies. Also, believing in and following a fanatical ideology may not be good for your heart.
- ^
Below are some typical characteristics (I’m no expert in this area):
Unquestioning belief, absolute certainty and rigid adherence. The principles and beliefs of the ideology are seen as absolute truth and questioning or critical examination is forbidden.
Inflexibility and refusal to compromise.
Intolerance and hostility towards dissent. Anyone who disagrees or challenges the ideology is seen as evil; as enemies, traitors, or heretics.
Ingroup superiority and outgroup demonization. The in-group is viewed as superior, chosen, or enlightened. The out-group is often demonized and blamed for the world’s problems.
Authoritarianism. Fanatical ideologies often endorse (or even require) a strong, centralized authority to enforce their principles and suppress opposition, potentially culminating in dictatorship or totalitarianism.
Militancy and willingness to use violence.
Utopian vision. Many fanatical ideologies are driven by a vision of a perfect future or afterlife which can only be achieved through strict adherence to the ideology. This utopian vision often justifies extreme measures in the present.
Use of propaganda and censorship.
- ^
For example, Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That actually seems plausible to me; however, I’m not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn’t strong enough to outweigh the selfish benefits of eating meat. However, I’m relatively hopeful that future technology would make such types of tradeoffs much less costly.
- ^
Some people (how many?) with elevated malevolent traits don’t reflectively endorse their malevolent urges and would change them if they could. However, some of them do reflectively endorse their malevolent preferences and view empathy as weakness.
- ^
Some quotes from famous Christian theologians:
Thomas Aquinas: “the blessed will rejoice in the punishment of the wicked.” “In order that the happiness of the saints may be more delightful to them and that they may render more copious thanks to God for it, they are allowed to see perfectly the sufferings of the damned”.
Samuel Hopkins: “Should the fire of this eternal punishment cease, it would in a great measure obscure the light of heaven, and put an end to a great part of the happiness and glory of the blessed.”
Jonathan Edwards: “The sight of hell torments will exalt the happiness of the saints forever.”
Very much agree. Some EAs knew SBF for almost a decade and plausibly interacted with him for hundreds of hours (including in non-professional settings which are usually more revealing of someone’s character and personality).
ETA July: I regret posting the following comment for several reasons, partly because I got crucial information wrong and failed to put things into context and prevent misunderstandings. Please consider reading my longer explanation at the top of my follow-up comment here. I’m sorry to anyone I upset.
------------------------------------------------------------------
At EAG London 2022, they [ETA: this was an individual acting without the consent of the organizers] distributed hundreds of stickers depicting Sam on a bean bag with the text “what would SBF do?”. To my knowledge, flyers depicting individual EAs had never been distributed at an EAG before. (Also, such behavior seems generally unusual to me; like, imagine going to a conference and seeing hundreds of flyers and stickers all depicting one guy. Doesn’t that seem a tad culty?)
On the 80k website, they had several articles mentioning SBF as someone highly praiseworthy and worth emulating.
Will vouched for SBF “very much” when talking to Elon Musk.
Sam was invited to many discussions between EA leaders.
There are probably more examples.
Generally, almost everyone was talking about how great Sam is and how much good he has achieved and how, as a good EA, one should try to be more like him.
[I’m sleep-deprived so this is not well written and fairly repetitive and unstructured, apologies. I also know nothing about politics and usually follow a policy of almost never reading the news. So me writing a comment on a complex geopolitical issue is arguably ludicrous.]
I see your reasoning with these points, and agree that the sign of donating is unclear, but I also think there are counterarguments to the points you have made.
I think that effectively giving in to Putin’s threats here plausibly emboldens him and other malevolent autocrats to take over more countries in the future with impunity. Instead, perhaps the more effective approach, and the one that might have better results in the long run, might be what the West is currently doing: forming a coalition that enforces punishments on malevolent autocrats invading other countries, etc. (I do think that the US invading e.g. Iraq is sufficiently dissimilar from the current case, though I know many people disagree on this.)
(Perhaps one might think that Putin is not a malevolent autocrat. Again, I think it’s likely that he is, but I don’t provide evidence here.)
If the West does not do this, it might become clear to Putin and others that they should invade neighboring countries (e.g., China taking over Taiwan), given that there are large material incentives for doing so and that they will not face much resistance.
Therefore, if the West does not stand up strongly to Putin now, the result might be more violence and lives lost in the long run. Also, the historic track record of appeasement when it comes to malevolent dictators has not been good. (Though it’s difficult to know the relevant counterfactuals, of course.)
A few other miscellaneous points:
I guess the big question is what the overall policy should be for dealing with situations like this. If the West gives in to Putin now, what should the response be when he invades other countries? Or when other nuclear-armed nations invade other countries? If we are going to give in to any nuclear-armed autocrat, that’s a quick recipe for handing over more and more territory and power to malevolent leaders, which seems very negative from both a short-termist and a longtermist perspective.
Overall, it’s plausible to me that every day we prolong this war might be net positive from a long-term perspective, even if more lives are lost in the short term. First, it makes it more likely that sanctions bite hard enough and Putin has to give up. Second, a longer war will be a greater deterrent to Putin and other autocrats in the future. Last, prolonging the war plausibly weakens Putin’s and his allies’ power and strengthens the political opposition in Russia. If Putin is successful now, the Russian people will update on that and be more likely to support future nationalistic leaders. However, if Putin fails, they might be more likely to support more conciliatory, peaceful leaders.
Generally, reducing Putin’s influence seems very valuable from a longtermist perspective since he seems to have caused a lot of harm in the past decades. For example, he plausibly helped to increase political polarization in the US, perhaps helped Trump win the election, weakened international cooperation, etc.
Another point to keep in mind: Imagine we live in the universe where Putin is really willing to consider launching nuclear missiles over this conflict, if Ukraine is not given to him without much resistance. (If we don’t live in this universe, we don’t have to worry about his nuclear threats.) It seems to me that the Putin of this universe would also be fairly likely to invade more countries and make further nuclear threats (to which the world would have to give in again and again) giving him ever more power. The Putin of this universe would be a terrible person to give much power to.
Lastly, I agree that getting involved in hot-button issues is usually not wise (e.g., because they are too crowded) but this is not always true. For example, many EAs also became involved in COVID.
All that said, I agree that this issue is complex and fraught with uncertainty about how one should act.
There hasn’t been a proper RCT yet though.
Pokorny et al. (2017) seems like a relevant RCT. They found that psilocybin significantly increased empathy.
However, even such results don’t make me very optimistic about the use of psychedelics for reducing malevolence.
The kind of individuals that seem most dangerous (people with highly elevated Dark Tetrad traits who are also ambitious, productive and strategic) seem less likely to be interested in taking psychedelics—such folks don’t seem interested in increasing their empathy, becoming less judgmental or having spiritual experiences. In contrast, the participants of the Pokorny et al. study—like most participants in current psychedelics studies (I think)—wanted to take psychedelics which is why they signed up for the study.
Moreover, my sense is that psychedelics are most likely to increase openness and compassion in those who already started out with some modicum of these traits and who would like to increase them further. I’m somewhat pessimistic that giving psychedelics to highly malevolent individuals would make them substantially more compassionate. That being said, I’m certainly not confident in that assessment.
My intuition is partly based on personal experience and anecdotes but also more objective evidence like the somewhat disappointing results of the Concord Prison Experiment. However, due to various methodological flaws, I’d be hesitant to draw strong conclusions from this experiment.
Overall, I’d nevertheless welcome further research into psychedelics and MDMA. It would still be valuable if these pharmaceutical agents “only” increase empathy in individuals who are already somewhat empathic.
Anecdotally, a lot of Western contemplative teachers got started on that path because of psychedelic experiences (Zen, Tibetan Vajrayana, Vipassana, Advaita Vedanta, Kashmiri Shaivism). These traditions are extremely prosocial & anti-malevolent.
My guess is that most Western contemplative teachers who, as a result of taking psychedelics, got interested in Buddhism and meditation (broadly defined) were, on average, already considerably more compassionate, idealistic, and interested in spiritual questions than the type of practically oriented, ambitious, malevolent people I worry about.
As an aside, I’m much more optimistic about the use of psychedelics, empathogens, and entactogens for treating other issues such as depression or PTSD. For example, the early results on using MDMA for treatment-resistant PTSD seem extremely encouraging (and Doblin’s work in general seems promising).
Aside from the obvious dangers relating to bad trips, psychosis, neurotoxicity (which seems only relevant for MDMA), et cetera[1], my main worry is that psychedelics sometimes seem to decrease people’s epistemic and instrumental rationality. I have also observed that they sometimes shift people’s interests towards more esoteric matters and lead to “spiritual navel-gazing”—of course, this can be beneficial for people whose life goals are comparatively uninformed.
- ↩︎
Though my impression is that these risks can be reduced to tolerable levels by taking psychedelics only in appropriate settings and with the right safety mechanisms in place.
Thanks for sharing, I thought this was interesting and relatable.
For what it’s worth, you seem like a really committed person to me, so I wouldn’t call you lazy (if you’re “lazy”, how were you able to work 50 hours and perform well in the military?). In some cheeky sense, you might have benefited from being lazier and “giving up” sooner, rather than trying to push yourself to make it work for years, always hoping that change is around the corner.
In my early twenties I also tried to study computer science and programming for similar reasons (AI safety research, EtG potential). I think I basically gave up after like 1-2 weeks because I did not like it. In some sense, you could say that my own laziness saved me from making the potentially huge mistake of pursuing something for a few years and then burning out/getting stuck in the sunk cost fallacy, etc.
Though that’s usually not how I view it. Over the years I’ve often blamed myself for being a lazy quitter and that I should have tried harder back then to study CS. Otoh, stories like yours are (weak) evidence that it probably wouldn’t have ended well and that I should be glad to have continued to study where my personal fit was higher even though it was (way) less impactful.
Anyways, enough rambling about myself. In my book, you tried really hard to have impact and showed real courage in sharing your story. I think you’re cool. :)
Thanks for this post! I’ve been thinking about similar issues.
One thing that may be worth emphasizing is that there are large and systematic interindividual differences in idea attractiveness—different people or groups probably find different ideas attractive.
For example, for non-EA altruists, the idea of “work in a soup kitchen” is probably much more attractive than for EAs because it gives you warm fuzzies (due to direct personal contact with the people you are helping), is not that effortful, and so on. Sure, it’s not very cost-effective, but this is not something that non-EAs take much into account. In contrast, the idea of “earning-to-give” may be extremely unattractive to non-EAs because it might involve working at a job that you don’t feel passionate about, you might be disliked by all your left-leaning friends, and so on. For EAs, the reverse is true (though earning-to-give may still be somewhat unattractive, just not that unattractive).
In fact, in an important sense, one primary reason for starting the EA movement was the realization of schlep blindness in the world at large—certain ideas (earning to give, donating to the global poor or animal charities) were unattractive / uninspiring / weird but seemed to do (much) more good than the attractive ideas (helping locally, becoming a doctor, volunteering at a dog shelter, etc.).
Of course, it’s wise to ask ourselves whether we EAs share certain characteristics that would lead us to find a certain type of idea more attractive than others. As you write in a comment, it’s fair to say that most of us are quite nerdy (interested in science, math, philosophy, intellectual activities) and we might thus be overly attracted to pursuing careers that primarily involve such work (e.g., quantitative research, broadly speaking). On the other hand, most people don’t like quantitative research, so you could also argue that quantitative research is neglected! (And that certainly seems to be true sometimes, e.g., GiveWell does great work relating to global poverty.)
EA should prioritise ideas that sound like no fun
I see where you’re coming from, but I also think it would be unwise to ignore your passions and personal fit. If you, say, really love math and are good at it, it’s plausible that you should try to use your talents somehow! And if you’re really bad at something and find it incredibly boring and repulsive, that’s a pretty strong reason not to work on this yourself. Of course, we need to be careful not to make general judgments based on these personal considerations (and this plausibly happens sometimes subconsciously); we also need to be mindful of imitating the behavior of high-status folks of our tribe, etc.
We could zoom out even more and ask ourselves whether we as EAs might find certain worldviews/philosophies more attractive than others.
What type of worldviews might EAs be attracted to? I can’t speak for others but personally, I think that I’ve been too attracted to worldviews according to which I can have (way) more impact than the average do-gooder. This is probably because I derive much of my meaning and self-worth in life from how much good I believe I do. If I can change the world only a tiny amount even if I try really, really hard and am willing to bite all the bullets, that makes me feel pretty powerless, insignificant, and depressed—there is so much suffering in the world and I can do almost nothing in the big scheme of things? Very sad.
In contrast, “silver-bullet worldviews” according to which I can have super large amounts of impact because I galaxy-brained my way into finding very potent, clever, neglected levers that will change the world-trajectory—that feels pretty good. It makes me feel like I’m doing something useful, like my life has meaning. More cynically, you could say it’s all just vanity and makes me feel special and important. “I’m not like all those other schmucks who just feel content with voting every four years and donating every now and then. Those sheeple. I’m helping way more. But no big deal, of course! I’m just that altruistic.”
To be clear, I think probably something in the middle is true. Most likely, you can have more (expected) impact than the average do-gooder if you really try and reflect hard and really optimize for this. But in the (distant) past, following the reasoning of people like Anna Salamon (2010) [to be clear: I respect and like Anna a lot], I thought this might buy you like a factor of a million, whereas now I think it might only buy you a factor of 100 or something. As usual, Brian argued for this a long time ago. However, a factor of 100 is still a lot, and most importantly, the absolute good you can do is what ultimately matters, not the relative amount of good; even if you only save one life in your whole life, that really, really matters.
Also, to be clear, I do believe that many interventions of most longtermist causes like AI alignment plausibly do (a great deal) more good than most “standard” approaches to doing good. I just think that the difference is considerably smaller than I previously believed, mostly for reasons related to cluelessness.
For me personally, the main take-away is something like this: Because of my desperate desire to have ever-more impact, I’ve stayed on the train to crazy town for too long and was too hesitant to walk a few stops back. The stop I’m now in is still pretty far away from where I started (and many non-EAs would think it uncomfortably far away from normal town) but my best guess is that it’s a good place to be in.
(Lastly, there are also biases against “silver-bullet worldviews”. I’ve been thinking about writing a whole post about this topic at some point.)
whether this was “naive utilitarian went too far” (bad) or “sociopath using EA to reputation-launder” (bad).
I think this is a false dichotomy. You can be a psychopath (i.e. have highly elevated psychopathic traits) and nonetheless be a true believer in an ideology that is about bettering the world. For example, Stalin clearly had highly elevated psychopathic traits but (IIRC) he also risked his life to further communist goals. (To be clear, I’m not saying that Sam is nearly as bad as Stalin.)
Great post, I agree with most of it!
Overall, I’m in favor of more (well-executed) public advocacy à la AI Pause (though I do worry a lot about various backfire risks (also, I wonder whether a message like “AI slow” may be better)), and I commend you for taking the initiative despite it (I imagine) being kinda uncomfortable or even scary at times!
(ETA: I’ve become even more uncertain about all of this. I might still be slightly in favor of (well-executed) AI Pause public advocacy but would probably prefer emphasizing messages like conditional AI Pause or AI Slow, and yeah, it all really depends greatly on the execution.)
The inside-outside game spectrum seems very useful. We might want to keep in mind another (admittedly obvious) spectrum, ranging from hostile/confrontational to nice/considerate/cooperative.
Two points in your post made me wonder whether you view the outside-game as necessarily being more on the hostile/confrontational end of the spectrum:
1) As an example for outside-game you list “moralistic, confrontational advocacy” (emphasis mine).
2) You also write (emphasis mine):
Funnily enough, even though animal advocates do radical stunts, you do not hear this fear expressed much in animal advocacy. If anything, in my experience, the existence of radical vegans can make it easier for “the reasonable ones” to gain access to institutions.
This implicitly characterizes the outside game in terms of radical stunts and radical, “unreasonable” people.
However, my sense is that outside-game interventions (hereafter: activism or public advocacy) can differ enormously on the hostility vs. considerateness dimension, even while holding other effects (such as efficacy) constant.
The obvious example is Martin Luther King’s activism, perhaps most succinctly characterized by his famous “I Have a Dream” speech, which was non-confrontational and emphasized themes of cooperation, respect, and even camaraderie.[1] (In fact, King was criticized by others for being too compromising.[2]) On the hostile/confrontational side of the spectrum, you had people like Malcolm X or the Black Panther Party.[3] In the field of animal advocacy, you have organizations like PETA on the confrontational end of the spectrum and, say, Mercy for Animals on the more considerate side.
As you probably have guessed, I prefer considerate activism over more confrontational activism. For example, my guess is that King and Mercy for Animals have done much more good for African Americans and animals, respectively, than Malcolm X and PETA.
(As an aside and to be super clear, I didn’t want to suggest that you or AI Pause is or will be disrespectful/hostile and, say, throw paper clips at Meta employees! :P )
A couple of weak arguments in favor of considerate/cooperative public advocacy over confrontational/hostile advocacy:
Taking a more confrontational tone makes everyone more emotional and tense, which probably decreases truth-seeking, scout-mindset, and the general epistemic quality of discourse. It also makes people more aggressive and might escalate conflict and trigger dangerous emotional and behavioral patterns such as spite, retaliation, or even (threats of) violence. It may also help bring about a climate where the most outrage-inducing message spreads the fastest. Last, since this is EA, here’s the obligatory option value argument: it seems easier to go from a more considerate to a more confrontational stance than vice versa.
As an aside (and contrary to what you write in the above quote), I have often heard the fear expressed that the actions of radical vegans will backfire. I’ve certainly witnessed that people were much less receptive to my animal welfare arguments because they’d had bad experiences with “unreasonable” vegans who e.g. yelled expletives at them.[4] I think you can also see this reflected in the general public, where vegans don’t have a great reputation, partly based on the aggressive actions of a few confrontational and hostile vegans or vegan organizations like PETA.
Political science research (e.g., Simpson et al., 2018) also seems to suggest that nonviolent protests are better than violent protests. (Of course, I’m not trying to imply that you were arguing for violent protests, in fact, you repeatedly say (in other places) that you’re organizing a nonviolent protest!) Importantly, the Simpson et al. paper suggests that violent protests make the protester side appear unreasonable and that this is the mechanism that causes the public to support this side less. It seems plausible to me that more confrontational and hostile public activism, even if it’s nonviolent, is more likely to appear unreasonable (especially when it comes to movements that might seem a bit fringe and which don’t yet have a long history of broad public support).
In general, I worry that increasing hostility/conflict, in particular in the field of AI, may be a risk factor for x-risk and especially s-risks. Of course, many others have written about the value of compromise/being nice and the dangers of unnecessary hostility, e.g., Schubert & Cotton-Barratt (2017), Tomasik (many examples, most relevant 2015), and Baumann (here and here).
Needless to say, there are risks to being too nice/considerate, but I think they are outweighed by the benefits, though it obviously depends on the specifics. (As you imply in your post, it’s probably also true that all public protests, by their very nature, are more confrontational than silently working out compromises behind closed doors. Still, my guess is that certain forms of public advocacy can score fairly high on the considerateness dimension while still being effective.)
To summarize, it may be valuable to emphasize considerateness (along with other desiderata such as good epistemics) as a core part of the AI Pause movement’s memetic fabric, to minimize the probability that it will become more hostile in the future, since we will probably have only limited memetic control over the movement once it gets big. This may also amount to pulling the rope sideways, in the sense that public advocacy against AI risk may be somewhat overdetermined (?) but we are perhaps at an inflection point where we can shape its overall tone / stance on the confrontational vs. considerate spectrum.
- ^
Examples: “former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood” and “little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers”.
- ^
From Wikipedia: “Some Black leaders later criticized the speech (along with the rest of the march) as too compromising. Malcolm X later wrote in his autobiography: ‘Who ever heard of angry revolutionaries swinging their bare feet together with their oppressor in lily pad pools, with gospels and guitars and “I have a dream” speeches?’”
- ^
To be fair, their different tactics were probably also the result of more extreme religious and political beliefs.
- ^
I should note that I probably have much less experience with animal advocacy than you.
Thanks Magnus for your more comprehensive summary of our population ethics study.
You mention this already, but I want to emphasize how much different framings actually matter. This surprised me the most when working on this paper. I’d thus caution anyone against making strong inferences from just one such study.
For example, we conducted the following pilot study (n = 101), in which participants were randomly assigned to one of two conditions: (i) create a new happy person, or (ii) create a new unhappy person.
The response scale ranged from 1 = Extremely bad to 7 = Extremely good.
Creating a happy person was rated as only marginally better than neutral (mean = 4.4), whereas creating an unhappy person was rated as extremely bad (mean = 1.4). So this would lead one to believe that there is strong popular support for the asymmetry. [1]
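To make the size of this asymmetry concrete, here is a rough back-of-the-envelope reading of those means (my own arithmetic, assuming the scale midpoint of 4 represents a neutral rating):

$$
\underbrace{4.4 - 4.0}_{\text{create happy person}} = +0.4,
\qquad
\underbrace{1.4 - 4.0}_{\text{create unhappy person}} = -2.6,
\qquad
\frac{2.6}{0.4} = 6.5
$$

That is, creating an unhappy person was judged roughly 6.5 times as far below neutral as creating a happy person was judged above it.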
However, those results were most likely due to the magical machine framing and/or the “push-a-button” framing, even though these framings clearly “shouldn’t” make such a huge difference.
All in all, we tested many different framings, too many to discuss here. Occasionally, there were significant differences between framings that shouldn’t matter (though we also observed many regularities). For example, we had one pilot with the “multiplier framing”.
Here, the median trade ratio was 8.5, compared to the median trade ratio of 3–4 that we find in our default framing. It’s clear that the multiplier framing shouldn’t make any difference from a philosophical perspective.
So seemingly irrelevant or unimportant changes in framings (unimportant at least from a consequentialist perspective) could sometimes lead to substantial changes in median trade ratios.
However, changes in the intensity of the experienced happiness and suffering—which is arguably the most important aspect of the whole thought experiment—affected the trade ratios considerably less than the above-mentioned multiplier framing.
To see this, it’s worth looking closely at the results of study 1b. Participants were first presented with the following scale:
[Editor’s note: From now on, the text is becoming more, um, expressive.]
Note that “worst form of suffering imaginable” is pretty darn bad. Being brutally tortured while kept alive by nano bots is more like −90 on this scale. Likewise, “absolute best form of bliss imaginable” is pretty far out there. Feeling, all your life, like you just created friendly AGI and found your soulmate, while being high on ecstasy would still not be +100.
(Note that we also conducted a pilot study where we used more concrete and explicit descriptions such as “torture”, “falling in love”, “mild headaches”, and “good meal” to describe the feelings of mild or extreme [un]happiness. The results were similar.)
Afterwards, participants were asked, roughly, what percentage of people would need to experience the positive state in order to outweigh the remaining people experiencing the negative state.
So how do the MTurkers approach these awe-inspiring intensities?
First, extreme happiness vs. extreme unhappiness. MTurkers think that there need to exist at least 72% of people experiencing the absolute best form of bliss imaginable in order to outweigh the suffering of 28% of people experiencing the worst form of suffering imaginable.
Toby Ord and the classical utilitarians rejoice, that’s not bad! That’s like a 3:1 trade ratio, pretty close to a 1:1 trade ratio! “And don’t forget that people’s imagination is likely biased towards negativity for evolutionary reasons!”, Carl Shulman says. “In humans, the pleasure of orgasm may be less than the pain of deadly injury, since death is a much larger loss of reproductive success than a single sex act is a gain.” Everyone nods in agreement with the Shulmaster.
How about extreme happiness vs. mild unhappiness? MTurkers say that there need to exist at least 62% of people experiencing the absolute best form of bliss imaginable in order to outweigh the extremely mild suffering of the unhappy people (e.g., people who are stubbing their toes a bit too often for their liking). Brian Tomasik and the suffering-focused crowd rejoice: a 1.5:1 trade ratio for practically hedonium to mild suffering?! There is no way the expected value of the future is that good. Reducing s-risks is common sense after all!
How about mild happiness vs. extreme unhappiness? The MTurkers have spoken: A world in which 82% of people experience extremely mild happiness—i.e., eating particularly bland potatoes and listening to muzak without one’s hearing aids on—and 18% of people are brutally tortured while being kept alive by nano bots, is… net positive.
“Wait, that’s a trade ratio of 4.5:1!” Toby says. “How on Earth is this compatible with a trade ratio of 3:1 for practically hedonium vs. highly optimized suffering, let alone a trade ratio of 1.5:1 for practically hedonium vs. stubbing your toes occasionally?!” Carl screams. He looks at Brian, but Brian has already fainted.
Toby, Carl and Brian meet the next day, still looking very pale. They shake hands and agree to not do so much descriptive ethics anymore.
Years later, all three still cannot stop wincing with pain when “the Long Reflection” is mentioned.
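For reference, here is the simple arithmetic behind the trade ratios thrown around above (my own conversion: if a fraction $p$ of people must experience the positive state to outweigh the remaining $1-p$ experiencing the negative state, the implied trade ratio is $p/(1-p)$):

$$
\frac{0.72}{0.28} \approx 2.6 \;(\approx 3{:}1),
\qquad
\frac{0.62}{0.38} \approx 1.6 \;(\approx 1.5{:}1),
\qquad
\frac{0.82}{0.18} \approx 4.6 \;(\approx 4.5{:}1)
$$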
- ^
We also had two conditions about preventing the creation of a happy [unhappy] person. Preventing a happy person from being created (mean = 3.1) was rated as somewhat bad. Preventing an unhappy person from being created (mean = 5.5) was rated as fairly good.