This is super interesting. Thanks for writing it. Do you think you’re conflating several analytically distinct phenomena when you say (i) “Fanaticism is the idea that we should base our decisions on all of the possible outcomes of our actions no matter how unlikely they are … EA fanatics take a roughly maximize expected utility approach” and (ii) “Fanaticism is unreasonable”?
For (i), I mainly have in mind two approaches “fanatics” could be defined by: (ia) “do a quick back-of-the-envelope calculation of expected utility and form beliefs based solely on its output,” and (ib) “do what you actually think maximizes expected utility, whether that’s based on a spreadsheet, heuristic, intuition, etc.” I think (ia) isn’t something basically anyone would defend, while (ib) is something I and many others would (and it’s how I think “fanaticism” tends to be used). And for (ib), we need to account for heuristics like (f) quick BOTE calculations tend to overestimate the expected utility of low probabilities of high impact, and (g) extremely large and extremely small numbers should be sandboxed (e.g., capped in the influence they can have on the conclusion). This is a (large) downside of these “very weird projects,” and I think it makes the “should support” case a lot weaker.
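To make heuristic (g) a bit more concrete, here is a minimal sketch in Python of what “sandboxing” could look like in a quick EV calculation; the cap and all the numbers are hypothetical, chosen only to illustrate the mechanism rather than to suggest the right cap or functional form.

```python
# Toy illustration of heuristic (g): "sandboxing" extreme values by capping the
# contribution any single outcome can make to an expected-utility estimate.
# The cap and all numbers below are hypothetical, purely for illustration.

def naive_ev(outcomes):
    """Plain expected utility: sum of probability * utility over outcomes."""
    return sum(p * u for p, u in outcomes)

def sandboxed_ev(outcomes, cap=10.0):
    """Expected utility with each outcome's contribution clamped to [-cap, cap]."""
    return sum(max(-cap, min(cap, p * u)) for p, u in outcomes)

# A "very weird project": tiny probability of an astronomically large payoff,
# plus a likely modest cost.
weird_project = [(1e-12, 1e18), (0.9, -1.0)]

print(naive_ev(weird_project))      # 999999.1: dominated by the tiny-probability term
print(sandboxed_ev(weird_project))  # 9.1: the extreme term's contribution is capped
```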
For (ii), I mainly have in mind three claims about fanaticism: (iia) “Fanaticism is unintuitive,” (iib) “Fanaticism is absurd (à la reductio ad absurdum),” and (iic) “Fanaticism breaks some utility axioms.” These each have different evidence. For example, (iia) might not really matter if we don’t think our intuitions—which have been trained through evolution and life experience—are reliable for such unusual questions as maximizing long-run aggregate utility.
Did you have some of these in mind? Or maybe other operationalizations?
Jacy
Brief Thoughts on the Prioritization of Quality Risks
This is a brief shortform post to accompany “The Future Might Not Be So Great.” These are just some scattered thoughts on the prioritization of quality risks that were not quite relevant enough to go in the post itself. Thanks to those who gave feedback on the draft of that post, particularly on this section.
People ask me to predict the future, when all I want to do is prevent it. Better yet, build it. Predicting the future is much too easy, anyway. You look at the people around you, the street you stand on, the visible air you breathe, and predict more of the same. To hell with more. I want better. ⸻ Ray Bradbury (1979)
I present a more detailed argument for the prioritization of quality risks (particularly moral circle expansion) over extinction risk reduction (particularly through certain sorts of AI research) in Anthis (2018), but here I will briefly note some thoughts on importance, tractability, and neglectedness. Two related EA Forum posts are “Cause Prioritization for Downside-Focused Value Systems” (Gloor 2018) and “Reducing Long-Term Risks from Malevolent Actors” (Althaus and Baumann 2020). Additionally, at this early stage of the longtermist movement, the top priorities for population and quality risk may largely intersect. Both issues suggest foundational research on topics such as the nature of AI control and likely trajectories of the long-term future, community-building of thoughtful do-gooders, and field-building of institutional infrastructure to use for steering the long-term future.
Importance
One important application of the EV of human expansion is to the “importance” of population and quality risks. Importance can be operationalized as the good done if the entire cause succeeded in solving its corresponding problem, such as the good done by eliminating or substantially reducing extinction risk, which is effectively zero if the EV of human expansion is zero and effectively negative if the EV of human expansion is negative.
The importance of quality risk reduction is clearer, in the sense that the difference in quality between possible futures is clearer than the difference between extinction and non-extinction, and larger, in the sense that while population risk entails only the zero-to-positive difference between human extinction and non-extinction (or, for population risk more generally, between zero population and some positive number of individuals), quality risk entails the difference between the best quality humans could engender and the worst, across all possible population sizes. This is arguably a weakness of the framework, because we could carve out a quality risk cause area that is smaller in importance (say, securing an increase of 1 trillion utils, i.e., units of goodness), and it would tend to become more tractable as we narrow the category.
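As a toy illustration of this importance comparison (all utility figures below are hypothetical placeholders, not estimates), the population risk range runs only from extinction (zero) to the expected value of survival, while the quality risk range runs from the worst future to the best:

```python
# Toy comparison of the "importance" ranges discussed above.
# All util figures are hypothetical placeholders, not estimates.

ev_survival   = 1e12    # expected utils of the future conditional on non-extinction
best_quality  = 5e13    # utils of the best-quality future humans could engender
worst_quality = -5e13   # utils of the worst-quality future

# Importance of eliminating population (extinction) risk: at most the gap between
# survival and extinction (zero); effectively zero or negative if ev_survival <= 0.
importance_population_risk = ev_survival - 0.0

# Importance of fully resolving quality risk: the gap between the worst and best
# futures, across all possible population sizes.
importance_quality_risk = best_quality - worst_quality

print(importance_population_risk)  # 1e12
print(importance_quality_risk)     # 1e14: a much wider range in this toy example
```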
Tractability
The tractability difference between population and quality risk seems the least clear of the three criteria. My general approach is to think through the most likely “theories of change,” or paths to impact, and assess them step-by-step. For example, one commonly discussed extinction risk reduction path to impact is “agent foundations”: building mathematical frameworks and formally proving claims about the behavior of intelligent agents, which would allow us to build advanced AI systems that are more likely to do what we tell them to do, and then either using these frameworks to build AGI or persuading the builders of AGI to use them. Quality-risk-focused AI safety strategies may be more focused on the outer alignment problem, ensuring that an AI’s objective is aligned with the right values, rather than just the inner alignment problem, ensuring that all actions of the AI are aligned with the objective.[1] Also, we can influence quality by steering the “direction” or “speed” of the long-term future, approaches with potentially very different impacts, hinging on factors such as the distribution of likely futures across value and likelihood (e.g., Anthis 2018c; Anthis and Paez 2021).
One argument that I often hear on the tractability of trajectory changes is that changes need to “stick” or “persist” over long periods. It is true that there needs to be a persistent change in the expected value (i.e., in the random variable or time series regime of value in the future), but I frequently hear the stronger claim that there needs to be a persistent change in the realization of that value. For example, if we successfully broker a peace deal between great powers, neither the peace deal itself nor any other particular change in the world has to persist in order for this to have high long-term impact. The series of realized values can have arbitrarily large variance, such as a high probability that the peace deal is broken within a decade.
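Here is a minimal simulation sketch of that point, with purely hypothetical probabilities: the peace deal only lowers the hazard of an irreversible catastrophe in its first decade (because we assume it is very likely broken afterwards), yet the resulting change in expected value persists indefinitely.

```python
import numpy as np

# Toy model of the peace-deal example; all probabilities are hypothetical.
# Each decade carries some hazard of an irreversible catastrophe (e.g., great-power war).
decades = 100
hazard_baseline = np.full(decades, 0.02)   # per-decade catastrophe probability without the deal
hazard_with_deal = hazard_baseline.copy()
hazard_with_deal[0] *= 0.5                 # the deal halves the hazard in decade 1 only

# Probability of having avoided catastrophe through each decade.
survival_baseline = np.cumprod(1 - hazard_baseline)
survival_with_deal = np.cumprod(1 - hazard_with_deal)

# Expected cumulative value, counting 1 util per decade of survival.
ev_baseline = survival_baseline.sum()
ev_with_deal = survival_with_deal.sum()

print(ev_with_deal - ev_baseline)  # > 0: the shift in expected value persists in every later decade
```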
For a sort of change to be intractable, it needs to not just lack persistence but to rubber band (i.e., create opposite-sign effects) back to its counterfactual. For example, if brokering a peace deal causes an equal and opposite reaction of anti-peace efforts, then that trajectory change is intractable. Moreover, we should consider not only rubber banding but dominoing (i.e., creating same-sign effects), such as this peace deal inspiring other great powers to follow suit even if this particular deal is broken. There is a great deal of this potential energy in the world waiting to be unlocked by thoughtful actors.
The tractability of trajectory change has been the subject of research at Sentience Institute, including our historical case studies and Harris’s (2019) “How Tractable Is Changing the Course of History?”
Neglectedness
The neglectedness difference between population and quality risk seems the clearest of the three. There are far more EAs and longtermists working explicitly on population risks than on quality risks (i.e., risks to the moral value of individuals in the long-term future). There are two nuances to this claim. First, it may not be true for other relevant comparisons: for example, many people in the world are trying to change social institutions, such as different sides of the political spectrum trying to pull public opinion towards their end of the spectrum. That group seems much larger than the people focused explicitly on extinction risks, and there are many other relevant reference classes. Second, it is not entirely clear whether extinction risk reduction or quality risk reduction has higher or lower returns to being less neglected (i.e., more crowded). It may be that so few people are focused on quality risks that marginal returns are actually lower than they would be if more people were working on them (i.e., increasing returns).
[1] In my opinion, there are many different values involved in developing and deploying an AI system, so the distinction between inner and outer alignment is rarely precise in practice. Much of identifying and aligning with “good” or “correct” values can be described as outer alignment. In general, I think of AI value alignment as a long series of mechanisms: from the causal factors that create human values (which themselves can be thought of as objective functions), to a tangled web of objectives in each human brain (e.g., values, desires, preferences), to a tangled web of social objectives aggregated across humans (e.g., voting, debates, parliaments, marketplaces), to a tangled web of objectives communicated from humans to machines (e.g., material values in game-playing AI, training data, training labels, architectures), to a tangled web of emergent objectives in the machines (e.g., parametric architectures in the neural net, (smoothed) sets of possible actions in domain, (smoothed) sets of possible actions out of domain), and finally to the machine actions (i.e., what it actually does in the world). We can reasonably refer to the alignment of any of these objects with any of the other objects in this long, tangled continuum of values. Two examples of outer alignment work that I have in mind here are Askell et al. (2021), “A General Language Assistant as a Laboratory for Alignment,” and Hobbhahn et al. (2022), “Reflection Mechanisms as an Alignment Target: A Survey.”
Jamie Harris at Sentience Institute authored a report on “Social Movement Lessons From the US Anti-Abortion Movement” that may be of interest.
That’s right that we don’t have any ongoing projects exclusively on the impact of AI on nonhuman biological animals, though much of our research includes that, especially the outer alignment idea of ensuring that an AGI or superintelligence accounts for the interests of all sentient beings, including wild and domestic nonhuman biological animals. We also have several empirical projects where we collect data on both moral concern for animals and for AI, such as on perspective-taking, predictors of moral concern, and our recently conducted US nationally representative survey on Artificial Intelligence, Morality, and Sentience (AIMS).
For various reasons discussed in those nonhumans and the long-term future posts and in essays like “Advantages of Artificial Intelligences, Uploads, and Digital Minds” (Sotala 2012), biological nonhuman animals seem less likely to exist in very large numbers in the long-term future than animal-like digital minds. That doesn’t mean we shouldn’t work on the impact of AI on those biological nonhuman animals, but it has made us prioritize laying groundwork on the nature of moral concern and the possibility space of future sentience. I can say that we have had a lot of researcher applicants propose agendas focused more directly on AI and biological nonhuman animals, and we’re in principle very open to that. There are far more promising research projects in this space than we can fund at the moment. However, I don’t think Sentience Institute’s comparative advantage is working directly on research projects like CETI or Interspecies Internet that wade through the details of animal ethology or neuroscience using machine learning, though I’d love to see a blog-depth analysis of the short-term and long-term potential impacts of such projects, especially if there are more targeted interventions (e.g., translating farmed animal vocalizations) that could be high-leverage for EA.
Good points! This is exactly the sort of work we do at Sentience Institute on moral circle expansion (mostly for farmed animals from 2016 to 2020, but since late 2020, most of our work has been directly on AI—and of course the intersections), and it has been my priority since 2014. Also, Peter Singer and Yip Fai Tse are working on “AI Ethics: The Case for Including Animals”; there are a number of EA Forum posts on nonhumans and the long-term future; and the harms of AI and “smart farming” for farmed animals are a common topic, such as in this recent article that I was quoted in. My sense from talking to many people in this area is that there is substantial room for more funding; we’ve gotten some generous support from EA megafunders and individuals, but we also consistently get dozens of highly qualified applicants whom we have to reject every hiring round, including people with good ideas for new projects.
Same perspective here! Thank you for sharing.
Oh, sorry, I was thinking of the arguments in my post, not (only) those in your post. I should have been more precise in my wording.
Thank you for the reply, Jan, especially noting those additional arguments. I worry that your article neglects them in favor of less important/controversial questions on this topic. I see many EAs taking the “very unlikely that [human descendants] would see value exactly where we see disvalue” argument (I’d call this the ‘will argument,’ that the future might be dominated by human-descendant will and there is much more will to create happiness than suffering, especially in terms of the likelihood of hedonium over dolorium) and using that to justify a very heavy focus on reducing extinction risk, without exploration of those many other arguments. I worry that much of the Oxford/SF-based EA community has committed hard to reducing extinction risk without exploring those other arguments.
It’d be great if at some point you could write up a discussion of those other arguments, since I think that’s where the thrust of the disagreement lies between people who think the far future is highly positive, close to zero, or highly negative. Though unfortunately, it always ends up coming down to highly intuitive judgment calls on these macro-socio-technological questions. As I mentioned in that post, my guess is that long-term empirical study like the research in The Age of Em or done at Sentience Institute is our best way of improving those highly intuitive judgment calls and eventually reaching agreement on the topic.
Thanks for posting on this important topic. You might be interested in this EA Forum post where I outlined many arguments against your conclusion, the expected value of extinction risk reduction being (highly) positive.
I do think your “very unlikely that [human descendants] would see value exactly where we see disvalue” argument is a viable one, but I think it’s just one of many considerations, and my current impression of the evidence is that it’s outweighed.
Also FYI the link in your article to “moral circle expansion” is dead. We work on that approach at Sentience Institute if you’re interested.
I remain skeptical of how much this type of research will influence EA-minded decisions, e.g., how many people would switch donations from farmed animal welfare campaigns to humane insecticide campaigns if they increased their estimate of insect sentience by 50%? But I still think the EA community should be allocating substantially more resources to it than it is now, and you seem to be approaching it in a smart way, so I hope you get funding!
I’m especially excited about the impact of this research on general concern for invertebrate sentience (e.g., establishing norms that there are at least some smart humans actively working on insect welfare policy) and on helping humans better consider artificial sentience when important tech policy decisions are made (e.g., on AI ethics).
[1] Cochrane mass media health articles (and similar):
Targeted mass media interventions promoting healthy behaviours to reduce risk of non-communicable diseases in adult, ethnic minorities
Mass media interventions for smoking cessation in adults
Mass media interventions for preventing smoking in young people.
Mass media interventions for promoting HIV testing
Smoking cessation media campaigns and their effectiveness among socioeconomically advantaged and disadvantaged populations
Population tobacco control interventions and their effects on social inequalities in smoking: systematic review
Are physical activity interventions equally effective in adolescents of low and high socioeconomic status (SES): results from the European Teenage project
The effectiveness of nutrition interventions on dietary outcomes by relative social disadvantage: a systematic review
Use of folic acid supplements, particularly by low-income and young women: a series of systematic reviews to inform public health policy in the UK
Use of mass media campaigns to change health behaviour
The role of the media in promoting and reducing tobacco use
Getting to the Truth: Evaluating National Tobacco Countermarketing Campaigns
Effect of televised, tobacco company-funded smoking prevention advertising on youth smoking-related beliefs, intentions, and behavior
Do mass media campaigns improve physical activity? a systematic review and meta-analysis
I can’t think of anything that isn’t available in a better form now, but it might be interesting to read for historical perspective, such as what it looks like to have key EA ideas half-formed. This post on career advice is a classic. Or this post on promoting Buddhism as diluted utilitarianism, which is similar to the reasoning a lot of utilitarians had for building/promoting EA.
The content on Felicifia.org was most important in my first involvement, though that website isn’t active anymore. I feel like forum content (similar to what could be on the EA Forum!) was important because it’s casually written and welcoming. Everyone was working together on the same problems and ideas, so I felt eager to join.
Just to add a bit of info: I helped with THINK when I was a college student. It wasn’t the most effective strategy (largely, it was founded before we knew people would coalesce so strongly into the EA identity, and we didn’t predict that), but Leverage’s involvement with it was professional and thoughtful. I didn’t get any vibes of cultishness from my time with THINK, though I did find Connection Theory a bit weird and not very useful when I learned about it.
I get it pretty frequently from newcomers (maybe in the top 20 questions for animal-focused EA?), but everyone seems convinced by a brief explanation of how there’s still a small chance of a big purchasing change even though most individual consumption changes don’t lead to any purchasing change.
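For what it’s worth, here is a back-of-the-envelope version of that explanation as a minimal sketch; the batch size is purely hypothetical, and the point is only that a small probability of a large purchasing change leaves the expected effect of each consumption change roughly intact.

```python
# Toy version of the "small chance of a big purchasing change" explanation.
# The batch size is a hypothetical placeholder, not an empirical estimate.

batch_size = 1_000               # suppose a store adjusts its orders only in batches of 1,000 units
p_tip = 1 / batch_size           # chance that one forgone purchase tips the order down a full batch
change_if_tipped = batch_size    # size of the purchasing change when that happens

expected_change = p_tip * change_if_tipped
print(expected_change)  # 1.0: in expectation, one forgone purchase ≈ one fewer unit produced
```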
Exactly. Let me know if this doesn’t resolve things, zdgroff.
Yes, terraforming is a big way in which close-to-WAS scenarios could arise. I do think it’s smaller in expectation than digital environments that develop on their own and thus are close-to-WAS.
I don’t think terraforming would be done very differently from today’s wildlife, e.g., done without predation and disease.
Ultimately I still think the digital, not-close-to-WAS scenarios seem much larger in expectation.
I’d qualify this by adding that the philosophical-type reflection seems to lead in expectation to more moral value (positive or negative, e.g. hedonium or dolorium) than other forces, despite overall having less influence than those other forces.
Thanks for commenting, Lukas. I think Lukas, Brian Tomasik, and others affiliated with FRI have thought more about this, and I basically defer to their views here, especially because I haven’t heard any reasonable people disagree with this particular point. Namely, I agree with Lukas that there seems to be an inevitable tradeoff here.
Whoops! Thanks!