Short-Term AI Alignment as a Priority Cause
In this post, I will argue that short-term AI alignment should be viewed as today’s greatest priority cause, whether you are concerned about long-term AGI risks or not.
To do so, I will first emphasize that AIs are automating information collection, storage, analysis and dissemination, and that they now do much of this far better than humans. Yet many of the priority cause areas in EA strongly depend on collecting, storing, analyzing and disseminating quality information. As of today, an aligned large-scale AI would thus be a formidable ally for EA.
In this post, I will particularly focus on the cases of public health, animal suffering, critical thinking and existential risks, since these are leading priority causes in EA.
The Power of Information
I already discussed this on LessWrong. But I fear that AI discussions are too focused on futuristic robots, even within the EA community. In contrast, in this post, I propose to stress the growing role of algorithms and information processing in our societies.
It is indeed noteworthy that the greatest disruptions in human history have arguably been the inventions of new information technologies. Language enabled coordination, writing enabled long-term information storage, the printing press enabled scalable information dissemination, and computing machines enabled ultra-fast information processing. We also now have all sorts of sensors and cameras to scale data collection, and worldwide fiber optics for highly reliable global communication.
Information technologies powered new sorts of economies, organizations, scientific discoveries, industrial revolutions, agricultural practices and product customization. They also moved our societies towards information societies. These days, essentially all jobs are information processing jobs, from the CEO of the largest company down to the babysitter. Scientists, journalists, managers, software developers, lawyers, doctors, workers, teachers, drivers, regulators, and even effective altruists (including me writing this post) all spend their days doing mostly information processing: they collect, store, analyze and communicate information.
When Yuval Noah Harari came to EPFL, his talk was greatly focused on information. “Those who control the flow of data in the world control the future, not only of humanity, but also perhaps of life itself”, he said. This is because, according to Harari, “humans are hackable”. As psychology keeps showing, the information we are exposed to radically biases our beliefs, preferences and habits, with both short-term and long-term effects.
Now ask yourself: today, who is most in control of the flow of information? Which entity holds, more than any other, what Harari calls “the future of life”?
I would argue that, by far, algorithms have taken control of the flow of information. Well, not all algorithms. Arguably, a handful of algorithms control the flow of information more than all humans combined; and arguably, the YouTube algorithm is more in control of information than any other algorithm, with 1 billion watch-time hours per day for 2 billion users, 70% of which are results of recommendations.
And as algorithms become better and better at complex information processing, economic incentives seem bound to give them more and more control over the information that powers our information societies. It seems critical that they be aligned with what we truly want them to do.
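To get a feel for the orders of magnitude quoted above, here is a minimal back-of-the-envelope sketch. The variable names are illustrative, and it assumes the 70% figure applies to watch-time hours, which the post does not state precisely.

```python
# Figures quoted in the post; all names here are illustrative.
WATCH_HOURS_PER_DAY = 1_000_000_000   # 1 billion watch-time hours per day
USERS = 2_000_000_000                 # 2 billion users
RECOMMENDED_PERCENT = 70              # assumed to be a share of watch time

# Hours per day that result from recommendations (integer arithmetic).
recommended_hours_per_day = WATCH_HOURS_PER_DAY * RECOMMENDED_PERCENT // 100

# Average watch time per user per day, in minutes.
minutes_per_user_per_day = WATCH_HOURS_PER_DAY * 60 / USERS

print(recommended_hours_per_day)   # 700000000
print(minutes_per_user_per_day)    # 30.0
```

Under these assumptions, recommendations would account for about 700 million hours of human attention per day, that is, roughly half an hour per user per day across all content.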
How short-term alignment can help all EA causes
In what follows, I will particularly focus on the global impacts that the alignment of large-scale algorithms, like the YouTube algorithm, could have on some of the main EA causes.
Impact on public health
Much of healthcare is an information processing challenge. In particular, early diagnosis is often critical to efficient treatment. Enabling anomaly detection with non-intrusive sensors, like a mere picture with a phone, could enable great improvement in public health, especially if accompanied by adequate medical recommendations (which may be as simple as “you should see a doctor”). While exciting, and while there are major AI Safety challenges in this regard, I will not dwell on them since alignment is arguably not the bottleneck here.
On the other hand, much of public health has to do with daily habits, which are strongly influenced by recommender systems like the YouTube algorithm. Unfortunately, as long as they are unaligned, these recommender systems might encourage poor habits, like fast food consumption, taking the car for transport or binge-watching videos for hours without exercising.
More aligned recommender systems might instead encourage good habits, for instance through hygiene advice, quality food recommendations and encouragement to exercise. By suitably customizing video recommendations, it might even be possible to motivate users to cook healthy food or to practice sports that they are more likely to enjoy.
A more tractable beneficial recommendation could be the promotion of evidence-based medicine with large effect sizes, like vaccination. The World Health Organization reported 140,000 deaths from measles in 2018, a disease for which a vaccine exists. Unfortunately, anti-vaccination propaganda seems to have slowed down the systematic vaccination of children. Even if only 10% of deaths from measles could have been avoided by exposure to better information, this still represents roughly 14,000 lives in 2018 that could have been saved by more aligned recommendation algorithms, for measles alone.
As a more EA example, we can consider the case of the Malaria Consortium (or other GiveWell top charities). Much of philanthropy could become a lot more effective if donors were better informed. An aligned recommender could stress this fact, and recommend effective charities, as opposed to appealing but ineffective ones. Thousands, if not hundreds of thousands, of lives could probably be saved by exposing potential donors to better quality information.
To conclude this section, I would like to stress the growing challenges of mental health. This will arguably be the ultimate frontier of healthcare, and a major cause for utilitarians. Unfortunately, fighting addiction, loneliness, depression and suicide seems nearly intractable through conventional channels. But data from social media may provide radically new means to diagnose these mental health conditions, as a Facebook study suggests. Interestingly, by aligning recommender algorithms, social media could also provide means to treat such conditions, for instance by recommending effective therapeutic content. Indeed, studies showed that mere exposure to the principles of cognitive behavioral therapy improved patients’ conditions. Alternatively, algorithms could simply recommend content that encourages viewers in need to see a psychiatrist.
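The arithmetic behind the measles estimate can be checked with a minimal sketch. The 140,000 figure is the WHO number quoted above and the 10% share is the post's own hypothetical; the other shares are purely illustrative assumptions added for comparison.

```python
MEASLES_DEATHS_2018 = 140_000  # WHO figure quoted in the post


def lives_saved(deaths: int, avoidable_percent: int) -> int:
    """Deaths avoided if `avoidable_percent` of them were preventable."""
    return deaths * avoidable_percent // 100


# 10 is the post's hypothetical; 5 and 25 are illustrative assumptions.
scenarios = {p: lives_saved(MEASLES_DEATHS_2018, p) for p in (5, 10, 25)}

for percent, saved in scenarios.items():
    print(f"{percent}% avoidable -> {saved} lives")
# 5% avoidable -> 7000 lives
# 10% avoidable -> 14000 lives
# 25% avoidable -> 35000 lives
```

The post's 10% scenario thus corresponds to 14,000 lives for 2018; the estimate scales linearly with whatever share one believes better information could actually prevent.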
Impact on animal suffering
Another important cause in EA is animal suffering. Here, again, it seems that information is critical. Most people seem to simply be unaware of the horror and scale of industrial farming. They also seem to neglect the impact of their daily consumption on the incentive structure that motivates industrial farming.
But this is not all. Our food consumption habits arguably depend strongly on our social and informational environments. By fostering communities that, for instance, like to try different meat substitutes, it seems possible to convince a larger population to at least try such substitutes, which could significantly reduce our impact on animal suffering and on the environment.
(I once pointed this out to Ed Winters, a vegan YouTuber activist, who acknowledged that the number of views of his videos seems mostly controlled by the YouTube algorithm. Our discussion was recorded, and I guess it will be on his YouTube channel soon...)
It may also be possible to nudge biologists and aspiring biologists towards research on, say, meat substitutes. This seems critical to accelerate the development of such substitutes, and also to lower their price, which could then have a strong impact on animal suffering.
Finally, one of the great challenges of cultivated meat may be its social acceptance. There may be a lot of skepticism merely due to a misunderstanding, either of the nature of cultivated meat, or of the “naturalness” of conventional meat.
Impact on critical thinking
This leads us to what may be one of the most impactful consequences of aligned recommender systems. It might be possible to promote critical thinking much more effectively, at least within intellectual communities. Improving the way a large population thinks may be one of the most effective ways to do a lot of good in a robust manner.
As a convinced Bayesian (with an upcoming book on the topic!), I feel that the scientific community (and others) would gain a lot by pondering their epistemology at much greater length, that is, how they came to believe what they believe, and what they ought to do to acquire more reliable beliefs. Unfortunately, most scientists seem to neglect the importance of thinking in bets. While they usually acknowledge that they are poor at probability theory, they mostly seem fine with their inability to think probabilistically. When it comes to preparing ourselves for an uncertain future, this shortcoming seems very concerning. Arguably, this is why AI researchers are not sufficiently concerned about AGI risks.
An aligned algorithm could promote content that stresses the importance of thinking probabilistically, the fundamental principles for doing so, and useful exercises to train our intuitions of probability, like the Bayes-up application.
Perhaps more importantly still, an aligned algorithm could be critical to promoting intellectual honesty. Studies suggest that what is lacking in people’s reasoning is often not information itself, but the ability to process information in an effective, unbiased manner. Typically, more informed Republicans are also more likely to deny climate change. One hypothesis is that this is because they also gain the ability to better lie to themselves.
In this video (and probably her upcoming book), Julia Galef argues that the most effective way to combat our habit of lying to ourselves is to design incentives that reward intellectual honesty, changing our own minds, providing clear arguments, dismissing our own bullshit, and so on. While many such rewards may be designed internally (by and for our own brains), because we are social creatures, most will likely need to come from our environments. Today, much of this environment and of the social rewards we receive come from social media; and unfortunately, most people usually receive greater rewards (likes and retweets) by being offensive, sarcastic and caricatural.
An aligned algorithm could align our own rewards with what motivates intellectual honesty, by favoring connections with communities that value intellectual honesty, modesty and a growth mindset. Thereby, the aligned algorithm may be effective in aligning ourselves with what we truly desire, not with our bullshit.
Impact on existential risks
What may be most counter-intuitive is that short-term alignment may be extremely useful for long-term AGI alignment (and other existential risks). In fact, to be honest, this is why I care so much about short-term alignment. I care about short-term alignment because I see this as the most effective way to increase the probability of achieving long-term AGI alignment.
An aligned recommender algorithm could typically promote video content on long-term concerns. This would be critical to nudge people towards longer-term perspectives, and to combat our familiarity bias. This seems crucial as well to defend the respectability of long-term perspectives.
Perhaps more importantly, the great advantage of focusing on short-term alignment is that it makes it a lot easier to convince scientists, philosophers, but also engineers, managers and politicians to invest time and money in alignment. Yet all of these kinds of expertise (and still others) seem critical for robust alignment. We will likely need the interdisciplinary collaboration of thousands, if not hundreds of thousands, of scholars and professionals to significantly increase the probability of long-term AGI alignment. So let’s start recruiting them, one after the other, using arguments that they will find more compelling.
But this is not all. Since short-term alignment is arguably not completely different from long-term alignment, this research may be an excellent practice to better outline the cruxes and the pitfalls we will encounter for long-term alignment. In fact, some of the research on short-term alignment (see for instance this page on social choice) might be giving more reliable insights into long-term alignment than long-term alignment research itself, which can be argued to be sometimes too speculative.
Typically, it does not seem unlikely that long-term alignment will have to align algorithms of big (private or governmental) organizations, even though most people in these big organizations neglect the negative side effects of their algorithms.
Conclusion
I have sometimes faced a sort of contempt for short-term agendas within EA. I hope to have convinced you in this post that this contempt may have been highly counter-productive, because it might have led to the neglect of short-term AI alignment research. Yet, short-term AI alignment research seems critical to numerous EA causes, perhaps even including long-term AGI alignment.
To conclude, I would like to stress the fact that this post is the result of years of reflection by a few of us, mostly based in Lausanne, Switzerland. Our reflections culminated in the publication of a Robustly Beneficial AI book in French called Le Fabuleux Chantier, whose English translation is pending (feel free to contact us directly to see the current draft). But we have also explored other information dissemination formats, like the Robustly Beneficial Podcast (YouTube, iTunes, RSS) and the Robustly Beneficial Wiki.
In fact, after successfully initiating a research group at EPFL (with papers at ICML, NeurIPS,...), we are in the process of starting an AI Safety company, called Calicarpa, to exploit our published results and software (see for example this). Also, we have convinced a researcher in Morocco to tackle these questions, who’s now building a team and looking for 3 postdocs to do so.
Thanks for writing this!
I don’t see much quantitative analysis in this post demonstrating this. You’ve shown that it’s plausibly something worth working on and it can impact other priorities, but not that work on this is better than other work, either by direct comparison or by putting it on a common scale (e.g. Scale/Importance + Solvability/Tractability + Neglectedness).
I think in health and poverty in developing countries, there are well-known solutions that don’t need AI or recommender alignment, although more powerful AI, more data and targeted ads might be useful in some cases (but generally not, in my view). Public health generally seems to be a lower priority than health and poverty in developing countries, but maybe the gains across many domains from better AI can help, but even then, is alignment the problem here, or is it just that we should collect more data, use more powerful algorithms and/or pay for targeted ads?
For animal welfare, I think more sophisticated targeted ads will probably go further than trying to align recommender systems, and it’s not clear targeted ads are particularly cost-effective compared to, say, corporate outreach/campaigns, so tweaking them might have little value (I’m not sure how much of a role targeted ads play in corporate campaigns, though).
It’s unfortunately very hard to quantify the impact of recommender systems. But here’s one experiment that may update your prior on the effectiveness of targeted video recommendations.
In 2013, Facebook did a large-scale experiment where they tweaked their newsfeeds. For some of their users, they removed 10% of posts with negative content. For others, they removed 10% of posts with positive content. And there was also a control group. After only one week, they observed a change in users’ behaviors: the first group posted more positive content, and the second posted more negative content. Here’s our discussion of the paper.
I cannot stress enough the fact that Facebook’s intervention was minimal. A 2020 aligned recommender system could completely upset the entire newsfeed. Also, it would do so for longer than just a week — and other studies suggest that it takes months for users to learn even basic adaptation to tweaks of the algorithm.
In terms of healthcare in developing countries, it still seems to me that most philanthropists and small donors neglect the effectiveness of healthcare interventions in developing countries. Yet videos (not ads!) that do discuss this are not necessarily sufficiently promoted by the YouTube algorithm. As an example, this excellent video by Looking Glass Universe only has a fraction of the views of her other videos. Similarly, this video by Bill Gates has far fewer views than his videos on, say, climate change.
Note that this other post tried to quantify aligning recommender systems in terms of the common scale Scale/Tractability/Neglectedness. The authors themselves acknowledge that they have low confidence in their estimates. But I’d argue that this goes to show just how neglected this cause area is. Scientists both within and outside these companies have a hard time estimating the impact and tractability of (making progress in) aligning recommender systems (most don’t even know what “alignment” is).
I think this is an interesting topic. However, I downvoted because if you’re going to claim something is the “greatest priority cause,” which is quite a claim, I would at least want to see an analysis of how it fares against other causes on scale, tractability + neglectedness.
(Basically I agree with MichaelStJules’s comment, except I think the analysis need not be quantitative.)
I thought the post strongly emphasized the scale and neglectedness of short-term AI alignment. But I can dwell more on this. There are now more views on YouTube than searches on Google, 70% of which are results of recommendation. And a few studies (cited here) suggest that repeated exposure to some kind of information has a strong effect on beliefs, preferences and habits. Since this has major impacts on all other EA causes, I’d say the scale of the problem is at least that of any other EA cause.
I believe that alignment is extremely neglected in academia and industry, and short-term alignment is still greatly neglected within EA.
The harder point to estimate is tractability. It is noteworthy that Google, Facebook and Twitter have recently taken many measures towards more ethical algorithms, which suggests that it may be possible to get them to give ethics more weight in their algorithms. The other hard part is technical. While it might be possible to upgrade some videos “by hand”, it seems desirable to have more robust and sustainable solutions for robustly beneficial recommendation. I think that having a near-perfect recommender is technically way out of reach (it’s essentially solving AGI safety!). But there are likely numerous small tweaks that can greatly improve how robustly beneficial the recommender system is.
Of course, all of this is a lot more complex to discuss. I’ve only presented a glance of what we discuss in our book or in our podcast. And I’m very much aware of the extent of my ignorance, which is unfortunately huge...
Many of these advantages (e.g. aligned recommenders pushing people towards longtermism, or animal rights) seem more like aligning recommenders with our values than any neutral account of alignment. It seems that any ideology could similarly claim that aligned recommenders are important for introducing people to libertarianism/socialism/conservatism/feminism etc. In contrast, this probably isn’t in the best interests of the viewer, e.g. your average omnivore probably doesn’t want to be recommended videos about animal suffering.
I’m curious to get a more concrete proposal in this domain. For example, what could a sympathetic Youtube engineer do to make the things you propose come about?
I didn’t see any mentions of existing organizations that work on recommender alignment (even if they don’t use the “short-term aligned AI” framing). It sounds as though many of the goals/benefits you discuss here could come from tweaks to existing algorithms that needn’t be connected to AI alignment (if Facebook wanted to focus on making users healthier, would it need “alignment” to do so?).
What do you think of the goals of existing “recommender alignment” organizations, like the Center for Humane Technology? They are annoyingly vague about their goals, but this suggestion sheet lays out some of what they care about: Users being able to focus, not being stressed, etc.
Why would an aligned recommender stress this fact? Is this something we could have much influence over?
The importance of ethics in YouTube recommendation seems to have grown significantly over the last two years (see this for instance). This suggests that there are pressures both from outside and inside that may be effective in making YouTube care about recommending quality information.
Now, YouTube’s effort so far seems to have been mostly about removing (or recommending less) undesirable content, though as an outsider it’s hard for me to say. Perhaps they can be convinced to also recommend more desirable content.
Possibly. One trend in YouTube’s recommendations seems to be towards more mainstream content, and EA, x-risks and farm animal welfare/rights aren’t really mainstream topics (animal rights specifically might be considered radical), so any technical contributions to recommender alignment might be used further to the exclusion of these topics and be net-negative.
Advocacy, policy and getting the right people on (ethics) boards might be safer. Maybe writing about the issue for Vox’s Future Perfect could be a good place to start?
Related: Aligning Recommender Systems as Cause Area by Ivan Vendrov and Jeremy Nixon