Our forthcoming AI Safety book
Dear all,
We, El Mahdi El Mhamdi and Lê Nguyên Hoang, are about to release a book in French on AI safety. The deadline for the final version of the manuscript is September 30. Next, we will be working on the English translation of the book. But before all that, we would love to get some feedback. Hence this article.
The book defends three main theses and provides an understandable explanation of key technical aspects of AI Safety for a wide audience.
Thesis 1. Making AIs beneficial is urgent.
Thesis 2. Making AIs beneficial is a huge challenge.
Combining these two theses, we then conclude with the main thesis.
Thesis 3. All sorts of talents should be put in the best conditions to help make AIs beneficial.
Defending Thesis 3 is really the goal of the book. We want to convince readers that far more attention and investment should go into encouraging and enabling all kinds of talents, technical and nontechnical, to contribute actively and efficiently to making AIs beneficial.
To this end, we then detail Theses 1 and 2 with further arguments.
Making AIs beneficial is urgent
Chapter 2 stresses that AIs are already widespread and hugely influential. We present several reasons for this. Most notably, AIs process huge amounts of data at minimal cost, with performance that is now reasonable and sometimes superhuman. In particular, we stress that today’s most powerful AIs are probably (Facebook and YouTube’s) recommender systems, which influence billions of people every day. For instance, there are now more views on YouTube than searches on Google. They add up to over 1 billion watch-time hours per day! This is an average of half an hour per day for each of the 2 billion users. Yet 70% of these views result from recommendations by YouTube’s AI.
Chapter 3 discusses the large-scale negative side effects of recommender systems, like loss of privacy, biases, filter bubbles, polarization, addiction to social media, mental disorders, junk news, pandemics of anger, and so on. In particular, based on the spread of anti-vaccination views on the one hand, and the spread of stigmatization of minorities in Myanmar on the other, we argue that AI (already) kills. Not “intentionally”, but as a side effect of its attention maximization.
Chapter 4 steps back to consider the notion of information. We argue that the critical role of information, in science and in civilizations, is neglected. In fact, recent advances in physics, biology or economics are often related to a greater focus on information. Meanwhile, some of the greatest breakthroughs in history, like the invention of language, writing or printing, consist of a greater mastery of information. Finally, these days, it seems that nearly all high-paying jobs are mostly information processing jobs. Given that AIs are information processing tools, they seem bound to completely upset our societies. It seems urgent to be aware of the upcoming upheavals and to direct them towards good.
Chapter 5 then argues that greatly slowing down the pace of AI progress is not a realistic option, mostly because of financial and political incentives. But also, we argue, because of (perceived) moral incentives. From healthcare to energy, from research to activism, we argue that there are enormous benefits to developing AIs. In fact, if we are concerned about AIs, we argue, our focus should probably not be to slow down the progress of any particular AI. It should probably rather be to make AIs, especially influential AIs, beneficial, as argued for example by OpenAI.
Chapter 6 addresses the possibility of human-level AI. We first stress that such an AI’s side effects should be expected to be much larger than those of today’s AIs. Next, based on surveys of experts, the past performance of expert predictions and further, more theoretical considerations, we argue (very conservatively) that human-level AI by 2025 should be given a probability of at least 1%. We argue that this is definitely sufficient to be extremely worried about human-level AI (though, to avoid controversies with human-level AI skeptics, we do make the point that this claim is not necessary to defend Thesis 1). This concludes our defense of Thesis 1. We then move on to Thesis 2.
Making AIs beneficial is a huge challenge
Chapter 7 discusses the risk of an AI race, and the constraints that this race implies. We argue that, because of this inevitable race, we cannot demand overly restrictive constraints on AIs. Or put differently, there are constraints on which constraints can be required to make AIs beneficial. In particular, to determine which constraints should be required, it seems essential to have an in-depth technical understanding of AIs. We also discuss ways to mitigate the AI race, including the advantages of monopolies in terms of long-term planning and safety incentives, among other things. We also make the distinction between making beneficial AIs and making AIs beneficial. The latter seems much harder.
Chapter 8 addresses the control problem. We stress that this is much harder than one might naively think, by focusing on the example of the YouTube recommender system. You would probably need to convince a huge fraction of YouTube’s board to have YouTube’s AI interrupted. This cannot be done easily. We even argue that designing a control switch may itself be a flaw for AI safety. Indeed, any switch might then be activated by some tired, drunk or blackmailed engineer, or even by some malicious hacker. In software security, such a “backdoor” is often regarded as a potential breach to be avoided. In fact, we argue that AIs should be made safe despite humans.
Chapter 9 (finally!) introduces machine learning. We argue that today’s and tomorrow’s most influential AIs will likely rely on (something similar to) the reinforcement learning framework. Essentially, each AI will be associated with a reward mechanism, and will choose actions that seem to maximize discounted future expected rewards. We explain the basics of this in (hopefully) understandable language. We also discuss exploitation versus exploration, and unsafe exploration.
Chapter 10 stresses the importance of the objective function (= reward mechanism) in reinforcement learning. We stress the difficulty of designing adequate rewards, by illustrating Goodhart’s law and reward hacking. We also present the orthogonality thesis, instrumental goals and instrumental convergence. We finally argue that AI alignment, i.e. making sure the AI’s objective reflects humans’ values, is nearly a necessary and sufficient condition for making powerful AIs robustly beneficial. We then move on to ideas and challenges for implementing robust AI alignment.
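To make “discounted future expected rewards” a bit more concrete, here is a minimal Python sketch. It is an illustration written for this summary, not code from the book: the function names, the discount factor `gamma` and the two example actions with their predicted rewards are all hypothetical.

```python
from typing import Dict, Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards, where a reward t steps in the future is weighted by gamma ** t."""
    total = 0.0
    # Walk backwards through time: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def choose_action(predicted: Dict[str, Sequence[float]], gamma: float = 0.9) -> str:
    """Pick the action whose predicted reward sequence has the largest discounted return."""
    return max(predicted, key=lambda action: discounted_return(predicted[action], gamma))


# Hypothetical predictions: one action pays off now, the other pays off more, but later.
predicted_rewards = {
    "clickbait": [1.0, 0.0, 0.0],
    "documentary": [0.0, 0.0, 2.0],
}

patient_choice = choose_action(predicted_rewards, gamma=0.9)
impatient_choice = choose_action(predicted_rewards, gamma=0.5)
```

With these made-up numbers, a patient agent (`gamma=0.9`) prefers the delayed reward, while an impatient one (`gamma=0.5`) prefers the immediate one. A real system would also have to estimate the predicted rewards, which is where exploration comes in.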
A roadmap towards beneficial AIs
Chapter 11 stresses the importance of quality data, and of quality inference from the data, to do good (and avoid doing harm). We highlight numerous difficulties, like privacy and adversarial machine learning. We also argue for the importance of modelling uncertainty, and for more research in this direction.
Chapter 12 then moves on to the difficulty of agreeing on what objective to assign to AIs. It stresses the relevance of voting systems and so-called social choice theories for reaching agreement in a limited amount of time. Based on the Moral Machine experiment, we present several additional difficulties, like biased voter demographics and limited computational power. We discuss ideas to mitigate these issues, like inverse reinforcement learning and heuristics. Again, more research in this direction seems desirable.
Chapter 13 then argues that we should not design AIs’ goals based on humans’ (declared) preferences. In particular, we stress the presence of numerous cognitive biases and inconsistencies in our moral judgments. Instead, we argue for Yudkowsky’s coherent extrapolated volition. We also argue that the Moral Machine experiment could be regarded as a primitive form of coherent extrapolated volition, though still unsatisfactory and limited to a very restricted kind of moral dilemma. We also discuss normative moral uncertainty at length, arguing for instance that we should be aware of how hard it is to distinguish preferences (what we want) from volitions (what we would want to want). Unfortunately, there seems to be little research on this problem.
Chapter 14 discusses the risk of wireheading, i.e. the AI hacking its own reward mechanism. Because of this, we argue that an AI should not be directly given the rewards we want it to maximize. Instead, we argue that the rewards should be such that the AI will want to protect, and even enhance, the reward mechanism. We draw parallels with humans, who might hijack their own reward mechanism by, say, taking drugs. In fact, we argue that designing incentive-compatible rewards is perhaps the most critical (and most neglected) challenge in making AIs robustly beneficial. We believe that a lot more attention should be given to this problem.
Chapter 15 adds another challenge: decentralization. If all computations were made on a single machine, then a crash of this machine would bring down the AI. To avoid this, today’s large-scale AIs (like the YouTube recommender system) are already widely distributed among a large number of different machines. We argue that future AIs will likely be similar, which raises additional difficulties, like Byzantine fault tolerance or specialized reward mechanism design. Research on distributed machine learning has begun to flourish recently. But it seems that many of the questions we raise have not been sufficiently addressed yet. This concludes the defense of Thesis 2.
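As a small illustration of why the choice of voting system matters, the Python sketch below runs two classic rules, plurality and the Borda count, on the same set of ranked ballots. The ballots, the option names and the alphabetical tie-breaking are invented for this example; this is not the aggregation method of the book or of the Moral Machine experiment.

```python
from collections import Counter
from typing import List, Sequence

Ballot = Sequence[str]  # options ranked from most to least preferred


def plurality_winner(ballots: List[Ballot]) -> str:
    """Each ballot gives one point to its top-ranked option."""
    counts = Counter(ballot[0] for ballot in ballots)
    # sorted() makes ties resolve alphabetically (an arbitrary choice for this sketch).
    return max(sorted(counts), key=counts.__getitem__)


def borda_winner(ballots: List[Ballot]) -> str:
    """Each ballot gives n-1 points to its first choice, n-2 to its second, ..., 0 to its last."""
    scores: Counter = Counter()
    for ballot in ballots:
        n = len(ballot)
        for rank, option in enumerate(ballot):
            scores[option] += n - 1 - rank
    return max(sorted(scores), key=scores.__getitem__)


# Hypothetical electorate of 9 voters ranking three options.
ballots = (
    [["A", "B", "C"]] * 4
    + [["B", "C", "A"]] * 3
    + [["C", "B", "A"]] * 2
)

print("plurality:", plurality_winner(ballots))
print("borda:", borda_winner(ballots))
```

On these ballots, plurality elects A (the most first-place votes), while the Borda count elects B (ranked first or second by everyone). The same stated preferences can thus lead to different collective decisions depending on the rule.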
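To give a flavour of Byzantine fault tolerance in distributed learning, here is a toy Python sketch comparing two ways of combining updates sent by several machines. It uses a coordinate-wise median as a simple stand-in for a robust aggregation rule; it is not the specific rule discussed in the book, and the worker updates below are invented numbers.

```python
from statistics import mean, median
from typing import List, Sequence

Update = Sequence[float]  # e.g. a gradient vector sent by one machine


def aggregate_mean(updates: List[Update]) -> List[float]:
    """Plain averaging: a single arbitrary update can move the result arbitrarily far."""
    return [mean(column) for column in zip(*updates)]


def aggregate_median(updates: List[Update]) -> List[float]:
    """Coordinate-wise median: a minority of arbitrary updates cannot drag the result away."""
    return [median(column) for column in zip(*updates)]


# Four honest workers roughly agree; one Byzantine worker sends garbage.
honest_updates = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
byzantine_update = [1000.0, -1000.0]
all_updates = honest_updates + [byzantine_update]

averaged = aggregate_mean(all_updates)
robust = aggregate_median(all_updates)
```

Here the average is dominated by the single faulty machine, whereas the median stays close to what the honest machines reported. The median only helps as long as faulty machines are a minority, and it comes with trade-offs of its own, which is part of why this remains a research topic.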
Remarks and conclusion
The book ends with two somewhat different chapters. Chapter 16 promotes what we call “computational moral philosophy”. It essentially consists of combining moral philosophy with computer science. In particular, we stress the importance of a data-driven approach to moral philosophy. We also use complexity theory to discuss additional pragmatic constraints on moral philosophy. This chapter attempts a response to the need for a “philosophy on a deadline” that technology creates.
Finally, Chapter 17 concludes by inviting readers to reflect further on AIs and on the theses of the book (which we were told to do more within the book as well). We also suggest a wide range of possible actions to contribute to making AIs beneficial, like funding research, raising awareness (with care!) especially among talents, valuing ethics and safety, learning and teaching machine learning, organizing discussion groups and following content creators like 80,000 Hours or the EA forum :)
In particular, we stress the need to have people with all sorts of expertise working on AI alignment, like social scientists, psychologists, economists, mathematicians, legislators, jurists, investors, entrepreneurs, managers, directors and so on.
Concerns
While we both strongly believe that the book is very likely to be greatly beneficial overall, concerns have been brought to our attention that it may not be sufficiently robustly beneficial. It may have undesirable side effects. The main concern that we agree with is the risk of turning AI Safety into a political debacle, where subtle and nuanced ideas are torn apart. In particular, our current preferred choice of title has been hotly debated and questioned, especially by other EAs.
Right now, it is “AI kills—The fabulous enterprise to make it robustly beneficial”. We will be making sure that the subtitle will be written in a particularly large font on the cover. Moreover, in much of the book, especially in the last chapter, we have done our best to show excitement and enthusiasm about this “fabulous enterprise”. In fact, we believe that making AIs robustly beneficial may be the greatest and most intellectually stimulating enterprise that mankind will ever have to tackle!
Evidently, the “AI kills” part is designed to be clickbait. We do hope to reach a wide audience, which we regard as desirable to accelerate the spread of AI ethics in all sorts of corporations. We also expect the title to be misused and abused by many people. But we are confident that a 3-minute discussion with a calm person is sufficient to convince them of the relevance of the title (see Chapter 3). Moreover, we believe that focusing on the example of YouTube’s AI will help move beyond the Terminator cliché and stress the problem of side effects. Nevertheless, we more than welcome your feedback on this hotly debated topic. In particular, we are still seeking better alternatives. So far though, it seems to us that the current title serves quite well the goal of convincing a sufficiently large audience of Thesis 3.
EDIT: Thanks to your useful feedback, we decided to change the title to something like “The fabulous enterprise to make artificial intelligence robustly beneficial”. In French: “Le fabuleux chantier pour rendre l’intelligence artificielle robustement bénéfique”.
Of course, we also welcome any feedback on other aspects of the book. In particular, if you feel that we may have missed some important topic relevant to AI alignment, or if you think that there is a point we might be skipping over too quickly, please let us know (though, as you can guess, a lot more detail is present in the book).
Unfortunately, so far, the book is only in French. But if you can read French and would like to (p)review our book, please feel free to get in touch with us (our emails can easily be found, for instance on the EPFL website).
Thank you so much for your attention!
Mahdi and Lê.