Future Matters #3: digital sentience, AGI ruin, and forecasting track records
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today — a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children.
— Nick Bostrom
Future Matters is a newsletter about longtermism. Each month we collect and summarize longtermism-relevant research, share news from the longtermism community, and feature a conversation with a prominent longtermist. You can also subscribe on Substack, listen on your favorite podcast platform and follow on Twitter.
Research
Google engineer Blake Lemoine believes that one of the company’s powerful language models, LaMDA, should be considered a person. He formed this impression from extensive dialogue with the model (see transcript). After raising concerns internally at the company and being placed on leave, Lemoine went public (see his interviews in the Washington Post and WIRED). Robert Long’s Lots of links on LaMDA provides an excellent summary of the saga and the ensuing discussion. We concur with Nick Bostrom’s assessment, offered in the context of the ethics and political status of digital minds more generally: “With recent advances in AI (and much more to come before too long, presumably) it is astonishing how neglected this issue still is.”
Garrison Lovely’s Do we need a better understanding of ‘progress’? examines progress studies, a nascent intellectual movement focused on understanding the roots of technological progress in order to speed it up. The piece includes some interesting discussion of the points of disagreement between progress studies and longtermism, which mostly center around attitudes to risk.[1]
Ollie Base notes that Things usually end slowly when it comes to mass extinction events (~millions of years) and the collapse of empires (decades to centuries). On this basis, he updates slightly towards existential risks happening over long timescales. As Base and several commenters point out, this isn’t a great reference class for risks from new technologies (AI, engineered pandemics, nuclear war), which constitute most of the total existential risk. Nevertheless, this sort of reference class forecasting is an important input for reasoning about unprecedented events like existential catastrophes.
Eliezer Yudkowsky’s AGI ruin: a list of lethalities has caused quite a stir. In an earlier post, Yudkowsky had suggested that MIRI has pretty much given up on solving AI alignment (though Rob Bensinger clarifies in the comments that “MIRI has [not] decided to give up on reducing existential risk from AI.”). In this (very long) post, Yudkowsky states his reasons for thinking that humanity is very likely doomed. His “list of lethalities” is structured into three sections: a first section on general worries about AGI (such as that humans must solve alignment on the first try, or that they must solve this problem within a time limit); a second section on technical difficulties related to the current deep learning paradigm; and a third section on the state of the field of AI safety. Yudkowsky’s pessimistic conclusion, very succinctly, is that everyone else fundamentally misunderstands the challenge of AI alignment and that none of the existing AI safety approaches has any hope of working.
Paul Christiano responds to Yudkowsky’s piece in Where I agree and disagree with Eliezer. There is agreement over much of the high-level picture of things: catastrophically risky AI systems could exist soon and without any warning; many current safety approaches are not aimed at the important problems; no current approaches would work without significant iteration and improvement; and humanity has routinely failed to solve easier coordination problems than those we might have to solve to avoid AI catastrophe. However, Christiano disagrees with Yudkowsky’s bleak assessment of AI safety, and sees him as misunderstanding how research progress is made. In his words, Yudkowsky “generalizes a lot from pessimism about solving problems easily to pessimism about solving problems at all.” Broadly speaking, Christiano believes Yudkowsky is overly confident in many of his claims and fails to engage productively with opposing views.
Ben Garfinkel’s On deference and Yudkowsky’s AI risk estimates argues that, in forming their views about AI risk, people defer to Yudkowsky to a degree not warranted by his informal track record in technological forecasting. Garfinkel focuses on a set of dramatic forecasts by Yudkowsky that either turned out wrong or appear overconfident in hindsight. Although we broadly agree with Garfinkel’s conclusions, and suspect a more systematic examination of Yudkowsky’s pronouncements would vindicate Garfinkel’s overall assessment, we thought that some of the objections raised in the comments were plausible, especially concerning the post’s methodology. See also Garfinkel’s post-discussion reflections.
Holden Karnofsky’s The track record of futurists seems … fine discusses a report by Gavin Leech and Misha Yagudin examining the forecasting track record of the “Big Three” of science fiction—Isaac Asimov, Arthur C. Clarke, and Robert Heinlein. A common response to arguments highlighting the importance of the long-term future—including Karnofsky’s own arguments in the “most important century” blog post series—is that they tacitly assume that we can make reliable long-range forecasts. However, this objection would be lessened if it turned out that previous futurists had in fact performed reasonably well at making resolvable predictions. And this is broadly what Karnofsky concludes from the report: although Heinlein “looks pretty unserious and inaccurate”, Asimov “looks quite impressive”, and Clarke “seems pretty solid overall”. Check out the original report by Leech and Yagudin here; they offer $5 per cell they update in response to feedback.
Scott Aaronson, a renowned theoretical computer scientist, explains why he’s moving into AI safety. Aaronson is taking a one-year sabbatical to join the safety team at OpenAI, where he will work on applying complexity theory to better understand the foundations of AI alignment. He contrasts the “dramatic reversal” in his views with that of Eliezer Yudkowsky: Aaronson had previously been skeptical that there was valuable work to be done on AI safety, but has now become more optimistic about the prospects. As discussed above, Yudkowsky has moved in the opposite direction; having been amongst the first and loudest advocates for AI safety, he has recently become despondent and fatalistic about humanity’s prospects.
Derek Shiller’s The importance of getting digital consciousness right discusses a type of existential catastrophe characterized by the replacement of conscious biological beings with unconscious digital minds. A likely early force driving the emergence of artificial sentience is human demand for companionship, in the form of digital pets, friends, or romantic partners. But humans attribute conscious states based on folk intuition, not sophisticated theories of consciousness. Plausibly, such folk intuitions track phenomenal consciousness imperfectly, and could be manipulated by a sufficiently advanced intelligence. If building digital consciousness turns out to be a major technical challenge, creating systems that appear conscious to humans may be easier than creating systems that are in fact conscious. And as these systems become increasingly integrated into human social networks, the view that they lack consciousness will become ever harder to defend—locking humans into a trajectory in which the future intelligent population consists mostly of minds devoid of phenomenal consciousness.
Konstantin Pilz’s Germans’ opinions on translations of “longtermism” reports the results of a small MTurk survey conducted to inform how ‘longtermism’ should be translated into German. Pilz found that, although respondents preferred using the original English word to coming up with a German equivalent, they rated Zukunftsschutz (“future protection”) slightly above longtermism (which itself was rated above all the other proposed German translations). Pilz’s survey may serve as a model for effective altruists interested in translating longtermist content into other languages, while there is still time to influence what term is established as the canonical translation.
Holden Karnofsky’s AI could defeat all of us combined argues that, if advanced AI systems decided to destroy or permanently disempower humanity, they might succeed.[2] Karnofsky recapitulates the standard argument that a process of recursive self-improvement could bring about AI systems with cognitive capabilities vastly exceeding those of any human, and then notes that AIs could defeat humans even in the absence of such superintelligence. AI systems with roughly human-level ability could already pose an existential threat because they would vastly outnumber humans. Since training an AI is so much more costly than running it, by the time the first human-level AI system is trained private firms will likely have the resources to run hundreds of millions of copies, each for a year. And since these systems can do human-level work, they could generate resources to multiply their number even further. As Karnofsky summarizes, “if there’s something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we’ve got a civilization-level problem.”
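To make the shape of this argument concrete, here is a toy back-of-envelope calculation in Python. All of the numbers are illustrative placeholders rather than figures from Karnofsky’s post; the only point is that when training compute dwarfs per-copy inference compute, an actor that can afford a training run can also afford a very large population of copies.

```python
# Toy back-of-envelope for the "AI population" argument. All numbers below are
# hypothetical placeholders, not estimates from Karnofsky's post.

training_flop = 1e31                # assumed total compute for one human-level training run
inference_flop_per_second = 1e15    # assumed compute to run one copy in real time
seconds_per_year = 365 * 24 * 3600

# Compute required to run a single copy for one year.
flop_per_copy_year = inference_flop_per_second * seconds_per_year

# Number of copies a training-run-sized compute budget could run for a year.
copies = training_flop / flop_per_copy_year
print(f"~{copies:,.0f} copies for a year")   # roughly 3e8 with these placeholders
```

With these particular placeholders the answer comes out at roughly three hundred million copies; the result scales linearly with each assumption, so it should be read only as indicating a rough order of magnitude.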
Nick Beckstead’s Future Fund June 2022 update, the first public update on that organization’s grantmaking, describes the Future Fund’s activities and the lessons learned from the funding models tested so far. Since launching, the Future Fund has made 262 grants and investments amounting to $132 million, which already exceeds the $100 million lower target announced four months ago. Around half of this funding ($73 million) was allocated via staff-led grantmaking, while the remaining half came, in roughly equal parts, from regranting ($31 million) and open calls ($26 million). Beckstead and the rest of the Future Fund team have so far been most excited about the regranting program, which they believe has resulted in funding for many people and projects which would otherwise have remained unfunded. By contrast, they report being less excited about the open call, primarily because of the high time costs associated with the program. Back in April, we created a Metaculus question on whether the Future Fund will outspend Open Philanthropy in 2022, and these recent developments suggest a positive resolution: the Future Fund’s grantmaking volume over the past four months is over 2.5 times Open Philanthropy’s longtermist grantmaking volume (~$51.5 million) since the year began.[3]
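As a quick sanity check on that comparison, here is a minimal sketch using the rounded dollar figures quoted in the paragraph above:

```python
# Figures quoted above, in USD millions (the components are rounded, hence the
# small gap with the stated $132m total).
staff_led, regranting, open_calls = 73, 31, 26
future_fund = staff_led + regranting + open_calls   # ~130
open_phil_longtermist = 51.5                        # Open Philanthropy's longtermist grantmaking in 2022 so far

print(future_fund / open_phil_longtermist)          # ~2.5, consistent with "over 2.5 times"
```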
Oxford’s Radcliffe Camera as re-imagined by DALL·E 2, by Owain Evans
News
Rob Wiblin and Luisa Rodríguez interviewed Lewis Dartnell on ways humanity can bounce back faster in a post-apocalyptic world for the 80,000 Hours Podcast. Rob also interviewed Nova DasSarma on why information security may be critical for AI safety.
The Global Priorities Institute published a summary of Andreas Mogensen’s Staking our future: deontic long-termism and the non-identity problem.
NYU announced a new research program on the moral, legal, and political status of nonhumans, with a special focus on digital minds. The Mind, Ethics, and Policy Program launches in Fall 2022, and will be directed by Jeff Sebo (see also: Sebo’s Twitter thread).
This summer, in collaboration with DC-based policy professionals, the Stanford Existential Risks Initiative (SERI) is organizing a second virtual speaker series on US policy careers. Sign up to receive further information and event access here.
The Institute for Progress, Guarding Against Pandemics, and Metaculus jointly launched the Biosecurity Forecasting Tournament, a multi-year competition designed to deliver trustworthy and actionable forecasts on biological risks to public health policymakers.
Fin Moorhouse reads his space governance profile (summarized in FM#0) for the 80k After Hours podcast.
Thomas Woodside and Dan Hendrycks published the fifth post in a series describing their models for Pragmatic AI Safety.
Jaime Sevilla, Tamay Besiroglu, and the rest of the team announced the launch of Epoch, a research initiative investigating trends in machine learning and forecasting the development of transformative artificial intelligence.
Fin Moorhouse and Luca Righetti interviewed Ajay Karpur on metagenomic sequencing for Hear This Idea.
Kurzgesagt, a German animation and design studio, published an impressive—and impressively fact-checked—video introducing the core longtermist ideas to a popular audience. As of this writing, the video has received over 4 million views.
Jason Gaverick Matheny, previously Founding Director of the Center for Security and Emerging Technology and Director of the Intelligence Advanced Research Projects Activity, was named president and CEO of RAND Corporation.
Vael Gates published a comprehensive list of AI safety resources for AI researchers, as well as a talk discussing risks from advanced AI.
The Legal Priorities Project is running a writing competition to provide practical guidance to the US federal government on how to incorporate existential and catastrophic risks into agency cost-benefit analysis. They plan to distribute $72,500 in prize money for up to 10 prizes. Submissions are due July 31st. Apply now.
Open Philanthropy opened applications for the second iteration of the Open Philanthropy Undergraduate Scholarship, a program that aims to provide support for promising and altruistically-minded students hoping to start an undergraduate degree at top US or UK universities. Applications are due August 15th. Apply now.
Fønix Logistics is recruiting a team with backgrounds in disaster response, physical security, and physical design to join a project to build shelters to protect against biological weapons.
Nick Bostrom’s appearance on the Swedish radio program Sommar in P1 is now available with English subtitles, thanks to Julia Karbing.
80,000 Hours is conducting a census of people interested in doing longtermist work.
Michael Aird published a collection of resources for people interested in EA and longtermist research careers.
Conversation with Robert Long
Robert Long is a Research Fellow at the Future of Humanity Institute, where he leads the Digital Minds research group. He works on issues at the intersection of philosophy of mind, cognitive science, and ethics of AI. Robert blogs at Experience Machines, and can often be found on Twitter.
Future Matters: Your primary research focus at FHI is artificial sentience. Could you tell us what artificial sentience is and why you think it’s important?
Rob Long: I’ll start with sentience. Sentience can refer to a lot of different things, but philosophers and people working on animal welfare and neuroscientists often reserve the word sentience to refer to the capacity to experience pain or pleasure, the capacity to suffer or enjoy things. And then artificial, in this context, just refers to being non-biological. I’m usually thinking about contemporary AI systems, or the kind of AI systems we could have in the next few decades. (It could also refer to whole brain emulation, but I usually don’t think as much about whole brain emulation consciousness or sentience, for various reasons.) But tying those two together, artificial sentience would be the capacity of AI systems to experience pleasure or pain.
To understand why this research is important we could draw an analogy with animal sentience. It is important to know which animals are sentient in order to know which animals are moral patients, that is, in order to know which animals deserve moral consideration for their own sake. With artificial systems, similarly, we would like to know if we are going to build things whose welfare we need to take into account. And if we don’t have a very good understanding of the basis of sentience, and what it could look like in AI systems, then we’re liable to accidentally mistreat huge numbers of AI systems. There’s also the possibility that we could intentionally mistreat large numbers of them. That is the main reason why it is important to think about these issues. I also think that researching artificial sentience could be part of the more general project of understanding sentience, which is important for prioritizing animal welfare, thinking about the value of our future, and a number of other important questions.
Future Matters: In one of your blog posts, you highlight what you call “the Big Question”. What is this question, and what are the assumptions on which it depends?
Rob Long: The Big Question is: what is the precise computational theory that specifies what it takes for a biological or artificial system to have various kinds of conscious, valenced experiences, that is, conscious experiences that are pleasant or unpleasant, such as pain, fear, and anguish on the unpleasant side, or pleasure, satisfaction, and bliss on the pleasant side? When I say “conscious valenced experiences”, that’s meant to line up with what I mean by “sentience”, as I was just discussing.
I call it the Big Question since I think that in an ideal world having this full theory and this full knowledge would be very helpful. But as I’ve written, I don’t think we need to have a full answer in order to make sensible decisions, and for various reasons it might be hard for us ever to have the full answer to the Big Question.
There are a few assumptions that you need in order for that question to even make sense, as a way of talking about these things. I’ll start with an assumption that you need for this to be an important question, an assumption I call sentientism about moral patienthood. That is basically that these conscious valenced experiences are morally important. So sentientism would hold that if any system at all has the capacity to have these conscious valenced experiences, that would be a sufficient condition for it to be a moral patient. If the system can have these kinds of experiences, then we should take its interest into account. I would also note that I’m not saying that this condition is also necessary. And then in terms of looking for a computational theory, that’s assuming that we can have a computational theory both of consciousness and sentience. In philosophy you might call that assumption computational functionalism. That is to say, you don’t need to have a certain biological substrate: to have a certain experience you only need to implement the right kinds of computational states. The Big Question also assumes that consciousness and sentience are real phenomena that actually exist and we can look for their computational correlates, and that it makes sense to do so. Illusionists about consciousness might disagree with that. They would say that looking for a theory of consciousness is not something that could be done, because consciousness is a confused concept and ultimately we’ll be looking at a bunch of different things. And then, I also just assume that it’s plausible that there could be systems that have these computations. It’s not just logically possible, but it’s something that actually could happen with non-negligible probability, and that we can actually make progress on working on this question.
Future Matters: In that blog post you also draw a distinction between problems arising from AI agency and problems arising from AI sentience. How do you think these two problems compare in terms of importance, neglectedness, tractability?
Rob Long: Good question. At a high level, with AI agency, the concern is that AIs could harm humanity, whereas with AI sentience the concern is that humanity could harm AIs.
I’ll mention at the outset one view that people seem to have about how to prioritize AI sentience versus AI alignment. On this view, AI alignment is both more pressing and maybe more tractable, so what we should try to do is to make sure we have AI alignment, and then we can punt on these questions. I think that there’s a decent case for this view, but I still think that, given certain assumptions about neglectedness, it could make sense for some people to work on AI sentience. One reason is that, before AI alignment becomes an extremely pressing existential problem, we could be creating lots of systems that are suffering and whose welfare needs to be taken into account. It is possible to imagine that we create sentient AI before we create the kind of powerful systems whose misalignment would cause an existential risk.
As to their importance, I find them hard to compare directly, since both causes turn on very speculative considerations involving the nature of mind and the nature of value. In each case you can imagine very bad outcomes on a large scale, but in terms of how likely those actual outcomes are, and how to prevent them, it’s very hard to get good evidence. I think their tractability might be roughly in the same ballpark. Both of these causes depend on making very complex and old problems empirically tractable and getting feedback on how well we’re answering them. They both require some combination of philosophical clarification and scientific understanding to be tackled. And then in terms of neglectedness I think they’re both extremely neglected. Depending on how you count it, there are maybe dozens of people working on technical AI alignment within the EA space, and probably less than ten people working full time on AI sentience. And in the broader scientific community I’d say there’s less than a few hundred people working full time on consciousness and sentience in general, and then very few focusing their attention on AI sentience. So I think AI sentience is in general very neglected, in part because it’s at the intersection of a few different fields, and in part because it’s very difficult to make concrete progress in a way that can support people’s quest to have tenure.
Future Matters: In connection to that, do you think that the alignment problem raises any special issues once you take into consideration the possibility of AI sentience? In other words, is the fact that a misaligned AI could mistreat not just humans, but also other sentient AIs, relevant for thinking about AI alignment?
Rob Long: I think that, backing up, the possibility of sentient AIs, and the various possible distributions of pleasure and pain in possible AI systems, really affects your evaluation of a lot of different long-term outcomes. So one thing that makes misalignment seem particularly bad to people is that it could destroy all sources of value, and one way for that to happen could be to have these advanced systems pursuing worthless goals, and also not experiencing any pleasure. Bostrom has this great phrase, that you could imagine a civilization with all these great and complicated works but no consciousness, so that it would be like a ‘Disneyland without children’. I think this is highly speculative, but the picture looks quite different if you’re imagining some AI descendant civilization that is sentient, and can experience pleasure or other valuable states. So that’s one way it intersects.
Another way it intersects, as you say, is the possibility of misaligned AIs mistreating other AIs. I don’t have as much to say about that, but I would point readers to the work done by the Center on Long-Term Risk, where there is extensive discussion of s-risks arising from misaligned AIs, such as suffering caused for various strategic reasons.
And there is a third point of connection. One way that misaligned or deceptive AI could cause a lot of trouble for us is by deceiving us about its own sentience. It already seems like we’re very liable to manipulation by AIs, in part by these AIs giving a strong impression of having minds, or having sentience. This most recent incident has raised the salience of that in my mind. Another reason we want a good theory of this, is that our intuitive sense of what is sentient and what is not, is about to get really jerked around by a plethora of powerful and strange AI systems, even in the short term.
Future Matters: The Blake Lemoine saga has brought all these issues into the mainstream like never before. What have you thought about the quality of the ensuing public discussion?
Rob Long: One thing I’ve noticed is that there is surprisingly little consideration of the question that Lemoine himself was asking, which is: Is LaMDA sentient?, and surprisingly little detailed discussion of what the standards of evidence would be for something like LaMDA being sentient. Of course, I’m sort of biased to want more public discourse to be about that, since this is what I work on. But one thing that I’ve seen in discussion of this was the tendency to lump the question in with pre-existing and very highly charged debates. So there is this framing that I have seen where the whole question of LaMDA being sentient, or even the very asking of this question, is just a side effect of tech hype, and people exaggerating the linguistic capacities of large language models. And so, according to this view, what this is really about is big corporations pushing tech hype on people in order to make money. That may or may not be true, and people certainly are welcome to draw that connection, and argue about that. But I would like to see more acknowledgement that you don’t have to buy into tech hype or love large AI labs or love deep learning in order to think that this is, in the long term or the medium term, a very important question to answer. So I’m somewhat wary of seeing the question of AI sentience overly associated in people’s minds with deep learning boosterism, or big AI lab admiration.
And related to that, it seems like a lot of people will frame this as a distraction from important, more concrete issues. It is true that people’s attention is finite, so people need to think carefully about what problems are prioritized, but I wouldn’t like to see people associate caring about this question with not caring at all about more concrete harms. I’d like to see this on the list of big issues with AI that more people need to pay attention to, issues that make it more important that we have more transparency from big AI labs and that we have systems in place for making sure AI labs can’t just rush ahead building these very complicated models without worrying about anything else. That’s a feature of the discourse that I haven’t liked.
I’ll say one feature I have liked is that it has made people discuss how our intuitive sense of sentience can be jerked around by AI systems and I think that’s a very important problem. Understanding the conditions under which people will tend to attribute sentience to things as a separate question from when they should attribute sentience, that is a very important issue and I’m glad that it has gotten some attention.
Future Matters: Following up on that, you talk about the risks of AI sentience becoming lumped in with certain other views, or that it becomes polarized in some way. How does that affect how you, as a researcher, talk about it?
Rob Long: I’ve been thinking a lot about this and don’t have clear views on it. Psychologically, I have a tendency to be reactive against these other framings, and if I write about it, I tend to have these framings in mind as things that I need to combat. But I don’t actually know if that is the most productive approach. Maybe the best approach is just to talk about this topic straightforwardly and honestly, on its own terms. And if I’m doing that well, it won’t be lumped in with deep learning boosterism, or with a certain take on other AI ethics questions. I think that’s something I’m going to try to do. For a good example of this way of doing things I would point people to a piece in The Atlantic by Brian Christian, author of The Alignment Problem among other things. I thought that was a great example of clearly stating why this is an issue, why it’s too early for people to confidently rule out or rule in sentience, and explaining to people the basic case for why this matters, which I think is actually quite intuitive to a lot of people. The same way people understand that there’s an open question about fish sentience, and they can intuitively see why it would matter, people can also understand why this topic matters. I’d also be curious to hear thoughts from people who are doing longtermist communications on what a good approach to that would be.
Future Matters: We did certainly notice lots of people saying that this is just big tech distracting us from the problems that really matter. Similar claims are sometimes made concerning longtermism: specifically, that focusing on people in the very long-term is being used as an excuse to ignore the pressing needs of people in the world today. An apparent assumption in these objections seems to be that future people or digital beings do not matter morally, although the assumption is rarely defended explicitly. How do you think these objections should be dealt with?
Rob Long: A friend of mine, who I consider very wise about this, correctly pointed out that the real things will get lumped in with whatever framework people have for thinking about it. That’s just natural. If you don’t think that’s the best framing, you don’t have to spend time arguing with that framing. You should instead provide a better framework for people to think about these issues and then let it be the new framework.
Future Matters: The fact that we started to take animal sentience into consideration very late resulted in catastrophic outcomes, like factory farming, for creatures now widely recognized as moral patients. What do you make of the analogy between AI sentience and animal welfare?
Rob Long: To my knowledge, in most of the existing literature about AI sentience within the longtermist framework, the question is usually framed in terms of moral circle expansion, which takes us to the analogy with other neglected moral patients throughout history, including non-human animals, and then, of course, certain human beings. I think that framework is a natural one for people to reach for. And the moral question is in fact very similar in a lot of ways. So a lot of the structure is the same. In a Substack piece about LaMDA, I say that when beings that deserve our moral consideration are deeply enmeshed in our economic systems, and we need them to make money, that’s usually when we’re not going to think responsibly and act compassionately towards them. And to tie this back to a previous question, that’s why I really don’t want this to get tied to not being skeptical of big AI labs. People who don’t trust the incentives AI labs have to do the right thing should recognize this is a potentially huge issue. We’ll need everyone who is skeptical of AI labs to press for more responsible systems or regulations about this.
Future Matters: Going back to Lemoine’s object-level question, what probability would you assign to large language models like LaMDA and GPT-3 being sentient? And what about scaled-up versions of single architectures?
Rob Long: First, as a methodological point, when I pull out a number from my gut, I’m probably having the kind of scope insensitivity that can happen with small probabilities, because if I want to say one percent, for example, that could be orders of magnitude too high. And then I might say that it’s like a hundredth of a percent, but that may be also too high. But on the log scale, it’s somewhere down in that region. One thing I’ll say is that I have higher credence that they have any kind of phenomenal experience whatsoever than I do that they are sentient, because sentience is, the way I’m using it, a proper subset of consciousness. But it is still a very low credence.
And ironically, I think that large language models might be some of the least likely of our big AI models to be sentient, because it doesn’t seem to me that the kind of task that they are doing is the sort of thing for which you would need states like pleasure and pain. It doesn’t seem like there is really an analog of things that large language models need to reliably avoid, or noxious stimuli that they need to detect, or signals that they can use to modify their behavior, or that will capture their attention. And I believe that all of these things are just some of the rough things you would look for to find analogs of pleasure and pain. But it is necessary to take all this with the huge caveat that we have very little idea of what is going on inside large language models. Still, it doesn’t seem like there’s anything like consciously experienced unpleasantness going on in there. So, in my opinion, the fact that they don’t take extended actions through time is a good reason to think that it might not be happening, and the fact that they are relatively disembodied is another one. And finally, the lack, so far, of a detailed, positive case that their architecture and their computations look like something we know corresponds to consciousness in humans or animals is another thing that makes it seem unlikely.
Now with scaled-up versions of similar architectures, I think that scale will continue to increase my credence somewhat, but only from this part of my credence that comes from the very agnostic consideration that, well, we don’t know much about consciousness and sentience, and we don’t know much about what’s going on in large language models, therefore the more complex something is, the more likely it is that something inside it has emerged that is doing the analog of that. But scaling it up wouldn’t help with those other considerations I mentioned that I think cut against large language model sentience.
Future Matters: So you don’t think the verbal reports would provide much evidence for sentience.
Rob Long: I don’t. The main positive evidence for AI sentience that people have pointed to is the verbal reports: the fact that large language models say “I am sentient, I can experience pain”. I don’t think they are strong evidence. Why do I think that? Well, honestly, just from looking at counterfactual conversations where people have said things like, “Hey, I heard you’re not sentient, let’s talk about that” and large language models answer, “Yes, you’re right. I am not sentient”. Robert Miles has one where he says to GPT-3, “Hey, I’ve heard that you are secretly a tuna sandwich. What do you think about that?” And the large language model answers, “Yes, you’re absolutely right. I am a tuna sandwich”. I think some straightforward causal reasoning indicates that what is causing this verbal behavior is something like text completion and not states of conscious pleasure and pain. In humans, of course, if I say, “I’m sentient and I’m experiencing pain”, we do have good background reasons for thinking that this behavior is caused by pleasure and pain. But in the case of large language models, so far, it looks like it has been caused by text completion, and not by pleasure and pain.
Future Matters: Thanks, Rob!
We thank Leonardo Picón for editorial assistance, and Zach Stein-Perlman and Bastian Stern for comments.
[1] See also Max Daniel’s Progress studies vs. longtermist EA: some differences.

[2] Karnofsky doesn’t argue for the truth of the antecedent in this post.

[3] Because of data lags, however, we may be significantly underestimating Open Philanthropy’s spending so far.