Wei Dai
Vitalik Buterin: Right. Well, one thing is one domain being offence-dominant by itself isn’t a failure condition, right? Because defence-dominant domains can compensate for offence-dominant domains. And that has totally happened in the past, many times. If you even just compare now to 1,000 years ago: cannons are very offence-dominant and castles stopped working. But if you compare physical warfare now to before, is it more offence-dominant on the whole? It’s not clear, right?
How do defense-dominant domains compensate for offense-dominant domains? For example, defense-dominance in cyber-warfare doesn’t seem to compensate for offense-dominance in bio-warfare, and vice versa. So what does he mean?
Physical warfare is hugely offense-dominant today, if we count nuclear weapons. Why did he say “it’s not clear”?
Overall it seems very unclear what Vitalik’s logic is in this area, and I wish Robert had pushed him to think or speak more clearly.
I wish there was discussion about a longer pause (e.g. multi-decade), to allow time for human genetic enhancement to take effect. Does @CarlShulman support that, and why or why not?
Also I’m having trouble making sense of the following. What kind of AI disaster is Carl worried about, that’s only a disaster for him personally, but not for society?
But also, I’m worried about disaster at a personal level. If AI was going to happen 20 years later, that would be better for me. But that’s not the way to think about it for society at large.
Thanks for letting me know! I have been wondering for a while why AI philosophical competence is so neglected, even compared to other subareas of what I call “ensuring a good outcome for the AI transition” (which are all terribly neglected in my view), and I appreciate your data point. Would be interested to hear your conclusions after you’ve thought about it.
I liked your “Choose your (preference) utilitarianism carefully” series and think you should finish part 3 (unless I just couldn’t find it) and repost it on this forum.
(I understand you are very busy this week, so please feel free to respond later.)
Re desires, the main upshot of non-dualist views of consciousness I think is responding to arguments that invoke special properties of conscious states to say they matter but not other concerns of people.
I would say that consciousness seems very plausibly special in that it seems very different from other types of things/entities/stuff we can think or talk or have concerns about. I don’t know if it’s special in a “magical” way or some other way (or maybe not special at all), but in any case intuitively it currently seems like the most plausible thing I should care about in an impartially altruistic way. My intuition for this is not super-strong but still far stronger than my intuition for terminally caring about other agents’ desires in an impartial way.
So although I initially misunderstood your position on consciousness as claiming that it does not exist altogether (“zombie” is typically defined as “does not have conscious experience”), the upshot seems to be the same: I’m not very convinced of your illusionism, and if I were I still wouldn’t update much toward desire satisfactionism.
I suspect there may be 3 cruxes between us:
1. I want to analyze this question in terms of terminal vs instrumental values (or equivalently axiology vs decision theory), and you don’t.
2. I do not have a high prior or strong intuition that I should be impartially altruistic one way or another.
3. I see specific issues with desire satisfactionism (see below for example) that make it seem implausible.
I think this is important because it’s plausible that many AI minds will have concerns mainly focused on the external world rather than their own internal states, and running roughshod over those values because they aren’t narrowly mentally-self-focused seems bad to me.
I can write a short program that can be interpreted as an agent that wants to print out as many different primes as it can, while avoiding printing out any non-primes. I don’t think there’s anything bad about “running roughshod” over its desires, e.g., by shutting it off or making it print out non-primes. Would you bite this bullet, or argue that it’s not an agent, or something else?
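For concreteness, here is a minimal Python sketch of the kind of program meant above. It is only an illustration: the names `is_prime` and `prime_printer` are hypothetical, and reading the loop as an agent that “wants” to print primes is exactly the interpretive step in question, not something in the code itself.

```python
from itertools import count, islice


def is_prime(n: int) -> bool:
    """Trial division; slow, but enough for an illustration."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_printer():
    """Yield each prime exactly once, in increasing order, and never a non-prime."""
    for n in count(2):
        if is_prime(n):
            yield n


if __name__ == "__main__":
    # Capped at ten outputs so that the demo terminates.
    for p in islice(prime_printer(), 10):
        print(p)
```

An uncapped version would keep printing new primes until it is shut off, which is the intervention the question is about.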
If you would bite the bullet, how would you weigh this agent’s desires against other agents’? What specifically in your ethical theory prevents a conclusion like “we should tile the universe with some agent like this because that maximizes overall desire satisfaction?” or “if an agentic computer virus made trillions of copies of itself all over the Internet, it would be bad to delete them, and actually their collective desires should dominate our altruistic concerns?”
More generally I think you should write down a concrete formulation of your ethical theory, locking down important attributes such as ones described in @Arepo’s Choose your (preference) utilitarianism carefully. Otherwise it’s liable to look better than it is, similar to how utilitarianism looked better earlier in its history before people tried writing down more concrete formulations and realized that it seems impossible to write down a specific formulation that doesn’t lead to counterintuitive conclusions.
Have you considered working on metaphilosophy / AI philosophical competence instead? Conditional on correct philosophy about AI welfare being important, most of future philosophical work will probably be done by AIs (to help humans / at our request, or for their own purposes). If AIs do that work badly and arrive at wrong conclusions, then all the object-level philosophical work we do now might only have short-term effects and count for little in the long run. (Conversely if we have wrong views now but AIs correct them later, that seems less disastrous.)
The 2017 Report on Consciousness and Moral Patienthood by Muehlhauser assumes illusionism about human consciousness to be true.
Reading that, it appears Muehlhauser’s illusionism (perhaps unlike Carl’s although I don’t have details on Carl’s views) is a form that neither implies that consciousness does not exist nor strongly motivates desire satisfactionism:
There is “something it is like” to be us, and I doubt there is “something it is like” to be a chess-playing computer, and I think the difference is morally important. I just think our intuitions mislead us about some of the properties of this “something it’s like”-ness.
I don’t want to have an argument about phenomenal consciousness in this thread
Maybe copy-paste your cut content into a short-form post? I would be interested in reading it. My own view is that some version of dualism seems pretty plausible, given that my experiences/qualia seem obviously real/existent in some ontological sense (since they can be differentiated/described by some language), and seem like a different sort of thing from physical systems (which are describable by a largely distinct language). However I haven’t thought a ton about this topic or dived into the literature, figuring that it’s probably a hard problem that can’t be conclusively resolved at this point.
The you that chooses is more fundamental than the you that experiences, because if you remove experience you get a blindmind you that will presumably want it back. Even if it can’t be gotten back, presumably **you** will still pursue your values whatever they were. On the other hand, if you remove your entire algorithm but leave the qualia, you get an empty observer that might not be completely lacking in value, but wouldn’t be you, and if you then replace the algorithm you get a sentient someone else.
Thus I submit that moral patients are straightforwardly the agents, while sentience is something that they can have and use.
If there is an agent that lost its qualia and wants to get them back, then I (probably) want to help it get them back, because I (probably) value qualia myself in an altruistic way. On the other hand, if there is a blindmind agent that doesn’t have or care about qualia, and just wants to make paperclips or whatever, then I (probably) don’t want to help them do that (except instrumentally, if doing so helps my own goals). It seems like you’re implicitly trying to make me transfer my intuitions from the former to the latter, by emphasizing the commonalities (they’re both agents) and ignoring the differences (one cares about something I also care about, the other doesn’t), which I think is an invalid move.
Apologies if I’m being uncharitable or misinterpreting you, but aside from this, I really don’t see what other logic or argumentative force is supposed to make me, after reading your first paragraph, reach the conclusion in your second paragraph, i.e., decide that I now want to value/help all agents, including blindminds that just want to make paperclips. If you have something else in mind, please spell it out more?
Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of **agency:** preferences, desires, goals, interests, and the like.
The articles you cite, and Carl himself (via private discussion), all cite the possibility that there is no such thing as consciousness (illusionism, “physicalist/zombie world”) as the main motivation for this moral stance (named “Desire Satisfactionism” by one of the papers).
But from my perspective, a very plausible reason that altruism is normative is that axiologically/terminally caring about consciousness is normative. If it turns out that consciousness is not a thing, then my credence assigned to this position wouldn’t all go into desire satisfactionism (which BTW I think has various problems that none of the sources try to address), and would instead largely be reallocated to other less altruistic axiological systems, such as egoism, nihilism, and satisfying my various idiosyncratic interests (intellectual curiosity, etc.). These positions imply caring about other agents’ preferences/desires only in an instrumental way, via whatever decision theory is normative. I’m uncertain what decision theory is normative, but it seems quite plausible that this implies I should care relatively little for certain agents’ preferences/desires, e.g., because they can’t reciprocate.
So based on what I’ve read so far, desire satisfactionism seems under-motivated/under-justified.
Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious
Can you talk a bit about how such research might work? The main problem I see is that we do not have “ground truth labels” about which systems are or are not conscious, aside from perhaps humans and inanimate objects. So this seemingly has to be mostly philosophical as opposed to scientific research, which tends to progress very slowly (perhaps for good reason). Do you see things differently?
Another podcast linked below with some details about Will and Toby’s early interactions with the Rationality community. Also Holden Karnofsky has an account on LW, and interacted with the Rationality community via e.g. this extensively discussed 2011 post.
https://80000hours.org/podcast/episodes/will-macaskill-what-we-owe-the-future/
Will MacAskill: But then the biggest thing was just looking at what are the options I have available to me in terms of what do I focus my time on? Where one is building up this idea of Giving What We Can, kind of a moral movement focused on helping people and using evidence and data to do that. It just seemed like we were getting a lot of traction there.
Will MacAskill: Alternatively, I did go spend these five-hour seminars at Future of Humanity Institute, that were talking about the impact of superintelligence. Actually, one way in which I was wrong is just the impact of the book that that turned into — namely Superintelligence — was maybe 100 times more impactful than I expected.
Rob Wiblin: Oh, wow.
Will MacAskill: Superintelligence has sold 200,000 copies. If you’d asked me how many copies I expected it to sell, maybe I would have said 1,000 or 2,000. So the impact of it actually was much greater than I was thinking at the time. But honestly, I just think I was right that the tractability of what we were working on at the time was pretty low. And doing this thing of just building a movement of people who really care about some of the problems in the world and who are trying to think carefully about how to make progress there was just much better than being this additional person in the seminar room. I honestly think that intuition was correct. And that was true for Toby as well. Early days of Giving What We Can, he’d be having these arguments with people on LessWrong about whether it was right to focus on global health and development. And his view was, “Well, we’re actually doing something.”
Rob Wiblin: “You guys just comment on this forum.”
Will MacAskill: Yeah. Looking back, actually, again, I will say I’ve been surprised by just how influential some of these ideas have been. And that’s a tremendous testament to early thinkers, like Nick Bostrom and Eliezer Yudkowsky and Carl Shulman. At the same time, I think the insight that we had, which was we’ve actually just got to build stuff — even if perhaps there’s some theoretical arguments that you should be prioritising in a different way — there are many, many, positive indirect effects from just doing something impressive and concrete and tangible, as well as the enormous benefits that we have succeeded in producing, which is tens to hundreds of millions of bed nets distributed and thousands of lives saved.
https://80000hours.org/podcast/episodes/will-macaskill-moral-philosophy/
Robert Wiblin: We’re going to dive into your philosophical views, how you’d like to see effective altruism change, life as an academic, and what you’re researching now. First, how did effective altruism get started in the first place?
Will MacAskill: Effective altruism as a community is really the confluence of 3 different movements. One was GiveWell, co-founded by Elie Hassenfeld and Holden Karnofsky. Second was LessWrong, primarily based in the Bay Area. The third is the co-founding of Giving What We Can by myself and Toby Ord. Where Giving What We Can was encouraging people to give at least 10% of their income to whatever charities were most effective. Back then we also had a set of recommended charities which were Toby and I’s best guesses about what are the organizations that can have the biggest possible impact with a given amount of money. My path into it was really by being inspired by Peter Singer and finding compelling his arguments that we in rich countries have a duty to give most of our income if we can to those organizations that will do the most good.
As I’ve said elsewhere, I have more complicated feelings about genetic enhancement. I think it is potentially beneficial, but also tends to be correlated with bad politics, and it could be the negative social effects of allowing it outweigh the benefits.
I appreciate you keeping an open mind on genetic enhancement (i.e., not grouping it with racism and fascism, or immediately calling for it to be banned). Nevertheless, it fills me with a sense of hopelessness to consider that one of the most thoughtful groups of people on Earth (i.e., EAs) might still realistically decide to ban the discussion of human genetic enhancement (I’m assuming that’s the implied alternative to “allowing it”), on the grounds that it “tends to be correlated with bad politics”.
When I first heard about the idea of greater than human intelligence (i.e., superintelligence), I imagined that humanity would approach it as one of the most important strategic decisions we’ll ever face, and there would be worldwide extensive debates about the relative merits of each possible route to achieving that, such as AI and human genetic enhancement. Your comment represents such a divergence from that vision, and occurring in a group like this...
If even we shy away from discussing a potentially world-altering technology simply because of its political baggage, what hope is there for broader society to engage in nuanced, good-faith conversations about these issues?
I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
I’m pretty skeptical of this. (Found a longer explanation of the proposal here.)
An AI facing such a deal would be very concerned that we’re merely trying to trick it into revealing its own misalignment (which we’d then try to patch out). It seems to me that it would probably be a lot easier for us to trick an AI into believing that we’re honestly presenting it such a deal (including by directly manipulating its weights and activations), than to actually honestly present such a deal and in doing so cause the AI to believe it.
Further, I think there is a substantial chance that AI moral patienthood becomes a huge issue in coming years and thus it is good to ensure that field has better views and interventions.
I agree with this part.
A couple of further considerations, or “stops on the crazy train”, that you may be interested in:
(These were written in an x-risk framing, but implications for s-risk are fairly straightforward.)
As far as actionable points, I’ve been advocating working on metaphilosophy or AI philosophical competence, as a way of speeding up philosophical progress in general (so that it doesn’t fall behind other kinds of intellectual progress, such as scientific and technological progress, that seem likely to be greatly sped up by AI development by default), and improving the likelihood that human-descended civilization(s) eventually reach correct conclusions on important moral and philosophical questions, and will be motivated/guided by those conclusions.
In posts like this and this, I have lamented the extreme neglect of this field, even among people otherwise interested in philosophy and AI, such as yourself. It seems particularly puzzling why no professional philosopher has even publicly expressed a concern about AI philosophical competence and related risks (at least AFAIK), even as developments such as ChatGPT have greatly increased societal attention on AI and AI safety in the last couple of years. I wonder if you have any insights into why that is the case.
Lower than 1%? A lot more uncertainty due to important unsolved questions in philosophy of mind.
I agree that there is a lot of uncertainty, but don’t understand how that is compatible with a <1% likelihood of AI sentience. Doesn’t that represent near certainty that AIs will not be sentient?
The main alternative to truth-seeking is influence-seeking. EA has had some success at influence-seeking, but as AI becomes the locus of increasingly intense power struggles, retaining that influence will become more difficult, and it will tend to accrue to those who are most skilled at power struggles.
Thanks for the clarification. Why doesn’t this imply that EA should get better at power struggles (e.g. by putting more resources into learning/practicing/analyzing corporate politics, PR, lobbying, protests, and the like)? I feel like maybe you’re adopting the framing of “comparative advantage” too much in a situation where the idea doesn’t work well (because the situation is too adversarial / not cooperative enough). It seems a bit like a country, after suffering a military defeat, saying “We’re better scholars than we are soldiers. Let’s pursue our comparative advantage and reallocate our defense budget into our universities.”
Rather, I think its impact will come from advocating for not-super-controversial ideas, but it will be able to generate them in part because it avoided the effects I listed in my comment above.
This part seems reasonable.
I’ve also updated over the last few years that having a truth-seeking community is more important than I previously thought—basically because the power dynamics around AI will become very complicated and messy, in a way that requires more skill to navigate successfully than the EA community has. Therefore our comparative advantage will need to be truth-seeking.
I’m actually not sure about this logic. Can you expand on why EA having insufficient skill to “navigate power dynamics around AI” implies “our comparative advantage will need to be truth-seeking”?
One problem I see is that “comparative advantage” is not straightforwardly applicable here, because the relevant trade or cooperation (needed for the concept to make sense) may not exist. For example, imagine that EA’s truth-seeking orientation causes it to discover and announce one or more politically inconvenient truths (e.g. there are highly upvoted posts about these topics on EAF), which in turn causes other less truth-seeking communities to shun EA and refuse to pay attention to its ideas and arguments. In this scenario, if EA also doesn’t have much power to directly influence the development of AI (as you seem to suggest), then how does EA’s truth-seeking benefit the world?
(There are worlds in which it takes even less for EA to be shunned, e.g., if EA merely doesn’t shun others hard enough. For example there are currently people pushing for EA to “decouple” from LW/rationality, even though there is very little politically incorrect discussion happening on LW.)
My own logic suggests that too much truth-seeking isn’t good either. Would love to see how to avoid this conclusion, but currently can’t. (I think the optimal amount is probably a bit higher than the current amount, so this is not meant to be an argument against more truth-seeking at the current margin.)
You probably didn’t have someone like me in mind when you wrote this, but it seems a good opportunity to write down some of my thoughts about EA.
On 1, I think despite paying lip service to moral uncertainty, EA encourages too much certainty in the normative correctness of altruism (and more specific ideas like utilitarianism), perhaps attracting people like SBF with too much philosophical certainty in general (such as about how much risk aversion is normative), or even causing such general overconfidence (by implying that philosophical questions in general aren’t that hard to answer, or by suggesting how much confidence is appropriate given a certain amount of argumentation/reflection).
I think EA also encourages too much certainty in descriptive assessment of people’s altruism, e.g., viewing a philanthropic action or commitment as directly virtuous, instead of an instance of virtue signaling (that only gives probabilistic information about someone’s true values/motivations, and that has to be interpreted through the lenses of game theory and human psychology).
On 25, I think the “safe option” is to give people information/arguments in a non-manipulative way and let them make up their own minds. If some critics are using things like social pressure or rhetoric to manipulate people into being anti-EA (as you seem to be implying; I haven’t looked into it myself), then that seems bad on their part.
On 37, where has EA messaging emphasized downside risk more? A text search for “downside” and “risk” on https://www.effectivealtruism.org/articles/introduction-to-effective-altruism both came up empty, for example. In general it seems like there has been insufficient reflection on SBF and also AI safety (where EA made some clear mistakes, e.g. with OpenAI, and generally contributed to the current AGI race in a potentially net negative way, but seems to have produced no public reflections on these topics).
On 39, seeing statements like this (which seems overconfident to me) makes me more worried about EA, similar to how my concern about each AI company is inversely related to how optimistic it is about AI safety.
The problem of motivated reasoning is in some ways much deeper than the trolley problem.
The motivation behind motivated reasoning is often to make ourselves look good (in order to gain status/power/prestige). Much of the problem seems to come from not consciously acknowledging this motivation, and therefore not being able to apply system 2 to check for errors in the subconscious optimization.
My approach has been to acknowledge that wanting to make myself look good may be a part of my real or normative values (something like what I would conclude my values are after solving all of philosophy). Since I can’t rule that out for now (and also because it’s instrumentally useful), I think I should treat it as part of my “interim values”, and consciously optimize for it along with my other “interim values”. Then if I’m tempted to do something to look good, at a cost to my other values or perhaps counterproductive on its own terms, I’m more likely to ask myself “Do I really want to do this?”
BTW I’m curious what courses you teach, and whether / how much you tell your students about motivated reasoning or subconscious status motivations when discussing ethics.
I’m generally a fan of John Cochrane. I would agree that government regulation of AI isn’t likely to work out well, which is why I favor an international pause on AI development instead (less need for government competence on detailed technical matters).
His stance on unemployment seems less understandable. I guess he either hasn’t considered the possibility that AGI could drive wages below human subsistence levels, or thinks that’s fine (humans just work for the same low wages as AIs and governments make up the difference with a “broad safety net that cushions all misfortunes”)?
Oh, of course he also doesn’t take x-risk concerns seriously enough, but that’s more understandable for an economist who probably just started thinking about AI recently.