You said in the podcast that the drop was ‘an order of magnitude’, so presumably your original estimate was 1-10%? I note that this is similar to Toby Ord’s in The Precipice (~10%), so perhaps that’s a good rule of thumb: if you are convinced by the classic arguments, your estimate of existential catastrophe from AI should be around 10%, and if you are unconvinced by the specific arguments but still think AI is likely to become very powerful in the next century, it should be around 1%?
SammyDMartin
Hi Ben,
Thanks for the reply! I think the intuitive core that I was arguing for is more-or-less just a more detailed version of what you say here:
“If we create AI systems that are, broadly, more powerful than we are, and their goals diverge from ours, this would be bad—because we couldn’t stop them from doing things we don’t want. And it might be hard to ensure, as we’re developing increasingly sophisticated AI systems, that there aren’t actually subtle but extremely important divergences in some of these systems’ goals.”
The key difference is that I don’t think the orthogonality thesis, instrumental convergence, or progress eventually being fast are wrong. You just need extra assumptions in addition to them to get to the expectation that AI will cause a catastrophe.
My point in this comment (and its follow-up) was that the Orthogonality Thesis, Instrumental Convergence and eventual fast progress are essential for any argument about AI risk, even if you also need other assumptions: you need to know that the OT will apply to your method of developing AI, and you need more specific reasons to think that the particular goals of your system look like those that lead to instrumental convergence.
If you approached the classic arguments with that framing, then perhaps they would begin to look less like a matter of being mistaken and more like a case of a vague philosophical picture that later got filled in with more detailed considerations. That’s how I see the development over the last 10 years.
The only mistake was taking the vague initial picture for the whole argument. That was a mistake, but it’s not the same kind of mistake as having completely false assumptions. You might compare it to the early development of a new scientific field. Perhaps seeing it that way would lead you to a different view about how much to update against trusting complicated conceptual arguments about AI risk!
“AI safety and alignment issues exist today. In the future, we’ll have crazy powerful AI systems with crazy important responsibilities. At least the potential badness of safety and alignment failures should scale up with these systems’ power and responsibility. Maybe it’ll actually be very hard to ensure that we avoid the worst-case failures.”
This is how Stuart Russell likes to talk about the issue, and I have a go at explaining that line of thinking here.
This is an interesting post, and I have a couple of things to say in response. I’m copying over the part of my shortform that deals with this:
Normative Realism by degrees
Further to the whole question of normative/moral realism, there is this post on Moral Anti-Realism. While I don’t really agree with it, I do recommend reading it. One thing it convinced me of is that there is a close connection between your particular normative ethical theory and moral realism. If you claim to be a moral realist but make no ethical claims beyond ‘self-evident’ ones like ‘pain is bad’, then, given the background implausibility of making such claims about mind-independent facts, you don’t have enough ‘material to work with’ for your theory to plausibly refer to anything. The Moral Anti-Realism post presents this dilemma for the moral realist:
There are instances where just a handful of examples or carefully selected “pointers” can convey all the meaning needed for someone to understand a far-reaching and well-specified concept. I will give two cases where this seems to work (at least superficially) to point out how—absent a compelling object-level theory—we cannot say the same about “normativity.”
...these thought experiments illustrate that under the right circumstances, it’s possible for just a few carefully selected examples to successfully pinpoint fruitful and well-specified concepts in their entirety. We don’t have the philosophical equivalent of a background understanding of chemistry or formal systems… To maintain that normativity—reducible or not—is knowable at least in theory, and to separate it from merely subjective reasons, we have to be able to make direct claims about the structure of normative reality, explaining how the concept unambiguously targets salient features in the space of possible considerations. It is only in this way that the ambitious concept of normativity could attain successful reference. As I have shown in previous sections, absent such an account, we are dealing with a concept that is under-defined, meaningless, or forever unknowable.
The challenge for normative realists is to explain how irreducible reasons can go beyond self-evident principles and remain well-defined and speaker-independent at the same time.
To a large degree, I agree with this claim, and I think that many moral realists do as well. Convergence-type arguments often appear in more recent metaethics (Hare and Parfit are on those earlier lists), so this may already have been recognised. The post discusses such a response to anti-realism at the end:
I titled this post “Against Irreducible Normativity.” However, I believe that I have not yet refuted all versions of irreducible normativity. Despite the similarity Parfit’s ethical views share with moral naturalism, Parfit was a proponent of irreducible normativity. Judging by his “climbing the same mountain” analogy, it seems plausible to me that his account of moral realism escapes the main force of my criticism thus far.
But there’s one point on which I disagree with that post. I agree that how much you can concretely say about your supposed mind-independent domain of facts affects how plausible its existence should seem, and even how coherent the concept is, but I think that this can come in degrees. This should not be surprising: we’ve known since Quine and Kripke that there can be evidential considerations for and against, and degrees of uncertainty about, a priori questions. The correct method in such a situation is Bayesian: tally the plausibility points for and against admitting the new thing into your ontology. This can work even if we don’t have an entirely coherent understanding of normative facts, as long as it is coherent enough.
Suppose you’re an Ancient Egyptian who knows a few practical methods for trigonometry and surveying, doesn’t know anything about formal systems or proofs, and someone asks you if there are ‘mathematical facts’. You would say something like “I’m not totally sure what this ‘maths’ thing consists of, but it seems at least plausible that there are some underlying reasons why we keep hitting on the same answers”. You’d be less confident than a modern mathematician, but you could still give a justification for the claim that there are right and wrong answers to mathematical questions. I think that the general thrust of convergence arguments puts us in a similar position with respect to ethical facts.
If we think about how words obtain their meaning, it should be apparent that in order to defend this type of normative realism, one has to commit to a specific normative-ethical theory. If the claim is that normative reality sticks out at us like Mount Fuji on a clear summer day, we need to be able to describe enough of its primary features to be sure that what we’re seeing really is a mountain. If all we are seeing is some rocks (“self-evident principles”) floating in the clouds, it would be premature to assume that they must somehow be connected and form a full mountain.
So, we don’t see the whole mountain, but nor are we seeing simply a few free-floating rocks that might be a mirage. Instead, what we see is maybe part of one slope and a peak.
To be concrete: the five-second, high-level description of Hare’s and Parfit’s convergence arguments goes like this:
If we are going to will the maxim of our action to be a universal law, it must be, to use the jargon, universalizable. I have, that is, to will it not only for the present situation, in which I occupy the role that I do, but also for all situations resembling this in their universal properties, including those in which I occupy all the other possible roles. But I cannot will this unless I am willing to undergo what I should suffer in all those roles, and of course also get the good things that I should enjoy in others of the roles. The upshot is that I shall be able to will only such maxims as do the best, all in all, impartially, for all those affected by my action. And this, again, is utilitarianism.
and
An act is wrong just when such acts are disallowed by some principle that is optimific, uniquely universally willable, and not reasonably rejectable
In other words, the principles that (whatever our particular wants) would produce the best outcome in terms of satisfying our goals, the principles that all of us could will to be universal laws, and the principles that would not be rejected as the basis for a contract are all the same principles. That is, at the least, a suspicious level of agreement between ethical theories. And it is something substantive that can be said: out of every major attempt at a universal ethics that has actually been made in history (what produces the best outcome, what can you will to be a universal law, what would we all agree on), all seem to produce really similar answers.
The particular convergence arguments given by Parfit and Hare are a lot more complex, and I can’t speak to their overall validity. If we thought they were valid, then we’d be seeing the entire mountain precisely. Since they just seem quite persuasive, we’re seeing the vague outline of something through the fog, but that’s not the same as just spotting a few free-floating rocks.
Now run through these same convergence arguments for decision theory and utility theory, and you get a far stronger conclusion. There might be a bit of haze at the top of that mountain, but we can clearly see which way the slope is headed.
This is why I think that ethical realism should be seen as plausible and realism about some normative facts, like epistemic facts, should be seen as more plausible still. There is some regularity here in need of explanation, and it seems somewhat more natural on the realist framework.
I agree that this ‘theory’ is woefully incomplete, and has very little to say about what the moral facts actually consist of beyond ‘the thing that makes there be a convergence’, but that’s often the case when we’re dealing with difficult conceptual terrain.
From Ben’s post:
I wouldn’t necessarily describe myself as a realist. I get that realism is a weird position. It’s both metaphysically and epistemologically suspicious. What is this mysterious property of “should-ness” that certain actions are meant to possess—and why would our intuitions about which actions possess it be reliable? But I am also very sympathetic to realism and, in practice, tend to reason about normative questions as though I was a full-throated realist.
From the perspective of x, x is not self-defeating
From the antirealism post, referring to the normative web argument:
It’s correct that anti-realism means that none of our beliefs are justified in the realist sense of justification. The same goes for our belief in normative anti-realism itself. According to the realist sense of justification, anti-realism is indeed self-defeating.
However, the entire discussion is about whether the realist way of justification makes any sense in the first place—it would beg the question to postulate that it does.
Sooner or later every theory ends up question-begging.
From the perspective of Theism, God is an excellent explanation for the universe’s existence, since he is a person with the freedom to choose to create a contingent entity at any time, while existing necessarily himself. From the perspective of almost anyone likely to read this post, that is obvious nonsense, since ‘persons’ and ‘free will’ are not primitive pieces of our ontology, and a ‘necessarily existent person’ makes as much sense as a ‘necessarily existent cabbage’. So you can’t call it a compelling argument for the atheist to become a theist.
By the same logic, it is true that saying ‘anti-realism is unjustified on the realist sense of justification’ is question-begging on the realist’s part. The anti-realist has little to say to it except ‘so what?’. But you can convert it into a Quinean, non-question-begging plausibility argument by saying something like:
We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist ‘justification’ for our beliefs, in purely descriptive terms, the other in which there are mind-independent facts about which of our beliefs are justified, and the latter is a more plausible, parsimonious account of the structure of our beliefs.
This won’t compel the anti-realist, but I think it would compel someone weighing up the two alternative theories of how justification works. If you are uncertain about whether there are mind-independent facts about our beliefs being justified, the argument that anti-realism is self-defeating pulls you in the direction of realism.
Parfit isn’t quite a non-naturalist (or rather, he’s a very unconventional kind of non-naturalist, not a Platonist); he’s a ‘quietist’. Quietism is, essentially, the view that there are normative facts and that they aren’t natural facts, but that we needn’t say what category they fall into metaphysically, or even that such a question is meaningless.
I think a variant of that, where we say ‘we don’t currently have a clear idea what they are, just some hints that they exist because of normative convergence, and the internal contradictions of other views’ is plausible:
This is something substantive that can be said: out of every major attempt at a universal ethics that has actually been made in history (what produces the best outcome, what can you will to be a universal law, what would we all agree on), all seem to produce really similar answers.
The particular convergence arguments given by Parfit and Hare are a lot more complex, and I can’t speak to their overall validity. If we thought they were valid, then we’d be seeing the entire mountain precisely. Since they just seem quite persuasive, we’re seeing the vague outline of something through the fog, but that’s not the same as just spotting a few free-floating rocks.
Now run through these same convergence arguments for decision theory and utility theory, and you get a far stronger conclusion. There might be a bit of haze at the top of that mountain, but we can clearly see which way the slope is headed.
This is why I think that ethical realism should be seen as plausible and realism about some normative facts, like epistemic facts, should be seen as more plausible still. There is some regularity here in need of explanation, and it seems somewhat more natural on the realist framework.
But instilling the urgency to do so may require another type of writing: that of science fiction, of more creative visionaries who are willing to paint in vivid detail a picture of what a flourishing human future could be.
If it’s emotive force you’re after, you may be interested in this: Toby Ord just released a collection of quotations on existential risk and the future of humanity, from figures ranging from Kepler to Winston Churchill (in fact, a surprisingly large number are from Churchill) to Seneca to Mill to the Aztecs. It’s one of the most inspirational things I have ever read, and it makes clear that there have always been people who cared about humanity as a whole. My all-time favourite is probably this one, by the philosopher Derek Parfit:
Life can be wonderful as well as terrible, and we shall increasingly have the power to make life good. Since human history may be only just beginning, we can expect that future humans, or supra-humans, may achieve some great goods that we cannot now even imagine. In Nietzsche’s words, there has never been such a new dawn and clear horizon, and such an open sea.
If we are the only rational beings in the Universe, as some recent evidence suggests, it matters even more whether we shall have descendants or successors during the billions of years in which that would be possible. Some of our successors might live lives and create worlds that, though failing to justify past suffering, would have given us all, including those who suffered most, reasons to be glad that the Universe exists.
You’ve given me a lot to think about! I broadly agree with a lot of what you’ve said here.
I think that it is a more damaging mistake to think moral antirealism is true when realism is true than vice versa, but I agree with you that the difference is nowhere near infinite, and doesn’t give you a strong wager.
However, I do think that normative anti-realism is self-defeating, assuming you start out with normative concepts (though not an assumption that those concepts apply to anything). I consider this argument to be step 1 in establishing moral realism, nowhere near the whole argument.
Epistemic anti-realism
Cool, I’m happy that this argument appeals to a moral realist! ….
...I don’t think this argument (“anti-realism is self-defeating”) works well in this context. If anti-realism is just the claim “the rocks or free-floating mountain slopes that we’re seeing don’t connect to form a full mountain,” I don’t see what’s self-defeating about that...
To summarize: There’s no infinitely strong wager for moral realism.
I agree that there is no infinitely strong wager for moral realism. As soon as moral realists start making empirical claims about the consequences of realism (that convergence is likely), you can’t say that moral realism is true necessarily or that there is an infinitely strong prior in favour of it. An AI that knows that your idealised preferences don’t cohere could always show up and prove you wrong, just as you say. If I were Bob in this dialogue, I’d happily concede that moral anti-realism is true.
If (supposing it were the case) there were little consensus on anything to do with morality (“The rocks don’t connect...”), someone who pointed that out and said ‘from that I infer that moral realism is unlikely’ wouldn’t be saying anything self-defeating. Moral anti-realism is not self-defeating, either on its own terms or on the terms of a ‘mixed view’ like the one I describe here:
We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist ‘justification’ for our beliefs, in purely descriptive terms, the other in which there are mind-independent facts about which of our beliefs are justified...
However, I do think that there is an infinitely strong wager in favour of normative realism and that normative anti-realism is self-defeating on the terms of a ‘mixed view’ that starts out considering the two alternatives like that given above. This wager is because of the subset of normative facts that are epistemic facts.
The example that I used was about ‘how beliefs are justified’. Maybe I wasn’t clear, but I was referring to beliefs in general, not to beliefs about morality. Epistemic facts, e.g. that you should believe something if there is a sufficient amount of evidence for it, are a kind of normative fact. You noted them on your list here.
So, the infinite wager argument goes like this:
1) On normative anti-realism there are no facts about which beliefs are justified. So there are no facts about whether normative anti-realism is justified. Therefore, normative anti-realism is self-defeating.
Except that doesn’t work! Because on normative anti-realism, the whole idea of external facts about which beliefs are justified is mistaken, and instead we all just have fundamental principles (whether moral or epistemic) that we use but don’t question, which means that holding a belief without (the realist’s notion of) justification is consistent with anti-realism.
So the wager argument for normative realism actually goes like this:
2) We have two competing ways of understanding how beliefs are justified. One is where we have anti-realist ‘justification’ for our beliefs, in purely descriptive terms of what we will probably end up believing given basic facts about how our minds work in some idealised situation. The other is where there are mind-independent facts about which of our beliefs are justified. The latter is more plausible because of 1).
Evidence for epistemic facts?
I find it interesting that the imagined scenario you give in #5 essentially skips over argument 2) as something that is impossible to judge:
AI: Only in a sense I don’t endorse as such! We’ve gone full circle. I take it that you believe that just like there might be irreducibly normative facts about how to do good, the same goes for irreducible normative facts about how to reason?
Bob: Indeed, that has always been my view.
AI: Of course, that concept is just as incomprehensible to me.
The AI doesn’t give evidence against there being irreducible normative facts about how to reason, it just states it finds the concept incoherent, unlike the (hypothetical) evidence that the AI piles on against moral realism (for example, that people’s moral preferences don’t cohere).
Either you think some basic epistemic facts have to exist for reasoning to get off the ground, and therefore that epistemic anti-realism is self-defeating, or you are an epistemic anti-realist and don’t care about the realist’s sense of ‘self-defeating’. The AI is in the latter camp, but not because of evidence, which is how it came to be a moral anti-realist (...However, you haven’t established that all normative statements work the same way—that was just an intuition...). It is in that camp simply because it’s constructed in such a way that it lacks the concept of an epistemic reason.
So, if this AI is constructed such that irreducibly normative facts about how to reason aren’t comprehensible to it, it only has access to argument 1), which doesn’t work. It can’t imagine 2).
However, I think that we humans are in a situation where 2) is open to consideration, where we have the concept of a reason for believing something, but aren’t sure if it applies—and if we are in that situation, I think we are dragged towards thinking that it must apply, because otherwise our beliefs wouldn’t be justified.
However, this doesn’t establish moral realism—as you said earlier, moral anti-realism is not self-defeating.
If anti-realism is just the claim “the rocks or free-floating mountain slopes that we’re seeing don’t connect to form a full mountain,” I don’t see what’s self-defeating about that
Combining convergence arguments and the infinite wager
If you want to argue for moral realism, then you need evidence for moral realism, which comes in the form of convergence arguments. But the above argument is still relevant, because the convergence and ‘infinite wager’ arguments support each other.
The reason 2) would be bolstered by the success of convergence arguments (in epistemology, or ethics, or any other normative domain) is that convergence arguments increase our confidence that normativity is a coherent concept—which is what 2) needs to work. It certainly seems coherent to me, but this cannot be taken as self-evident since various people have claimed that they or others don’t have the concept.
I also think that 2) is some evidence in favour of moral realism, because it undermines some of the strongest antirealist arguments.
By contrast, for versions of normativity that depend on claims about a normative domain’s structure, the partners-in-crime arguments don’t even apply. After all, just because philosophers might—hypothetically, under idealized circumstances—agree on the answers to all (e.g.) decision-theoretic questions doesn’t mean that they would automatically also find agreement on moral questions.[29] On this interpretation of realism, all domains have to be evaluated separately
I don’t think this is right. What I’m giving here is such a ‘partners-in-crime’ argument with a structure, with epistemic facts at the base. Realism about normativity should certainly lower the burden on moral realism to demonstrate total convergence now, because we already have reason to believe that normative facts exist. For most anti-realists, the very strongest argument is the ‘queerness argument’: that normative facts are incoherent or too strange to be allowed into our ontology. The ‘partners-in-crime’/‘infinite wager’ argument undermines this strong objection to moral realism. So some sort of very strong hint of a convergence structure might be good enough, depending on the details.
I agree that it then shifts the arena to convergence arguments. I will discuss them in posts 6 and 7.
So, with all that out of the way, when we start discussing the convergence arguments, the burden of proof on them is not colossal. If we already have reason to suspect that there are normative facts out there, perhaps some of them are moral facts. But if we found a random morass of different considerations under the name ‘morality’ then we’d be stuck concluding that there might be some normative facts, but maybe they are only epistemic facts, with nothing else in the domain of normativity.
I don’t think this is the case, but I will have to wait until your posts on that topic—I look forward to them!
All I’ll say is that I don’t consider strongly conflicting intuitions in e.g. population ethics to be persuasive reasons for thinking that convergence will not occur. As long as the direction of travel is consistent, and we can mention many positive examples of convergence, the preponderance of evidence is that there are elements of our morality that reach high-level agreement. (I say elements because realism is not all-or-nothing—there could be an objective ‘core’ to ethics, maybe axiology, and much ethics could be built on top of such a realist core—that even seems like the most natural reading of the evidence, if the evidence is that there is convergence only on a limited subset of questions.) If Kant could have been a utilitarian and never realised it, then those who are appalled by the repugnant conclusion could certainly converge to accept it after enough ideal reflection!
Belief in God, or in many gods, prevented the free development of moral reasoning. Disbelief in God, openly admitted by a majority, is a recent event, not yet completed. Because this event is so recent, Non-Religious Ethics is at a very early stage. We cannot yet predict whether, as in Mathematics, we will all reach agreement. Since we cannot know how Ethics will develop, it is not irrational to have high hopes.
How to make anti-realism existentially satisfying
Instead of “utilitarianism as the One True Theory,” we consider it as “utilitarianism as a personal, morally-inspired life goal...”
While this concession is undoubtedly frustrating, proclaiming others to be objectively wrong rarely accomplished anything anyway. It’s not as though moral disagreements—or disagreements in people’s life choices—would go away if we adopted moral realism.
If your goal here is to convince those inclined towards moral realism to see anti-realism as existentially satisfying, I would recommend a different framing of it. I think that framing morality as a ‘personal life goal’ makes it seem as though it is much more a matter of choice or debate than it in fact is, and will probably ring alarm bells in the mind of a realist and make them think of moral relativism.
Speaking as someone inclined towards moral realism, the most inspiring presentations I’ve ever seen of anti-realism are those given by Peter Singer in The Expanding Circle and Eliezer Yudkowsky in his metaethics sequence. Probably not by coincidence—both of these people are inclined to be realists. Eliezer said as much, and Singer later became a realist after reading Parfit. Eliezer Yudkowsky on ‘The Meaning of Right’:
The apparent objectivity of morality has just been explained—and not explained away. For indeed, if someone slipped me a pill that made me want to kill people, nonetheless, it would not be right to kill people. Perhaps I would actually kill people, in that situation—but that is because something other than morality would be controlling my actions.
Morality is not just subjunctively objective, but subjectively objective. I experience it as something I cannot change. Even after I know that it’s myself who computes this 1-place function, and not a rock somewhere—even after I know that I will not find any star or mountain that computes this function, that only upon me is it written—even so, I find that I wish to save lives, and that even if I could change this by an act of will, I would not choose to do so. I do not wish to reject joy, or beauty, or freedom. What else would I do instead? I do not wish to reject the Gift that natural selection accidentally barfed into me.
And Singer in The Expanding Circle:
“Whether particular people with the capacity to take an objective point of view actually do take this objective viewpoint into account when they act will depend on the strength of their desire to avoid inconsistency between the way they reason publicly and the way they act.”
These are both anti-realist claims. They define ‘right’ descriptively and procedurally, as arising from what we would want to do under some ideal circumstances, and they rigidify on the output of that idealization, not on what we want. To a realist, this is far more appealing than a mere “personal, morally-inspired life goal”, and it has the character of an ‘external moral constraint’, even if it isn’t ultimately external but is just the result of immovable or basic facts about how your mind will, in fact, work, including facts about how your mind finds inconsistencies in its own beliefs. This is a feature, not a bug:
According to utilitarianism, what people ought to spend their time on depends not on what they care about but also on how they can use their abilities to do the most good. What people most want to do only factors into the equation in the form of motivational constraints, constraints about which self-concepts or ambitious career paths would be long-term sustainable. Williams argues that this utilitarian thought process alienates people from their actions since it makes it no longer the case that actions flow from the projects and attitudes with which these people most strongly identify...
The exact thing that Williams calls ‘alienating’ is the thing that Singer, Yudkowsky, Parfit and many other realists and anti-realists consider to be the most valuable thing about morality! But you can keep this ‘alienation’ if you reframe morality as being the result of the basic, deterministic operations of your moral reasoning, the same way you’d reframe epistemic or practical reasoning on the anti-realist view. Then it seems more ‘external’ and less relativistic.
One thing this framing makes clearer, which you don’t deny but don’t mention, is that anti-realism does not imply relativism.
In that case, normative discussions can remain fruitful. Unfortunately, this won’t work in all instances. There will be cases where no matter how outrageous we find someone’s choices, we cannot say that they are committing an error of reasoning.
What we can say, on anti-realism as characterised by Singer and Yudkowsky, is that they are making an error of morality. We are not obligated (how could we be?) towards relativism, permissiveness or accepting values incompatible with our own on anti-realism. Ultimately, you can just say that ‘I am right and you are wrong’.
That’s one of the major upsides of anti-realism to the realist—you still get to make universal, prescriptive claims and follow them through because they are morally right. If people disagree with you, then they are morally wrong, and you aren’t obligated to listen to their arguments if they arise from fundamentally incompatible values. Put that way, anti-realism is much more appealing to someone with realist inclinations.
I thought that this post would make a bigger deal of the UK’s coronavirus response—currently top in the world for both vaccine development and large-scale clinical trials, and one of the leading funders of international vaccine development research.
This discussion continues to feel like the most productive discussion I’ve had with a moral realist! :)
Glad to be of help! I feel like I’m learning a lot.
What would you reply if the AI uses the same structure of arguments against other types of normative realism as it uses against moral realism? This would amount to the following trilemma for proponents of irreducible normativity (using section headings from my text)
...
(3) Is there a speaker-independent normative reality?
Focussing on epistemic facts, the AI could not make that argument. I assumed that you had the AI lack the concept of epistemic reasons because you agreed with me that there is no possible argument out of using this concept, if you start out with the concept, not because you just felt that it would have been too much of a detour to have the AI explain why it finds the concept incoherent.
I think I agree with all of this, but I’m not sure, because we seem to draw different conclusions. In any case, I’m now convinced I should have written the AI’s dialogue a bit differently. You’re right that the AI shouldn’t just state that it has no concept of irreducible normative facts. It should provide an argument as well!
How would this analogous argument go? I’ll take the AI’s key point and reword it to be speaking about epistemic facts instead of moral facts:
AI: To motivate the use of irreducibly normative concepts, philosophers often point to instances of universal agreement on epistemic propositions. Sammy Martin uses the example “we always have a reason to believe that 2+2=4.” Your intuition suggests that all epistemic propositions work the same way. Therefore, you might conclude that even for propositions philosophers disagree over, there exists a solution that’s “just as right” as “we always have a reason to believe that 2+2=4” is right. However, you haven’t established that all epistemic statements work the same way—that was just an intuition. “We always have a reason to believe that 2+2=4” describes something that people are automatically disposed to believe. It expresses something that normally-disposed people come to endorse by their own lights. That makes it a true fact of some kind, but it’s not necessarily an “objective” or “speaker-independent” fact. If you want to show beyond doubt that there are epistemic facts that don’t depend on the attitudes held by the speakers—i.e., epistemic facts beyond what people themselves will judge to be what you should believe—you’d need to deliver a stronger example. But then you run into the following dilemma: If you pick a self-evident epistemic proposition, you face the critique that the “epistemic facts” that you claim exist are merely examples of a subjectivist epistemology. By contrast, if you pick an example proposition that philosophers can reasonably disagree over, you face the critique that you haven’t established what it could mean for one party to be right. If one person claims we have reason to believe that alien life exists, and another person denies this, how would we tell who’s right? What is the question that these two parties disagree on? Thus far, I have no coherent account of what it could mean for an epistemic theory to be right in the elusive, objectivist sense that Martin and other normative realists hold in mind.
Bob: I think I followed that. You mentioned the example of uncontroversial epistemic propositions, and you seemed somewhat dismissive about their relevance? I always thought those were pretty interesting. Couldn’t I hold the view that true epistemic statements are always self-evident? Maybe not because self-evidence is what makes them true, but because, as rational beings, we are predisposed to appreciate epistemic facts?
AI: Such an account would render epistemology very narrow. Incredibly few epistemic propositions appear self-evident to all humans. The same goes for whatever subset of “well-informed” or “philosophically sophisticated” humans you may want to construct.
It doesn’t work, does it? The reason it doesn’t work is that a scenario in which the AI ‘concludes’ that ‘incredibly few epistemic propositions appear self-evident to all humans’ is unimaginable. What would it mean for this to be true? What would the world have to be like?
I think the points in (3) apply to all domains of normativity, and they show that unless we come up with some other way to make normative concepts work that I haven’t yet thought of, we are forced to accept that normative concepts, in order to be action-guiding and meaningful, have to be linked to claims about convergence in human expert reasoners.
I do not believe it is logically impossible that expert reasoners could diverge on all epistemic facts, but I do think that it is in some fairly deep sense impossible. For there to be such a divergence, reality itself would have to be unknowable.
The ‘speaker-independent normative reality’ that epistemic facts refer to is just actual objective reality—of all the potential epistemic facts out there, the one that actually corresponds to reality is the one that ‘sticks out’ in exactly the way that a speaker-independent normative reality should.
This means that there is no possible world where anyone with the concept of epistemic facts, failing to see any epistemic convergence, becomes probabilistically convinced that there are no epistemic facts. There would never be such a lack of convergence.
So my initial point,
The AI is in the latter camp, but not because of evidence, the way that it’s a moral anti-realist (...However, you haven’t established that all normative statements work the same way—that was just an intuition...), but just because it’s constructed in such a way that it lacks the concept of an epistemic reason.
So, if this AI is constructed such that irreducibly normative facts about how to reason aren’t comprehensible to it, it only has access to argument 1), which doesn’t work. It can’t imagine 2).
still stands—that the AI is a normative anti-realist because it doesn’t have the concept of a normative reason, not because it has the concept and has decided that it probably doesn’t apply (and there was no alternative way for you to write the AI reaching that conclusion).
The trilemma applies here as well. Saying that it must apply still leaves you with the task of making up your mind on how normative concepts even work. I don’t see alternatives to my suggestions (1), (2) and (3).
So I take option (3), where the ‘extremely strong convergence’ on epistemic claims about what we should believe implies, with virtual certainty, that there is a speaker-independent normative reality, because the reality-corresponding collection of epistemic claims does, in fact, stick out compared to all the other possible epistemic facts.
So maybe the ‘normativity argument’, as I called it, is really just another convergence argument, but one of infinite or near-infinite strength: the convergence among our beliefs about what is epistemically justified is so strong that it’s effectively unimaginable that they wouldn’t converge.
If you wish to deny that epistemic facts are needed to explain the convergence, I think that you end up in quite a strong form of pragmatism about truth, and give up on the notion of knowing anything about mind-independent objective reality, Kant-style, for reasons that I discuss here. That’s quite a bullet to bite. You don’t expect much convergence on epistemic facts, so maybe you are already a pragmatist about truth?
“Since we probably agree that there is a lot of convergence among expert reasoners on epistemic facts, we shouldn’t be too surprised if morality works similarly.”
And I kind of agree with that, but I don’t know how much convergence I would expect in epistemology. (I think it’s plausible that it would be higher than for morality, and I do agree that this is an argument to at least look really closely for ways of bringing about convergence on moral questions.)
Lastly,
My confidence that convergence won’t work is based on not only observing disagreements in fundamental intuitions, but also on seeing why people disagree, and seeing that these disagreements are sometimes “legitimate” because ethical discussions always get stuck in the same places (differences in life goals, which is intertwined with axiology).
I’ll have to wait for your more specific arguments on this topic! I did give some preliminary discussion here of why, for example, I think that you’re dragged towards a total-utilitarian view whether you like it or not. It’s also important to note that the convergence arguments aren’t (principally) about people, but about possible normative theories—people might refuse to accept the implications of their own beliefs.
I kind of feel this way, except that I think the target criteria can differ between people, and are often underdetermined. (As you point out in some comment, things also depend on which parts of one’s psychology one identifies with.)
I think that you were referring to this?
Normative realism implies identification with system 2
...
I find this very interesting because locating personal identity in system 1 feels conceptually impossible or deeply confusing. No matter how much rationalization goes on, it never seems intuitive to identify myself with system 1. How can you identify with the part of yourself that isn’t doing the explicit thinking, including the decision about which part of yourself to identify with? It reminds me of Nagel’s The Last Word.
My point here was that if you are a realist about normativity of any kind, you have to identify with system 2 as that is what makes the (potentially correct) judgements about what you ought to do.
But that’s not to say that if you are antirealist, you have to identify with system 1. If you are an antirealist, then in some sense (the realist sense) you don’t have to identify with anything, but how easy and natural it is to identify with system 2 depends on how much importance you place on coherence among your values, which in turn depends on how coherent and universalizable your values actually are—you can be an antirealist but accept that some fairly strong degree of convergence does occur in practice, for whatever reason. This:
target criteria can differ between people, and are often underdetermined
seems to imply that you don’t expect much convergence in practice, and that you don’t think we should feel strong pressure to reach high-level agreement on moral questions, because such a project is never going to succeed.
I think this is part of the motivation for your ‘case for suffering focussed ethics’: even though any asymmetry between preventing suffering and producing happiness falls victim to the absurd conclusion and the paralysis argument, I’m assuming that this wouldn’t bother you much.
I talk about why, regardless of whether realism is true, I think this is an unstable position in that post.
Thanks for getting this done so quickly! Do you have any internal estimates (even order-of-magnitude ones) of the margin by which this exceeds GiveWell’s top recommended charities? I’m intending to donate, but my decision would be significantly different if, for example, you thought the GiveIndia Oxygen fundraiser was currently ~1-1.5 times better than GiveWell’s top recommended charities, versus ~20 times better.
Thanks for getting back to me—I took Jeff’s calculations and did some guesstimating to try and figure out what demand might look like over the next few weeks. The only covid forecast I was able to find for India (let me know if you’ve seen another!) is this by IHME. Their ‘hospital resource use’ forecast shows that they expect a demand of 2 million beds, roughly what was the case in the week before Jeff produced his estimate of the value of oxygen-based interventions (last week of April), to be exceeded until the start of June, which is 30 days from when the estimate was produced. I’m assuming that his estimate was based on what the demand looked like over the previous week.
There’s a lot of uncertainty in this figure, but around 3-8 weeks is a reasonable range for how many weeks demand for oxygen will be at or above what it was in the last week of April, given that the IHME forecast implies about 4 weeks.
Taking the mean of the estimates, excluding ventilators (since they’re an outlier), gives us 31 days of use to equal GiveWell’s top charities, i.e. roughly 4 weeks, and we can expect 3-8 weeks of demand being that high. So depending on how the epidemic pans out, it seems like, very roughly, three quarters to twice as good as GiveWell’s top charities is a reasonable range of uncertainty.
EDIT: what I said should be taken as a lower limit, as it assumes that the value of oxygen is exactly what Jeff calculated when demand is greater than or equal to 2 million beds, and zero below that, when in reality the value is real but smaller if demand is under 2M. I tried to account for this by skewing my guess, so 0.75 to 2x as good, where IHME’s demand numbers would suggest 1x as good.
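The arithmetic above can be made explicit with a minimal sketch. It assumes (as the step-function estimate does) that oxygen interventions match GiveWell's top charities after 31 days of high demand and that value scales linearly with the number of days demand stays at or above the late-April level; the function name `ratio_vs_givewell` is hypothetical and this is not Jeff's actual model.

```python
# Hypothetical back-of-envelope sketch, not Jeff's model.
# Assumptions: breakeven with GiveWell's top charities after BREAKEVEN_DAYS of
# high demand (mean of the estimates, excluding ventilators), and value scaling
# linearly with days of demand at or above ~2 million beds (zero below that).

BREAKEVEN_DAYS = 31


def ratio_vs_givewell(days_of_high_demand: float,
                      breakeven_days: float = BREAKEVEN_DAYS) -> float:
    """Multiple of GiveWell top-charity cost-effectiveness under the linear assumption."""
    return days_of_high_demand / breakeven_days


# 3 weeks, IHME's ~30 days, and 8 weeks of high demand
for days in (21, 30, 56):
    print(f"{days} days of high demand -> {ratio_vs_givewell(days):.2f}x GiveWell")
```

Under these assumptions the raw range is roughly 0.68x to 1.81x, with IHME's 30 days giving about 0.97x. That is slightly below the skewed 0.75x to 2x guess, consistent with treating the step-function calculation as a lower limit.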
We know he’s been active on lesswrong in the past. Is it possible he’s been reading the posts here?
Is there any public organisation which can be proud of last year?
This is an important question, because we want to find out what was done right organizationally in a situation where most failed, so we can do more of it. Especially if this is a test-run for X-risks.
There are two examples that come to mind of government agencies that did a moderately good job at a task which was new and difficult. One is the UK’s vaccine taskforce, which was set up by Dominic Cummings and the UK’s chief scientific advisor, Patrick Vallance, and was responsible for the relatively fast procurement and rollout. You might say similar for the Operation Warp Speed team, but the UK vaccine taskforce overordered to a larger extent than Warp Speed and was also responsible for other sane things like the simple oldest-first vaccine prioritization and the first-doses-first decision, which prevented a genuine catastrophe due to the B117 variant. (Also credit to the MHRA (the UK’s regulator) for mostly staying out of the way.)
See this from Cummings’ blog, which also outlines many of the worst early expert failures on covid, and my discussion of it here:
This is why there was no serious vaccine plan — i.e spending billions on concurrent (rather than the normal sequential) creation/manufacturing/distribution etc — until after the switch to Plan B. I spoke to Vallance on 15 March about a ‘Manhattan Project’ for vaccines out of Hancock’s grip but it was delayed by the chaotic shift from Plan A to lockdown then the PM’s near-death. In April Vallance, the Cabinet Secretary and I told the PM to create the Vaccine Taskforce, sideline Hancock, and shift commercial support from DHSC to BEIS. He agreed, this happened, the Chancellor supplied the cash. On 10 May I told officials that the VTF needed a) a much bigger budget, b) a completely different approach to DHSC’s, which had been mired in the usual processes, so it could develop concurrent plans, and c) that Bingham needed the authority to make financial decisions herself without clearance from Hancock.
This plan later went on to succeed and significantly outperform expectations for rollout speed, with early approval for the AZ and Pfizer vaccines and an early decision to delay second doses by 12 weeks. I see the success of the UK vaccine taskforce, and its ability to have a somewhat appropriate sense of the costs and benefits involved and of the enormous value of vaccinations, as a good example of how institution design is the key issue which most needs fixing. Have an efficient, streamlined taskforce, and you can still get things done in government.
The other example of success often discussed is the central banks, especially in the US, which responded quickly to the COVID-19 dip and prevented a much worse economic catastrophe. Alex Tabarrok:
So what lessons should we take from this? Lewis doesn’t say but my colleague Garett Jones argues for more independent agencies in his excellent book 10% Less Democracy. The problem with the CDC was that after 1976 it was too responsive to political pressures, i.e. too democratic. What are the alternatives?
The Federal Reserve is governed by a seven-member board each of whom is appointed to a single 14-year term, making it rare for a President to be able to appoint a majority of the board. Moreover, since members cannot be reappointed there is less incentive to curry political favor. The Chairperson is appointed by the President to a four-year term and must also be approved by the Senate. These checks and balances make the Federal Reserve a relatively independent agency with the power to reject democratic pressures for inflationary stimulus. Although independent central banks can be a thorn in the side of politicians who want their aid in juicing the economy as elections approach, the evidence is that independent central banks reduce inflation without reducing economic growth. A multi-member governing board with long and overlapping appointments could also make the CDC more independent from democratic politics which is what you want when a once in 100 year pandemic hits and the organization needs to make unpopular decisions before most people see the danger.
I would really like to be able to agree with Tabarrok here and say that, yes, choosing the right experts and protecting them from democratic feedback is the right answer and all we would need, and that the expert failures we saw were due to democratic pressure in one form or another. The problem is that we can just look at SAGE in the UK early in the pandemic, or Anders Tegnell in Sweden, who were close to unfireable and much more independent, but underperformed badly. Or China, which is entirely protected from democratic interference and still didn’t do challenge trials.
Just saying the words ‘have the right experts and prevent them from being biased by outside interference’ doesn’t make it so. But, at the same time, it is possible to have fast-responding teams of experts that make the right decisions, if they’re the right experts—the Vaccine Taskforce proves that. I think the advice from the book 10% Less Democracy still stands, but we have to approach implementing it with far more caution than I would have thought pre-covid.
It seems like following the 10% Less Democracy policy can give you either a really great outcome like the one you’ve described, and like we saw a small sliver of in the UK’s vaccine procurement, or a colossal disaster like your impossible-to-fire expert epidemiologists torpedoing your economy and public health and then changing their minds a year late.
Suppose the UK had created a ‘pandemic taskforce’ with similar composition to the vaccine taskforce, in February instead of April, and with a wider remit over things like testing and running the trials. I think many of your happy timeline steps could have taken place.
One of the more positive signs that I’ve seen in recent times is that well-informed elite opinion (going by, for example, the Economist editorials) has started to shift towards scepticism of these institutions and a recognition of how badly they’ve failed. We even saw an NYT article about the CDC and whether reform is possible.
Among the people who matter for policymaking, the scale of the failure has not been swept under the rug. See here:
We believe that Mr Biden is wrong. A waiver may signal that his administration cares about the world, but it is at best an empty gesture and at worst a cynical one.
...
Economists’ central estimate for the direct value of a course is $2,900—if you include factors like long covid and the effect of impaired education, the total is much bigger.
This strikes me as the sort of remark I’d expect to see in one of these threads, which has to be a good sign.
I don’t think the view that moral philosophers had a positive influence on moral developments in history is a simple model of ‘everyone makes a mistake, moral philosopher points out the mistake and convinces people, everyone changes their minds’. I think that what Bykvist, Ord and MacAskill were getting at is that these people gave history a shove at the right moment.
At the very least, it doesn’t seem that discovering the correct moral view is sufficient for achieving moral progress in actuality.
I have no doubt that they’d agree with you about this. But if we all accept this claim, there are two further models we could look at.
One is a model where changing economic circumstances influence what moral views it is feasible to act on, but progress in moral knowledge still affects what we choose to do, given the constraints of our economic circumstances.
The other is a model where economics determines everything and the moral views we hold are an epiphenomenon blown about by these conditions (note this is very similar to some Marxist views of history). Your view is that ‘the two are totally decoupled’, but at most your examples just show that the two are decoupled somewhat, not that moral reasoning has no effect. And there are plenty of examples that show explicit moral reasoning having at least some effect on events—see Bykvist, Ord and MacAskill’s original list.
The strawman view that moral advances determine everything is not what’s being proposed by Bykvist, Ord and MacAskill, it’s the mixed view that ideas influence things within the realm of what’s possible.
Do you think that the West’s disastrous experience with Coronavirus (things like underinvesting in vaccines, not adopting challenge trials, not suppressing the virus, mixed messaging on masks early on, the FDA’s errors on testing, and others as enumerated in this thread, or in books like The Premonition) has strengthened, weakened or not changed much the credibility of your thesis in ‘Against Democracy’, that we should expect better outcomes if we give the knowledgeable more freedom to choose policy?
As a reason it might weaken ‘Against Democracy’: it seems like a lot of expert bureaucracies did an unusually bad job because they couldn’t take correction. See this summary post for examples:
https://forum.effectivealtruism.org/posts/dYiJLvcRJ4nk4xm3X#Vax
As a reason it might strengthen the argument: it seems like the institutions that did better than average were the ones that were more able to act autonomously. See e.g. this from Alex Tabarrok,
https://marginalrevolution.com/marginalrevolution/2021/06/the-premonition.html
Or this summary
Thanks for this reply. Would you say then that Covid has strengthened the case for some sorts of democracy reduction, but not others? So we should be more confident in enlightened preference voting but less confident in Garett Jones’ argument (from 10% less democracy) in favour of more independent agencies?
Very good summary! I’ve been working on a (much drier) series of posts explaining different AI risk scenarios—https://forum.effectivealtruism.org/posts/KxDgeyyhppRD5qdfZ/link-post-how-plausible-are-ai-takeover-scenarios
But I think I might adopt ‘Sycophant’/‘Schemer’ going forward, as better, more descriptive names for WFLL1/WFLL2 (outer/inner alignment failure).
I also liked that you emphasised how much the optimist vs. pessimist case depends on hard-to-articulate intuitions about things like how easily findable deceptive models are and how easy incremental course correction is. I called this the ‘hackability’ of alignment—https://www.lesswrong.com/posts/zkF9PNSyDKusoyLkP/investigating-ai-takeover-scenarios#Alignment__Hackability_
One thing that your account might miss is the impact of ideas on empowerment and well-being down the line. E.g. it’s a very common argument that Christian ideas about the golden rule motivated anti-slavery sentiment, so if the Roman empire hadn’t spread Christianity across Europe then we’d have ended up with very different values.
Similarly, even if the content of ancient Greek moral philosophy wasn’t directly useful to improve wellbeing, it inspired the Western philosophical tradition that led to Enlightenment ideals that led to the abolition of slavery.
I’ve told two stories about why the Greeks and Romans might have been necessary for future moral progress—are you skeptical of these appeals to historical contingency or are the long run causes of these events just outside the scope of this way of looking at history?
Hi Ben—this episode really gave me a lot to think about! Of the ‘three classic arguments’ for AI X-risk you identify, I argued in a previous post that the ‘discontinuity premise’ is based on taking a high-level argument that should be used to establish that sufficiently capable AI will produce very fast progress too literally and assuming the ‘fast progress’ has to happen suddenly and in a specific AI.
Your discussion of the other two arguments led me to conclude that the same sort of mistake is at work in all of them, as I explain here—each is (I think) a case of ‘directly applying a (correct) abstract argument (incorrectly) to the real world’. So we shouldn’t say that the classic arguments are wrong, just overextended/incorrectly applied, as I argue here.
If rapid capability gain, the orthogonality thesis and instrumental convergence are good reasons to suggest AI might pose an existential risk, but were just interpreted too literally, and it’s also true that the ‘new’ arguments make use of these old arguments along with further premises and evidence, then that should raise our confidence that some basic issues have been correctly dealt with since the 2000s. You suggest something like this in the podcast episode, but the discussion never got far into exactly what the underlying intuitions might be:
Do you think there actually is an ‘intuitive core’ to the old arguments that is correct?