Here’s my (in-progress) collation of important EA resources, organised by topic. Contributions welcome :)
Using those two different types of “should” makes your proposed sentence (“It seems that (at least) the humans who are utilitarians should commit mass suicide in order to bring the new beings into existence, because that’s what utilitarianism implies is the right action in that situation.”) unnecessarily confusing, for a couple of reasons.
1. Most moral anti-realists don’t use “epistemic should” when talking about morality. Instead, I claim, they use my definition of moral should: “X should do Y” means “I endorse/prefer some moral theory T, and T endorses X doing Y”. (We can test this by asking anti-realists who don’t subscribe to negative utilitarianism whether a negative utilitarian should destroy the universe—I predict they will either say “no” or argue that the question is ambiguous.) And so introducing “epistemic should” makes moral talk more difficult.
2. Moral realists who are utilitarians and use “moral should” would agree with your proposed sentence, and moral anti-realists who aren’t utilitarians and use “epistemic should” would also agree with your sentence, but for two totally different reasons. This makes follow-up discussions much more difficult.
How about “Utilitarianism endorses humans voluntarily replacing themselves with these new beings”? That gets rid of (most of) the contractarianism. I don’t think there’s any clean, elegant phrasing which then rules out the moral uncertainty in a way that’s satisfactory to both realists and anti-realists, unfortunately—because realists and anti-realists disagree on whether, if you prefer/endorse a theory, that makes it rational for you to act on that theory. (In other words, I don’t know whether moral realists have terminology which distinguishes between people who act on false theories that they currently endorse, versus people who act on false theories they currently don’t endorse.)
I originally wrote a different response to Wei’s comment, but it wasn’t direct enough. I’m copying the first part here since it may be helpful in explaining what I mean by “moral preferences” vs “personal preferences”:
Each person has a range of preferences, which it’s often convenient to break down into “moral preferences” and “personal preferences”. This isn’t always a clear distinction, but here are the main differences:
1. Moral preferences are much more universalisable and less person-specific (e.g. “I prefer that people aren’t killed” vs “I prefer that I’m not killed”).
2. Moral preferences are associated with a meta-preference that everyone has the same moral preferences. This is why we feel so strongly that we need to find a shared moral “truth”. Fortunately, most people in our societies agree on the most basic moral questions.
3. Moral preferences are associated with a meta-preference that they are consistent, simple, and actionable. This is why we feel so strongly that we need to find coherent moral theories rather than just following our intuitions.
4. Moral preferences are usually phrased as “X is right/wrong” and “people should do right and not do wrong” rather than “I prefer X”. This often misleads people into thinking that their moral preferences are just pointers to some aspect of reality, the “objective moral truth”, which is what people “objectively should do”.
When we reflect on our moral preferences and try to make them more consistent and actionable, we often end up condensing our initial moral preferences (aka moral intuitions) into moral theories like utilitarianism. Note that we could do this for other preferences as well (e.g. “my theory of food is that I prefer things which have more salt than sugar”) but because I don’t have strong meta-preferences about my food preferences, I don’t bother doing so.
The relationship between moral preferences and personal preferences can be quite complicated. People act on both, but often have a meta-preference to pay more attention to their moral preferences than they currently do. I’d count someone as a utilitarian if they have moral preferences that favour utilitarianism, and these are a non-negligible component of their overall preferences.
My first objection is that you’re using a different form of “should” than the standard one. My preferred interpretation of “X should do Y” is that it’s equivalent to “I endorse some moral theory T, and T endorses X doing Y”. (Or, more simply, “according to utilitarianism, X should do Y” is equivalent to “utilitarianism endorses X doing Y”.) In this case, “should” feels like it’s saying something morally normative.
Whereas you seem to be using “should” as in “a person who has a preference X should act on X”. In this case, “should” feels like it’s saying something epistemically normative. You may think these are the same thing, but I don’t, and either way it’s confusing to build that assumption into our language. I’d prefer to replace this latter meaning of “should” with “it is rational to”. So then we get:
“it is rational for humans who are utilitarians to commit mass suicide in order to bring the new beings into existence, because that’s what utilitarianism implies is the right action.”
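To make the contrast between the two readings explicit, here’s a rough formalization (just a sketch; the predicate names are my own shorthand, not any standard notation):

```latex
% Moral "should": relative to some theory T that the speaker endorses
\mathrm{Should}_{\mathrm{moral}}(X, Y) \iff \exists T.\; \mathrm{Endorses}(\mathrm{speaker}, T) \wedge \mathrm{Endorses}(T,\, X \text{ doing } Y)

% Epistemic "should": X should do Y just when Y is what X's own preferences favour
\mathrm{Should}_{\mathrm{epist}}(X, Y) \iff \mathrm{Prefers}(X, Y)
```

On the first reading, the truth of a “should” claim depends on which theory the speaker endorses; on the second, it depends only on the agent’s own preferences, plus a norm of acting rationally on them.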
My second objection is that this is only the case if “being a utilitarian” is equivalent to “having only one preference, which is to follow utilitarianism”. In practice people have both moral preferences and personal preferences. I’d still count someone as being a utilitarian if they follow their personal preferences instead of their moral preferences some (or even most) of the time. So then it’s not clear whether it’s rational for a human who is a utilitarian to commit suicide in this case; it depends on the contents of their personal preferences.
I think we avoid all of this mess just by saying “Utilitarianism endorses replacing existing humans with these new beings.” This is, as I mentioned earlier, a similar claim to “ZFC implies that 1 + 1 = 2”, and it allows people to have fruitful discussions without agreeing on whether they should endorse utilitarianism. I’d also be happy with Simon’s version above: “Utilitarianism seems to imply that humans should...”, although I think it’s slightly less precise than mine, because it introduces an unnecessary “should” that some people might take to be a meta-level claim rather than merely a claim about the content of the theory of utilitarianism. (This is a minor quibble, though; the analogous phrasing would be “ZFC implies that 1 + 1 = 2 is true”.)
Anyway, we have pretty different meta-ethical views, and I’m not sure how much we’re going to converge, but I will say that from my perspective, your conflation of epistemic and moral normativity (as I described earlier) is a key component of why your position seems confusing to me.
Are you aware of any surveys or any other evidence supporting this? (I’d accept “most people in AI safety that I know started working in it because EA investigative work convinced them that AI safety matters” or something of that nature.)
I’m endorsing this, and I’m confused about which part you’re skeptical about. Is it the “many EAs” bit? Obviously the word “many” is pretty fuzzy, and I don’t intend it to be a strong claim. Mentally the numbers I’m thinking of are something like >50 people or >25% of committed (or “core”, whatever that means) EAs. Don’t have a survey to back that up though. Oh, I guess I’m also including people currently studying ML with the intention of doing safety. Will edit to add that.
Why are you trying to answer this, instead of “How should I update, given the results of all available investigations into AI safety as a cause area?”
There are other questions that I would like answers to, not related to AI safety, and if I trusted EA consensus, then that would make the process much easier.
For this question then, it seems that Paul Christiano also needs to be discounted (and possibly others as well but I’m not as familiar with them).
Indeed, I agree.
Okay, thanks. So I guess the thing I’m curious about now is: what heuristics do you have for deciding when to prioritise contractarian intuitions over consequentialist intuitions, or vice versa? In extreme cases where one side feels very strongly about it (like this one) that’s relatively easy, but do you have any thoughts on how to extend those heuristics to more nuanced dilemmas?
I think that “utilitarianism seems to imply that humans who are utilitarians should...” is a type error regardless of whether you’re a realist or an anti-realist, in the same way as “the ZFC axioms imply that humans who accept those axioms should believe 1+1=2”. That’s not what the ZFC axioms imply—actually, they just imply that 1+1=2, and it’s our meta-theory of mathematics which determines how we respond to this fact. Similarly, utilitarianism is a theory which, given some actions (or maybe states of the world, or maybe policies), returns a metric for how “right” or “good” they are. And then how we relate to that theory depends on our meta-ethics.
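In symbols (again, just a sketch using notation of my own choosing):

```latex
% What the ZFC axioms actually imply (an object-level claim):
\mathrm{ZFC} \vdash 1 + 1 = 2

% The meta-level claim -- "anyone who accepts ZFC should believe 1 + 1 = 2" --
% isn't even a sentence in ZFC's language, hence the type error.

% Analogously, utilitarianism just maps actions (or states, or policies) to a rightness metric:
U : \mathrm{Actions} \rightarrow \mathbb{R}
```

How an agent should respond to the outputs of U is a further question, answered by their meta-ethics rather than by U itself.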
Given how confusing talking about morality is, I think it’s important to be able to separate the object-level moral theories from meta-ethical theories in this way. (For more along these lines, see my post here).
I imagine so, but if that’s the reason it seems out of place in a paper on theoretical ethics.
Nice! Seems like a cool paper. One thing that confuses me, though, is why the authors think that their theory’s “moral risk aversion with respect to empirically expected utility” is undesirable. People just have weird intuitions about expected utility all the time, and don’t reason about it well in general. See, for instance, how people prefer (even when moral uncertainty isn’t involved) to donate to many charities rather than donating only to the single charity with the highest expected utility. It seems reasonable to call that preference misguided, so why can’t we just call the intuitive objection to “moral risk aversion with respect to empirically expected utility” misguided as well?
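As a toy illustration of why the splitting intuition looks misguided (made-up numbers, and assuming returns are roughly linear at the scale of an individual donation):

```latex
% Charity A produces 3 units of good per dollar; charity B produces 2.
% Splitting $100 equally:
\$50 \times 3 + \$50 \times 2 = 250 \text{ units}
% Donating it all to A:
\$100 \times 3 = 300 \text{ units} > 250 \text{ units}
```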
Let me try to answer the latter question (and thanks for pushing me to flesh out my vague ideas more!). One very brief way you could describe the development of AI safety is something like: “A few transhumanists came up with some key ideas and wrote many blog posts. The rationalist movement formed from those following these things online, and made further contributions. Then the EA movement formed, and while it was originally focused on causes like global poverty, over time it did a bunch of investigative work which led many EAs to become convinced that AI safety matters, and to start working on it, directly or indirectly (or to gain skills with the intent of doing such work).”
The three questions I am ultimately trying to answer are: a) how valuable is it to build up the EA movement? b) how much should I update when I learn that a given belief is a consensus in EA? and c) how much evidence do the opinions of other people provide in favour of AI safety being important?
To answer the first question, assuming that analysis of AI safety as a cause area is valuable, I should focus on contributions by people who were motivated or instigated by the EA movement itself. Here Nick doesn’t count (except insofar as EA made his book come out sooner or better).
To answer the second question, it helps to know whether the focus on AI safety in EA came about because many people did comprehensive due diligence and shared their findings, or whether there wasn’t much investigation and the ubiquity of the belief was driven via an information cascade. For this purpose, I should count work by people to the extent that they or people like them are likely to critically investigate other beliefs that are or will become widespread in EA. Being motivated to investigate AI safety by membership in the EA movement is the best evidence, but for the purpose of answering this question I probably should have used “motivated by the EA movement or motivated by very similar things to what EAs are motivated by”, and should partially count Nick.
To answer the third question, it helps to know whether the people who have become convinced that AI safety is important are a relatively homogeneous group who might all have highly correlated biases and hidden motivations, or whether a wide range of people have become convinced. For this purpose, I should count work by people to the extent that they are dissimilar to the transhumanists and rationalists who came up with the original safety arguments, and also to the extent that they rederived the arguments for themselves rather than being influenced by the existing arguments. Here EAs who started off not being inclined towards transhumanism or rationalism at all count the most, and Nick counts very little.
Note that Nick is quite an outlier though, so while I’m using him as an illustrative example, I’d prefer engagement on the general points rather than this example in particular.
I agree and didn’t mean to imply that Knutsson endorses the argument in absolute terms; thanks for the clarification.
To my knowledge it doesn’t meet the “Was motivated or instigated by EA” criterion, since Nick had been developing those ideas since well before the EA movement started. I guess he might have gotten EA money while writing the book, but even if that’s the case it doesn’t feel like a central example of what I’m interested in.
Thanks for the informative reply! And also for writing the paper in the first place :)
“Such theories could be understood as sometimes prescribing ‘say whatever is optimal to say; that is, say whatever will bring about the best results.’ It might be optimal to pretend to not bite the bullet even though the person actually does.”
I think we need to have high epistemic standards in this community, and would be dismayed if a significant number of people with strong moral views were hiding them in order to make a better impression on others. (See also https://forum.effectivealtruism.org/posts/CfcvPBY9hdsenMHCr/integrity-for-consequentialists)
Nice post :) A couple of comments:
even if we’re at some enormously influential time right now, if there’s some future time that is even more influential, then the most obvious EA activity would be to invest resources (whether via financial investment or some sort of values-spreading) in order that our resources can be used at that future, more high-impact, time. Perhaps there’s some reason why that plan doesn’t make sense; but, currently, almost no-one is even taking that possibility seriously.
To me it seems that the biggest constraint on being able to invest in future centuries is the continuous existence of a trustworthy movement from now until then. I imagine that a lot of meta work implicitly contributes towards this; so the idea that the HoH (the “hinge of history”) is far in the future is an argument for more meta work (and more meta work targeted towards EA longevity in particular). But my prior on a given movement remaining trustworthy over long time periods is quite low, and becomes lower the more money it is entrusted with.
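To make the compounding concern concrete (illustrative numbers only):

```latex
% If a movement stays trustworthy with probability p per century,
% then over n centuries the probability is p^n, e.g.:
0.9^{5} \approx 0.59, \qquad 0.9^{10} \approx 0.35
```

Even a fairly high per-century trustworthiness rate compounds to a low probability over the timescales these arguments consider.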
But there are future scenarios that we can imagine now that would seem very influential:
To the ones you listed, I would add:
1. The time period during which we reach technological completion, since from then on the stochasticity arising from the rate of technological advancement becomes a much less important factor.
2. As you mentioned previously, the time period during which we develop comprehensive techniques for engineering the motivations and values of the subsequent generation—assuming this doesn’t actually happen very close to the present. (E.g. carrying it out in practice might require a much more developed understanding of sociology than we currently have.)
RAISE was oriented toward producing people who become typical MIRI researchers… I expect that MIRI needs atypically good researchers.
The phrasing here is slightly odd, and I don’t really understand it: I think the typical MIRI researcher is very good at what they do, and that most of them are atypically good compared with the general population of researchers.
Do you mean instead “RAISE was oriented toward producing people who would be typical for an AI researcher in general”? Or do you mean that there are only minor benefits from additional researchers who are about as good as current MIRI researchers?
Nice document overall, makes a lot of sense. A few small (slightly nit-picky) comments:
Our vision is an optimal world.
This slogan feels a bit off to me. Most EA activities are aimed at avoiding clearly bad things; the idea of aiming for any specific conception of utopia doesn’t seem to me to represent that very well. There’s a lot of disagreement over what sort of worlds would be optimal, or whether that concept even makes sense.
People for whom doing good is a goal in their life, who are open to changing their focus
I’m not sure either of these things is a crucial characteristic of the people you should be targeting. Consider someone working in an EA cause area who’s not open to changing their focus, and who joined that area solely out of personal interest, but who nevertheless is interested in EA ideas and contributes a lot of useful things to the community (career guidance, support, etc.).
We also will attempt to track the following metrics to inform strategy...
While I’m sure you’ll have a holistic approach towards these metrics, they all fall into the broad bucket of “do more standard EA things”. I have some concerns that this leads to people overfitting to ingroup incentives. So I’d suggest also prioritising something like “promoting the general competence and skills of group members”. For example, there are a bunch of EA London people currently working in government. If they informally gave each other advice and mentorship and advanced to more senior roles more rapidly, that would be pretty valuable, but it wouldn’t show up in any of the metrics you mention.
Wei’s list focused on ethics and decision theory, but I think that it would be most valuable to have more good conceptual analysis of the arguments for why AI safety matters, and particularly the role of concepts like “agency”, “intelligence”, and “goal-directed behaviour”. While it’d be easier to tackle these given some knowledge of machine learning, I don’t think that background is necessary—clarity of thought is probably the most important thing.