Thanks for the background on esoteric morality!
Yes, I perhaps should have been more clear that “Government House” was not Sidgwick’s term, but a somewhat derogatory term levied against him.
Thank you for writing this. For a while, I have been thinking of writing a post with many similar themes and maybe I still will at some point. But this post fills a large hole.
As is obligatory for me, I must mention Derek Parfit, who had already described many of the ideas that tend to resurface later.
In Reasons and Persons, Part One, Chapter 1 (especially Section 17), Derek Parfit argues that good utilitarians should self-efface their utilitarianism. This is because people tend to engage in motivated reasoning, and tend to be wrong. Under utilitarianism, it is possible to justify nearly anything, provided your epistemics are reasonably bad (your epistemics would have to be very bad to justify murder under a deontological theory that prohibits murder; you would have to claim that something was not in fact murder at all). Parfit suggests adopting whatever moral system seems most likely to produce the highest utility for the person adopting it in the long run (perhaps something somewhat like virtue ethics). This wasn't an original idea; Mill said similar things.
One way to self-efface your utilitarianism would be to say "yeah, I know, it makes sense under utilitarianism for me to keep my promises" (or whatever it may be). Parfit suggests that may not be enough, because deep down you still believe in utilitarianism, and it will come creeping through (if not in you, then in some proportion of people who self-efface this way). He says you may instead need to forget that you ever believed in utilitarianism, even if you think it's correct. You need to believe a lie, and perhaps even convince everyone else of this lie.
He also raises an interesting caveat: what if the generally agreed-upon virtues or rules are no longer those with the highest expected utility? If nobody believed in utilitarianism, why would they ever be changed? He responds:
This suggests that the most that could be true is that C [consequentialism] is partly self-effacing. It might be better if most people caused themselves to believe some other theory, by some process of self-deception that, to succeed, must also be forgotten. But, as a precaution, a few people should continue to believe C, and should keep convincing evidence about this self-deception. These people need not live in Government House, or have any other special status. If things went well, the few would do nothing. But if the moral theory believed by most did become disastrous, the few could then produce their evidence. When most people learnt that their moral beliefs were the result of self-deception, this would undermine these beliefs, and prevent the disaster.
This wasn’t an original idea either; Parfit here is alluding to Sidgwick’s “Government House utilitarianism,” which seemed to suggest that only people in power should believe utilitarianism, and that they should not spread it. Parfit suggests in passing that the utilitarians don’t need to be the most powerful ones (and indeed Sidgwick’s assertion may have been motivated by his own high position).
Sometimes I think this is the purpose of EA: to be the “few people” who continue to believe consequentialism, in a world where commonsense morality really does need to change because the world is changing so rapidly. But we should help shift commonsense morality in a better direction, not spread utilitarianism.
Maybe utilitarianism is an info hazard not worth spreading. If something is worth spreading, I suspect it’s virtues.
Which virtues? Some have suggestions.
Cool! This is highly related to the concept of the transformative experience. The author of the book Transformative Experience suggests, for instance, that having a child is a bit like “editing your own source code” (though she doesn’t use those words). It might be a useful thing to look into for people writing for this.
I have another post planned in a few weeks, in which I will probably include something like this. If you haven’t already seen it, we made a post about the ML safety component of the course here (though this doesn’t answer the question about a formal program). We are already going to be running a version of it in the fall at some universities, but if anyone else is interested in running it imminently, please DM me here!
There was a fridge and the snacks were laid out very nicely (though I don’t have any pictures, maybe somebody else can share). I personally thought that our room was a pretty nice space, though the hallway/entryway was weird and kind of depressing. The biggest thing that made it nice for me was the view:
Oh, thanks, missed that form in the sheet. Might be worth updating this forum post with the form because it currently says:
Please let us know if you notice anything that we’re missing or that we need to update by commenting below. We’ll update the sheet in response to comments.
I think the Introduction to ML Safety course would be a good addition!
Yes you’re right, I edited the phrasing to be more what I meant.
People will always notice how other people look, and people will always try to look attractive if they can pull it off. That much is unavoidable and I don’t think it makes much sense to try to fight it. As you point out, some of the statements you mentioned are jokes.
That being said, I think EA leaders who find themselves targets of this sort of veneration ought to express outward annoyance towards it, in a way that feels serious rather than jocular. Every movement has the tendency to idolize some small cadre of leaders; and in this movement, especially, we need to avoid that as much as possible. Prominent EAs frequently say that this isn’t “their movement” and that they don’t set all the directions, presumably to discourage people from thinking too much of them. As public figures, they should say the same thing about their looks.
Declaring this publicly can draw too much attention to it, so I’m not sure that’s the best strategy. Rather, it likely makes sense for them to express their annoyance privately in response to specific instances of the problem. Maybe this is already happening; I’m not sure.
I’m quite happy that you are thinking critically about what you are reading! I don’t think you wrote a perfect criticism (see below), but the act of taking the time to write a criticism and posting it to a public venue is not an easy step. EA always needs people who are willing and eager to probe its ethical foundations. Below I’m going to address some of your specific points, mostly in a critical way. I do this not because I think your criticism is bad (though I do disagree with a lot of it), but because I think it can be quite useful to engage with newer people who take the time to write reasonably good reactions to something they’ve read. Hopefully, what I say below is somewhat useful for understanding the reasons for longtermism and what I see as some flaws in your argument. I would love for you to reply with any critiques of my response.
This has been quoted several times, even though it’s an absurd argument on its face. Imagine the world where Cleopatra skipped dessert. How does this cure cancer?
It doesn’t, and that’s not Parfit’s point. Parfit’s point is that if one were to apply a discount rate across time, then, viewed from her era, Cleopatra’s dessert would matter more than nearly anything happening today. Since (he claims) this is clearly wrong, there is something clearly wrong with a discount rate.
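To make the arithmetic concrete, here is a minimal sketch; the 3% rate and the ~2,050-year gap are my own illustrative assumptions, not figures from Parfit:

```python
# A toy illustration of the discount-rate reductio: with a constant annual
# discount rate, moral weight decays exponentially with temporal distance.
# Anchoring the evaluation at Cleopatra's time, everything happening today is
# discounted by (1 + r)^years, so her extra dessert "outweighs" it.
# The numbers below are illustrative assumptions, not Parfit's figures.

def discount_factor(annual_rate: float, years: float) -> float:
    """Relative weight of an outcome now versus one `years` later."""
    return (1 + annual_rate) ** years

years_since_cleopatra = 2050   # rough gap between ~30 BCE and today
rate = 0.03                    # a modest 3% annual discount rate

ratio = discount_factor(rate, years_since_cleopatra)
print(f"One unit of Cleopatra-era welfare outweighs ~{ratio:.1e} units today")
# -> roughly 2e26: one skipped dessert would "matter more" than anything now,
#    which is the absurd conclusion Parfit uses against pure time discounting.
```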
Most of the 80000 hours article attempts to persuade the reader that longtermism is morally good, by explaining the reasons that we should consider future people. But the part about how we are able to benefit future people is very short.
Well yes, but that’s because it’s covered in the other pages linked there. Mostly, this has to do with whether existential risks are likely to arrive soon, and whether there is anything we can do about them. That isn’t really in the scope of that article, but I agree the article doesn’t show it.
The world is a complex system, and trying to affect the far future state of a complex system is a fool’s errand.
That isn’t entirely true. Some things routinely affect the far future of complex systems. For instance, complex systems can collapse, and if you can get one to collapse, you can pretty easily affect its far future. Likewise, if a system is about to collapse due to an extremely rare event, then preventing that collapse can affect its far-future state.
Let’s look at a few major breakpoints and see whether longtermism was a significant factor.
Obviously, it wasn’t. But of course it wasn’t: longtermism didn’t exist yet, so it couldn’t have been a significant factor in anyone’s decisions. Maybe you are trying to say that people can make long-term changes without being motivated by longtermism. But that doesn’t say anything about whether longtermism might make them better at creating long-term changes than they otherwise would be.
We can achieve longtermism without longtermism
I generally agree with this, and so do many others; for instance, see here and here. However, I think it’s possible that this may not be true at some point in the future. I personally would like to have longtermism around in case there really is something for which it matters, mostly because I think it is roughly correct as a theory of value. Some people may even think this is already the case. I don’t want to speak for anyone, but my sense is that people who work on suffering risks are generally motivated by longtermism but don’t care as much about existential risk.
The main point is that intervening for long term reasons is not productive, because we cannot assume that interventions are positive. Historically, interventions based on “let’s think long term”, instead of solving an immediate problem, have tended to be negative or negligible in effect.
First, I agree that interventions may be negative, and I think most longtermists would also strongly agree with this. As for whether historical “long term” interventions have tended to be negative, you’ve asserted it but you haven’t really shown it. I would be very interested in research on this; I’m not aware of any. If it were true, I do think that would be a knock against longtermism as a theory of action (though not a decisive one, and not against longtermism as a theory of value). Though it could perhaps still be argued that we live at “the hinge of history,” where longtermism is especially useful.
Above, I drew a distinction between a theory of value and a theory of action. A theory of value (or axiology) is a theory about which states of the world are most good. For instance, it might say that a world with more happiness, or more justice, is better than a world with less. A theory of action is a theory about what you should do; for instance, that we should take whichever action produces the maximum expected happiness. Greaves and MacAskill make the case for longtermism as both. But you could accept longtermism as a theory of value without accepting it as a theory of action.
For instance, you write:
Some blood may be shed and lives may be lost, but the expected value is strongly positive.
Various philosophers, Parfit among them, have suggested that for exactly this reason many utilitarians should “self-efface” their morality. In other words, they should come to believe that killing large numbers of people is bad even when it seems to increase utility, because they might simply be wrong about the utility calculation, or might delude themselves into thinking that what they already wanted to do produces a lot of utility. I gave some more resources/quotes here.
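To illustrate how fragile such calculations are, here is a toy sketch with made-up numbers of my own (they come from neither Parfit nor your post): when the stakes are extreme, a small, self-serving shift in a probability estimate is enough to flip the sign of the expected value.

```python
# Toy expected-value calculation with purely hypothetical numbers, showing how a
# small amount of motivated reasoning in the probability estimate can flip the
# verdict on a harmful "greater good" intervention.

def expected_value(p_success: float, benefit: float, cost: float) -> float:
    """EV of an action that pays `benefit` with probability p_success and costs `cost` regardless."""
    return p_success * benefit - cost

benefit = 1_000_000   # hoped-for utility gain if the plan works (hypothetical units)
cost = 50_000         # harm done whether or not it works (the "blood shed")

print(expected_value(0.10, benefit, cost))   #  50000.0 -> looks "strongly positive"
print(expected_value(0.04, benefit, cost))   # -10000.0 -> actually negative

# A few percentage points of optimism change the conclusion entirely, which is one
# reason Parfit thinks agents shouldn't trust their own raw utility calculations.
```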
Thanks for writing!
I think Is Power-Seeking AI an Existential Risk? is probably the best introduction, though it’s probably too long as a first introduction if the person isn’t yet that motivated. It’s also written as a list of propositions, with probabilities, and that might not appeal to many people.
I also listed some shorter examples in this post for the AI Safety Public Materials Bounty we’re running; those might be more suitable as a first introduction. Here are the ones most relevant to people not versed in machine learning:
AI risk executive summary (2014)
Robert Miles’ YouTube channel (2017-present)
The case for taking AI risk seriously as a threat to humanity (2020)
The bounty is also trying to generate more such materials, because I think there is a lot more that can be done.
Thanks for all the work you are doing here, I think some really amazing groups could come out of this. I am cautiously excited about many different kinds of groups starting.
I found it a bit surprising that the list of criteria for group organizers (including the “nice to have” list) doesn’t seem to include anything like “really cares about the objectives of their group,” “really cares about improving the long-term future,” or “is altruistic to some degree.” For reference, the criteria are:
Being truth-seeking and open-minded
Having a strong understanding of whatever topic their group is about, and/or being self-aware about gaps in understanding
Being socially skilled enough that people won’t find them highly offputting (note that this is a much lower bar than being actively friendly, extroverted, etc.)
Secondary “nice-to-have” desiderata include:
Taking ideas seriously
Being conscientious
Being ambitious / entrepreneurial
Being friendly / outgoing
Having good strategic judgment in what activities their group should be doing
Actively coming off as sharp in conversation, such that others find them fun to have object-level discussions with
Maybe this is just implicit? But it seems useful to make it explicit. Otherwise, this list kind of makes me think of really nice, friendly, hardworking philosophers who do not actually behave any more ethically than anyone else. Great understanding and leadership are not enough; group leaders need to actually care about whatever the thing is. Maybe this is what “taking ideas seriously” is supposed to mean (I never quite understand that phrase; people seem to use it in different ways)? If so, it seems like a must-have, not a nice-to-have.
We don’t expect the work to be published anywhere when it’s submitted.
For certain pieces, we may work with authors to publish them somewhere, publish them on our website, or adapt them and publish an adapted version somewhere. But this is not guaranteed.
In general, we expect that the best pieces will be suited to a general audience of either smart people who don’t know about ML, or ML researchers. Though there is a lot of room for pieces that are more optimized for particular audiences and venues, we think more general pieces would serve as great inspiration for those later, more targeted pieces.
I edited the title to say “$20k in bounties” to make it more clear.
From the original text:
Winners of the bounty will win $2,000 each, for a total of up to ten possible bounty recipients.
This doesn’t mean each person who submits an entry gets $2,000. We will award bounties to entries that meet a high bar for quality (roughly, material that we would actually be interested in using for outreach).
I missed that part of footnote 3, it does seem to address a lot of what I said. I appreciate your response.
I do think the vast majority of people will not read footnote 3, so it’s important for the main body of the text (and the visuals) to give the right impression. This means comparing averages to averages, or possible tail events to possible tail events. It sounds like this is your plan now, and if so that’s great!
Yes, that’s my mistake, sorry.
Posted too soon, didn’t realize he had changed his mind about crossposting, please ignore.
I linkposted this when it came out, and Devin Kalish sent this comment:
A quick note, this piece was already posted to the forum briefly, and then deleted. The author said in a comment that he would rather it not be crossposted to this forum:
https://astralcodexten.substack.com/p/criticism-of-criticism-of-criticism/comment/7853073
I don’t know if the two are related, but I might reach out to ask him if he’s alright with you posting it.
That led me to take down my post, since I don’t really like to crosspost things when people prefer that I not.
Just wanted to let you know!
This point is covered quite well by Derek Parfit in his seminal book Reasons and Persons, Part One, Chapter 1, Section 17. In my view the entire chapter is excellent and worth reading, but here is an excerpt from Section 17:
Consider, for example, theft. On some versions of C [Consequentialism], it is intrinsically bad if property is stolen. On other versions of C [such as hedonistic utilitarianism], this is not so. On these versions, theft is bad only when it makes the outcome worse. Avoiding theft is not part of our ultimate moral aim. But it might be true that it would make the outcome better if we were strongly disposed not to steal. And it might make the outcome better if we believed stealing to be intrinsically wrong, and would feel remorse when we do steal. Similar claims might be made about many other kinds of act.
This paragraph, I think, is especially relevant for EA:
This suggests that the most that could be true is that C is partly self-effacing. It might be better if most people caused themselves to believe some other theory, by some process of self-deception that, to succeed, must also be forgotten. But, as a precaution, a few people should continue to believe C, and should keep convincing evidence about this self-deception. These people need not live in Government House, or have any other special status. If things went well, the few would do nothing. But if the moral theory believed by most did become disastrous, the few could then produce their evidence. When most people learnt that their moral beliefs were the result of self-deception, this would undermine these beliefs, and prevent the disaster.
Edit: I also recommend the related When Utilitarians Should Be Virtue Theorists.
There is an episode in the life of the Buddha where he believes that it would be best for him to eat very little. He does so and:
But it does not work for him:
He concludes:
While it’s not common in my experience for EAs to intentionally deprive themselves of food (at least not nowadays), they do sometimes tend to deprive themselves of time for things other than impact. Perhaps they shouldn’t.
And be wary of being one of those monks.
(From The Dialogue with Prince Bodhi)