There are a couple of strong “shoulds” in the EA Handbook (I went through it over the last two months as part of an EA Virtual Program), and they stood out to me as the most disagreeable part of the EA philosophy presented.
Thanks for the clarification about how 1 and 2 may look very different in the EA communities.
I’m not particularly concerned about the thought that people might be out there taking maximization too far; the framing of my observations is more like “well, here’s what going through the EA Handbook may prompt me to think about EA ideas, or about what other EAs may believe.”
After thinking about your reply, I realized that I made a bunch of assumptions based on things that might just be incidental and not strongly connected. I came to the wrong impression that the EA Handbook is meant to be the most canonical and endorsed collection of EA fundamentals.
Here’s how I ended up there. In my encounters with EA resources, the Handbook is the only introductory “course”, and presumably because it’s the only one of its kind, it’s also the only one that’s been promoted to me across multiple mediums. So I assumed it must be the most official source of introduction, having remained alone in that spot over multiple years; seeing it bundled with EA VP also seemed like an endorsement. I also made the subconscious assumption that, since there’s plenty of alternative high-quality EA writing out there and plenty of resources going into producing it, the Handbook as a compilation is probably designed to be the most representative collection of EA meta; otherwise it wouldn’t still be promoted the way it has been to me.
I’ve had almost no interaction with the EA Forum before reading the Handbook, so I had very limited prior context to gauge how “meta” the Handbook is among EA communities, or how meta any of its individual articles are. (Though someone has now helpfully provided a bunch of reading material that is also fundamental while offering quite different perspectives.)
I feel like psychology offers some pretty standard solutions to disillusionment, and have light-heartedly thought about whether providing an EA-targeted charity/service to address this could be worthwhile.
However, there are an ethical dilemma or two here, which I’ve mulled over for years in other contexts without reaching a conclusion:
1. The perfect prevention and cure for disillusionment would likely mean fewer smart people stay in EA. I.e., we successfully dissuade people who would have joined EA and become disillusioned from ever committing to EA in the first place. Of those we retain, the plus is that they are staying for the right reasons for them, and thus in a sustainable way. We probably improve open-mindedness and decrease groupthink in the community too. Is this net positive? Is it net positive on average if, instead, disillusioned EAs had never been part of EA at all?
2. A side-effect of an effective cure for disillusionment is increased life satisfaction. Could too much life satisfaction cause decreased productivity or drive within EA? (I don’t have any evidence for this; it’s just a thought.)
Does anyone have quick tips on who to ask for feedback on project ideas?
I feel like the majority of feedback I get falls into:
No comment
Not my field and I don’t really see the value in that compared to other things
Not my field but have you considered doing it this alternative way [which seems broadly appealing/sensible/tame but then loses the specific thesis question which I think could unlock exceptional impact]
I see the value in that and I think it’s a great idea
The truth is, no one can really talk me out of attempting that specific thesis question except by providing a definitive answer to it. And if the only feedback I get is one of the above, then I might as well only ask people who will be encouraging of my ideas and can potentially recommend other good people to talk to.
It’s relatively uncommon for me to get feedback at the level of tweaking my ideas. Is it even worth trying to solicit that given how few people are in a good position to do so?
You’re probably right, I’m asking the wrong people. I don’t know if there are many of the right people to ask within EA or outside of EA. The project cause area is mental health / personal development as it relates to well-being measured through life satisfaction. I feel like my potential sources of targeted feedback are highly constrained because:
Most non-EA professional coaches, therapists, or psychologists are not equipped to consider my proposals, given that life satisfaction is a relatively disconnected concept from their work (as strange as that may sound). I also find that more experienced professionals tend to apply rote knowledge and seem reluctant to consider anything outside of what they already practice.
Relatively few EAs seem to have an interest in mental health as a cause area beyond general knowledge of its existence, let alone specific knowledge.
I suspect my ideas are somewhat wild compared to normal thinking, and I think it would take other people who have their own wild thoughts to comfortably critique mine.
I’m very excited by the work HLI is doing.
I’m a little confused by what psychotherapy refers to in this post, is this going by HLI’s contextual definition “any form of face-to-face psychotherapy delivered to groups or by non-specialists deployed in LMICs”?
I guess not strictly related then, but I’d be interested to know if anyone is aware of cost-effectiveness analyses using WELLBYs for one-on-one traditional psychotherapy/counselling, as this is a relevant baseline for a project I’m drafting.
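In case it helps anyone drafting something similar, here’s the back-of-envelope shape I’d expect such an analysis to take, going by my understanding of HLI’s unit (1 WELLBY = a one-point change on a 0-10 life satisfaction scale sustained for one year). Every numeric input below is a placeholder I made up, not a figure from HLI or any study:

```python
# Back-of-envelope WELLBY cost-effectiveness sketch. The function just
# encodes the unit's definition; all numbers below are invented
# placeholders, not data from HLI or any published CEA.

def wellbys_per_1000(effect_points, duration_years, cost_per_person):
    """WELLBYs generated per $1,000 spent on an intervention."""
    wellbys_per_person = effect_points * duration_years  # 1 point x 1 year = 1 WELLBY
    people_treated_per_1000 = 1000 / cost_per_person
    return wellbys_per_person * people_treated_per_1000

# Hypothetical course of one-on-one counselling: +0.5 points of life
# satisfaction, sustained for 2 years, at $400 per client.
print(wellbys_per_1000(effect_points=0.5, duration_years=2, cost_per_person=400))
# -> 2.5 WELLBYs per $1,000 under these assumed inputs
```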
During EAGxVirtual, I hesitantly reached out to almost everyone tagged with a particular affiliation group to ask for their brief input, and was very positively surprised by 1) the ratio of responses to non-responses, and 2) that the responses were more positive and enthusiastic than I might have expected.
That’s clear to me now, and thank you also for the pointer on one-on-one effectiveness!
Does anyone have a resource that maps out different types/subtypes of AI interpretability work?
E.g., mechanistic interpretability and concept-based interpretability: what other types are there, and how are they categorised?
One of the canonical EA books (can’t remember which) suggests that if an individual stops consuming eggs (for example), almost all the time this will have zero impact, but there’s some small probability that on some occasion it will have a significant impact. And that can make it worthwhile.
I found this reasonable at the time, but I’m now inclined to think that it’s a poor generalization, and that the expected impact remains negligible in most scenarios. The main influence for my shift is thinking about how decisions are made within organizations, and how power-seeking approaches are vastly superior to voting in most areas of life once the system exceeds a threshold of complexity.
Anyone care to propose updates on this topic?
One of those sources (“Compassion, by the Pound”) estimates that reducing consumption by one egg results in an eventual fall in production by 0.91 eggs, i.e., less than a 1:1 effect.
I’m not arguing against the idea that reducing consumption leads to a long-term reduction in production. I’m doubtful that we can meaningfully generalise this kind of reasoning across different specifics as well as distinct contexts without investigating it practically.
For example, there probably exist many types of food products where reducing your consumption only has something like a 0.1:1 effect. (It’s also reasonable to consider that there are some cases where reducing consumption could even correspond with increased production.) There are many assumptions in place that might not hold true. Although I’m not interested in an actual debate about veganism, one example of a strong assumption that might not hold is that egg consumption gets replaced by other food sources that are less bad to rely on.
I’m thinking that the overall “small chance of large impact by one person” argument probably doesn’t map well to scenarios where voting is involved, one-off or irregular events, sales of digital products, markets where the supply chain changes over time because there are many ways to use those products, or cases where excess production can still be useful. When I say “doesn’t map well”, I mean that the effect of one person taking action could be anywhere between 0:1 and 1:1 compared to what happens when a sufficient number of people simultaneously make the change in decision-making required for a significant shift. If we’re talking about one million people needing to vote differently for a decision to be reversed, the expected impact of my one vote is always going to be less than one millionth of the outcome, because it’s not guaranteed that one million people will sway their vote. If there’s only a 10% chance of the one million swayed votes, I’d expect my impact to come out at far less than even 0.01:1 under a statistical model.
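To make that intuition concrete, here’s a minimal sketch of the threshold model I have in mind, using made-up numbers rather than estimates of any real vote; the idea is that my vote only matters in the knife-edge state where the swing would otherwise fall exactly one vote short.

```python
from math import comb

# Toy threshold model for the "one vote among a needed million" case,
# scaled down so the numbers stay readable. All values are assumptions:
#   n = other voters who might switch
#   t = total switched votes needed to flip the decision
#   p = each other voter's independent chance of switching

def p_my_vote_is_pivotal(n, t, p):
    """Probability that exactly t-1 of the n others switch, i.e. the
    only state of the world where my own switch tips the decision."""
    k = t - 1
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, t, p = 1_000, 600, 0.55
fair_share = 1 / (n + 1)  # a proportional "1:1" share of the outcome
ratio = p_my_vote_is_pivotal(n, t, p) / fair_share
print(round(ratio, 2))  # ~0.2: well below a proportional share here
```

Under these assumed inputs the ratio comes out around 0.2:1, and it shrinks rapidly the less likely the swing is; conversely, when the expected swing sits right at the threshold, the same model can value one vote at well above a proportional share.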
This is because the mechanism used to change levels of production is similar in these cases.
I’m unclear on the exact mechanism, and I suspect that the anecdote of “the manager sees the reduced demand across an extended period and decides to lower their store’s import by the exact observed reduction” is a gross oversimplification of what I’d guess is a complex system: the manager isn’t perfectly rational, may have long periods without review due to contractual reasons, and the supply chain spans multiple parties, all with non-linear relationships. Maybe some food supply chains significantly differ at the grower’s end, or in different countries. This missing knowledge is why I don’t think I have a good reason to assume generality.
Other animal products
I think your cow leather example highlights the idea that, for me, threatens simplistic math assumptions. Some resources are multi-purpose and can be made into different products through different processes and grades of quality depending on the use case. It’s pretty plausible that eggs are either used for human consumption or for hatching. Some animal products might be more complicated, being used for human consumption, non-human consumption, or products in other industries. It seems reasonable to imagine a case where decreasing human consumption results in wasted production, which “inspires” someone to redirect that production to another product/market that becomes successful and results in increased non-dietary demand. I predict that this isn’t uncommon and could dilute some of the marginal impact calculations, which are true short-term but might not play out long-term. (I’m not saying that reducing consumption isn’t positive in expectation; I’m saying that the true variance of the positive could be very high over a long-term period and typically only becomes clear in retrospect.)
Voting
Thanks for that reference from Ord. I stand updated on voting in elections. I have lingering skepticism about a similar but mathematically distinct case: petition-like scenarios. E.g., if 100k people sign this petition, some organization is obliged to respond; or if enough students push back on a school decision, the school might reconsider. This is kind of like voting except that the default vote is set: people who don’t know the petition exists have a default vote. I think the model described by Ord might still apply, I just haven’t got my head around this variation yet.
Brilliant, thank you. One of the very long lists of interp work on the forum seemed to have everything as mech interp (or possibly I just don’t recognize alternative key words). Does the EA AI safety community feel particularly strongly about mech interp or is it just my sample size being too small?
To add to this vibe of “getting dogpiled is an unusually stressful experience that is probably hard to imagine accurately”, it feels a bit strange to be reading so many “reasoned” comments about how specific improvements in replies/wordings could have been decisively accurate/evident, as though anything less is a negative sign.
I relate to that logically as an observer, but at the same time I don’t think the whole sea of suggestions is meaningfully actionable. A lot of time and thought went into these posts, and virtually any variant would still be vulnerable to critique, because we have limited time/energy, let alone the fact that we’re human beings and it’s more than okay to produce incomplete/flawed work. What expectations are we judging others by in this complex situation, and would we really be able to uphold our own expectations, let alone the combined expectations of hundreds of people in the community? It’s insanely hard to communicate all the right information in one go, and that’s why we have conversations. That said, this broader discussion of “what’s the real story” isn’t one that I consider myself entitled to, nor do I think we should all be entitled to it just because we’re EAs.
Asking out of ignorance here, as I was only exposed to the general news version and not EA perspectives about FTX. What difference would it have made if FTX fraud was uncovered before things crashed? Is it really that straightforward to conclude that most of the harm done would have been preventable?
I’ll respond to one aspect you raised that I think might be more significant than you realize. I’ll paint a black and white picture just for brevity.
If you’re running organizations and do so for several years with dozens of employees across time, you will make poor hiring decisions at one time or another. While making a bad hire seems bad, avoiding this risk at all costs is probably a far inferior strategy. If making a bad hire doesn’t get in the way of success and doing good, does it even make sense to fixate on it?
Also, if you’re blind to the signs before it happens, then you reap the consequences, learn an expensive lesson, and are less likely to repeat the mistake, at least for that type of deficit in judgment. Sometimes the signs are obvious only after having made an error, though occasionally the signs are so well hidden that anyone with better judgment than you could still have made the same mistake.
The underlying theme I’m getting at is that embracing mistakes and imperfection is instrumental. Although many EAs might wish that we could all just get hard things right the first time, every time, that’s not realistic. We’re flawed human beings, and respecting the fact of our limitations is far more practical than giving in to fear and anxiety about not having ultimate control and predictability. If anything, being willing to make mistakes is both rational and productive compared to the alternatives.
What I think I’m hearing from you (and please correct me if I’m not hearing you) is that you feel conflicted by the thought that the efforts of good people with good intentions can so easily be undone, and that you wish there were some concrete ways to prevent this happening to organizations, both individually and systemically. I hear you on thinking about how things could work better as a system/process/community in this context. (My response won’t go into this systems level, not because it’s not important, but because I don’t have anything useful to offer you right now.)
I acknowledge your two examples (“Alice and Chloe almost ruined an organization” and “keeping bad workers anonymous has negative consequences”). I’m not trying to dispute these or convince you that you’re wrong. What I am trying to highlight is that there is a way to think about these that doesn’t involve requiring us to never make small mistakes with big consequences. I’m talking about a mindset, which isn’t a matter of right or wrong, but simply a mental model that one can choose to apply.
I’m asking you to stash away being right, and whatever perspective you think I hold, for a moment, and to do a thought experiment for 60 seconds.
At t=0, it looks like ex-employee A, with some influential help, managed to inspire significant online backlash against organization X led by well-intentioned employer Z.
It could easily look like Z’s project is done, their reputation is forever tarnished, their options have been severely constrained. Z might well feel that way themselves.
Z is a person with good intentions, conviction, strong ambitions, interpersonal skills, and a good work ethic.
Suppose that organization X got dismantled at t=1 year. Imagine Z’s “default trajectory” extending into t=2 years. What is Z up to now? Do you think they still feel exactly the way they did at t=0?
At t=10, is Z successful? Did the events of t=0 really ruin their potential at the time?
At t=40, what might Z say recalling the events of t=0 and how much that impacted their overall life? Did t=0 define their whole life? Did it definitely lead to a worse career path, or did adaptation lead to something unexpectedly better? Could they definitely say that their overall life and value satisfaction would have been better if t=0 never played out that way?
In the grand scheme of things, how much did the t=0 feeling that “Z’s life is almost ruined” translate into reality?
If you entertained this thought experiment, thank you for being open to doing so.
To express my opinion plainly: good and bad events are inevitable, and it is inevitable that Z will make mistakes with negative consequences as part of their ambitious journey through life. Is it in Z’s best interests to avoid making obvious mistakes? Yes. Is it in their best interests to adopt a strategy so robust that they would never have fallen victim to the t=0 events, or to similarly “bad” events at any other point? I don’t think so, necessarily, because: we don’t know without long-term hindsight whether “traumatic” events like t=0 lead to net positive changes or not; even if Z somehow became mistake-proof without being perfect, that doesn’t mean something as significant as t=0 couldn’t still happen to them without them making a mistake; and lastly, being that robust is practically impossible for most people.
All this to say: without knowing whether “things like t=0” are “unequivocally bad to ever let happen”, I think it’s more empowering to be curious about what we can learn from t=0 than to conclude at t<1 that preventing it is both necessary and good.
I relate to your write-up on a personal level, as I can easily see myself having the same behavioral preferences as well as modes of imperfection as you if I was in a similar situation.
And with that in mind, there’s only one thing that I’m confused about:
“A thing I feel particularly bad about is not confronting Sam at any point about the ways he hurt people I care about.”
What would that confrontation have looked like? How would you have approached it, even taking into account hindsight wisdom but without being a time-travelling mind-reader?
In that confrontation, what would you be asking for from Sam? (E.g., an explanation? Reassurance? An apology? Listening to your concerns?)
Thanks for entertaining my thought experiment. I’m glad I did this, because I now better understand your perspective too, and I think I’m in full agreement with your response.
A shift of topic here; feel free not to engage if this doesn’t interest you.
To share some vague thoughts about how things could be different: I think posts which are structurally equivalent to a hit piece could be considered against the forum rules, either implicitly already or explicitly, and moderators could intervene before most of the damage is done. I think that policing this isn’t as subjective as one might fear, and that certain criteria can be checked even without any assumptions about truthfulness or intentions. Maybe an LLM could work for flagging high-risk posts for moderators to review (a rough sketch follows below).

Another angle would be to try and shape discussion norms or attitudes. There might not be a reliable way to influence this space, but one could try, for example, by providing the right material to better equip readers to have better online discussions in general and to recognize unhelpful/manipulative writing. It could become a popular staple, much like I think “Replacing Guilt” is very well regarded. Funnily enough, I have been collating a list of green/orange/red flags in online discussions for other educational reasons.
“Attitudes” might be way too subjective/varied to shape, whereas I believe “good discussion norms” can be presented in a concrete way that isn’t inflexibly limiting. NVC comes to mind as a concrete framework, and I am of the opinion that the original “sharing information” post can be considered violent communication.
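To make the flagging idea above slightly less hand-wavy, here’s a minimal sketch of the kind of structural check I mean. Everything in it is an invented illustration: the criteria, patterns, threshold, and the queue_for_review hook are all assumptions, and a real version might have an LLM score each criterion instead of regexes.

```python
import re

# Illustrative structural criteria for a "hit-piece-shaped" post. These
# patterns are invented examples, not proposed forum policy; the point
# is that they check structure, not truthfulness or intent.
CRITERIA = {
    "names_private_individuals": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",  # full names
    "accusatory_language": r"(?i)\b(fraud|abuse|liar|manipulat\w+)\b",
    "anonymous_sourcing": r"(?i)\b(anonymous|unnamed|sources? say)\b",
}

def risk_flags(post_text):
    """Return the names of the criteria this post trips."""
    return [name for name, pattern in CRITERIA.items()
            if re.search(pattern, post_text)]

# Posts tripping several flags could be held for moderator review before
# they gather momentum, e.g.:
#   if len(risk_flags(post)) >= 2:
#       queue_for_review(post)  # hypothetical moderation hook
```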
Is there a viable version of non-catering where the organizers promise nothing but also invite a bunch of food trucks to camp by the venue? Perhaps that could help at least some of the attendees have fast and affordable food options.