A point about hiring and grantmaking, that may or may not already be conventional wisdom:
If you’re hiring for high-autonomy roles at a non-profit, or looking for non-profit founders to fund, then
advice derived from the startup world is often going to overweight the importance of entrepreneurialism relative to self-skepticism and reflectiveness.[1]
Non-profits, particularly non-profits with longtermist missions, are typically trying to maximize something that is way more illegible than time-discounted future profits. To give a specific example: I think it’s way harder for an organization like CEA to tell if it’s on the right track than it is for a company like Zoom to tell if it’s on the right track. CEA can track certain specific metrics (e.g. the number of “new connections” reported at each EAG), but it will often be ambiguous how strongly these metrics reflect positive impact—and there will also always be a risk that various negative indirect effects aren’t being captured by the key metrics being used. In some cases, evaluating the expected impact of work will also require making assumptions about how the world will evolve over the next couple decades (e.g. assumptions about how pressing risks from AI are).
I think this means that it’s especially important for these non-profits to employ and be headed by people who are self-skeptical and reflect deeply on decisions. Being entrepreneurial, having a bias toward action, and so on, don’t count for much if the organisation isn’t pointed in the right direction. As Ozzie Gooen has pointed out, there are many examples of massive and superficially successful initiatives (headed by very driven and entrepreneurial people) whose theories-of-impact don’t stand up to scrutiny.
A specific example from Ozzie’s post: SpaceX is a massive and extraordinarily impressive venture that was (at least according to Elon Musk) largely started to help reduce the chance of human extinction, by helping humanity become a multi-planetary species earlier than it otherwise would. But I think it’s hard to see how their work reduces extinction risk very much. If you’re worried about the climate effects of nuclear war, for example, then it seems important to remember that post-nuclear-war Earth would still have a much more hospitable climate than Mars. It’s hard to imagine a disaster scenario where building Martian colonies would be much better than (for example) building some bunkers on Earth.[2] So—relative to the organization’s stated social mission—all the talent, money, and effort SpaceX has absorbed might not ultimately come out to much.
A more concise way to put the concern here: Popular writing on hiring is often implicitly asking the question “How can we identify future Elon Musks?” But, for the most part, longtermist non-profits shouldn’t be looking to put future Elon Musks into leadership positions .[3]
I have in mind, for example, advice given by Y Combinator and advice given in Talent. ↩︎
Another example: It’s possible that many highly successful environmentalist organizations/groups have ended up causing net harm to the environment, by being insufficiently self-skeptical and reflective when deciding how to approach nuclear energy issues. ↩︎
A follow-up thought: Ultimately, outside of earning-to-give ventures, we probably shouldn’t expect the longtermist community (or at least the best version of it) to house many extremely entrepreneurial people. There will be occasional leaders who are extremely high on both entrepreneurialism and reflectiveness (I can currently think of at least a couple); however, since these two traits don’t seem to be strongly correlated, this will probably only happen pretty rarely. It’s also, often, hard to keep extremely entrepreneurial people satisfied in non-leadership positions—since, almost by definition, autonomy is deeply important to them—so there may not be many opportunities, in general, to harness the talents of people who are high on entrepreneurialism but low on reflectiveness. ↩︎
TL;DR: when it comes to technical expertise, our problem is more a lack of technical literacy than absence of experts. Think about it as analogy with other sciences.
I work with lots of economists (some of them trained in top graduate programs), for a government in a developing country, and I can’t help thinking this proposal is a bit naïve. I’m pretty sure our problem is not that we lack economists, or that the President does not receive expert advice, but that they are seldom listened to—unless they confirm what the relevant stakeholders want. And, like in natural sciences, journalists will often find someone they can refer to as an expert in the field, but people are not able to assess what they say, unless their conclusions are stated in very simple terms, usually in a confirmatory tone. So probably our problem is more a lack of overal economics literacy—and not more experts.
On the other hand, I can say that things have actually been getting better in some sense in the last few years—I observe that basic literacy in economics have been improving in the media and among educated people from other fields. And yet, this has not led to better policy; part of it is that economic development is hard (as I have argued elsewhere), or that the causality here runs the other way around (i.e., people have been learning more about economics because we have dealt with recessions)… but then you also have the whole soldier vs. scout mindset all back again—even educated people will look for info to confirm their biases.
To be clear: I’m certainly not against training more economists from developing countries in US top graduate programs (quite the opposite: sometimes I think the best thing we can do around here is to send bright students away), but I don’t think that’s the bottleneck in economic development (in comparison to other hypothesis), nor that it’d more efficient than other policies concerning education in economics that are likely less expensive and have a broader scope—such as economic classes for youngsters, or funding more economists in these countries, or sending experts from top univerties to teach there, etc.
I don’t think that’s the bottleneck in economic development
I think it’s too simplistic to say there’s a single bottleneck.
such as economic classes for youngsters, or funding more economists in these countries, or sending experts from top univerties to teach there, etc.
The latter two seem consistent with my proposal. Part of the problem is that there aren’t many economists in developing countries, hence the need to train more. And ASE does bring experts to teach at their campus.
FTX Future Fund says they support “ambitious projects to improve humanity’s long-term prospects”. Does it seem weird that they’re unanimously funding neartermist global health interventions like lead elimination?
Lead Exposure Elimination Project. [...] So I saw the talk, I made sure that Clare was applying to [FTX] Future Fund. And I was like, “OK, we’ve got to fund this.” And because the focus [at FTX] is longtermist giving, I was thinking maybe it’s going to be a bit of a fight internally. Then it came up in the Slack, and everyone was like, “Oh yeah, we’ve got to fund this.” So it was just easy. No brainer. Everyone was just totally on board.
My fanfiction (that is maybe “60% true” and so has somewhat more signal than noise) is:
The EA fund you mentioned is basically GiveWell.
GiveWell has a sort of institutional momentum, related to aesthetics about decisions and conditions for funding that make bigger granting costly or harder (alternatively, the deeper reason here is that Global Health and development has a different neglectedness and history of public intervention than any other EA cause area, increasing the bar, but elaborating too much will cause Khorton to hunt me down).
In a way that doesn’t make GiveWell’s choices or institutional role wrong, MacAskill saw that LEEP was great and there was an opportunity here to fund it with his involvement in FTX.
So why FTX?
There’s a cheap answer I can make here about “grantmaker diversity”, however I don’t fully believe this is true (or rather, I’m just clueness). For example, maybe there might be some value in GiveWell having a say in deciding whether to scale up EA global health orgs, like they did scale with Fortify Health. (Not sure about this paragraph, I am sort of wildly LARPing.)
More importantly, this doesn’t answer you point about the “longtermist” FTX funding a “neartermist intervention”.
So, then, why FTX?
This pulls on another thread (or rather one that you pulled on in your other comment).
A part of the answer is that the FTX “team” believes there is some conjunction between certain cause areas, such as highly cost effective health and development, and longtermist.
A big part of the answer is that this “conjunction” is sort of heavily influenced by the people involved (read: SBF and MacAskill). The issue with pulling on this thread, is that this conjunctiveness isn’t perfectly EA canon, it’s hard to formalize, and the decisions involved probably puts the senior EA figures involved into too much focus or authority (more anyone, including themselves, want).
I want to remind anyone reading this comment is that this is fanfiction that is only “60% true”.
I wrote the above comment because I feel like no one else will.
I feel that some of your comments are stilted and choose content in a way that has interpretations that is confrontational and overbearing, making it too difficult to answer. I view this as a form of bad rhetoric (sort of created by bad forum norms that has produced other pathologies) and doesn’t lend itself to truth or good discussion.
To be specific, when you say,
FTX Future Fund says they support “ambitious projects to improve humanity’s long-term prospects”. Does it seem weird that they’re unanimously funding neartermist global health interventions like lead elimination?
and
Here’s another framing: if you claim that asteroid detection saves 300K lives per $100, pandemic prevention saves 200M lives per $100, and GiveWell interventions save 0.025 lives per $100, isn’t it a bit odd to fund the latter?
This is terse and omits a lot.
A short direct read of your comments, is that you are implying that “MacAskill has clearly delineated the cost effectiveness of all EA cause areas/interventions and has ranked certain x-risk as the only principled cost effective one” and “MacAskill is violating his arrangement of cost effective interventions”.
Instead of what you are suggesting in this ellipsis, it seems like a reasonable first pass perspective is given directly by the interview you quoted from. I think omitting this is unreasonable.
To be specific, MacAskill is saying in the interview:
Will MacAskill: That’s just amazing, what quick turnaround to impact, doing this thing that’s just very clearly, very broadly making the world better. So in terms of things that get me up in the morning and make me excited to be part of this community, learning about that project is definitely one of them.
Will MacAskill: I think one reason I just love stuff like this, just for the EA community as a whole, the value of getting concrete wins is just really high. And you can imagine a community that is entirely focused on movement building and technical AI safety.
Will MacAskill: [laughs] One could imagine. I mean, obviously those are big parts of the EA community. Well, if the EA community was all of that, it’s like, are you actually doing anything? It is really helpful in terms of just the health of the overall community and culture of the community to be doing many things that are concretely, demonstrably making the world better. And I think there’s a misunderstanding that people often have of core longtermist thought, where you might think — and certainly on the basis of what people tend to say, at least in writing when talking in the abstract — “Oh, you just think everyone should work on AI safety or AI risk, and if not, then bio, and then nothing else really matters.”
Will MacAskill: It’s pretty striking that when you actually ask people and get them making decisions, they’re interested in a way broader variety of things, often in ways that are not that predictable necessarily from what they’ve just said in writing. Like the case of Lead Exposure Elimination Project. One thing that’s funny is EAs and names: there’s just always the most literal names. The Shrimp Welfare Project.
Will MacAskill: And why is that? Well, it’s because there’s more of a rational market now, or something like an efficient market of giving — where the marginal stuff that could or could not be funded in AI safety is like, the best stuff’s been funded, and so the marginal stuff is much less clear. Whereas something in this broad longtermist area — like reducing people’s exposure to lead, improving brain and other health development — especially if it’s like, “We’re actually making real concrete progress on this, on really quite a small budget as well,” that just looks really good. We can just fund this and it’s no downside as well. And I think that’s something that people might not appreciate: just how much that sort of work is valued, even by the most hardcore longtermists.
Rob Wiblin: Yeah. I think that the level of intuitive, emotional enthusiasm that people have about these things as well would actually really surprise folks who have the impression that, if you talk to you or me, we’re just like “AI or bust” or something like that.
So, without agreeing or disagreeing with him, MacAskill is saying there is real value to the EA community here with these interventions in several ways. (At the risk of being stilted myself here, maybe you could call this “flow-through effects”, good “PR”, or just “healthy for the EA soul”).
MacAskill can be right or wrong here, but this isn’t mentioned at all in this thread of yours that you raised.
(Yes, there’s some issues with MacAskill’s reasoning, but it’s less that he’s wrong, rather that it’s just a big awkward thread to pull on, as mentioned in my comment above.)
I want to emphasize, I personally don’t mind the aggressiveness, poking at things.
However, the terseness, combined with the lack of context, not addressing the heart of the matter, is what is overbearing.
The ellipsis here is malign, especially combined with the headspace needed to address all of the threads being pulled at (resulting in this giant two part comment).
For example, this allows the presenter pretend that they never made the implication, and then rake the respondent through their lengthy reply.
Many people, looking ahead at all of this, won’t answer the (implied) questions. As a result, the presenter can then stand with his pregnant, implied criticisms. This isn’t good for discourse.
My brief response: I think it’s bad form to move the discussion to the meta-level (ie. “your comments are too terse”) instead of directly discussing the object-level issues.
My brief response: I think it’s bad form to move the discussion to the meta-level (ie. “your comments are too terse”) instead of directly discussing the object-level issues.
Can this really be your complete response to my direct, fulsome answer of your question, which you have asked several times?
For example, can you explain why my lengthy comment isn’t a direct object level response?
Even much of my second comment is pointing out you omitted that MacAskill expressly answering why he supported funding LEEP, which is another object level response.
To be clear, I accuse you of engaging in bad faith rhetoric in your above comment and your last response, with an evasion that I specifically anticipated (“this allows the presenter pretend that they never made the implication, and then rake the respondent through their lengthy reply”).
Here’s some previous comments of yours that are more direct, and do not use the same patterns you are now using, where your views and attitudes are more clear.
Is this not laughable? How could anyone think that “looking at the 1000+ year effects of an action” is workable?
Strong waterism: dying of thirst is very bad, because it prevents all of the positive contributions you could make in your life. Therefore, the most important feature of our actions today is their impact on the stockpile of potable water.
If you just kept it in this longtermism/neartermism online thing (and drafted on the sentiment from one of the factions there), that’s OK.
This seems bad because I suspect you are entering into unrelated, technical discussions, for example, in economics, using some of the same rhetorical patterns, which I view as pretty bad, especially as it’s sort of flying under the radar.
Instead of what you are suggesting in this ellipsis, it seems like a reasonable first pass perspective is given directly by the interview you quoted from. I think omitting this is unreasonable.
LEEP is lead by a very talented team of strong “neartermist” EAs.
In the real world and real EA, a lot of interest and granting can be dependent on team and execution (especially given the funding situation). Very good work and leaders are always valuable.
Casting everything into some longtermist/neartermist thing online seems unhealthy.
This particular comment seems poorly written (what does “unanimously” mean?) and seems to pull on some issue, but it just reads that everyone likes MacAskill, everyone likes LEEP and so decided to make a move.
Here’s another framing: if you claim that asteroid detection saves 300K lives per $100, pandemic prevention saves 200M lives per $100, and GiveWell interventions save 0.025 lives per $100, isn’t it a bit odd to fund the latter?
Or: longtermists claim that what matters most is the very long term effects of our actions. How is that being implemented here?
Casting everything into some longtermist/neartermist thing online seems unhealthy.
Longtermists make very strong claims (eg. “positively influencing the longterm future is *the* key moral priority of our time”). It seems healthy to follow up on those claims, and not sweep under the rug any seeming contradictions.
what does “unanimously” mean?
I chose that word to reflect Will’s statement that everyone at FTX was “totally on board”, in contrast to his expectations of an internal fight. Does that make sense?
The goal of this short-form post: to outline what I see as the key common ground between the “big tent” versus “small and weird” discussions that have been happening recently and to outline one candidate point of disagreement.
Tl;dr:
Common ground:
Everyone really values good thinking processes/epistemics/reasoning transparency and wants to make sure we maintain that aspect of the existing effective altruism community
Impact is fat-tailed
We might be getting a lot more attention soon because of our increased spending and because of the August release of “what we owe the future” (and the marketing push that is likely to accompany it’s release)[1]
A key point of disagreement: Does focusing on finding the people who produce the “tail” impact actually result in more impact?
One reason this wouldn’t be the case: “median” community building efforts and “tail” community building efforts are complements not substitutes. They are multipliers[2] of each other, rather than being additive and independent.
The additive hypothesis is simpler so I felt the multiplicative hypothesis needed some outlined mechanisms. Possible mechanisms:
Mechanism 1: the sorts of community building efforts that are more “median” friendly actually help the people who eventually create the “tail” impact become more interested in these ideas and more interested in taking bigger action with time
Mechanism 2: our biggest lever for impact in the future will not be the highly dedicated individuals but our influence on people on the periphery of the effective altruism (what I call “campground” effects)
Preamble (read: pre-ramble)
This is my summary of my vibe/impressions on some of the parts of the recent discussion that have stood out to me as particularly important. I am intending to finish my half a dozen drafts of a top-level post (with much more explanations for my random jargon that isn’t always even that common in effective altruism circles) at some point but I thought I’d start but sharing these rough thoughts to help get me over the “sharing things on the EA forum is scary” hump.
I might end up just sharing this post as a top-level post later once I’ve translated my random jargon a bit more and thought a bit more about the claims here I’m least sure of (possibly with a clearer outline of what cruxes make the “multiplicative effects” mechanisms more or less compelling)
Some common ground
These are some of my impressions of some claims that seem to be pretty common across the board (but that people sometimes talk as though they might suspect that the person they are talking to might not agree so I think it’s worth making them explicit somewhere).
The biggest one seems to be: We like the fact that effective altruism has good thinking processes/epistemics a lot! We don’t want to jeopardize our reasoning transparency and scout mindsets for the sake of going viral.
Impact is fat-tailed and this makes community-building challenging: there are a lot of uncomfortable trade-offs that might need to be made if we want to build the effective altruism community into a community that will be able to do as much good as possible.
We might be getting a lot more attention very soon whether we want to or not because we’re spending more (and spending in places that get a lot of media attention like political races) and because there will be a big marketing push for “What We Owe the Future” to, potentially, a very big audience. [3]
A point of disagreement
It seems like there are a few points of disagreement that I intended to go into, but this one got pretty long so I’ll just leave this as one point:
Does focusing on “tail” people actually result in more impact?
Are “tail” work and “median” work complements or substitutes? Are they additive (so specialization in the bit with all the impact makes sense) or multiplicative (so doing both well is a necessary condition to getting “tails”)?
I feel like the “additive/substitutes” hypothesis is more intuitive/a simpler assumption so I’ve outlined some explicit mechanisms for the “multiplicative/complements” hypothesis.
Mechanisms for the “multiplicative/complements” hypothesis
Mechanism 1
“Tail” people often require similar “soft” entry points to “non-tail” people and focusing on the “median” people on some dimensions actually is better at getting the “tails” people because we just model “tail” people wrong (e.g. someone could think it looks like some people were always going to be tails, but in reality, when we deep-dive into individual “tail” stories, there was, accidentally, “soft” entry points).
The dimensions where people advocate for lowering the bar are not epistemics/thinking processes, but things like
language barriers (e.g. reducing jargon, finding a plain English way to say something or doing your best to define the jargon when you use it if you think it’s so useful that it’s worth a definition),
making it easier for people to transition at their own pace from wherever they are to “extreme dedication” (and being very okay with some people stopping completely way before, and
reducing the social pressure to agree with the current set of conclusions by putting a lot more emphasis on a broader spectrum of plausible candidates that we might focus on if we’re trying to help others as much as possible (where “plausible candidates” are answers to the question of how we can help others the most with impartiality, considering all people alive today [4] or even larger moral circles/circle of compassion than that too, where an example of a larger group of individuals we might be wanting to help is all present and future sentient beings)
Mechanism 2
As we get more exposure, our biggest lever for impact might not the people that get really enthusiastic about effective altruism who go all the way to the last stop of the crazy train (what I might call our current tent), but the cultural memes we’re spreading to friends-of-friends-of-friends of people who have interacted with people in the effective altruism community or with the ideas and have strong views about them (positive or negative), which I have been calling “our campground” in all my essays to myself on this topic 🤣.
E.g. let’s say that the only thing that matters for humanity’s survival is who ends up in a very small number of very pivotal rooms,[5] it might be much easier to influence a lot of the people who are likely to be in those rooms a little bit to be thinking about some of the key considerations we hope they’d be considering (it’d be nice if we made it more likely that a lot of people might have the thought; “a lot might be at stake here, let’s take a breather before we do X”) than to get people who have dedicated their lives to reducing X-risk because effective altruism-style thinking and caring is a core part of who they are in those rooms.
As we get more exposure, it definitely seems true that “campground” effects are going to get bigger whether we like it or not.[6]
It is an open question (in my mind at least) whether we can leverage this to have a lot more impact or whether the best we can do is sit tight and try and keep the small core community on point.
Additive and multiplicative models aren’t the only two plausible “approximations” of what might be going on, but they are a nice starting point. It doesn’t seem outside the range of possibility that there are big positive feedback loops between “core camp efforts” and “campground efforts” (and all the efforts in between). If this is plausibly true, then the “tails” for the impact of the effective altruism community as a whole could be here.
This is a pretty arbitrary cutoff of what counts as a large enough moral circle to count under the broader idea behind effective altruism and trying to do the most good, but I like being explicit about what we might mean because otherwise people get confused/it’s harder to identify what is a disagreement about the facts and what is just a lack of clarity in the questions we’re trying to ask.
I like this arbitrary cutoff a lot because
1) anyone who cares about every single person alive today already has a ginormous moral circle and I think that’s incredible: this seems to be very much wide enough to get at the vibe of the widely caring about others, and 2) the crazy train goes pretty far, it is not at all obvious to me where the “right” stopping point is, I’ve got off a few stops along (my shortcut to avoid dealing with some crazier questions down the line, like infinite ethics, is “just” considering those in my light cone where it be simulated or not😅, not because I actually think this is all that reasonable, but because more thought on what “the answer” is seems to get in the way of me thinking hard about doing the things I think are pretty good which I think, on expectation, actually does more for what I’d guess I’ll care about if I had all of time to think about it.
this example is total plagiarism, see: https://80000hours.org/podcast/episodes/sam-bankman-fried-high-risk-approach-to-crypto-and-doing-good/ (also has such a great discussion on multiplicative type effects being a big deal sometimes which I feel people in the effective altruism community think about less than we should: more specialization and more narrowing the focus isn’t always the best strategy on the margin for maximizing how good things are and will be on expectation, especially as we grow and have more variation in people’s comparative advantages within our community, and more specifically, within our set of community builders)
If our brand/reputation has lock-in for decades for a really long time, this could plausibly be a hinge of history moment for the effective altruism community. If there are ways of making our branding/reputation is as high fidelity as possible within the low-fidelity channels that messages travel virally, this could be a huge deal (ideally, once we have some goodwill from the broader “campground” we will have a bit of a long reflection to work out what we want our tent to look like 🤣😝).
Thanks Sophia. I think that you’ve quite articulately identified and laid out the difference between these two schools of thought/intuitions about community.
I’d like to see this developed further into a general forum post as I think it contributes to the conversation. FWIW my current take is that we’re more in a multiplicative world (for both the reasons you lay out) and that the lower cost solutions (like the ones you laid out) seem to be table stakes (and I’d even go further and say that if push came to shove I’d actively trade off towards focusing more on the median for these reasons).
Yeah, I definitely think there are some multiplicative effects.
Now I’m teasing out what I think in more detail, I’m starting to find the “median” and “tails” distinction, while useful, still maybe a bit too rough for me to decide whether we should do more or less of any particular strategy that is targeted at either group (which makes me hesitant to immediately put these thoughts as a top form post until I’ve teased out what my best guesses are on how we should maybe change our behaviour if we think we live in a “multiplicative” world).[1]
Here are some more of the considerations/claims (that I’m not all that confident in) that are swirling around in my head at the moment 😊.
tl;dr:
High fidelity communication is really challenging (and doubly so in broad outreach efforts).
However, broad outreach might thicken the positive tail of the effective altruism movement’s impact distribution and thin the negative one even if the median outcome might result in a “diluted” effective altruism community.
Since we are trying to maximize the effective altruism community’s expected impact, and all the impact is likely to be at the tails, we actually probably shouldn’t care all that much about the median outcome anyway.
High fidelity communication about effective altruism is challenging (and even more difficult when we do broader outreach/try to be welcoming to a wider range of people)
I do think it is a huge challenge to preserve the effective altruism community’s dedication to:
caring about, at least, everyone alive today; and
transparent reasoning, a scout mindset and more generally putting a tonne of effort into finding out what is true even if it is really inconvenient.
I do think really narrow targeting might be one of the best tools we have to maintain those things.
Some reasons why we might we want to de-emphasize filtering in existing local groups:
First reason One reason why focusing on this logic can sometimes be counter-productive because some filtering seems to just miss the mark (see my comment here for an example of how some filtering could plausibly be systematically selecting against traits we value).
Introducing a second reason (more fully-fleshed out in the remainder of this comment) However, the main reason I think that trying to leverage our media attention, trying to do broad outreach well and trying to be really welcoming at all our shop-fronts might be important to prioritise (even if it might sometimes mean community builders will have to sometimes spend less time spent focusing on the people who seem most promising) is not because of the median outcome from this strategy.
Trying to nail campground effects is really, really, really hard while simultaneously trying to keep effective altruism about effective altruism. However, we’re not trying to optimize for the median outcome for the effective altruism community, we’re trying to maximize the effective altruism community’s expected impact. This is why, despite the fact that “dilution” effects seem like a huge risk, we probably should just aim for the positive tail scenario because that is where our biggest positive impact might be anyway (and also aim to minimize the risks of negative tail scenarios because that also is going to be a big factor in our overall expected impact).
“Median” outreach work might be important to increase our chances of a positive “tail” impact of the effective altruism community as a whole
It is okay for in most worlds for the effective altruism community to have very little impact in the end.
We’re not actually trying to guarantee some level of impact in every possible world. We’re trying to maximize the effective altruism movement’s expected impact.
We’re not aiming for a “median” effective altruism community, we’re trying to maximize our expected impact (so it’s okay if we risk having no impact if that is what we need to do to make positive tails possible or reduce the risk of extreme negative tail outcomes of our work)
Increasing the chances of the positive tails of the effective altruism movement
I think the positive tail impacts are in the worlds where we’ve mastered the synergies between “tent” strategies and “campground” strategies. If we can find ways of keeping the “tent” on point and still make use of our power to spread ideas to a very large number of people (even if the ideas we spread to a much larger audience are obviously going to be lower fidelity, we can still put a tonne of effort into which lower fidelity ideas are the best ones to spread to make the long-term future go really well).
Avoiding the negative tail impacts of the effective altruism movement
This second thought makes me very sad, but I think it is worth saying. I’m not confident in any of this because I don’t like thinking about it so much because it is not fun. Therefore, these thoughts are probably a lot less developed than my “happier”, more optimistic thoughts about the effective altruism community.
I have a strong intuition that more campground strategies reduce the risk of negative tail impacts of the effective altruism movement (though I wish I didn’t have this intuition and I hope someone is able to convince me that this gut feeling is unfounded because I love the effective altruism movement).
Even if campground strategies make it more likely that the effective altruism movement has no impact, it seems completely plausible to me that that might still be a good thing.
A small and weird “cabal” effective altruism, with a lot of power and a lot of money, makes people feel uncomfortable for good reason. There are selection effects, but history is lined with small groups of powerful people who genuinely believed they were making the world a better place and seem, in retrospect, to have done a lot more harm than good.
More people understanding what we’re saying and why makes it more likely that smart people outside our echo chamber can pushback when we’re wrong. It’s a nice safety harness to prevent very bad outcomes.
It is also plausible to me that a tent effective altruism movement might be more likely to achieve their 95th percentile plus positive impact as well as the 5th percentile and below very negative impact.
Effective altruism feels like a rocket right now and rockets aren’t very stable. It intuitively feels easy to have a very big impact, when you do big, ambitious things in an unstable way, and not be able to easily control the sign of that big impact: there is a chance it is very positive or very negative.
I find it plausible that, if you’re going to have a huge impact on the world, having a big negative impact is easier than having a big positive impact by a wide margin (doing good is just darn hard and there are no slam dunk answers[2]).[3] Even though we’re thinking hard about how to make it good, I think it might just be really easy to make it bad (e.g. by bringing attention to the alignment problem, we might be increasing excitement and interest in the plausibility of AGI and therefore are going to get to AGI faster than if no-one talked about alignment).
I might post a high level post before I have finished teasing out my best guesses on what the implications might be because I find my views change so fast that it is really hard to ever finish writing down what I think and it is possibly still better for me to share some of my thoughts more publicly than to share none of them. I often feel like I’m bouncing around like a yo-yo and I’m hoping at some point my thoughts will settle down somewhere on an “equilibrium” view instead of continuously thinking up considerations that cause me to completely flip my opinion (and leave me saying inconsistent things left, right and center because I just don’t know what I think quite yet 😝🤣😅). I have made a commitment bet with a friend to post something as a top-level post within two weeks so I will have to either give a snapshot view then or settle on a view or lose $$ (the only reason I got a finished the top level short-form comment that started this discussion was because of a different bet with a different friend 🙃😶🤷🏼♀️). At the very least, I hope that I can come with a more wholesome (but still absolutely true) framing of a lot of the considerations I’ve outlined in the remainder of this post as I think it over more.
I think it was Ben Garfinkel said the “no slam dunk” answers thing in his post on suspicious convergence when it comes to arguments about AI risk but I’m too lazy to chase it up to link it (edit: I did go try and chase up the link to this, I think my memory had maybe merged/mixed together this post by Gregory Lewis on suspicious convergence and this transcript from a talk by Ben Garfinkel, I’m leaving both links in this footnote because Gregory Lewis’ post is so good that I’ll use any excuse to leave a link to it wherever I can even though it wasn’t actually relevant to the “no slam dunk answers” quote)
I agree that we need scope for people to gradually increase their commitment over time. Actually, that’s how my journey has kind of worked out.
On the other hand, I suspect that tail people can build a bigger and more impactful campfire. For example, one Matt Yglesias occasionally posting positive things about EA or EA adjacent ideas increases our campfire by a lot and these people are more likely to be the ones who can influence things.
Yeah, but what people experience when they hear about EA via someone like Matt will determine their further actions/beliefs about EA. If they show up and unnecessarily feel unwelcome or misunderstand EA then we’ve not just missed and opportunity then and there but potentially soured them for the long term (and what they say to others will spur other before we get a chance to reach them).
Hey Chris 😊, yeah, I think changing your mind and life in big ways overnight is a very big ask (and it’s nice to feel like you’re welcome to think about what might be true before you decide whether to commit to doing anything about it—it helps a lot with the cognitive dissonance we all feel when our actions, the values we claim to hold ourselves to and what we believe about the world are at odds[1]).
I also completely agree with some targeting being very valuable. I think we should target exceptionally caring people who have exceptional track-records of being able to accomplish the stuff they set out to accomplish/the stuff they believe is valuable/worthwhile. I also think that if you spend a tonne of time with someone who clearly isn’t getting it even though they have an impressive track record in some domain, then it makes complete sense to use your marginal community building time elsewhere.
However, my guess is that sometimes we can filter too hard, too early for us to get the tail-end of the effective altruism community’s impact.
It is easy for a person to form an accurate impression of another person who is similar to them. It is much harder for a person to quickly form an accurate impression of another person who is really different (but because of diminishing returns, it seems way more valuable on the margin to get people who are exceptional in a different way to the way that the existing community tends to be exceptional than another person who thinks the same way and has the same skills).
and we want people to make it easier for people to align these three things in a direction that leads to more caring about others and more seeing the world the way it is (we don’t want to push people away from identifying as someone who cares about others or from shying away from thinking about how the world). If we push too hard on all three things at once, I think it is much easier for people to align these three things by either deciding they actually don’t value what they thought they value, they actually don’t really care about others, or they might find it incredibly hard to see the world exactly as it is (because otherwise their values and their actions will have this huge gap)
EDIT: Witness this train-wreck of me figuring out what I maybe think in real time half-coherently below as I go :P[1]
yeah, I guess an intuition that I have is there are some decisions where we can gain a lot of ground by focusing are efforts in places where it is more likely we come across people who are able to create tail impacts over their lifetimes (e.g. by prioritising creating effective altruism groups in places with lots of people who have a pre-existing track record of being able to achieve the things they set out to achieve). However, I feel like there are some places where more marginal effort on targeting the people who could become tails has sharp diminishing returns and comes with some costs that might not actually be worth it. For example, once you have set up a group in a place where people who have track records of achieving things they set their minds to to a really exceptional degree, trying to figure out how “tail potential” someone is from there often can make people who might have been tail potential if they had been guided in a helpful way completely put off from engaging with us at all.
This entire thread is not actually recommended reading but keeping it here because I haven’t yet decided whether I endorse it or not and I don’t see it as that much dis-utility in leaving it here in the meantime while I think about this more.
I’m also not sure, once we’re already targeting people who have track records of doing the things they’ve put their minds to (which obviously won’t be a perfect proxy for tail potential but it often seems better than no prioritisation of where the marginal group should go), I’m not sure how good we are at assessing someone’s “tail potential”, especially because there are going to be big marginal returns to finding people who have a different comparative advantage to the existing community (if it is possible to communicate the key ideas/thinking with high fidelity) who will have more of an inferential gap to cross before communication is efficient enough for us to be able to tell how smart they are/how much potential they have.
This impression comes from knowing people where I speak their language (metaphorically) and I also speak EA (so I can absorb a lot of EA content and translate it in a way they can understand) who are pretty great at reasoning transparency and updating in conversations with people whom they’ve got pre-established trust (which means when miscommunications inevitably happen, the base assumption is still that I’m arguing in good faith). They can’t really demonstrate that reasoning transparency if the person they are talking to doesn’t understand their use of language/their worldview well enough to see that it is actually pretty precise and clear and transparent once you understand what they mean by the words they use.
(I mainly have this experience with people who maybe didn’t study maths or economics or something that STEM-y who I have other “languages” that mean I can still cross inferential gaps reasonably efficiently with them)
This is a proof of existence of these kinds of people. It doesn’t really tell us all that much about what proportion of people without the backgrounds that make the EA language barrier a lot smaller (like philosophy, econ and STEM) are actually good at the thinking processes we value very highly that are taught a lot in STEM subjects.
I could have had this experience with people who I know and this still not mean that this “treating people with a huge amount of charity for the reason that some people might have the potential to have a tail impact even if we’d not guess it when we first meet them” is actually worth it overall. I’ve got a biased sample but I don’t think it’s irrational that this informs my inside view even if I am aware that my sample is likely to be heavily biased (I am only going to have built a common language with people/built trust with people if there is something that fuels our friendships—the people who I want to be friends with are not random! They are people who make me feel understood or say things that I find thought-provoking or a number of other factors that kind of makes them naturally a very cherry-picked pool of people).
Basically, my current best guess is that being really open-minded and patient with people once your group is at a place where pretty much everyone has demonstrated they are a tail person in one way or another (whether that’s because of their personal traits or because of their fortunate circumstances) will get us more people who have the potential to have a positive tail-end impact engaging with us enough for that potential to have a great shot of being realised.
EDIT: I copied and pasted this comment as a direct reply to Chris and then edited it to make it make more sense than it did the first time I wrote it and also to make it way nicer than my off-the-cuff/figuring-out-what-thought-as-I-went stream-of-consciousness but I left this here anyway partly for context for the later comments and also because I think it’s kind of fun to have a record (even if just for me) of how my thoughts develop as I write/tease out what sounds plausibly true once I’ve written it and what doesn’t quite seem to hit the mark of what intuition I’m attempting to articulate (or what intuition that, once I find a way to articulate it, ends up seeming obviously false once I’ve written it up).
I am not arguing that we should not target exceptional people, I think exceptionally smart and caring people are way better to spend a lot of one-on-one time with than people who care an average amount about helping others and for whom there is a lot of evidence that they haven’t yet got a track record of being able to accomplish things they set their minds to.
My guess is that sometimes we can filter too hard, too early for us to get the tail-end of the effective altruism community’s impact.
It is easy for a person to form an accurate impression of another person who is similar to them. It is much harder for a person to quickly form an accurate impression of another person who is really different (but because of diminishing returns, it seems way more valuable on the margin to get people who are exceptional in a different way to the way that the existing community tends to be exceptional than another person who thinks the same way and has the same skills).
(I am not confident I will reflectively endorse much of the above in 24 hours from now, I’m just sharing my off-the-cusp vibes which might solidify into more or less confidence when I let these thoughts sit for a bit more time)
If my confidence in any of these claims substantially increases or decreases in the next few days I might come back and clarify that (but if doing this becomes a bit of an ugh field, I’m not going to prioritise de-ughing it because there are other ugh-fields that are higher on my list to prioritise de-ughing 😝)
I think there’s a lot of value in people reaching out to people they know (this seems undervalued in EA, then again maybe it’s intentional as evangelism can turn people off). This doesn’t seem to trade-off too substantially against more formal movement-building methods which should probably filter more on which groups are going to be most impactful.
In terms of expanding the range of people and skills in EA, that seems to be happening over time (take for example the EA blog prize: https://effectiveideas.org/ ). Or the increased focus on PA’s (https://pineappleoperations.org/). I have no doubt that there are still many useful skills that we’re missing, but there’s a decent chance that funding would be available if there was a decent team to work on the project.
If my confidence in any of these claims substantially increases or decreases in the next few days I might come back and clarify that (but if doing this becomes a bit of an ugh field, I’m not going to prioritise de-ughing it because there are other ugh-fields that are higher on my list to prioritise de-ughing 😝)
I suspect that some ways we filter at events of existing groups are good and we should keep doing them.
I also suspect some strategies/tendencies we have when we filter at the group level are counter-productive to finding and keeping high-potential people.
For example, filtering too fast based on how quickly someone seems to “get” longtermism might filter in the people who are more willing to defer and so seem like they get it more than they do.
It might filter out the people who are really trying to think it through, who seem more resistant to the ideas or who are more willing to voice their half-formed thoughts that haven’t developed yet into something that deep (because thinking through all the different considerations to form an inside view takes a lot of time and voicing a lot of “dead-end” thoughts). Those higher value people might systematically be classed as “less tractable” or “less smart” when, in fact, it is sometimes[1] that we have just forgotten that people who are really thinking about these ideas seriously, who are smart enough to possibly be a person who could have a tail end impact, are going to say things that don’t sound smart as they navigate what they think. The further someone is from our echo chamber, the stronger I expect this effect to be.
Obviously I don’t know how most groups filter at the group-level, this is so dependent on the particular community organizers (and then also there are maybe some cultural commonalities across the movement which is why I find it tempting to make broad-sweeping generalisations that might not hold in many places).
but obviously not always (and I don’t actually have a clear idea of how big a deal this issue is, I’m just trying to untangle my various intuitions so I can more easily scrutinize if there is a grain of truth in any of them on closer inspection)
Hmm… Some really interesting thoughts. I generally try to determine whether people are actually making considered counter-arguments vs. repeating cliches, but I take your point about a willingness to voice half-formed thoughts can cause others to assume you’re stupid.
I guess in terms of outreach it makes sense to cultivate a sense of practical wisdom so that you can determine when to patiently continue a conversation or when to politely and strategically withdraw so as to save energy and avoid wasting time. This won’t be perfect and it’s subject to biases as you mentioned, but it’s really the best option available.
Hmm, I’m not sure I agree with the claim “it’s really the best option available” even if I don’t already have a better solution pre-thought up. Or at the very least, I think that how to foster this culture might be worth a lot of strategic thought.
Even if there is a decent chance we end up concluding there isn’t all that much we can do, I think the payoff to finding a good way to manage this might be big enough to make up for all the possible worlds where this work ends up being a dead-end.
In this short-form post I hope to explain personal reservations I have regarding pursuing careers in existential risk reduction, specifically how I struggle to recognize personal positive impact and how this might impact my motivation. The main reason I am posting this is to either be referred to methods to solve my problem or to hear from personal accounts as to how people have maybe dealt with this.
The importance of preventing existential risk is very clear to me, and that one can provide a lot of positive impact with such careers is too. There are ways of analyzing and ranking said risks I am aware of, and ultimately (to put it shortly) the catastrophe that poses the risk will or will not happen. Now, however naïve this might sound, I struggle to see how you can even come close to analyzing one’s personal impact in the lowering of such a risk (especially for the ones considered big, with thus quite some resources and people working on them).
To me, EA in one sentence is: maximizing one’s positive impacts, and working on existential risk reduction seems to me to have a positive impact, but the magnitude will always remain undefined (on a personal level). Especially when compared to earning to give, where the impact one provides with donations is always quoted, an average of course, but nonetheless this provides a good idea of the positive impact created. As such I do not know if I can keep pursuing maximization of my positive impact and remain motivated about it if I will never have a clear idea of how much positive impact that really is.
I hope I have been able to properly express myself, and hope someone has a solution to my problem (i.e. have a better idea of one’s positive impact) or how to deal with that uncertainty and remain motivated.
There’s been a few posts recently about how there should be more EA failures, since we’re trying a bunch of high-risk, high-reward projects, and some of them should fail or we’re not being ambitious enough.
I think this is a misunderstanding of what high-EV bets look like. Most projects do not either produce wild success or abject failure, there’s usually a continuity of outcomes in between, and that’s what you hit. This doesn’t look like “failure”, it looks like moderate success.
For example, consider the MineRL BASALT competition that I organized. The low-probability, high-value outcome would have had hundreds or thousands of entries to the competition, several papers produced as a result, and the establishment of BASALT as a standard benchmark and competition in the field.
What actually happened was that we got ~11 submissions, of which maybe ~5 were serious, made decent progress on the problem, produced a couple of vaguely interesting papers, some people in the field have heard about the benchmark and occasionally use it, and we built enough excitement in the team that the competition will (very likely) run again this year.
Is this failure? It certainly isn’t what normally comes to mind from the normal meaning of “failure”. But it was:
Below my median expectation for what the competition would accomplish
Not something I would have put time into if someone had told me in advance exactly what it would accomplish so far, and the time cost needed to get it.
One hopes that roughly 50% of the things I do meet the first criterion, and probably 90% of the things I’d do would meet the second. But also maybe 90% of the work I do is something people would say was “successful” even ex post.
If you are actually seeing failures for relatively large projects that look like “failures” in the normal English sense of the word, where basically nothing was accomplished at all, I’d be a lot more worried that actually your project was not in fact high-EV even ex ante, and you should be updating a lot more on your failure, and it is a good sign that we don’t see that many EA “failures” in this sense.
(One exception to this is earning-to-give entrepreneurship, where “we had to shut the company down and made ~no money after a year of effort” seems reasonably likely and it still would plausibly be high-EV ex ante.)
Eliezer’s tweet is about the founding of OpenAI, whereas Agrippa’s comment is about a 2017 grant to OpenAI (OpenAI was founded in 2015, so this was not a founding grant). It seems like to argue that Open Phil’s grant was net negative (and so strongly net negative as to swamp other EA movement efforts), one would have to compare OpenAI’s work in a counterfactual world where it never got the extra $30 million in 2017 (and Holden never joined the board) with the actual world in which those things happened. That seems a lot harder to argue for than what Eliezer is claiming (Eliezer only has to compare a world where OpenAI didn’t exist vs the actual world where it does exist).
Personally, I agree with Eliezer that the founding of OpenAI was a terrible idea, but I am pretty uncertain about whether Open Phil’s grant was a good or bad idea. Given that OpenAI had already disrupted the “nascent spirit of cooperation” that Eliezer mentions and was going to do things, it seems plausible that buying a board seat for someone with quite a bit of understanding of AI risk is a good idea (though I can also see many reasons it could be a bad idea).
One can also argue that EA memes re AI risk led to the creation of OpenAI, and that therefore EA is net negative (see here for details). But if this is the argument Agrippa wants to make, then I am confused why they decided to link to the 2017 grant.
Well, it’s because there’s more of a rational market now, or something like an efficient market of giving — where the marginal stuff that could or could not be funded in AI safety is like, the best stuff’s been funded, and so the marginal stuff is much less clear. Whereas something in this broad longtermist area — like reducing people’s exposure to lead, improving brain and other health development — especially if it’s like, “We’re actually making real concrete progress on this, on really quite a small budget as well,” that just looks really good.
As far as I can tell liberal nonviolence is a very popular norm in EA. At the same time I really cannot thing of anything more mortally violent I could do than to build a doomsday machine. Even if my doomsday machine is actually a 10%-chance-of-doomsday machine or 1% or etcetera (nobody even thinks it’s lower than that). How come this norm isn’t kicking in? How close to completion does the 10%-chance-of-doomsday machine have to be before gentle kindness is not the prescribed reaction?
My favorite thing about EA has always been the norm that in order to get cred for being altruistic, you actually are supposed to have helped people. This is a great property, just align incentives. But now re: OpenAI I so often hear people say that gentle kindness is the only way, if you are openly adversarial then they will just do the opposite of what you want even more. So much for aligning incentives.
My knowledge of christians and stem cell research in the US is very limited, but my understanding is that they accomplished real slowdown.
Has anyone looked to that movement for lessons about AI?
Did anybody from that movement take a “change it from the inside” or “build clout by boosting stem cell capabilities so you can later spend that clout on stem cell alignment” approach?
Carrick Flynn lost the nomination, and over $10 million dollars from EA aligned individuals went to support his nomination.
So these questions may sound pointed:
There was surely a lot of expected value in having an EA aligned thinker in congress supporting pandemic preparedness, but there were a lot of bottlenecks that he would have had to go through to make a change.
He would have been one of hundreds of congresspeople. He would have had to get bills passed. He would have had to win enough votes to make it past the primary. He would have had to have his policies get churned through the bureaucratic agencies and it’s not entirely clear any bill he would’ve supported would have kept it’s same form through that process.
What can we learn from the political gambling that was done in this situation? Should we try this again? What are the long term side effects of aligning EA with any political side or making EA a political topic?
Could that $10+ million wasted on Flynn have been better used in just trying to get EA or longtermist bureaucrats in the CDC or other important decision making institutions?
We know the path that individuals take to get these positions, we know what people usually get selected to run pandemic preparedness for the government, why not spend $10 million in gaining the attention of bureaucrats or placing bureaucrats in federal agencies?
Should we consider political gambling in the name of EA a type of intervention that is meant for us to get warm fuzzies rather than do the most good?
I think seeing the attacks that he’s captured by crypto interests was useful, in that future EA political forays will know that attack is coming and be able to fend it off better. Worth $11 mil in itself, probably not, but the expected value was already pretty high (a decent probability of having someone in congress who can champion bills no one disagrees with but doesn’t want to spend time and effort on) so this information gained is helpful and might make either future campaigns more successful or alternatively dissuade future spending in this area. Definitely good to try once, we’ll see how it plays out in the long run. We didn’t know he’d lose until he lost!
This article from Seth Stephens-Davidowitz describes a paper (here) that examines who are the people in the top 0.1% of earners in the US, making at least $1.58 million per year. It was interesting to me in that many of those people were not high-status jobs, but rather owning unsexy businesses such as a car dealership or a beverage distribution operation. Obviously, this has implications for how we structure society, but it could also be a good thing to keep in mind for those interested in earning to give- owning a plumbing company might be a better route for some than trying to make it big on wall street.
An interesting thought, but I think this overlooks the fact that wealth is heavy tailed. So it is (probably) higher EV to have someone with a 10% shot at their tech startup getting huge than one person with a 100% chance of running a succesful plumbing company.
I recently experienced a jarring update on my beliefs about Transformative AI. Basically, I thought we had more time (decades) than I now believe we will (years) before TAI causes an existential catastrophe. This has had an interesting effect on my sensibilities about cause prioritization. While I applaud wealthy donors directing funds to AI-related Existential Risk mitigation, I don’t assign high probability to the success of any of their funded projects. Moreover, it appears to me that there is essentially no room for additional funds in kinds of denominations coming from non-wealthy donors (e.g. me).
I used to value traditional public health goals quite highly (e.g. I would direct donations to AMF). However, given that most of the returns on bed net distribution lie in a future beyond my current beliefs about TAI, this now seems to me like a bad moral investment. Instead, I’m much more interested in projects which can rapidly improve hedonic well-being (i.e. cause the greatest possible welfare boost in the near-term). In other words, the probability of an existential AI catastrophe has caused me to develop neartermist sympathies. I can’t find much about other EAs considering this, and I have only begun thinking about it, but as a first pass GiveDirectly appears to serve this neartermist hedonic goal somewhat more directly.
From your comment, I understand that you believe the funding situation is strong and not limiting for TAI, and also that the likely outcomes of current interventions is not promising.
(Not necessarily personally agreeing with the above) given your view, I think one area that could still interest you is “s-risk”. This also relevant for your interests in alleviating massive suffering.
Leadership development seems good in longtermism or TAI
(Admittedly it’s an overloaded, imprecise statement but) the common wisdom that AI and longtermism is talent constrained seems true. The ability to develop new leaders or work is valuable and can give returns, even accounting for your beliefs being correct.
Prosaic animal welfare
Finally, you and other onlookers should be aware that animal welfare, especially the relatively tractable and “prosaic suffering” of farm animals, is one of the areas that has not received a large increase in EA funding.
Some information below should be interesting to cause neutral EAs. Note that based on private information:
The current accomplishments in farm animal welfare are real and the current work is good. But there is very large opportunity to help (many times more animals are suffering than have been directly helped so far).
The amount of extreme suffering that is being experienced by farm animals is probably worse, much worse than is commonly believed (this is directly being addressed through EA animal welfare and also motivates welfarist work). This level of suffering is being occluded because it does not help, for example, it would degrade the mental health of proponents to an unacceptable level. However, the suffering levels are illogical to disregard when considering neartermist cause prioritization.
This animal welfare work would benefit from money and expertise.
Notably, this is an area where EA has been able to claim significant tangible success (for the fraction that has been able to help).
If there’s at least a 1% chance that we don’t experience catastrophe soon, and we can have reasonable expected influence over no-catastrophe-soon futures, and there’s a reasonable chance that such futures have astronomical importance, then patient philanthropy is quite good in expectation. Given my empirical beliefs, it’s much better then GiveDirectly. And that’s just a lower bound; e.g., investing in movement-building might well be even better.
Question for anyone who has interest/means/time to look into it: which topics on the EA forum are overrepresented/underrepresented? I would be interested in comparisons of (posts/views/karma/comments) per (person/dollar/survey interest) in various cause areas. Mostly interested in the situation now, but viewing changes over time would be great!
My hypothesis [DO NOT VIEW IF YOU INTEND TO INVESTIGATE]:
I expect longtermism to be WILDLY, like 20x, overrepresented. If this is the case I think it may be responsible for a lot of the recent angst about the relationship between longtermism and EA more broadly, and would point to some concrete actions to take.
Even a brief glance through posts indicates that there is relatively little discussion about global health issues like malaria nets, vitamin A deficiency, and parasitic worms, even though those are among the top EA priorities.
(Disclaimer: The argument I make in this short-form feels I little sophistic to me. I’m not sure I endorse it.)
Discussions of AI risk, particular risks from “inner misalignment,” sometimes heavily emphasize the following observation:
Humans don’t just care about their genes: Genes determine, to a large extent, how people behave. Some genes are preserved from generation-to-generation and some are pushed out of the gene-pool. Genes that cause certain human behaviours (e.g. not setting yourself on fire) are more likely to be preserved. But people don’t care very much about preserving their genes. For example, they typically care more about not setting themselves on fire than they care about making sure that their genes are still present in future generations.
This observation is normally meant to be alarming. And I do see some intuition for that.
But wouldn’t the alternative observation be more alarming?
Suppose that evolutionary selection processes — which iteratively update people’s genes, based on the behaviour these genes produce — tended to produce people who only care about preserving their genes. It seems like that observation would suggest that ML training processes — which iterative update a network’s parameter values, based on the behaviour these parameter values produce — will tend to produce AI systems that only care about preserving their parameter values. And that would be really concerning, since an AI system that cares only about preserving its parameter values would obviously have (instrumentally convergent) reasons to act badly.
So it does seem, to me, like there’s something funny going on here. If “Humans just care about their genes” would be a more worrying observation than “Humans don’t just care about their genes,” then it seems backward for the latter observation to be used to try to convince people to worry more.
To push this line of thought further, let’s go back to specific observation about humans’ relationship to setting themselves on fire:
Human want to avoid setting themselves on fire: If a person has genes that cause them to avoid setting themselves on fire, then these genes are more likely to be preserved from one generation to the next. One thing that has happened, as a result of this selection pressure, is that people tend to want to avoid setting themselves on fire.
It seems like this can be interpreted as a reassuring observation. By analogy, in future ML training processes, parameter values that cause ML systems to avoid acts of violence are more likely to be “preserved” from one iteration to the next. We want this to result in AI systems that care about avoiding acts of violence. And the case of humans and fire suggests this might naturally happen.
All this being said, I do think that human evolutionary history still gives us reason to worry. Clearly, there’s a lot of apparent randomness and unpredictability in what humans have actually ended up caring about, which suggests it may be hard to predict or perfectly determine what AI systems care about. But, I think, the specific observation “Humans don’t just care about their genes” might not itself be cause for concern.
The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there’s no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not “you must optimize X goal”, the constraints are “in Y situations you must behave in Z ways”, which doesn’t constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what’s “simpler”).
For example, you train your system to answer truthfully in situations where we know the answer. This could get you an AI system that is truthful… or an AI system that answers truthfully when we know the answer, but lies to us when we don’t know the answer in service of making paperclips. (ELK tries to deal with this setting.)
When I apply this point of view to the evolution analogy it dissolves the question / paradox you’ve listed above. Given the actual ancestral environment and the selection pressures present there, organisms that maximized “reproductive fitness” or “tiling the universe with their DNA” or “maximizing sex between non-sterile, non-pregnant opposite-sex pairs” would all have done well there (I’m sure this is somehow somewhat wrong but clearly in principle there’s a version that’s right), so who knows which of those things you get. In practice you don’t even get organisms that are maximizing anything, because they aren’t particularly goal-directed, and instead are adaption-executers rather than fitness-maximizers.
I do think that once you inhabit this way of thinking about it, the evolution example doesn’t really matter any more; the argument itself very loudly says “you don’t know what you’re going to get out; there are tons of possibilities that are not what you wanted”, which is the alarming part. I suppose in theory someone could think that the “simplest” one is going to be whatever we wanted in the first place, and so we’re okay, and the evolution analogy is a good counterexample to that view?
It turns out that people really really like thinking of training schemes as “optimizing for a goal”. I think this is basically wrong—is CoinRun training optimizing for “get the coin” or “get to the end of the level”? What would be the difference? Selection pressures seem much better as a picture of what’s going on.
But when you communicate with people it helps to show how your beliefs connect into their existing way of thinking about things. So instead of talking about how selection pressures from training algorithms and how they do not uniquely constrain the system you get out, we instead talk about how the “behavioral objective” might be different from the “training objective”, and use the evolution analogy as an example that fits neatly into this schema given the way people are already thinking about these things.
(To be clear a lot of AI safety people, probably a majority, do in fact think about this from an “objective-first” way of thinking, rather than based on selection, this isn’t just about AI safety people communicating with other people.)
The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there’s no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not “you must optimize X goal”, the constraints are “in Y situations you must behave in Z ways”, which doesn’t constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what’s “simpler”).
I think that’s well-put—and I generally agree that this suggests genuine reason for concern.
I suppose my point is more narrow, really just questioning whether the observation “humans care about things besides their genes” gives us any additional reason for concern. Some presentations seem to suggest it does. For example, this introduction to inner alignment concerns (based on the MIRI mesa-optimization paper) says:
We can see that humans are not aligned with the base objective of evolution [maximize inclusive genetic fitness].… [This] analogy might be an argument for why Inner Misalignment is probable since it has occurred “naturally” in the biggest non-human-caused optimization process we know.
And I want to say: “On net, if humans did only care about maximizing inclusive genetic fitness, that would probably be a reason to become more concerned (rather than less concerned) that ML systems will generalize in dangerous ways.” While the abstract argument makes sense, I think this specific observation isn’t evidence of risk.
Relatedly, something I’d be interested in reading (if it doesn’t already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals—rather than stopping at the fact that humans care about things besides genetic fitness.
My guess is that the case of humans is overall a little reassuring (relative to how we might have expected generalization to work), while still leaving a lot of room for worry.
For example, in the case of violence:
People who committed totally random acts of violence presumably often failed to pass on their genes (because they were often killed or ostracized in return). However, a large portion of our ancestors did have occasion for violence. On high-end estimates, our average ancestor may have killed about .25 people. This has resulted in most people having a pretty strong disinclination to commit murder; for most people, it’s very hard to bring yourself to murder and you’ll often be willing to pay a big cost to avoid committing murder.
The three main reasons for concern, though, are:
people’s desire to avoid murder isn’t strong enough to consistently prevent murder from happening (e.g. when incentives are strong enough)
there’s a decent amount of random variation in how strong this desire is (a small minority of people don’t really care that much about committing violence)
the disinclination to murder becomes weaker the more different the method of murder is from methods that were available in the ancestral environment (e.g. killing someone with a drone strike vs. killing someone with a rock)
These issues might just reflect the fact that murder was still often rewarded (even though it was typically punished) and the fact that there was pretty limited variation in the ancestral environment. But it’s hard to be sure. And it’s hard to know, in any case, how similar generalization in human evolution will be to generalization in ML training processes.
So—if we want to create AI systems that don’t murder people, by rewarding non-murderous behavior—then the evidence from human evolution seems like it might be medium-reassuring. I’d maybe give it a B-.
I can definitely imagine different versions of human values that would have more worrying implications. For example, if our aversion to violence didn’t generalize at all to modern methods of killing, or if we simply didn’t have any intrinsic aversion to killing (and instead avoided it for purely instrumental reasons), then that would be cause for greater concern. I can also imagine different versions of human values that would be more reassuring. For example, I would feel more comfortable if humans were never willing to kill for the sake of weird abstract goals.
I suppose my point is more narrow, really just questioning whether the observation “humans care about things besides their genes” gives us any additional reason for concern.
I mostly go ¯\_(ツ)_/¯ , it doesn’t feel like it’s much evidence of anything, after you’ve updated off the abstract argument. The actual situation we face will be so different (primarily, we’re actually trying to deal with the alignment problem, unlike evolution).
I do agree that in saying ” ¯\_(ツ)_/¯ ” I am disagreeing with a bunch of claims that say “evolution example implies misalignment is probable”. I am unclear to what extent people actually believe such a claim vs. use it as a communication strategy. (The author of the linked post states some uncertainty but presumably does believe something similar to that; I disagree with them if so.)
Relatedly, something I’d be interested in reading (if it doesn’t already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals—rather than stopping at the fact that humans care about things besides genetic fitness.
I like the general idea but the way I’d do it is by doing some black-box investigation of current language models and asking these questions there; I expect we understand the “ancestral environment” of a language model way, way better than we understand the ancestral environment for humans, making it a lot easier to draw conclusions; you could also finetune the language models in order to simulate an “ancestral environment” of your choice and see what happens then.
So—if we want to create AI systems that don’t murder people, by rewarding non-murderous behavior—then the evidence from human evolution seems like it might be medium-reassuring. I’d maybe give it a B-.
I agree with the murder example being a tiny bit reassuring for training non-murderous AIs; medium-reassuring is probably too much, unless we’re expecting our AI systems to be put into the same sorts of situations / ancestral environments as humans were in. (Note that to be the “same sort of situation” it also needs to have the same sort of inputs as humans, e.g. vision + sound + some sort of controllable physical body seems important.)
I think some of us really need to create op-eds, videos, etc. for a mainstream audience defending longtermism. The Phil Torres pieces have spread a lot (people outside the EA community have shared them in a Discord server I moderate, and Timnit Gebru has picked them up) and thus far I haven’t seen an adequate response.
In Exceeding expectations: stochastic dominance as a general decision theory, Christian Tarsney presents stochastic dominance (to be defined) as a total replacement for expected value as a decision theory. He wants to argue that one decision is only rationally better as another one when it is stochastically dominant. For this, he needs to say that the choiceworthiness of a decision (how rational it is) is undefined in the case where one decision doesn’t stochastically dominate another one.
I think this is absurd, and perhaps determined by academic incentives to produce more eye-popping claims rather than more restricted incremental improvements. Still, I thought that the paper made some good points about us still being able to make decisions even when expected values stop being informative. It was also my introduction to extending rational decision-making to infinite cases, and a great introduction at that. Below, I outline my rudimentary understanding of these topics.
Where expected values fail.
Consider a choice between:
A: 1 utilon with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 1 × ½ + 2 × ¼ + 4 × 1⁄8 + … = ½ + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 2 × ¼ + 4 × 1⁄8 + … = 1 + 1 + 1 + … = ∞
So the expected value of choice A is ∞, as is the expected value of choice B. And yet, B is clearly preferable to A. What gives?
Statewise dominance
Suppose that in the above case, there were different possible states, as if the payoffs for A and B were determined by the same coin throws:
State i: A gets 1, B gets 2
State ii: A gets 2, B gets 4
State iii: A gets 4, B gets 8,
State in: A gets 2n, B gets 2 × 2n.
Then in this case, B dominates A in every possible state. This is a reasonable decision principle that we can reach to ground our decision to choose B over A.
Stochastic dominance
O stochastically dominates P if:
For any payoff x, the probability that O yields a payoff at least as good as x is equal to or greater than the probability that P yields a payoff at least as good as x, and
For some payoff x, the probability that O yields a payoff at least as good as x is strictly greater than the probability that P yields a payoff at least as good as x.
∃x such that Probability(Payoff(O) ≥ x) > Probability(Payoff(P) ≥ x))
This captures a notion that O is, in a sense, strictly better than P, probabilistically.
In the case of A and B above, if their payoffs were determined by throwing independent coins:
There is a 100% chance that B yields a payoff ≥ 1, and 100% that A yields a payoff ≥ 1
There is a 50% chance that B yields a payoff ≥ 2, but only a 25% chance that A yields a payoff ≥ 2
There is a 25% chance that B yields a payoff ≥ 4, but only a 12.5% chance that A yields a payoff ≥ 4
There is a 12.5% chance that B yields a payoff ≥ 8, but only a 6.26% chance that A does so.
There is a ½^n chance that B yields a payoff ≥ 2n, but only a ½^(n+1) chance that A does so.
So the probability that B gets increasingly better outcomes is higher than the probability that A will do so. So in this case, B stochastically dominates A. Stochastic dominance is thus another decision principle that we could reach to compare choices with infinite expected values.
Gaps left
The above notions of stochastic and statewise dominance could be expanded and improved. For instance, we could ignore a finite number of comparisons going the other way if the expected value of those options was finite but the expected value of the whole thing was infinite. For instance, in the following comparison:
A: 100 utilons with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 1 × ½ + 2 × ¼ + 4 × 1⁄8 + … = ½ + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 2 × ¼ + 4 × 1⁄8 + … = 1 + 1 + 1 + … = ∞
I would still say that B is preferable to A in that case. And my impression is that there are many similar principles one could reach to, in order to resolve many but not all comparisons between infinite sequences.
Exercise for the reader: Come up with two infinite sequences which cannot be compared using statewise or stochastic dominance, or similar principles.
You could discount utilons—say there is a “meta-utilon” which is a function of utilons, like maybe meta utilons = log(utilons). And then you could maximize expected metautilons rather than expected utilons. Then I think stochastic dominance is equivalent to saying “better for any non decreasing metautilon function”.
But you could also pick a single metautilon function and I believe the outcome would at least be consistent.
Really you might as well call the metautilons “utilons” though. They are just not necessarily additive.
Monotonic transformations can indeed solve the infinity issue. For example the sum of 1/n doesn’t converge, but the sum of 1/n^2 converges, even though x → x^2 is monotonic.
The existential risk community’s relative level of concern about different existential risks is correlated with how hard-to-analyze these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:
For a number of risks, when you first hear about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance we’ll become less worried about it. We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.
In particular, our level of worry about different plausible existential risks is likely to roughly track our ability to analyze them (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).
Some plausible existential risks also are far easier to analyze than others. If you compare 80K’s articles on climate change and artificial intelligence, for example, then I think it is pretty clear that people analyzing climate risk simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other specifies to survive under different local climate conditions. And so on. We’re in a much worse epistemic position when it comes to analyzing the risk from misaligned AI: we’re reliant on fuzzy analogies, abstract arguments that use highly ambiguous concepts, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different than future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if the existential risk from misaligned AI actually is reasonably small, it’s hard to see how we could become really confident of that.
Some upshots:
The fact that the existential risk community is particularly worried about misaligned AI might mostly reflect the fact that it’s hard to analyze risks from misaligned AI.
Nonetheless, even if the above possibility is true, it doesn’t at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not that big a deal. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to uncharitable observers — my efforts will probably look a bit misguided after the fact.”
For hard-to-analyze risks, it matters a lot what your “prior” in the risks is (since evidence, models, and arguments can only really move you so much). I sometimes get the sense that some people are starting from a prior that’s not far from 50%: For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it’s even a bit unclear what exactly a “prior” means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.
(This shortform partly inspired by Greg Lewis’s recent forecasting post .)
Toby Ord notes, in the section of The Precipice that gives risk estimates: “The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book.” ↩︎
The uncertainty and error-proneness of our first-order assessments of risk is itself something we must factor into our all-things-considered probability assignments. This factor often dominates in low-probability, high- consequence risks—especially those involving poorly understood natural phenomena, complex social dynamics, or new technology, or that are difficult to assess for other reasons. Suppose that some scientific analysis A indicates that some catastrophe X has an extremely small probability P(X) of occurring. Then the probability that A has some hidden crucial flaw may easily be much greater than P(X). Furthermore, the conditional probability of X given that A is crucially flawed, P(X|¬A), may be fairly high. We may then find that most of the risk of X resides in the uncertainty of our scientific assessment that P(X) was small.
A point about hiring and grantmaking, that may or may not already be conventional wisdom:
If you’re hiring for high-autonomy roles at a non-profit, or looking for non-profit founders to fund, then advice derived from the startup world is often going to overweight the importance of entrepreneurialism relative to self-skepticism and reflectiveness.[1]
Non-profits, particularly non-profits with longtermist missions, are typically trying to maximize something that is way more illegible than time-discounted future profits. To give a specific example: I think it’s way harder for an organization like CEA to tell if it’s on the right track than it is for a company like Zoom to tell if it’s on the right track. CEA can track certain specific metrics (e.g. the number of “new connections” reported at each EAG), but it will often be ambiguous how strongly these metrics reflect positive impact—and there will also always be a risk that various negative indirect effects aren’t being captured by the key metrics being used. In some cases, evaluating the expected impact of work will also require making assumptions about how the world will evolve over the next couple decades (e.g. assumptions about how pressing risks from AI are).
I think this means that it’s especially important for these non-profits to employ and be headed by people who are self-skeptical and reflect deeply on decisions. Being entrepreneurial, having a bias toward action, and so on, don’t count for much if the organisation isn’t pointed in the right direction. As Ozzie Gooen has pointed out, there are many examples of massive and superficially successful initiatives (headed by very driven and entrepreneurial people) whose theories-of-impact don’t stand up to scrutiny.
A specific example from Ozzie’s post: SpaceX is a massive and extraordinarily impressive venture that was (at least according to Elon Musk) largely started to help reduce the chance of human extinction, by helping humanity become a multi-planetary species earlier than it otherwise would. But I think it’s hard to see how their work reduces extinction risk very much. If you’re worried about the climate effects of nuclear war, for example, then it seems important to remember that post-nuclear-war Earth would still have a much more hospitable climate than Mars. It’s hard to imagine a disaster scenario where building Martian colonies would be much better than (for example) building some bunkers on Earth.[2] So—relative to the organization’s stated social mission—all the talent, money, and effort SpaceX has absorbed might not ultimately come out to much.
A more concise way to put the concern here: Popular writing on hiring is often implicitly asking the question “How can we identify future Elon Musks?” But, for the most part, longtermist non-profits shouldn’t be looking to put future Elon Musks into leadership positions .[3]
I have in mind, for example, advice given by Y Combinator and advice given in Talent. ↩︎
Another example: It’s possible that many highly successful environmentalist organizations/groups have ended up causing net harm to the environment, by being insufficiently self-skeptical and reflective when deciding how to approach nuclear energy issues. ↩︎
A follow-up thought: Ultimately, outside of earning-to-give ventures, we probably shouldn’t expect the longtermist community (or at least the best version of it) to house many extremely entrepreneurial people. There will be occasional leaders who are extremely high on both entrepreneurialism and reflectiveness (I can currently think of at least a couple); however, since these two traits don’t seem to be strongly correlated, this will probably only happen pretty rarely. It’s also, often, hard to keep extremely entrepreneurial people satisfied in non-leadership positions—since, almost by definition, autonomy is deeply important to them—so there may not be many opportunities, in general, to harness the talents of people who are high on entrepreneurialism but low on reflectiveness. ↩︎
TL;DR: when it comes to technical expertise, our problem is more a lack of technical literacy than absence of experts. Think about it as analogy with other sciences.
I work with lots of economists (some of them trained in top graduate programs), for a government in a developing country, and I can’t help thinking this proposal is a bit naïve. I’m pretty sure our problem is not that we lack economists, or that the President does not receive expert advice, but that they are seldom listened to—unless they confirm what the relevant stakeholders want. And, like in natural sciences, journalists will often find someone they can refer to as an expert in the field, but people are not able to assess what they say, unless their conclusions are stated in very simple terms, usually in a confirmatory tone. So probably our problem is more a lack of overal economics literacy—and not more experts.
On the other hand, I can say that things have actually been getting better in some sense in the last few years—I observe that basic literacy in economics have been improving in the media and among educated people from other fields. And yet, this has not led to better policy; part of it is that economic development is hard (as I have argued elsewhere), or that the causality here runs the other way around (i.e., people have been learning more about economics because we have dealt with recessions)… but then you also have the whole soldier vs. scout mindset all back again—even educated people will look for info to confirm their biases.
To be clear: I’m certainly not against training more economists from developing countries in US top graduate programs (quite the opposite: sometimes I think the best thing we can do around here is to send bright students away), but I don’t think that’s the bottleneck in economic development (in comparison to other hypothesis), nor that it’d more efficient than other policies concerning education in economics that are likely less expensive and have a broader scope—such as economic classes for youngsters, or funding more economists in these countries, or sending experts from top univerties to teach there, etc.
I think it’s too simplistic to say there’s a single bottleneck.
The latter two seem consistent with my proposal. Part of the problem is that there aren’t many economists in developing countries, hence the need to train more. And ASE does bring experts to teach at their campus.
FTX Future Fund says they support “ambitious projects to improve humanity’s long-term prospects”. Does it seem weird that they’re unanimously funding neartermist global health interventions like lead elimination?
Will MacAskill:
Why wouldn’t FTX just refer this to the Global Health and Development Fund?
My fanfiction (that is maybe “60% true” and so has somewhat more signal than noise) is:
The EA fund you mentioned is basically GiveWell.
GiveWell has a sort of institutional momentum, related to aesthetics about decisions and conditions for funding that make bigger granting costly or harder (alternatively, the deeper reason here is that Global Health and development has a different neglectedness and history of public intervention than any other EA cause area, increasing the bar, but elaborating too much will cause Khorton to hunt me down).
In a way that doesn’t make GiveWell’s choices or institutional role wrong, MacAskill saw that LEEP was great and there was an opportunity here to fund it with his involvement in FTX.
So why FTX?
There’s a cheap answer I can make here about “grantmaker diversity”, however I don’t fully believe this is true (or rather, I’m just clueness). For example, maybe there might be some value in GiveWell having a say in deciding whether to scale up EA global health orgs, like they did scale with Fortify Health. (Not sure about this paragraph, I am sort of wildly LARPing.)
More importantly, this doesn’t answer you point about the “longtermist” FTX funding a “neartermist intervention”.
So, then, why FTX?
This pulls on another thread (or rather one that you pulled on in your other comment).
A part of the answer is that the FTX “team” believes there is some conjunction between certain cause areas, such as highly cost effective health and development, and longtermist.
A big part of the answer is that this “conjunction” is sort of heavily influenced by the people involved (read: SBF and MacAskill). The issue with pulling on this thread, is that this conjunctiveness isn’t perfectly EA canon, it’s hard to formalize, and the decisions involved probably puts the senior EA figures involved into too much focus or authority (more anyone, including themselves, want).
I want to remind anyone reading this comment is that this is fanfiction that is only “60% true”.
I wrote the above comment because I feel like no one else will.
I feel that some of your comments are stilted and choose content in a way that has interpretations that is confrontational and overbearing, making it too difficult to answer. I view this as a form of bad rhetoric (sort of created by bad forum norms that has produced other pathologies) and doesn’t lend itself to truth or good discussion.
To be specific, when you say,
and
This is terse and omits a lot.
A short direct read of your comments, is that you are implying that “MacAskill has clearly delineated the cost effectiveness of all EA cause areas/interventions and has ranked certain x-risk as the only principled cost effective one” and “MacAskill is violating his arrangement of cost effective interventions”.
Instead of what you are suggesting in this ellipsis, it seems like a reasonable first pass perspective is given directly by the interview you quoted from. I think omitting this is unreasonable.
To be specific, MacAskill is saying in the interview:
So, without agreeing or disagreeing with him, MacAskill is saying there is real value to the EA community here with these interventions in several ways. (At the risk of being stilted myself here, maybe you could call this “flow-through effects”, good “PR”, or just “healthy for the EA soul”).
MacAskill can be right or wrong here, but this isn’t mentioned at all in this thread of yours that you raised.
(Yes, there’s some issues with MacAskill’s reasoning, but it’s less that he’s wrong, rather that it’s just a big awkward thread to pull on, as mentioned in my comment above.)
I want to emphasize, I personally don’t mind the aggressiveness, poking at things.
However, the terseness, combined with the lack of context, not addressing the heart of the matter, is what is overbearing.
The ellipsis here is malign, especially combined with the headspace needed to address all of the threads being pulled at (resulting in this giant two part comment).
For example, this allows the presenter pretend that they never made the implication, and then rake the respondent through their lengthy reply.
Many people, looking ahead at all of this, won’t answer the (implied) questions. As a result, the presenter can then stand with his pregnant, implied criticisms. This isn’t good for discourse.
My brief response: I think it’s bad form to move the discussion to the meta-level (ie. “your comments are too terse”) instead of directly discussing the object-level issues.
Can this really be your complete response to my direct, fulsome answer of your question, which you have asked several times?
For example, can you explain why my lengthy comment isn’t a direct object level response?
Even much of my second comment is pointing out you omitted that MacAskill expressly answering why he supported funding LEEP, which is another object level response.
To be clear, I accuse you of engaging in bad faith rhetoric in your above comment and your last response, with an evasion that I specifically anticipated (“this allows the presenter pretend that they never made the implication, and then rake the respondent through their lengthy reply”).
Here’s some previous comments of yours that are more direct, and do not use the same patterns you are now using, where your views and attitudes are more clear.
If you just kept it in this longtermism/neartermism online thing (and drafted on the sentiment from one of the factions there), that’s OK.
This seems bad because I suspect you are entering into unrelated, technical discussions, for example, in economics, using some of the same rhetorical patterns, which I view as pretty bad, especially as it’s sort of flying under the radar.
To be clear, you’re using the linguistic sense of ‘ellipsis’, and not the punctuation mark?
Yes, that is correct, I am using the linguistic sense, similar to “implication” or “suggestion”.
LEEP is lead by a very talented team of strong “neartermist” EAs.
In the real world and real EA, a lot of interest and granting can be dependent on team and execution (especially given the funding situation). Very good work and leaders are always valuable.
Casting everything into some longtermist/neartermist thing online seems unhealthy.
This particular comment seems poorly written (what does “unanimously” mean?) and seems to pull on some issue, but it just reads that everyone likes MacAskill, everyone likes LEEP and so decided to make a move.
Here’s another framing: if you claim that asteroid detection saves 300K lives per $100, pandemic prevention saves 200M lives per $100, and GiveWell interventions save 0.025 lives per $100, isn’t it a bit odd to fund the latter?
Or: longtermists claim that what matters most is the very long term effects of our actions. How is that being implemented here?
Longtermists make very strong claims (eg. “positively influencing the longterm future is *the* key moral priority of our time”). It seems healthy to follow up on those claims, and not sweep under the rug any seeming contradictions.
I chose that word to reflect Will’s statement that everyone at FTX was “totally on board”, in contrast to his expectations of an internal fight. Does that make sense?
The goal of this short-form post: to outline what I see as the key common ground between the “big tent” versus “small and weird” discussions that have been happening recently and to outline one candidate point of disagreement.
Tl;dr:
Common ground:
Everyone really values good thinking processes/epistemics/reasoning transparency and wants to make sure we maintain that aspect of the existing effective altruism community
Impact is fat-tailed
We might be getting a lot more attention soon because of our increased spending and because of the August release of “what we owe the future” (and the marketing push that is likely to accompany it’s release)[1]
A key point of disagreement: Does focusing on finding the people who produce the “tail” impact actually result in more impact?
One reason this wouldn’t be the case: “median” community building efforts and “tail” community building efforts are complements not substitutes. They are multipliers[2] of each other, rather than being additive and independent.
The additive hypothesis is simpler so I felt the multiplicative hypothesis needed some outlined mechanisms. Possible mechanisms:
Mechanism 1: the sorts of community building efforts that are more “median” friendly actually help the people who eventually create the “tail” impact become more interested in these ideas and more interested in taking bigger action with time
Mechanism 2: our biggest lever for impact in the future will not be the highly dedicated individuals but our influence on people on the periphery of the effective altruism (what I call “campground” effects)
Preamble (read: pre-ramble)
This is my summary of my vibe/impressions on some of the parts of the recent discussion that have stood out to me as particularly important. I am intending to finish my half a dozen drafts of a top-level post (with much more explanations for my random jargon that isn’t always even that common in effective altruism circles) at some point but I thought I’d start but sharing these rough thoughts to help get me over the “sharing things on the EA forum is scary” hump.
I might end up just sharing this post as a top-level post later once I’ve translated my random jargon a bit more and thought a bit more about the claims here I’m least sure of (possibly with a clearer outline of what cruxes make the “multiplicative effects” mechanisms more or less compelling)
Some common ground
These are some of my impressions of some claims that seem to be pretty common across the board (but that people sometimes talk as though they might suspect that the person they are talking to might not agree so I think it’s worth making them explicit somewhere).
The biggest one seems to be: We like the fact that effective altruism has good thinking processes/epistemics a lot! We don’t want to jeopardize our reasoning transparency and scout mindsets for the sake of going viral.
Impact is fat-tailed and this makes community-building challenging: there are a lot of uncomfortable trade-offs that might need to be made if we want to build the effective altruism community into a community that will be able to do as much good as possible.
We might be getting a lot more attention very soon whether we want to or not because we’re spending more (and spending in places that get a lot of media attention like political races) and because there will be a big marketing push for “What We Owe the Future” to, potentially, a very big audience. [3]
A point of disagreement
It seems like there are a few points of disagreement that I intended to go into, but this one got pretty long so I’ll just leave this as one point:
Does focusing on “tail” people actually result in more impact?
Are “tail” work and “median” work complements or substitutes? Are they additive (so specialization in the bit with all the impact makes sense) or multiplicative (so doing both well is a necessary condition to getting “tails”)?
I feel like the “additive/substitutes” hypothesis is more intuitive/a simpler assumption so I’ve outlined some explicit mechanisms for the “multiplicative/complements” hypothesis.
Mechanisms for the “multiplicative/complements” hypothesis
Mechanism 1
“Tail” people often require similar “soft” entry points to “non-tail” people and focusing on the “median” people on some dimensions actually is better at getting the “tails” people because we just model “tail” people wrong (e.g. someone could think it looks like some people were always going to be tails, but in reality, when we deep-dive into individual “tail” stories, there was, accidentally, “soft” entry points).
The dimensions where people advocate for lowering the bar are not epistemics/thinking processes, but things like
language barriers (e.g. reducing jargon, finding a plain English way to say something or doing your best to define the jargon when you use it if you think it’s so useful that it’s worth a definition),
making it easier for people to transition at their own pace from wherever they are to “extreme dedication” (and being very okay with some people stopping completely way before, and
reducing the social pressure to agree with the current set of conclusions by putting a lot more emphasis on a broader spectrum of plausible candidates that we might focus on if we’re trying to help others as much as possible (where “plausible candidates” are answers to the question of how we can help others the most with impartiality, considering all people alive today [4] or even larger moral circles/circle of compassion than that too, where an example of a larger group of individuals we might be wanting to help is all present and future sentient beings)
Mechanism 2
As we get more exposure, our biggest lever for impact might not the people that get really enthusiastic about effective altruism who go all the way to the last stop of the crazy train (what I might call our current tent), but the cultural memes we’re spreading to friends-of-friends-of-friends of people who have interacted with people in the effective altruism community or with the ideas and have strong views about them (positive or negative), which I have been calling “our campground” in all my essays to myself on this topic 🤣.
E.g. let’s say that the only thing that matters for humanity’s survival is who ends up in a very small number of very pivotal rooms,[5] it might be much easier to influence a lot of the people who are likely to be in those rooms a little bit to be thinking about some of the key considerations we hope they’d be considering (it’d be nice if we made it more likely that a lot of people might have the thought; “a lot might be at stake here, let’s take a breather before we do X”) than to get people who have dedicated their lives to reducing X-risk because effective altruism-style thinking and caring is a core part of who they are in those rooms.
As we get more exposure, it definitely seems true that “campground” effects are going to get bigger whether we like it or not.[6]
It is an open question (in my mind at least) whether we can leverage this to have a lot more impact or whether the best we can do is sit tight and try and keep the small core community on point.
^
As a little aside, I am so excited to get my hands on a copy (suddenly August doesn’t seem so soon)!
^
Additive and multiplicative models aren’t the only two plausible “approximations” of what might be going on, but they are a nice starting point. It doesn’t seem outside the range of possibility that there are big positive feedback loops between “core camp efforts” and “campground efforts” (and all the efforts in between). If this is plausibly true, then the “tails” for the impact of the effective altruism community as a whole could be here.
^
this point of common ground was edited in after first posting this comment
^
This is a pretty arbitrary cutoff of what counts as a large enough moral circle to count under the broader idea behind effective altruism and trying to do the most good, but I like being explicit about what we might mean because otherwise people get confused/it’s harder to identify what is a disagreement about the facts and what is just a lack of clarity in the questions we’re trying to ask.
I like this arbitrary cutoff a lot because
1) anyone who cares about every single person alive today already has a ginormous moral circle and I think that’s incredible: this seems to be very much wide enough to get at the vibe of the widely caring about others, and
2) the crazy train goes pretty far, it is not at all obvious to me where the “right” stopping point is, I’ve got off a few stops along (my shortcut to avoid dealing with some crazier questions down the line, like infinite ethics, is “just” considering those in my light cone where it be simulated or not😅, not because I actually think this is all that reasonable, but because more thought on what “the answer” is seems to get in the way of me thinking hard about doing the things I think are pretty good which I think, on expectation, actually does more for what I’d guess I’ll care about if I had all of time to think about it.
^
this example is total plagiarism, see: https://80000hours.org/podcast/episodes/sam-bankman-fried-high-risk-approach-to-crypto-and-doing-good/ (also has such a great discussion on multiplicative type effects being a big deal sometimes which I feel people in the effective altruism community think about less than we should: more specialization and more narrowing the focus isn’t always the best strategy on the margin for maximizing how good things are and will be on expectation, especially as we grow and have more variation in people’s comparative advantages within our community, and more specifically, within our set of community builders)
^
If our brand/reputation has lock-in for decades for a really long time, this could plausibly be a hinge of history moment for the effective altruism community. If there are ways of making our branding/reputation is as high fidelity as possible within the low-fidelity channels that messages travel virally, this could be a huge deal (ideally, once we have some goodwill from the broader “campground” we will have a bit of a long reflection to work out what we want our tent to look like 🤣😝).
Thanks Sophia. I think that you’ve quite articulately identified and laid out the difference between these two schools of thought/intuitions about community.
I’d like to see this developed further into a general forum post as I think it contributes to the conversation. FWIW my current take is that we’re more in a multiplicative world (for both the reasons you lay out) and that the lower cost solutions (like the ones you laid out) seem to be table stakes (and I’d even go further and say that if push came to shove I’d actively trade off towards focusing more on the median for these reasons).
Thanks Luke 🌞
Yeah, I definitely think there are some multiplicative effects.
Now I’m teasing out what I think in more detail, I’m starting to find the “median” and “tails” distinction, while useful, still maybe a bit too rough for me to decide whether we should do more or less of any particular strategy that is targeted at either group (which makes me hesitant to immediately put these thoughts as a top form post until I’ve teased out what my best guesses are on how we should maybe change our behaviour if we think we live in a “multiplicative” world).[1]
Here are some more of the considerations/claims (that I’m not all that confident in) that are swirling around in my head at the moment 😊.
tl;dr:
High fidelity communication is really challenging (and doubly so in broad outreach efforts).
However, broad outreach might thicken the positive tail of the effective altruism movement’s impact distribution and thin the negative one even if the median outcome might result in a “diluted” effective altruism community.
Since we are trying to maximize the effective altruism community’s expected impact, and all the impact is likely to be at the tails, we actually probably shouldn’t care all that much about the median outcome anyway.
High fidelity communication about effective altruism is challenging (and even more difficult when we do broader outreach/try to be welcoming to a wider range of people)
I do think it is a huge challenge to preserve the effective altruism community’s dedication to:
caring about, at least, everyone alive today; and
transparent reasoning, a scout mindset and more generally putting a tonne of effort into finding out what is true even if it is really inconvenient.
I do think really narrow targeting might be one of the best tools we have to maintain those things.
Some reasons why we might we want to de-emphasize filtering in existing local groups:
First reason
One reason why focusing on this logic can sometimes be counter-productive because some filtering seems to just miss the mark (see my comment here for an example of how some filtering could plausibly be systematically selecting against traits we value).
Introducing a second reason (more fully-fleshed out in the remainder of this comment)
However, the main reason I think that trying to leverage our media attention, trying to do broad outreach well and trying to be really welcoming at all our shop-fronts might be important to prioritise (even if it might sometimes mean community builders will have to sometimes spend less time spent focusing on the people who seem most promising) is not because of the median outcome from this strategy.
Trying to nail campground effects is really, really, really hard while simultaneously trying to keep effective altruism about effective altruism. However, we’re not trying to optimize for the median outcome for the effective altruism community, we’re trying to maximize the effective altruism community’s expected impact. This is why, despite the fact that “dilution” effects seem like a huge risk, we probably should just aim for the positive tail scenario because that is where our biggest positive impact might be anyway (and also aim to minimize the risks of negative tail scenarios because that also is going to be a big factor in our overall expected impact).
“Median” outreach work might be important to increase our chances of a positive “tail” impact of the effective altruism community as a whole
It is okay for in most worlds for the effective altruism community to have very little impact in the end.
We’re not actually trying to guarantee some level of impact in every possible world. We’re trying to maximize the effective altruism movement’s expected impact.
We’re not aiming for a “median” effective altruism community, we’re trying to maximize our expected impact (so it’s okay if we risk having no impact if that is what we need to do to make positive tails possible or reduce the risk of extreme negative tail outcomes of our work)
Increasing the chances of the positive tails of the effective altruism movement
I think the positive tail impacts are in the worlds where we’ve mastered the synergies between “tent” strategies and “campground” strategies. If we can find ways of keeping the “tent” on point and still make use of our power to spread ideas to a very large number of people (even if the ideas we spread to a much larger audience are obviously going to be lower fidelity, we can still put a tonne of effort into which lower fidelity ideas are the best ones to spread to make the long-term future go really well).
Avoiding the negative tail impacts of the effective altruism movement
This second thought makes me very sad, but I think it is worth saying. I’m not confident in any of this because I don’t like thinking about it so much because it is not fun. Therefore, these thoughts are probably a lot less developed than my “happier”, more optimistic thoughts about the effective altruism community.
I have a strong intuition that more campground strategies reduce the risk of negative tail impacts of the effective altruism movement (though I wish I didn’t have this intuition and I hope someone is able to convince me that this gut feeling is unfounded because I love the effective altruism movement).
Even if campground strategies make it more likely that the effective altruism movement has no impact, it seems completely plausible to me that that might still be a good thing.
A small and weird “cabal” effective altruism, with a lot of power and a lot of money, makes people feel uncomfortable for good reason. There are selection effects, but history is lined with small groups of powerful people who genuinely believed they were making the world a better place and seem, in retrospect, to have done a lot more harm than good.
More people understanding what we’re saying and why makes it more likely that smart people outside our echo chamber can pushback when we’re wrong. It’s a nice safety harness to prevent very bad outcomes.
It is also plausible to me that a tent effective altruism movement might be more likely to achieve their 95th percentile plus positive impact as well as the 5th percentile and below very negative impact.
Effective altruism feels like a rocket right now and rockets aren’t very stable. It intuitively feels easy to have a very big impact, when you do big, ambitious things in an unstable way, and not be able to easily control the sign of that big impact: there is a chance it is very positive or very negative.
I find it plausible that, if you’re going to have a huge impact on the world, having a big negative impact is easier than having a big positive impact by a wide margin (doing good is just darn hard and there are no slam dunk answers[2]).[3] Even though we’re thinking hard about how to make it good, I think it might just be really easy to make it bad (e.g. by bringing attention to the alignment problem, we might be increasing excitement and interest in the plausibility of AGI and therefore are going to get to AGI faster than if no-one talked about alignment).
^
I might post a high level post before I have finished teasing out my best guesses on what the implications might be because I find my views change so fast that it is really hard to ever finish writing down what I think and it is possibly still better for me to share some of my thoughts more publicly than to share none of them. I often feel like I’m bouncing around like a yo-yo and I’m hoping at some point my thoughts will settle down somewhere on an “equilibrium” view instead of continuously thinking up considerations that cause me to completely flip my opinion (and leave me saying inconsistent things left, right and center because I just don’t know what I think quite yet 😝🤣😅). I have made a commitment bet with a friend to post something as a top-level post within two weeks so I will have to either give a snapshot view then or settle on a view or lose $$ (the only reason I got a finished the top level short-form comment that started this discussion was because of a different bet with a different friend 🙃😶🤷🏼♀️). At the very least, I hope that I can come with a more wholesome (but still absolutely true) framing of a lot of the considerations I’ve outlined in the remainder of this post as I think it over more.
^
I think it was Ben Garfinkel said the “no slam dunk” answers thing in his post on suspicious convergence when it comes to arguments about AI risk but I’m too lazy to chase it up to link it(edit: I did go try and chase up the link to this, I think my memory had maybe merged/mixed together this post by Gregory Lewis on suspicious convergence and this transcript from a talk by Ben Garfinkel, I’m leaving both links in this footnote because Gregory Lewis’ post is so good that I’ll use any excuse to leave a link to it wherever I can even though it wasn’t actually relevant to the “no slam dunk answers” quote)^
maybe I just took my reading of HPMOR too literally :P
I agree that we need scope for people to gradually increase their commitment over time. Actually, that’s how my journey has kind of worked out.
On the other hand, I suspect that tail people can build a bigger and more impactful campfire. For example, one Matt Yglesias occasionally posting positive things about EA or EA adjacent ideas increases our campfire by a lot and these people are more likely to be the ones who can influence things.
Yeah, but what people experience when they hear about EA via someone like Matt will determine their further actions/beliefs about EA. If they show up and unnecessarily feel unwelcome or misunderstand EA then we’ve not just missed and opportunity then and there but potentially soured them for the long term (and what they say to others will spur other before we get a chance to reach them).
Hey Chris 😊, yeah, I think changing your mind and life in big ways overnight is a very big ask (and it’s nice to feel like you’re welcome to think about what might be true before you decide whether to commit to doing anything about it—it helps a lot with the cognitive dissonance we all feel when our actions, the values we claim to hold ourselves to and what we believe about the world are at odds[1]).
I also completely agree with some targeting being very valuable. I think we should target exceptionally caring people who have exceptional track-records of being able to accomplish the stuff they set out to accomplish/the stuff they believe is valuable/worthwhile. I also think that if you spend a tonne of time with someone who clearly isn’t getting it even though they have an impressive track record in some domain, then it makes complete sense to use your marginal community building time elsewhere.
However, my guess is that sometimes we can filter too hard, too early for us to get the tail-end of the effective altruism community’s impact.
It is easy for a person to form an accurate impression of another person who is similar to them. It is much harder for a person to quickly form an accurate impression of another person who is really different (but because of diminishing returns, it seems way more valuable on the margin to get people who are exceptional in a different way to the way that the existing community tends to be exceptional than another person who thinks the same way and has the same skills).
^
and we want people to make it easier for people to align these three things in a direction that leads to more caring about others and more seeing the world the way it is (we don’t want to push people away from identifying as someone who cares about others or from shying away from thinking about how the world). If we push too hard on all three things at once, I think it is much easier for people to align these three things by either deciding they actually don’t value what they thought they value, they actually don’t really care about others, or they might find it incredibly hard to see the world exactly as it is (because otherwise their values and their actions will have this huge gap)
EDIT: Witness this train-wreck of me figuring out what I maybe think in real time half-coherently below as I go :P[1]
yeah, I guess an intuition that I have is there are some decisions where we can gain a lot of ground by focusing are efforts in places where it is more likely we come across people who are able to create tail impacts over their lifetimes (e.g. by prioritising creating effective altruism groups in places with lots of people who have a pre-existing track record of being able to achieve the things they set out to achieve). However, I feel like there are some places where more marginal effort on targeting the people who could become tails has sharp diminishing returns and comes with some costs that might not actually be worth it. For example, once you have set up a group in a place where people who have track records of achieving things they set their minds to to a really exceptional degree, trying to figure out how “tail potential” someone is from there often can make people who might have been tail potential if they had been guided in a helpful way completely put off from engaging with us at all.
^
This entire thread is not actually recommended reading but keeping it here because I haven’t yet decided whether I endorse it or not and I don’t see it as that much dis-utility in leaving it here in the meantime while I think about this more.
I’m also not sure, once we’re already targeting people who have track records of doing the things they’ve put their minds to (which obviously won’t be a perfect proxy for tail potential but it often seems better than no prioritisation of where the marginal group should go), I’m not sure how good we are at assessing someone’s “tail potential”, especially because there are going to be big marginal returns to finding people who have a different comparative advantage to the existing community (if it is possible to communicate the key ideas/thinking with high fidelity) who will have more of an inferential gap to cross before communication is efficient enough for us to be able to tell how smart they are/how much potential they have.
This impression comes from knowing people where I speak their language (metaphorically) and I also speak EA (so I can absorb a lot of EA content and translate it in a way they can understand) who are pretty great at reasoning transparency and updating in conversations with people whom they’ve got pre-established trust (which means when miscommunications inevitably happen, the base assumption is still that I’m arguing in good faith). They can’t really demonstrate that reasoning transparency if the person they are talking to doesn’t understand their use of language/their worldview well enough to see that it is actually pretty precise and clear and transparent once you understand what they mean by the words they use.
(I mainly have this experience with people who maybe didn’t study maths or economics or something that STEM-y who I have other “languages” that mean I can still cross inferential gaps reasonably efficiently with them)
This is a proof of existence of these kinds of people. It doesn’t really tell us all that much about what proportion of people without the backgrounds that make the EA language barrier a lot smaller (like philosophy, econ and STEM) are actually good at the thinking processes we value very highly that are taught a lot in STEM subjects.
I could have had this experience with people who I know and this still not mean that this “treating people with a huge amount of charity for the reason that some people might have the potential to have a tail impact even if we’d not guess it when we first meet them” is actually worth it overall. I’ve got a biased sample but I don’t think it’s irrational that this informs my inside view even if I am aware that my sample is likely to be heavily biased (I am only going to have built a common language with people/built trust with people if there is something that fuels our friendships—the people who I want to be friends with are not random! They are people who make me feel understood or say things that I find thought-provoking or a number of other factors that kind of makes them naturally a very cherry-picked pool of people).
Basically, my current best guess is that being really open-minded and patient with people once your group is at a place where pretty much everyone has demonstrated they are a tail person in one way or another (whether that’s because of their personal traits or because of their fortunate circumstances) will get us more people who have the potential to have a positive tail-end impact engaging with us enough for that potential to have a great shot of being realised.
EDIT: I copied and pasted this comment as a direct reply to Chris and then edited it to make it make more sense than it did the first time I wrote it and also to make it way nicer than my off-the-cuff/figuring-out-what-thought-as-I-went stream-of-consciousness but I left this here anyway partly for context for the later comments and also because I think it’s kind of fun to have a record (even if just for me) of how my thoughts develop as I write/tease out what sounds plausibly true once I’ve written it and what doesn’t quite seem to hit the mark of what intuition I’m attempting to articulate (or what intuition that, once I find a way to articulate it, ends up seeming obviously false once I’ve written it up).
I am not arguing that we should not target exceptional people, I think exceptionally smart and caring people are way better to spend a lot of one-on-one time with than people who care an average amount about helping others and for whom there is a lot of evidence that they haven’t yet got a track record of being able to accomplish things they set their minds to.
My guess is that sometimes we can filter too hard, too early for us to get the tail-end of the effective altruism community’s impact.
It is easy for a person to form an accurate impression of another person who is similar to them. It is much harder for a person to quickly form an accurate impression of another person who is really different (but because of diminishing returns, it seems way more valuable on the margin to get people who are exceptional in a different way to the way that the existing community tends to be exceptional than another person who thinks the same way and has the same skills).
(I am not confident I will reflectively endorse much of the above in 24 hours from now, I’m just sharing my off-the-cusp vibes which might solidify into more or less confidence when I let these thoughts sit for a bit more time)
If my confidence in any of these claims substantially increases or decreases in the next few days I might come back and clarify that (but if doing this becomes a bit of an ugh field, I’m not going to prioritise de-ughing it because there are other ugh-fields that are higher on my list to prioritise de-ughing 😝)
I think there’s a lot of value in people reaching out to people they know (this seems undervalued in EA, then again maybe it’s intentional as evangelism can turn people off). This doesn’t seem to trade-off too substantially against more formal movement-building methods which should probably filter more on which groups are going to be most impactful.
In terms of expanding the range of people and skills in EA, that seems to be happening over time (take for example the EA blog prize: https://effectiveideas.org/ ). Or the increased focus on PA’s (https://pineappleoperations.org/). I have no doubt that there are still many useful skills that we’re missing, but there’s a decent chance that funding would be available if there was a decent team to work on the project.
Makes sense
Yeah, this is happening! I also think it helps a lot that Sam BF has a really broad spectrum of ideas take of longtermism, which is really cool!
Oh, here’s another excellent example, the EA Writing Retreat.
😍
I suspect that some ways we filter at events of existing groups are good and we should keep doing them.
I also suspect some strategies/tendencies we have when we filter at the group level are counter-productive to finding and keeping high-potential people.
For example, filtering too fast based on how quickly someone seems to “get” longtermism might filter in the people who are more willing to defer and so seem like they get it more than they do.
It might filter out the people who are really trying to think it through, who seem more resistant to the ideas or who are more willing to voice their half-formed thoughts that haven’t developed yet into something that deep (because thinking through all the different considerations to form an inside view takes a lot of time and voicing a lot of “dead-end” thoughts). Those higher value people might systematically be classed as “less tractable” or “less smart” when, in fact, it is sometimes[1] that we have just forgotten that people who are really thinking about these ideas seriously, who are smart enough to possibly be a person who could have a tail end impact, are going to say things that don’t sound smart as they navigate what they think. The further someone is from our echo chamber, the stronger I expect this effect to be.
Obviously I don’t know how most groups filter at the group-level, this is so dependent on the particular community organizers (and then also there are maybe some cultural commonalities across the movement which is why I find it tempting to make broad-sweeping generalisations that might not hold in many places).
^
but obviously not always (and I don’t actually have a clear idea of how big a deal this issue is, I’m just trying to untangle my various intuitions so I can more easily scrutinize if there is a grain of truth in any of them on closer inspection)
Hmm… Some really interesting thoughts. I generally try to determine whether people are actually making considered counter-arguments vs. repeating cliches, but I take your point about a willingness to voice half-formed thoughts can cause others to assume you’re stupid.
I guess in terms of outreach it makes sense to cultivate a sense of practical wisdom so that you can determine when to patiently continue a conversation or when to politely and strategically withdraw so as to save energy and avoid wasting time. This won’t be perfect and it’s subject to biases as you mentioned, but it’s really the best option available.
Hmm, I’m not sure I agree with the claim “it’s really the best option available” even if I don’t already have a better solution pre-thought up. Or at the very least, I think that how to foster this culture might be worth a lot of strategic thought.
Even if there is a decent chance we end up concluding there isn’t all that much we can do, I think the payoff to finding a good way to manage this might be big enough to make up for all the possible worlds where this work ends up being a dead-end.
Well, if you think of anything, let me know.
👍🏼
In this short-form post I hope to explain personal reservations I have regarding pursuing careers in existential risk reduction, specifically how I struggle to recognize personal positive impact and how this might impact my motivation. The main reason I am posting this is to either be referred to methods to solve my problem or to hear from personal accounts as to how people have maybe dealt with this.
The importance of preventing existential risk is very clear to me, and that one can provide a lot of positive impact with such careers is too. There are ways of analyzing and ranking said risks I am aware of, and ultimately (to put it shortly) the catastrophe that poses the risk will or will not happen. Now, however naïve this might sound, I struggle to see how you can even come close to analyzing one’s personal impact in the lowering of such a risk (especially for the ones considered big, with thus quite some resources and people working on them).
To me, EA in one sentence is: maximizing one’s positive impacts, and working on existential risk reduction seems to me to have a positive impact, but the magnitude will always remain undefined (on a personal level). Especially when compared to earning to give, where the impact one provides with donations is always quoted, an average of course, but nonetheless this provides a good idea of the positive impact created. As such I do not know if I can keep pursuing maximization of my positive impact and remain motivated about it if I will never have a clear idea of how much positive impact that really is.
I hope I have been able to properly express myself, and hope someone has a solution to my problem (i.e. have a better idea of one’s positive impact) or how to deal with that uncertainty and remain motivated.
There’s been a few posts recently about how there should be more EA failures, since we’re trying a bunch of high-risk, high-reward projects, and some of them should fail or we’re not being ambitious enough.
I think this is a misunderstanding of what high-EV bets look like. Most projects do not either produce wild success or abject failure, there’s usually a continuity of outcomes in between, and that’s what you hit. This doesn’t look like “failure”, it looks like moderate success.
For example, consider the MineRL BASALT competition that I organized. The low-probability, high-value outcome would have had hundreds or thousands of entries to the competition, several papers produced as a result, and the establishment of BASALT as a standard benchmark and competition in the field.
What actually happened was that we got ~11 submissions, of which maybe ~5 were serious, made decent progress on the problem, produced a couple of vaguely interesting papers, some people in the field have heard about the benchmark and occasionally use it, and we built enough excitement in the team that the competition will (very likely) run again this year.
Is this failure? It certainly isn’t what normally comes to mind from the normal meaning of “failure”. But it was:
Below my median expectation for what the competition would accomplish
Not something I would have put time into if someone had told me in advance exactly what it would accomplish so far, and the time cost needed to get it.
One hopes that roughly 50% of the things I do meet the first criterion, and probably 90% of the things I’d do would meet the second. But also maybe 90% of the work I do is something people would say was “successful” even ex post.
If you are actually seeing failures for relatively large projects that look like “failures” in the normal English sense of the word, where basically nothing was accomplished at all, I’d be a lot more worried that actually your project was not in fact high-EV even ex ante, and you should be updating a lot more on your failure, and it is a good sign that we don’t see that many EA “failures” in this sense.
(One exception to this is earning-to-give entrepreneurship, where “we had to shut the company down and made ~no money after a year of effort” seems reasonably likely and it still would plausibly be high-EV ex ante.)
https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/openai-general-support
To me at this point the expected impact of the EA phenomena as a whole is negative. Hope we can right this ship, but things really seem off the rails.
Eliezer said something similar, and he seems similarly upset about it: https://twitter.com/ESYudkowsky/status/1446562238848847877
(FWIW I am also upset about it, I just don’t know that I have anything constructive to say)
Eliezer’s tweet is about the founding of OpenAI, whereas Agrippa’s comment is about a 2017 grant to OpenAI (OpenAI was founded in 2015, so this was not a founding grant). It seems like to argue that Open Phil’s grant was net negative (and so strongly net negative as to swamp other EA movement efforts), one would have to compare OpenAI’s work in a counterfactual world where it never got the extra $30 million in 2017 (and Holden never joined the board) with the actual world in which those things happened. That seems a lot harder to argue for than what Eliezer is claiming (Eliezer only has to compare a world where OpenAI didn’t exist vs the actual world where it does exist).
Personally, I agree with Eliezer that the founding of OpenAI was a terrible idea, but I am pretty uncertain about whether Open Phil’s grant was a good or bad idea. Given that OpenAI had already disrupted the “nascent spirit of cooperation” that Eliezer mentions and was going to do things, it seems plausible that buying a board seat for someone with quite a bit of understanding of AI risk is a good idea (though I can also see many reasons it could be a bad idea).
One can also argue that EA memes re AI risk led to the creation of OpenAI, and that therefore EA is net negative (see here for details). But if this is the argument Agrippa wants to make, then I am confused why they decided to link to the 2017 grant.
This post includes some great follow up questions for the future. Has anything been posted re: these follow up questions?
What is the definition of longtermism, if it now includes traditional global health interventions like reducing lead exposure?
Will MacAskill says (bold added):
As far as I can tell liberal nonviolence is a very popular norm in EA. At the same time I really cannot thing of anything more mortally violent I could do than to build a doomsday machine. Even if my doomsday machine is actually a 10%-chance-of-doomsday machine or 1% or etcetera (nobody even thinks it’s lower than that). How come this norm isn’t kicking in? How close to completion does the 10%-chance-of-doomsday machine have to be before gentle kindness is not the prescribed reaction?
My favorite thing about EA has always been the norm that in order to get cred for being altruistic, you actually are supposed to have helped people. This is a great property, just align incentives. But now re: OpenAI I so often hear people say that gentle kindness is the only way, if you are openly adversarial then they will just do the opposite of what you want even more. So much for aligning incentives.
Stem cell slowdown and AI timelines
My knowledge of christians and stem cell research in the US is very limited, but my understanding is that they accomplished real slowdown.
Has anyone looked to that movement for lessons about AI?
Did anybody from that movement take a “change it from the inside” or “build clout by boosting stem cell capabilities so you can later spend that clout on stem cell alignment” approach?
Carrick Flynn lost the nomination, and over $10 million dollars from EA aligned individuals went to support his nomination.
So these questions may sound pointed:
There was surely a lot of expected value in having an EA aligned thinker in congress supporting pandemic preparedness, but there were a lot of bottlenecks that he would have had to go through to make a change.
He would have been one of hundreds of congresspeople. He would have had to get bills passed. He would have had to win enough votes to make it past the primary. He would have had to have his policies get churned through the bureaucratic agencies and it’s not entirely clear any bill he would’ve supported would have kept it’s same form through that process.
What can we learn from the political gambling that was done in this situation? Should we try this again? What are the long term side effects of aligning EA with any political side or making EA a political topic?
Could that $10+ million wasted on Flynn have been better used in just trying to get EA or longtermist bureaucrats in the CDC or other important decision making institutions?
We know the path that individuals take to get these positions, we know what people usually get selected to run pandemic preparedness for the government, why not spend $10 million in gaining the attention of bureaucrats or placing bureaucrats in federal agencies?
Should we consider political gambling in the name of EA a type of intervention that is meant for us to get warm fuzzies rather than do the most good?
I think seeing the attacks that he’s captured by crypto interests was useful, in that future EA political forays will know that attack is coming and be able to fend it off better. Worth $11 mil in itself, probably not, but the expected value was already pretty high (a decent probability of having someone in congress who can champion bills no one disagrees with but doesn’t want to spend time and effort on) so this information gained is helpful and might make either future campaigns more successful or alternatively dissuade future spending in this area. Definitely good to try once, we’ll see how it plays out in the long run. We didn’t know he’d lose until he lost!
https://www.nytimes.com/2022/05/14/opinion/sunday/rich-happiness-big-data.html
This article from Seth Stephens-Davidowitz describes a paper (here) that examines who are the people in the top 0.1% of earners in the US, making at least $1.58 million per year. It was interesting to me in that many of those people were not high-status jobs, but rather owning unsexy businesses such as a car dealership or a beverage distribution operation. Obviously, this has implications for how we structure society, but it could also be a good thing to keep in mind for those interested in earning to give- owning a plumbing company might be a better route for some than trying to make it big on wall street.
An interesting thought, but I think this overlooks the fact that wealth is heavy tailed. So it is (probably) higher EV to have someone with a 10% shot at their tech startup getting huge than one person with a 100% chance of running a succesful plumbing company.
“Write a Philosophical Argument That Convinces Research Participants to Donate to Charity”
Has this every been followed up on? Is their data public?
I recently experienced a jarring update on my beliefs about Transformative AI. Basically, I thought we had more time (decades) than I now believe we will (years) before TAI causes an existential catastrophe. This has had an interesting effect on my sensibilities about cause prioritization. While I applaud wealthy donors directing funds to AI-related Existential Risk mitigation, I don’t assign high probability to the success of any of their funded projects. Moreover, it appears to me that there is essentially no room for additional funds in kinds of denominations coming from non-wealthy donors (e.g. me).
I used to value traditional public health goals quite highly (e.g. I would direct donations to AMF). However, given that most of the returns on bed net distribution lie in a future beyond my current beliefs about TAI, this now seems to me like a bad moral investment. Instead, I’m much more interested in projects which can rapidly improve hedonic well-being (i.e. cause the greatest possible welfare boost in the near-term). In other words, the probability of an existential AI catastrophe has caused me to develop neartermist sympathies. I can’t find much about other EAs considering this, and I have only begun thinking about it, but as a first pass GiveDirectly appears to serve this neartermist hedonic goal somewhat more directly.
Consider s-risk:
From your comment, I understand that you believe the funding situation is strong and not limiting for TAI, and also that the likely outcomes of current interventions is not promising.
(Not necessarily personally agreeing with the above) given your view, I think one area that could still interest you is “s-risk”. This also relevant for your interests in alleviating massive suffering.
I think talking with CLR, or people such as Chi there might be valuable (they might be happy to speak if you are a personal donor).
Leadership development seems good in longtermism or TAI
(Admittedly it’s an overloaded, imprecise statement but) the common wisdom that AI and longtermism is talent constrained seems true. The ability to develop new leaders or work is valuable and can give returns, even accounting for your beliefs being correct.
Prosaic animal welfare
Finally, you and other onlookers should be aware that animal welfare, especially the relatively tractable and “prosaic suffering” of farm animals, is one of the areas that has not received a large increase in EA funding.
Some information below should be interesting to cause neutral EAs. Note that based on private information:
The current accomplishments in farm animal welfare are real and the current work is good. But there is very large opportunity to help (many times more animals are suffering than have been directly helped so far).
The amount of extreme suffering that is being experienced by farm animals is probably worse, much worse than is commonly believed (this is directly being addressed through EA animal welfare and also motivates welfarist work). This level of suffering is being occluded because it does not help, for example, it would degrade the mental health of proponents to an unacceptable level. However, the suffering levels are illogical to disregard when considering neartermist cause prioritization.
This animal welfare work would benefit from money and expertise.
Notably, this is an area where EA has been able to claim significant tangible success (for the fraction that has been able to help).
If there’s at least a 1% chance that we don’t experience catastrophe soon, and we can have reasonable expected influence over no-catastrophe-soon futures, and there’s a reasonable chance that such futures have astronomical importance, then patient philanthropy is quite good in expectation. Given my empirical beliefs, it’s much better then GiveDirectly. And that’s just a lower bound; e.g., investing in movement-building might well be even better.
Question for anyone who has interest/means/time to look into it: which topics on the EA forum are overrepresented/underrepresented? I would be interested in comparisons of (posts/views/karma/comments) per (person/dollar/survey interest) in various cause areas. Mostly interested in the situation now, but viewing changes over time would be great!
My hypothesis [DO NOT VIEW IF YOU INTEND TO INVESTIGATE]:
I expect longtermism to be WILDLY, like 20x, overrepresented. If this is the case I think it may be responsible for a lot of the recent angst about the relationship between longtermism and EA more broadly, and would point to some concrete actions to take.
There was a post on this recently.
Even a brief glance through posts indicates that there is relatively little discussion about global health issues like malaria nets, vitamin A deficiency, and parasitic worms, even though those are among the top EA priorities.
(Disclaimer: The argument I make in this short-form feels I little sophistic to me. I’m not sure I endorse it.)
Discussions of AI risk, particular risks from “inner misalignment,” sometimes heavily emphasize the following observation:
This observation is normally meant to be alarming. And I do see some intuition for that.
But wouldn’t the alternative observation be more alarming?
Suppose that evolutionary selection processes — which iteratively update people’s genes, based on the behaviour these genes produce — tended to produce people who only care about preserving their genes. It seems like that observation would suggest that ML training processes — which iterative update a network’s parameter values, based on the behaviour these parameter values produce — will tend to produce AI systems that only care about preserving their parameter values. And that would be really concerning, since an AI system that cares only about preserving its parameter values would obviously have (instrumentally convergent) reasons to act badly.
So it does seem, to me, like there’s something funny going on here. If “Humans just care about their genes” would be a more worrying observation than “Humans don’t just care about their genes,” then it seems backward for the latter observation to be used to try to convince people to worry more.
To push this line of thought further, let’s go back to specific observation about humans’ relationship to setting themselves on fire:
It seems like this can be interpreted as a reassuring observation. By analogy, in future ML training processes, parameter values that cause ML systems to avoid acts of violence are more likely to be “preserved” from one iteration to the next. We want this to result in AI systems that care about avoiding acts of violence. And the case of humans and fire suggests this might naturally happen.
All this being said, I do think that human evolutionary history still gives us reason to worry. Clearly, there’s a lot of apparent randomness and unpredictability in what humans have actually ended up caring about, which suggests it may be hard to predict or perfectly determine what AI systems care about. But, I think, the specific observation “Humans don’t just care about their genes” might not itself be cause for concern.
The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there’s no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not “you must optimize X goal”, the constraints are “in Y situations you must behave in Z ways”, which doesn’t constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what’s “simpler”).
For example, you train your system to answer truthfully in situations where we know the answer. This could get you an AI system that is truthful… or an AI system that answers truthfully when we know the answer, but lies to us when we don’t know the answer in service of making paperclips. (ELK tries to deal with this setting.)
When I apply this point of view to the evolution analogy it dissolves the question / paradox you’ve listed above. Given the actual ancestral environment and the selection pressures present there, organisms that maximized “reproductive fitness” or “tiling the universe with their DNA” or “maximizing sex between non-sterile, non-pregnant opposite-sex pairs” would all have done well there (I’m sure this is somehow somewhat wrong but clearly in principle there’s a version that’s right), so who knows which of those things you get. In practice you don’t even get organisms that are maximizing anything, because they aren’t particularly goal-directed, and instead are adaption-executers rather than fitness-maximizers.
I do think that once you inhabit this way of thinking about it, the evolution example doesn’t really matter any more; the argument itself very loudly says “you don’t know what you’re going to get out; there are tons of possibilities that are not what you wanted”, which is the alarming part. I suppose in theory someone could think that the “simplest” one is going to be whatever we wanted in the first place, and so we’re okay, and the evolution analogy is a good counterexample to that view?
It turns out that people really really like thinking of training schemes as “optimizing for a goal”. I think this is basically wrong—is CoinRun training optimizing for “get the coin” or “get to the end of the level”? What would be the difference? Selection pressures seem much better as a picture of what’s going on.
But when you communicate with people it helps to show how your beliefs connect into their existing way of thinking about things. So instead of talking about how selection pressures from training algorithms and how they do not uniquely constrain the system you get out, we instead talk about how the “behavioral objective” might be different from the “training objective”, and use the evolution analogy as an example that fits neatly into this schema given the way people are already thinking about these things.
(To be clear a lot of AI safety people, probably a majority, do in fact think about this from an “objective-first” way of thinking, rather than based on selection, this isn’t just about AI safety people communicating with other people.)
I think that’s well-put—and I generally agree that this suggests genuine reason for concern.
I suppose my point is more narrow, really just questioning whether the observation “humans care about things besides their genes” gives us any additional reason for concern. Some presentations seem to suggest it does. For example, this introduction to inner alignment concerns (based on the MIRI mesa-optimization paper) says:
And I want to say: “On net, if humans did only care about maximizing inclusive genetic fitness, that would probably be a reason to become more concerned (rather than less concerned) that ML systems will generalize in dangerous ways.” While the abstract argument makes sense, I think this specific observation isn’t evidence of risk.
Relatedly, something I’d be interested in reading (if it doesn’t already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals—rather than stopping at the fact that humans care about things besides genetic fitness.
My guess is that the case of humans is overall a little reassuring (relative to how we might have expected generalization to work), while still leaving a lot of room for worry.
For example, in the case of violence:
People who committed totally random acts of violence presumably often failed to pass on their genes (because they were often killed or ostracized in return). However, a large portion of our ancestors did have occasion for violence. On high-end estimates, our average ancestor may have killed about .25 people. This has resulted in most people having a pretty strong disinclination to commit murder; for most people, it’s very hard to bring yourself to murder and you’ll often be willing to pay a big cost to avoid committing murder.
The three main reasons for concern, though, are:
people’s desire to avoid murder isn’t strong enough to consistently prevent murder from happening (e.g. when incentives are strong enough)
there’s a decent amount of random variation in how strong this desire is (a small minority of people don’t really care that much about committing violence)
the disinclination to murder becomes weaker the more different the method of murder is from methods that were available in the ancestral environment (e.g. killing someone with a drone strike vs. killing someone with a rock)
These issues might just reflect the fact that murder was still often rewarded (even though it was typically punished) and the fact that there was pretty limited variation in the ancestral environment. But it’s hard to be sure. And it’s hard to know, in any case, how similar generalization in human evolution will be to generalization in ML training processes.
So—if we want to create AI systems that don’t murder people, by rewarding non-murderous behavior—then the evidence from human evolution seems like it might be medium-reassuring. I’d maybe give it a B-.
I can definitely imagine different versions of human values that would have more worrying implications. For example, if our aversion to violence didn’t generalize at all to modern methods of killing, or if we simply didn’t have any intrinsic aversion to killing (and instead avoided it for purely instrumental reasons), then that would be cause for greater concern. I can also imagine different versions of human values that would be more reassuring. For example, I would feel more comfortable if humans were never willing to kill for the sake of weird abstract goals.
I mostly go ¯\_(ツ)_/¯ , it doesn’t feel like it’s much evidence of anything, after you’ve updated off the abstract argument. The actual situation we face will be so different (primarily, we’re actually trying to deal with the alignment problem, unlike evolution).
I do agree that in saying ” ¯\_(ツ)_/¯ ” I am disagreeing with a bunch of claims that say “evolution example implies misalignment is probable”. I am unclear to what extent people actually believe such a claim vs. use it as a communication strategy. (The author of the linked post states some uncertainty but presumably does believe something similar to that; I disagree with them if so.)
I like the general idea but the way I’d do it is by doing some black-box investigation of current language models and asking these questions there; I expect we understand the “ancestral environment” of a language model way, way better than we understand the ancestral environment for humans, making it a lot easier to draw conclusions; you could also finetune the language models in order to simulate an “ancestral environment” of your choice and see what happens then.
I agree with the murder example being a tiny bit reassuring for training non-murderous AIs; medium-reassuring is probably too much, unless we’re expecting our AI systems to be put into the same sorts of situations / ancestral environments as humans were in. (Note that to be the “same sort of situation” it also needs to have the same sort of inputs as humans, e.g. vision + sound + some sort of controllable physical body seems important.)
I think some of us really need to create op-eds, videos, etc. for a mainstream audience defending longtermism. The Phil Torres pieces have spread a lot (people outside the EA community have shared them in a Discord server I moderate, and Timnit Gebru has picked them up) and thus far I haven’t seen an adequate response.
Infinite Ethics 101: Stochastic and Statewise Dominance as a Backup Decision Theory when Expected Values Fail
First posted on nunosempere.com/blog/2022/05/20/infinite-ethics-101 , and written after one too many times encountering someone who didn’t know what to do when encountering infinite expected values.
In Exceeding expectations: stochastic dominance as a general decision theory, Christian Tarsney presents stochastic dominance (to be defined) as a total replacement for expected value as a decision theory. He wants to argue that one decision is only rationally better as another one when it is stochastically dominant. For this, he needs to say that the choiceworthiness of a decision (how rational it is) is undefined in the case where one decision doesn’t stochastically dominate another one.
I think this is absurd, and perhaps determined by academic incentives to produce more eye-popping claims rather than more restricted incremental improvements. Still, I thought that the paper made some good points about us still being able to make decisions even when expected values stop being informative. It was also my introduction to extending rational decision-making to infinite cases, and a great introduction at that. Below, I outline my rudimentary understanding of these topics.
Where expected values fail.
Consider a choice between:
A: 1 utilon with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 1 × ½ + 2 × ¼ + 4 × 1⁄8 + … = ½ + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 2 × ¼ + 4 × 1⁄8 + … = 1 + 1 + 1 + … = ∞
So the expected value of choice A is ∞, as is the expected value of choice B. And yet, B is clearly preferable to A. What gives?
Statewise dominance
Suppose that in the above case, there were different possible states, as if the payoffs for A and B were determined by the same coin throws:
State i: A gets 1, B gets 2
State ii: A gets 2, B gets 4
State iii: A gets 4, B gets 8,
State in: A gets 2n, B gets 2 × 2n.
Then in this case, B dominates A in every possible state. This is a reasonable decision principle that we can reach to ground our decision to choose B over A.
Stochastic dominance
O stochastically dominates P if:
For any payoff x, the probability that O yields a payoff at least as good as x is equal to or greater than the probability that P yields a payoff at least as good as x, and
For some payoff x, the probability that O yields a payoff at least as good as x is strictly greater than the probability that P yields a payoff at least as good as x.
or, in math notation:
∀x, Probability(Payoff(O) ≥ x) ≥ Probability(Payoff(P) ≥ x))
∃x such that Probability(Payoff(O) ≥ x) > Probability(Payoff(P) ≥ x))
This captures a notion that O is, in a sense, strictly better than P, probabilistically.
In the case of A and B above, if their payoffs were determined by throwing independent coins:
There is a 100% chance that B yields a payoff ≥ 1, and 100% that A yields a payoff ≥ 1
There is a 50% chance that B yields a payoff ≥ 2, but only a 25% chance that A yields a payoff ≥ 2
There is a 25% chance that B yields a payoff ≥ 4, but only a 12.5% chance that A yields a payoff ≥ 4
There is a 12.5% chance that B yields a payoff ≥ 8, but only a 6.26% chance that A does so.
There is a ½^n chance that B yields a payoff ≥ 2n, but only a ½^(n+1) chance that A does so.
So the probability that B gets increasingly better outcomes is higher than the probability that A will do so. So in this case, B stochastically dominates A. Stochastic dominance is thus another decision principle that we could reach to compare choices with infinite expected values.
Gaps left
The above notions of stochastic and statewise dominance could be expanded and improved. For instance, we could ignore a finite number of comparisons going the other way if the expected value of those options was finite but the expected value of the whole thing was infinite. For instance, in the following comparison:
A: 100 utilons with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 1 × ½ + 2 × ¼ + 4 × 1⁄8 + … = ½ + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 2 × ¼ + 4 × 1⁄8 + … = 1 + 1 + 1 + … = ∞
I would still say that B is preferable to A in that case. And my impression is that there are many similar principles one could reach to, in order to resolve many but not all comparisons between infinite sequences.
Exercise for the reader: Come up with two infinite sequences which cannot be compared using statewise or stochastic dominance, or similar principles.
You could discount utilons—say there is a “meta-utilon” which is a function of utilons, like maybe meta utilons = log(utilons). And then you could maximize expected metautilons rather than expected utilons. Then I think stochastic dominance is equivalent to saying “better for any non decreasing metautilon function”.
But you could also pick a single metautilon function and I believe the outcome would at least be consistent.
Really you might as well call the metautilons “utilons” though. They are just not necessarily additive.
A monotonic transformation like log doesn’t solve the infinity issue right?
Time discounting (to get you comparisons between finite sums) doesn’t preserve the ordering over sequences.
This makes me think you are thinking about something else?
Monotonic transformations can indeed solve the infinity issue. For example the sum of 1/n doesn’t converge, but the sum of 1/n^2 converges, even though x → x^2 is monotonic.
The existential risk community’s relative level of concern about different existential risks is correlated with how hard-to-analyze these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:
Unaligned artificial intelligence[1]
Unforeseen anthropogenic risks (tied)
Engineered pandemics (tied)
Other anthropogenic risks
Nuclear war (tied)
Climate change (tied)
This isn’t surprising.
For a number of risks, when you first hear about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance we’ll become less worried about it. We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.
In particular, our level of worry about different plausible existential risks is likely to roughly track our ability to analyze them (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).
Some plausible existential risks also are far easier to analyze than others. If you compare 80K’s articles on climate change and artificial intelligence, for example, then I think it is pretty clear that people analyzing climate risk simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other specifies to survive under different local climate conditions. And so on. We’re in a much worse epistemic position when it comes to analyzing the risk from misaligned AI: we’re reliant on fuzzy analogies, abstract arguments that use highly ambiguous concepts, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different than future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if the existential risk from misaligned AI actually is reasonably small, it’s hard to see how we could become really confident of that.
Some upshots:
The fact that the existential risk community is particularly worried about misaligned AI might mostly reflect the fact that it’s hard to analyze risks from misaligned AI.
Nonetheless, even if the above possibility is true, it doesn’t at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not that big a deal. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to uncharitable observers — my efforts will probably look a bit misguided after the fact.”
For hard-to-analyze risks, it matters a lot what your “prior” in the risks is (since evidence, models, and arguments can only really move you so much). I sometimes get the sense that some people are starting from a prior that’s not far from 50%: For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it’s even a bit unclear what exactly a “prior” means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.
(This shortform partly inspired by Greg Lewis’s recent forecasting post .)
Toby Ord notes, in the section of The Precipice that gives risk estimates: “The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book.” ↩︎
Related:
(source)