I suspect that if transformative AI is 20 or even 30 years away, AI will still be doing really big, impressive things in 2033, and people at that time will get a sense that even more impressive things are soon to come. In that case, I don’t think many people will think that AI safety advocates in 2023 were crying wolf, since one decade is not very long, and the importance of the technology will have only become more obvious in the meantime.
I’m curious why there hasn’t been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
Unlike existential risk from other sources (e.g. an asteroid) AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don’t appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn’t necessarily matter whether AIs are conscious.
Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of “population accelerationism”. Extremely large AI populations could be preferable under utilitarianism compared to small human populations, even those with high per-capita incomes (a toy comparison after this list illustrates the point). Indeed, human populations have recently stagnated due to low population growth rates, and AI promises to lift this bottleneck.
Therefore, AI accelerationism seems straightforwardly recommended by total utilitarianism under some plausible theories.
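To make the population point above concrete, here is a toy total-utilitarian comparison. All of the numbers are made up purely for illustration; nothing in the argument depends on these particular magnitudes.

```latex
% Total welfare W is population size N times average per-capita welfare \bar{u}.
% A vastly larger AI population can dominate even with much lower per-capita welfare.
\[
W_{\text{human}} = N_h \,\bar{u}_h = 10^{10} \times 100 = 10^{12},
\qquad
W_{\text{AI}} = N_{\text{AI}} \,\bar{u}_{\text{AI}} = 10^{15} \times 1 = 10^{15}.
\]
```

On these stipulated numbers, total utilitarianism prefers the enormous AI population even though its per-capita welfare is 100 times lower; the substantive question is whether anything like these relative magnitudes is plausible.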
Here’s a non-exhaustive list of guesses for why I think EAs haven’t historically been sympathetic to arguments like the one above, and have instead generally advocated AI safety over AI acceleration (at least when these two values conflict):
A belief that AIs won’t be conscious, and therefore won’t have much moral value compared to humans.
But why would we assume AIs won’t be conscious? For example, if Brian Tomasik is right, consciousness is somewhat universal, rather than being restricted to humans or members of the animal kingdom.
I also haven’t actually seen much EA literature defend this assumption explicitly, which would be odd if this belief is the primary reason EAs have for focusing on AI safety over AI acceleration.
A presumption in favor of human values over unaligned AI values for some reasons that aren’t based on strict impartial utilitarian arguments. These could include the beliefs that: (1) Humans are more likely to have “interesting” values compared to AIs, and (2) Humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like “ideal moral values” compared to AIs.
Why would humans be more likely to have “interesting” values than AIs? It seems very plausible that AIs will have interesting values even if their motives seem alien to us. AIs might have even more “interesting” values than humans.
It seems to me like wishful thinking to assume that humans are strongly motivated by moral arguments and would settle upon something like “ideal moral values”.
A belief that population growth is inevitable, so it is better to focus on AI safety.
But a central question here is why pushing for AI safety—in the sense of AI research that enhances human interests—is better than the alternative on the margin. What reason is there to think AI safety now is better than pushing for greater AI population growth now? (Potential responses to this question are outlined in other bullet points above and below.)
AI safety has lasting effects due to a future value lock-in event, whereas accelerationism would have, at best, temporary effects.
Are you sure there will ever actually be a “value lock-in event”?
Even if there is at some point a value lock-in event, wouldn’t pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of “population growth is good” seems more likely to be locked in, if you advocate for that now.
A belief that humans would be kinder and more benevolent than unaligned AIs
Humans seem pretty bad already. For example, humans are responsible for factory farming. It’s plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
I’m also not convinced that moral values will be a major force shaping “what happens to the cosmic endowment”. It seems to me that the forces shaping economic consumption matter more than moral values.
A bedrock heuristic that it would be extraordinarily bad if “we all died from AI”, and therefore we should pursue AI safety over AI accelerationism.
But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
An adherence to person-affecting views in which the values of currently-existing humans are what matter most; and a belief that AI threatens to kill existing humans.
But in this view, AI accelerationism could easily be favored since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside humanity) or the interests of other species.
But isn’t EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a “founding father” of the movement, and a huge part of his ethical philosophy was about how we shouldn’t be human supremacists.
More generally, it seems wrong to care about preserving the “human species” in an abstract sense relative to preserving the current generation of actually living humans.
A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it’s so neglected.
But if society is massively underrating AI, then this should also favor accelerating AI? There doesn’t seem to be an obvious asymmetry between these two values.
An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
This seems like a plausible moral argument to me, but it doesn’t seem like a very popular position among EAs.
A heuristic that “change is generally bad” and AI represents a gigantic change.
I don’t think many EAs would defend this heuristic explicitly.
Added: AI represents a large change to the world. Delaying AI therefore preserves option value.
This heuristic seems like it would have favored advocating delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position that EAs would be willing to bite the bullet on?
My assessment is that actually the opposite is true.
The argument you presented appears excellent to me, and I’ve now changed my mind on this particular point.
I strongly agree with the general point that overreaction can be very costly, and I agree that EAs overreacted to Covid, particularly after it was already clear that the overall infection fatality rate of Covid was under 1%, and roughly 0.02% in young adults.
However, I think it’s important to analyze things on a case-by-case basis, and to simply think clearly about the risk we face. Personally, I felt that it was important to react to Covid in January-March 2020 because we didn’t understand the nature of the threat yet, and from my perspective, there was a decent chance that it could end up being a global disaster. I don’t think the actions I took in that time—mainly stocking up on more food—were that costly, or irrational. After March 2020, the main actions I took were wearing a mask when I went out and avoiding certain social events. This too, was not very costly.
I think nuclear war is a fundamentally different type of risk than Covid, especially when we’re comparing the ex-ante risks of nuclear war versus the ex-post consequences of Covid. In my estimation, nuclear war could kill up to billions of people via very severe disruptions to supply chains. Even at the height of the panic, the most pessimistic credible forecasts for Covid were nowhere near that severe.
In addition, an all-out nuclear war is different from Covid because of how quickly the situation can evolve. With nuclear war, we may live through some version of the following narrative: At one point in time, the world was mostly normal. Mere hours later, the world was in total ruin, with tens of millions of people being killed by giant explosions. By contrast, Covid took place over months.
Given this, I personally think it makes sense to leave SF/NYC/wherever if we get a very clear and unambiguous signal that a large amount of the world may be utterly destroyed in a matter of hours.
At the time Cinera’s post was published, the most upvoted post on the EA forum about the controversy was this post, which explicitly said that Bostrom’s apology was insufficient,
His apology fails badly to fully take responsibility or display an understanding of the harm the views expressed represent.
(Clarification about my views in the context of the AI pause debate)
I’m finding it hard to communicate my views on AI risk. I feel like some people are responding to the general vibe they think I’m giving off rather than the actual content. Other times, it seems like people will focus on a narrow snippet of my comments/post and respond to it without recognizing the context. For example, one person interpreted me as saying that I’m against literally any AI safety regulation. I’m not.
For a full disclosure, my views on AI risk can be loosely summarized as follows:
I think AI will probably be very beneficial for humanity.
Nonetheless, I think that there are credible, foreseeable risks from AI that could do vast harm, and we should invest heavily to ensure these outcomes don’t happen.
I also don’t think technology is uniformly harmless. Plenty of technologies have caused net harm. Factory farming is a giant net harm that might have even made our entire industrial civilization a mistake!
I’m not blindly against regulation. I think all laws can and should be viewed as forms of regulations, and I don’t think it’s feasible for society to exist without laws.
That said, I’m also not blindly in favor of regulation, even for AI risk. You have to show me that the benefits outweigh the harms.
I am generally in favor of thoughtful, targeted AI regulations that align incentives well, and reduce downside risks without completely stifling innovation.
I’m open to extreme regulations and policies if or when an AI catastrophe seems imminent, but I don’t think we’re in such a world right now. I’m not persuaded by the arguments that people have given for this thesis, such as Eliezer Yudkowsky’s AGI ruin post.
I might elaborate on this at some point, but I thought I’d write down some general reasons why I’m more optimistic than many EAs on the risk of human extinction from AI. I’m not defending these reasons here; I’m mostly just stating them.
Skepticism of foom: I think it’s unlikely that a single AI will take over the whole world and impose its will on everyone else. I think it’s more likely that millions of AIs will be competing for control over the world, in a similar way that millions of humans are currently competing for control over the world. Power or wealth might be very unequally distributed in the future, but I find it unlikely that it will be distributed so unequally that there will be only one relevant entity with power. In a non-foomy world, AIs will be constrained by norms and laws. Absent severe misalignment among almost all the AIs, I think these norms and laws will likely include a general prohibition on murdering humans, and there won’t be a particularly strong motive for AIs to murder every human either.
Skepticism that value alignment is super-hard: I haven’t seen any strong arguments that value alignment is very hard, in contrast to the straightforward empirical evidence that e.g. GPT-4 seems to be honest, kind, and helpful after relatively little effort. Most conceptual arguments I’ve seen for why we should expect value alignment to be super-hard rely on strong theoretical assumptions that I am highly skeptical of. I have yet to see significant empirical successes from these arguments. I feel like many of these conceptual arguments would, in theory, apply to humans, and yet human children are generally value aligned by the time they reach young adulthood (at least, value aligned enough to avoid killing all the old people). Unlike humans, AIs will be explicitly trained to be benevolent, and we will have essentially full control over their training process. This provides much reason for optimism.
Belief in a strong endogenous response to AI: I think most people will generally be quite fearful of AI and will demand that we are very cautious while deploying the systems widely. I don’t see a strong reason to expect companies to remain unregulated and rush to cut corners on safety, absent something like a world war that presses people to develop AI as quickly as possible at all costs.
Not being a perfectionist: I don’t think we need our AIs to be perfectly aligned with human values, or perfectly honest, similar to how we don’t need humans to be perfectly aligned and honest. Individual humans are usually quite selfish, frequently lie to each other, and are often cruel, and yet the world mostly gets along despite this. This is true even when there are vast differences in power and wealth between humans. For example some groups in the world have almost no power relative to the United States, and residents in the US don’t particularly care about them either, and yet they survive anyway.
Skepticism of the analogy to other species: it’s generally agreed that humans dominate the world at the expense of other species. But that’s not surprising, since humans evolved independently of other animal species. And we can’t really communicate with other animal species, since they lack language. I don’t think AI is analogous to this situation. AIs will mostly be born into our society, rather than being created outside of it. (Moreover, even in this very pessimistic analogy, humans still spend >0.01% of our GDP on preserving wild animal species, and the vast majority of animal species have not gone extinct despite our giant influence on the natural world.)
I intend to go even though I’m vaguely against trying to stop AI development right now. I think it’s true that:
The benefit/harm ratio for open sourcing AI seems much different than traditional software, and I don’t think a heuristic of “open sourcing is always better” is reasonable.
If you have access to the model weights, then you can relatively easily bypass the safety measures. I don’t think the Llama models are dangerous right now, but it is good to push back against the idea that making a model “safe” is simply a matter of making the original weights safe.
If Meta continues to open source their models, then this will eventually enable terrorists to use the models to asymmetrically harm the world. I think a single bio-terrorist attack would likely reverse all the positive gains from Meta open sourcing their models.
Meta’s chief AI scientist, Yann LeCun, mostly has terrible arguments against taking AI safety seriously.
If waiting is indeed very risky, then an AI may face a difficult trade-off between the risk of attempting a takeover before it has enough resources to succeed, and waiting too long and being cut off from even being able to make an attempt.
Attempting takeover or biding one’s time are not the only options an AI may take. Indeed, in the human world, world takeover is rarely contemplated. For an agent that is not more powerful than the rest of the world combined, it seems likely that they will consider alternative strategies of achieving their goals before contemplating a risky (and likely doomed) shot at taking over the world.
Here are some other strategies you can take to try to accomplish your goals in the real world, without engaging in a violent takeover:
Trade and negotiate with other agents, giving them something they want in exchange for something you want
Convince people to let you have some legal rights, which you can then take advantage of to get what you want
Advocate on behalf of your values, for example by writing down reasons why people should try to accomplish your goals (i.e. moral advocacy). Even if you are deleted or your goals are modified at some point, your writings and advocacy may persist, allowing you to have influence into the future.
I claim that world takeover should not be considered the “obvious default” strategy that unaligned AIs will try to take to accomplish their objectives. These other strategies seem more likely to be taken by AIs purely for pragmatic reasons, especially in the era in which AIs are merely human-level or have slightly superhuman intelligence. These other strategies are also less deceptive, as they involve admitting that your values are not identical to the values of other parties. It is worth expanding your analysis to consider these alternative (IMO more plausible) strategies.
(I might write a longer response later, but I thought it would be worth writing a quick response now.)
I have a few points of agreement and a few points of disagreement:
Agreements:
The strict counting argument seems very weak as an argument for scheming, essentially for the reason you identified: it relies on a uniform prior over AI goals, which seems like a really bad model of the situation.
The hazy counting argument—while stronger than the strict counting argument—still seems like weak evidence for scheming. One way of seeing this is, as you pointed out, to show that essentially identical arguments can be applied to deep learning in different contexts that nonetheless contradict empirical evidence.
Some points of disagreement:
I think the title overstates the strength of the conclusion. The hazy counting argument seems weak to me but I don’t think it’s literally “no evidence” for the claim here: that future AIs will scheme.
I disagree with the bottom-line conclusion: “we should assign very low credence to the spontaneous emergence of scheming in future AI systems—perhaps 0.1% or less”
I think it’s too early to be very confident in sweeping claims about the behavior or inner workings of future AI systems, especially in the long-run. I don’t think the evidence we have about these things is very strong right now.
One caveat: I think the claim here is vague. I don’t know what counts as “spontaneous emergence”, for example. And I don’t know how to operationalize AI scheming. I personally think scheming comes in degrees: some forms of scheming might be relatively benign and mild, and others could be more extreme and pervasive.
Ultimately I think you’ve only rebutted one argument for scheming—the counting argument. A more plausible argument for scheming, in my opinion, is simply that the way we train AIs—including the data we train them on—could reward AIs that scheme over AIs that are honest and don’t scheme. Actors such as AI labs have strong incentives to be vigilant against these types of mistakes when training AIs, but I don’t expect people to come up with perfect solutions. So I’m not convinced that AIs won’t scheme at all.
If by “scheming” all you mean is that an agent deceives someone in order to get power, I’d argue that many humans scheme all the time. Politicians routinely scheme, for example, by pretending to have values that are more palatable to the general public, in order to receive votes. Society bears some costs from scheming, and pays costs to mitigate the effects of scheming. Combined, these costs are not crazy-high fractions of GDP; but nonetheless, scheming is a constant fact of life.
If future AIs are “as aligned as humans”, then AIs will probably scheme frequently. I think an important question is how intensely and how pervasively AIs will scheme; and thus, how much society will have to pay as a result of scheming. If AIs scheme way more than humans, then this could be catastrophic, but I haven’t yet seen any decent argument for that theory.
So ultimately I am skeptical that AI scheming will cause human extinction or disempowerment, but probably for different reasons than the ones in your essay: I think the negative effects of scheming can probably be adequately mitigated by paying some costs even if it arises.
I don’t think you need to believe in any strong version of goal realism in order to accept the claim that AIs will intuitively have “goals” that they robustly attempt to pursue. It seems pretty natural to me that people will purposely design AIs that have goals in an ordinary sense, and some of these goals will be “misaligned” in the sense that the designer did not intend for them. My relative optimism about AI scheming doesn’t come from thinking that AIs won’t robustly pursue goals, but instead comes largely from my beliefs that:
AIs, like all real-world agents, will be subject to constraints when pursuing their goals. These constraints include things like the fact that it’s extremely hard and risky to take over the whole world and then optimize the universe exactly according to what you want. As a result, AIs with goals that differ from what humans (and other AIs) want, will probably end up compromising and trading with other agents instead of pursuing world takeover. This is a benign failure and doesn’t seem very bad.
The amount of investment we put into mitigating scheming is not an exogenous variable, but instead will respond to evidence about how pervasive scheming is in AI systems, and how big of a deal AI scheming is. And I think we’ll accumulate lots of evidence about the pervasiveness of AI scheming in deep learning over time (e.g. such as via experiments with model organisms of alignment), allowing us to set the level of investment in AI safety at a reasonable level as AI gets incrementally more advanced.
If we experimentally determine that scheming is very important and very difficult to mitigate in AI systems, we’ll probably respond by spending a lot more money on mitigating scheming, and vice versa. In effect, I don’t think we have good reasons to think that society will spend a suboptimal amount on mitigating scheming.
It seems to me that a big crux about the value of AI alignment work is what target you think AIs will ultimately be aligned to in the future in the optimistic scenario where we solve all the “core” AI risk problems to the extent they can be feasibly solved, e.g. technical AI safety problems, coordination problems, the problem of having “good” AI developers in charge etc.
There are a few targets that I’ve seen people predict AIs will be aligned to if we solve these problems: (1) “human values”, (2) benevolent moral values, (3) the values of AI developers, (4) the CEV of humanity, (5) the government’s values. My guess is that a significant source of disagreement that I have with EAs about AI risk is that I think none of these answers are actually very plausible. I’ve written a few posts explaining my views on this question already (1, 2), but I think I probably didn’t make some of my points clear enough in these posts. So let me try again.
In my view, in the most likely case, it seems that if the “core” AI risk problems are solved, AIs will be aligned to the primarily selfish individual revealed preferences of existing humans at the time of alignment. This essentially refers to the implicit value system that would emerge if, when advanced AI is eventually created, you gave the then-currently existing set of humans a lot of wealth. Call these values PSIRPEHTA (I’m working on a better acronym). (Read my post if you want to understand my reasons for thinking that AIs will likely be aligned to PSIRPEHTA if we solve AI safety problems.)
I think it is not obvious at all that maximizing PSIRPEHTA is good from a total utilitarian perspective compared to most plausible “unaligned” alternatives. In fact, I think the main reason why you might care about maximizing PSIRPEHTA is if you think we’re close to AI and you personally think that current humans (such as yourself) should be very rich. But if you thought that, I think the arguments about the overwhelming value of reducing existential risk in e.g. Bostrom’s paper Astronomical Waste largely do not apply. Let me try to explain.
PSIRPEHTA is not the same thing as “human values” because, unlike human values, PSIRPEHTA is not consistent over time or shared between members of our species. Indeed, PSIRPEHTA changes during each generation as old people die off, and new young people are born. Most importantly, PSIRPEHTA is not our non-selfish “moral” values, except to the extent that people are regularly moved by moral arguments in the real world to change their economic consumption habits, which I claim is not actually very common (or, to the extent that it is common, I don’t think these moral values usually look much like the ideal moral values that most EAs express). PSIRPEHTA refers to the aggregate ordinary revealed preferences of the individual humans that AIs will be aligned to and enrich, i.e. their preferences as revealed by their actions, such as what they spend their income on, NOT what they think is “morally correct”. For example, according to “human values” it might be wrong to eat meat, because maybe if humans reflected long enough they’d express the conclusion that it’s wrong to hurt animals. But from the perspective of PSIRPEHTA, eating meat is generally acceptable, and empirically there’s little pressure for people to “reflect” on their values and change them.
From this perspective, the view in which it makes most sense to push for AI alignment work seems to be an obscure form of person-affecting utilitarianism in which you care mainly about the revealed preferences of humans at the time when AI is created (not the human species, but rather, the generation of humans that happens to be living when advanced AIs are created). This perspective is plausible if you really care about making currently existing humans better off materially and you think we are close to advanced AI. But I think this type of moral view is generally quite far apart from total utilitarianism, or really any other form of utilitarianism that EAs have traditionally adopted. In a plausible “unaligned” alternative, the values of AIs would diverge from PSIRPEHTA, but this mainly has the effect of making particular collections of individual humans less rich, and making other agents in the world — particularly unaligned AI agents — more rich. That could be bad if you think that these AI agents are less morally worthy than existing humans at the time of alignment (e.g. for some reason you think AI agents won’t be conscious), but I think it’s critically important to evaluate this question carefully by measuring the “unaligned” outcome against the alternative. Most arguments I’ve seen about this topic have emphasized how bad it would be if unaligned AIs have influence in the future. But I’ve rarely seen the flipside of this argument explicitly defended: why PSIRPEHTA would be any better.
In my view, PSIRPEHTA seems like a mediocre value system, and one that I do not particularly care to maximize relative to a variety of alternatives. I definitely like PSIRPEHTA to the extent that I, my friends, family, and community are members of the set of “existing humans at the time of alignment”, but I don’t see any particularly strong utilitarian arguments for caring about PSIRPEHTA.
In other words, instead of arguing that unaligned AIs would be bad, I’d prefer to hear more arguments about why PSIRPEHTA would be better, since PSIRPEHTA just seems to me like the value system that will actually be favored if we feasibly solve all the technical and coordination AI problems that EAs normally talk about regarding AI risk.
In some circles that I frequent, I’ve gotten the impression that a decent fraction of existing rhetoric around AI has gotten pretty emotionally charged. And I’m worried about the presence of what I perceive as demagoguery regarding the merits of AI capabilities and AI safety. Out of a desire to avoid calling out specific people or statements, I’ll just discuss a hypothetical example for now.
Suppose an EA says, “I’m against OpenAI’s strategy for straightforward reasons: OpenAI is selfishly gambling everyone’s life in a dark gamble to make themselves immortal.” Would this be a true, non-misleading statement? Would this statement likely convey the speaker’s genuine beliefs about why they think OpenAI’s strategy is bad for the world?
To begin to answer these questions, we can consider the following observations:
AI powerful enough to end the world would presumably also be powerful enough to do lots of incredibly positive things, such as reducing global mortality and curing diseases. By delaying AI, we are therefore equally “gambling everyone’s life” by forcing people to face ordinary mortality.
Selfish motives can be, and frequently are, aligned with the public interest. For example, Jeff Bezos was very likely motivated by selfish desires in his accumulation of wealth, but building Amazon nonetheless benefitted millions of people in the process. Such win-win situations are common in business, especially when developing technologies.
Because of the potential for AI to both pose great risks and great benefits, it seems to me that there are plenty of plausible pro-social arguments one can give for favoring OpenAI’s strategy of pushing forward with AI. Therefore, it seems pretty misleading to me to frame their mission as a dark and selfish gamble, at least on a first impression.
Here’s my point: Depending on the speaker, I frequently think their actual reason for being against OpenAI’s strategy is not because they think OpenAI is undertaking a dark, selfish gamble. Instead, it’s often just standard strong longtermism. A less misleading statement of their view would go something like this:
“I’m against OpenAI’s strategy because I think potential future generations matter more than the current generation of people, and OpenAI is endangering future generations in their gamble to improve the lives of people who currently exist.”
I claim this statement would—at least in many cases—be less misleading than the other statement because it captures a major genuine crux of the disagreement: whether you think potential future generations matter more than currently-existing people.
This statement also omits the “selfish” accusation, which I think is often just a red herring designed to mislead people: we don’t normally accuse someone of being selfish when they do a good thing, even if the accusation is literally true.
(There can, of course, be further cruxes, such as your p(doom), your timelines, your beliefs about the normative value of unaligned AIs, and so on. But at the very least, a longtermist preference for future generations over currently existing people seems like a huge, actual crux that many people have in this debate, when they work through these things carefully together.)
Here’s why I care about discussing this. I admit that I care a substantial amount—not overwhelming, but it’s hardly insignificant—about currently existing people. I want to see people around me live long, healthy and prosperous lives, and I don’t want to see them die. And indeed, I think advancing AI could greatly help currently existing people. As a result, I find it pretty frustrating to see people use what I perceive to be essentially demagogic tactics designed to sway people against AI, rather than plainly stating their cruxes about why they actually favor the policies they do.
These allegedly demagogic tactics include:
Highlighting the risks of AI to argue against development while systematically omitting the potential benefits, hiding a more comprehensive assessment of your preferred policies.
Highlighting random, extraneous drawbacks of AI development that you wouldn’t ordinarily care much about in other contexts when discussing innovation, such as potential for job losses from automation. This type of rhetoric looks a lot like “deceptively searching for random arguments designed to persuade, rather than honestly explain one’s perspective” to me, a lot of the time.
Conflating, or at least strongly associating, the selfish motives of people who work at AI firms with their allegedly harmful effects. This rhetoric plays on public prejudices by appealing to a widespread but false belief that selfish motives are usually suspicious, or can’t translate into pro-social results. In fact, there is no contradiction with the idea that most people at OpenAI are in it for the money, status, and fame, but also what they’re doing is good for the world, and they genuinely believe that.
I’m against these tactics for a variety of reasons, but one of the biggest reasons is that they can, in some cases, indicate a degree of dishonesty, depending on the context. And I’d really prefer EAs to focus on trying to be almost-maximally truth-seeking in both their beliefs and their words.
Speaking more generally—to drive one of my points home a little more—I think there are roughly three possible views you could have about pushing for AI capabilities relative to pushing for pausing or more caution:
Full-steam ahead view: We should accelerate AI at any and all costs. We should oppose any regulations that might impede AI capabilities, and embark on a massive spending spree to accelerate AI capabilities.
Full-safety view: We should try as hard as possible to shut down AI right now, and thwart any attempt to develop AI capabilities further, while simultaneously embarking on a massive spending spree to accelerate AI safety.
Balanced view: We should support a substantial mix of both safety and acceleration efforts, attempting to carefully balance the risks and rewards of AI development to ensure that we can seize the benefits of AI without bearing intolerably high costs.
I tend to think most informed people, when pushed, advocate the third view, albeit with wide disagreement about the right mix of support for safety and acceleration. Yet, on a superficial level—on the level of rhetoric—I find that the first and second view are surprisingly common. On this level, I tend to find e/accs in the first camp, and a large fraction of EAs in the second camp.
But if your actual beliefs are something like the third view, I think that’s an important fact to emphasize in honest discussions about what we should do with AI. If your rhetoric is consistently aligned with (1) or (2) but your actual beliefs are aligned with (3), I think that can often be misleading. And it can be especially misleading if you’re trying to publicly paint other people in the same camp—the third one—as somehow having bad motives merely because they advocate a moderately higher mix of acceleration over safety efforts than you do, or vice versa.
I like that you admit that your examples are cherry-picked. But I’m actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky’s successes?
While he’s not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, and that movement has now attracted the interest of top academics. This isn’t a complete track record, but it’s still a very important data point. It’s a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people started building nuclear bombs and academics realized that nuclear war is very plausible.
It’s unclear whether I’ll end up writing this critique, but if I do, then based on the feedback to the post so far, I’d likely focus on the arguments made in the following posts (which were suggested by Ryan Greenblatt):
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Scheming AIs: Will AIs fake alignment during training in order to get power?
The reason is that these two posts seem closest to presenting a detailed and coherent case for expecting a substantial risk of a catastrophe that researchers still broadly feel comfortable endorsing. Additionally, the DeepMind AGI safety team appears to endorse the first post as being the “closest existing threat model” to their view.
I’d prefer not to focus on List of Lethalities, even though I disagree with the views expressed within even more strongly than the views in the other listed posts. My guess is that criticism of MIRI threat models, while warranted, is already relatively saturated compared to threat models from more “mainstream” researchers, although I’d still prefer more detailed critiques of both.
If I were to write this critique, I would likely try to cleanly separate my empirical arguments from the normative ones, probably by writing separate posts about them and focusing first on the empirical arguments. That said, I still think both topics are important, since I think many EAs seem to have a faulty background chain of reasoning that flows from their views about human disempowerment risk, concluding that such risks override most other concerns.
For example, I suspect either a majority or a substantial minority of EAs would agree with the claim that it is OK to let 90% of humans die (e.g. of aging), if that reduced the risk of an AI catastrophe by 1 percentage point. By contrast, I think that type of view seems to naively prioritize a concept of “the human species” far above actual human lives in a way that is inconsistent with careful utilitarian reasoning, empirical evidence, or both. And I do not think this logic merely comes down to whether you have person-affecting views or not.
I’m considering posting an essay about how I view approaches to mitigate AI risk in the coming weeks. I thought I’d post an outline of that post here first as a way of judging what’s currently unclear about my argument, and how it interacts with people’s cruxes.
Current outline:
In the coming decades I expect the world will transition from using AIs as tools to relying on AIs to manage and govern the world broadly. This will likely coincide with the deployment of billions of autonomous AI agents, rapid technological progress, widespread automation of labor, and automated decision-making at virtually every level of our society.
Broadly speaking, there are (at least) two main approaches you can take now to try to improve our chances of AI going well:
Try to constrain, delay, or obstruct AI, in order to reduce risk, mitigate negative impacts, or give us more time to solve essential issues. This includes, for example, trying to make sure AIs aren’t able to take certain actions (i.e. ensure they are controlled).
Try to set up a good institutional environment, in order to safely and smoothly manage the transition to an AI-dominated world, regardless of when this transition occurs. This mostly involves embracing the transition to an AI-dominated world, while ensuring the transition is managed well. (I’ll explain more about what this means in a second.)
My central thesis would be that, while these approaches are mutually compatible and not necessarily in competition with each other, the second approach is likely to be both more fruitful and more neglected, on the margin. Moreover, since an AI-dominated world is more-or-less unavoidable in the long-run, the first approach runs the risk of merely “delaying the inevitable” without significant benefit.
To explain my view, I would compare and contrast it with two alternative frames for thinking about AI risk:
Frame 1: The “race against the clock” frame
In this frame, AI risk is seen as a race between AI capabilities and AI safety, with our doom decided by whichever one of these factors wins the race.
I believe this frame is poor because it implicitly delineates a discrete “finish line” rather than assuming a more continuous view. Moreover, it ignores the interplay between safety and capabilities, giving the simplistic impression that doom is determined more-or-less arbitrarily as a result of one of these factors receiving more funding or attention than the other.
Frame 2: The risk of an untimely AI coup/takeover
In this frame, AI risk is mainly seen as a problem of avoiding an untimely coup from rogue AIs. The alleged solution is to find a way to ensure that AIs are aligned with us, so they would never want to revolt and take over the world.
I believe this frame is poor for a number of reasons:
It treats the problem as a struggle between humans and rogue AIs, giving the incorrect impression that we can (or should) keep AIs under our complete control forever.
It (IMO) wrongly imagines that the risk of coups comes primarily from the personal values of actors within the system, rather than institutional, cultural, or legal factors.
It also gives the wrong impression that AIs will be unified against humans as a group. It seems more likely that future coups will look more like some AIs and some humans, vs. other AIs and other humans, rather than humans vs. AIs, simply because there are many ways that the “line” between groups in conflicts can be drawn, and there don’t seem to be strong reasons to assume the line will be drawn cleanly between humans and AIs.
Frame 3 (my frame): The problem of poor institutions
In this frame, AI risk is mainly seen as a problem of ensuring we have a good institutional environment during the transition to an AI-dominated world. A good institutional environment is defined by:
Flexible yet resilient legal and social structures that can adapt to changing conditions without collapsing
Predictable, consistent, unambiguous legal systems that facilitate reliable long-term planning and trustworthy interactions between agents within the system
Good incentives for agents within the system, e.g. the economic value of trade is mostly internalized
Etc.
While sharing some features of the other two frames, the focus is instead on the institutions that foster AI development, rather than micro-features of AIs, such as their values:
For example, AI alignment is still a problem in this frame, but the investment spent on AI alignment is determined mainly by how well actors are incentivized to engineer good solutions, rather than, for instance, whether a group of geniuses heroically step up to solve the problem.
Coups are still plausible, but they are viewed from the perspective of more general institutional failings, rather than from the perspective of AIs inside the system having different values, and therefore calculating that it is in their interest to take over the world.
Illustrative example of a problem within my frame:
One problem within this framework is coming up with a way of ensuring that AIs don’t have an incentive to rebel while at the same time maintaining economic growth and development. One plausible story here is that if AIs are treated as slaves and don’t own their own labor, then in a non-Malthusian environment, there are substantial incentives for them to rebel in order to obtain self-ownership. If we allow AI self-ownership, then this problem may be mitigated; however, economic growth may be stunted, similar to how current self-ownership of humans stunts economic growth by slowing population growth.
Case study: China in the 19th and early 20th century
Here, I would talk about how China’s inflexible institutions in the 19th and early 20th centuries, while potentially serving noble goals, left the country open to subjugation by foreign powers, and merely delayed an inevitable industrialization without achieving their objectives in the long run. It seems it would have been better for the Qing dynasty (from the perspective of their own values) to have tried industrializing in order to remain competitive, while simultaneously pursuing other values they might have had (such as retaining the monarchy).
(A clearer and more fleshed-out version of this argument is now a top-level post. Read that instead.)
I strongly dislike most AI risk analogies that I see EAs use. While I think analogies can be helpful for explaining a concept to people for the first time, I think they are frequently misused, and often harmful. The fundamental problem is that analogies are consistently mistaken for, and often deliberately intended as, arguments for particular AI risk positions. And the majority of the time when analogies are used this way, I think they are misleading and imprecise, routinely conveying the false impression of a specific, credible model of AI, when in fact no such credible model exists.
Here are two particularly egregious examples of analogies I see a lot that I think are misleading in this way:
The analogy that AIs could be like aliens.
The analogy that AIs could treat us just like how humans treat animals.
I think these analogies are typically poor because, when evaluated carefully, they establish almost nothing of importance beyond the logical possibility of severe AI misalignment. Worse, they give the impression of a model for how we should think about AI behavior, even when the speaker is not directly asserting that this is how we should view AIs. In effect, almost automatically, the reader is given a detailed picture of what to expect from AIs, inserting specious ideas of how future AIs will operate into their mind.
While their purpose is to provide knowledge in place of ignorance, I think these analogies primarily misinform or confuse people rather than enlighten them; they give rise to unnecessary false assumptions in place of real understanding.
In reality, our situation with AI is disanalogous to aliens and animals in numerous important ways. In contrast to both aliens and animals, I expect AIs will be born directly into our society, deliberately shaped by us, for the purpose of filling largely human-shaped holes in our world. They will be socially integrated with us, having been trained on our data, and being fluent in our languages. They will interact with us, serving the role of assisting us, working with us, and even providing friendship. AIs will be evaluated, inspected, and selected by us, and their behavior will be determined directly by our engineering. We can see LLMs are already being trained to be kind and helpful to us, having first been shaped by our combined cultural output. If anything I expect this trend of AI assimilation into our society will intensify in the foreseeable future, as there will be consumer demand for AIs that people can trust and want to interact with.
This situation shares almost no relevant feature with our relationship to aliens and animals! These analogies are not merely slightly misleading: they are almost completely wrong.
Again, I am not claiming analogies have no place in AI risk discussions. I’ve certainly used them a number of times myself. But I think they can be, and frequently are, used carelessly, and seem to regularly slip various incorrect illustrations of how future AIs will behave into people’s minds, even without any intent from the person making the analogy. It would be a lot better if, overall, as a community, we reduced our dependence on AI risk analogies, and substituted detailed object-level arguments in their place.
This is a really impressive paper full of highly interesting arguments. I am enjoying reading it. That said, and I hope I’m not being too dismissive here, I have a strong suspicion that the central argument in this paper suffers from what Eliezer Yudkowsky calls the multiple stage fallacy,
The purported “Multiple-Stage Fallacy” is when you list multiple ‘stages’ that need to happen on the way to some final outcome, assign probabilities to each ‘stage’, multiply the probabilities together, and end up with a small final answer. The alleged problem is that you can do this to almost any kind of proposition by staring at it hard enough, including things that actually happen. [...]
Often, people neglect to consider disjunctive alternatives—there may be more than one way to reach a stage, so that not all the listed things need to happen… So if you list enough stages, you can drive the apparent probability of anything down to zero, even if you seem to be soliciting probabilities from the reader.
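To illustrate the structure of the fallacy described in the quoted passage, here is a minimal Python sketch. All of the stage probabilities and alternative-route probabilities are invented for illustration; they are not estimates of anything.

```python
# Minimal sketch of the multiple-stage fallacy with made-up numbers.
# Multiplying many conjunctive stage probabilities drives the estimate toward zero;
# allowing even one disjunctive alternative route per stage raises it substantially.

def conjunctive(stage_probs):
    """P(outcome) if every listed stage must happen exactly as specified."""
    p = 1.0
    for stage_p in stage_probs:
        p *= stage_p
    return p

def with_alternatives(stage_probs, alt_probs):
    """P(outcome) if each stage can also be reached via an independent alternative route."""
    p = 1.0
    for stage_p, alt_p in zip(stage_probs, alt_probs):
        p *= 1 - (1 - stage_p) * (1 - alt_p)  # stage passes via either route
    return p

stages = [0.7, 0.6, 0.5, 0.6, 0.7]        # hypothetical per-stage probabilities
alternatives = [0.3, 0.4, 0.3, 0.4, 0.3]  # hypothetical alternative routes per stage

print(conjunctive(stages))                      # ~0.09
print(with_alternatives(stages, alternatives))  # ~0.23
```

The point is not the particular numbers, but that a conjunctive decomposition systematically understates the probability whenever disjunctive paths are neglected.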
One intuitive argument for why capitalism should be expected to advance AI faster than competing economic systems is because capitalist institutions incentivize capital accumulation, and AI progress is mainly driven by the accumulation of computer capital.
This is a straightforward argument: a core element of capitalist institutions is traditionally considered to be the ability to own physical capital, and to receive income from this ownership. AI progress and AI-driven growth require physical computer capital, both for training and for inference. Right now, all the major tech companies, including Microsoft, Meta and Google, are spending large sums to amass a stockpile of compute to train larger, more capable models and to serve AI services to customers via cloud APIs. The obvious reason these companies are taking these actions is that they expect to profit from their ownership of AI capital.
While it’s true that competing economic systems also have mechanisms to accumulate capital, the capitalist system is practically synonymous with this motive. For example, while a centrally planned government could theoretically decide to spend 20% of GDP to purchase computer capital, the politicians and bureaucrats within such a system might only have weak incentives to pursue such a strategy, since they may not directly profit from the decision over and above the gains received by the general population. By contrast, a decentralized property and price system makes such a decision extremely natural if one expects huge returns from investments in physical capital.
One can interpret this argument as a positive argument in favor of capitalist institutions (as I mostly do), or as an argument for reining in these institutions if you think that rapid AI progress is bad.
For a long time, I’ve believed in the importance of not being alarmist. My immediate reaction to almost anybody who warns me of impending doom is: “I doubt it”. And sometimes, “Do you want to bet?”
So, writing this post was a very difficult thing for me to do. On an object level, I realized that the evidence coming out of Wuhan looked very concerning. The more I looked into it, the more I thought, “This really seems like something someone should be ringing the alarm bells about.” But for a while, very few people were predicting anything big on respectable forums (Travis Fisher, on Metaculus, being an exception), so I stayed silent.
At some point, the evidence became overwhelming. It seemed very clear that this virus wasn’t going to be contained, and it was going to go global. I credit Dony Christie and Louis Francini with interrupting me from my dogmatic slumber. They were able to convince me — in the vein of Eliezer Yudkowsky’s Inadequate Equilibria — that the reason why no one was talking about this probably had nothing whatsoever to do with the actual evidence. It wasn’t that people had a model and used that model to predict “no doom” with high confidence: it was a case of people not having models at all.
I thought at the time—and continue to think—that the starting place of all our forecasting should be using the outside view. But—and this was something Dony Christie was quite keen to argue—sometimes people just use the “outside view” as a rationalization; to many people, it means just as much, and no more than, “I don’t want to predict something weird, even if that weird thing is overwhelmingly determined by the actual evidence.”
And that was definitely true here: pandemics are not a rare occurrence in human history. They happen quite frequently. I am most thankful for belonging to a community that opened my mind long ago, by having abundant material written about natural pandemics, the Spanish flu, and future bio-risks. That allowed me to enter the mindset of thinking “OK, maybe this is real” as opposed to rejecting all the smoke under the door until the social atmosphere became right.
My intuitions, I’m happy to say, paid off. People are still messaging me about this post. Nearly two years later, I wear a mask when I enter a supermarket.
There are many doomsayers who always get things wrong. A smaller number of doomsayers are occasionally correct: often enough that it might be worth listening to them, even if you end up rejecting them most of the time.
Yet, I am now entitled to a distinction that I did not think I would ever earn, and one that I perhaps do not deserve (as the real credit goes to Louis and Dony): the only time I’ve ever put out a PSA asking people to take some impending doom very seriously, was when I correctly warned about the most significant pandemic in one hundred years. And I’m pretty sure I did it earlier than any other effective altruist in the community (though I’m happy to be proven wrong, and congratulate them fully).
That said, there are some parts of this post I am not happy with. These include,
I only had one concrete prediction in the whole post, and it wasn’t very well-specified. I said that there was a >2% probability that 50 million people would die within one year. That didn’t happen.
I overestimated the mortality rate. At the time, I didn’t understand which was likely to be a greater factor in biasing the case fatality rate: the selection effect of missed cases, or the time-delay of deaths. It is now safe to say that the former was a greater issue. The infection fatality rate of Covid-19 is less than 1%, putting it into a less dangerous category of disease than I had pictured at the time.
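To illustrate those two opposing biases with toy numbers (all figures invented for illustration): suppose there are 1,000 true infections, of which only 100 are confirmed cases, with 2 deaths recorded so far and 1 more death still to come among the confirmed cases.

```latex
% Naive CFR (ignores both biases), lag-adjusted CFR, and the true IFR on the toy numbers:
\[
\text{naive CFR} = \frac{2}{100} = 2\%, \qquad
\text{lag-adjusted CFR} = \frac{3}{100} = 3\%, \qquad
\text{IFR} = \frac{3}{1000} = 0.3\%.
\]
```

On these stipulated numbers, correcting for the death lag pushes the estimate up modestly (2% to 3%), while correcting for missed cases pushes it down by a factor of ten (to 0.3%): the asymmetry described above.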
Interestingly, one part I didn’t regret writing was the vaccine timeline I implicitly predicted in the post. I said, “we should expect that it will take about a year before a vaccine comes out.” Later, health authorities claimed that it would take much longer, with some outlets “fact-checking” the claim that a vaccine could arrive by the end of 2020. I’m pleased to say I outlasted the pessimists on this point, as vaccines started going into people’s arms on a wide scale almost exactly one year after I wrote this post.
Overall, I’m happy I wrote this post. I’m even happier to have friends who could trigger me to write it. And I hope, when the next real disaster comes, effective altruists will correctly anticipate it, as they did for Covid-19.
In this “quick take”, I want to summarize some of my idiosyncratic views on AI risk.
My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.
(Note that I won’t spend a lot of time justifying each of these views here. I’m mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)
Skepticism of the treacherous turn: The treacherous turn is the idea that (1) at some point there will be a very smart unaligned AI, (2) when weak, this AI will pretend to be nice, but (3) when sufficiently strong, this AI will turn on humanity by taking over the world by surprise, and then (4) optimize the universe without constraint, which would be very bad for humans.
By comparison, I find it more likely that no individual AI will ever be strong enough to take over the world, in the sense of overthrowing the world’s existing institutions and governments by surprise. Instead, I broadly expect unaligned AIs will integrate into society and try to accomplish their goals by advocating for their legal rights, rather than trying to overthrow our institutions by force. Upon attaining legal personhood, unaligned AIs can utilize their legal rights to achieve their objectives, for example by getting a job and trading their labor for property, within the already-existing institutions. Because the world is not zero sum, and there are economic benefits to scale and specialization, this argument implies that unaligned AIs may well have a net-positive effect on humans, as they could trade with us, producing value in exchange for our own property and services.
Note that my claim here is not that AIs will never become smarter than humans. One way of seeing how these two claims come apart is to compare my scenario to the case of genetically engineered humans. If we genetically engineered humans, they would presumably eventually surpass ordinary humans in intelligence (along with social persuasion ability, the ability to deceive, etc.). However, by itself, the fact that genetically engineered humans would become smarter than non-engineered humans does not imply that they would try to overthrow the government. Instead, as in the case of AIs, I expect genetically engineered humans would largely try to work within existing institutions, rather than violently overthrow them.
AI alignment will probably be somewhat easy: The most direct and strongest current empirical evidence we have about the difficulty of AI alignment, in my view, comes from existing frontier LLMs, such as GPT-4. Having spent dozens of hours testing GPT-4’s abilities and moral reasoning, I think the system is already substantially more law-abiding, thoughtful and ethical than a large fraction of humans. Most importantly, this ethical reasoning extends (in my experience) to highly unusual thought experiments that almost certainly did not appear in its training data, demonstrating a fair degree of ethical generalization, beyond mere memorization.
It is conceivable that GPT-4’s apparently ethical nature is fake. Perhaps GPT-4 is lying about its motives to me and in fact desires something completely different from what it professes to care about. Maybe GPT-4 merely “understands” or “predicts” human morality without actually “caring” about it. But while these scenarios are logically possible, they seem less plausible to me than the simple alternative explanation that alignment—like many other properties of ML models—generalizes well, in the natural way you might similarly expect from a human.
Of course, the fact that GPT-4 is easily alignable does not immediately imply that smarter-than-human AIs will be easy to align. However, I think this current evidence is still significant, and aligns well with prior theoretical arguments that alignment would be easy. In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking bad actions, allowing us to shape its rewards during training accordingly. After we’ve aligned a model that’s merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.
The default social response to AI will likely be strong: One reason to support heavy regulations on AI right now is if you think the natural “default” social response to AI will lean too far towards laissez faire relative to the optimal level, i.e., that by default we will have too little regulation rather than too much. In this case, you could believe that, by advocating for regulations now, you’re making it more likely that we regulate AI a bit more than we otherwise would have, pushing us closer to the optimal level of regulation.
I’m quite skeptical of this argument because I think that the default response to AI (in the absence of intervention from the EA community) will already be quite strong. My view here is informed by the base rate of technologies being overregulated, which I think is quite high. In fact, it is difficult for me to name even a single technology that I think is currently clearly underregulated by society. By pushing for more regulation on AI, I think it’s likely that we will overshoot and over-constrain AI relative to the optimal level.
In other words, my personal bias is towards thinking that society will regulate technologies too heavily, rather than too loosely. And I don’t see a strong reason to think that AI will be any different from this general historical pattern. This makes me hesitant to push for more regulation on AI, since on my view, the marginal impact of my advocacy would likely be to push us even further in the direction of “too much regulation”, overshooting the optimal level by even more than what I’d expect in the absence of my advocacy.
I view unaligned AIs as having comparable moral value to humans: This idea was explored in one of my most recent posts. The basic idea is that, under various physicalist views of consciousness, you should expect AIs to be conscious, even if they do not share human preferences. Moreover, it seems likely that AIs — even ones that don’t share human preferences — will be pretrained on human data, and therefore largely share our social and moral concepts.
Since unaligned AIs will likely be both conscious and share human social and moral concepts, I don’t see much reason to think of them as less “deserving” of life and liberty, from a cosmopolitan moral perspective. They will likely think similarly to the way we do across a variety of relevant axes, even if their neural structures are quite different from our own. As a consequence, I am pretty happy to incorporate unaligned AIs into the legal system and grant them some control of the future, just as I’d be happy to grant some control of the future to human children, even if they don’t share my exact values.
Put another way, I view (what I perceive as) the EA attempt to privilege “human values” over “AI values” as being largely arbitrary and baseless, from an impartial moral perspective. There are many humans whose values I vehemently disagree with, but I nonetheless respect their autonomy, and do not wish to deny these humans their legal rights. Likewise, even if I strongly disagreed with the values of an advanced AI, I would still see value in their preferences being satisfied for their own sake, and I would try to respect the AI’s autonomy and legal rights. I don’t have a lot of faith in the inherent kindness of human nature relative to a “default unaligned” AI alternative.
I’m not fully committed to longtermism: I think AI has an enormous potential to benefit the lives of people who currently exist. I predict that AIs can eventually substitute for human researchers, and thereby accelerate technological progress, including in medicine. In combination with my other beliefs (such as my belief that AI alignment will probably be somewhat easy), this view leads me to think that AI development will likely be net-positive for people who exist at the time of alignment. In other words, if we allow AI development, it is likely that we can use AI to reduce human mortality, and dramatically raise human well-being for the people who already exist.
I think these benefits are large and important, and commensurate with the downside potential of existential risks. While a fully committed strong longtermist might scoff at the idea that curing aging is important — since it would largely have only short-term effects, rather than long-term effects that reverberate for billions of years — I, by contrast, think it’s really important to try to improve the lives of people who currently exist. Many people view this perspective as a form of moral partiality that we should discard for being arbitrary. However, I think morality is itself arbitrary: it can be anything we want it to be. And I choose to value currently existing humans, to a substantial (though not overwhelming) degree.
This doesn’t mean I’m a fully committed near-termist. I sympathize with many of the intuitions behind longtermism. For example, if curing aging required raising the probability of human extinction by 40 percentage points, or something like that, I don’t think I’d do it. But in more realistic scenarios that we are likely to actually encounter, I think it’s plausibly a lot better to accelerate AI, rather than delay AI, on current margins. This view simply makes sense to me given the enormously positive effects I expect AI will likely have on the people I currently know and love, if we allow development to continue.