[Takeaways from Covid forecasting on Metaculus]
I’m probably going to win the first round of the Li Wenliang forecasting tournament on Metaculus, or maybe get second. (My screen name shows up in second on the leaderboard, but it’s a glitch that’s not resolved yet because one of the resolutions depends on a strongly delayed source.) (Update: I won it!)
With around 52 questions, this was the largest forecasting tournament on the virus. It ran from late February until early June.
I learned a lot during the tournament. Besides claiming credit, I want to share some observations and takeaways from this forecasting experience, inspired by Linch Zhang’s forecasting AMA:
I did well at forecasting, but it came at the expense of other things I wanted to do. In February, March and April, Covid had completely absorbed me. I spent several hours per day reading news and had anxiety about regularly updating my forecasts. This was exhausting; I was relieved when the tournament came to an end.
I had previously dabbled in AI forecasting. Unfortunately, I can’t tell if I excelled at it because the Metaculus domain for it went dormant. In any case, I noticed that I felt more motivated to delve into Covid questions because they seemed more connected. It felt like I was not only learning random information to help me with a single question, but I was acquiring a kind of expertise. (Armchair epidemiology? :P ) I think this impression was due to a mixture of perhaps suboptimal question design for the AI Metaculus domain and the increased difficulty of picking up useful ML intuitions on the go.
One thing I think I’m good at is identifying reasons why past trends might change. I’m always curious to understand the underlying reasons behind some trends. I come up with lots of hypotheses because I like the feeling of generating a new insight. I often realized that my hunches were wrong, but in the course of investigating them, I improved my understanding.
I have an aversion to making complex models. I always feel like model uncertainty is too large anyway. When forecasting Covid cases, I mostly looked for countries where similar situations had already played out. Then, I’d think about factors that might be different in the new situation, and make intuition-based adjustments in the direction predicted by the differences.
I think my main weakness is laziness. Occasionally, when there was an easy way to do it, I’d spot-check hypotheses by making predictions about past events that I hadn’t yet read about. However, I don’t do this nearly enough. Also, I rely too much on factoids I picked up from somewhere without verifying how accurate they are. For instance, I had it stuck in my head that someone said the case doubling time was 4 days. So I operated with this assumption for many days of forecasting before realizing that it was actually looking more like 2.5 days in densely populated areas, and that I should in any case have spent more time looking firsthand into this crucial variable. Lastly, I noticed a bunch of times that other forecasters were talking about issues I didn’t have a good grasp on (e.g., test-positivity rates), and I felt that I’d probably improve my forecasting if I looked into them, but I preferred to stick with approaches I was more familiar with.
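As a rough illustration of why the doubling-time assumption matters so much, here is a minimal sketch of unchecked exponential growth under the two figures mentioned above (4 days versus 2.5 days). The starting count of 100 cases and the 20-day horizon are made-up numbers chosen for easy arithmetic, not data from the tournament, and real outbreaks do not grow at a constant rate for long.

```python
def projected_cases(initial_cases: float, doubling_time_days: float, horizon_days: float) -> float:
    """Cases after `horizon_days` of constant exponential growth."""
    return initial_cases * 2 ** (horizon_days / doubling_time_days)


# Hypothetical inputs, chosen only for illustration.
initial = 100
horizon = 20

slow = projected_cases(initial, 4.0, horizon)  # 5 doublings in 20 days
fast = projected_cases(initial, 2.5, horizon)  # 8 doublings in 20 days

print(f"4-day doubling:   {slow:,.0f} cases")
print(f"2.5-day doubling: {fast:,.0f} cases")
print(f"ratio:            {fast / slow:.0f}x")
```

Over this horizon the two assumptions differ by three doublings, i.e. a factor of eight, which is why a forecast built on the wrong doubling time drifts off so quickly.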
IT skills really would have helped me generate forecasts faster. I had to do crazy things with pen and paper because I lacked them. (But none of what I did involved more than elementary-school math.)
I learned that confidently disagreeing with the community forecast is different from “not confidently agreeing.” I lost a bunch of points twice due to underconfidence. In cases where I had no idea about some issue and saw the community predict <10%, I didn’t want to go <20% because that felt inappropriate given my lack of knowledge about the plausible-sounding scenario. I couldn’t confidently agree with the community, but since I also didn’t confidently disagree with them, I should have just deferred to their forecast. Contrarianism is a valuable skill, but one also has to learn to trust others in situations where one sees no reason not to.
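The cost of this kind of hedging can be sketched with a plain logarithmic score. This is an assumption made for illustration: Metaculus's actual point formula is more involved and is not reproduced here. The sketch compares forecasting 20% against a community forecast of 10% on a binary question.

```python
import math


def log_score(p_yes: float, happened: bool) -> float:
    """Plain log score for a binary forecast (higher is better)."""
    return math.log(p_yes if happened else 1.0 - p_yes)


def expected_gap(true_prob: float, mine: float, community: float) -> float:
    """Expected log-score difference (mine minus community) if the event
    really occurs with probability `true_prob`."""
    gap_if_yes = log_score(mine, True) - log_score(community, True)
    gap_if_no = log_score(mine, False) - log_score(community, False)
    return true_prob * gap_if_yes + (1.0 - true_prob) * gap_if_no


community, hedged = 0.10, 0.20

# If the community is right (true probability 10%), hedging loses on average.
print(expected_gap(0.10, hedged, community))
# Hedging only pays if the true probability is noticeably above 10%.
print(expected_gap(0.20, hedged, community))
```

Under this simplified score, going to 20% is only worth it if one actually believes the event is meaningfully more likely than the community thinks; merely lacking the confidence to agree is not enough.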
I realized early that when I changed my mind on some consideration that initially had me predict differently from the community median, I should make sure to update thoroughly. If I no longer believe my initial reason for predicting significantly above the median, maybe I should go all the way to slightly below the median next. (The first intuition is to just move closer to it but still stay above.)
From playing a lot of poker, I have the habit of imagining that I make some bet (e.g., a bluff or thin value bet) and it will turn out that I’m wrong in this instance. Would I still feel good about the decision in hindsight? This heuristic felt very useful to me in forecasting. It made me reverse initially overconfident forecasts when I realized that my internal assumptions didn’t feel like something I could later on defend as “It was a reasonable view at the time.”
I made a couple of bad forecasts after I stopped following developments every day. I realized I needed to re-calibrate how much to trust my intuitions once I no longer had a good sense of everything that was happening.
Some things I was particularly wrong about:
This was well before I started predicting on Metaculus, but up until about February 5th, I was way too pessimistic about the death rate for young healthy people. I think I lacked the medical knowledge to have the right prior about how strongly age-skewed most illnesses are, and therefore updated too strongly upon learning about the deaths of two young healthy Chinese doctors.
Like others, I overestimated the importance of hospital overstrain. I assumed that this would make the infection fatality rate about 1.5x–2.5x worse in countries that don’t control their outbreaks. This didn’t happen.
I was somewhat worried about food shortages initially, and was surprised by the resilience of the food distribution chains.
I expected more hospitalizations in Sweden in April.
I didn’t expect the US to put >60 countries on the level-3 health warning travel list. I was confident that they would not do this, because “If a country is gonna be safer than the US itself, why not let your citizens travel there??”
I was nonetheless too optimistic about the US getting things under control eventually, even though I saw comments from US-based forecasters who were more pessimistic.
My long-term forecasts for case numbers tended to be somewhat low. (Perhaps this was in part related to laziness; the Metaculus interface made it hard to create long tails for the distribution.)
Some things I was particularly right about:
I was generally early to recognize the risks from novel coronavirus / Covid.
For European countries and the US initially, I expected lockdown measures to work roughly as well as they did. I confidently predicted lower than the community for the effects of the first peak.
I somewhat confidently ruled out IFR estimates <0.5% in early March already, and I think this was for good reasons, even though I continued to accumulate better evidence for my IFR predictions later and was wrong about the effects of hospital overstrain.
I very confidently doubled down against <0.5% IFR estimates in late March, despite the weird momentum that developed around taking them seriously, and the confusion about the percentage of asymptomatic cases.
I have had very few substantial updates since mid March. I predicted the general shape of the pandemic quite well, e.g. here or here.
I confidently predicted that the UK and the Netherlands (later) would change course about their initial “no lockdown” policy.
I noticed early that Indonesia had a large undetected outbreak. A couple of days after I predicted this, the deaths there jumped from 1 to 5 and its ratio of confirmed cases to deaths became the worst (or second worst?) in the world at the time.
(I have stopped following the developments closely by now.)
I know it might not be what you’re looking for, but congratulations!
+1 to the congratulations from JP! I may have mentioned this before, but I considered your forecasts and comments for covidy questions to be the highest-quality on Metaculus, especially back when we were both very active.
You may not have considered it worth your time in the end, but I still think it’s good for EAs to do things that on the face of it seem fairly hard, and develop better self models and models of the world as a result.
This was a great writeup, thanks for taking the time to make it. Congrats on the contest, too!
I’m sorry to hear your experience was stressful. Do you intend to go back to Metaculus in a more relaxed way? I know some users restrict themselves to a subset of topics, for example.
Can you provide some links on the latest IFR estimates? A quick Google search leads me to the same 0.5% ballpark.
I’m not following the developments anymore. I could imagine that the IFR is now lower than it used to be in April because treatment protocols have improved.
[Is pleasure ‘good’?]
What do we mean by the claim “Pleasure is good”?
There’s an uncontroversial interpretation and a controversial one.
Vague and uncontroversial claim: When we say that pleasure is good, we mean that all else equal, pleasure is always unobjectionable, and often it is desired.
Specific and controversial claim: When we say that pleasure is good, what we mean is that, all else equal, pleasure is an end we should be striving for. This captures points like:
that pleasure is in itself desirable,
that no mental states without pleasure are in themselves desirable,
that more pleasure is always better than less pleasure.
People who say “pleasure is good” claim that we can establish this by introspection about the nature of pleasure. I don’t see how one could establish the specific and controversial claim from mere introspection. After all, even if I personally valued pleasure in the strong sense (I don’t), I couldn’t, with my own introspection, establish that everyone does the same. People’s psychologies differ, and how pleasure is experienced in the moment doesn’t fully determine how one will relate to it. Whether one wants to dedicate one’s life (or, for altruists, at least the self-oriented portions of one’s life) to pursuing pleasure depends on more than just what pleasure feels like.
Therefore, I think pleasure is only good in the weak sense. It’s not good in the strong sense.
Another argument that points to “pleasure is good” is that people and many animals are drawn to things that give them pleasure, and that people generally communicate about their own pleasurable states as good. Given a random person off the street, I’m willing to bet that after introspection they will suggest that they value pleasure in the strong sense. So while this may not be universally accepted, I still think it could hold weight.
Also, a symmetric statement can be made regarding suffering, which I don’t think you’d accept. People who say “suffering is bad” claim that we can establish this by introspection about the nature of suffering.
From reading Tranquilism, I think that you’d respond to these by saying that people confuse “pleasure is good” with an internal preference or craving for pleasure, while suffering is actually intrinsically bad. But taking an epistemically modest approach would require quite a bit of evidence for that, especially as part of the argument is that introspection may be flawed.
I’m curious as to how strongly you hold this position. (Personally, I’m totally confused here, but I lean toward the strong sense of “pleasure is good” while thinking that, overall, pleasure holds little moral weight.)
Another argument that points to “pleasure is good” is that people and many animals are drawn to things that give them pleasure
It’s worth pointing out that this association isn’t perfect. See  and  for some discussion. Tranquilism allows that if someone is in some moment neither drawn to (craving) (more) pleasurable experiences nor experiencing pleasure (or as much as they could be), this isn’t worse than if they were experiencing (more) pleasure. If more pleasure is always better, then contentment is never good enough, but to be content is to be satisfied, to feel that it is good enough or not feel that it isn’t good enough. Of course, this is in the moment, and not necessarily a reflective judgement.
I also approach pleasure vs suffering in a kind of conditional way, like an asymmetric person-affecting view, or “preference-affecting view”:
I would say that something only matters if it matters (or will matter) to someone, and an absence of pleasure doesn’t necessarily matter to someone who isn’t experiencing pleasure, and certainly doesn’t matter to someone who does not and will not exist, and so we have no inherent reason to promote pleasure. On the other hand, there’s no suffering unless someone is experiencing it, and according to some definitions of suffering, it necessarily matters to the sufferer. (A bit more on this argument here, but applied to good and bad lives.)
I agree that pleasure is not intrinsically good (i.e. I also deny the strong claim). I think it’s likely that experiencing the full spectrum of human emotions (happiness, sadness, anger, etc.) and facing challenges are good for personal growth and therefore improve well-being in the long run. However, I think that suffering is inherently bad, though I’m not sure what distinguishes suffering from displeasure.
[I’m an anti-realist because I think morality is underdetermined]
I often find myself explaining why anti-realism is different from nihilism / “anything goes.” I wrote lengthy posts in my sequence on moral anti-realism (2 and 3) about partly this point. However, maybe the framing “anti-realism” is needlessly confusing because some people do associate it with nihilism / “anything goes.” Perhaps the best short explanation of my perspective goes as follows: I’m happy to concede that some moral facts exist (in a comparatively weak sense), but I think morality is underdetermined.
This means that beyond the widespread agreement on some self-evident principles, expert opinions won’t converge even if we had access to a superintelligent oracle. Multiple options will be defensible, and people will gravitate to different attractors in value space.
I think if you concede that some moral facts exist, it might be more accurate to call yourself a moral realist. The indeterminacy of morality could be a fundamental feature, allowing for many more acts to be ethically permissible (or no worse than other acts) than with a linear (complete) ranking. I think consequentialists are unusually prone to try to rank outcomes linearly.
I read this recently, which describes how moral indeterminacy can be accommodated within moral realism, although it was kind of long for what it had to say. I think expert agreement (or ideal observers/judges) could converge on moral indeterminacy: they could agree that we can’t know how to rank certain options and further that there’s no fact of the matter.
Thanks for bringing up this option! I don’t agree with this framing for two reasons:
As I point out in my sequence’s first post, some ways in which “moral facts exist” are underwhelming.
I don’t think moral indeterminacy necessarily means that there’s convergence of expert judgments. At least, the way in which I think morality is underdetermined explicitly predicts expert divergence. Morality is “real” in the sense that experts will converge up to a certain point, and beyond that, some experts will have underdetermined moral values while others will have made choices within what’s allowed by indeterminacy. Out of the ones that made choices, not all choices will be the same.
I think what I describe in the second bullet point will seem counterintuitive to many people because they think that if morality is underdetermined, your views on morality should be underdetermined, too. But that doesn’t follow! I understand why people have the intuition that this should follow, but it really doesn’t work that way when you look at it closely. I’ve been working on spelling out why.
[Are underdetermined moral values problematic?]
If I think my goals are merely uncertain, but in reality they are underdetermined and the contributions I make to shaping the future will be driven, to a large degree, by social influences, ordering effects, lock-in effects, and so on, is that a problem?
I can’t speak for others, but I’d find it weird. I want to know what I’m getting up for in the morning.
On the other hand, because it makes it easier for the community to coordinate and pull things in the same directions, there’s a sense in which underdetermined values are beneficial.
[Moral uncertainty and moral realism are in tension]
Is it ever epistemically warranted to have high confidence in moral realism, and also be morally uncertain not only between minor details of a specific normative-ethical theory but between theories?
I think there’s a tension there. One possible reply might be the following. Maybe we are confident in the existence of some moral facts, but multiple normative-ethical theories can accommodate them. Accordingly, we can be moral realists (because some moral facts exist) and be morally uncertain (because there are many theories to choose from that accommodate the little bits we think we know about moral reality).
However, what do we make of the possibility that moral realism could be true only in a very weak sense? For instance, maybe some moral facts exist, but most of morality is underdetermined. Similarly, maybe the true morality is some all-encompassing and complete theory, but humans might be forever epistemically closed off to it. If so, then, in practice, we could never go beyond the few moral facts we already think we know for sure.
Assuming a conception of moral realism that is action-relevant for effective altruism (e.g., because it predicts reasonable degrees of convergence among future philosophers, or makes other strong claims that EAs would be interested in), is it ever epistemically warranted to have high confidence in that, and be open-endedly morally uncertain?
Another way to ask this question: If we don’t already know/see that a complete and all-encompassing theory explains many of the features related to folk discourse on morality, why would we assume that such a complete and all-encompassing theory exists in a form that is accessible to us? Even if there are, in some sense, “right answers” to moral questions, we need more evidence to conclude that morality is not vastly underdetermined.
For more detailed arguments on this point, see section 3 in this post.
[When thinking about what I value, should I take peer disagreement into account?]
Consider the question “What’s the best career for me?”
When we think about choosing careers, we don’t update to the career choice of the smartest person we know or the person who has thought the most about their career. Instead, we seek out people who have approached career choice with a similar overarching goal/framework (in my case, 80,000 Hours is a good fit), and we look toward the choices of people with similar personalities (in my case, I notice a stronger personality overlap with researchers than managers, operations staff, or those doing earning to give).
When it comes to thinking about one’s values, many people take peer disagreement very seriously.
I think that can be wise, but it shouldn’t be done unthinkingly. I believe that the quest to figure out one’s values shares strong similarities with the quest to figure out one’s ideal career. Before deferring to others in one’s deliberations, I recommend making sure that others are asking the same questions (not everything that comes with the label “morality” is the same) and that they are psychologically similar in the ways that seem fundamental to what you care about as a person.