[Takeaways from Covid forecasting on Metaculus]
I’m probably going to win the first round of the Li Wenliang forecasting tournament on Metaculus, or maybe come second. (My screen name currently shows up in second place on the leaderboard, but that’s an unresolved glitch: one of the question resolutions depends on a badly delayed source.) (Update: I won it!)
With around 52 questions, this was the largest forecasting tournament on the virus. It ran from late February until early June.
I learned a lot during the tournament. Besides claiming credit, I want to share some observations and takeaways from this forecasting experience, inspired by Linch Zhang’s forecasting AMA:
I did well at forecasting, but it came at the expense of other things I wanted to do. In February, March, and April, Covid completely absorbed me. I spent several hours per day reading the news and felt anxious about keeping my forecasts up to date. This was exhausting; I was relieved when the tournament came to an end.
I had previously dabbled in AI forecasting. Unfortunately, I can’t tell whether I excelled at it, because the Metaculus domain for it went dormant. In any case, I noticed that I felt more motivated to delve into Covid questions because they seemed more interconnected. It felt like I wasn’t just learning random information to help me with a single question; I was acquiring a kind of expertise. (Armchair epidemiology? :P ) I think this impression was due to a mixture of perhaps suboptimal question design in the AI Metaculus domain and the greater difficulty of picking up useful ML intuitions on the go.
One thing I think I’m good at is identifying reasons why past trends might change. I’m always curious to understand the underlying reasons behind a trend, and I come up with lots of hypotheses because I like the feeling of generating a new insight. My hunches often turned out to be wrong, but investigating them improved my understanding.
I have an aversion to building complex models; I always feel like the model uncertainty would be too large anyway. When forecasting Covid cases, I mostly looked for countries where a similar situation had already played out. Then I’d think about factors that might differ in the new situation and make intuition-based adjustments in the direction those differences predicted.
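A minimal sketch of what that approach looks like in code (the trajectories, alignment point, and adjustment factor below are all hypothetical, not numbers I actually used):

```python
# Reference-class sketch: continue a target country's curve using the
# trajectory of an analogue country that is further along in its outbreak,
# nudged by an intuition-based adjustment. All numbers are hypothetical.

reference = [100, 210, 450, 900, 1700, 2900, 4400]  # analogue's daily cases
target_so_far = [95, 205, 440]                      # target's daily cases

# Align on the day both countries crossed ~100 daily cases (index 0 here).
offset = 0

# Intuition-based adjustment, e.g. for lower population density or an
# earlier lockdown in the target country.
adjustment = 0.8

forecast = [adjustment * x for x in reference[offset + len(target_so_far):]]
print(forecast)  # [720.0, 1360.0, 2320.0, 3520.0]
```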
I think my main weakness is laziness. Occasionally, when there was an easy way to do it, I’d spot-check hypotheses by making predictions about past events that I hadn’t yet read about. However, I didn’t do this nearly enough. I also relied too much on factoids I had picked up somewhere without verifying how accurate they were. For instance, I had it stuck in my head that someone had said the case doubling time was 4 days. I operated with this assumption for many days of forecasting before realizing that it looked more like 2.5 days in densely populated areas, and that in any case I should have spent more time looking into this crucial variable firsthand. Lastly, I noticed several times that other forecasters were discussing issues I didn’t have a good grasp of (e.g., test-positivity rates); I felt I’d probably improve my forecasting if I looked into them, but I preferred to stick with approaches I was more familiar with.
IT skills really would have helped me generate forecasts faster. Because I lacked them, I had to do crazy things with pen and paper. (But none of what I did involved more than elementary-school math.)
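For concreteness, a few lines of Python would have replaced most of that pen-and-paper work. Here’s an illustrative sketch (the starting count and horizon are made up); it also shows how much the doubling-time factoid from the previous point matters:

```python
# How sensitive a short-term extrapolation is to the assumed doubling time.
# The starting count and horizon are illustrative, not real data.
cases_today = 1_000
horizon_days = 14

for doubling_time in (4.0, 2.5):
    projected = cases_today * 2 ** (horizon_days / doubling_time)
    print(f"doubling every {doubling_time} days -> ~{projected:,.0f} cases")

# doubling every 4.0 days -> ~11,314 cases
# doubling every 2.5 days -> ~48,503 cases
```

Over just two weeks, the 2.5-day assumption yields more than four times as many cases as the 4-day one.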
I learned that confidently disagreeing with the community forecast is different from “not confidently agreeing.” Twice, I lost a bunch of points due to underconfidence. In cases where I had no idea about some issue and saw the community predict below 10%, I didn’t want to go below 20%, because that felt inappropriate given my lack of knowledge about the plausible-sounding scenario. I couldn’t confidently agree with the community, but since I also didn’t confidently disagree, I should have just deferred to their forecast. Contrarianism is a valuable skill, but one also has to learn to trust others when one sees no reason not to.
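A quick calculation shows what that hedging costs. (Metaculus’s actual scoring rule is more involved; the plain log score below is only meant to illustrate the effect.)

```python
import math

# Log score for a binary question that resolves "no": score = ln(1 - p).
# Compare deferring to the community's 10% with hedging at 20%.
for p in (0.10, 0.20):
    print(f"predicted {p:.0%}: log score = {math.log(1 - p):.3f}")

# predicted 10%: log score = -0.105
# predicted 20%: log score = -0.223
```

When the community’s 10% turns out right, hedging at 20% roughly doubles the penalty, and that gap compounds across many questions.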
I realized early on that when I changed my mind about some consideration that had initially led me to predict differently from the community median, I should make sure to update thoroughly. If I no longer believe my initial reason for predicting significantly above the median, maybe I should go all the way to slightly below it next. (The first intuition is to just move closer to the median while still staying above it, but that likely reflects anchoring on my old position rather than any remaining evidence.)
From playing a lot of poker, I have the habit of imagining that I make some bet (e.g., a bluff or a thin value bet) and then turn out to be wrong in that instance: would I still feel good about the decision in hindsight? This heuristic felt very useful in forecasting. It made me reverse initially overconfident forecasts when I realized that my internal assumptions weren’t something I could later defend as “a reasonable view at the time.”
I made a couple of bad forecasts after I stopped following developments every day. I realized I needed to recalibrate how much to trust my intuitions once I no longer had a good sense of everything that was happening.
Some things I was particularly wrong about:
This was well before I started predicting on Metaculus, but up until about February 5th, I was way too pessimistic about the death rate for young, healthy people. I think I lacked the medical knowledge to have the right prior about how strongly age-skewed most illnesses are, and therefore updated too strongly upon learning of the deaths of two young, healthy Chinese doctors.
Like others, I overestimated the importance of hospital overstrain. I assumed it would make the infection fatality rate about 1.5x–2.5x worse in countries that didn’t control their outbreaks. This didn’t happen.
Initially, I was somewhat worried about food shortages, and I was surprised by the resilience of food supply chains.
I expected more hospitalizations in Sweden in April.
I didn’t expect the US to put more than 60 countries on its Level 3 travel health warning list. I was confident they would not do this, reasoning, “If a country is gonna be safer than the US itself, why not let your citizens travel there?”
I was nonetheless too optimistic about the US getting things under control eventually, even though I saw comments from US-based forecasters who were more pessimistic.
My long-term forecasts for case numbers tended to be somewhat low. (Perhaps this was partly related to laziness; the Metaculus interface made it hard to give my distributions long right tails.)
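To illustrate why the tails matter: under a lognormal shape (the parameters below are made up), the 99th percentile can sit an order of magnitude above the median, which a roughly symmetric distribution can’t express.

```python
import math

# A long right tail: under a lognormal, the 99th percentile can be ~10x
# the median. Median and spread here are illustrative, not real forecasts.
median = 100_000   # median of projected case numbers
sigma = 1.0        # spread on the log scale

z99 = 2.326        # 99th-percentile z-score of the standard normal
p99 = median * math.exp(sigma * z99)
print(f"median ~{median:,}, 99th percentile ~{p99:,.0f}")  # ~10x the median
```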
Some things I was particularly right about:
I was generally early to recognize the risks from the novel coronavirus / Covid.
For European countries, and initially for the US, I expected lockdown measures to work roughly as well as they did. I confidently predicted lower numbers than the community for the effects of the first peak.
Already in early March, I somewhat confidently ruled out IFR estimates below 0.5%, and I think I did so for good reasons, even though I continued to accumulate better evidence for my IFR predictions later and was wrong about the effects of hospital overstrain.
In late March, I very confidently doubled down against sub-0.5% IFR estimates, despite the weird momentum that had developed around taking them seriously and the confusion about the percentage of asymptomatic cases.
I have had very few substantial updates since mid-March. I predicted the general shape of the pandemic quite well, e.g., here or here.
I confidently predicted that the UK (and, later, the Netherlands) would change course on their initial “no lockdown” policy.
I noticed early that Indonesia had a large undetected outbreak. A couple of days after I predicted this, the deaths there jumped from 1 to 5 and its ratio of confirmed cases to deaths became the worst (or second worst?) in the world at the time.
(By now, I’ve stopped following developments closely.)
+1 to the congratulations from JP! I may have mentioned this before, but I considered your forecasts and comments for covidy questions to be the highest-quality on Metaculus, especially back when we were both very active.
You may not have considered it worth your time in the end, but I still think it’s good for EAs to do things that seem fairly hard on the face of it, and to develop better self-models and models of the world as a result.
I know it might not be what you’re looking for, but congratulations!
This was a great writeup, thanks for taking the time to make it. Congrats on the contest, too! I’m sorry to hear your experience was stressful. Do you intend to go back to Metaculus in a more relaxed way? I know some users restrict themselves to a subset of topics, for example.
Can you provide some links on the latest IFR estimates? A quick Google search leads me to the same 0.5% ballpark.
I’m not following the developments anymore. I could imagine that the IFR is now lower than it used to be in April because treatment protocols have improved.