Summary

Epidemiological computer simulations are well suited to modelling disease spread. These can be integrated with simulation models of other domains such as economics and supply chains to understand the second-order effects of pandemics. Agent-based simulations can help test interventions for managing disease spread at a granular level. Such simulations had an impact on decision making during the COVID-19 pandemic, but this impact was limited by insufficient prior investment in developing these models and a lack of multi-domain integration. With proper investment in software infrastructure and scientific modelling techniques, these models could be a valuable tool for tackling future pandemics.

In a previous forum post I wrote about how simulation and complexity science could contribute to a number of different EA cause areas. Since then there has been increased attention on and funding for pandemic preparedness efforts within EA. I consider this to be a particularly promising application of simulation models, so I wanted to write a post which makes the case for this specifically being a large opportunity for EA. I would be excited to support people working in this area, and I welcome any criticism or feedback in the comments.

My Background

I am not an epidemiologist; my background is as a software engineer and researcher in simulation and AI. I now work as a research engineer in AI safety. Previously I spent over 6 years working on building simulations for decision making and training at Improbable Defence. While there I published research on developing simulation models of the COVID-19 pandemic.

Recap—what are agent-based models (ABMs)?

Agent-based models are computer simulations which model the behaviour of many agents at the individual level. This is in contrast to other modelling techniques which model aggregated or macro variables. Each individual agent (e.g. a person) is represented explicitly in the simulation, which allows agents to have heterogeneous attributes and behaviours. This enables you to capture interactions between agents, allowing for nonlinear feedback effects. This interactive web demo is a fun example of a simple epidemic simulation ABM. It allows you to play around and investigate how the characteristics of agents and their interactions can affect disease spread. This section of my forum post on simulation and complexity science has a more detailed explanation of ABMs.
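To make the idea concrete, here is a minimal sketch of an epidemic ABM in Python. It is a toy written for illustration, not any of the models discussed in this post: the `Person` class, the three states, the random-mixing rule and all parameter values are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    state: str = "S"            # "S" susceptible, "I" infected, "R" recovered
    contacts_per_day: int = 5   # heterogeneous attribute: differs between agents


def step(population, p_transmit, p_recover, rng):
    """Advance the simulation by one day, updating the agents in place."""
    newly_infected = set()
    newly_recovered = set()
    for i, person in enumerate(population):
        if person.state != "I":
            continue
        # Each infected agent meets a few randomly chosen other agents.
        for _ in range(person.contacts_per_day):
            j = rng.randrange(len(population))
            if population[j].state == "S" and rng.random() < p_transmit:
                newly_infected.add(j)
        if rng.random() < p_recover:
            newly_recovered.add(i)
    # Apply all state changes at the end of the day.
    for j in newly_infected:
        population[j].state = "I"
    for i in newly_recovered:
        population[i].state = "R"


def run(n_agents=500, n_days=60, n_seed=5, p_transmit=0.05, p_recover=0.1, seed=0):
    """Run one simulation and return the daily S/I/R counts."""
    rng = random.Random(seed)
    population = [Person(contacts_per_day=rng.randint(1, 10)) for _ in range(n_agents)]
    for person in population[:n_seed]:
        person.state = "I"
    history = []
    for _ in range(n_days):
        step(population, p_transmit, p_recover, rng)
        history.append({s: sum(p.state == s for p in population) for s in "SIR"})
    return history
```

Calling `run()` returns one dictionary of counts per simulated day. Because every agent is an explicit object, giving individuals different contact rates (or, in a richer model, ages, households or workplaces) is just a matter of adding attributes.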

Why are ABMs a good fit for modelling pandemics?

The heterogeneous nature of ABMs makes them particularly useful for epidemiological modelling. In the real world different people have idiosyncratic behaviours and varying levels of vulnerability to a virus. ABMs can capture interactions between agents and exponential growth, which are core dynamics of disease spread. Much of this relies on individual behaviour, such as the level of adherence to lockdowns, and movement patterns of the population. The concepts of disease spread are relatively straightforward to map to the state of an agent in a computer simulation: people are either infected or not, or at different stages of infection.

These models can be initialised from datasets representing a population in the real world. They can also simulate policy interventions at a more granular level than alternative modelling techniques (such as only a fraction of the population being locked down), which helps to explore counterfactual scenarios.
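As an illustration of a granular intervention, the sketch below locks down only a fraction of the population from a chosen day onwards. It is a self-contained toy: the function name `simulate`, the random-mixing assumption and every parameter value are hypothetical, and it uses a synthetic population rather than real data.

```python
import random


def simulate(lockdown_fraction, lockdown_day, n_agents=1000, n_days=80,
             n_seed=10, contacts=8, p_transmit=0.04, p_recover=0.1, seed=0):
    """Return the total number of agents ever infected in one run."""
    rng = random.Random(seed)
    state = ["S"] * n_agents
    for i in range(n_seed):
        state[i] = "I"
    # Decide in advance which agents will comply once the lockdown starts.
    complies = [rng.random() < lockdown_fraction for _ in range(n_agents)]
    for day in range(n_days):
        locked = day >= lockdown_day
        infected = [i for i in range(n_agents) if state[i] == "I"]
        for i in infected:
            at_home = locked and complies[i]
            if not at_home:
                for _ in range(contacts):
                    j = rng.randrange(n_agents)
                    j_at_home = locked and complies[j]
                    # Agents at home neither make nor receive contacts.
                    if state[j] == "S" and not j_at_home and rng.random() < p_transmit:
                        state[j] = "I"
            if rng.random() < p_recover:
                state[i] = "R"
    return sum(s != "S" for s in state)
```

A counterfactual question such as "what if the lockdown had started a week earlier?" then becomes a comparison between, say, `simulate(0.7, lockdown_day=20)` and `simulate(0.7, lockdown_day=13)`. A single run is noisy, so in practice each scenario would be repeated over many random seeds before drawing conclusions.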

What role did ABMs play in the pandemic?

ABMs, along with other types of epidemiological models, had a sizeable impact on decision making during the recent COVID-19 pandemic. The famous (or infamous) Imperial College model from Neil Ferguson et al. was credited as one of the factors causing the UK government to change course and institute a lockdown in March 2020, by predicting 500,000 deaths in the UK if no action was taken (see this article in Nature: Special report: The simulations driving the world’s response to COVID-19).

Post-hoc modelling analysis of the pandemic is already providing useful insights and decision making recommendations for future pandemics. I collaborated on a research project which used an ABM incorporating real population movement data to understand the effects of the timing of the original March 2020 lockdown. We used this model to estimate the counterfactual number of deaths if the lockdown had been instituted a week earlier (see the paper here: A dynamic microsimulation model for epidemics). This was part of the Royal Society’s RAMP initiative for modelling the pandemic.

Making ABMs more useful, a retrospective

Despite influencing government policy the Ferguson model was clearly flawed, not least because it didn’t take into account how people would have adjusted their behaviour in the absence of a government mandated lockdown. It didn’t capture the complex mix of epidemiological, social, behavioural and economic factors which are crucial to an effective pandemic policy response. It even lacked any form of realistic people movement, having only a very abstract notion of spatial interaction.

None of this is the fault of the team working on this model; they were limited by lack of previous investment in these models. Their model was hastily repurposed from a model of a different disease (influenza). It consisted of messy and untested code, which doesn’t meet modern software engineering standards, limiting its flexibility, maintainability and scientific credibility.

However this points to the potential for drastically improving the usefulness of these models, if we can invest properly in developing them ahead of time for the next pandemic.

Multi-domain modelling

There is no reason in principle why ABMs cannot include factors and dynamics from many different domains such as social science and economics. The core epidemiological ABMs can be “coupled” to other models and data sets, including travel, geographic data and economic models. This can capture important feedback effects between these different systems.

A key factor dominating decision making in a pandemic is behavioural: e.g. how will people react to extended lockdowns, what will the rate of compliance be? Obviously such things are uncertain in advance, but we can at least incorporate the best behavioural science research into these models to narrow the uncertainty. We can incorporate the demographics, existing opinions and political polarisation of the population, and attempt to capture the dynamics of dissemination of information and misinformation on social media, for example to forecast rates of vaccine uptake.

How might such multi-domain models have helped during the pandemic? If we had had such models we might have been able to better predict some of the second-order impacts of lockdown policies, which seemed to take a lot of people by surprise. These include the shocks to supply chains, the labour market and the health care system. Doyne Farmer’s group at Oxford University has done some excellent analysis of the impact of COVID-19 on the labour market: Supply and demand shocks in the COVID-19 pandemic: an industry and occupation perspective. This forum post: how can economists best contribute to pandemic prevention and preparedness? discusses some interesting avenues for relevant economic research. In particular the section on Integrated Assessment Models covers the idea of using integrated multi-domain models.

The takeaway point is that these various real world systems feed back into each other, including the underlying spread of the disease itself. Improved simulation models would be able to capture these multi-domain interactions. This is an advantage of the agent-based modelling paradigm. More flexible and granular representations of heterogeneous agents allow models from different domains to be combined. For example, each individual agent is an actor in an epidemiological model, but it can also have its actions determined by a model from behavioural psychology, and exist as a node in a simulated social network. This is much harder to achieve in alternative modelling paradigms, such as differential equations of macro variables.
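Here is a rough sketch of what this composition could look like in code. One agent object carries an epidemiological state, a behavioural attribute and a set of social network links, and each day the behavioural rule feeds into disease spread. The behavioural rule, the `risk_tolerance` attribute and the ring-shaped network are all invented for illustration and are not taken from any behavioural science model.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    infected: bool = False                          # epidemiological state
    risk_tolerance: float = 0.5                     # behavioural attribute (hypothetical)
    neighbours: list = field(default_factory=list)  # social network links (agent indices)


def stays_home(agent, agents):
    """Toy behavioural rule: stay home if enough of your neighbours are infected."""
    if not agent.neighbours:
        return False
    share = sum(agents[j].infected for j in agent.neighbours) / len(agent.neighbours)
    return share > agent.risk_tolerance


def step(agents, p_transmit, rng):
    """One day: behaviour decides who mixes, then disease spreads over the network."""
    at_home = [stays_home(a, agents) for a in agents]
    newly_infected = []
    for i, agent in enumerate(agents):
        if agent.infected or at_home[i]:
            continue
        for j in agent.neighbours:
            if agents[j].infected and not at_home[j] and rng.random() < p_transmit:
                newly_infected.append(i)
                break
    for i in newly_infected:
        agents[i].infected = True


def build_ring(n, rng):
    """Hypothetical network: each agent is linked to its two neighbours on a ring."""
    return [Agent(risk_tolerance=rng.random(), neighbours=[(i - 1) % n, (i + 1) % n])
            for i in range(n)]
```

The feedback loop is visible even in this toy: infections change how many neighbours are sick, which changes who stays home, which changes where the disease can spread next. In a serious model each of the three pieces would be replaced by a properly validated component from its own field.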

On the other hand I don’t want to understate the challenges of coupling models from different domains. These are not just challenges of computational performance (discussed in the next section). They also include unsolved scientific modelling problems, such as how to handle the combined sources of uncertainty from multiple models, and how to ensure all models represent the world in a consistent way. In my view these challenges are a major reason why this type of multi-domain modelling has been relatively neglected. This presents an under-explored opportunity for large impact.

Computational performance and scale

Current models are far from capturing the scale and complexity of the real world. We need to improve both detail, to capture sophisticated individual agent behaviours, and scale. The world is highly interconnected, which is why global pandemics are a problem in the first place, so we need the ability to model much larger geographic areas, while retaining fidelity.

This is an ambitious goal. The current crop of tools for building ABMs falls far short on these dimensions. There are popular open source tools such as NetLogo, but this is geared more towards ease of use for educational purposes than towards scale and performance. In my opinion none of the existing openly available tools have the right combination of performance and ease of use for modellers.

Most ABMs are built by academic groups, who tend to lack dedicated software engineering resources. Modern high performance computing techniques such as GPU hardware acceleration have the potential to vastly speed up agent-based simulations. In the work I mentioned previously my colleague and I got a model written by academics to run 10,000 times faster, from multiple hours to seconds, by re-writing the core of the model to run on GPUs.

The utility of a model for decision making is highly dependent on its speed. Being able to run the model quickly allows many more scenarios to be tested, in a tighter scientific feedback loop. It also enables sophisticated statistical calibration techniques which can match the model output to real-world observed data, and help to deal with uncertainty.
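To show why speed matters for calibration, here is a deliberately simple sketch: a grid search that runs the model once per candidate parameter value and keeps the value whose output is closest to the observed data. This is much cruder than the statistical techniques alluded to above, and `toy_model` is a cheap deterministic stand-in I have made up for the example, not a real ABM. The "observed" series is synthetic.

```python
def toy_model(beta, gamma=0.1, n=1000.0, i0=1.0, n_days=30):
    """Deterministic stand-in for an expensive simulator: daily infected counts."""
    s, i = n - i0, i0
    series = []
    for _ in range(n_days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        series.append(i)
    return series


def squared_error(a, b):
    """Sum of squared differences between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calibrate(observed, candidates):
    """Return the candidate beta whose model output is closest to the observations."""
    return min(candidates, key=lambda beta: squared_error(toy_model(beta), observed))


observed = toy_model(0.3)                      # synthetic "observations"; real data would go here
candidates = [k / 100 for k in range(5, 61)]   # transmission rates from 0.05 to 0.60
best_beta = calibrate(observed, candidates)
```

Even this one-parameter search needs 56 model runs. With several uncertain parameters and a stochastic model that has to be repeated for each setting, the number of runs grows quickly, which is why a model that takes hours per run is effectively impossible to calibrate this way.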

Running complex, multi-domain models simulating millions or billions of agents will require an enormous amount of computing power. However, with Moore’s law and the advent of cloud computing, there is more computing power available today than ever before. This article estimates that the number of floating point operations per second available per dollar doubles every 2.5 years. The field of pandemic modelling has not yet been able to effectively harness this increase in available computing power.
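As a back-of-the-envelope illustration of what that doubling time implies, assuming the trend simply continues at a constant rate (which is my assumption here, not a claim from the article):

```python
def growth_factor(years, doubling_time_years=2.5):
    """Multiplicative increase in compute per dollar after the given number of years."""
    return 2 ** (years / doubling_time_years)


print(growth_factor(10))  # 16.0, i.e. four doublings in a decade
```

On that assumption, a fixed budget buys roughly 16 times more computation after ten years.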

How can these models be used?

Modelling is not just about making point predictions, which are incredibly difficult in practice; it can also help illuminate the distribution of possible outcomes. This can make the true potential impact and cost of such disasters much more visible and salient ahead of time (much like the Ferguson model did, although it’s a shame that the model itself was so limited). This permits far less room for deniability or wishful thinking from governments and key decision makers. Models can act as a focal point for coordination and group decision making.

I’m convinced that there could have been much more effective modelling of vaccine rollout ahead of time during the pandemic. This could have helped understand different scenarios with varying levels of vaccine effectiveness, and the implications of having vaccines which can prevent severe disease outcomes but not transmission.

Simulations can be used for “wargaming” exercises, where decision makers can train and practice by running through simulated pandemic scenarios. This is common in other domains, particularly training for military exercises. Looking ahead, simulations could help us understand and prepare for situations with hypothetical novel viruses. Properties of the model can be varied to produce different scenarios, such as infectiousness of the virus, place of geographical origin, the speed of vaccine development and rollout, and the evolution of new variants. We could even simulate scenarios that have never occurred before, such as the deliberate release of a bio-engineered pathogen, or a nightmare scenario where more than one novel pathogen is released at once.

Running a simulation many times (Monte Carlo simulation) can help us measure uncertainty. This enables sensitivity analysis to understand the effect of varying different parameters. This can help identify promising intervention points with high potential impact. It could even inform what advice EA affiliated groups give to governments on how to tackle future pandemics.
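A minimal sketch of both ideas, using a toy stochastic epidemic: `monte_carlo` repeats the simulation with different random draws and summarises the spread of outcomes, and `sensitivity` sweeps one parameter. The model, the function names and the parameter values are all illustrative assumptions.

```python
import random
import statistics


def outbreak_size(p_transmit, rng, n_agents=200, n_seed=3, contacts=6,
                  p_recover=0.2, n_days=60):
    """One stochastic run of a toy epidemic; returns total agents ever infected."""
    state = ["S"] * n_agents
    for i in range(n_seed):
        state[i] = "I"
    for _day in range(n_days):
        for i in [k for k in range(n_agents) if state[k] == "I"]:
            for _ in range(contacts):
                j = rng.randrange(n_agents)
                if state[j] == "S" and rng.random() < p_transmit:
                    state[j] = "I"
            if rng.random() < p_recover:
                state[i] = "R"
    return sum(s != "S" for s in state)


def monte_carlo(p_transmit, n_runs=50, seed=0):
    """Repeat the simulation and summarise the distribution of outbreak sizes."""
    rng = random.Random(seed)
    sizes = sorted(outbreak_size(p_transmit, rng) for _ in range(n_runs))
    return {
        "mean": statistics.mean(sizes),
        "p05": sizes[int(0.05 * (n_runs - 1))],   # approximate 5th percentile
        "p95": sizes[int(0.95 * (n_runs - 1))],   # approximate 95th percentile
    }


def sensitivity(values, **kwargs):
    """Sweep the transmission probability and collect a summary for each value."""
    return {p: monte_carlo(p, **kwargs) for p in values}
```

For example, `sensitivity([0.02, 0.04, 0.08])` gives a summary of outbreak sizes for each transmission probability. The gap between the 5th and 95th percentiles is a direct, if crude, measure of how uncertain the outcome is for a given set of assumptions.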

Simulation models incorporating economics could help estimate the potential cost of future pandemics in financial terms. This can strengthen the case for investing in preparation, and help quantify the expected value of interventions and targeted research. A major challenge for policy makers during the pandemic was navigating the (perceived) tradeoff between public health and the economy. We are in need of much better tools to help policy makers balance these different factors.

Viewing simulated scenarios can serve as a stark reminder of the importance of preparing for future disasters. In my experience output from a simulation can be very persuasive for nontechnical decision makers, if it is presented in the right way.

More speculatively, these models could serve as tools to plan for societal recovery from a pandemic, or at least help us understand the medium to longer term impact on society of a truly disastrous pandemic. How long would a full recovery take?

Is epidemiological modelling neglected?

A counterpoint to the idea of investing more in epidemiological models is that they are not particularly neglected. There were countless epidemiological models developed during the pandemic, many of them of the agent-based variety. However, in my view almost all of them lacked the features most valuable for understanding and preparing for future pandemics, including the multi-domain integration described above. They also tend to be built by relatively small academic groups, who don’t have the professional software development capabilities to build models which are flexible and performant, with easy-to-use interfaces for nontechnical users.

Existing modelling work is not focussed on hypothetical worst case pandemic scenarios, an aspect which EAs and longtermists care relatively more about. ABMs focused on the tail of the distribution could thus be impactful. The field of decision making under deep uncertainty has relevant techniques for systematically discovering highly unfavourable scenarios.

What can the EA community do?

I would posit that there is a large opportunity for EA-minded people to build multi-domain simulation models to aid pandemic preparedness. A further use of such models could be to help make decisions on allocating EA funds to other pandemic prevention efforts, such as research on improved PPE and vaccine challenge trials.

Critical components that we currently lack are better software tools and libraries for building multi-domain simulations. This will require scalable infrastructure for complex models, data pipelines and user interfaces. Some private companies are working on simulation tools, for example Improbable Defence (my former employer) and Hash.ai. However these companies are not focused on pandemic modelling. They are tech companies with the sole concern of building proprietary software platforms, as opposed to the scientific modelling work required to build credible and useful models. A successful multi-domain pandemic modelling effort would benefit from open and collaborative modelling. In my view both the software tools and the scientific modelling work must go hand-in-hand. This is analogous to the field of Deep Learning, which is just as much an engineering effort as a scientific one.

There are very few existing organisations that promote this close collaboration between software engineers and scientific modellers / academics, particularly in pandemic modelling.

A new EA-funded organisation could create this environment for developing multi-domain pandemic models.

There are some existing examples of organisations that have aspects of what I’m imagining:

The Institute for Disease Modeling (part of the Gates Foundation). I haven’t looked into their work in much detail, but it seems that they have built openly accessible agent-based simulation tools for modelling multiple different diseases (e.g. malaria). It seems that they are more focussed on existing diseases than on preparing for future pandemics. They have a list of their modelling tools here, including an ABM tool called EMOD which “simulates the spread of disease to help determine the combination of health policies and intervention strategies that can lead to disease eradication”. However this doesn’t seem to be under active development; the EMOD GitHub repo hasn’t been updated since 2019.
While writing this blog post I came across an organisation called Gleam, who are a group of academics “using big data and computational modeling to fight infectious diseases”. They have an ABM tool which claims to incorporate realistic population movement at a global scale, which definitely looks like a step in the right direction!
We can also look at relevant examples from other domains, such as the Climate Modeling Alliance based at Caltech. They are a consortium of scientists and engineers “building a new Earth system model that leverages recent advances in the computational and data sciences”.

Perhaps there are opportunities to collaborate with or fund existing organisations. If someone is seriously considering working on this then a sensible first step could be to reach out to the team behind EMOD at the Institute for Disease Modeling. It would be interesting to know if they have considered targeting it at modelling novel pathogens.

If any of these ideas have piqued your interest, or you are considering working in this area, then I am more than happy to discuss further.

I should make it clear that I myself have switched from working on simulations for decision making to AI safety, for a variety of reasons. But that doesn’t mean I am any less excited for someone to take forward the ideas outlined in this post. I believe large scale, multi-domain simulation models have a high expected value for dealing with future pandemics.

Many thanks to David Manheim, Toby Weed, Max Reddel and Simon Grimm for reviewing a draft of this post (all errors and misguided opinions are my own!).