Note: Sorry if this came out a little bit harsh. I’m interested in your series of posts and I want to understand the situation better.
Small question first: What’s NPI? And IFR?
Bigger question: Some of your methods, as you mentioned, did not pass standards for rigor, and you claim the standards should bend around them. But how are you sure they were accurate? What makes you think the people who are “good at guessing numbers” made your model better rather than worse? Or that the surveys used to estimate social costs were really good enough to give relatively unbiased results?
In my own country, since the beginning of the pandemic and still now, I feel exactly as you said—the government doesn’t even try to estimate the costs of interventions, instead relying on a shouting match between the health and treasury ministers to decide. So I’m very much in favour of actually getting these estimates—but to be helpful, they need to be good, and I would a priori expect good estimates of this kind to be available only to the government.
Also, regarding IHME: admittedly I know nothing about any of this, but you say it is an organisation with an impressive track record that got COVID wrong, while your organisation is new, without any track record, but got COVID right. From a risk-averse perspective, I think the decision to fund the former—which can plausibly use its already proven abilities to improve its COVID team given funding—rather than the latter may very well be the right decision.
NPI & IFR: thanks, they’re now explained in the text.
Re: Rigour
I think much of the problem is due not to our methods being “unrigorous” in any objective sense, but to interdisciplinarity. For example, in the survey case, we used mostly standard methods from a field called “discrete choice modelling” (btw, some EAs should learn it—it’s a pretty significant body of knowledge on “how to determine people’s utility functions”).
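To give a concrete sense of what that toolkit looks like, here is a minimal, entirely illustrative sketch of a conditional logit (the workhorse discrete choice model) fit to simulated choice data. The attributes, sample size and “true” weights below are invented for the example and are not taken from our actual survey.

```python
# Illustrative conditional logit on simulated data; nothing here comes from the
# actual survey. Attribute meanings and weights are made up for the example.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_tasks, n_attrs = 2000, 2
true_beta = np.array([-0.8, 1.5])            # latent utility weights to recover

# Each choice task offers two alternatives described by n_attrs attributes
# (think: weeks of restrictions, expected deaths averted).
X = rng.normal(size=(n_tasks, 2, n_attrs))
utility = X @ true_beta + rng.gumbel(size=(n_tasks, 2))   # random-utility model
chosen = utility.argmax(axis=1)              # index of the alternative picked

def neg_log_likelihood(beta):
    v = X @ beta                             # deterministic utility of each alternative
    log_p = v - np.logaddexp(v[:, 0], v[:, 1])[:, None]   # logit choice probabilities
    return -log_p[np.arange(n_tasks), chosen].sum()

fit = minimize(neg_log_likelihood, x0=np.zeros(n_attrs), method="BFGS")
print("estimated utility weights:", fit.x)   # should land close to true_beta
```

The point is just that inferring utility weights from stated choices is routine statistics with a large methodological literature behind it, not guesswork.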
Unfortunately, discrete choice modelling is not something commonly found in, for example, the field of “mathematical modelling of infectious diseases”. This makes it harder for journals to review such a paper, because ideally they would need several different reviewers for different parts of it. That is unlikely to happen in practice, so reviewers usually either evaluate everything according to the conventions of their own field, or are critical and dismissive of the things they don’t understand.
A similar thing is going on with the use of forecasting-based methods. There is published scientific literature on their use, and their track record is good, but before the pandemic there was almost no published literature on combining them with epidemic modelling (there is now!).
The second part of the problem is that we were ultimately more interested in what is actually true than in what “looks rigorous”. A paper that contains a few pages of equations, lots of complex modelling, and many simulations can look “rigorous” (in the sense of the stylized dialogue). But if, for example, it also contains completely and obviously wrong assumptions about the IFR of covid, it will still pass many tests of “rigorousness”, because all it shows is that “under assumptions that do not hold in our world, we reach conclusions that are irrelevant to our world” (the implication is true). At the same time, it can have disastrous consequences if used by policymakers who assume something like “research tracks reality”.
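As a toy illustration of how that cashes out in numbers (all values below are hypothetical, not taken from the post or any specific paper), consider a model that bakes in an IFR several times too high and is then used to back out infections from observed deaths:

```python
# Toy illustration only: hypothetical numbers, not from any specific paper.
observed_deaths = 10_000
assumed_ifr = 0.03       # an "obviously wrong" assumption baked into a model
plausible_ifr = 0.006    # roughly what serology-style evidence might suggest instead

infections_per_model = observed_deaths / assumed_ifr     # ~333,000
infections_plausible = observed_deaths / plausible_ifr   # ~1,667,000

# Every downstream quantity (attack rate, NPI benefit, cost per death averted)
# inherits this factor, however rigorous the intervening mathematics looks.
print(f"infections underestimated by a factor of {infections_plausible / infections_per_model:.0f}")
```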
Ex post, we can demonstrate that some of our methods (relying on forecasters) were much closer to reality (e.g. as judged against serological studies) than a lot of the published work.
Ex ante, it was clear to many people who understand both academic research and forecasting that this would be the case.
Re: Funding
For the record, EpiFor is a project that has ended and is not seeking any funding. Also, as noted in the post, we were in fact offered some funding: just not in a form which the university was able to accept, etc.
It’s not like there is one funder evaluating whether to fund IHME or EpidemicForecasting. In my view the problems pointed to here are almost completely unrelated, and I don’t want them to get conflated.
This is reasonable except that it misunderstands peer review. Peer review does not check correctness*; it often doesn’t even really check rigour. Instead, it sometimes catches the worst half of outright errors and it enforces the discipline’s conventions. Many of those conventions are good and correlate with rigour. But when it comes to new methods it’s usually an uphill battle to get them accepted, rigour be damned.
We note above that our cost analysis (designed, implemented and calculated inside of three weeks) had weaknesses, and this is on us. But the strength of opposition to the whole approach (and not to our implementation) persuaded us that it was futile to iterate.
On forecasts: We used forecasters with an extremely good public track record. I have far more confidence in their general ability to give great estimates than I do in the mathematised parts of covid science. (I can send you their stats, and then you will too.) You can tell that the reviewers’ antipathy was not about rigour, because they were perfectly happy for us to set our priors based on weak a priori arguments and naive averages of past studies (the normal unrigorous way that priors are chosen).
In my limited experience, even major world governments rely heavily on academic estimates for all sorts of things, including economic costs. And, as we noted, we see no sign that they took the noneconomic costs into account; nor did almost any academics. The pandemic really should make you downgrade your belief in secretly competent institutions.
The IHME section is there to note their object-level failure, not to criticize the Gates Foundation for not funding us instead. (I don’t know to what extent their covid debacle should affect our estimate of the whole org—but not by nothing, since management allowed them to run amok for years.)
* Except in mathematics
NPI = non-pharmaceutical interventions; IFR = infection fatality rate.