The major issue with a lot of this work is how you could identify a novel pathogen and know it’s a pathogen. It’s not just about detecting something before it gets recognised as an outbreak, it’s also about that info being actionable. If you don’t know if something is a human pathogen or how it spreads, and the only thing to do once an alert is raised is wait to see if clinical cases start showing up in hospitals, it’s not much of an early warning system.
An alternative is to look at airplane waste, since you’ve got a higher probability that whatever you find came from a human, and you have a list of people to track down to see whether they have mild symptoms that they haven’t reported, or asymptomatic infections. You also have info on where the disease has been imported from, which can again give you more info on what to do next.
Finally got a chance to finish reading the paper! I don’t entirely understand it, though:
I think they’re modeling prevalence as initially constant and then sharply transitioning to an increase of 5% per year. In thinking about infections in the cases I’m familiar with (people and wild populations) this sounds very unlike real spread, which is exponential initially (or the sum of an exponential and a constant if you’re observing something new growing to exceed some background). Is their model more realistic for farmed animals, or is this just highly simplified?
It looks to me like the statistical method they’re using may rely on this clear transition between a constant regime and a (very nearly) linearly increasing regime, so if the model is overly simplified (as above) then their results will be too optimistic about detection. (See Figure 2 on p4 of the supplementary materials.)
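To make the contrast concrete, here is a minimal sketch of the two prevalence shapes being compared. It assumes my reading of the paper is right and that the 5% per year compounds. The function names and all parameter values are hypothetical, chosen only for illustration.

```python
import math


def step_then_growth(t_years, baseline, t_change, yearly_increase=0.05):
    """Constant prevalence until t_change, then compounding yearly growth.

    This is one reading of the paper's model, not a confirmed description.
    """
    if t_years <= t_change:
        return baseline
    return baseline * (1 + yearly_increase) ** (t_years - t_change)


def background_plus_exponential(t_years, background, seed, doubling_time_years):
    """A constant background plus something new growing exponentially."""
    rate = math.log(2) / doubling_time_years
    return background + seed * math.exp(rate * t_years)


# Hypothetical numbers: 10% background, new variant seeded at 0.1% prevalence.
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    a = step_then_growth(t, baseline=0.10, t_change=2.0)
    b = background_plus_exponential(t, background=0.10, seed=0.001,
                                    doubling_time_years=0.25)
    print(f"t={t:.1f}y  step-then-growth={a:.4f}  background+exp={b:.4f}")
```

In the first shape the change point is sharp and the later growth is slow and close to linear. In the second the curve stays near the background for a while and then takes off, which is the case a change-point method tuned to the first shape might handle differently.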
They talk about being able to detect entirely novel antibiotic resistance genes (scenario 3), but I don’t see anything in the paper about how they know to track a particular novel gene to see if it’s an antibiotic resistance one? Is the idea that once you do realize you care about a gene you can go back and re-analyze the sequencing data you’ve been collecting to learn how quickly it has been spreading?
how you could identify a novel pathogen
I’ve been working on simulating exponential growth detection: count how many times each k-mer (I’ve been using 40-mers) occurs on each day, and then run Poisson regression to see whether this looks like an exponential increase and how good the fit is. It works, though as discussed in this post the signal for novel viruses is likely to be extremely weak and so enough sequencing gets expensive.
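A minimal sketch of the approach described above, not the actual simulation code. It counts k-mers per day, then fits a Poisson regression of one k-mer's daily count on the day index, so the slope is the estimated exponential growth rate. The names `count_kmers` and `fit_poisson_growth` are hypothetical. Fitting by Newton's method in plain Python is an assumption made to keep the example self-contained, and a statistics library would normally do this step.

```python
import math
from collections import Counter


def count_kmers(reads, k=40):
    """Count every k-mer across one day's reads (k=40 as in the comment)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def fit_poisson_growth(daily_counts, max_iter=100, tol=1e-10):
    """Fit log(mu_t) = a + b*t to one k-mer's daily counts.

    Returns (b, se_b, deviance). Needs at least two days of data.
    """
    days = list(range(len(daily_counts)))
    n = len(days)

    # Starting values: ordinary least squares on log(count + 0.5).
    z = [math.log(y + 0.5) for y in daily_counts]
    t_bar, z_bar = sum(days) / n, sum(z) / n
    s_tt = sum((t - t_bar) ** 2 for t in days)
    b = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(days, z)) / s_tt
    a = z_bar - b * t_bar

    def information(mu):
        # Entries of the 2x2 Fisher information matrix.
        i_aa = sum(mu)
        i_ab = sum(t * m for t, m in zip(days, mu))
        i_bb = sum(t * t * m for t, m in zip(days, mu))
        return i_aa, i_ab, i_bb

    for _ in range(max_iter):
        mu = [math.exp(a + b * t) for t in days]
        # Score vector (gradient of the Poisson log-likelihood).
        g_a = sum(y - m for y, m in zip(daily_counts, mu))
        g_b = sum(t * (y - m) for t, y, m in zip(days, daily_counts, mu))
        i_aa, i_ab, i_bb = information(mu)
        det = i_aa * i_bb - i_ab ** 2
        # Newton step: inverse information times score.
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break

    # Standard error of the slope and deviance at the fitted values.
    mu = [math.exp(a + b * t) for t in days]
    i_aa, i_ab, i_bb = information(mu)
    se_b = math.sqrt(i_aa / (i_aa * i_bb - i_ab ** 2))
    deviance = 2 * sum(
        (y * math.log(y / m) if y > 0 else 0.0) - (y - m)
        for y, m in zip(daily_counts, mu)
    )
    return b, se_b, deviance


# Hypothetical counts for one k-mer growing about 20% per day.
growing = [round(2 * math.exp(0.2 * t)) for t in range(20)]
slope, se, dev = fit_poisson_growth(growing)
print(f"slope={slope:.3f} per day, z={slope / se:.1f}, deviance={dev:.2f}")
```

The ratio of the slope to its standard error indicates whether the count looks like it is increasing, and the deviance gives a rough sense of how good the fit is. With weak signal the daily counts are mostly zeros and ones, so the standard error is large and the slope is hard to distinguish from zero.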
and know it’s a pathogen
Yes, that’s also an important part of the problem. In some cases I think it would be clear how much of a problem it was simply from looking at it and seeing how close various parts are to matching known things (which could be automated if we’re getting lots of them). But yes, in others it would be pretty hard to judge how seriously to take it.
if you don’t know if something is a human pathogen
That one seems manageable: once we recognize that something is spreading in particular areas we could use more targeted and cheaper methods, like random sampling of hospital arrivals and qPCR.
how it spreads
I think how it spreads is probably a smaller concern than the others. Yes, if we had more details we could make a more targeted response, but general responses like lowering thresholds for wearing PPE, ramping up PPE production, ramping up testing ability and developing cheap targeted tests, reducing some forms of non-essential activity, etc. would still make sense.
wait to see if clinical cases start showing up in hospitals, it’s not much of an early warning system.
As above, I think there are a bunch of things you can do aside from waiting to see if people show up in hospitals, but even then it’s much cheaper to check for it in hospitals if you know specifically what you’re looking for.
An alternative is to look at airplane waste...
Yes, I think airplane waste is very promising, though the statistics are likely much trickier because of the small numbers (small numbers of fliers, small number of fliers using the in-flight toilets). I’d like to see exploration of both (and also sentinel populations) to see how they compare.
Plus, depending on your sampling system, you may not have plane-level data.
This is another nice piece of work looking at this problem, where cost of sequencing is factored in and they are working with pooled samples under different strategies: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4164148
(Writing for myself, not the NAO)