Just to be clear, to me the fact that Western governments are already doing it is a positive point in favor of your proposal, since it is evidence for the utility of contact tracing in containing the virus.
0) Even basic questions about the virus and how it spreads are still unanswered, like how infectious one is during the incubation period. This makes more advanced questions regarding a risk score difficult to answer.
I agree. And there are a lot of resources being put into research on this right now, so in time I hope we have better answers. But even imperfect information could be helpful. See the Q&A with Sukrit Silas. At first I imagined the app could only give a risk score that was very coarse. Just levels 1-7. I’ve commented separately, header Example Score Levels, with an example of what I mean, which I didn’t put in the post because I have no confidence in what it should be. But you might be able to show a decimal point too. I think it’s good not to start out being committed to any kind of risk scoring system until you have a sense of what’s possible.
3) You mention that Google traffic data is still useful, even when few people use it. I am not familiar with that part of the app, but if it involves some form of prediction, it is important to note that Google has had years to get this right. With a pandemic, you have at best months(!), and on top of that the situation changes constantly.
Yes. This is another reason why working with someone like Google Maps or some other mapping app could be crucial, because they have accumulated domain / tribal knowledge that no one else might have.
[Edit: I’ve received some private feedback that neither of us might be right here. The calculation for both (density of traffic and density of “infectiousness”) may be quite straightforward. The reason traffic updates got so much better might just have been more data.]
2) Regarding the computation of the risk score: If you only use confirmed cases with voluntary sign up, you might not get enough data; if you use suspected cases by symptoms, you will get a lot of false positives due to worried people with the flu. In the absence of data on how to properly account for that, this is a very difficult problem.
These are significant challenges and I talk a bit about how they can be addressed in The Incentives Align and at the end of the section Example App Questionnaire. I imagine there would always be more confidence put in a confirmed case with a code than someone who just answers yes to having cold or flu symptoms recently.
Also, for confirmed cases, as part of contact tracing, the CDC sometimes identifies a site of concern where a patient might recall that something particularly infectious happened before they were aware they were sick. For example: “Oh. I remember that a few days ago, I sneezed quite forcefully and unexpectedly at my favorite buffet, so I couldn’t cover my nose. Oh, and then again on the way home on the BART! I’m so sorry.” Tracing multiple paths backwards, you might get a lot of data from a single event.
1) How likely are you to catch the virus at all just by being in the same area/frequenting the same shops as somebody infected? My impression from the Western cases so far was that infections generally occurred among close contacts; this risk obviously changes when more infected people are around, but it should still be estimated to decide whether such an app would be worth it.
I imagine the CDC could already have some estimates for this, which they might use in contact tracing. And it might turn out that contact tracing is enough to solve the problem. It seems to be working well right now in the U.S.
But if not, a not-so-educated guess at a general outline of the app’s risk calculation might be: (1) Close contact. You were in the same location as a possibly infectious person, within maybe 10 minutes of when they were there. This is higher risk. (2) Semi-close contact. If the virus might live on surfaces for a few hours, the close-contact risk distribution tapers off over a few hours. (3) Infectiousness. This isn’t binary, so the infectious person’s distribution also peaks sometime around their first symptoms and tapers off over a few days, out to some maximum duration.
A heat map that changes with time could reflect this information, and any individual’s risk could be calculated by integrating over it. And if a location is suddenly found to have had multiple infected people in it over the last few weeks, and that level of precision is possible, you could run multiple iterations of this, also accounting for each user’s (small) risk of having contracted the virus from their own interactions, given an estimate of the time between when a person contracts the virus and when they become infectious. This is harder, but if everything else works, it’s possible.
Writing this out in words is long, but the actual summation/integral is not too complicated. It’s a combination of science and hack-y guesses. But this seems true to me of almost all engineering.
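To give a sense of how short the summation could be, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names (Visit, infectiousness, time_decay, exposure_score), the bell-shaped infectiousness curve with a 72-hour spread, the 2-hour surface half-life, and the single confidence weight standing in for confirmed versus self-reported cases. None of the numbers are epidemiological estimates, and the discrete sum over visits is only a stand-in for integrating over a real heat map.

```python
import math
from dataclasses import dataclass


@dataclass
class Visit:
    place: str
    start: float  # hours on a shared clock
    end: float


def infectiousness(hours_from_onset: float, spread_h: float = 72.0) -> float:
    """Hypothetical weight in (0, 1]: peaks at first symptoms, tapers over days."""
    return math.exp(-0.5 * (hours_from_onset / spread_h) ** 2)


def time_decay(gap_h: float, close_window_h: float = 10 / 60,
               half_life_h: float = 2.0) -> float:
    """Full weight within the close-contact window, then halves every half_life_h."""
    if gap_h <= close_window_h:
        return 1.0
    return 0.5 ** ((gap_h - close_window_h) / half_life_h)


def exposure_score(user_visits, source_visits, onset_h: float,
                   confidence: float = 1.0) -> float:
    """Sum a unitless exposure weight over every (user visit, source visit) pair.

    confidence is an assumed weight: near 1.0 for a confirmed case,
    lower for someone who only reported cold or flu symptoms.
    """
    total = 0.0
    for u in user_visits:
        for s in source_visits:
            # Ignore other places, and visits that ended before the source arrived.
            if u.place != s.place or u.end < s.start:
                continue
            gap_h = max(0.0, u.start - s.end)  # 0 when the visits overlap
            midpoint = (s.start + s.end) / 2
            total += confidence * infectiousness(midpoint - onset_h) * time_decay(gap_h)
    return total
```

The result is a relative weight, not a probability of infection. Turning it into something like coarse score levels would need a calibration step that this sketch does not attempt.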