The goal of this brief document is to identify expressions, and subsequently an actionable bound on the number of tests done in a binary testing scheme as a function of disease prevalence. I am interested in this testing method as it is being discussed to mitigate the high number of tests and associated costs during the COVID-19 pandemic, but also as a personal exercise in probabilistic analysis.
Background and Assumptions
Pooled testing is a popular procedure for clinical applications(Bilder & Tebbs 2012). I will be modeling a Non-informed pooled method that creates sub-pools by halving the original pool and assumes that all samples have the same risk for the disease. In most clinical applications, halving stops at the 3rd or 4th level. I will analyze as though halving continues for log2N levels.
I am also making the following additional assumptions to simplify my analysis.
Tests are binary, returning True if the individual or mixed pooled sample is positive and False otherwise
The tests are ideal with perfect accuracy
The probability that a sample tests positive is independent of other samples and is equal to the prevalence of the disease.
A random selection of samples is chosen as input.
The binary testing scheme seeks to reduce the net number of tests performed and offers an alternative to testing each individual sample, reducing the strain on clinical labs in times of high testing need. It is illustrated below.
Illustration of binary testing scheme
Expected Number of Tests as a Function of Prevalence
I am working with the following epidemiological definition of prevalence:
the proportion of a particular population found to be affected by a medical condition at a specific time.
To simplify my analysis I’m modeling each test outcome as a binomial random variable with probability p∈[0,1] equal to the prevalence of the disease.
The expected number of tests at any level i depends on the number of pruned tests in the previous level. If a test is negative, we no longer binary search it for positives, so we can prune that batch of samples from the scheme(Bilder & Tebbs 2012).
Defining an indicator variable for test outcome at level i (Ti) as follows
Ti is 0 if the test is negative, with probability (1−p)2iN
Ti is 1 if the test is positive, with probability 1−(1−p)2iN
this is because any number of samples can be positive for the whole test to be positive, slightly differing from the binomial distribution.
Using linearity of expectation, the expected number of positive tests at level i:
E[Ti]=2i∗(1−(1−p)2iN)
illustration of probability of a negative test at each level
Defining an indicator variable for if a test is run at level i:
Ii is 0 if the test is not run, because its parent was negative, w.p. (1−p)2i−1N
Ii is 1 if the test is run w.p. 1−(1−p)2i−1N
The formula for expectation of total tests is a handy expression that we can use to determine if it is efficient to use the binary testing strategy, or if it would cause us to run more tests than a linear testing approach would.
Introduction
Background and Assumptions
Expected Number of Tests as a Function of Prevalence
Conclusions