Modeling the Feasibility of Non-Informed Pooled Testing Methods
Chirasree Mandal (Berkeley CS ’22) | chirasree@berkeley.edu | April 2020

Introduction

The goal of this brief document is to identify expressions, and subsequently an actionable bound on the number of tests done in a binary testing scheme as a function of disease prevalence. I am interested in this testing method as it is being discussed to mitigate the high number of tests and associated costs during the COVID-19 pandemic, but also as a personal exercise in probabilistic analysis.

Background and Assumptions

Pooled testing is a popular procedure for clinical applications (Bilder & Tebbs 2012). I will be modeling a Non-informed pooled method that creates sub-pools by halving the original pool and assumes that all samples have the same risk for the disease. In most clinical applications, halving stops at the 3rd or 4th level. I will analyze as though halving continues for log2Nlog_2 N levels.

I am also making the following additional assumptions to simplify my analysis. 
  • Tests are binary, returning True if the individual or mixed pooled sample is positive and False otherwise
  • The tests are ideal with perfect accuracy
  • The probability that a sample tests positive is independent of other samples and is equal to the prevalence of the disease. 
  • A random selection of samples is chosen as input.

The binary testing scheme seeks to reduce the net number of tests performed and offers an alternative to testing each individual sample, reducing the strain on clinical labs in times of high testing need. It is illustrated below.


Expected Number of Tests as a Function of Prevalence

I am working with the following epidemiological definition of prevalence
  • the proportion of a particular population found to be affected by a medical condition at a specific time. 

To simplify my analysis I’m modeling each test outcome as a binomial random variable with probability p[0,1]p \in [0, 1] equal to the prevalence of the disease. 

The expected number of tests at any level i depends on the number of pruned tests in the previous level. If a test is negative, we no longer binary search it for positives, so we can prune that batch of samples from the scheme (Bilder & Tebbs 2012). 

Defining an indicator variable for test outcome at level i (Ti(T_i) as follows
  • TiT_i is 00 if the test is negative, with probability (1p)N2i(1-p)^{\frac{N}{2^i}}
  • TiT_i is 11 if the test is positive, with probability 1(1p)N2i1 - (1 - p)^{\frac{N}{2^i}}
  • this is because any number of samples can be positive for the whole test to be positive, slightly differing from the binomial distribution. 

Using linearity of expectation, the expected number of positive tests at level i:
E[Ti]=2i(1(1p)N2i)E[T_i] = {2^i}*(1 - (1 - p)^{\frac{N}{2^i}}) 

Defining an indicator variable for if a test is run at level i:
  • IiI_i is 00 if the test is not run, because its parent was negative, w.p. (1p)N2i1(1-p)^{\frac{N}{2^{i-1}}}
  • IiI_i is 11 if the test is run w.p. 1(1p)N2i11 - (1-p)^{\frac{N}{2^{i-1}}}

The expected number of tests run at level i:
E[I]=2E[Ti1]=22i1(1(1p)N2i1)E[I] = 2* E[T_{i-1}] = {2*2^{i-1}}*(1 - (1 - p)^{\frac{N}{2^{i-1}}}) 

Expected total number of tests run as a function of prevalence pp
E[Ttotal]=i=0log2N22i1(1(1p)N2i1)E[T_{total}] = \sum_{i=0}^{log_2{N}}{2*2^{i-1}}*(1 - (1 - p)^{\frac{N}{2^{i-1}}})

Conclusions


E[Ttotal]=i=0log2N2i(1(1p)N2i1)E[T_{total}] = \sum_{i=0}^{log_2{N}}{2^i}*(1 - (1 - p)^{\frac{N}{2^{i-1}}})

The formula for expectation of total tests is a handy expression that we can use to determine if it is efficient to use the binary testing strategy, or if it would cause us to run more tests than a linear testing approach would.