Abstract
We provide a method to track the active prevalence of COVID-19 in real time. It corrects for time-varying sample selection in symptom-based testing data and for incomplete tracking of recovered cases and fatalities. Our method requires only publicly available data on test positivity rates and a single parameter, which we estimate from a representative random sample of nearly 10,000 individuals tested in Utah in May and June 2020. We validate the method against external randomized-testing studies in Indiana in April 2020 and in two Utah counties in March 2021. In all three settings, our estimates of latent prevalence fall within the 95 percent confidence intervals of the prevalence estimates from randomized testing. Applying our method to all 50 states, we find that true prevalence is two to three times higher than publicly reported.