Epidemiological measurements of the COVID-19 pandemic may carry statistical biases that could affect responses to the next variant
Reporter: Stephen J. Williams Ph.D.
Source: https://www.science.org/doi/10.1126/science.abi6602
From the journal Science
Tackling the pandemic with (biased) data
CHRISTINA PAGEL AND CHRISTIAN A. YATES • SCIENCE • 22 Oct 2021 • Vol 374, Issue 6566 • pp. 403-404 • DOI: 10.1126/science.abi6602
Accurate and near real-time data about the trajectory of the COVID-19 pandemic have been crucial in informing mitigation policies. Because choosing the right mitigation policies relies on an accurate assessment of the current state of the local epidemic, the potential ramifications of misinterpreting data are serious. Each data source has inherent biases and pitfalls in interpretation. The more data sources that are interpreted in combination, the easier it is to detect genuine changes in an epidemic. Recently, in many countries, this has involved disentangling the varying impact of rising but heterogeneous vaccination rates, relaxation of mitigations, and the emergence of new variants such as Delta.
The exact data collected and their accuracy will vary by country. Typical data common to many countries are numbers of tests, confirmed cases, hospital and intensive care unit (ICU) admissions and occupancy, deaths, and vaccinations (1). Many countries additionally sequence a proportion of new positive tests to identify and track emerging variants. Some countries also now collect and publish data on infections, hospitalizations, and deaths by vaccination status (e.g., Israel and the UK). Stratifying all available data by different demographic factors (e.g., age, location, measures of deprivation, and ethnicity) is crucial for understanding patterns of spread, potential impact of policies, and efficacy of vaccines (age, timing of breakthrough infections, and prevalent variants).
It is also necessary to be aware of what data are not being collected. For example, persistent symptoms of COVID-19 (Long Covid) were recognized as a long-term adverse outcome by the autumn of 2020. However, no simple diagnostic test has been associated with the up to 200 different reported symptoms (2).
Counting Long Covid relies on a clinical diagnosis, based on a history of having had COVID-19 and a failure to fully recover, with development of some characteristic symptoms and with no obvious alternative cause (3). These features make it very difficult to measure routinely, and so it rarely is. As a result, Long Covid is often neglected in decision-making. Failure to account for the disease load associated with Long Covid may lead to an unnecessary long-term societal health burden.
The feedback between different types of outcomes, different severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, different mitigation policies (including vaccination), and individual risks (a combination of exposure and clinical risk) is complex and must be factored into both interpretation of data and the development of policy. Using all available data to quantify transmission is crucial to ensuring rapid and effective responses to early phases of renewed exponential growth and to evaluating mitigation measures. Relying too much on a single data source, or without disaggregating data, risks fundamentally misunderstanding the state of the epidemic.
The inherent biases and lags in data are particularly important to understand from the point of view of policy-makers. Because of the natural time scales of COVID-19 disease progression (see the figure), policy changes can take several weeks to show up in the data. Purely reactive policy-making is likely to be ineffective. When cases are rising, increases in hospital admissions and deaths will follow. When a new variant is outcompeting existing strains, it is likely to become dominant without action to suppress. The precautionary principle suggests acting early and emphatically.
Conversely, when releasing restrictions, governments must wait long enough to assess them before continuing with re-opening.
The most up-to-date indicator of the state of the epidemic is typically the number of confirmed cases, as ascertained through testing of both symptomatic individuals and those tested frequently regardless of symptoms. Symptom-based testing is likely to pick up more adults and fewer younger individuals (4). Infections in children are harder to detect: children are more likely to be asymptomatic than adults, are harder to administer tests to (particularly young children), are often exposed to other viruses with similar symptoms, and can present with symptoms that are atypical in adults (e.g., abdominal pain or nausea). Children under 12 are not routinely offered the COVID-19 vaccination, and their mixing in schools provides ongoing opportunities for the virus to circulate, so it will be important for countries to track infections in children as accurately as possible. Other testing biases include accessibility, reporting lags, and the ability to act lawfully upon receiving a positive result. Substantial changes in the number of people seeking tests may further confound case figures (5). Case positivity rates may provide a more accurate reflection of the state of the epidemic (6) but are dependent on the mix of symptomatic and asymptomatic people being tested.
SARS-CoV-2 variants have been an important driver of local epidemics in 2021. The four main SARS-CoV-2 variants of concern, to date, are B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta). Some have been more transmissible (Alpha), some have substantial resistance to previous infection or vaccines (Beta), and some have elements of both (Gamma and Delta) (7). Currently, the high transmissibility of Delta combined with some immune evasion has made it the world’s dominant variant.
Determining which variants pose a substantial threat is difficult and takes time, particularly when many variants cocirculate. This is especially true for situations in which a dominant variant is declining, and a new one growing. While the declining variant remains dominant, its decrease masks increases in the new variant because case numbers remain unchanged or fall overall. Only when a new variant becomes dominant does its growth become apparent in aggregated case data, by which time it is, by definition, too late to contain its spread. This dynamic has been observed across the world with Delta over the latter half of 2021.
With multiple variants circulating, there are, effectively, multiple epidemics occurring in parallel, and they must be tracked separately. This typically requires the availability of sequencing data, which is unfortunately limited in most countries. Sequencing takes time and so is typically a few weeks out of date. These lags, and the uncertainty in sampling, can lead to hesitancy in acting. The rapid path to dominance of the Delta variant in the UK highlights the need for action when a quickly growing variant represents a few percent (or less) of overall cases.
Hospital admissions or occupancy data do not suffer the same biases associated with testing behaviors and provide unequivocal evidence of widespread transmission, its geography, and demographics. However, hospital admissions lag infections more than reported cases do, rendering these data less useful for proactive decision-making. Hospital data are also biased toward older people, who are more likely to suffer severe COVID-19, and now, unvaccinated populations. ICU occupancy data show a younger age profile than admissions because younger patients have a better chance of benefitting from the invasive treatment procedures (8).
Deaths are the most lagged indicator, typically occurring 3 or more weeks after infection and with an additional lag in registration and reporting.
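The masking effect described above, in which a declining dominant variant hides the growth of a new one in aggregated case counts, can be illustrated with a toy calculation. The following Python sketch is not from the article: the starting case counts, the constant exponential growth rates, and the function name `simulate_two_variants` are all made-up assumptions chosen only to show the shape of the dynamic.

```python
import math


def simulate_two_variants(days=60, old_start=10_000.0, new_start=100.0,
                          old_rate=-0.03, new_rate=0.08):
    """Daily cases for two variants with constant exponential growth rates.

    All defaults are illustrative assumptions, not fitted values.
    Returns a list of (day, old_cases, new_cases, total_cases) tuples.
    """
    rows = []
    for day in range(days + 1):
        old_cases = old_start * math.exp(old_rate * day)
        new_cases = new_start * math.exp(new_rate * day)
        rows.append((day, old_cases, new_cases, old_cases + new_cases))
    return rows


rows = simulate_two_variants()

# Day on which the aggregate case count is lowest: before this day the
# total is falling even though the new variant is growing throughout.
trough_day = min(rows, key=lambda row: row[3])[0]

# Share of cases caused by the new variant on the trough day.
_, _, new_at_trough, total_at_trough = rows[trough_day]
share_at_trough = new_at_trough / total_at_trough

# First day on which the new variant outnumbers the old one.
crossover_day = next(day for day, old, new, _ in rows if new > old)

print(f"Total cases stop falling on day {trough_day}")
print(f"New-variant share on that day: {share_at_trough:.0%}")
print(f"New variant becomes dominant on day {crossover_day}")
```

Under these assumed numbers, total cases keep falling for about a month even though the new variant grows every day, and by the time the total turns upward the new variant already accounts for roughly a quarter of cases. Tracking each variant separately would reveal the growth from the start.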
Death data should never be used to inform real-time policy decisions. Instead, death figures can act as an eventual measure of the success of a country’s epidemic strategy and implementation. The age distribution of those who eventually die from COVID-19 is different from other metrics of the epidemic, skewed furthest toward older age groups (9). Those with clinical risk factors (such as immunodeficiency, obesity, or existing lung conditions), high exposure (health care workers and low-income workers), and the unvaccinated are overrepresented in COVID-19 deaths.
In countries with high vaccination rates, vaccination has had a substantial impact, reducing COVID-19 cases, hospitalizations, and deaths. However, when looking at the raw numbers in highly vaccinated populations, it can be the case that more fully vaccinated people are dying of COVID-19 than unvaccinated. If these raw statistics are misinterpreted (or worse, deliberately misused), they can damage vaccine confidence. More vaccinated people may die than unvaccinated because such a high proportion of people are vaccinated (10). This does not mean vaccines are not effective at preventing death. Looking at the rates of death in vaccinated and unvaccinated individuals separately within age groups demonstrates that vaccines provide considerable protection against severe disease and death. This example illustrates how important it is to curate and manage the way in which data are presented.
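The arithmetic behind this apparent paradox can be sketched for a single age group. The numbers below are hypothetical, not taken from the article or any dataset: a population of 1,000,000, 90% vaccination coverage, a 0.1% risk of death among the unvaccinated over the period considered, and a vaccine that reduces that risk by 85%. The function name `death_counts` is likewise only illustrative.

```python
def death_counts(population, coverage, unvax_risk, vaccine_effectiveness):
    """Compare raw death counts with death rates by vaccination status.

    population            -- people in one age group (hypothetical)
    coverage              -- fraction of the group that is vaccinated
    unvax_risk            -- probability of death if unvaccinated
    vaccine_effectiveness -- fractional reduction in that risk if vaccinated
    """
    vaccinated = population * coverage
    unvaccinated = population - vaccinated
    vax_risk = unvax_risk * (1.0 - vaccine_effectiveness)
    return {
        "vaccinated_deaths": vaccinated * vax_risk,
        "unvaccinated_deaths": unvaccinated * unvax_risk,
        "vaccinated_rate_per_100k": vax_risk * 100_000,
        "unvaccinated_rate_per_100k": unvax_risk * 100_000,
    }


# Illustrative inputs only; none of these values come from real data.
result = death_counts(
    population=1_000_000,
    coverage=0.90,
    unvax_risk=0.001,
    vaccine_effectiveness=0.85,
)

for name, value in result.items():
    print(f"{name}: {value:.0f}")
```

With these assumed inputs, the vaccinated group records more deaths in absolute terms (about 135 versus 100), yet its death rate per 100,000 is far lower (about 15 versus 100). Comparing rates rather than raw counts, within age groups, is what reveals the protection.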
Each country has established its own vaccination priority lists and dosing schedules to best achieve its goals (11, 12). Each of these strategies will manifest differently in the data. Additionally, many countries are using multiple vaccines in tandem and administer them differently for different demographics. Some countries are vaccinating adolescents, and others are not or not offering them the full approved dose. Most vaccines require two doses, spaced between 3 and 12 weeks apart, except for the Johnson & Johnson single-dose vaccine. This matters, particularly as variants spread, because different vaccines have different effectiveness after one and two doses, different timelines to full effectiveness, and different effectiveness against variants (13).
Data published on the vaccination delivery itself must thus go beyond the raw numbers of people vaccinated. Vaccine uptake must be reported by whether fully or partially (one-dose in a two-dose regimen) vaccinated and using the whole population as a denominator. It is vital to disaggregate vaccine data by age, gender, and ethnicity as well as location so that it is possible, for example, to understand the impact of deprivation on vaccine coverage or vaccine hesitancy in particular demographics. When interpreting vaccination data, it is important to remember that there is also a lag between delivery and the build-up of immunity.
Data on reinfection and post-vaccination (breakthrough) infection are also important to determine the relative benefits of infection-mediated and vaccine-mediated immunity and the length of protection offered. Studies that show those who were immunized earlier were acquiring COVID-19 with higher rates than those vaccinated more recently may suggest waning vaccine protection (14). Such studies have already prompted vaccine booster programs in some countries.
However, any study that suggests waning immunity must be extremely careful to ensure that the “early” and “recent” subgroups are properly controlled. Differences in prior exposure, affluence, education level, age, and other demographic factors between these cohorts may be enough to explain the disparities in SARS-CoV-2 infection rates, even in the absence of waning immunity. Waning immunity must also be reported separately for different outcomes; for example, there might be waning in terms of preventing symptomatic infection but far less or none in preventing death (15). Additionally, there are ethical concerns about mass booster programs in high-income countries while many lower-income countries have been unable to procure vaccines.
Moving into the vaccination era, reported cases, hospitalizations, and deaths should also be disaggregated by vaccination status (and by which vaccine), which will be easier in countries where national linked datasets exist. Additionally, incorporating Long Covid into routine reporting and policy-making is crucial. Consistent diagnostic criteria and well-controlled studies will be vital to this effort. These elusive data will be of critical importance to navigate our way successfully out of the pandemic.
Acknowledgments
C.P. and C.A.Y. are both members of Independent SAGE: www.independentsage.org.
References and Notes
1. M. Roser et al., Our World in Data (2021); https://bit.ly/3kepLgw.
2. H. E. Davis et al., E. Clin. Med. 38, 101019 (2021).
3. M. Sivan, S. Taylor, BMJ 371, m4938 (2020).
4. S. M. Moghadas et al., Proc. Natl. Acad. Sci. U.S.A. 117, 17513 (2020).
5. J. Wise, BMJ 370, m3678 (2020).
6. D. Dowdy, G. D’Souza, COVID-19 Testing: Understanding the “Percent Positive” (2020); https://bit.ly/3CeN8wl.
7. C. E. Gómez et al., Vaccines (Basel) 9, 243 (2021).
8. A. B. Docherty et al., BMJ 369, 1985 (2020).
9. Office for National Statistics, Deaths registered weekly in England and Wales by age and sex: covid-19 (2021); https://bit.ly/3Ci2obS.
For more articles on COVID-19 please see our Coronavirus Portal at
For articles on Issues of Bias in Science on this Open Access Journal see
From @Harvardmed Center for Bioethics: The Medical Ethics of the Corona Virus Crisis
Paper in collection COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv