- Statistical models can help control the spread of infectious diseases such as COVID-19.
- Effective models must take both location and time into account.
- Professor Norou Diawara at Old Dominion University in Virginia, USA and his colleagues have been using spatio-temporal modelling techniques to predict the spread of the coronavirus.
- Their model captures disease characteristics and provides insights and guidance for public health leaders tasked with managing a pandemic.
Mathematical modelling played a vital role in controlling the COVID-19 pandemic; such models have facilitated our understanding of the disease’s transmission dynamics and provided essential guidance for shaping public policy. To mitigate the spread of the virus, it is essential that appropriate statistical models take both geographical location and time into account.
This approach allows dependencies between neighbouring states and/or countries to be identified alongside the temporal aspects of transmission. By factoring in the geographic locations of outbreaks, these models may reveal insights into the transmission of COVID-19 over time, enabling us to monitor and predict the disease’s progression.
Forecasting the occurrence of a disease in both space and time poses numerous challenges, particularly as data from epidemiological, environmental, and socio-economic predictors all have time-related factors embedded in them. Addressing these challenges, Professors Norou Diawara and Anna Jeng, with the support of their colleagues at Old Dominion University and at the Virginia Department of Health in Virginia, USA, have developed spatio-temporal modelling techniques that work in combination with wastewater surveillance to forecast the spread of COVID-19.
Spatio-temporal modelling is used to analyse changes in relation to both space and time. This approach involves a detailed investigation of the spatial locations and the associated times for observations. The spatial units under analysis may correspond to municipalities, regions, districts or communities, with input data representing the recorded number of cases within each spatial unit during specific time intervals, eg, a day, week, or month.
Using information collected by the World Health Organization in the period from January 27, 2020 to August 10, 2020, the research team has built spatio-temporal models that show how the transmission of COVID-19 develops over time. Using the number of new cases at specific time points, these models can detect the spread of a virus.
By identifying fluctuations in the number of new cases, they can recognise changes in the rate of transmission and growth of the disease. They can also determine the significant statistical covariates or predictors that explain some of the variability in the number of new COVID-19 cases.
Diawara explains that precise location and time information is required because grouping or aggregating data could mean that vital local information is missed. Therefore, it is important to collect and analyse data across a variety of spatial and temporal scales, ranging from community to city levels. Capturing the finer details of disease transmission allows for more accurate modelling of the disease dynamics. The researchers also ensure the validation and calibration of the models, as well as reliability and standardisation of the data collection methods.
Bayesian analysis and spatial autocorrelation
This research combines Bayesian analysis with Moran statistics and conditional autoregressive modelling. Bayesian modelling involves conditional probability and integrates prior knowledge into the model, so information gathered from previous studies can be used to inform it. The longitudinal structure is modelled and the autocorrelated random effects are captured. The Bayesian models estimate the parameters at different time intervals. Estimates of the effect of risk factors on the random response variable of interest, in this case the daily COVID-19 case counts, are also included. Findings revealed that population density and the size of the country were among the significant risk factors.
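The idea of integrating prior knowledge can be illustrated with the simplest Bayesian count model, a Gamma prior on a Poisson mean. This is only a minimal sketch and not the team's model: the prior values, the daily counts, and the function name `gamma_poisson_update` are all hypothetical, chosen to show how a prior belief is pulled towards the observed data.

```python
def gamma_poisson_update(prior_shape, prior_rate, counts):
    """Update a Gamma(shape, rate) prior on a Poisson mean with observed counts.

    Conjugacy gives the posterior directly:
    shape grows by the total count, rate grows by the number of observations.
    """
    return prior_shape + sum(counts), prior_rate + len(counts)


# Hypothetical prior standing in for "previous studies": about 20 cases per day.
prior_shape, prior_rate = 40.0, 2.0
# Made-up daily case counts for one spatial unit (not real data).
observed = [31, 28, 35, 30, 26]

post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, observed)

prior_mean = prior_shape / prior_rate
sample_mean = sum(observed) / len(observed)
posterior_mean = post_shape / post_rate

print(f"prior {prior_mean:.1f}, data {sample_mean:.1f}, posterior {posterior_mean:.1f}")
```

The posterior mean falls between the prior mean and the sample mean, and it moves closer to the data as more days are observed.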
Autocorrelation measures the similarity of a random process with a lagged version of itself at different points in time. Spatial autocorrelation, its counterpart across locations, is measured by calculating the Global Moran’s I statistic, which is based on both feature locations and feature values. Given a set of feature locations and associated attributes, or values, Moran statistics assess whether the pattern expressed is random, dispersed, or forms clusters/blocks, and measure the spatial association of a location with its neighbours.
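As a rough illustration of how Global Moran’s I separates clustered from dispersed patterns, the sketch below computes the statistic for four hypothetical regions arranged in a line, where each region neighbours the one next to it. The case values and the binary adjacency weights are made up for the example; the researchers' actual weighting scheme is not described here.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)
    return (n / weight_total) * cross / spread


# Hypothetical adjacency: region 0 - 1 - 2 - 3 in a chain (1 = neighbours).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 12, 40, 42], adjacency)  # similar values sit together
dispersed = morans_i([10, 40, 12, 42], adjacency)  # high and low values alternate

print(f"clustered: {clustered:.3f}, dispersed: {dispersed:.3f}")
```

A positive value indicates that neighbouring regions have similar values (clusters), a negative value indicates a dispersed pattern, and values near zero are consistent with a random arrangement.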
In a conditional autoregressive (CAR) model, the probabilities of values estimated at any location are influenced by neighbouring values. The CAR model revealed that over the 29-week period, both area and time are significantly associated with new cases of COVID-19.
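The way neighbouring values influence a location in a CAR model can be sketched with the standard "proper CAR" conditional distribution, in which the expected value at one location is a shrunken average of its neighbours. The snippet below is an assumption-laden illustration, not the fitted model from the study: the neighbour map, the spatial effects, and the parameters `rho`, `tau2`, and `mu` are all hypothetical.

```python
def car_conditional(i, values, neighbours, rho=0.9, tau2=1.0, mu=0.0):
    """Conditional mean and variance of location i given its neighbours.

    Proper CAR form with equal weights on each neighbour:
      mean = mu + rho * average(neighbour values - mu)
      var  = tau2 / (number of neighbours)
    """
    nbrs = neighbours[i]
    nbr_mean = sum(values[j] - mu for j in nbrs) / len(nbrs)
    return mu + rho * nbr_mean, tau2 / len(nbrs)


# Hypothetical neighbour map for four areas in a chain.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Made-up spatial effects (for example, log relative risks) for each area.
effects = [0.5, 1.0, 2.0, 2.5]

cond_mean, cond_var = car_conditional(1, effects, neighbours)
print(f"area 1 given neighbours: mean {cond_mean:.3f}, variance {cond_var:.3f}")
```

Areas with more neighbours receive a smaller conditional variance, reflecting that more surrounding information is available to pin down their value.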
The researchers use the spatio-temporal association of locations and their neighbours to model COVID-19 evolution with selected targeted countries – USA, Malaysia, Spain, and Senegal – as the feature locations. Each country forms a cluster with the countries surrounding it, giving blocks denoted the North America Block, Asia Block, Europe Block, and Africa Block.
Further development of modelling
In their ongoing efforts, the team has deployed a copula function to forecast COVID-19 cases in a time series manner. A significant development is incorporating the SARS-CoV-2 viral load from wastewater surveillance to predict trends and case numbers of COVID-19. This modelling approach allows them to identify changes in the rate of transmission of the disease over time by observing fluctuations in the number of new cases.
By including wastewater surveillance in the modelling, the team can select SARS-CoV-2 viral load and other relevant covariates as predictors, explaining the variations in the new cases. In addition, the researchers have used data from diverse spatial and temporal scales, ranging from community to city levels, to capture the finer details of disease transmission. Such a multi-scale approach enhances the accuracy of modelling predictions regarding disease dynamics.
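To give a feel for what a copula captures, the sketch below estimates the dependence parameter of a simple Gaussian copula between a viral load series and a case count series. It is a static, rank-based illustration under stated assumptions, not the copula time series used by the team: the numbers are invented, ties are not handled, and the helper names are hypothetical.

```python
from statistics import NormalDist


def pseudo_observations(xs):
    """Map values to (0, 1) by rank / (n + 1); assumes no tied values."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def gaussian_copula_rho(xs, ys):
    """Correlation of normal scores: a simple Gaussian copula estimate."""
    std_normal = NormalDist()
    zx = [std_normal.inv_cdf(u) for u in pseudo_observations(xs)]
    zy = [std_normal.inv_cdf(u) for u in pseudo_observations(ys)]
    return pearson(zx, zy)


# Made-up weekly values: a continuous viral load and a discrete case count.
viral_load = [1.2, 3.4, 2.2, 8.9, 5.1, 12.0]
new_cases = [5, 14, 9, 60, 22, 75]

rho = gaussian_copula_rho(viral_load, new_cases)
print(f"estimated copula correlation: {rho:.3f}")
```

Because the estimate depends only on ranks, it measures how the two series move together regardless of their individual distributions, which is what allows a copula to join a continuous variable with a count variable.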
With the increased availability of fine-scaled data, the data can take the form of a nested structure (a structure within a structure), for example, a list or table within a data cell. A list of times can be nested within a city, and a list of cities can then be nested within a country. It is now possible to model COVID-19 progression and identify high-risk areas or clusters of disease prevalence at the level of individual cities or communities. For example, the modelling can forecast future outbreaks in the community or the city.
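The nesting described above (times within cities, cities within countries) might be represented as follows. This is only one possible layout, with placeholder place names and invented counts, and the helper functions are hypothetical.

```python
# Nested structure: country -> city -> week -> new cases (all values made up).
cases = {
    "Country A": {
        "City 1": {"2020-W10": 4, "2020-W11": 9, "2020-W12": 21},
        "City 2": {"2020-W10": 3, "2020-W11": 2, "2020-W12": 4},
    },
    "Country B": {
        "City 3": {"2020-W10": 0, "2020-W11": 1, "2020-W12": 1},
    },
}


def city_totals(nested):
    """Total cases per (country, city), keeping the fine-scale view."""
    return {
        (country, city): sum(weekly.values())
        for country, cities in nested.items()
        for city, weekly in cities.items()
    }


def country_weekly(nested, country):
    """Aggregate one country's cities into a single weekly series."""
    totals = {}
    for weekly in nested[country].values():
        for week, count in weekly.items():
            totals[week] = totals.get(week, 0) + count
    return totals


by_city = city_totals(cases)
country_a = country_weekly(cases, "Country A")
print(by_city)
print(country_a)
```

In this toy example the country-level series for Country A rises steadily, while the city-level view shows that the growth comes almost entirely from City 1, the kind of local detail that aggregation can hide.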
The researchers aim to build stochastic models that could capture characteristics of infectious diseases such as COVID-19 and guide public health leaders in managing a pandemic. With the understanding of the complex systems of a pandemic, the team has collaborated with local and state health departments to better communicate applications of modelling outcomes for policy-making, such as predicting disease spread, identifying outbreaks, selecting effective interventions, vaccination strategies, what-if scenario planning, and monitoring new variants.
What inspired you to conduct this research?
There’s a great need for transformative research incorporating human behavioural processes and other associated factors into statistical models that can connect the interface of different sources of real-time large datasets. Currently, the lack of connecting large data on COVID-19, human behaviour, and environmental factors at the state and community levels creates a barrier to modelling and simulating the dynamic of the pandemic.
This research aims to accelerate the effectiveness of using models to understand the spread and dynamics of diseases, including COVID-19, and to provide accurate predictions of future cases. It will contribute to the understanding of the dynamics of COVID-19 spread and provide valuable insights for decision-makers and public health officials in the ongoing response to this pandemic and future pandemics. It encourages the principles of dynamic time series modelling, promotes interoperability and data use, and increases operational efficiencies in combining data sets for analyses.
What aspect poses the greatest challenge in modelling a pandemic?
The difficulty in predicting a pandemic lies in the underlying occurrence and size of spikes in the case counts. Copula functions improve efficiency in disease predictions and are flexible tools that overcome the limitations of Gaussian distribution functions and take in the dependencies between variables. Large and varied data sets related to COVID-19 are being collected. However, bigger and more data sets don’t automatically mean better information unless suitably analysed. Big data aligned with better assessment will address governance and present statistics to a larger audience. Specifically, work on applying discrete count distributions to estimate the modelled parameters and predicted values across space and time periods, in order to improve COVID-19 case count estimates, is still underway.
The dynamics of the COVID-19 pandemic are associated with multiple covariates. Algorithms associated with such dynamics require careful attention to the distributional forms and properties.
What are your plans to extend this research?
By leveraging data analytics from environmental, epidemiological, and socio-economic predictors, our study extends the associations between COVID-19 case counts (modelled with a Poisson distribution) and environmental predictors. We plan to show that SARS-CoV-2 viral load in wastewater is correlated with COVID-19 case counts and can serve as a predictor for forecasting COVID-19 cases and trends. The plan is to predict pandemic clusters, resurgences, and trends. Research results will help to develop control measures and interventions.
We will advance modelling and prediction by considering the negative binomial and Conway-Maxwell-Poisson distributions to account for dispersion. Moreover, using the bivariate copula time series (CTS), we will build joint distributions with discrete and continuous types of data. Inferences will be more accurate as the bivariate CTS will describe the time-dependent relationship between COVID-19 cases and SARS-CoV-2 viral load in wastewater, or other variables deemed key in controlling this disease. The extension to the multivariate copula autoregressive model, within the class of vine copulas, will build structured cross-dependences, with spatial and temporal representations in pairwise copulas, for the dynamics of such diseases.
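The reason for moving beyond the Poisson distribution can be shown with a quick dispersion check. A Poisson model assumes the variance equals the mean, whereas a negative binomial allows variance = mean + mean²/r for a size parameter r. The sketch below uses invented counts and a simple method-of-moments estimate of r; it does not cover the Conway-Maxwell-Poisson case and is not the estimation procedure used by the researchers.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def nb_size_moments(counts):
    """Method-of-moments negative binomial size r, from var = m + m**2 / r.

    Returns None when the data are not overdispersed (variance <= mean).
    """
    m, v = mean(counts), variance(counts)
    if v <= m:
        return None
    return m * m / (v - m)


# Made-up daily case counts with occasional spikes (not real data).
daily_cases = [2, 15, 0, 40, 7, 1, 22, 5]

index = dispersion_index(daily_cases)
size = nb_size_moments(daily_cases)
print(f"dispersion index: {index:.2f}, negative binomial size: {size:.3f}")
```

A dispersion index well above 1, as in this toy series, signals overdispersion, which is the situation where a negative binomial is usually preferred to a Poisson model.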
Can the research be used to describe other infectious diseases?
Yes, this research can be applied to other diseases such as influenza, hepatitis, meningococcal disease, and more. Applying disease surveillance models will be the only way to assess the needs of the community. The non-invasive wastewater viral load information can help capture the fine-scale heterogeneity of disease transmission, inform more accurate modelling of disease dynamics, and increase operational efficiencies in action steps.