Original Article |
Corresponding author: Ralitsa Raycheva ( dirdriem@gmail.com ) © 2022 Ralitsa Raycheva, Ani Kevorkyan, Yordanka Stoilova.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Raycheva R, Kevorkyan A, Stoilova Y (2022) Stochastic modelling of scalar time series of varicella incidence for a period of 92 years (1928-2019). Folia Medica 64(4): 624-632. https://doi.org/10.3897/folmed.64.e65957
|
Introduction: Varicella is an acute, highly contagious disease, characterised by generalised vesicular exanthema caused by the initial infection with varicella zoster virus (VZV) which usually affects children aged 2 to 8 years.
Aim: To analyse the changes of varicella incidence in Bulgaria over the period of 1928–2019.
Materials and methods: The time series analysis is based on the official data for varicella incidence (per 100,000) in Bulgaria for ninety-two years (1928–2019), obtained from three major sources. We utilized the method to construct a time series model of overall incidence (1928–2019) using time series modeller in SPSS v. 25. We followed all three steps of the standard ARIMA methodology to establish the model – identification, parameter estimation, and diagnostic checking.
Results: Stochastic scalar time series modelling of the varicella incidence from 1928 to 2019 was performed. The stochastic ARIMA (0,1,1) was identified to be the most appropriate model. The decomposition of varicella incidence time series into a stochastic trend and a stationary component was reasoned based on the model defined. In addition, we assessed the importance of the long-term and immediate effect of one shock. The long-term forecast was also under discussion.
Conclusions: The ARIMA model (0,1,1) in our study is an adequate tool for presenting the varicella incidence trend and is suitable to forecast near future disease dynamics with acceptable error tolerance.
ARIMA models, dynamics, epidemiological forecasting, stationary series, trends
Varicella is an acute, highly contagious disease characterised by generalised vesicular exanthema caused by the initial infection with varicella zoster virus (VZV), which usually affects children aged 2-8 years.[
Varicella is not on the list of reportable infectious diseases and subject of registration in the European Union, thus the overall burden of disease is not well known and a country-level estimate is challenging.[
Bulgaria is among the countries with national mandatory surveillance system covering the total country population since 1940[
Ninety-two years of national surveillance of varicella incidence as one of the major intensive epidemiological indicators enables us to outline the trends and make an epidemiological forecast for varicella infection in this country. It should be noted that the stochastic modelling of different infectious disease parameters, although not often, is still available.[
In this context, our study is a continuation of the present trends for new experimental data accumulation with help of the modern achievements of statistics in the study of non-stationary time series.
The time series analysis is based on official data for varicella incidence (per population of 100,000) in Bulgaria for ninety-two years from 1928 to 2019. The information was collected from three major sources: The National Center for Public Health and Analysis (NCPHA), the National Center for Infectious and Parasitic Diseases (NCIPD) and data from the chronicles of Bulgarian epidemiology in the 20th century.[
We used the method of time series analysis to construct a time series model of overall incidence (1928–2019) using a Time Series Modeller in SPSS v. 25. We established autoregressive integrated moving average (ARIMA) model for prediction.[
The ARIMA model was based on the yearly incidence varicella rates (per 100,000 population) from 1928 to 2019 in Bulgaria. We followed all three steps of the standard ARIMA methodology to establish the model: identification, parameter estimation, and diagnostic checking. We analysed the graphs for the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the possible values of non-seasonal autoregressive order (p), non-seasonal differencing (d), and non-seasonal moving average order (q). Three test procedures were used to determine whether the time series after differencing was stationary or not. In the second step, the ARIMA model parameters was estimated. In the last step, we diagnosed the residual error sequence for white-noise sequence using the Ljung-Box Q test. Finally, the SPSS Expert Modeller has identified and estimated the best fitting ARIMA.
We visually inspected Fig.
Fig.
Thus, there is enough evidence for a more in-depth analysis of non-stationarity of the surveyed time series. Appropriate stochastic modelling, on the other hand, necessitates a study of the nature of eventual non-stationarity. The contemporary time series analysis elaborates four options: trend stationarity, a unit root (a difference stationary process), structural changes, and long memory. While in the trend stationarity, the original series becomes stationary after proper linear detrending, in the case of a unit root process, this happens after the differentiation of the series. Structural changes in an autoregressive model are of interest as the time series properties of the model, such as stationarity, may be different before and after the change.[
First, we conducted Dickey-Fuller test to test the null hypothesis that a unit root is present in a time series of varicella incidence.[
Although the Dickey-Fuller test proves that the time series of varicella incidence is stationary after differentiation; however, as is well known, this test has a relatively small power.[
The test procedure requires fixed length of the Bartlett (triangular) window, used to calculate the one time step error dispersion. It turns out that the length of this window is short (= 3) delays. The results of the test itself are as follows: In the case of null hypothesis about stationarity around the constant, the value of the corresponding statistics is 2.21 and this hypothesis can be rejected at a significance level of 0.01%. Thus, the stationarity test we conducted rejected the null hypotheses of stationarity around the trend with a slope or around a constant.
It is possible that a time series will remain stationary around a trend with abrupt structural changes. We tested the time series of varicella incidence for this option by applying the so-called stationarity test after differentiation.[
The results of this incidence time series test are as follows: in the case of the alternative for structural change only in the trend constant, two endogenous variables are sufficient for the non-correlation of the residues. The year of the possible structural change is 1985. The value of the relevant statistics is 10.65 and the null hypothesis for post-differentiation stationarity is rejected at a 1% significance level. The conclusion is that the hypothesis of stationarity after differentiation of the time series of varicella incidence is rejected against the alternatives for stationarity with different types of structural change in the linear trend in the first stage of the analysis.
In the end, we will focus on the fourth option of non-stationarity – time series to be stationary after the filtration with fractional differentiation filter. The test procedure[
In conclusion, the applied test procedures confirm the conclusion that the non-stationarity of varicella incidence time series is stationarity after differentiation type. The differentiation of the time series leads to stationarity, which seems to be not only without a slope, but also without drift.
One-round differentiation of varicella incidence time series was sufficient to ensure stationarity. This result suggests Box-Jenkins modelling with the appropriate integrated autoregressive moving average (ARIMA).[
The visual examination of the differentiated time series leads to the conclusion that it is stable in terms of variations in dispersion over time. Fig.
Time series (top), autocorrelation function (middle) and partial autocorrelation function (below) of the differentiated time series of varicella incidence.
At this stage of the analysis, a question of crucial importance is how to define the “best fit” among the already outlined set of models. To respond to the challenge, we took advantage of the opportunity offered by IBM’s Statistical Package for Social Sciences (SPSS). IBM SPSS Forecasting Analysis contains a Time Series Module in which Expert Modeller automatically identifies and estimates the best fitting ARIMA or exponential smoothing model for one or more dependent variable series. Although users can specify a custom ARIMA manually, Expert Modeller eliminates a great deal of the trial and error associated with doing so. As a result of the application of Expert Modeller, the ARIMA model (0,1,1) was favoured. The prediction equation for ARIMA (0,1,1) model can be written in a number of mathematically equivalent forms, one of which is the so-called “error correction” form, in which the previous forecast is adjusted in the direction of the error it made:
Ŷt = Ŷt-1 + αet-1,
because et-1 = Ŷt-1 − Ŷt-1 by definition, this can be rewritten as:
Ŷt = Yt-1 − (1-α)et-1 = Yt-1 − θ1et-1
The estimation of the coefficients of the selected model using the maximum likelihood method results in:
Ŷt = Yt-1 – 0.404et-1
Stochastic ARIMA models suggest normal distribution and uncorrelated residues with a constant dispersion over time. The normal distribution was checked with the Shapiro-Wilk test. The value of W-statistics in the Shapiro-Wilk test is 0.99 and the corresponding p-value is 0.483. The hypothesis of normal distribution is not rejected. The final conclusion is that the residues are normally distributed. Fig.
Fig.
As a general rule, the long-term forecasts of stationarity after differentiation processes are prone to failure due to the indefinite accumulation of forecast errors over time. Here we will demonstrate a relatively extended forecast, but with a limited horizon outside the sample. In Fig.
Year | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | |
Observed | 528.97 | 315.28 | 343.88 | 453.94 | 353.12 | 339.90 | 434.19 | ||||
ARIMA (0,1,1) | Forecast | 368.32 | 369.47 | 370.61 | 371.76 | 372.91 | 374.06 | 375.21 | 376.36 | 377.50 | 378.65 |
95% LCL | 250.02 | 230.58 | 214.47 | 200.59 | 188.36 | 177.39 | 167.43 | 158.31 | 149.91 | 142.11 | |
95% UCL | 502.96 | 531.49 | 556.68 | 579.64 | 600.96 | 621.02 | 640.06 | 658.26 | 675.75 | 692.63 |
The proper identification procedures we applied indicated that ARIMA (0,1,1) model fits best our original time series of the annual varicella incidence for the period from 1928 to 2019. In trend-stationary models, characterised by given common properties, the time series returns to the trend, in particular, at stationarity – to the average.[
Our model is stationary after differentiation without drift. Hence, as long as the observed time series values do not tend to zero, the varicella incidence should not be steadily decreasing. The dispersion growth over time impacts the forecast accuracy. The effect of random shocks on incidence is long-lasting, but is smaller than the immediate ψ(1)=0.09.
Varicella incidence for the time period 1928-2019 demonstrates upward trend with an average of 280.14 (per 100,000). In the sub-period 1928-1951, the incidence was lower (between 40.40 and 65.60 per 100,000) with an average of 36.20 (per 100,000), most likely related to partial registration of the infection and incidence during that period. In 1952, the indicator for the first time exceeded 100 per 100,000 population, and in 1955 it exceeded 200 per 100,000 population (a rate below which the incidence did not fall throughout the overall sub-period from 1955 to 2019). During the 1955-2019 sub-period, the lowest registered rate was 230.40 (per 100,000) in 1956 and 1961, and the highest rate recorded was 614.29 (per 100,000) in 1982. Based on this evidence, the logical assumption is that the incidence rate will continue to increase in future decades. In the absence of varicella vaccination as a regular component for epidemic process modelling, we observed typical epidemics cycles at short intervals of 2-5 years.[
Most of the available articles and reports studying varicella infection in Bulgaria are descriptive with analyses of rates (incidence, mortality, and lethality) and proportions (age distribution of the patients, number of hospitalised patients, etc.) and clinical focus on the severity of illness and complications. To our knowledge, the presented ARIMA model is the first in this country to outline the dynamics of the epidemic process over a period of ninety-two years. Moreover, the established trend provides robust epidemiological forecast to inform health authorities and support evidenced based decision-making for the inclusion of varicella vaccination into the immunisation schedule of the country. The ARIMA model (0,1,1) in our study is an adequate tool for presenting the varicella incidence trend and is suitable to forecast near future disease dynamics, with acceptable error tolerance.