
An integrated epidemic modelling framework for the real-time forecast of COVID-19 outbreaks in current epicentres

Pages 200-220 | Received 27 Nov 2020, Accepted 03 Jan 2021, Published online: 26 Mar 2021

Abstract

Various studies have provided a wide variety of mathematical and statistical models for early epidemic prediction of the COVID-19 outbreaks in mainland China and other epicentres worldwide. In this paper, we present an integrated modelling framework, which incorporates typical exponential growth models, dynamic systems of compartmental models and statistical approaches, to depict the trends of COVID-19 spreading in the 33 most heavily affected countries. The SIR-X dynamic system plays the main role in estimating and predicting the epidemic trajectories and in showing the effectiveness of containment measures, while the other modelling approaches help determine the infectious period and the basic reproduction number. The modelling framework reproduces the subexponential scaling law in the growth of confirmed cases, and an adequate fit to the empirical time-series data enables efficient forecasts of the peak in the case counts of asymptomatic or unidentified infected individuals, the plateau that indicates saturation at the end of the epidemic growth, and the number of daily positive cases over an extended period.

1. Introduction

Starting from Wuhan in early December 2019, the disease caused by a novel severe acute respiratory syndrome coronavirus, named COVID-19, spread unexpectedly with a detrimental effect on public health. Though mainland China, as the first epicentre of COVID-19, brought the coronavirus under control within two months, its initial success did not prevent the beginning of a global pandemic, owing to ignorance of the new virus and contempt for its threat. With the first COVID-19 case outside China reported on 13 January in Thailand, the contagion spread rapidly across a wide array of countries all over the world, resulting in a continuous shift of the epicentre. In March, COVID-19 hit Italy hard while crossing into other countries of the European Union. During the same period, it emerged in Iran, where the cumulative number of cases surged to a peak by the end of March, and radiated into the Middle East and part of Central Asia. After Europe, the USA took over and became the most infected country in the world, with the highest total number of cases ever since. In April, Russia began to suffer seriously as a consequence of the failure of its lockdown. From May, more positive cases emerged in a wide range of Latin American countries, with the new epicentres mainly consisting of Brazil, Mexico, Chile and Peru. India, along with Pakistan and Bangladesh in South Asia, experienced an epidemic explosion around late May. After June, Africa became the latest epicentre, with many positive cases going uncounted in most African countries because of their limited detection capacity. By mid-July, over 13 million people had been infected, with more than 500 thousand deaths worldwide. Currently, the highest incidence appears in the USA, and the epidemic situation keeps deteriorating at a rapid rate in a great number of developing countries.
COVID-19 has continued to sweep the globe at a tremendous speed, posing a massive threat to public health, the economy and numerous aspects of society. Currently, epicentres in Latin America, Africa, and parts of Asia and Europe are still undergoing the first wave of outbreaks. The collapse of nationwide health systems, high mortality rates and deepening economic recession have brought heavy losses to the vast majority of countries across the globe. The World Health Organisation (WHO) declared the outbreak of COVID-19 a Public Health Emergency of International Concern (PHEIC) on 30 January 2020 and characterised it as a pandemic on 11 March 2020.

Under such circumstances, academic effort to understand the mechanism of transmission is urgently needed to alleviate the negative effects of COVID-19 and to support sound decision-making on the public health system and other social and economic matters. However, limited understanding of the epidemic source and spread remains a crucial problem to be solved. A large literature has emerged on modelling studies for COVID-19, ranging from the simple data-driven approaches summarised by Huang et al. (2020), through the dominant ODE-based dynamic compartmental models, to the popular machine learning and deep learning methods (see Mohamadou et al., 2020). These modelling tools have been widely employed to study the global pandemic from various perspectives, including transmissibility, epidemic prediction, import risk assessment, management strategies and image-based automatic detection.

Apart from the traditional data-driven modelling of epidemiological parameters such as R0 and the effective reproduction number (see Li et al., 2020; Zhao et al., 2020), an enormous number of works have focused on the construction of compartmental models to interpret the trends of COVID-19, including the most frequently used Susceptible-Infected-Recovered (SIR) and Susceptible-Exposed-Infected-Removed (SEIR) models. The SIR model is the basis for epidemiological dynamic systems and can easily be applied for basic prediction of the trends of COVID-19; see Song et al. (2020) and González (2020). Sun et al. (2020) and Chen et al. (2020) proposed a modified SIR model with varying coefficients, vSIR for short, to characterise the time-varying dynamic regimes caused by the significant intervention measures implemented by governments of different countries. Via the locally weighted regression of Cleveland and Devlin (1988), which produces parameter estimates with the desired smoothness, the vSIR model lets the transmission rate α and the effective reproduction number Rt vary with time and can capture the changing dynamics with guaranteed statistical consistency.

Compared with the SIR model, the SEIR model has an additional compartment ‘E’ (Exposed), which contributes to the flexibility of the infectious period. Among recent studies, Zhao et al. (2020) modelled the epidemic trends of COVID-19 at the early stage and estimated the transmission rate of COVID-19 via R0 based on the data of Wuhan, China from 10 January to 24 January 2020. Tang et al. (2020) proposed a deterministic SEIR compartmental model for COVID-19 spreading. Wu et al. (2020) used a typical SEIR compartmental model to infer the number of infected cases in Wuhan from the number of cases exported internationally from Wuhan. Later on, various modifications of SEIR were put forward with interesting prediction results. In the study by Yang et al. (2020), the epidemic trend of COVID-19 in China was predicted under public health interventions. Peng et al. (2020) proposed a generalised SEIR model to analyse the spread of COVID-19 in China. The model can describe the trends of isolated, recovered and dead individuals.

A variety of extensions of compartmental models were derived from the traditional SIR and SEIR models to measure the influence of asymptomatic individuals and the effects of intervention (see X. Wang et al., 2020). He et al. (2020) combined SEIR models with a particle swarm optimisation algorithm for parameter optimisation. Liu et al. (2020) proposed a SAIR (Susceptible-Asymptomatic-Infected-Removed) model in the context of social networks, where nodes represent individuals and links stand for the contacts between individuals. Rajagopal et al. (2020) developed a SEIRD model with fractional-order derivatives based on the data in Italy and showed that the model has less error than the classical ones. Maier and Brockmann (2020) presented a parsimonious SIR-X model to absorb quarantine measures, containment policies and unidentified infectious individuals (including asymptomatic patients). In addition to the standard parameters of SIR models, the SIR-X model introduces a new compartment ‘X’ to represent effective quarantine measures acting on both symptomatic and susceptible individuals. Another series of extensions is based on the SEIR model and reflects the effectiveness of actual measures such as government intervention. For instance, Xu et al. (2020) created a complex SEIQRP model with six compartments (Susceptible-Exposed-Infectious-Quarantined-Recovered-Insusceptible) in order to predict the cumulative number of cases accurately. T. Wang et al. (2020) proposed a novel SCEIRD model with susceptible subjects (S), close contacts (C), latent (E, infected and infectious but asymptomatic), infected (I), recovered (R), and dead (D) as its compartments, and with two new parameters to depict the social transmissibility and the pathologic transmissibility.

The article is organised as follows. Section 2 briefly describes the modelling framework. Section 3 details the methodologies used in the framework. Section 4 describes the model fitting and prediction results. Conclusions and discussions are given in Section 5.

2. Modelling framework

Although a wealth of recent modelling studies has demonstrated well-fitted results obtained from miscellaneous compartmental models, these results depend heavily on the estimation of epidemiological parameters and on the quality of the real data collected in different countries. In this study, to overcome this shortcoming, we propose an integrated modelling framework that consists of three parts: estimation of epidemiological parameters, estimation of the infectious period, and compartmental models for the dynamic system. The compartmental models focus on the estimation and prediction of the epidemic trajectories and the effectiveness of containment measures, while the other two parts play supplementary roles that specifically assess the infectious period and the basic reproduction number of each studied country, which directly determine the two main parameters (the transmission rate α and the recovery rate β) of the dynamic system.

In the first modelling part, we estimate epidemiological parameters for the studied countries. The basic reproduction number R0, the final size of the infected population and the timing of the turning point constitute the crucial epidemiological parameters during an outbreak. These parameters, summarising the temporal pattern of the pandemic, quantify the extent of contagiousness, the epidemic severity and the inflection time point, respectively. The estimation of the key epidemiological parameters contributes to the forecast of the trend of transmissibility, which plays a vital role in the planning of containment policies. Inspired by the previous work of Zhao et al. (2019), we adopt classical non-linear phenomenological growth models, including the Gompertz model (see Gompertz, 1825), the logistic model (see Verhulst, 1838) and the Richards model (see Richards, 1959), to study the parameters of epidemic features.

In the second modelling part, we introduce another decisive factor, the infectious period TI, which strongly affects the transmissibility. TI stands for the duration over which pathogens can be transmitted from an infected individual to a susceptible host. It is considered a critical feature that partly reflects the extent of intervention, including the efficiency of quarantining the infected population. Here, we employ the statistical framework recently proposed by Lin et al. (2020). This approach is based on a time-varying Poisson increment of daily cases, and has been shown to be consistent in determining the infectious period of various countries despite spatial heterogeneity.

In the last modelling part, we concentrate on evaluating the transmissibility of the pandemic and checking whether the containment measures currently implemented are effective in reducing its spread. Compartmental models, such as the SIR models and their derivatives, are among the most commonly applied methodologies in the study of epidemic dynamics. However, well-fitted results depend substantially on the precise estimation of two crucial parameters, the basic reproduction number R0 and the infectious period TI, which are supplied by the first two parts of our modelling framework.

We collected the number of confirmed positive COVID-19 cases in 33 highly infected countries from 15 February 2020 to 10 July 2020 (starting from the date of the earliest reported case in each country) from the official website of the World Health Organisation (https://covid19.who.int). Time-dependent incidence data were retrieved, covering a list of current epicentres in five continents: South America, North America, Asia, Africa and Europe. Countries with massive infected populations, such as Brazil, India, Mexico, Russia and South Africa, were selected on the basis that they were still developing within the first wave up to the end of this study. Note that the USA was excluded from our study, not only because its case counts far exceeded those of any other country and showed a unique, steadily paced growth, but also because it was heading for a second crest of unknown size, a completely different pattern from that of the other countries.

3. Methods

We now describe the three parts of the integrated modelling framework, which play different roles but are closely related to each other.

3.1. Estimation of epidemiological parameters

R0 is the expected number of infections caused by an infected individual over his/her infectious period at the start of the epidemic, and is closely connected to the time-varying effective reproduction number Rt. Both R0 and Rt are key measures of an epidemic. For fixed coefficient models, if R0<1, the epidemic will eventually subside at a speed depending on the value of R0; otherwise, an explosion will inevitably occur until growth is halted by powerful containment or by rising mortality.

Mathematical modelling is broadly applied to study the primary features of the pandemic by estimating the epidemiological parameters. Here, we apply the epidemiological framework of Zhao et al. (2019) for the estimation of $R_0$. Following Wallinga and Lipsitch (2007), the basic reproduction number $R_0$ is given by the Euler–Lotka equation
$$R_0 = \frac{1}{M(-r)} = \frac{1}{\int_0^{\infty} e^{-r\nu} g(\nu)\,d\nu}, \tag{1}$$
where $r$ is the intrinsic growth rate from common growth models, and $\nu$ is the serial interval (SI) with probability density function $g(\cdot)$. Thus, $M(\cdot)$ is the moment generating function (MGF) of $g(\cdot)$, and $M(-r)$ is the Laplace transform of $g(\cdot)$ evaluated at $r$. The serial interval refers to the time between clinical onset in an infector and in the corresponding infectee. For our study, the SI distribution was taken from Li et al. (2020), who estimated it from the collected information on demographic features, exposure history and illness onsets of the first 425 confirmed cases reported in Wuhan by 22 January 2020; its probability density was approximated by a Gamma distribution with a mean of 7.5 days and a standard deviation (SD) of 3.4 days.
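
To make Equation (1) concrete, the following minimal sketch evaluates it for a Gamma-distributed serial interval. It relies on the standard closed form of the Gamma MGF, so that $R_0 = (1 + s r)^k$ with shape $k$ and scale $s$, and cross-checks the result against a direct numerical integration. The function names and the midpoint-rule settings are illustrative assumptions, not part of the original study.

```python
import math


def gamma_shape_scale(mean, sd):
    """Convert the mean and SD of a Gamma distribution to (shape, scale)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def r0_euler_lotka(r, mean_si=7.5, sd_si=3.4):
    """R0 = 1 / M(-r) for a Gamma serial interval.

    For Gamma(shape k, scale s), M(-r) = (1 + s*r)**(-k), valid for r > -1/s.
    """
    k, s = gamma_shape_scale(mean_si, sd_si)
    if 1.0 + s * r <= 0.0:
        raise ValueError("r must exceed -1/scale for the MGF to exist")
    return (1.0 + s * r) ** k


def r0_numeric(r, mean_si=7.5, sd_si=3.4, upper=200.0, steps=20000):
    """Cross-check: integrate exp(-r*v) * g(v) over [0, upper] (midpoint rule)."""
    k, s = gamma_shape_scale(mean_si, sd_si)
    log_norm = math.lgamma(k) + k * math.log(s)  # log of Gamma(k) * s**k
    h = upper / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        log_pdf = (k - 1.0) * math.log(v) - v / s - log_norm
        total += math.exp(-r * v + log_pdf) * h
    return 1.0 / total
```

For example, `r0_euler_lotka(0.1)` gives the reproduction number implied by a growth rate of 0.1 per day under the serial interval quoted above; the value 0.1 is purely illustrative.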

Therefore, the intrinsic growth rate $r$ remains to be determined from the data. Three typical growth models are used to fit the observed number of cumulative cases $C(t)$:
$$\text{Logistic:}\quad C(t) = \frac{K}{1 + e^{-r(t-\omega)}}, \tag{2}$$
$$\text{Gompertz:}\quad C(t) = K e^{-e^{-r(t-\omega)}}, \tag{3}$$
and
$$\text{Richards:}\quad C(t) = K\left[1 + \theta e^{-\theta r(t-\omega)}\right]^{-1/\theta}. \tag{4}$$
The standard nonlinear least squares approach is adopted for model fitting to estimate the parameters $K$ (maximum cumulative case number), $\omega$ (the unique inflection time point) and $\theta$ (the exponent of deviation) and, finally, the intrinsic per capita growth rate $r$, which is the crucial factor required for calculating $R_0$. In the growth models, the rate of growth of $C(t)$ does not keep decreasing but instead rises to a maximum before gradually declining. The turning point $\omega$ is the moment at which the growth stops accelerating, which is equivalent to the time at which the daily increase in $C(t)$ is largest.
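
As an illustration of Equations (2)–(4), the sketch below writes the three growth curves as plain Python functions, together with the residual sum of squares that a nonlinear least squares routine would minimise. The function names are hypothetical, and the optimiser itself is deliberately left out because the text does not say which one was used.

```python
import math


def logistic(t, K, r, omega):
    """Equation (2): logistic growth curve."""
    return K / (1.0 + math.exp(-r * (t - omega)))


def gompertz(t, K, r, omega):
    """Equation (3): Gompertz growth curve."""
    return K * math.exp(-math.exp(-r * (t - omega)))


def richards(t, K, r, omega, theta):
    """Equation (4): Richards growth curve (theta = 1 recovers the logistic)."""
    return K * (1.0 + theta * math.exp(-theta * r * (t - omega))) ** (-1.0 / theta)


def residual_sum_of_squares(model, params, days, cumulative_cases):
    """Least squares objective for one model and one parameter tuple."""
    return sum(
        (model(t, *params) - c) ** 2 for t, c in zip(days, cumulative_cases)
    )
```

At the turning point $t = \omega$ the three curves take the values $K/2$, $K/e$ and $K(1+\theta)^{-1/\theta}$ respectively, which gives a quick sanity check on any fitted parameters.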

The Akaike Information Criterion (AIC) (Akaike, 1973) and the Bayesian Information Criterion (BIC) (Schwarz, 1978) were both employed to evaluate model performance, and the model with the smallest AIC and BIC values was selected for the subsequent estimation.
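
The text does not give the exact form of the criteria. A common choice for least squares fits, assuming independent Gaussian errors and dropping additive constants, is $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$ and $\mathrm{BIC} = n\ln(\mathrm{RSS}/n) + k\ln n$. The sketch below uses that assumed form; the helper names are hypothetical.

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least squares fit under Gaussian errors (constants dropped)."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def bic_least_squares(rss, n_obs, n_params):
    """BIC for a least squares fit under Gaussian errors (constants dropped)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def select_model(fits, n_obs):
    """Return (best model by AIC, best model by BIC).

    `fits` maps a model name to (residual sum of squares, number of parameters).
    """
    by_aic = min(fits, key=lambda m: aic_least_squares(fits[m][0], n_obs, fits[m][1]))
    by_bic = min(fits, key=lambda m: bic_least_squares(fits[m][0], n_obs, fits[m][1]))
    return by_aic, by_bic
```

Returning both winners makes it visible when the two criteria disagree, a case the selection rule in the text does not address.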

3.2. Estimation of infectious period TI

The infectious period, denoted as TI, is the average time an infected individual remains infectious before recovery or before being removed by containment measures such as self-isolation and hospitalisation.

From the perspective of statistics, we also consider the novel approach proposed by Lin et al. (2020), which is a typical data-driven application of a parametric model. This statistical model, without making any explicit assumptions about the traditional epidemiological parameters, is a scalable framework for estimating the early dynamic trends of COVID-19. It assumes that the increment of the cumulative number $C(t)$ up to day $t$ follows a Poisson distribution with time-varying mean, i.e.
$$dC(t) = C(t) - C(t-1) \sim \mathrm{Poisson}\big(\Gamma(t)\,W(t-1)\big), \tag{5}$$
where $W(t)$ is the underlying number of infected individuals at day $t$, and $\Gamma(t)$ is the growth rate of the Poisson mean defined as
$$\Gamma(t) = \eta(t) \times \Gamma(t-1), \tag{6}$$
where $\eta(t)$, the evolving parameter, is an arbitrary function (linear, polynomial, etc.) which can be specified and fitted to real data. Note that after the infectious period, the infected individuals will be hospitalised or quarantined from the population, so the actual number of infected individuals should take those removals into consideration. Thus, $W(t)$ can be expressed as
$$W(t) = \begin{cases} \tilde{W}(t), & t \le T_I, \\ \tilde{W}(t) - \tilde{W}(t - T_I), & t > T_I, \end{cases} \tag{7}$$
where $\tilde{W}(t)$ represents the observed number of cumulative infected individuals, and $\tilde{W}(t - T_I)$ for $t > T_I$ denotes the total number of removed infected individuals at day $t$. Let $d\tilde{W}(t) = \tilde{W}(t) - \tilde{W}(t-1)$. Note that the new cases diagnosed at day $t$ may not be fully reported, which indicates $E(dC(t)) = p\,d\tilde{W}(t)$ with $p < 1$. Though $p$ might not be easily estimated from the limited data we have, a simple mathematical derivation fortunately shows that the value of $p$ does not affect the trend of the epidemic, in particular the duration, the peak time, the turning point and the infectious period in which we are interested.
Thus, we set $p = 1$ for simplicity, and it follows that
$$d\tilde{W}(t) = E(dC(t)) = \Gamma(t)\,\tilde{W}(t-1). \tag{8}$$
By chain calculation, the actual number of infected individuals (considering removals after the infectious period $T_I$) can be expressed as
$$W(t) = A \prod_{j=1}^{t} \big(\Gamma(j) + 1\big) - A\, I(t > T_I) \prod_{j=1}^{t - T_I} \big(\Gamma(j) + 1\big), \tag{9}$$
where $A = \tilde{W}(0)$ is the initial value of cumulative cases at $t = t_0 = 0$ and $I(\cdot)$ is the indicator function. The parameters are estimated by maximising the log-likelihood function based on the Poisson assumption on $dC(t)$,
$$L(\delta) = \sum_{t=1}^{T} \big\{ dC(t) \log(\lambda(t)) - \lambda(t) \big\} + C, \tag{10}$$
where $\lambda(t) = \Gamma(t)\,W(t-1)$ and $C$ is a constant. With the estimated parameters, we can estimate and predict the average daily new cases
$$d\tilde{W}(t) = A\,\Gamma(t) \prod_{j=1}^{t-1} \big(\Gamma(j) + 1\big). \tag{11}$$
With the calculated $d\tilde{W}(t)$, we can then perform the fitting against the actual numbers. The best-fitted infectious period $T_I$ is derived by minimising the prediction error.
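
The recursions in Equations (7)–(11) are easy to misread, so a minimal sketch may help. It takes a given sequence $\Gamma(1), \dots, \Gamma(T)$ as input and computes the cumulative infected $\tilde{W}(t)$, the active infected $W(t)$, the expected daily cases and the log-likelihood of Equation (10) without its constant. The function names are hypothetical, and how $\eta(t)$ is parameterised and optimised is left open, as in the text.

```python
import math


def cumulative_infected(A, gammas):
    """W~(t) for t = 0..T via W~(t) = (1 + Gamma(t)) * W~(t-1), with W~(0) = A.

    gammas[j - 1] holds Gamma(j) for j = 1..T.
    """
    w_tilde = [A]
    for g in gammas:
        w_tilde.append(w_tilde[-1] * (1.0 + g))
    return w_tilde


def active_infected(A, gammas, T_I):
    """W(t) for t = 0..T, Equations (7) and (9): subtract removals after T_I days."""
    w_tilde = cumulative_infected(A, gammas)
    return [
        w if t <= T_I else w - w_tilde[t - T_I]
        for t, w in enumerate(w_tilde)
    ]


def expected_daily_cases(A, gammas):
    """dW~(t) for t = 1..T, Equation (11)."""
    w_tilde = cumulative_infected(A, gammas)
    return [gammas[t - 1] * w_tilde[t - 1] for t in range(1, len(w_tilde))]


def poisson_log_likelihood(daily_cases, A, gammas, T_I):
    """Equation (10) without the constant; daily_cases[t - 1] holds dC(t)."""
    w = active_infected(A, gammas, T_I)
    total = 0.0
    for t, dc in enumerate(daily_cases, start=1):
        lam = gammas[t - 1] * w[t - 1]  # lambda(t) = Gamma(t) * W(t - 1)
        total += dc * math.log(lam) - lam
    return total
```

One way to use this, under the same assumptions, is to repeat the fit for each candidate $T_I$ on a grid and keep the value with the smallest prediction error.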

This is a parsimonious but effective way to analyse the dynamics of the COVID-19 outbreak with a completely parametric statistical model, rather than with typical ODE-based dynamic models, which depend heavily on an adequate initialisation of epidemiological parameters. Though it has shown versatility, we employ it specifically for estimating TI rather than other variables, since the weakness of its naive model hypothesis can be compensated by the other parts of our modelling framework.

3.3. SIR-X model

The SIR model is the origin of epidemiological compartmental models, which simplify the mathematical modelling of infectious diseases. The population exposed to the contagion is assigned to three compartments with labels S, I and R (Susceptible, Infectious and Recovered, respectively). Transitions between compartments represent the dynamics. They satisfy the system of ordinary differential equations
$$\partial_t S = -\alpha S I, \quad \partial_t I = \alpha S I - \beta I, \quad \partial_t R = \beta I. \tag{12}$$
Statistical inference has been discussed in the literature in terms of stochastic versions of the SIR model, showing that it is one of the most explanatory and scalable dynamic systems for epidemiological modelling; see Becker (1977), Becker and Britton (1999), Yip and Chen (1998) and Ball and Clancy (1993). One of its generalisations, the Susceptible-Exposed-Infected-Removed (SEIR) model, was proposed by Hethcote (2000), with four compartments, to depict the dynamics of epidemic outbreaks. It is generally assumed that the transmission coefficients are constant, which is not ideal for modelling COVID-19, as it cannot reflect the interventions imposed on the population by governments.

Note that most of these methods studied the early exponential growth dynamics, which often leads to significant overestimation of the epidemic timing and size. In the real epidemic trajectories of COVID-19, however, what we can expect is that an initial exponential growth is mitigated, with some delay, by containment policies that reduce transmission and effective reproduction. This leads to saturation in the count of cumulative cases along with an exponential decline in the increment of the infected population. As suggested by Maier and Brockmann (2020), the subsequent rise follows a sub-exponential, algebraic scaling law, which was regarded as a consequence of basic internal epidemiological processes and of a balance between transmission events and containment factors. Thus, a parsimonious epidemiological compartmental model, the SIR-X model, was presented by Maier and Brockmann (2020) to absorb quarantine measures, containment policies and unidentified infectious individuals (including asymptomatic patients). In addition to the standard parameters of the SIR model, the SIR-X model reflects effective quarantine measures acting on both symptomatic and susceptible individuals, which are quantified by the new compartment ‘X’. A major revision concerns the compartment ‘I’, which now denotes the unidentified infecteds. We apply the SIR-X model to quantify the removal of symptomatic infecteds by quarantine procedures, based on the assumption that the containment strategies vary with the epidemic and significantly reduce the contribution of these individuals to the transmission process. Furthermore, the peak time in the number of unidentified infectious individuals is estimated indirectly by the SIR-X model.

Note that in the setting of the SIR-X model, $R_0 = \alpha/\beta = \alpha T_I$. Here, the basic reproduction number $R_0$ and the infectious period $T_I$ directly define the transmission rate $\alpha$ and the recovery rate $\beta$. As an evolution of typical SIR models, the dynamics of SIR-X can be stated as follows:
$$\partial_t S = -\alpha S I - \kappa_0 S, \quad \partial_t I = \alpha S I - \beta I - \kappa_0 I - \kappa I, \quad \partial_t X = (\kappa + \kappa_0) I, \tag{13}$$
where $\kappa > 0$ is the quarantine rate, $\kappa_0 > 0$ is the containment rate, and $I(t_0)$ is the initial value of $I(t)$. They can be fitted numerically by the nonlinear least squares method. Specifically, $\kappa_0 = 0$ corresponds to an exceptional scenario in which the containment policies bring about no behavioural change that removes susceptible and infected individuals, while $\kappa = 0$ refers to the circumstance in which the symptomatic infecteds are not quarantined. Note that infecteds are removed more efficiently from the compartment ‘I’ than susceptibles are from the compartment ‘S’, which is implied analytically by $\beta + \kappa + \kappa_0 > \kappa_0$.
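
To show how Equation (13) can be advanced in time, the sketch below integrates the SIR-X system with a classical fourth-order Runge–Kutta scheme, the method named later in the results. The function names, the sub-daily step count and the tuple-based state are illustrative assumptions; the fitting of $\kappa$, $\kappa_0$ and the initial state is not shown.

```python
def sirx_rhs(state, alpha, beta, kappa, kappa0):
    """Right-hand side of Equation (13); state = (S, I, X) as population fractions."""
    s, i, _ = state
    ds = -alpha * s * i - kappa0 * s
    di = alpha * s * i - beta * i - kappa0 * i - kappa * i
    dx = (kappa + kappa0) * i
    return (ds, di, dx)


def rk4_step(state, dt, alpha, beta, kappa, kappa0):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    args = (alpha, beta, kappa, kappa0)
    k1 = sirx_rhs(state, *args)
    k2 = sirx_rhs(shifted(state, k1, 0.5 * dt), *args)
    k3 = sirx_rhs(shifted(state, k2, 0.5 * dt), *args)
    k4 = sirx_rhs(shifted(state, k3, dt), *args)
    return tuple(
        y + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sirx(state0, alpha, beta, kappa, kappa0, days, steps_per_day=10):
    """Return the states at day 0, 1, ..., days as a list of (S, I, X) tuples."""
    dt = 1.0 / steps_per_day
    state = tuple(state0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, alpha, beta, kappa, kappa0)
        daily.append(state)
    return daily
```

Multiplying the X component of each daily state by the population size gives a modelled cumulative case curve that can be compared with the reported counts.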

In the SIR-X model, $S$, $I$ and $X$ quantify the respective compartments' fractions of the whole population. Here, we assume that $X(t)$ is proportional to the actual number of confirmed cases, with initialisation $X(t_0) = C(t_0)/N$, the ratio of the cumulative cases $C(t_0)$ to the whole population $N$ at time $t_0$. Meanwhile, the initialisation of $I$ (Infected) and $S$ (Susceptible) satisfies (see Maier and Brockmann, 2020 and its supplementary materials)
$$I(t_0) = \phi_0 X(t_0), \tag{14}$$
$$S(t_0) = 1 - I(t_0) - X(t_0). \tag{15}$$
Since the initial size of the unidentified infected population remains unknown, the proportionality factor $\phi_0 = I(t_0)/X(t_0)$ was treated as a parameter to be optimised numerically by model fitting. In practice, the initialisation of parameters in SIR-X plays a critical role in the goodness-of-fit to real data. Thus, the previous two modelling parts are closely tied to the final result.
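
The initial conditions in Equations (14) and (15) can be sketched as a small helper. The function and argument names are hypothetical, and the numbers in the usage note are illustrative rather than taken from the study.

```python
def sirx_initial_state(initial_cases, population, phi0):
    """Return (S, I, X) at t0 as population fractions, Equations (14)-(15).

    initial_cases is C(t0), population is N, phi0 is the ratio I(t0) / X(t0).
    """
    x0 = initial_cases / population
    i0 = phi0 * x0
    s0 = 1.0 - i0 - x0
    if s0 < 0.0:
        raise ValueError("phi0 is too large: S(t0) would be negative")
    return (s0, i0, x0)


def to_case_counts(x_fractions, population):
    """Convert modelled X(t) fractions back to cumulative case counts."""
    return [population * x for x in x_fractions]
```

For instance, `sirx_initial_state(100, 1_000_000, 5.0)` starts the model with 100 confirmed cases and five times as many unidentified infecteds in a population of one million.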

Two quantities in ratio form are defined to facilitate the assessment of the epidemiological modelling with quarantine and isolation. The first is
$$P = \frac{\kappa_0}{\kappa_0 + \kappa}, \tag{16}$$
which expresses the extent to which containment measures affect the public as a whole, compared with quarantine measures that constrain only the symptomatic infecteds. The second is
$$Q = \frac{\kappa_0 + \kappa}{\beta + \kappa_0 + \kappa}, \tag{17}$$
which reflects how probable it is that an infected individual is identified and subsequently quarantined.
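
Equations (16) and (17) translate directly into code. The sketch below uses hypothetical function names, and the rates in the usage note are illustrative only.

```python
def containment_ratio(kappa, kappa0):
    """P in Equation (16): share of total removal due to public containment."""
    return kappa0 / (kappa0 + kappa)


def quarantine_ratio(beta, kappa, kappa0):
    """Q in Equation (17): chance an infected is quarantined rather than recovering."""
    return (kappa0 + kappa) / (beta + kappa0 + kappa)
```

With `kappa = 0.03`, `kappa0 = 0.01` and `beta = 0.12`, both ratios equal one quarter: a quarter of the removal is attributed to public containment, and a quarter of infecteds are quarantined before they recover.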

3.4. Integrated modelling and algorithm

As mentioned above, owing to the limited sources of data and the need to include the quarantine and containment measures taken, we propose an integrated modelling framework in which the SIR-X compartmental model plays the main role, while the growth model for cumulative cases, which determines the basic reproduction number R0, and the Poisson model for the increment of the cumulative number, which estimates the infectious period TI, together determine the transmission rate α and the recovery rate β. The three parts of the framework are connected by the basic equality R0=α/β=αTI. A good fit of SIR-X requires accurate estimation of its parameters, especially the transmission rate α and the recovery rate β, which are specified in advance; this is why the first two parts of our modelling framework are necessary.

Our modelling framework is summarised into Algorithm 1 with the following detailed procedures.

  1. Find the best-fitted intrinsic growth rate r by fitting three typical growth models.

  2. Calculate the basic reproduction number R0 via the Euler–Lotka equation.

  3. Apply the Poisson-based statistical approach to evaluate the infectious period TI.

  4. Calculate the transmission rate α and the recovery rate β based on R0=α/β=αTI.

  5. Estimate the quarantine rate κ, the containment rate κ0 and the initial value of infected cases I(t0) in the compartmental models of SIR-X.

  6. Calculate the quantities of interest: the peak time point in the cases of asymptomatic or unidentified infected individuals, the prediction of daily positive cases for an extended period, the effectiveness measures of containment P and Q in Equations (16) and (17), etc.
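
Step 4 is the link between the three parts of the framework. Under the stated equality R0=α/β=αTI, the recovery rate is β=1/TI and the transmission rate is α=R0/TI. A minimal sketch, with a hypothetical function name:

```python
def transmission_and_recovery(r0, infectious_period):
    """Return (alpha, beta) from R0 = alpha / beta = alpha * T_I."""
    if infectious_period <= 0:
        raise ValueError("the infectious period must be positive")
    beta = 1.0 / infectious_period
    alpha = r0 / infectious_period
    return alpha, beta
```

For example, `transmission_and_recovery(2.48, 8)` returns `alpha = 0.31` and `beta = 0.125` per day; the infectious period of 8 days is an illustrative value only.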

4. Results

By applying the integrated epidemic modelling framework, we reproduced the ongoing trajectories of the first wave of COVID-19 outbreaks and predicted future trends based on daily confirmed case numbers within the corresponding study periods of 33 countries across Latin America, Asia, Europe and Africa.

Defined by three consecutive days with increasing positive cases, the starting dates of the outbreaks varied from 15 February 2020 to 17 March 2020 because the epidemic spread unevenly worldwide. Regardless of each country's starting date, the study periods lasted until 10 July 2020, the end date of our time-series data. We used case incidence data within each epidemic period to fit our modelling framework to the current trajectory. For validation, we applied our model to forecast the epidemiological development over the following 100 days, obtaining the final epidemic size on 19 October 2020. Estimation and prediction results for key parameters depicting the modelling framework are shown in Table .
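
The text does not spell out how the starting date is computed. One possible reading, assumed here, is that the outbreak starts on the first day of the first run of three consecutive day-on-day increases in daily positive cases. The function name and the `run_length` parameter are hypothetical.

```python
def outbreak_start(daily_cases, run_length=3):
    """Index of the first day of the first run of `run_length` consecutive increases.

    Returns None if the series contains no such run.
    """
    run = 0
    for t in range(1, len(daily_cases)):
        if daily_cases[t] > daily_cases[t - 1]:
            run += 1
            if run == run_length:
                return t - run_length + 1
        else:
            run = 0
    return None
```

For the series `[0, 1, 0, 0, 2, 3, 5, 4]` the function returns index 4, the day on which the count first rises to 2 and then keeps rising for two more days.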

Table 1. Estimation and prediction results for key parameters and the epidemic trends.

4.1. Estimation of R0 for 33 countries

In the first part of our modelling framework, the basic reproduction number R0 was estimated via the Euler–Lotka equation, given the parameters fitted by three types of growth models. Among them, the Gompertz model suited the countries that remained in the early stage of their epidemic, while the Logistic and Richards models fitted better the curves of countries where the epidemic had already developed to its prime stage (see Figures ).

Figure 1. Fitting for the selected growth model by countries. The fitted growth curve (solid) and the actual number (dotted) of daily confirmed cases over the ordered days of the outbreak (I).

Figure 2. Fitting for the selected growth model by countries. The fitted growth curve (solid) and the actual number (dotted) of daily confirmed cases over the ordered days of the outbreak (II).

Figure 3. Fitting for the selected growth model by countries. The fitted growth curve (solid) and the actual number (dotted) of daily confirmed cases over the ordered days of the outbreak (III).

Figure 4. Fitting for the selected growth model by countries. The fitted growth curve (solid) and the actual number (dotted) of daily confirmed cases over the ordered days of the outbreak (IV).

The growth model used to determine the intrinsic growth rate r was chosen by information criteria. The model with the lowest AIC and BIC values was taken as the best fit, giving the estimate of r (see Table ).

Table 2. Parameters of growth models for estimating basic reproduction number.

Results showed that the basic reproduction number R0 of the 33 studied countries ranged from 1.60 (Romania) to 3.28 (Bangladesh), with a mean of approximately 2.48 and a median of approximately 2.43 (see Table ).

4.2. Estimation of infectious period TI

In the second part of the modelling framework, the best-fitted TI was determined as the value with the lowest prediction error under the statistical model of Lin et al. (2020), which is built on the Poisson-distributed increment; the results are shown in Figures .

Figure 5. Prediction error versus infectious period by countries (I).

Figure 6. Prediction error versus infectious period by countries (II).

Figure 7. Prediction error versus infectious period by countries (III).

Figure 8. Prediction error versus infectious period by countries (IV).

Most of the studied countries showed a similar V-shaped relation between the prediction error and TI, with the optimal infectious period located at the trough.

The values of TI in Table , ranging from 5 days (Venezuela) to 13 days (Bangladesh and UAE), reflect the estimated average duration of the infectious period, which in practice could be significantly shortened by intervention measures such as nationwide lockdown, social distancing, earlier population-based testing and self-isolation.

4.3. SIR-X model fitting

With R0 and TI calibrated in the previous two sections, we moved on to the SIR-X dynamic model to obtain an explanatory prediction of the potential development of the epidemics. During the numerical approximation procedure, a fourth-order Runge–Kutta method was applied to integrate the system while the parameters were fitted. Using the mid-year population sizes N collected from the official website of the United Nations (https://population.un.org), we obtained the values of the key modelling parameters κ0, κ and ϕ0, shown in Table 3, along with α and β, which denote the transmission rate and the recovery rate, respectively. The fitted curves for the corresponding study periods generally fell close to the observed trajectories, which suggested a relatively effective model fit (see Figures 9–12).
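The integration step can be sketched as follows, assuming the SIR-X equations in their original form (Maier and Brockmann), written for population fractions, and the calibration β = 1/TI and α = R0 β. The parameter values in the example are hypothetical, and the parameter fitting itself is omitted.

```python
def sirx_rhs(state, alpha, beta, kappa0, kappa):
    """Right-hand side of the SIR-X system for fractions (S, I, R, X)."""
    s, i, r, x = state
    ds = -alpha * s * i - kappa0 * s
    di = alpha * s * i - (beta + kappa0 + kappa) * i
    dr = beta * i + kappa0 * s
    dx = (kappa0 + kappa) * i
    return (ds, di, dr, dx)

def rk4_step(state, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))
    k1 = sirx_rhs(state, *params)
    k2 = sirx_rhs(shifted(state, k1, dt / 2), *params)
    k3 = sirx_rhs(shifted(state, k2, dt / 2), *params)
    k4 = sirx_rhs(shifted(state, k3, dt), *params)
    return tuple(
        y + dt / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate_sirx(r0, t_infectious, kappa0, kappa, i0, days, dt=0.1):
    """Daily (S, I, R, X) fractions; assumes beta = 1/T_I and alpha = R0 * beta."""
    beta = 1.0 / t_infectious
    alpha = r0 * beta
    state = (1.0 - i0, i0, 0.0, 0.0)
    steps_per_day = round(1 / dt)
    trajectory = [state]
    for _day in range(days):
        for _step in range(steps_per_day):
            state = rk4_step(state, dt, alpha, beta, kappa0, kappa)
        trajectory.append(state)
    return trajectory

# Hypothetical parameter values, for illustration only.
trajectory = simulate_sirx(r0=2.5, t_infectious=8, kappa0=0.01, kappa=0.05,
                           i0=1e-5, days=200)
confirmed_fraction = [x for (_, _, _, x) in trajectory]
```

Multiplying `confirmed_fraction` by the population size N gives a cumulative case curve that can be compared with the reported counts. The four compartments sum to one throughout, which provides a quick check on the integration.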

Figure 9. SIR-X model fitting by countries. The fitted growth curve (red-solid, smooth) and the actual growth curve (blue-solid, discretised) of daily confirmed cases over the ongoing dates of the outbreak. The predicted trajectory (green-solid) forecasts the daily growth of 100 days after the end of the study period (I).

Figure 10. SIR-X model fitting by countries. The fitted growth curve (red-solid, smooth) and the actual growth curve (blue-solid, discretised) of daily confirmed cases over the ongoing dates of the outbreak. The predicted trajectory (green-solid) forecasts the daily growth of 100 days after the end of the study period (II).

Figure 11. SIR-X model fitting by countries. The fitted growth curve (red-solid, smooth) and the actual growth curve (blue-solid, discretised) of daily confirmed cases over the ongoing dates of the outbreak. The predicted trajectory (green-solid) forecasts the daily growth of 100 days after the end of the study period (III).

Figure 12. SIR-X model fitting by countries. The fitted growth curve (red-solid, smooth) and the actual growth curve (blue-solid, discretised) of daily confirmed cases over the ongoing dates of the outbreak. The predicted trajectory (green-solid) forecasts the daily growth of 100 days after the end of the study period (IV).

Table 3. Parameters of model SIR-X.

Assuming that the intervention measures are sustained, the values of P and Q, also shown in Table 3, which are derived from the three estimated parameters κ0, κ and β, quantify the public containment leverage and the quarantine probability, respectively. Under most circumstances, a higher public containment leverage leads to closer concordance with pure algebraic growth.
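As a brief illustration, the two quantities can be computed as below, assuming the definitions from the original SIR-X formulation, P = κ0/(κ0 + κ) and Q = (κ0 + κ)/(β + κ0 + κ). The numerical inputs are hypothetical.

```python
def containment_summary(kappa0, kappa, beta):
    """Public containment leverage P and quarantine probability Q (assumed definitions)."""
    leverage = kappa0 / (kappa0 + kappa)
    quarantine_probability = (kappa0 + kappa) / (beta + kappa0 + kappa)
    return leverage, quarantine_probability

# Hypothetical rates per day.
p_value, q_value = containment_summary(kappa0=0.25, kappa=0.25, beta=0.5)
```

Under these definitions P measures the share of removal that acts on the whole population rather than on infecteds only, while Q is the probability that an infected individual is quarantined before recovering.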

Appropriately, the SIR-X model was structurally consistent with respect to these parameters, while highlighting a sub-exponential scaling law as the balance between transmission and containment before the saturation of the case counts due to the decay of unidentified infecteds.

4.4. Estimation of the peak and plateau

In this study, the turning point of the first wave of a COVID-19 outbreak was defined as the date on which the cumulative case number numerically reached the plateau, that is, the date $v$ satisfying $\left|\partial f(v)/\partial v\right| \le c_0$, where the ratio of increment is defined as $f(v) = \left[\partial C(t)/\partial t\,|_{t=v}\right] / \left[\partial C(t)/\partial t\,|_{t=v-1}\right]$ and $c_0$ is a prespecified small number. Here, we take $c_0 = 5\times 10^{-4}$ (see Lin et al., 2020), and accept a date as the plateau only if the daily confirmed case numbers on all following days also satisfy this criterion.
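A discrete version of this plateau rule can be sketched as follows. The sketch assumes that the derivatives are replaced by daily differences of a smooth fitted trajectory with strictly positive increments. This discretisation and the logistic test curve are illustrative choices and are not taken from the study.

```python
import math

def plateau_day(cumulative, c0=5e-4):
    """Earliest day index v such that |f(u) - f(u-1)| <= c0 for every u >= v.

    `cumulative` is a fitted cumulative case curve sampled once per day.
    Returns None if the criterion is not met on the last day.
    """
    daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
    if any(d <= 0 for d in daily):
        raise ValueError("expects strictly positive daily increments")
    # ratio[j] is f(v) at v = j + 2: increment of day v over increment of day v - 1.
    ratio = [b / a for a, b in zip(daily, daily[1:])]
    # slope[j] approximates |df/dv| at v = j + 3 by a backward difference.
    slope = [abs(b - a) for a, b in zip(ratio, ratio[1:])]
    plateau = None
    for j in range(len(slope) - 1, -1, -1):
        if slope[j] > c0:
            break
        plateau = j + 3
    return plateau

# Hypothetical logistic trajectory with inflection at day 50.
logistic = [1000.0 / (1.0 + math.exp(-0.2 * (t - 50))) for t in range(121)]
day = plateau_day(logistic)
```

Scanning backwards from the last day enforces the requirement that all following days satisfy the criterion, so an early stretch of near-constant exponential growth is not mistaken for the plateau.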

Besides, the unidentified infectious compartment 'I' determined the timing of the peak, when the largest number of infectious cases emerged. Note that the exact value of unidentified infecteds is sensitive to parameter changes, especially in the population size N. However, the general shape of I(t) consistently remains a bell-shaped curve, whose exponential decay right after the peak drives the cumulative case number to saturate at a finite level.

From Table , conclusions about the predicted epidemic trends can be drawn explicitly. Among the 33 studied countries, the peaks were observed in 11 countries (with the earliest peak, that of UAE, on 13 May 2020) and 15 countries before August, while the latest peak (that of Colombia) falls on 8 September 2020. Meanwhile, the timing of the plateau indicated that the first wave of the epidemic would come to an end around late August to September in the majority of the studied countries. Notably, under the current fitted parameters and with containment measures continued, the durations of fading, counted from the peak to the plateau, were mostly longer than one month. The differences in fading duration might partially reflect the effect of the divergent containment measures implemented in various countries.

Specifically, in Asia, the epidemic will mostly continue until the end of August. India, Pakistan and Saudi Arabia, along with the Philippines, were expected to reach the plateau in September. With the second-largest population in the world, India has undoubtedly become the epicentre in Asia and was estimated to have a final size over 3.2 million cumulative confirmed cases by the ending date (19 October 2020) of our prediction.

In Latin America, the epidemic will not fade out before the second half of August, except in Peru and Chile, whose turning points will emerge in early August. Among these countries, Brazil has the largest estimated number of infections, with more than 4.2 million positive cases, which accounts for almost 2% of the national population.

In the three studied European countries, both Russia and Ukraine have passed through the peak in late June and the plateau would be approached within this August. Moreover, Russia will achieve a final epidemic size of over 1 million confirmed infected cases. In addition, Romania was estimated to have a late climax a few days before September and the epidemic would last until the second half of September.

Though the substantial shortage of medical testing capacity has greatly influenced the validity of the statistics in Africa, there is no doubt that Africa has already become one of the non-negligible epicentres suffering from COVID-19. Among the five studied African countries, most would reach the plateau after September, with the exception of Algeria, whose comparatively smooth trajectory would lead to the turning point on 18 August 2020. From our prediction, Morocco would have the latest peak, on 14 August 2020, while South Africa was expected to have the largest final size of nearly 700 thousand.

5. Conclusions and discussions

In this study, we focus on the analytical assessment and prediction of the extent of the epidemic outbreak across the currently developing epicentres. When a brand new contagion starts evolving into an outbreak, public health information is commonly deficient, leaving only the reported cases available for academic research. We therefore propose an integrated framework for analysing the COVID-19 case time series reported from 15 February 2020 to 10 July 2020. We track, evaluate and forecast the epidemic by comparing the epidemiological parameters and estimating the number of cumulative cases across countries, and by assessing the impact of containment strategies, which should be constructive for mitigation planning and the redeployment of resources.

The modelling framework has demonstrated adaptiveness and consistency on portraying epidemic trends across the studied countries, which is advisable for the quantitative analysis of the transmission mechanism of COVID-19, together with the implementation of control measures in current epicentres and for potential future outbreaks worldwide.

In summary, using the data from the first stage of the epidemic, our study provided a concrete modelling framework for estimation of the epidemiological parameters and prediction of future trajectories as well as explanatory features including the peak, the plateau and the final epidemic size of the current epicentres. Results of prediction were mostly considered as consistent with the observed growth curves. Meanwhile, we highlighted the importance of effective containment policies and quarantine measures in flattening the epidemic curves.

Fitted to the empirical case counts, the modelling framework generates the basic reproduction number R0 and the infectious period TI for each country through typical epidemic growth models and a parsimonious statistical approach, respectively. Plausible parameter values were achieved for most of the studied countries, providing a sound basis for the subsequent modelling of the dynamics.

Thus, the reproduced epidemic trajectories of the 33 studied countries could be applied to estimate the trend of the number of asymptomatic infected individuals, which is the key quantity for estimating the peak time of the outbreak.

The SIR-X model discussed here reveals that the feature which best depicts the dynamics of the COVID-19 outbreaks in the 33 studied countries is the sub-exponential scaling law in the growth of positive case numbers during the first wave of the epidemic. This common behaviour suggests that the epidemic is governed by fundamental principles, shaped jointly by behavioural changes among the susceptibles and by external containment policies and quarantine measures.

Despite the explanatory modelling performance, our study has several limitations. First of all, heavy reliance on the quality of data collection remains a realistic constraint for most modelling studies. Under-reporting of infections, delays in testing feedback and infrequent updates of the statistics are commonplace in a vast majority of countries. Even under such circumstances, our modelling framework still produced credible results, including the dynamics and the predictions of the final epidemic size, the peak and the plateau.

Secondly, when estimating the basic reproduction number R0, we directly applied the serial interval reported in a former study by Li et al. (2020), which was taken as a general alternative because specific onset data were lacking for each studied country. We believe that the estimate of R0 for each country will become more reliable once its onset data are available, since these will allow a precise assessment of the serial interval.

Thirdly, according to the setting of the statistical modelling in Section 3.2, the infectious period TI can only take positive integer values. Though more interpretable in practice, this lack of numerical smoothness might restrict the parameter space for tuning the SIR-X dynamic model, which could affect the accuracy of prediction.

Additionally, in the SIR-X model, we simply assumed that the containment rate κ0 and the quarantine rate κ are constant, given the limited data available. In practice, the intervention strategies in a given country would probably change dramatically between epidemic stages, resulting in time-varying containment and quarantine rates, a modification which is left for further research.

Last but not least, as shown in the results, the saturation of confirmed cases implies that, ideally, all susceptibles will eventually be removed from the transmission process, assuming that the containment and quarantine measures can be maintained until the end of the epidemic. However, a considerable number of susceptibles will not be quarantined, as a consequence of either disregard for intervention policies or a shortage of quarantine space and testing resources. In reality, the number of daily new cases is therefore expected to decline along a slower path and finally settle at a comparatively small, yet non-zero, level. To halt transmission fully, it would be worthwhile to extend the statistics to unidentified and unquarantined infecteds. As a result, we expect that our predictions will partly underestimate the final epidemic sizes for the studied countries.

Though the fits are generally good, there are still some studied countries, such as the last four (Morocco, Russia, Ukraine and Romania), whose goodness-of-fit seems rather poor. One potential reason is that these countries have already entered the second wave of the epidemic, for which our single-wave model is not suitable. An extension could be considered in further research so that our modelling framework becomes adaptive to the second wave. With complete empirical time-series data for the first epidemic wave, an accurate estimation of the reproduction number can be carried out. Moreover, owing to the accumulated experience in containment policy-making, it is most likely that the infectious period of the second wave will change in more complex ways, as will the transmission rates α and recovery rates β of the SIR-X dynamic models. As more data are collected, it will become practical to develop piecewise modelling according to the different stages of policy implementation, resulting in multiple sets of parameters κ0 and κ.

The major difference between the modelling of the early epidemic and that of the second-wave trajectories will be the initialisation, since the starting cases will remain non-negligible quantities on the basis of the unidentified infected individuals that we have estimated. Thus, an adequate identification of the starting point of the second wave will have a substantial influence on the prediction of trajectories.

It is generally believed that the second wave tends to show a more extensive exponential rise to a higher peak, which will probably last longer as a consequence of the hidden population of unidentified infected individuals and the relatively loosened containment policies implemented throughout the world. The upcoming vaccine will be another important factor affecting the trajectories of some countries. Under such complicated circumstances, multi-wave modelling needs to be validated specifically in our future research based on the current modelling framework.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The research is supported by the Natural Science Foundation of China [Grant Numbers 11271136, 81530086] and the "Project 111" No. B14019 of China.
