518
Views
0
CrossRef citations to date
0
Altmetric
Special issue In memory of Fred Brauer

A computational approach to identifiability analysis for a model of the propagation and control of COVID-19 in Chile

, , &
Article: 2256774 | Received 29 Dec 2022, Accepted 30 Aug 2023, Published online: 14 Sep 2023

Abstract

A computational approach is adapted to analyze the parameter identifiability of a compartmental model. The model is intended to describe the progression of the COVID-19 pandemic in Chile during the initial phase in early 2020 when government declared quarantine measures. The computational approach to analyze the structural and practical identifiability is applied in two parts, one for synthetic data and another for some Chilean regional data. The first part defines the identifiable parameter sets when these recover the true parameters used to create the synthetic data. The second part compares the results derived from synthetic data, estimating the identifiable parameter sets from regional Chilean epidemic data. Experiments provide evidence of the loss of identifiability if some initial conditions are estimated, the period of time used to fit is before the peak, and if a significant proportion of the population is involved in quarantine periods.

This article is part of the following collections:
An article collection in honour of Fred Brauer

1. Introduction

1.1. Scope

The COVID-19 pandemic has received overwhelming news coverage worldwide. We recall that the coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was declared a global pandemic by the World Health Organization (WHO) on March 11, 2020 [Citation11, Citation64]. This highly contagious unprecedented virus has impacted government and public institutions, strained the health care systems, restricted people in their homes, and caused country-wide lockdowns resulting in a global economic crisis [Citation20, Citation32, Citation61]. The impact of COVID-19 at the time of revision of this work (July 5, 2023) amounts to roughly 767 million cases moreover, 6.95 million deaths worldwide, of which 2.96 million have been reported in the Americas.

The morbidity, mortality, and economic indicators associated with the COVID-19 pandemic point to a devastating picture for Latin American countries. High poverty rates, poor leadership, high prevalence of underlying health conditions such as obesity and diabetes, and frail healthcare systems have exacerbated the impact of the novel coronavirus in this region [Citation9, Citation23, Citation30, Citation45]. In Chile, the Ministry of Health reported the first COVID-19 case on March 3, 2020, becoming the fifth country in Latin America after Brazil, Mexico, Ecuador and Argentina to report COVID-19 cases. Soon after, the Chilean government put in place a number of interventions including the closure of all daycares, schools, and universities (March 16), border controls, telework recommendations (March 18), closure of non-essential businesses (March 19), national night curfews (March 22), as well as targeted lockdowns at the level of municipalities since March 26, 2020. As of July 6, 2023, Chile has accumulated 5288437 cases and 61586 deaths [Citation40]. It is also worth noting that Chile has tested at a higher rate than any other Latin American country [Citation58]. Fortunately, Chile's mass COVID-19 vaccination campaign to immunize about 80% of the total population started in February 2021, and more than 90% of that target population has been fully vaccinated as of July 6, 2023.

One way to understand and explain these epidemic phenomena is by applying mathematical models and surveillance data, and the COVID-19 outbreak has been studied with different focuses, differing in their uses, their mathematical form, and objectives [Citation1, Citation6, Citation31, Citation43, Citation44, Citation47]. Since the first case reported of the COVID-19 disease, and the multiple models that have surged to respond to the emergency and its dynamics [Citation53, Citation54], we can understand their contributions, as well as their limitations [Citation2, Citation4, Citation62]. The mathematical models utilized to confront the coronavirus 2019 disease improve situational awareness, assess epidemiological characteristics, and inform the base of evidence for prevention strategies. Specifically, for models derived from the famous SEIR model, one must guarantee that information recovered by them is highly accurate and that one confidently understands the biological mechanisms to lay the ground for developing optimal strategies for public health. The successful application of such epidemic models depends upon our ability to estimate key transmission parameters. These estimates are subject to two principal sources of uncertainty: the noise of the data, and the assumptions built into the model. Both difficulties give rise to the problem of non-identifiability of parameters. That is, the information recovered by applying the model to available datasets is not representative of the dynamics that we desire to interpret. This problem has been studied intensively and there are various approaches to handle non-identifiability. For instance, in [Citation16, Citation22, Citation29, Citation48, Citation57, Citation69] guidelines and criteria are proposed to avoid such identifiability difficulties. Ignoring these factors can result in misleading inferences with potentially incorrect public health policy decisions [Citation12, Citation14, Citation51].

We are inspired by the dynamics of COVID-19 that occurred in Chile in the initial phase of 2020 to explore these factors in estimating parameters. Then, we propose a compartmental model with nonlinear differential equations for the Chilean dynamics, and we use it to distinguish the noise from the data and the structure. With this in mind, we developed an identifiability study employing a computational approach such as that described in [Citation51]. The identifiability analysis, as explained in [Citation50], determines how well the parameters of a model can be estimated from experimental data, so it is crucial for interpreting and determining confidence in model parameter values. Structural identifiability focuses on a mathematical model and the practical identifiability on data. Hence a model with structurally identifiable parameters may still, be nonidentifiable in practice due to a lack of information gleaned from real data. Thus, it is essential to understand both types of identifiability when applying an epidemic model to a specific situation [Citation27, Citation39, Citation51]. Therefore, we applied a computational approach to investigate the identifiability of the parameters from the compartmental model mentioned. This idea motivates an alternative parameter identifiability analysis to a complex compartment model, where other techniques can be rigorous. Specifically, we employ synthetic data from our model using various parameter sets. This parameter estimation is carried out with a defined optimization process with the simulated annealing method. Fitting all data points generated in the simulated epidemic curve, we aim to recover the parameters assumed to create synthetic data. If this approach is successful, we may confirm structural identifiability, which we aim to validate by adding a measurement error structure through a bootstrapping process [Citation14, Citation51]. For the case of practical identifiability, we only use a limited number of data points of the epidemic during the initial phase (e.g. pre-peak). We employed our sets of identifiable parameters to fit regional Chilean data to compare the conclusions from the previous analysis based on synthetic data.

1.2. Related work

We first recall that introductions to compartmental epidemiological models are provided in [Citation3, Citation5, Citation18, Citation35, Citation60, Citation66]. This class of models goes back to the well-known work by Kermack and McKendrick [Citation28]. Within a compartmental model the population is subdivided into at least two epidemiological compartments (e.g. susceptible and infected). The rates of progression between the compartments, as well as the incidence and possibly birth and death of individuals, need to be specified, which leads to a dynamical system, that is a system of coupled, nonlinear ordinary differential equations (ODEs) that describe the progression of the epidemic.

In many cases, the compartmental approach formulated as a dynamical system provides indications of the basic reproduction number R0, where this parameter plays the role of a threshold value for the system's dynamics. This role of R0 as a ‘threshold value’ differs from the original epidemiological significance, namely that R0 gives the number of secondary cases one infectious individual will produce in a population consisting only of susceptible individuals [Citation35, p. 21].

The conclusions obtained by a compartmental model are derived from parameter estimations and allow describing the dynamics present in a data set. Then, the identifiability analysis is crucial to guaranteeing that the parameter estimates are adequate. Specifically, the identifiability analysis addresses whether the parameters of a compartmental model can be uniquely identified. There are different strategies for the identifiability, for example, for models expressed by linear ODEs [Citation38, Citation65], nonlinear ODEs [Citation39], or partial differential equations [Citation50]. The type of data to fit also plays an essential role. For example, one needs to select between incidence or prevalence curves [Citation12, Citation19, Citation56]. In the present theory two types of identifiability are considered: structural and practical. The first involves the model and its structure, and the study can be centred on theoretical or computational techniques (e.g. [Citation13, Citation16, Citation19, Citation25, Citation33, Citation39, Citation50, Citation57]), and the second involves the data (e.g. [Citation22, Citation27, Citation29, Citation56, Citation57]). The analysis is separate because a model can be structurally identifiable, but real-world data employed to fit the model results may lead to unidentifiable parameters for reasons related to the characteristics of the data available.

We extend the computational approach developed by Rossa and Chowell [Citation51] to assess parameter identifiability in a compartmental model defined by nonlinear ODEs. Then, the computational approach that combines the optimization process determined by the simulated annealing method and a bootstrap process allows us to assess the identifiability of the corresponding parameters. In particular, structural identifiability is assessed by employing synthetic data with known parameters and the practical one with the same synthetic data, but the model is fitted only to the initial phases of the data curve (e.g. pre-peak periods). In both computations we expect to recover the set of assumed parameters with small uncertainty and error. These properties ensure that the model is indeed associated with identifiable parameters. Finally, we apply the identifiability analysis to various sets of parameters to estimate. We vary their elements in increasing numbers and measure the ability of the model to estimate the parameter sets in each case by fitting the data curves generated with assumed parameters. This procedure can be understood as a sensitivity analysis.

In the application to Chilean data, we consider the parameter sets identifiable from our computational approach (with synthetic data) and fit some regional Chilean data. We hope to confirm the conclusions from the identifiability study with our application to the Chilean case.

In our study, a parameter set is identifiable if it satisfies some criteria defined in Section 2.5, which are an adaptation from the definition presented in [Citation51], considering analyzing the confidence intervals and the mean square error (MSE).

2. Methods

2.1. Compartmental model

We apply our methodology to a complex compartmental transmission model inspired by the strategies implemented by the Chilean government to mitigate the COVID-19 emergency in the initial phase of the outbreak. For this purpose, we consider a single population and adopt a simplified version of the model described in [Citation15], combined with the description of individuals in quarantine proposed in [Citation26]. Individuals within the model are classified as susceptible (S), in home quarantine (Q), latent (E1), partially infectious but not yet symptomatic (E2), asymptomatic (A), infectious and will not be tested (In), infectious and will be tested and isolated (Is), hospitalized/isolated infectious (J), recovered (R), and deceased (D). Constant population size is assumed, i.e. (1) N:=S+Q+E1+E2+A+In+Is+J+R+D=const.(1) Note that we stipulate one single class of asymptomatic individuals. At the same time, in [Citation15] a distinction is made between asymptomatic individuals who will not be tested and those who are asymptomatic and will be tested and isolated.

It is assumed that home-quarantined individuals do not have contact with infected individuals, due to severe travel restrictions and rigorous supervision by their local communities, do not have contact with infected individuals. The parameters p and 1/λ represent the percentage of individuals in quarantine and the quarantine duration, respectively. Therefore, class Q removes susceptible individuals from the infection dynamics when p0 or λ0, and there is no quarantine when p=λ=0.

Five classes can contribute to new infections, namely E2, A, In, Is, and J. Each class has a relative transmission linked with the parameters qe, qa, q and h. These parameters are the relative transmissibilities from partially infectious population E2, asymptomatic population A, asymptomatic/infected population that wear PPE, and the infected population isolated or in hospitals J. Consequently, the product hJ models the contribution of hospitalized individuals to transmission.

If we denote by rXY the rate at which individuals move from class X to class Y, then susceptible individuals move to E1 at the rate rSE1=βN(qeE2+qaqA+qIn+qIs+hJ), where β denotes the overall transmission rate. Individuals in E1 progress to E2 at rate κ1. Individuals from E2 are partially infectious, with relative transmissibility qe, and progress at rate κ2, where a proportion ρa become asymptomatic and partially infectious (relative transmissibility qa), and 1ρa become fully infectious. Among the proportion 1ρa that become fully infectious, ρs will be tested, while 1ρs will be undetected. Asymptomatic individuals who are not tested and symptomatic individuals wear personal protective equipment (PPE) (such as wearing masks in public, handwashing, etc.) and thus have relative transmissibility q, which quantifies the effectiveness of those protective behaviours. Individuals within classes A and In (who are not tested) recover at rate γ1. Those tested (class Is) will progress to the hospitalized and isolated class at diagnosis rate α. Relative transmission within hospitals and isolated places may occur, but our analysis assumes perfect isolation. Hospitalized and isolated indiividuals progress to the recovered class at rate γ2 or to the deceased class at rate δ. Therefore, the model is defined by the following system of non-linear ODEs for a single population, where all variables in capital letters are functions of time t, i.e. it is understood that S=S(t), E1=E1(t) and so on, and a prime denotes the derivative, that is SdS/dt, etc.: (2) S=βSN(qeE2+qaqA+qIn+qIs+hJ)pS+λQ,Q=pSλQ,E1=βSN(qeE2+qaqA+qIn+qIs+hJ)κ1E1,E2=κ1E1κ2E2,A=κ2ρaE2γ1A,In=κ2(1ρa)(1ρs)E2γ1In,Is=κ2(1ρa)ρsE2αIs,J=αIs(γ2+δ)J,R=γ1(A+In)+γ2J,D=δJ,C=αIs.(2) The auxiliary variable C tracks the cumulative number of diagnosed/reported cases from the start of the outbreak, and C represents the new diagnosed cases. The quantity C is not a state of the system of equations but simply a class to track the cumulative incidences cases; thus, individuals from population are not moving to class C. The initial values of these state variables are denoted by S(0),Q(0),E1(0),,C(0), where S(0)=NQ(0)E1(0)E2(0)A(0)In(0)Is(0)J(0)R(0)D(0). A schematic of the transmissions is provided in Figure .

Figure 1. Schematic of the transmission of COVID-19 for one population.

Figure 1. Schematic of the transmission of COVID-19 for one population.

The basic, control, and effective control reproduction numbers R0, R0c and Rec quantify the epidemic impact of infectious disease in a population and the effectiveness of control measures. The basic reproduction number R0 defines the average number of secondary cases generated by primary infectious individuals during the early transmission period when the population is completely susceptible and without control interventions. The control reproduction number R0c defines the average number of new cases generated by infectious individuals when the population susceptible is with some control measure or intervention. The effective control reproduction number Rec is the reproduction number that varies with time, and it is defined as partially susceptible population, where if Rec>1, the disease transmission continues, and if Rec<1 the disease transmission declines. Therefore, while the basic reproduction number R0 and control reproduction number R0c captures the transmission potential of infectious disease during the early epidemic, the effective control reproduction number Rec, captures changes in the transmission potential over time.

To calculate these numbers, we employ the well-known next-generation matrix approach [Citation59]. This yields the expression R0c=ρ(FV1), where ρ denotes the spectral radius and for the present model, the matrices F and V are given by F=[0βqeβqaqβqβqβh000000000000000000000000000000],V=[κ100000κ1κ200000κ2ρaγ10000κ2(1ρa)(1ρs)0γ1000κ2(1ρa)ρs00α00000αγ2+δ]. The result is (3) R0c=β(qeκ2+qaqρaγ1+q(1ρs)(1ρa)γ1+qρs(1ρa)α+hρs(1ρa)γ2+δ).(3) Moreover, the corresponding effective reproduction number is defined as Rec(t)=R0cS(t)/N, where S(t)/N is the proportion of susceptible individuals in the population at time t, where the total population N is assumed to be constant (see (Equation1)). The explicit expression (Equation3) implies the following contributions of the individual compartments: RE2=βqeκ2,RA=βqaqρaγ1,RIn=βq(1ρs)(1ρa)γ1,RIs=βqρs(1ρa)α,RJ=βhρs(1ρa)γ2+δ, where (4) R0c=RE2+RA+RIn+RIs+RJ.(4) This parameter R0c is a key epidemic parameter to measure the transmissibility of a pathogen; by (Equation3) we see that R0c is a function of several parameters of the epidemic model (Equation2), including transmission rates and infectious periods of the epidemiological states that contribute to new contagions. In addition, we can observe that the basic reproduction number R0 corresponds to the control reproduction number R0c in the absence of interventions, i.e. when q = 1, h = 0, and ρs=0 (where q is interpreted as the level of no protection and 1−q is the level of protection due to the use of PPE). These assumptions lead to the expression R0=β(qeκ2+qaρaγ1+(1ρa)γ1).

2.2. Simulated data

Using our model proposed for COVID-19, we simulate various scenarios fixing their parameters and initial conditions. These curves will help analyze the ability of our model to capture such dynamics. Then a parameter estimation process is necessary together with a bootstrapping procedure to examine the identifiability of several of its parameters, as will be explained in the following. For this, we assume the values and references summarized in Table  for the parameters, and for the initial conditions the values J(0)=C(0)=1,Q(0)=D(0)=R(0)=0,In(0)=1,Is(0)=20,E1(0)=2J(0),E2(0)=4J(0)E1(0),A(0)=4,S(0)=NE1(0)E2(0)In(0)Is(0)A(0)J(0)R(0)D(0) for a total population N = 500, 000. These values are used to simulate the following scenarios.

Table 1. Values of model parameters selected to generate simulated data.

2.2.1. Scenario 1: initial phase for three values of R0c without quarantine

We consider three values of R0c that generate three values of β via (Equation3). These values along with other parameters are summarized in Table . The corresponding incidence curves tC(t) are generated for t[0,200], see Figure (a). These incidence curves will be our datasets for the identifiability study. Notice that the curve for R0c=3 effectively grows more slowly than the others, and its maximum is smaller than the curves with R0c=4 and R0c=5.

Figure 2. Incidence curves C=C(t) for synthetic data based on the parameters of Table (a) of Scenario 1, (b) of Scenario 2 with p = 0.01, (c) of Scenario 2 with p = 0.05. Magenta, green, and blue curves correspond to R0c=3, 4, and 5, respectively.

Figure 2. Incidence curves C′=C′(t) for synthetic data based on the parameters of Table 1(a) of Scenario 1, (b) of Scenario 2 with p = 0.01, (c) of Scenario 2 with p = 0.05. Magenta, green, and blue curves correspond to R0c=3, 4, and 5, respectively.

2.2.2. Scenario 2: initial phase for three values of R0c with quarantine.

For this scenario, we consider the same values for R0c and corresponding values of β as in Scenario 1 since expression (Equation3) does not depend on the parameters p or λ. The pairs (p = 0.01, λ=1/15) and (p = 0.05, λ=1/15), with the other parameters as given in Table , produce the datasets shown in Figure (b,c), respectively. Again, the curve with R0c=3 grows more slowly than the others, and its maximum is smaller than the curves with R0c=4 and R0c=5. Besides, the maxima of the curves with p = 0.05 are consistently smaller than those of the corresponding curves with p = 0.01, which evidences the effect of the quarantine modelled.

Using the bootstrapping approach, we intend to use these curves defined for each scenario to generate multiple samples from the best-fit model. Before doing this, we need to define the parameter estimation process to explain the bootstrap approach and finally the identifiability study.

2.3. Parameter estimation

To estimate the parameter values of our COVID-19 model, we need to use one optimization process to fit the model to each simulated dataset. Then, the method used for this purpose is outlined as follows, executed in Matlab (Mathworks, Inc).

2.3.1. Mixed simulated annealing method with a least-squares approach (SA-LSQ)

The SA-LSQ approach is defined as a parameter estimation process that combines the simulated annealing and least-squares optimization algorithms to minimize the Euclidean distance between the incidence curve C(t,Θ), defined by our COVID-19 model (Equation2), and the time series data (datat) that corresponds to time series data with t[0,T], where T is the final time (date). Here Θ is the set of parameters to estimate of our COVID-19 model.

In our approach, we obtain the set of estimated parameters Θˆ that satisfies Θˆ=argminΘSΘf(Θ,R), with initial guess SΘ and the objective function f(Θ,R):=1RtiT|C(ti,Θ)datati|2, for a given normalizing coefficient R.

We wish to avoid attaining local minima and to guarantee that the optimization process finds the global minimum. Consequently, due to the sensitivity of the solution with respect to the initial guess utilized for these processes, we decided, before starting the SA fitting, to select the best initial guess running several Latin hypercube sampling (LHS) runs. This is done by the built-in Matlab function lhsdesign(), applied to the bounds defined for each parameter to estimate.

Together with simulated annealing (SA), using the Matlab built-in function simulannealbnd(), and the best initial guess obtained with LHS, we estimated the parameters that are transformed as the initial guess to apply the least squares method (LSQ) with lsqcurvefit() Matlab function to speed up the steepest descent phase of the simulated annealing algorithm.

More details about the procedure are summarized in Algorithm 1.

This method is a powerful optimization tool, particularly for fitting models that describe epidemic phenomena with real data, such as those developed in [Citation7, Citation8, Citation36, Citation68, Citation70] Additionally, the SA technique was inspired by Metropolis et al. [Citation37] to select the new characterized solutions. Even so, its adaptation allows the application to a frequentist paradigm.

Specifically, to study parameter identifiability, we define various experiments that use the generated curves in Scenarios 1 and 2, and we intend to estimate or recover some parameters of interest from these curves. Therefore parameters of interest are selected from {α,β,ρs,Is(0),J(0),p}, while the parameters not selected are fixed to the respective value of Table .

2.3.2. Sets of parameters to be estimated

For the data generated using Scenarios 1 and 2, we define the sets of parameters to estimate given in Table . Each combination explored will be mentioned as an experiment. For example, Experiment 101 indicates to data from Scenario 1 to estimate the parameter set 1 and Experiment 201-1 corresponds to data from Scenario 2 to estimate the parameter set 1, with p = 0.01, where Experiment 201-2 will be with p = 0.05 (see Tables  and ). Therefore, for Scenario 1, we have 24 experiments, that is, the number of values assumed of R0c (three) times the number of parameter sets (eight). Likewise, for Scenario 2, we have 84 experiments corresponding to the number of values of R0c (three) times the number of values of p (two) times the number of parameter sets (fourteen).

Table 2. Set of parameters to be estimated in Scenarios 1 and 2.

Table 3. Summary of results obtained for Scenario 1.

Table 4. Summary of results for Scenario 2.

2.4. Bootstrapping method

This method uses the parametric bootstrapping approach with Poisson error structure (see [Citation14, Citation51]) to repeatedly sample observations from the best-fit obtained by the SA-LSQ fit of our COVID-19 model to an time series data. This process can be summarized in the following steps:

  1. Obtain the deterministic model solution C(t,Θˆ) or total daily incidence series from the best-fit obtained using the SA-LSQ estimation.

  2. Generate S replicated datasets assuming the Poisson error structure. To generate these simulated data, we incorporate the Poisson error structure using the incidence curve C(t,Θˆ), where for each time ti we generate a new incidence value using a Poisson random variable with mean C(ti,Θˆ). Therefore, this new dataset represents an incidence curve for the system, assuming the time series follows a Poisson distribution centred on the mean at time points ti.

  3. Re-estimate model parameters for each of the S simulated realizations (using the SA-LQS method), which are denoted by Θˆi for i=1,,S.

  4. Finally, using the set of Θˆi for i=1,,S, we construct the confidence intervals for each estimated parameter. Also, for each set of estimated parameters, R0c is calculated to obtain a distribution of R0c values as well.

2.5. Parameter identifiability

2.5.1. Preliminaries

The complexity of our mathematical model involves ten epidemiological states or compartments and an auxiliary variable C(t) that indicates the cumulative number of symptomatic diagnosed or new cases reported. In addition to the ten system states, our model consists of 15 parameters (see Table ), where five classes contribute to a new infection (E2,A,In,Is,J), which also contribute to the representation of R0c in these classes, see Equations (Equation3) and (Equation4). Then, the estimated individual parameters are used to compute R0c. Some initial conditions of the model are unknown and possibly need to be estimated along with the other model parameters. For our case we consider the values Is(0) and J(0) to estimate together to the parameters α, β, ρs, and p. This defines the set from parameters to estimate {α,β,ρs,p,Is(0),J(0)}, where the remaining parameters and initial conditions are known or assumed.

To estimate the parameter sets, we fit the model (with other parameters and initial values fixed) to each dataset using the auxiliary variable C and the curve of the daily cases generated. The parameter estimation procedure employs the methods detailed in Section 2.3. The initial parameter prediction affects the solution for the model as local minima occur. To guarantee the global minimum obtained in the optimization processes, we generate a multi-start (using LHS in Algorithm 1) repeating these processes for a random amount of initial parameters, whose ranges are defined as follows: 0<β<5,0.1<α<2,0<J(0)<C(0),0<Is(0)<300,0<ρs<1,and0<p<1. Here, the particular choice of the wide range for Is(0) aims to avoid restrictions during the calibration of the model due to its sensitivity.

We focus the parameter identifiability study for (Equation2) on characterizing the identifiable parameters using uncertainty studies to assess the stability of the parameters estimated. Then, the goal is to evaluate the identifiability using the mean value, the 95% confidence intervals, and the mean square error (MSE). The step-by-step description of the identifiability analysis is detailed in the next subsection.

2.5.2. Identifiability analysis process

In the study of identifiability, we must follow the following steps that rely on definitions and computations presented in detail in [Citation14, Citation51]:

  1. Assume that Θ={α,β,ρs,p,Is(0),J(0)} is the set of parameters to be estimated and Θ0 is the initial guess.

  2. Estimate Θ fitting the model to the dataset generated, using the SA-LSQ procedure with the values of the initial conditions and parameters given. The hat symbol indicates an estimated parameter. Then, we have the set Θˆ={αˆ,βˆ,ρˆs,pˆ,Isˆ(0),Jˆ(0)}.

  3. Using the bootstrapping process we obtain the 95% confidence intervals and their frequency distributions for each parameter estimated.

  4. Compute the mean squared error (MSE) MSE:=1Sj=1S(ΘΘˆj)2 for each parameter estimated. Here Θ represents the true parameter value, and Θˆj represents the estimated value of the parameter for the jth bootstrap realization.

  5. For each set of estimated parameters, R0c is calculated to obtain a distribution of R0c values as well.

Following the process described, we say that a model parameter is identifiable if its confidence interval lies in a finite range of values and contains the estimated (and real) parameter value. The preferred situations is that these values have a central position or are close to the mean value. Besides, a small confidence interval indicates that the parameter can be identifiable, while a broader range could indicate a lack of identifiability. When an estimated parameter has a low MSE and a small confidence interval, the we may assert that the parameter is identifiable from the model. On the other hand, larger confidence intervals or larger MSE values may suggest non-identifiability. Here confidence interval is considered to be small if its lower and upper bounds are not smaller or greater, respectively, than 10% of the value estimated. Besides, the smallness of the MSE depends on the nature of the estimated parameter. For instance, if the parameter represents a probability, then we say that MSE is small if it is a value close to zero, but if the parameter represents the quantity of population, one acceptable range can be considered no more than 0.01% of the total population.

Specifically, to analyze structural identifiability, we will fit the entire curves generated for each scenario to recover the values from the parameters assumed in each case. Moreover, to investigate practical identifiability, we will consider different fit times. This is done to address situations when only data of the initial phase (before the peak or maximum value) are available.

3. Results

3.1. Structural identifiability

By applying the identifiability methodology, we obtain results for each one of the steps 1 to 5 outlined in Section 2.5.2. The first type of results corresponds to the parameter estimation process that leads to the best-fit model solution. Secondly, we have 250 replicated datasets applying the bootstrap process to re-estimate model parameters. Figures  and  (Appendix) show the confidence intervals (represented by vertical lines) and the MSE bar plots for the experiments of Scenario 1, and Figures S1 and S2 (in the supplemental material) display analogous results for the experiments of Scenario 2. Observing the results obtained and registered in these figures, we concentrate our investigation on Tables  and , in which we classify the parameters estimated between identifiable and non-identifiable. A parameter is considered identifiable with narrow confidence interval, a mean value centred and close to the real value, and the MSE is small. If at least one of these identifiability criteria is not satisfied, then the parameter is considered non-identifiable.

From Table  we conclude that all estimated parameters are identifiable for Experiments 1 to 4, i.e. those without initial conditions to fit. For Experiments 5 to 7 in which the initial conditions are also estimated, we note that all the parameters except the initial condition J(0) are identifiable, maintaining mainly the identifiability of the parameter of interest R0c. Only Experiment 8 exhibits non-identifiability situations for all parameters except for the dataset generated under the assumption R0c=5. Here we can suspect that the identifiability improves when the force of infection is greater as in this last case. Nevertheless, we will continue to explore this part later. Another observation is that the estimated R0c can be robust to variation or bias in the other estimated parameters.

Analogously, we deduce from Table  that the parameters estimated in Experiments 1, 2, 3, and 5 are identifiable. The parameters in Experiments 8, 9, 10, and 12 are also identifiable (except for the initial condition J(0)). For Experiments 4, 7, and 9, the parameter R0c is identifiable even though no other parameters are identifiable. Particularly in Experiment 7, all the remaining parameters are non-identifiable. Other situations occur; for example, in Experiment 6, all estimated parameters except those based on data generated with R0c=3 and p = 0.05 are identifiable (red boxes in column Experiment 6 from Table ). For Experiment 13, all estimated parameters based on data generated with R0c=4 and 5 are identifiable. For Experiment 14, the results are not good, but the parameter R0c estimated using data generated with p = 0.01 is identifiable. A similar situation occurs with R0c in Experiment 11, where this quantity is identifiable only by fitting the data generated with R0c=5 and p = 0.1.

3.2. Practical identifiability

The results obtained for this part are in Supplemental Material, including coloured tables, namely Tables S2 and S3 that display information analogous to Tables  and . Tables S2 and S3 allow one to analyze the identifiability. For this study, we investigate the performance of our model to capture the initial phases of an epidemic disease, examining the choices of the fit periods, namely 20, 30, or 40 days. According to Table S2, in Experiment 1 the estimated parameters are identifiable. In Experiments 3 and 5 with fit times 30 and 40 days, the parameters R0c are identifiable, even though not all estimated parameters are identifiable. In contrast to this result, in Experiment 2 with a fitting period of 20 days, the parameters that define R0c are identifiable, but the computed value R0c is not. This situation could be linked with the inverse relation of R0 with parameter α (see Equation (Equation3)). Remember that 1/α represents the time from symptom onset to isolation or hospitalization. Thus a variation in α of 0.1 to 0.2 implies a change of 5 days from the period of symptoms onset to isolation, a very sensitive parameter. Notably, for this experiment, the confidence intervals and the MSE are plotted in Figure S3, where we can see that these intervals are not very large but lack precision, mainly for the case of a fit period of 20 days, and the epidemic growth is given by R0c=3 and 4. For this and other experiments, the parameter R0c tends to be identifiable when fit periods are greater than 20 days, and R0c4. The parameter ρs and the initial condition Is(0) and J(0) for all experiments of Scenario 1 are not identifiable. This suggests a lack of identifiability especially of the parameter ρs when we have fewer data to fit.

Table S2 also shows that the estimated parameters for Experiment 1 are identifiable. This result differs from Experiments 12, 13, and 14, where all parameters are non-identifiable. For Scenario 2, we can observe that for most experiments with a fit period of 20 days (except for Experiments 1 and 9 with R0c=3), the parameter R0c is non-identifiable. Moreover, all estimated parameters are also non-identifiable, especially for these cases, excluding Experiment 2. These results suggest using more fit data to guarantee identifiability, as it occurs more frequently in experiments with longer fit periods.

Another situation to comment on here is that for most experiments where the quarantine parameter p is estimated, the results mainly for R0c result non-identifiable, excluding only Experiment 5 for the fit period 40 days, and R0c assumed as 4 and 5, and Experiment 6 again for the fit period 40 days and R0c=3 and 4. Besides, for some quarantine parameters estimated from Experiments 5 and 6 with fit period 40 days, the result is identifiable, mostly when the fit data are generated assuming p = 0.01. We can think that the lack of identifiability increases when a larger part of the population is quarantined in the initial phase.

4. Application to regional data of Chile

We now apply our methodology to the data collected for some Chilean regions. From the structural and practical results obtained for our model, we apply the study to two cases, one to scan the initial phase (first 30 days) and another for an intermediate phase (first 120 days), considering the data before the change of quarantine criteria applied by the Chilean government. To explain the selection of regions and the phenomena of COVID-19 in Chile, we include the following subsection.

4.1. The early transmission of COVID-19 in Chile

We wish to apply the COVID-19 model of Section 2.1 to some administrative regions of Chile (see Figure ), where there are regions without quarantine periods, others with more than one quarantine period, a situation that the Chilean government named ‘dynamical quarantines’. This policy was in effect until July 27 2020. In addition, this measure was applied only to municipalities with a greater incidence of positive cases for COVID-19. For this reason, we consider a decoupled analysis to select the regions to analyze until July 27 2020, assuming the day of the first positive case as the first day for the timeline of each region and July 27 2020 is the last day. Since the onset of COVID-19 in Chile, the Ministry of Health has been reporting new cases of infection and death daily for each region and new cases of recovered persons at the national level. The official information (see [Citation24, Citation41] for details) also includes the daily numbers of polymerase chain reaction (PCR) tests applied, as well as the daily numbers of critical patients and occupancy of intensive care units (ICUs). A timeline of strategies and measures implemented to contain the pandemic is summarized in Table S1. Several times the Chilean Ministry of Health changed the case and death definitions. Table  lists the most relevant of these changes along with their dates. Considering the difficulties associated with the changes in case definitions, we decided to include data reported for symptomatic and asymptomatic cases in the COVID-19 model. Total data correspond to the addition of symptomatic and asymptomatic cases reported because an asymptomatic (tested) case may become symptomatic later.

Figure 3. The 16 administrative regions of Chile [Citation63] and their population according to the 2017 census [Citation10]. The Roman numbers are the official administrative numbers of the geographic regions. The greater Santiago area is the Metropolitan region (RM) and is counted as Region 13. Note that numbering is not strictly ordered from north to south; regions 14 to 16 have been created by dividing existing regions. The total population at the end of 2020 is estimated at 19.1 million.

Figure 3. The 16 administrative regions of Chile [Citation63] and their population according to the 2017 census [Citation10]. The Roman numbers are the official administrative numbers of the geographic regions. The greater Santiago area is the Metropolitan region (RM) and is counted as Region 13. Note that numbering is not strictly ordered from north to south; regions 14 to 16 have been created by dividing existing regions. The total population at the end of 2020 is estimated at 19.1 million.

Table 5. Changes in criteria of reported data.

4.2. Experiments applied to regional data

Analyzing the type of information and data published by the Chilean government we identified regions with more than one quarantine period. Then, considering these multiple quarantine periods and the conclusions obtained from our experimental identifiability study, we decided to select only those regions with exactly one and without quarantine periods from the Chilean data. Specifically, we selected those administrative regions that satisfy at least one of these conditions, namely those with

  1. exactly one unique quarantine period for the first 30 or 120 days,

  2. no quarantine periods in the first 30 or 120 days.

We also prefer data with an increasing number of reported cases, avoiding having consecutive null case counts, i.e. we select robust datasets for capturing epidemic growth. Overall, in light of the experimental results based on simulated data to analyze the (structural and practical) identifiability of the parameters, we know that the best results are achieved for the sets of estimated parameters (β,R0c) and (β,J(0),Is(0),R0c). However, to observe the quarantine parameter, we also consider the parameter sets (β,p,R0c) and (β,p,J(0),Is(0),R0c), for the case that the region has a single quarantine period, even knowing that these parameters are not always identifiable from the experimental results in the initial phase. Table  provides more details about parameter identifiability results.

Table 6. Summary of the identifiable parameters concluded from the structural and practical identifiability study used to Chilean data.

Subsequently, we filter the regional data for two periods, one for the first 30 days and the other for 120 days, and we select some of those that satisfy the above mentioned conditions. We have established the following regions for our application.

  • Regions 1 (Tarapacá) and 8 (Bío-Bío), without quarantine periods in the initial phase and one quarantine in the final phase.

  • Regions 3 (Atacama) and 4 (Coquimbo) were without quarantine during the 120 days.

  • Region 10 (Los Lagos) had one quarantine in the initial phase.

Then, with this regional data, we apply our identifiability study to two events,

  1. For regions without quarantine, we estimate the parameter sets (β,R0c) (Experiment 1) and (β,J(0),Is(0),R0c) (Experiment 2). We select a fit period of 30 or 120 days depending on the phase analyzed.

  2. For regions with quarantine, we estimate the parameter sets (β,p,R0c) (Experiment 1) and (β,p,J(0),Is(0),R0c) (Experiment 2).

We are preserving the same consideration for the time fit.

The left plots in Figures S4 and S5 display the regional data distributions, including some dates of quarantine measures and important holidays.

Establishing the experiments and the Chilean data for the analysis, we need to fix the values of those parameters that are not estimated. In Table , we summarize the values assumed for the parameters. For example, we consider the parameters for the time from symptom onset to isolation or hospitalization 1/α, and the proportions of fully infectious individuals who undergo testing ρs are averaging. In addition, we assume the quarantine parameters λ=0 and p = 0 when the experiment does not involve quarantine periods. Alternatively, when there is quarantine, the quarantine parameter p is estimated depending on the region, being the parameter 1/λ the quarantine duration applied to it (see Table ).

Table 7. Fixed parameters for running experiments and duration of quarantine periods.

Establishing the experiments and the Chilean data for the analysis, we need to fix the values of those parameters that are not estimated. In Table , we summarize the values assumed for the parameters. For example, we consider the parameters for the time from symptom onset to isolation or hospitalization 1/α, and the proportions of fully infectious individuals who undergo testing ρs are averaging. In addition, we assume the quarantine parameters λ=0 and p = 0 when the experiment does not involve quarantine periods. Alternatively, when there is quarantine, the quarantine parameter p is estimated depending on the region, being the parameter 1/λ the quarantine duration applied to it (see Table ).

The initial conditions satisfy the considerations assumed in the identifiability study, i.e. R(0)=Q(0)=D(0)=0,E1(0)=2J(0),E2(0)=4J(0)E1(0),In(0)=min{J(0),1},S(0)=NE1(0)E2(0)In(0)Is(0)A(0)J(0), where for the case when J(0) and Is(0) are estimated, these quantities are assumed to satisfy 0<J(0)<C(0) and 0<Is(0)<300. In those cases where J(0) and Is(0) are assumed, the values J(0)=1 and Is(0)=10 are utilized.

Table 8. Parameter estimation results for the application with Chilean data.

4.3. Application results

Finally, applying our methodology to Chilean regions data selected for the experiments defined, we obtained the parameter estimations (Table ), the fits plotted in Figures S4 and S5 and Table  which we constructed following the same procedure as for Tables  and . We observe the 95% confidence intervals obtained by the 250 replicates of the bootstrapping process and the MSE (see Figures S6 to S14 in Supplemental Material). According to the coloured table, the parameters β and R0c (computed) are usually identifiable, except for Region 10, which has a quarantine period in the first 30 days. Quarantine leads some points to fit, so perhaps our model is unsuccessful in this case, and the estimates generated are non-identifiable. As expected, given the summary of results in Table , Region 8 has a short quarantine period in the middle of the final phase, and it turns out difficult to fit data from this region during 120 days to estimate the sets of identifiable parameters. On the other hand, for Region 1, with a prolonged quarantine period in the final phase, conceivably, the initial conditions incorporate more noise in the estimations because it is difficult to estimate the parameters in Experiment 2.

Table 9. Parameter identifiability verification. A green box indicates that the parameter estimated satisfies the parameter identifiability conditions (confidence intervals obtained via bootstrapping are narrow; their mean values are centered; the MSE is small). A red box means the parameter that does not meet the identifiability criteria. Cases not studied are painted in gray.

In short of these two last regions, we can see that the duration of quarantine and the phase analyzed can be decisive when estimating parameters; these examples suggest that we need to explore more combinations of quarantine periods in our COVID-19 model. In addition, the parameters estimated from experiments applied to Regions 3 and 4 are identifiable, as expected, given the structural and practical identifiability demonstrated for our model using our methodology. In the same way, we verify the identifiability obtained for Experiments 1 and 2 applied to Region 8, and Region 1 has identifiable parameters only when we use Experiment 2. However, their confidence intervals are not distant when we apply Experiment 1. The noise from initial conditions is possibly more prevalent in Region 1 than Region 8.

5. Discussion and conclusions

We present and apply a computational approach developed by Roosa and Chowell [Citation51] for simple epidemic models. However, we propose adapting it to a complex compartmental model inspired by the first wave of the COVID-19 outbreak in Chile, where quarantines were declared involving different Chilean regions in early phases. Specifically, we propose a methodology to analyze the structural and practical parameter identifiability for a COVID-19 model. We explore the parameter uncertainty computationally through a parametric bootstrapping approach to achieve that goal. In that exploration, we involve synthetic data generated with the same COVID-19 model to measure the capability of our model to recover the parameters assumed to satisfy the identifiability structural and practical. Those criteria involve observing the 95% confidence intervals and the MSE (mean square error), constructed from empirical distributions resulting from the bootstrapping process. In this first attempt, we selected some experiments that would allow us to validate the structural and practical parameter identifiability. We consider different variables, for example, the number of parameters to fit, the time to fit, and the quarantine periods. Then, in our methodology, we construct coloured tables (e.g. Table ) to visualize these qualities. Overall in the results obtained, we observe that the parameter β, and in consequence R0c, are identifiable (when the rest of the parameters are known). The following best parameter sets in terms of structural identifiability are the sets without initial conditions to estimate, especially in the case without quarantines involved. Wherein the cases with quarantine are more sensitive as a percentage of the population in quarantine increases and the epidemic growth is R0<4, evidence of a lack of identifiability in the model when the quarantine dynamics are considered. Then, we would take quarantine periods studied in the models with caution. Expanding the analysis to the initial phases, we find a significant loss of parameter identifiability. It compared with the structural results. Possibly verifying the conclusion shown in [Citation52] indicates the non-identifiability in pre-peak moments. However, we are interested in analyzing the initial period to recover the control reproduction number. Therefore, we consider applying our computational approach to various initial epidemic periods to evaluate our COVID-19 model. In our experiments, we confirm that the parameter β is recovered, and in the cases when the initial conditions are also estimated and fit time is Tfit>20 days (see Tables S2 and S3 in the supplemental material) identifiability is guaranteed. Showing a characteristic dependence between the fit times and the growth rates, besides quarantine periods, which also affect identifiability, but maintain the parameter β identifiable.

Finally, collecting the conclusions from the experiment with synthetic data, we select the parameter sets (β), and (β,J(0),Is(0)) for periods without quarantine, and (β,p), and (β,p,J(0),Is(0)) for periods with quarantine, to fit with Chilean data for some regions. The model parameters are not identifiable in the Chilean regions with a quarantine period in the initial phase. On the other hand, regions without quarantine periods have better results. In particular, when more data (Tfit>20) is fit to estimate initial conditions is guaranteed practical identifiability. (see Table ). For future work, we see the necessity of selecting the fit times for each Chilean region because each one has a particular quarantine declared; besides, the quality data also is variable. The study developed by Freire-Flores et al. in [Citation21] also analyzes the quarantine periods in some Chilean regions using computations of reproduction numbers on different periods characterized by extraordinary government measures. Then we can use these times of fitting with our approach and compare with these results, where, in particular, the results for the parameter β in Region 8 are close.

Our findings further underscore the importance of investigating the identifiability of model parameters through structural and practical identifiability analysis to interpret the results better and avoid misleading model inferences during an epidemic outbreak [Citation22, Citation51, Citation56]. On the one hand, structural identifiability analysis should be conducted before the model is applied to real data since it can help guide the modeller about any parameters in the model that are not identifiable even in the scenario that the entire epidemic trajectory has been observed. Parameters deemed not structurally identifiable should be flagged by the modeller and determine whether the model can still be used to infer other quantities of interest. Analyzing a parameter's practical identifiability is unnecessary if it is not structurally identifiable. On the other hand, our results indicate that modellers should be especially careful when calibrating complex epidemic models using limited data of the early epidemic phases, as one or more parameters are often difficult to identify from limited data depending on the model's complexity level. Parameter identifiability occurs more frequently in cases involving more complex epidemic dynamics where parameters change over time due to the implementation and relaxation of social distancing interventions. Unsurprisingly, a parameter is unidentifiable when its function is to model dynamics for an event (e.g. quarantine measures) that has not occurred. As more data on an unfolding epidemic becomes available, model parameters tend to become identifiable, especially when the epidemic has already reached its peak. This highlights the challenges modellers face when using models to generate real-time epidemic predictions as the epidemic evolves.

Supplemental material

Supplemental Material

Download PDF (5.9 MB)

Disclosure statement

No potential conflict of interest was reported by the author(s).

Availability of data and materials

The data were recorded by the Chilean Government [Citation24], which can be accessed via a GitHub repository supported by Ministery of Science, Technology, Knowledge, and Innovation of Chile [Citation41].

Additional information

Funding

RB is supported by project MATH-Amsud 22-MATH-05 ‘NOTION: NOn-local conservaTION laws for engineering, biological and epidemiological applications: theoretical and numerical’ and from ANID (Chile) through Fondecyt project 1210610; Anillo project ANID/ACT210030; Centro de Modelamiento Matemático (CMM), project FB210005 of BASAL funds for Centers of Excellence; and Centro de Recursos Hídricos para la Agricultura y la Minería (CRHIAM), project ANID/FONDAP/15130015. GC is partially supported by National Science Foundation (NSF) grants #2026797, #2034003, and National Institutes of Health (NIH) R01 GM 130900, and by project ANID/PCI/MEC80170119. IK would like to thank the German Research Foundation (DFG) for financial support of the project within the Cluster of Excellence ‘Data-Integrated Simulation Science’ (EXC 2075), DFG Project 432343452. LYLD is supported by Agencia Nacional de Investigación y Desarrollo (ANID) scholarship ANID-PCHA/Doctorado Nacional/2019-21190640 and Centro de Modelamiento Matemático (CMM), project FB210005 of BASAL funds for Centers of Excellence.

References

  • A. Adiga, D. Dubhashi, B. Lewis, M. Marathe, S. Venkatramanan, and A. Vullikanti, Mathematical models for COVID-19 pandemic: A comparative analysis, J. Indian Inst. Sci. 100 (2020), pp. 793–807.
  • A. Afzal, C. Saleel, S. Bhattacharyya, N. Satish, O.D. Samuel, and I.A. Badruddin, Merits and limitations of mathematical modeling and computational simulations in mitigation of COVID-19 pandemic: A comprehensive review, Arch. Comput. Meth. Eng. 29 (2022), pp. 1311–1337.
  • R.M. Anderson and R.M. May, Infectious Diseases of Humans: Dynamics and Control, Oxford University Press, 1991.
  • M. Biggerstaff, R. Slayton, M. Johansson, and J. Butler, Improving pandemic response: Employing mathematical modeling to confront xoronavirus disease 2019, Clin. Infect. Dis. 74 (2022), pp. 913–917.
  • F. Brauer and C. Castillo-Chavez, Mathematical Models in Population Biology and Epidemiology, Springer, New York, 2012.
  • J.H. Buckner, G. Chowell, and M.R. Springborn, Dynamic prioritization of COVID-19 vaccines when social distancing is limited for essential workers, Proc. Nat. Acad. Sci. 118(16) (2021), p. e2025786118.
  • R. Bürger, G. Chowell, and L.Y. Lara-Díaz, Comparative analysis of phenomenological growth models applied to epidemic outbreaks, Math. Biosci. Eng. 16 (2019), pp. 4250–4273.
  • R. Bürger, G. Chowell, and L.Y. Lara-Díaz, Measuring differences between phenomenological growth models applied to epidemiology, Math. Biosci. 334 (2021), p. 108558. https://doi.org/10.1016/j.mbs.2021.108558
  • T. Burki, COVID-19 in Latin America, Lancet Infect. Dis. 20(5) (2020), pp. 547–548. https://doi.org/10.1016/s1473-3099(20)30303-0
  • Census Chile, Census Chile (2017). Accessed February 7, 2021. Available at http://www.censo2017.cl/descargas/home/sintesis-de-resultados-censo2017.pdf.
  • J.W. Chan, S. Yuan, K. Kok, K.W. To, H. Chu, J. Yang, F. Xing, J. Liu, C.Y. Yip, R.W.S. Poon, H.W. Tsoi, S.K.F. Lo, K.H. Chan, V.K.M. Poon, W.M. Chan, J.D. Ip, J.P. Cai, V.C.C. Cheng, H. Chen, C.K.M. Hui, and K.Y. Yuen, A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: A study of a family cluster, Lancet 395 (2020), pp. 514–523.
  • J. Chapman and N. Evans, The structural identifiability of susceptible–infective–recovered type epidemic models with incomplete immunity and birth targeted vaccination, Biomed. Signal Process. Control 4 (2009), pp. 278–284.
  • O.T. Chis, J.R. Banga, and E. Balsa-Canto, Structural identifiability of systems biology models: A critical comparison of methods, PLoS One 6(11) (2011), p. e27755.
  • G. Chowell, Fitting dynamic models to epidemic outbreaks with quantified uncertainty: A primer for parameter uncertainty, identifiability, and forecast, Infect. Disease Model. 2 (2017), pp. 379–398.
  • G. Chowell, D. Chowell, K. Roosa, R. Dhillon, and S. Devabhaktuni, Sustainable social distancing through facemask use and testing during the COVID-19 pandemic, preprint (2020). Available at medRXiv.
  • G. Chowell, S. Dahal, Y. Liyanage, A. Tariq, and N. Tuncer, Structural identifiability analysis of epidemic models based on differential equations: A primer, preprint (2022). Available at arXiv.
  • G. Chowell, P. Fenimore, M. Castillo-Garsow, and C. Castillo-Chavez, SARS outbreaks in Ontario, Hong Kong and Singapore: The role of diagnosis and isolation as a control mechanism, J. Theor. Biol. 224 (2003), pp. 1–8.
  • O. Diekmann, H. Heesterbeek, and T. Britton, Mathematical Tools for Understanding Infectious Disease Dynamics, Vol. 7. Princeton University Press, 2013.
  • N. Evans, L. White, M. Chapman, K. Godfrey, and M. Chappell, The structural identifiability of the susceptible infected recovered model with seasonal forcing, Math. Biosci. 194 (2005), pp. 175–197.
  • S. Flaxman, S. Mishra, A. Gandy, H.J.T. Unwin, T.A. Mellan, H. Coupland,E Eeeetal, H. Zhu, T. Berah, J.W. Eaton, M. Monod, A.C. Ghani, C.A. Donnelly, S. Riley, M.A.C. Vollmer, N.M. Ferguson, L.C. Okell, S. Bhatt, and Imperial College COVID-19 Response Team, Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe, Nature 194 (2005), pp. 175–197.
  • D. Freire-Flores, N. Llanovarced-Kawles, A. Sanchez-Daza, and Á. Olivera-Nappa, On the heterogeneous spread of COVID-19 in Chile, Chaos Solit. Fract. 150 (2021), p. 111156. Available at https://www.sciencedirect.com/science/article/pii/S0960077921005105.
  • L. Gallo, M. Frasca, V. Latora, and G. Russo, Lack of practical identifiability may hamper reliable predictions in COVID-19 epidemic models, Sci. Adv. 8 (2022), p. eabg5234. https://doi.org/10.1126/sciadv.abg5234.
  • P.J. Garcia, A. Alarcón, A. Bayer, P. Buss, G. Guerra, H. Ribeiro, K. Rojas, R. Saenz, N. Salgado de Snyder, G. Solimano, R. Torres, S. Tobar, R. Tuesca, G. Vargas, and R. Atun, COVID-19 response in Latin America, Am. J. Trop. Med. Hyg. 103 (2020), pp. 1765–1772.
  • Government of Chile, COVID-19 data. Accessed February 16, 2021. Official COVID-19. Available at https://www.gob.cl/coronavirus/cifrasoficiales/#reportes.
  • H. Hong, A. Ovchinnikov, G. Pogudin, and C. Yap, SIAN: Software for structural identifiability analysis of ODE models, Bioinformatics 35 (2019), pp. 2873–2874. https://doi.org/10.1093/bioinformatics/bty1069.
  • J. Jia, J. Ding, S. Liu, G. Liao, J. Li, B. Duan, G. Wang, and R. Zhang, Modeling the control of COVID-19: Impact of policy interventions and meteorological factors (2020).
  • Y.H. Kao and M.C. Eisenberg, Practical unidentifiability of a simple vector-borne disease model: Implications for parameter estimation and intervention assessment, Epidemics 25 (2018), pp. 89–100.Available at https://www.sciencedirect.com/science/article/pii/S1755436517301627.
  • W. Kermack and A. McKendrick, A contribution to the mathematical theory of epidemics, Proc. Roy. Soc. A 115 (1927), pp. 700–721.
  • E. Kharazmi, M. Cai, X. Zheng, Z. Zhang, G. Lin, and G.E. Karniadakis, Identifiability and predictability of integer- and fractional-order epidemiological models using physics-informed neural networks, Nature Comput. Sci. 1 (2021), pp. 744–753.
  • T. Kirby, South America prepares for the impact of COVID-19, Lancet Respir. Med. 8(6) (2020), pp. 551–552.
  • H.J. Li, L. Wan, Z. Wang, C.X.Z. u, A. Moustakas, and S. Pei, Mathematical modelling of the pandemic of 2019 novel coronavirus (COVID-19): Patterns, dynamics, prediction, and control, Front. Phys. 9 (2021), p. 738602.
  • Y. Li, H. Campbell, D. Kulkarni, A. Harpur, M. Nundy, X. Wang, and H. Nair, The temporal association of introducing and lifting non-pharmaceutical interventions with the time-varying reproduction number (R) of SARS-CoV-2: A modelling study across 131 countries, Lancet Infect. Dis. 21 (2021), pp. 193–202. Available at https://www.sciencedirect.com/science/article/pii/S1473309920307854.
  • T.S. Ligon, F. Fröhlich, O.T. Chiş, J.R. Banga, E. Balsa-Canto, and J. Hasenauer, GenSSI 2.0: Multi-experiment structural identifiability analysis of SBML models, Bioinformatics 34 (2017), pp. 1421–1423. https://doi.org/10.1093/bioinformatics/btx735.
  • N. Linton, T. Kobayashi, Y. Yang, K. Hayashi, A. Akhmetzhanov, S.M. Jung, B. Yuan, R. Kinoshita, and H. Nishiura, Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data, J. Clin. Med. 9 (2020), p. 538. https://doi.org/10.3390/jcm9020538.
  • M. Martcheva, An Introduction to Mathematical Epidemiology, Springer, New York, 2015.
  • T. McKinley, A.R. Cook, and R. Deardon, Inference in epidemic models without likelihoods, Int. J. Biostat. 5(1) 2009. https://doi.org/10.2202/1557-4679.1171.
  • N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (2004), pp. 1087–1092. https://doi.org/10.1063/1.1699114.
  • H. Miao, C. Dykes, L.M. Demeter, J. Cavenaugh, S.Y. Park, A.S. Perelson, and H. Wu, Modeling and estimation of kinetic parameters and replicative fitness of HIV-1 from flow-cytometry-based growth competition experiments, Bull. Math. Biol. 70 (2008), pp. 1749–1771.
  • H. Miao, X. Xia, A.S. Perelson, and H. Wu, On identifiability of nonlinear ODE models and applications in viral dynamics, SIAM Rev. 53 (2011), pp. 3–39.
  • Ministerio de Salud, Chile, Ministerio de Salud, Chile. Accessed July 29, 2021. Available at http://www.minsal.cl.
  • Ministery of Science, Technology, Knowledge and Innovation of Chile, Website of the official database for COVID-19 research. Accessed February 16, 2021. Available at http://www.minciencia.gob.cl/covid19.
  • K. Mizumoto, K. Kagaya, A. Zarebski, and G. Chowell, Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020, Euro Surveill. 25(10) (2020). https://doi.org/10.2807/1560-7917.es.2020.25.10.2000180.
  • A. Morciglio, B. Zhang, G. Chowell, J.M. Hyman, and Y. Jiang, Mask-ematics: Modeling the effects of masks in COVID-19 transmission in high-risk environments, Epidemiologia 2 (2021), pp. 207–226. https://doi.org/10.3390/epidemiologia2020016.
  • V.K. Murty and J. Wu, Mathematics of Public Health, Springer, Cham, 2022.
  • J.C. Navarro, J. Arrivillaga-Henríquez, J. Salazar-Loor, and A.J. Rodriguez-Morales, COVID-19 and dengue, co-epidemics in Ecuador and other countries in Latin America: Pushing strained health care systems over the edge, Travel Med. Infect. Dis. 37 (2020), p. 101656. Available at https://www.sciencedirect.com/science/article/pii/S1477893920301241.
  • H. Nishiura, T. Kobayashi, T. Miyama, A. Suzuki, S.M. Jung, K. Hayashi, R. Kinoshita, Y. Yang, B. Yuan, A.R. Akhmetzhanov, and N.M. Linton, Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19), Int. J. Infect. Dis 94 (2020), pp. 154–155.
  • R. Padmanabhan, H.S. Abed, N. Meskin, T. Khattab, M. Shraim, and M.A. Al-Hitmi, A review of mathematical model-based scenario analysis and interventions for COVID-19, Comput. Meth. Programs Biomed. 209 (2021), p. 106301. Available at https://www.sciencedirect.com/science/article/pii/S0169260721003758.
  • J. Paireau, A. Andronico, N. Hozé, M. Layan, P. Crépey, A. Roumagnac, M. Lavielle, P.Y. Boëlle, and S. Cauchemez, An ensemble model based on early predictors to forecast COVID-19 health care demand in France, Proc. Nat. Acad. Sci. 119 (2022), p. e2103302119. https://doi.org/10.1073/pnas.2103302119.
  • L. Peng, W. Yang, D. Zhang, C. Zhuge, and L. Hong, Epidemic analysis of COVID-19 in China by dynamical modeling, (2020). Available at medRxiv https://www.medrxiv.org/content/early/2020/02/18/2020.02.16.20023465.
  • M. Renardy, D. Kirschner, and M. Eisenberg, Structural identifiability analysis of age-structured pde epidemic models, J. Math. Biol. 84 (2022), pp. 1–30.
  • K. Roosa and G. Chowell, Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models, Theor. Biol. Med. Model. 16(1) (2019). https://doi.org/10.1186/s12976-018-0097-6.
  • T. Sauer, T. Berry, D. Ebeigbe, M.M. Norton, A.J. Whalen, and S.J. Schiff, Identifiability of infection model parameters early in an epidemic, SIAM J. Control Optim. 60(2) (2022), pp. S27–S48.
  • S. Shankar, S.S. Mohakuda, A. Kumar, P. Nazneen, A.K. Yadav, K. Chatterjee, and K. Chatterjee, Systematic review of predictive mathematical models of COVID-19 epidemic, Medical J. Armed Forc. India 77 (2021), pp. S385–S392. Special issue: COVID-19 – navigating through challenges. Available at https://www.sciencedirect.com/science/article/pii/S0377123721001258.
  • A. Tariq, T. Chakhaia, S. Dahal, A. Ewing, X. Hua, S.K. Ofori, O. Prince, A.D. Salindri, A.E. Adeniyi, J.M. Banda, P. Skums, R. Luo, L.Y. Lara-Díaz, R. Bürger, I.C.H. Fung, E. Shim, A. Kirpich, A. Srivastava, and G. Chowell, An investigation of spatial-temporal patterns and predictions of the coronavirus 2019 pandemic in colombia, 2020–2021, PLOS Negl. Trop. Dis. 16 (2022), p. e0010228. https://doi.org/10.1371/journal.pntd.0010228.
  • E.C.C.W. T.N.C.P.E.R.E. Team, The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) −china (2020).
  • N. Tuncer and T.T. Le, Structural and practical identifiability analysis of outbreak models, Math. Biosci. 299 (2018), pp. 1–18. Available at https://www.sciencedirect.com/science/article/pii/S0025556417303164.
  • N. Tuncer, A. Timsina, M. Nuno, G. Chowell, and M. Martcheva, Parameter identifiability and optimal control of an SARS-CoV-2 model early in the pandemic, J. Biol. Dyn. 16 (2022), pp. 412–438.
  • E.A. Undurraga, G. Chowell, and K. Mizumoto, COVID-19 case fatality risk by age and gender in a high testing setting in Latin America: Chile, March–August 2020, Infect. Dis. Pov. 10(1) (2021).
  • P. van den Driessche and J. Watmough, Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission, Math. Biosci. 180 (2002), pp. 29–48. Available at https://www.sciencedirect.com/science/article/pii/S0025556402001086.
  • E. Vynnycky, and R.E. White, An Introduction to Infectious Disease Modelling. OUP Oxford, 2010.
  • P.G.T. Walker, C. Whittaker, O.J. Watson, M. Baguelin, P. Winskill, A. Hamlet, B.A. Djafaara, Z. Cucunubá, D.O. Mesa, W. Green, H. Thompson, S. Nayagam, K.E.C. Ainslie, S. Bhatia, S. Bhatt, A. Boonyasiri, O. Boyd, N.F. Brazeau, L. Cattarino, G. Cuomo-Dannenburg, A. Dighe, C.A. Donnelly, I. Dorigatti, S.L. van Elsland, R. FitzJohn, H. Fu, K.A.M. Gaythorpe, L. Geidelberg, N. Grassly, D. Haw, S. Hayes, W. Hinsley, N. Imai, D. Jorgensen, E. Knock, D. Laydon, S. Mishra, G. Nedjati-Gilani, L.C. Okell, H.J. Unwin, R. Verity, M. Vollmer, C.E. Walters, H. Wang, Y. Wang, X. Xi, D.G. Lalloo, N.M. Ferguson, and A.C. Ghani, The impact of COVID-19 and strategies for mitigation and suppression in low- and middle-income countries, Science 369 (2020), pp. 413–422. https://doi.org/10.1126/science.abc0035.
  • J. Wang, Mathematical models for COVID-19: applications, limitations, and potentials, J. Publ. Health Emerg. 4 (2020), pp. 9–9, https://doi.org/10.21037/jphe-2020-05.
  • Wikipedia Commons, Map of Chile. Accessed February 7, 2021. Available at https://commons.wikimedia.org/wiki/File:Mapa-chile.svg.
  • World Health Organization, WHO Director-General's opening remarks at the media briefing on COVID-19-11. March 2020 World Health Organization 2020 [April 23]. Available at https://bit.ly/2A8aCIO.
  • H. Wu, H. Zhu, H. Miao, and A. Perelson, Parameter identifiability and estimation of HIV/AIDS dynamic models, Bull. Math. Biol. 70 (2008), pp. 785–799.
  • P. Yan and G. Chowell, Quantitative Methods for Investigating Infectious Disease Outbreaks, Texts in Applied Mathematics, Springer, Cham, 2019.
  • C. You, Y. Deng, W. Hu, J. Sun, Q. Lin, F. Zhou, C.H. Pang, Y. Zhang, Z. Chen, and X.H. Zhou, Estimation of the time-varying reproduction number of COVID-19 outbreak in China, Int. J. Hyg. Environ. Health 228 (2020), p. 113555. Available at https://www.sciencedirect.com/science/article/pii/S1438463920302133.
  • C. Zhan, Y. Zheng, T.H.Z. ai, and B. Li, Identifying epidemic spreading dynamics of COVID-19 by pseudocoevolutionary simulated annealing optimizers, Neural Comput. Applic. 33 (2021), pp. 4915–4928.
  • S. Zhang, J. Ponce, Z. Zhang, G. Lin, and G. Karniadakis, An integrated framework for building trustworthy data-driven epidemiological models: Application to the COVID-19 outbreak in New York City, PLOS Comput. Biol. 17 (2021), pp. 1–29. https://doi.org/10.1371/journal.pcbi.1009334.
  • Z. Zhao, X. Li, F. Liu, G. Zhu, C. Ma, and L. Wang, Prediction of the COVID-19 spread in African countries and implications for prevention and control: A case study in South Africa, Egypt, Algeria, Nigeria, Senegal and Kenya, Sci. Tot. Environ. 729 (2020), p. 138959. Available at https://www.sciencedirect.com/science/article/pii/S0048969720324761.
  • C. Zhou, Evaluating new evidence in the early dynamics of the novel coronavirus COVID-19 outbreak in Wuhan, China with real time domestic traffic and potential asymptomatic transmissions, (2020). Available at medRxiv https://www.medrxiv.org/content/early/2020/02/20/2020.02.15.20023440.
  • L. Zou, F. Ruan, M. Huang, L. Liang, H. Huang, Z. Hong, J. Yu, M. Kang, Y. Song, J. Xia, Q. Guo, T. Song, J. He, H.L. Yen, M. Peiris, and J. Wu, SARS-CoV-2 viral load in upper respiratory specimens of infected patients, New Engl. J. Med. 382 (2020), pp. 1177–1179. PMID: 32074444. https://doi.org/10.1056/NEJMc2001737.

Appendix. Tables and Figures

Figure A1. Summary of results using conditions of experiment of scenario 1 (Parameters (α, β, ρs)). Here and in Figure in per row, the left plot corresponds to the 95% confidence intervals (vertical lines) for the distributions of each estimated parameter obtained through 250 realizations of the synthetic data generated for Scenario 1. Each red point denotes the mean estimated parameter value. The light-blue dashed horizontal line represents the true value (or assumed) for each parameter estimated; the experiments applied to each parameter are fixed on the horizontal axis. The right plot corresponds to the mean squared error (MSE) distribution of parameter estimates considering each experiment and dataset. Finally, Yellow, blue, and green lines or bars are grouped for each experiment and represent each R0c assumed for creating the synthetic data, i.e. R0c=3, 4, and 5, respectively.

Figure A1. Summary of results using conditions of experiment of scenario 1 (Parameters (α, β, ρs)). Here and in Figure A2 in per row, the left plot corresponds to the 95% confidence intervals (vertical lines) for the distributions of each estimated parameter obtained through 250 realizations of the synthetic data generated for Scenario 1. Each red point denotes the mean estimated parameter value. The light-blue dashed horizontal line represents the true value (or assumed) for each parameter estimated; the experiments applied to each parameter are fixed on the horizontal axis. The right plot corresponds to the mean squared error (MSE) distribution of parameter estimates considering each experiment and dataset. Finally, Yellow, blue, and green lines or bars are grouped for each experiment and represent each R0c assumed for creating the synthetic data, i.e. R0c=3, 4, and 5, respectively.

Figure A2. Summary of results using conditions of scenario 1 (Parameters (Is(0), J(0), R0c)).

Figure A2. Summary of results using conditions of scenario 1 (Parameters (Is(0), J(0), R0c)).