Research Article

A recursive model of residual life prediction for human beings with health information from activities of daily living and memory

Pages 529-541 | Received 02 Jan 2021, Accepted 12 Jun 2021, Published online: 07 Jul 2021

Abstract

As the ageing population increases, it is essential to manage human resources with residual life prediction, especially for the ageing population. Thus, this study aims to predict residual life using the available information of health statuses, such as the performances in activities of daily living (ADL) and memory. In this study, the principal components of ADL and memory information are extracted for prediction. The relationship between the principal components and residual life is established based on the concept of the proportional residual, which states that the residual life may be proportional to the changes in ADL and memory performance. A recursive model for residual life prediction is formulated and fitted for the Chinese Health and Nutrition Survey data. Finally, a goodness-of-fit test is conducted, and an example case is presented. The results show that the fitted model in this study is more accurate and precise than the original, unfitted model to estimate the residual life of human beings. This study is advantageous over existing studies in two aspects: (1) the model is formulated using a recursive method based on stochastic filtering; (2) the data include both physical status and mental status.

1. Introduction

Currently, life expectancy surpasses the age of 60 (de Cabo & Mattson, Citation2019; DESA, Citation2007; Fitzmaurice et al., Citation2019; Rudd et al., Citation2020; World Health Organization, Citation2015). This demographic shift brings unprecedented opportunities and challenges. Longer life can be an incredibly valuable resource in economic, social, cultural, and familial aspects (Beard & Bloom, Citation2015). However, healthcare expenditures for older people are increasing over time (Gregersen, Citation2014). The growth of the elderly population is accompanied by an increasing prevalence of chronic diseases associated with ageing, which makes it hard for health and care services to cope (Oliver et al., Citation2014). Previous research shows that the expected cumulative expenditures for healthier elderly persons are similar to those of less healthy individuals (Lubitz et al., Citation2003). This implies that if the health status of an aged person can be estimated, some of the expenditure incurred while that person is still in good health could be saved. Intuitively, a direct reflection of health status is the residual life. Thus, if a healthcare system wants to manage health better and reduce costs, an effective residual life prediction is needed.

Many researchers have studied life prediction for human beings using life expectancy or mortality (Arora et al., Citation2016; Atance et al., Citation2020; Boudoulas et al., Citation2017; Evans & Soliman, Citation2019; Parker et al., Citation2020; Wandeler et al., Citation2016; Wang et al., Citation2018). Traditional studies investigated this topic from the perspective of disease. Wandeler et al. studied the trends in life expectancy of HIV-positive adults on antiretroviral therapy worldwide (Wandeler et al., Citation2016). Wang et al. proposed a mortality prediction system for heart failure with orthogonal relief and dynamic radius means (Wang et al., Citation2018). Another stream of studies investigated this topic from the perspective of non-disease factors. For instance, Boudoulas et al. reviewed the effect of the evolution of medical science and medical technology on life expectancy (Boudoulas et al., Citation2017). Evans and Soliman presented an ecological study on the relationship between the subjective sense of well-being and life expectancy (Evans & Soliman, Citation2019). Parker et al. provided a systematic review on healthy working life expectancy at age 50 (Parker et al., Citation2020).

Besides traditional statistical studies, some researchers tried to study life expectancy or mortality using mathematical models (Hashir & Sawhney, Citation2020; Li et al., Citation2017; Su et al., Citation2020). For example, Li et al. used machine learning models to predict in-hospital mortality of ST-elevation myocardial infarction patients (Li et al., Citation2017). Su et al. developed a clinical prediction model for the mortality of diabetic adults with COVID-19 in Wuhan, China (Su et al., Citation2020). Hashir and Sawhney attempted to predict mortality with free-text clinical notes (Hashir & Sawhney, Citation2020). However, these mathematical models are limited to a population with a specific disease, while healthcare requires a comprehensive health assessment or residual life prediction for an average person. Some researchers developed models to estimate age-specific mortality for an average person (Cho et al., Citation2020). For instance, Cho et al. proposed an age-structured biomass model with an impulsive dynamic system to estimate age-specific natural mortality (Cho et al., Citation2020), and Díaz-Rojo et al. used a multivariate control chart and Lee-Carter models to study mortality changes (Díaz-Rojo et al., Citation2020).

However, both streams of studies have limitations. On the one hand, the traditional studies were carried out with statistical approaches that lack a mechanism explaining how life expectancy changes, so the results of the quantitative analysis cannot be interpreted clearly (Atance et al., Citation2020). Because they cannot explain the life prediction mechanism, these studies cannot be easily used to guide healthcare workers, especially when they want to implement healthcare policy for older adults. On the other hand, the mathematical models of age-specific mortality did not utilize health information other than age, which reduces the prediction accuracy. For example, aged people with better health status than their age-matched peers may live longer than expected. Therefore, it is recommended to utilize health information to adjust the residual life prediction.

Thus, this study proposes a recursive model to estimate the residual life of human beings with the utilization of health information to obtain a comprehensive health assessment (Senne, Citation1972). This estimation is related to the recursive filtering problem (Liu et al., Citation2021; Yang et al., Citation2021). Herein, the proposed model is constructed based on a stochastic filtering model, which was initially proposed by Wang and Christer (Citation2000) for condition-based monitoring maintenance applications and then applied in life prediction for machines with vibration monitoring (Wang et al., Citation2018). This model is essentially a recursive Bayesian algorithm that is advanced in two aspects. First, it can track the subject’s progress and thus predict the outcome without information loss. Second, since the information is fully utilized, the prediction is more accurate and precise. Thus, this method can provide an effective view of the health status observed over one’s lifetime to estimate the residual life for human beings.

Moreover, this study also innovates in terms of the information utilized. Previous research on residual life prediction for the elderly is usually based only on the physiological conditions of the elderly, ignoring the influence of mental factors. To provide a more accurate prediction of the remaining life of the elderly, this study combines mental factors, such as memory performance, with physical health indicators, such as performance in Activities of Daily Living (ADL). In other words, the filtering approach is adapted to use both disability and cognitive impairment information, which have been shown to be closely associated with later life loss (Jagger et al., Citation2007; Spiers et al., Citation2005; Stuck et al., Citation1999). The level of disability can be reflected by the performance in ADL (Covinsky et al., Citation2003; Dunlop et al., Citation1997; Katz, Citation1983), while the decline of cognitive function can be reflected by memory impairment (Brewer et al., Citation2005; McGuire et al., Citation2006). Therefore, ADL and memory performance changes can be used as early indicators of mortality risk to estimate the residual life.

People may question why medical history, especially that related to severe diseases, is not employed to predict the residual life in this study. On this issue, this study has four concerns. First, medical history is not easily collected since people may be reluctant to talk about this sensitive topic. Second, only a fraction of people have access to their medical history, so the application of a method utilizing medical history is limited. Third, medical history information is difficult to unify because patients are treated in different medical institutions or by different doctors with different treatment plans. Last, medical history cannot reflect sub-health conditions that have not been diagnosed as disease, and these conditions may also reduce residual life.

The remainder of this paper is arranged as follows. Section 2 introduces the data and variables used and their initial processing. Section 3 gives the model formulation process. Section 4 fits the proposed model to the Chinese Health and Nutrition Survey (CHNS) data, a nationally representative survey in China. Section 5 provides a case study to illustrate the proposed model. Section 6 discusses the results, and Section 7 concludes the study.

2. Data and variables

This study uses longitudinal data from the Chinese Health and Nutrition Survey (CHNS) conducted by the Carolina Population Centre at the University of North Carolina at Chapel Hill, U.S.A., and the National Institute of Nutrition and Food Safety at the Chinese Centre for Disease Control and Prevention. CHNS started in 1989, and there were eight follow-up surveys between 1991 and 2011. CHNS used a multi-stage, random cluster method to draw a sample of 19,000 participants from 4400 households in nine provinces that vary substantially in geography, economic development, public resources, and health indicators (Popkin et al., Citation2009).

The general retirement age in China is around 55; thus, the ADL and memory tests are designed for adults aged 55 and older in the CHNS questionnaire. Therefore, the elderly are defined as adults older than 55 years when their follow-up began, and in this study, all the analyses were performed for this population only. After excluding cases with invalid, erroneous, or incomplete values on the variables of concern, the total number of samples in our analyses is 10,986. Among them, 1746 samples recorded the age of death (complete), while the other samples were still alive at the last survey (censored), as shown in Table 1.

Table 1. Sample statistics.

From 2000 to 2011, ADL and memory were analysed separately in the CHNS questionnaire. The first section used ADL to understand the various life difficulties caused by health and physical limitations, as shown in Table . Respondents answered the question, ‘Do you have any difficulty doing this?’ for activities shown in Table . Possible answers were: 1 no difficulty; 2 have some difficulty but can still do it; 3 need help to do it; 4 cannot do it at all.

The second part was a memory test, as shown in Table . It began with the question, ‘How is your memory?’, followed by ‘In the past twelve months, how has your memory changed?’ Next, the participants were asked to memorize and repeat ten words. After counting backward from 20 to 1, the participants were asked to repeat those ten words. The questions are shown in Table .

Twenty-five variables can be extracted from these 25 questions, and these require at least 25 parameters to be estimated from the data, which is a demanding task even if they are not correlated. The fact is that some of them may be highly correlated or even useless to the following analyses. Hence, the relationships among the 25 variables need to be examined first to reduce the parameters. We used Principal Component Analysis (PCA) to examine the variables. PCA can transform the original set of variables to a new set of uncorrelated variables, named principal components, which are linear combinations of the original variables (Li & Liu, Citation2020; Miao & Lv, Citation2020; Samuel & Cao, Citation2016). Generally, a few principal components account for most of the variation in the original information and will be intuitively meaningful and useful in subsequent analyses where we can operate with a largely reduced number of variables. The process of PCA is as follows.

Let $Y^T=[Y_1,\ldots,Y_m]$ be the set of $m$ random variables that represent the $m$ observed original variables. Using PCA, an $m$-dimensional vector of uncorrelated variables, $Z^T=[Z_1,\ldots,Z_m]$, is obtained, whose variances decrease in sequence from the first. Each $Z_j$ is taken to be a linear combination of the elements of $Y$, so that

$$Z_j=b_{1j}Y_1+b_{2j}Y_2+\cdots+b_{mj}Y_m=b_j^TY, \tag{1}$$

where $b_j^T=[b_{1j},b_{2j},\ldots,b_{mj}]$ is a vector of constants. Equation (1) contains an arbitrary scale factor. After imposing the condition $b_j^Tb_j=\sum_{k=1}^{m}b_{kj}^2=1$, the overall transformation is ensured to be orthogonal and the distances are preserved. Then the variance of each $Z_j$, $\lambda_j$, is given by

$$\lambda_j=\mathrm{Var}(Z_j)=\mathrm{Var}(b_j^TY)=b_j^T\Sigma b_j=b_j^T\lambda_jIb_j, \tag{2}$$

where $\Sigma$ is the covariance matrix of $Y$ and $\lambda_1>\lambda_2>\cdots>\lambda_m\geq0$ are the eigenvalues of the covariance matrix of $Z$.

The first principal component, $Z_1$, is found by choosing $b_1$ to maximize the variance of $b_1^TY$ subject to the constraint that $b_1^Tb_1=1$. Thus, $Z_1$ has the largest possible variance among all combinations of the form of Equation (1). Similarly, $Z_2$ is the second principal component, found by choosing $b_2$ so that it has the second-largest possible variance and is uncorrelated with $Z_1$. The principal components $Z_3,Z_4,\ldots,Z_m$ are derived in the same way, are mutually uncorrelated, and have decreasing variances. Then, using $b_j^T$ in Equation (1), $Z_j$ and the variance $\lambda_j$ can be obtained. Because the ADL and memory reflect physical and mental health, respectively, the PCA is performed for the two parts separately. Variances of these principal components are shown in Figure 1.
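The construction of the first principal component can be illustrated with a small sketch. It is only an illustration under stated assumptions: the answer matrix is hypothetical, the function names are ours, and power iteration is used as one simple way to obtain the leading eigenvector of the covariance matrix (the study does not say which numerical routine it used).

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[p] - means[p]) * (r[q] - means[q]) for r in rows) / (n - 1)
             for q in range(m)] for p in range(m)]

def first_component(cov, iters=500):
    """Power iteration: returns (lambda_1, b_1) with b_1 scaled to unit length."""
    m = len(cov)
    b = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        v = [sum(cov[p][q] * b[q] for q in range(m)) for p in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        b = [x / norm for x in v]
    # Rayleigh quotient b^T * Sigma * b gives the variance of Z_1.
    lam = sum(b[p] * cov[p][q] * b[q] for p in range(m) for q in range(m))
    return lam, b

# Hypothetical answers of five respondents to three ADL items
# (1 = no difficulty ... 4 = cannot do it at all).
answers = [[1, 1, 2], [2, 2, 2], [3, 2, 4], [4, 4, 3], [1, 2, 1]]
cov = covariance_matrix(answers)
lam1, b1 = first_component(cov)
# Share of the total variance carried by the first principal component.
share = lam1 / sum(cov[j][j] for j in range(len(cov)))
```

The vector `b1` plays the role of $b_1$ in Equation (1), and `lam1` is the corresponding variance $\lambda_1$ in Equation (2).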

Figure 1. Variances of principal components.


After calculating the eigenvalues and the principal components, the first few components are examined to determine if they account for a large proportion of the total variance. This study selects the components that account for more than 70% of cumulative variance. Figure 1 shows that the first principal components of both ADL and memory account for the overwhelming majority (over 70%) of the total variation in the original information. To balance the number of components against the amount of original information retained, two principal components are used in subsequent analyses: the first principal component of ADL, denoted as S, and the first principal component of memory, denoted as M. The PCA transforming parameters of the two first principal components are shown in Tables  and .
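The 70% selection rule can be written as a short helper. This is a minimal sketch; the eigenvalues in the example are hypothetical and are not the values behind Figure 1.

```python
def components_needed(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative variance share
    exceeds the threshold, together with that share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, lam in enumerate(ordered, start=1):
        running += lam
        if running / total > threshold:
            return count, running / total
    return len(ordered), 1.0

# Hypothetical variances of four principal components.
count, cumulative_share = components_needed([7.5, 1.0, 0.8, 0.7])
```

In this hypothetical case a single component already exceeds the threshold, which mirrors the situation described for ADL and memory.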

3. Model formulation

3.1. Residual life prediction

Typically, deteriorating health means a reduction of residual life. Since health status is an ambiguous concept that is difficult to quantify, the residual life can be used as a health indicator. In this case, the health status information, including ADL and memory, is used as the monitoring information in our model. Generally, physical health is strongly associated with ADL, while mental health is positively related to memory; therefore, people in poorer health usually exhibit deteriorating performance in ADL and memory. Thus, there is a stochastic relationship between the health status monitoring data and the unobserved actual health status.

In this study, a new filtering methodology with a recursive nature is adopted. The model uses all past information about ADL and memory as covariates to predict residual life. From the PCA analyses, two principal components are shown to contain most of the initially observed information and can therefore be used to represent the original ADL and memory data. In practice, the first principal component is a weighted average of the original information, where the weights are obtained by maximizing the variance. Let $s_i$ denote the first principal component calculated from the observed ADL at the time of the $i$th survey, $t_i$; let $x_i$ denote the residual life of the participant at $t_i$, provided that this individual has survived to $t_i$; and let $\tilde{S}_i$ denote the history of $s_i$, where $\tilde{S}_i=\{s_1,s_2,\ldots,s_i\}$. The memory component $m_i$ and its history $\tilde{M}_i$ are defined in the same pattern. The objective is then to establish the probability density function (PDF) of $x_i$ given $\tilde{S}_i$ and $\tilde{M}_i$, i.e. $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$.

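The relationship between residual lives at times $t_i$ and $t_{i-1}$ can be expressed as

$$x_i=\begin{cases}x_{i-1}-(t_i-t_{i-1}) & \text{if } x_{i-1}\geq t_i-t_{i-1},\\ \text{not defined} & \text{if } x_{i-1}<t_i-t_{i-1}.\end{cases} \tag{3}$$

Equation (3) shows that $x_i$ must be conditional on $x_{i-1}\geq t_i-t_{i-1}$, so that $p_i(x_i|\tilde{S}_i,\tilde{M}_i,x_{i-1}\geq t_i-t_{i-1})=p_i(x_i|\tilde{S}_i,\tilde{M}_i)$. Our model assumes that after obtaining the health status information (ADL and memory) at time $t_i$, the estimated residual life distribution is altered by the difference between the actual and expected health status decrements over $(t_{i-1},t_i)$. A similar concept has been used in accelerated life models, where the survival function is accelerated by a deterministic constant (Crowder et al., Citation1991). Then $p_i(x_i|\tilde{S}_i,\tilde{M}_i,x_{i-1}\geq t_i-t_{i-1})$ is given by

$$p_i(x_i|\tilde{S}_i,\tilde{M}_i,x_{i-1}\geq t_i-t_{i-1})=p_i(x_i|\tilde{S}_i,\tilde{M}_i)=p_i\!\left(\left.\frac{x_i}{f(\Delta w_i-\Delta\bar{w}_i)}\,\right|\tilde{S}_{i-1},\tilde{M}_{i-1}\right)\frac{1}{f(\Delta w_i-\Delta\bar{w}_i)}, \tag{4}$$

where $\Delta w_i$ denotes the actual health status decrement and $\Delta\bar{w}_i$ denotes the expected health status decrement over $(t_{i-1},t_i)$. Neither $\Delta w_i$ nor $\Delta\bar{w}_i$ may be observed directly, but they can be indicated by $s_i$ and $m_i$ as

$$\Delta w_i-\Delta\bar{w}_i=g_s(s_i-s_{i-1}-\bar{s}_i+\bar{s}_{i-1})+g_m(m_i-m_{i-1}-\bar{m}_i+\bar{m}_{i-1})=g_s(\Delta s_i-\Delta\bar{s}_i)+g_m(\Delta m_i-\Delta\bar{m}_i).$$

Here, $\bar{s}_i$ and $\bar{m}_i$ denote the expected $s_i$ and $m_i$, while $f$, $g_s$, and $g_m$ are the functions $f(\Delta w_i-\Delta\bar{w}_i)=e^{-(\Delta w_i-\Delta\bar{w}_i)}$, $g_s(\Delta s_i-\Delta\bar{s}_i)=a(\Delta s_i-\Delta\bar{s}_i)$, and $g_m(\Delta m_i-\Delta\bar{m}_i)=b(\Delta m_i-\Delta\bar{m}_i)$, respectively, where $a$ and $b$ are parameters to be specified.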

The form of the function $f(\Delta w_i-\Delta\bar{w}_i)$ is chosen based on the proportional hazard model. This model is a semi-parametric model with good performance in analysing and explaining the hazard for different values of the concerned variables. The form of the function $f(\Delta w_i-\Delta\bar{w}_i)$ can be easily extended to other cases.

To calculate $g_s(\Delta s_i-\Delta\bar{s}_i)$ and $g_m(\Delta m_i-\Delta\bar{m}_i)$, we first need to quantify the relationships between the expected health status information ($\bar{s}_i$ and $\bar{m}_i$) and survival time ($t_i$), which is also the age when the participant was surveyed. We employed linear regression to quantify this relationship since the parameter estimation and prediction of linear regression are efficient, and the parameters in linear regression are easy to interpret. Thus, the two relationships can be expressed as $\bar{s}_i=c_st_i+d_s$ and $\bar{m}_i=c_mt_i+d_m$, where $c_s$, $d_s$, $c_m$ and $d_m$ are parameters. The form of the two relationships can be extended to other cases.
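As a rough illustration of how the expected scores and the deviations from them might be computed, the sketch below fits a straight line by ordinary least squares and evaluates the observed-minus-expected change between two surveys. The data points, the function names, and the values passed for a and b are assumptions made for the example only.

```python
import math

def fit_line(times, scores):
    """Ordinary least squares for score = slope * time + intercept."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(scores) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar

def deviation(prev_score, curr_score, t_prev, t_curr, slope):
    """Observed change minus expected change over (t_prev, t_curr).
    The regression intercept cancels in the expected change."""
    return (curr_score - prev_score) - slope * (t_curr - t_prev)

def scale_factor(dev_s, dev_m, a, b):
    """exp(a * dev_s + b * dev_m): the factor by which residual life is rescaled."""
    return math.exp(a * dev_s + b * dev_m)

# Hypothetical pooled observations lying exactly on s = 0.13 t + 4.48.
slope_s, intercept_s = fit_line([0, 5, 10, 20], [4.48, 5.13, 5.78, 7.08])
# One participant whose ADL component rose from 5.0 to 6.0 between t = 6 and t = 10.
dev = deviation(prev_score=5.0, curr_score=6.0, t_prev=6, t_curr=10, slope=slope_s)
factor = scale_factor(dev, 0.0, a=0.0158, b=0.0149)  # illustrative a and b
```

A positive deviation means the score worsened faster than expected for the participant's age, and with positive a and b the resulting factor exceeds one.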

Then Equation (4) can be expressed as

$$p_i(x_i|\tilde{S}_i,\tilde{M}_i)=p_i\!\left(\left.x_ie^{a(\Delta s_i-\Delta\bar{s}_i)+b(\Delta m_i-\Delta\bar{m}_i)}\,\right|\tilde{S}_{i-1},\tilde{M}_{i-1}\right)\times e^{a(\Delta s_i-\Delta\bar{s}_i)+b(\Delta m_i-\Delta\bar{m}_i)}, \tag{5}$$

which is a proportional function. If the health status is stable without deterioration, so that $\Delta s_i-\Delta\bar{s}_i=0$ and $\Delta m_i-\Delta\bar{m}_i=0$, then the residual life prediction should also be stable and $p_i(x_i|\tilde{S}_i,\tilde{M}_i)=p_i(x_i|\tilde{S}_{i-1},\tilde{M}_{i-1})$. Otherwise, the residual life prediction will vary proportionally with $\Delta s_i-\Delta\bar{s}_i$ and $\Delta m_i-\Delta\bar{m}_i$. By specifying an initial PDF of $x_0$, $p_0(x_0)$, and using the fact that

$$p_i(x_i|\tilde{S}_{i-1},\tilde{M}_{i-1})=\frac{p_{i-1}(x_i+t_i-t_{i-1}|\tilde{S}_{i-1},\tilde{M}_{i-1})}{\int_{t_i-t_{i-1}}^{\infty}p_{i-1}(u|\tilde{S}_{i-1},\tilde{M}_{i-1})\,du}, \tag{6}$$

and Equation (5), after some recursive manipulations, it can be shown that the PDF of $x_i$ conditional on $\tilde{S}_i$, $\tilde{M}_i$, and $x_{i-1}\geq t_i-t_{i-1}$ can be expressed as

$$p_i(x_i|\tilde{S}_i,\tilde{M}_i)=\frac{p_0\!\left(x_ie^{a(s_i-\bar{s}_i)+b(m_i-\bar{m}_i)}+\sum_{k=2}^{i}(t_k-t_{k-1})e^{a(s_{k-1}-\bar{s}_{k-1})+b(m_{k-1}-\bar{m}_{k-1})}+t_1\right)e^{a(\Delta s_i-\Delta\bar{s}_i)+b(\Delta m_i-\Delta\bar{m}_i)}}{\int_{t_i-t_{i-1}}^{\infty}p_0\!\left(ue^{a(s_{i-1}-\bar{s}_{i-1})+b(m_{i-1}-\bar{m}_{i-1})}+\sum_{k=2}^{i-1}(t_k-t_{k-1})e^{a(s_{k-1}-\bar{s}_{k-1})+b(m_{k-1}-\bar{m}_{k-1})}+t_1\right)du}. \tag{7}$$

Equation (7) shows an advantage of this model: $x_i$ is conditional on all past $s_k$ and $m_k$, $k=1,2,\ldots,i$, rather than just the current $s_i$ and $m_i$. If all $\Delta s_i-\Delta\bar{s}_i=0$ and $\Delta m_i-\Delta\bar{m}_i=0$, this equation returns to the conventional conditional survival PDF. That is, if all observed data conform to the expected mean, then $p_i(x_i|\tilde{S}_i,\tilde{M}_i)=p_0(x_i+t_i)/\int_{t_i}^{\infty}p_0(u)\,du$, as there is no evidence of deviation from the original prediction. Only when $\Delta s_i-\Delta\bar{s}_i\neq0$ or $\Delta m_i-\Delta\bar{m}_i\neq0$ is $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$ revised accordingly. It should be noted that this is not the case with the proportional hazard model, where the hazard can only return to the baseline hazard if all $s$ and $m$ equal 0. There is no difficulty in extending the model to use more principal components. However, to balance the accuracy against the number of parameters estimated, this study focuses on the case where only the two most important principal components are used.
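One way to read the recursion in Equations (5) to (7) is as a two-number state that is updated at every survey: a cumulative scale factor and an accumulated, rescaled survival time. The sketch below follows that reading. It assumes that the cumulative factor is the running product of the per-survey factors $e^{a(\Delta s-\Delta\bar{s})+b(\Delta m-\Delta\bar{m})}$, and it uses an exponential initial density purely because the result can then be checked in closed form; neither choice is taken from the study.

```python
import math

def make_residual_pdf(p0, sf0, gaps, factors):
    """Residual-life density after the last survey.

    p0, sf0 : initial density and survival function of x_0
    gaps    : [t_1, t_2 - t_1, ..., t_i - t_{i-1}]
    factors : per-survey factors exp(a*(ds - ds_bar) + b*(dm - dm_bar))
    """
    lam, shift = 1.0, 0.0          # cumulative factor, accumulated rescaled time
    for gap, c in zip(gaps, factors):
        shift += gap * lam         # time already survived, on the p0 time scale
        lam *= c                   # fold in the newest deviation
    return lambda x: lam * p0(x * lam + shift) / sf0(shift)

# Illustrative check with an exponential p0 (rate 0.05).
rate = 0.05
p0 = lambda x: rate * math.exp(-rate * x)
sf0 = lambda x: math.exp(-rate * x)

# Three surveys at t = 6, 10, 13 with no deviation from the expected decline...
no_change = make_residual_pdf(p0, sf0, gaps=[6, 4, 3], factors=[1.0, 1.0, 1.0])
# ...and with faster-than-expected decline at the second and third surveys.
faster_decline = make_residual_pdf(p0, sf0, gaps=[6, 4, 3], factors=[1.0, 1.2, 1.1])
```

With all factors equal to one, the function reduces to the conventional conditional density $p_0(x+t_i)/\int_{t_i}^{\infty}p_0(u)\,du$; with factors above one, the density is compressed towards shorter residual lives.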

For the initial residual life distribution $p_0(x_0)$, an appropriate choice is the Weibull distribution, $p_0(x_0)=\alpha^{\beta}\beta x_0^{\beta-1}e^{-(\alpha x_0)^{\beta}}$, where $\alpha$ is the scale parameter and $\beta$ is the shape parameter. The Weibull distribution is widely used in survival analysis. The closed form of the survival function and the wide variety of shapes exhibited by the density function make the Weibull distribution a particularly convenient generalization of the exponential distribution. It can flexibly describe the distribution of time, especially when the hazard rate may increase, decrease, or remain invariant.
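The Weibull form above is easy to explore numerically. The following sketch evaluates the density, survival function, and hazard, and checks that the density integrates to one. The default parameter values are the estimates reported later in Section 4.1; the helper names and the trapezoid integration are our own choices.

```python
import math

ALPHA, BETA = 0.033494, 2.896380   # estimates reported in Section 4.1

def weibull_pdf(x, alpha=ALPHA, beta=BETA):
    return alpha ** beta * beta * x ** (beta - 1) * math.exp(-(alpha * x) ** beta)

def weibull_sf(x, alpha=ALPHA, beta=BETA):
    """Survival function, available in closed form."""
    return math.exp(-(alpha * x) ** beta)

def weibull_hazard(x, alpha=ALPHA, beta=BETA):
    """pdf / survival; increasing for beta > 1, constant for beta = 1, decreasing for beta < 1."""
    return alpha ** beta * beta * x ** (beta - 1)

def trapezoid(fn, lo, hi, steps=20000):
    h = (hi - lo) / steps
    total = 0.5 * (fn(lo) + fn(hi)) + sum(fn(lo + k * h) for k in range(1, steps))
    return total * h

area = trapezoid(weibull_pdf, 0.0, 150.0)          # should be close to 1
mean_initial = math.gamma(1 + 1 / BETA) / ALPHA    # mean of x_0, in years after age 55
```

In this parameterization the mean is $\Gamma(1+1/\beta)/\alpha$, so $1/\alpha$ acts as the characteristic life measured from the time origin.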

3.2. Parameter estimation

There are three sets of parameters estimated in our model, as shown in Table 4. The first set belongs to the Weibull distribution $p_0(x_0)$ and includes the scale parameter $\alpha$ and the shape parameter $\beta$. If there are enough residual life data, the two parameters can be estimated using the maximum likelihood method. Among the $n$ participants' life data available, $h$ are complete and $h_c$ are censored, so the likelihood function is $L=\prod_{j=1}^{h}f(t_{jf})\prod_{l=1}^{h_c}s(t_l)$, where $f$ is the PDF of lifetime, $s$ is the survival function, $t_{jf}$ is the final lifespan of the $j$th participant, and $t_l$ is the last survey time of the $l$th censored participant. The estimated values of the scale parameter $\alpha$ and the shape parameter $\beta$ can be obtained by maximizing the likelihood function.

Table 4. Parameter estimated.

The second set belongs to the linear regression for the relationships between health status information and survival time, and includes $c_s$, $d_s$, $c_m$, and $d_m$. The estimation for this parameter set is conducted based on the available data. In practice, the parameters in the second set can also be adjusted according to health workers' experience.

The third set includes the parameters $a$ and $b$, which govern the relation between $x_i$ and $\Delta w_i$ and are called governor parameters. These two parameters are the core parameters of the proposed model, and they are estimated by the following method.

For any participant, the following events have been observed at each survey: $x_{i-1}\geq t_i-t_{i-1}$ if the participant is alive; or $x_n=t_f-t_n$, where $t_f$ is the lifespan and $t_n$ is the time of the last survey before $t_f$. The likelihood function of all observed events for a participant is

$$L=p_0(x_0>t_1)\times p_1(x_1>t_2-t_1|\tilde{S}_1,\tilde{M}_1)\times\cdots\times p_{n-1}(x_{n-1}>t_n-t_{n-1}|\tilde{S}_{n-1},\tilde{M}_{n-1})\times p_n(x_n=t_f-t_n|\tilde{S}_n,\tilde{M}_n). \tag{8}$$

Note that these events are all conditioned on the participant having survived up to the time of the events, so the events are statistically independent. Using Equation (7), after some manipulations, the likelihood function for a single participant over his or her lifetime can be expressed as

$$L=p_0\!\left((t_f-t_n)e^{a(s_n-\bar{s}_n)+b(m_n-\bar{m}_n)}+\sum_{k=2}^{n}(t_k-t_{k-1})e^{a(s_{k-1}-\bar{s}_{k-1})+b(m_{k-1}-\bar{m}_{k-1})}+t_1\right)\times\prod_{k=2}^{n}e^{a(\Delta s_k-\Delta\bar{s}_k)+b(\Delta m_k-\Delta\bar{m}_k)}. \tag{9}$$

The likelihood function for $h$ participants is given by

$$L=\prod_{j=1}^{h}\left[p_0\!\left((t_{jf}-t_{jn_j})e^{a(s_{jn_j}-\bar{s}_{jn_j})+b(m_{jn_j}-\bar{m}_{jn_j})}+\sum_{k=2}^{n_j}(t_{jk}-t_{j,k-1})e^{a(s_{j,k-1}-\bar{s}_{j,k-1})+b(m_{j,k-1}-\bar{m}_{j,k-1})}+t_{j1}\right)\times\prod_{k=2}^{n_j}e^{a(\Delta s_{jk}-\Delta\bar{s}_{jk})+b(\Delta m_{jk}-\Delta\bar{m}_{jk})}\right], \tag{10}$$

where $t_{jf}$ is the lifespan of the $j$th participant, $t_{jk}$ is the time of the $k$th survey of the $j$th participant, $s_{jk}$ and $m_{jk}$ are the first principal components of the $j$th participant at $t_{jk}$, and $n_j$ is the number of surveys before the death of the $j$th participant.

Equation (10) represents the probability that all the events reflected by the data happened. Herein, once we have the first two sets of parameters, the values of a and b can be estimated by maximizing the logarithm of Equation (10).
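To make the structure of this likelihood concrete, the sketch below evaluates the log-likelihood contribution of a single participant with a Weibull initial distribution. It is a sketch under assumptions: the cumulative factor is taken to start at one at the first survey and to be multiplied by the per-survey factor from the second survey onwards, and all argument names are hypothetical. Summing this quantity over participants and maximizing over a and b (with any numerical optimizer) would give the estimates.

```python
import math

def participant_loglik(a, b, alpha, beta, times, dev_s, dev_m, death_time=None):
    """Log-likelihood contribution of one participant.

    times        : survey times t_1..t_n (years since the time origin)
    dev_s, dev_m : deviations (ds_k - ds_bar_k) and (dm_k - dm_bar_k), k = 2..n
    death_time   : lifespan t_f, or None for a censored participant
    """
    lam, shift, log_lam = 1.0, times[0], 0.0
    for k in range(1, len(times)):
        shift += lam * (times[k] - times[k - 1])   # interval weighted by the previous factor
        step = a * dev_s[k - 1] + b * dev_m[k - 1]
        lam *= math.exp(step)
        log_lam += step
    if death_time is None:
        # Censored: only the probability of surviving past the last survey.
        return -(alpha * shift) ** beta
    shift += lam * (death_time - times[-1])        # final interval up to death
    return (log_lam + beta * math.log(alpha) + math.log(beta)
            + (beta - 1) * math.log(shift) - (alpha * shift) ** beta)

# Hypothetical participant surveyed at t = 6 and t = 10 who died at t = 12.
ll_complete = participant_loglik(0.0158, 0.0149, 0.033494, 2.896380,
                                 [6, 10], [0.5], [0.2], death_time=12)
```

Setting a and b to zero reduces the contribution to the ordinary Weibull log-density at the lifespan, which is a convenient sanity check.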

4. Model fitting

4.1. Fitting the model to the data

In this section, the model introduced above is fitted to the health status data from the CHNS. To keep as much of the original information as possible, both the complete and censored samples are used; 9566 randomly selected samples were used to fit the model, and the remaining 1420 samples were used to test the fitted model. Since the ADL and memory test information is only collected from people aged 55 and older, all the fitting analyses are limited to this population. For participants observed after the age of 55, $p(t\geq55)=1$ and $p(x|t\geq55)=p(x)$. For simplicity, the survival time $t$ is calculated from age 55, that is, $t_i=(\text{age at the }i\text{th survey})-55$.
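The time origin and the fitting/testing split described here can be expressed in a few lines. This is only a sketch: the integer identifiers stand in for participant records, and the seed is arbitrary, since the study does not report how the random selection was made.

```python
import random

def survival_time(age_at_survey, origin=55):
    """Survival time t_i measured from age 55."""
    return age_at_survey - origin

def split_samples(sample_ids, n_fit, seed=0):
    """Randomly split the samples into a fitting set and a test set."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_fit], shuffled[n_fit:]

# 10,986 placeholder identifiers split into 9566 fitting and 1420 test samples.
fit_ids, test_ids = split_samples(range(10986), n_fit=9566)
```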

For the initial residual life distribution $p_0(x_0)$, where $p_0(x_0)=\alpha^{\beta}\beta x_0^{\beta-1}e^{-(\alpha x_0)^{\beta}}$, the scale parameter $\alpha$ and the shape parameter $\beta$ of the Weibull distribution are estimated using the maximum likelihood method based on data from 9566 samples. Incorporating the Weibull distribution, the likelihood function can be expressed as

$$L=\prod_{j=1}^{h}f(t_{jf})\prod_{l=1}^{h_c}s(t_l)=\prod_{j=1}^{h}\alpha^{\beta}\beta t_{jf}^{\beta-1}e^{-(\alpha t_{jf})^{\beta}}\prod_{l=1}^{h_c}e^{-(\alpha t_l)^{\beta}}, \tag{11}$$

where $t_{jf}$ is the life span of the $j$th complete participant and $t_l$ is the last survey time of the $l$th censored participant. Taking the logarithm on both sides of Equation (11), it can be transformed to

$$\ln L=h\beta\ln\alpha+h\ln\beta+\sum_{j=1}^{h}\left\{(\beta-1)\ln t_{jf}-(\alpha t_{jf})^{\beta}\right\}-\sum_{l=1}^{h_c}(\alpha t_l)^{\beta}. \tag{12}$$

By maximizing Equation (12), the estimates of $\alpha$ and $\beta$ are $\hat{\alpha}=0.033494$ and $\hat{\beta}=2.896380$, respectively. By inverting the information matrix, the variances and covariance of the estimated parameters $\alpha$ and $\beta$ can be approximated as $\mathrm{Var}(\alpha)=5.98\times10^{-8}$, $\mathrm{Var}(\beta)=1.65\times10^{-8}$ and $\mathrm{Cov}(\alpha,\beta)=6.25\times10^{-5}$. The variances and the covariance of the estimated $\alpha$ and $\beta$ are minimal compared with their estimated values, which means that they are stable and probably uncorrelated.
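A censored Weibull log-likelihood of the form in Equation (12) can be maximized in several ways. The sketch below uses one simple approach that is our own choice rather than the study's: for a fixed $\beta$, setting the derivative with respect to $\alpha$ to zero gives $\alpha^{\beta}=h/\sum t^{\beta}$ (summing over complete and censored times), so only $\beta$ needs a one-dimensional search. The lifetimes are hypothetical.

```python
import math

def weibull_loglik(alpha, beta, deaths, censored):
    """Log-likelihood with complete lifetimes `deaths` and censoring times `censored`."""
    h = len(deaths)
    return (h * beta * math.log(alpha) + h * math.log(beta)
            + sum((beta - 1) * math.log(t) - (alpha * t) ** beta for t in deaths)
            - sum((alpha * t) ** beta for t in censored))

def alpha_given_beta(beta, deaths, censored):
    """Closed-form maximizer over alpha for a fixed beta."""
    total = sum(t ** beta for t in deaths) + sum(t ** beta for t in censored)
    return (len(deaths) / total) ** (1.0 / beta)

def fit_weibull(deaths, censored, lo=0.2, hi=10.0, steps=2000):
    """Grid search over beta with alpha profiled out (a crude but transparent optimizer)."""
    best = None
    for k in range(steps + 1):
        beta = lo + (hi - lo) * k / steps
        alpha = alpha_given_beta(beta, deaths, censored)
        ll = weibull_loglik(alpha, beta, deaths, censored)
        if best is None or ll > best[0]:
            best = (ll, alpha, beta)
    return best[1], best[2]

# Hypothetical survival times in years since age 55.
deaths = [12.0, 18.5, 21.0, 25.5, 30.0]
censored = [10.0, 15.0, 22.0, 28.0]
alpha_hat, beta_hat = fit_weibull(deaths, censored)
```

A gradient-based optimizer would normally replace the grid search; the grid is used here only to keep the example dependency-free.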

The health status information and survival time relationships are expressed as $\bar{s}_i=c_st_i+d_s$ and $\bar{m}_i=c_mt_i+d_m$. The regression results show that $c_s=0.13$, $d_s=4.48$, $c_m=0.15$ and $d_m=3.28$, and all the regression parameters are significant (P-value $<2\times10^{-5}$). In other words, as age (survival time) increases, ADL and memory performance deteriorate.

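Next, parameters $a$ and $b$ can be estimated using the maximum likelihood method, with the likelihood

$$L=\prod_{j=1}^{h}\left[p_0(x_0>t_{j1})\prod_{k=2}^{n_j}p_{k-1}(x_{k-1}>t_{jk}-t_{j,k-1}|\tilde{S}_{j,k-1},\tilde{M}_{j,k-1})\times p_{n_j}(x_{n_j}=t_{jf}-t_{jn_j}|\tilde{S}_{jn_j},\tilde{M}_{jn_j})\right]\times\prod_{l=1}^{h_c}\left[p_0(x_0>t_{l1})\prod_{k=2}^{n_l}p_{k-1}(x_{k-1}>t_{lk}-t_{l,k-1}|\tilde{S}_{l,k-1},\tilde{M}_{l,k-1})\right], \tag{13}$$

where $\tilde{S}_{\cdot,k}=[s_{\cdot,1},\ldots,s_{\cdot,k}]$ and $\tilde{M}_{\cdot,k}=[m_{\cdot,1},\ldots,m_{\cdot,k}]$. Using Equation (7) and $p_0(x_0)$, and taking the logarithm on both sides of Equation (13), it can be expressed as

$$\ln L=h\beta\ln\alpha+h\ln\beta+\sum_{j=1}^{h}\left\{\sum_{k=2}^{n_j}\left[a(\Delta s_{jk}-\Delta\bar{s}_{jk})+b(\Delta m_{jk}-\Delta\bar{m}_{jk})\right]+(\beta-1)\ln\!\left(\sum_{k=2}^{n_j+1}\Lambda_{j,k-1}(t_{jk}-t_{j,k-1})+t_{j1}\right)-\left[\alpha\!\left(\sum_{k=2}^{n_j+1}\Lambda_{j,k-1}(t_{jk}-t_{j,k-1})+t_{j1}\right)\right]^{\beta}\right\}-\sum_{l=1}^{h_c}\left[\alpha\!\left(\sum_{k=2}^{n_l}\Lambda_{l,k-1}(t_{lk}-t_{l,k-1})+t_{l1}\right)\right]^{\beta}, \tag{14}$$

where $\Lambda_{\cdot,k}=e^{a(s_{\cdot,k}-\bar{s}_{\cdot,k})+b(m_{\cdot,k}-\bar{m}_{\cdot,k})}$ and $t_{jf}=t_{j,n_j+1}$. By maximizing Equation (14), with $\alpha$ and $\beta$ substituted by their estimated values $\hat{\alpha}$ and $\hat{\beta}$, the estimated values are $\hat{a}=1.58\times10^{-2}$ and $\hat{b}=1.49\times10^{-2}$. The variance of the estimated parameter $a$ is calculated by $\mathrm{Var}(a)\approx\left[-E\left(\partial^2\ln L/\partial a^2\right)\right]^{-1}$. $\mathrm{Var}(b)$ and $\mathrm{Cov}(a,b)$ can be calculated in the same way. Then we have $\mathrm{Var}(a)=6.14\times10^{-5}$, $\mathrm{Var}(b)=7.25\times10^{-5}$ and $\mathrm{Cov}(a,b)=9.71\times10^{-5}$. For $a$ and $b$, the calculated variances and covariance are extremely small compared with the estimated values, so that they are stable and uncorrelated.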

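The residual life of a participant at time $t_i$ is predicted based upon the estimated model parameters and the information from the first two principal components. The density $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$ is expressed by the following function:

$$p_i(x_i|\tilde{S}_i,\tilde{M}_i)=\frac{\alpha^{\beta}\beta\left(x_i\Lambda_i+\sum_{k=2}^{i}\Lambda_{k-1}(t_k-t_{k-1})+t_1\right)^{\beta-1}e^{-\left[\alpha\left(x_i\Lambda_i+\sum_{k=2}^{i}\Lambda_{k-1}(t_k-t_{k-1})+t_1\right)\right]^{\beta}}e^{a(\Delta s_i-\Delta\bar{s}_i)+b(\Delta m_i-\Delta\bar{m}_i)}}{e^{-\left[\alpha\left(\sum_{k=2}^{i}\Lambda_{k-1}(t_k-t_{k-1})+t_1\right)\right]^{\beta}}}\,e^{a(s_{i-1}-\bar{s}_{i-1})+b(m_{i-1}-\bar{m}_{i-1})}, \tag{15}$$

where $\Lambda_k=e^{a(s_k-\bar{s}_k)+b(m_k-\bar{m}_k)}$. This equation describes a life mechanism in which the residual life is influenced by the deviation between the expected and observed component information: increasing health status deterioration indicates a relatively shorter residual life.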
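A point prediction, such as the mean residual life, can be obtained from this density by numerical integration. The sketch below writes the density in the compact form "cumulative factor times $p_0$(rescaled argument) divided by the survival probability", which assumes that the cumulative factor equals the product of the per-survey factors; the participant values are hypothetical and the Weibull estimates are those reported in Section 4.1.

```python
import math

ALPHA, BETA = 0.033494, 2.896380      # Weibull estimates from Section 4.1

def residual_pdf(x, lam, shift, alpha=ALPHA, beta=BETA):
    """lam * p0(x*lam + shift) / S0(shift) with a Weibull p0.

    lam   : cumulative scale factor at the current survey
    shift : sum of past intervals weighted by earlier factors, plus t_1
    """
    y = x * lam + shift
    log_pdf = (beta * math.log(alpha) + math.log(beta) + (beta - 1) * math.log(y)
               - (alpha * y) ** beta + (alpha * shift) ** beta)
    return lam * math.exp(log_pdf)

def mean_residual_life(lam, shift, upper=200.0, steps=40000):
    """Midpoint-rule approximation of the expected residual life."""
    h = upper / steps
    return sum((k + 0.5) * h * residual_pdf((k + 0.5) * h, lam, shift)
               for k in range(steps)) * h

# Hypothetical participant surveyed 13 years after age 55.
as_expected = mean_residual_life(lam=1.0, shift=13.0)    # declined exactly as expected
deteriorated = mean_residual_life(lam=1.3, shift=13.0)   # declined faster than expected
```

For a fixed `shift`, the mean residual life scales as one over `lam`, which makes the proportional nature of the adjustment visible.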

4.2. Goodness-of-fit

To assess the model fit, 1420 samples were chosen as the test set and the chi-square goodness-of-fit test ($\chi^2$) was used (Fisher, Citation1922; Pearson, Citation1992). The chi-squared test cannot be applied directly in this study since there is only one observation for each distribution $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$. However, the test becomes feasible by partitioning each distribution $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$ into intervals with equal probability; then $x_i$ has an equal probability of falling in any one of the intervals. This effectively transforms $x_i$ into a uniformly distributed variable. Specifically, the range of $x_i$ is partitioned into $N$ cells so that each cell has an equal probability, $p=1/N$, under the hypothesized distribution, and the cell into which each observation falls is recorded. After this transformation, all $x_i$ follow an identical uniform distribution, so the number of observations $G$ per cell can be counted and compared with the expected value in a goodness-of-fit test.

However, for the censored participants, their real residual life cannot be known at each survey. For these samples, $p_i(x_i>t_l|\tilde{S}_i,\tilde{M}_i)$ is calculated and $x_i$ is assumed to have an equal probability of being in any one of the $K$ cells covered by $p_i(x_i>t_l|\tilde{S}_i,\tilde{M}_i)$. In this case, the numbers of observations in all these $K$ cells increase by $1/K$.
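The cell-counting procedure can be sketched as follows. Each observation is first mapped to the value of its own predicted cumulative distribution function, which lies between 0 and 1, so that equal-probability cells become equal-width cells. The treatment of the cell that contains the censoring point (counted in full among the K cells) is an assumption of this sketch, as are the example values.

```python
def cell_of(cdf_value, n_cells):
    """Index of the equal-probability cell containing an observation with CDF value u."""
    return min(int(cdf_value * n_cells), n_cells - 1)

def cell_counts(complete_u, censored_u, n_cells):
    """Counts per cell; a censored observation adds 1/K to each of the K cells
    from the one containing its censoring point upwards."""
    counts = [0.0] * n_cells
    for u in complete_u:
        counts[cell_of(u, n_cells)] += 1.0
    for u in censored_u:
        first = cell_of(u, n_cells)
        k = n_cells - first
        for c in range(first, n_cells):
            counts[c] += 1.0 / k
    return counts

# Hypothetical CDF values for four complete and two censored observations, four cells.
counts = cell_counts(complete_u=[0.05, 0.40, 0.41, 0.93],
                     censored_u=[0.30, 0.80], n_cells=4)
```

The counts always sum to the number of observations, so the expected count per cell is unchanged by the fractional allocation.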

The null hypothesis $H_0$ is that the elderly's residual life $x_i$ follows the stated probability distribution $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$. Since all parameters in $p_i(x_i|\tilde{S}_i,\tilde{M}_i)$ are already estimated from the other 9566 randomly selected samples, the hypothesis here is simple. To do the test, we need to decide the number of partitions, namely, $N$. It is recommended that for a (large) sample of size $n$ and a significance level of 0.05, one should use approximately $N=2n^{2/5}$ (D'Agostino & Stephens, Citation1986). Then the expected number of observations in each cell is $np=n/N$, and finally, the test statistic, i.e. the sum of standardized squared deviations, can be expressed by

$$\mathrm{Sum}=\sum_{i=1}^{N}\frac{(G_i-np)^2}{np}=\sum_{i=1}^{N}\frac{(G_i-n/N)^2}{n/N}. \tag{16}$$

In this study, $2n^{2/5}=36.47$ for $n=1420$, thus choosing $N=36$ is reasonable. Then the degrees of freedom are $36-2=34$, the equal probability of each cell $p$ is $1/36$, and the expected number of observations in each cell is $np=1420/36=39.44$. Substituting these values into Equation (16), the test statistic is 32.43. On the other hand, if we choose a significance level of 0.05, referring to the $\chi^2$ table, the critical value is $\chi^2_{0.95}(34)=48.60$. As 32.43 is smaller than 48.60, the null hypothesis cannot be rejected, so there is no evidence to reject the established model.
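The arithmetic of the test is straightforward to reproduce. The sketch below computes the suggested number of cells and the statistic of Equation (16) for a list of cell counts; the critical value is simply the figure quoted in the text, and the function names are ours.

```python
def suggested_cells(n):
    """Rule of thumb N = 2 * n^(2/5) for the number of equal-probability cells."""
    return 2 * n ** (2 / 5)

def chi_square(counts):
    """Sum over cells of (observed - expected)^2 / expected, with equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((g - expected) ** 2 / expected for g in counts)

raw_cells = suggested_cells(1420)      # about 36.47, rounded down to 36 cells
expected_per_cell = 1420 / 36          # about 39.44
CRITICAL_34_DF = 48.60                 # 0.05-level critical value quoted in the text

def model_not_rejected(counts, critical=CRITICAL_34_DF):
    return chi_square(counts) < critical
```

Applied to the 36 observed cell counts of the test set, `chi_square` would return the statistic that the text reports as 32.43.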

5. Case study

In this section, a participant was chosen as an example to exhibit the fitted model. The participant was first surveyed when he was 61 years old, and the subsequent surveys were carried out at the ages of 65, 68, 72, and 74. Finally, he passed away just ten months after the last survey. The residual life distribution of the participant was predicted after each survey based on his ADL and memory information using the model fitted in Section 4.

To illustrate the effectiveness of the proposed model, two perspectives of comparisons are provided based on this example: (1) the fitted model itself, including the comparisons for the result from the initial distribution (Weibull distribution) before using the proposed model and the result from using exponential distribution instead of Weibull distribution as the initial distribution in the fitted model; (2) the proposed model vs. other prediction methods, including the comparisons for the result from multiple regression and the result from the life table.

5.1. The proposed model

To demonstrate the effectiveness of the proposed model itself, we compare its results with those of two alternatives: (1) the initial Weibull distribution without the stochastic filtering; and (2) the fitted model with an exponential distribution replacing the Weibull distribution as the initial distribution.

5.1.1. Comparison with initial distribution (Weibull distribution)

This comparison shows the difference between the results from the fitted model and the initial distribution (the Weibull distribution before adjustment), as shown in Figure 2. In general, the residual life prediction from the fitted model is more accurate than that from the initial distribution. The mean square error (MSE) was also calculated: it is 6.2 with our model and 66.3 without it. This indicates that the fitted model provides a more reasonable prediction distribution. In addition, as the number of surveys and the amount of information increase, the fitted model becomes more precise and stable. The fitted model works only from the second survey onward because both $\Delta w_i$ and $\Delta \bar{w}_i$ used in this model require at least two observations to calculate.
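The text does not state how the MSE was computed. One common reading, assumed here, is the average squared difference between a point prediction (for example, the mean of each predicted distribution) and the actual residual life at each survey. The sketch below shows that calculation; the function name and the numbers in the usage lines are placeholders, not values from the study.

```python
def mean_square_error(predicted, actual):
    """Average squared difference between predictions and observed values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# Placeholder inputs for illustration only (not data from the study).
example_predicted = [3.0, 1.0]
example_actual = [2.0, 1.0]
example_mse = mean_square_error(example_predicted, example_actual)  # (1 + 0) / 2
print(example_mse)
```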

Figure 2. Comparison of the predicted residual life distribution based on the fitted model and the initial distribution. (Note: the distribution curves of the fourth and fifth predictions are too concentrated to be shown in full, so the last two curves are truncated; the same applies to the following figures.)


5.1.2. Comparison with exponential distribution as the initial distribution

The exponential distribution is widely used to describe residual life, especially for individuals whose residual life is independent of their history. This comparison therefore shows the difference between the results of the fitted model with the Weibull distribution and with the exponential distribution as the initial distribution, as shown in Figure 3.

Figure 3. Comparison of the predicted residual life distribution based on the fitted model with Weibull distribution and the fitted model with exponential distribution.


In this study, the exponential distribution is fitted to the same data as the Weibull distribution; its density is $(1/\theta)e^{-x/\theta}$, where $\theta = 0.034494$. The residual life prediction using the fitted model with the Weibull distribution is more accurate than that with the exponential distribution. This suggests that using the Weibull distribution in the fitted model provides a more reasonable prediction than the exponential distribution.
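The difference between the two initial distributions can be made concrete through the memoryless property: under an exponential distribution, the chance of surviving a further $t$ years does not depend on the years already survived, whereas under a Weibull distribution with shape greater than 1 it falls with age. The sketch below illustrates this with hypothetical parameter values (`rate`, `shape`, and `scale` are placeholders, not the fitted values from this study).

```python
import math

def exponential_survival(t, rate):
    """P(X > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)

def weibull_survival(t, shape, scale):
    """P(X > t) for a Weibull distribution with the given shape and scale."""
    return math.exp(-((t / scale) ** shape))

def conditional_survival(survival, survived, further):
    """P(X > survived + further | X > survived) for a survival function."""
    return survival(survived + further) / survival(survived)

# Hypothetical parameters for illustration only.
rate = 0.05
shape, scale = 2.0, 15.0

exp_new = conditional_survival(lambda t: exponential_survival(t, rate), 0.0, 5.0)
exp_old = conditional_survival(lambda t: exponential_survival(t, rate), 10.0, 5.0)
wei_new = conditional_survival(lambda t: weibull_survival(t, shape, scale), 0.0, 5.0)
wei_old = conditional_survival(lambda t: weibull_survival(t, shape, scale), 10.0, 5.0)

print(f"exponential: {exp_new:.4f} vs {exp_old:.4f}")  # equal: history is ignored
print(f"Weibull:     {wei_new:.4f} vs {wei_old:.4f}")  # lower after 10 years survived
```

An initial distribution that ignores elapsed time in this way starts the recursion from a less informative prior, which is consistent with the less accurate predictions reported for the exponential case.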

5.2. The proposed model vs. other prediction methods

To further illustrate the proposed model’s effectiveness, we compare it with other methods of residual life prediction. Among the life prediction methods for human beings, regression analysis and the life table are the two most commonly used. Regression analysis is a classical and straightforward statistical method for prediction, especially with a large amount of data. The life table is a statistical table reflecting the deaths of a group of people (usually 10,000 people), compiled according to age-specific mortality. The comparisons of the proposed model with multiple regression and with the life table are presented below.

5.2.1. Comparison with multiple linear regression

The formula of the multiple linear regression is $x_i = l_s s_i + l_m m_i + l_a + \epsilon$, where $l_a$ is the constant, $\epsilon$ is the random error following $N(0, \sigma^2)$, and $l_s$ and $l_m$ are the coefficients of the first principal components of ADL and memory, respectively. Using the same data as in the fitting of the proposed model, the values of $l_a$, $l_s$, $l_m$, and $\sigma$ are estimated as 6.55318, 0.26129, 0.07349, and 0.25761, respectively. We then used this multiple linear regression model to predict the residual life, $\hat{x}_i$. Under the linear regression, the prediction distribution is the normal distribution $N(\hat{x}_i, \sigma^2)$. Finally, we compared the results from the fitted model and the multiple regression, as shown in Figure 4.

Figure 4. Comparison of the predicted residual life distributions from the proposed model and multiple linear regression.


The residual life prediction using the proposed model is more accurate than using multiple linear regression, especially at the last two survey times when more historical health information was collected and integrated into the model.
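The regression benchmark can be reproduced in a few lines. The sketch below uses the coefficient values as printed above; the function names and the principal-component scores passed in the usage lines are hypothetical, since the participant's actual scores are not listed here.

```python
import math

# Estimates as printed in the text: constant, ADL coefficient, memory coefficient, sigma.
L_A, L_S, L_M, SIGMA = 6.55318, 0.26129, 0.07349, 0.25761

def predict_residual_life(s_i, m_i):
    """Point prediction x_hat = l_s * s_i + l_m * m_i + l_a."""
    return L_S * s_i + L_M * m_i + L_A

def prediction_density(x, s_i, m_i):
    """Density of the prediction distribution N(x_hat, sigma^2) at x."""
    x_hat = predict_residual_life(s_i, m_i)
    z = (x - x_hat) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))

# Hypothetical first-principal-component scores for ADL and memory.
s_example, m_example = 1.0, -2.0
x_hat_example = predict_residual_life(s_example, m_example)
peak_density = prediction_density(x_hat_example, s_example, m_example)

print(f"predicted residual life: {x_hat_example:.5f}")
print(f"density at the prediction: {peak_density:.4f}")
```

Because $\sigma$ is fixed, the width of this prediction distribution is the same at every survey; only its centre moves with the scores, unlike the recursive model, whose distribution narrows as surveys accumulate.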

5.2.2. Comparison with life table

Since the survey data ended in 2011, we use the 2010 life table of the Chinese population, as shown in Table 5. To provide a comparison, a normal distribution based on the life table is constructed. Its expectation is the residual life from the life table, while the standard deviation is set to 1 since the expected residual life is updated with age annually. We then compare the results of the fitted model and the life table, as shown in Figure 5.

Figure 5. Comparison of the predicted residual life distributions from the proposed model and the life table.


Table 5. Life table of Chinese population in 2010.

Our results show that the residual life prediction using the proposed model is more accurate than that using a life table. Generally, the residual life predicted from the life table is longer. Unlike the life table, the proposed model can update its prediction according to the historical health information.

6. Discussion

The proposed model in this study is based on stochastic filtering theory, which addresses drawbacks in the quantitative analysis of previous statistical models. The proposed model can use time-series data through a recursive method to constantly update the prediction and improve its reliability. In terms of the data, ADL and memory were used to extract an indicator that comprehensively reflects people’s health status. This approach differs from previous health assessments based on biochemical indicators. The information we used is a more comprehensive reflection of health level, emphasizing the ability to live. The method is therefore easier to promote and apply, especially in economically underdeveloped areas with poor medical conditions.

According to the research results, ADL and memory provide adequate information for estimating residual life. As health deteriorates, residual life decreases and, at the same time, ADL and memory worsen. If ADL and memory show that this deterioration is accelerating, the decline in residual life will also accelerate until the end of life. This implies that society should pay more attention to people’s ADL and memory to guard against health deterioration. In addition, for health management agencies and insurance companies, ADL and memory should become important references for guiding their work.

7. Conclusions

This study presents an improved method to estimate the residual life of the elderly by using a recursive model. Compared with the models in previous studies, the proposed model integrates the health information extracted from ADL and memory, which is the main advance of this study. This paper provides a physiological relationship between health status and ADL and memory. With memory introduced as a mental factor, the usability and feasibility of stochastic filtering are further verified. As ageing is an increasingly serious issue in many countries, governments and the public should pay attention to the ADL and memory of the elderly.

There are two possible limitations of the proposed method. On the one hand, the method is somewhat complex in terms of its mathematical calculation. However, this limitation can be addressed by implementing the method as algorithms and models embedded in software. On the other hand, the accuracy and precision of the prediction increase with the quantity of historical information, so the method requires an effective health information recording system or organization.

This article can be extended in a few directions. One direction is to extend the method to the young population. Another is to apply the method to other survey data where the health information may be different. It would also be meaningful to study the optimization of health behaviour based on the predictions of the proposed method.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research was funded by the National Natural Science Foundation of China [grant number 72001027] and Beijing Municipal Education Commission [grant number KM202111232007].

References