
Asymptotic behavior of encompassing test for independent processes: Case of linear and nearest neighbor regressions

Article: 1805092 | Received 23 Oct 2019, Accepted 26 Jul 2020, Published online: 17 Aug 2020

Abstract

The encompassing test has been well developed for fully parametric modeling. In this study, we are interested in encompassing tests for parametric and nonparametric regression methods. We consider linear regression for the parametric model and nearest neighbor regression for the nonparametric method. We establish the asymptotic normality of the encompassing statistic associated with the encompassing hypotheses for the linear parametric method and the nonparametric nearest neighbor regression estimate. We also obtain a convergence rate depending only on the number of neighbors k, whereas for the kernel method it depends on the number of observations n and the bandwidth h_n; the two rates coincide when h_n = k/n. Moreover, the asymptotic variance of the encompassing statistic associated with kernel regression depends on the density; this is not the case for the nearest neighbor regression estimate.

PUBLIC INTEREST STATEMENT

Regression techniques are used as quantitative analysis methods in many fields, such as economic and financial modeling. They are a useful tool for identifying the factors that may explain the evolution of a variable of interest. In economic modeling, for example, when we want to analyze the evolution of Gross Domestic Product (GDP), it might be affected by many variables such as the interest rate, inflation, the exchange rate, and sentiment indicators. Researchers or experts may face several admissible models from parametric and/or nonparametric regression methods. The encompassing test can be helpful for detecting redundant models among the admissible ones. The findings in this study contribute to the encompassing test between linear and nearest neighbor regression estimates.

1. Introduction

Encompassing tests belong to the model selection step. They are used to detect redundant models among admissible models. In that case, an encompassing model is intended to account for the results found by the encompassed model. Theoretical developments on the encompassing test can be found in Mizon (1984), Gouriéroux and Monfort (1995) and Florens et al. (1996). For applications, we refer readers to the general-to-specific computer-based model selection procedure of Hendry and Doornik (1994).

Recently, Bontemps et al. (2008) developed an encompassing test for linear parametric against kernel nonparametric regression methods. They provided the asymptotic normality of the associated encompassing statistics under the independent and identically distributed (i.i.d.) hypothesis. As stated in Hendry et al. (2008), the work of Bontemps et al. (2008) is the first treatment of encompassing tests for functional parameters based on nonparametric methods.

We extend this result to the nearest neighbor regression method, which has been claimed to be more flexible than the kernel method. A further motivation is its interest in applications, as in Nowman and Saltoglu (2003), Guégan and Huck (2005), Ferrara et al. (2010), Guégan and Rakotomarolahy (2010), and Puspitasari and Rustam (2018), among others.

In the next section, we provide an overview of the encompassing test. We then establish asymptotic normality for the various encompassing statistics associated with the linear parametric and nearest neighbor regression methods, illustrate the results on real data, and conclude.

2. Encompassing test for independent processes

This section introduces the encompassing test and then builds the corresponding encompassing hypotheses. Given two regression models M1 and M2, we are interested in knowing whether model M1 can account for the results of model M2; in other words, whether M1 encompasses M2, written M1 E M2 for short. Such a hypothesis is tested using the notion of an encompassing test.

Generally speaking, model M1 encompasses model M2 if the parameter θ_{M2} of the latter model can be expressed as a function of the parameter θ_{M1} of the former. Let Δ(θ_{M1}) denote the pseudo-true value of θ_{M2} on M1; in general, the pseudo-true value is defined as the probability limit of θ̂_{M2} under M1. For more discussion of the pseudo-true value associated with the KLIC (see Note 1), we refer to Sawa (1978) and Govaerts et al. (1994). The encompassing statistic is given by the difference between θ̂_{M2} and Δ(θ̂_{M1}), scaled by a coefficient a_n.

Let S = (Y, X, Z) be a zero-mean random process with values in R × R^d × R^q, where d, q ∈ N. For x ∈ R^d and z ∈ R^q, we consider the two models M1 and M2 defined as follows:

$$M_1: \ m(x) = E[Y \mid X = x] \qquad \text{and} \qquad M_2: \ g(z) = E[Y \mid Z = z]. \tag{1}$$

In addition, the general unrestricted model is given by r(x, z) = E[Y | X = x, Z = z]. Following the encompassing test for functional parameters in Bontemps et al. (2008), we have the null hypothesis:

$$H: \ E[Y \mid X = x, Z = z] = E[Y \mid X = x].$$

This null states that M1 is the owner model; M2 serves to validate this statement and is called the rival model. We test this hypothesis H through the following implicit encompassing hypothesis:

$$H: \ E\big[\,E[Y \mid X = x] \mid Z = z\,\big] = E[Y \mid Z = z].$$

The following homoskedasticity condition will be assumed throughout this work:

$$\operatorname{Var}[Y \mid X = x, Z = z] = \sigma^2. \tag{2}$$

Moreover, a necessary condition for the encompassing test relies on the errors of both models: the intended encompassing model M1 should have a smaller standard error than the encompassed model M2.

Given a sample of size n, s_i = (y_i, x_i, z_i) for i = 1, …, n, as realizations of the random process S = (Y, X, Z), we suppose that the s_i, i = 1, …, n, are i.i.d. Then, for given functional estimates m_n and g_n of the functions m and g, respectively, we have the following encompassing statistic:

$$\hat{\delta}_{m_n, g_n} = g_n - \hat{G}(m_n),$$

where Ĝ(m_n) is an estimate of the pseudo-true value associated with g_n under H, i.e. the left-hand side of the hypothesis H. Bontemps et al. (2008) provided the asymptotic normality of this encompassing statistic δ̂ by considering the kernel regression estimate for the nonparametric method. This result can be extended to the nearest neighbor regression estimate, of course under different assumptions.

For the nearest neighbor regression estimate, we consider the representation in Mack (1981); that is, the k nearest neighbor (k-NN) estimate g_n of g is given by:

$$g_n(z) = \frac{\dfrac{1}{n R_n^q} \sum_{i=1}^{n} w\!\Big(\dfrac{z - Z_i}{R_n}\Big) Y_i}{\dfrac{1}{n R_n^q} \sum_{i=1}^{n} w\!\Big(\dfrac{z - Z_i}{R_n}\Big)},$$

where R_n is the distance, in the Euclidean norm on R^q, from z to its k(n)-th nearest neighbor, and w(u) is a bounded, non-negative weight function satisfying

$$\int w(u)\,du = 1 \qquad \text{and} \qquad w(u) = 0 \ \text{for} \ |u| \geq 1. \tag{3}$$
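As an illustration, Mack's representation of the k-NN estimate above can be sketched in a few lines of code. This is a minimal sketch, not the implementation used in the paper; the function names are our own, and the uniform kernel on the unit ball is one admissible choice of w satisfying relation (3).

```python
import numpy as np

def uniform_weight(u):
    """Uniform kernel on the unit interval: integrates to 1 in 1-D and vanishes for |u| >= 1."""
    return 0.5 * (np.abs(u) < 1.0)

def knn_regression(z, Z, Y, k, w=uniform_weight):
    """Mack-style k-NN estimate of g(z) = E[Y | Z = z].

    Z : (n,) or (n, q) array of covariates, Y : (n,) array of responses.
    R_n is the Euclidean distance from z to its k-th nearest neighbour; the
    common factor 1/(n R_n^q) cancels between numerator and denominator.
    """
    Z = np.atleast_2d(np.asarray(Z, float).T).T   # ensure shape (n, q)
    z = np.atleast_1d(np.asarray(z, float))
    dists = np.linalg.norm(Z - z, axis=1)
    R_n = np.sort(dists)[k - 1]                   # distance to the k-th neighbour
    weights = w(dists / R_n)                      # zero outside the k-NN ball
    return np.sum(weights * Y) / np.sum(weights)
```

With the uniform weight, the estimate is simply a local average of the responses whose covariates fall strictly inside the ball of radius R_n around z.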

To establish the asymptotic distribution of δ̂_{m_n, g_n}, we need some assumptions. The following assumptions, taken from Mack (1981), ensure asymptotic normality. Without loss of generality, the function f denotes a marginal density, a conditional density or a joint density according to its arguments.

The first assumption concerns the density function of the couple (Y, Z).

Assumption 1. The function χ_β(z) = ∫ y^β f(z, y) dy is bounded and continuous at z for β = 0, 1, 2, and continuously differentiable in a neighborhood of z for β = 0, 1.

The next assumption imposes moment conditions up to order three on the variable of interest.

Assumption 2. E[|Y|³] < ∞, Var[Y | Z = z] > 0 and f(z) > 0.

The last assumption states conditions on the relationship between the number of neighbors k and the sample size n.

Assumption 3. k = n^α with 0 < α < 4/(4+d).

When Assumptions 1–3 hold and relation (3) is satisfied, Mack (1981) established the asymptotic normality of the centered k-NN regression estimate g_n. Moreover, under Assumption 3, the bias of the k-NN regression estimate vanishes.

Without loss of generality, we proceed in the same way when model M1 is estimated by the k-NN regression method. In the rest of the paper, N(μ, v) denotes the normal distribution with mean μ and variance v. We now present the asymptotic normality of the encompassing statistic.

3. Asymptotic normality of the encompassing statistic

In general, M1 or M2 can be estimated using nonparametric or parametric regression methods. Four situations can arise: both M1 and M2 are estimated parametrically; both are estimated nonparametrically; M1 is estimated nonparametrically and M2 parametrically; and M1 is estimated parametrically and M2 nonparametrically.

For developments on the asymptotic behavior of the encompassing statistic in the fully parametric case, i.e. when the two models M1 and M2 have parametric specifications, we refer readers to Gouriéroux et al. (1983) and Mizon and Richard (1986), among others. For a recent discussion of the encompassing test in the fully parametric case, Bontemps et al. (2008) is a good reference.

Next, we will study the completely nonparametric case.

3.1. Nonparametric specification for M1 and M2

We consider the case where the two models M1 and M2 defined in (1) are both estimated using the nonparametric nearest neighbor regression method. To test the hypothesis "M1 encompasses M2", we establish the asymptotic normality of the associated encompassing statistic.

Theorem 3.1. Assume that Assumptions 1–3 and relations (2) and (3) hold. Then, under H, we have:

$$\sqrt{k}\,\hat{\delta}_{m_n, g_n}(z) \longrightarrow N\!\Big(0,\; c \cdot \operatorname{Var}(\epsilon \mid Z = z) \int w^2(u)\,du\Big) \quad \text{in distribution as } n \to \infty,$$

where ε_i = Y_i − m(x_i), i = 1, …, n, are the residuals from model M1 and c = π^{q/2}/Γ((q+2)/2) is the volume of the unit ball in R^q, with Γ(·) the gamma function.
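The scaling constant c can be checked numerically. The short sketch below (with a hypothetical helper name) reproduces the familiar unit-ball volumes 2, π and 4π/3 for q = 1, 2, 3.

```python
import math

def unit_ball_volume(q):
    """Volume of the unit ball in R^q: c = pi^(q/2) / Gamma((q+2)/2)."""
    return math.pi ** (q / 2) / math.gamma((q + 2) / 2)
```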

Proof of Theorem 3.1

The proof is based on decomposing the encompassing statistic into two parts: an expression of a nearest neighbor regression and a bias-type term. First, let us denote:

$$W\!\Big(\frac{z - Z_i}{R_n}\Big) = \frac{\dfrac{1}{n R_n^q}\, w\!\Big(\dfrac{z - Z_i}{R_n}\Big)}{\dfrac{1}{n R_n^q} \sum_{i=1}^{n} w\!\Big(\dfrac{z - Z_i}{R_n}\Big)}.$$

Writing the encompassing statistic in terms of the estimates g_n and Ĝ(m_n) at a given point z, we have:

$$\begin{aligned}
\sqrt{k}\,\hat{\delta}_{m_n, g_n}(z) &= \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) Y_i - \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) m_n(x_i) \\
&= \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) \epsilon_i + \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) \big(m(x_i) - m_n(x_i)\big) = A + B,
\end{aligned}$$

where A is the first expression on the RHS of the equality. It involves a k-NN regression of ε_i = Y_i − m(x_i) on Z_i, scaled by the coefficient √k, which plays the role of the convergence rate as n goes to infinity. Using Mack (1981), under Assumptions 1–3 and when relation (3) holds, we have:

$$A \longrightarrow N\!\Big(0,\; c \cdot \operatorname{Var}(\epsilon \mid Z = z) \int w^2(u)\,du\Big) \quad \text{in distribution as } n \to \infty.$$

Next, the second expression B can be bounded by taking its supremum with respect to x_i:

$$|B| \leq \sup_{x_i} \sqrt{k}\,\big|m_n(x_i) - m(x_i)\big| \leq \sup_{x_i} \sqrt{k}\,\big|m_n(x_i) - E[m_n(x_i)]\big| + \sup_{x_i} \sqrt{k}\,\big|E[m_n(x_i)] - m(x_i)\big| = B_1 + B_2. \tag{4}$$

Using the expression of the bias, Theorem 1 in Mack (1981), B_2 becomes:

$$B_2 = \Big(\sup_{x_i} A(x_i)\Big) \Big(\frac{k}{n}\Big)^{2/d} \sqrt{k} + o\!\Big(\Big(\frac{k}{n}\Big)^{2/d} \sqrt{k}\Big) + O\!\Big(\frac{1}{k}\Big) \sqrt{k},$$

where A(·) is a function depending only on x_i, whose expression can be found in Mack (1981). Then, from Assumption 3, B_2 vanishes as n → ∞. It remains to show that B_1 also goes to zero. This can be achieved using Mukerjee's (1993) extension of Cheng's work (Cheng, 1984). We remark that as the number of neighbors k increases, the weights given to the neighbors decrease; rewriting m_n(x_i), we have the following equivalence:

$$m_n(x_i) = \frac{\sum_{j=1}^{n} K\!\Big(\dfrac{x_i - X_j}{R_i}\Big) Y_j}{\sum_{j=1}^{n} K\!\Big(\dfrac{x_i - X_j}{R_i}\Big)} \equiv \sum_{j=1}^{n} \frac{c_j}{k}\, Y_j,$$

where K(·) is a given weight function satisfying condition (3), c_j is a bounded weight equal to zero when j is larger than the number of neighbors, and R_i is the distance between x_i and its k-th neighbor. Denoting m̃_n(x_i) = Σ_{j=1}^n (c_j/k) Y_j, Theorem 2.1 in Mukerjee (1993) then gives:

$$B_1 = \sup_{x_i} \big|\tilde{m}_n(x_i) - E[\tilde{m}_n(x_i)]\big| = O(\theta_n) + O\!\big(n^{-(r-1)/r}\big),$$

with r > 1 and θ_n a positive sequence tending to zero as n → ∞. Hence |B| converges to zero in probability, as does B_1. This completes the proof of the theorem. □

Next, we consider the mixed situation where the owner model has a parametric specification and the rival model is nonparametric.

3.2. Parametric modelling for M1 vs nonparametric specification for M2

In this section, we consider the case where model M1 is a linear parametric model and M2 is estimated by the nearest neighbor regression technique. Therefore, the hypothesis H has a linear parametric specification. The encompassing statistic associated with the null M1 E M2 can be rewritten as follows:

$$\hat{\delta}_{\beta, g}(z) = g_n(z) - \hat{G}_L(\hat{\beta})(z), \tag{5}$$

where Ĝ_L(β̂) is an estimate of the pseudo-true value G_L(β)(z) associated with g_n under H, defined as G_L(β)(z) = β′E[X | Z = z].

We estimate the rival model M2 using the k-NN regression method while the owner model M1 keeps its linear parametric specification. The following theorem provides the asymptotic normality of the encompassing statistic introduced in relation (5).

Theorem 3.2. Assume that Assumptions 1–3 and relations (2) and (3) hold.

Then under H, we get:

$$\sqrt{k}\,\hat{\delta}_{\beta, g}(z) \longrightarrow N(0, \Sigma) \quad \text{in distribution as } n \to \infty,$$

where Σ = c σ² ∫ w²(u) du, with c the volume of the unit ball in R^q.

Proof of Theorem 3.2.

When the owner model M1 is the linear parametric regression and the rival model M2 is the k-NN regression, we write the encompassing statistic as follows:

$$\begin{aligned}
\sqrt{k}\,\hat{\delta}_{\beta, g}(z) &= \sqrt{k}\big(g_n(z) - \hat{G}_L(\hat{\beta})(z)\big) = \sqrt{k}\left(\sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) Y_i - \sum_{i=1}^{n} \tilde{W}\!\Big(\frac{z - Z_i}{\tilde{R}_n}\Big) \hat{\beta}' X_i\right) \\
&= \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) (Y_i - \beta' X_i) + \sqrt{k} \sum_{i=1}^{n} \tilde{W}\!\Big(\frac{z - Z_i}{\tilde{R}_n}\Big) (\beta - \hat{\beta})' X_i \\
&\quad + \sqrt{k} \sum_{i=1}^{n} W\!\Big(\frac{z - Z_i}{R_n}\Big) \beta' X_i - \sqrt{k} \sum_{i=1}^{n} \tilde{W}\!\Big(\frac{z - Z_i}{\tilde{R}_n}\Big) \beta' X_i \\
&= N_1 + N_2 + N_3 - N_4,
\end{aligned} \tag{6}$$

where
$$\tilde{W}\!\Big(\frac{z - Z_i}{\tilde{R}_n}\Big) = \frac{\dfrac{1}{n \tilde{R}_n^q}\, \tilde{w}\!\Big(\dfrac{z - Z_i}{\tilde{R}_n}\Big)}{\dfrac{1}{n \tilde{R}_n^q} \sum_{i=1}^{n} \tilde{w}\!\Big(\dfrac{z - Z_i}{\tilde{R}_n}\Big)}$$
is the weight associated with the nearest neighbor regression of β̂′X_i on Z_i, and R̃_n is the distance from z to its k̃-th neighbor.

We remark that if Y_i and its fitted value β̂′X_i had the same Z_i nearest to z, we would have N_3 − N_4 = 0. This holds asymptotically: Y_i and β̂′X_i, as a fitted value of Y_i, share the same Z_i nearest to z when k and k̃ tend to infinity. Thus, N_3 is asymptotically equivalent to N_4.

For the first expression, N_1 = √k Σ_{i=1}^n W((z − Z_i)/R_n) ε_i, with ε_i = Y_i − β′X_i. Under the assumptions of Theorem 3.2, using the result in Mack (1981), we have:

$$N_1 \longrightarrow N(0, \Sigma) \quad \text{in distribution as } n \to \infty,$$

where Σ = c σ² ∫ w²(u) du.

For N_2 = (β − β̂)′ √k Σ_{i=1}^n W̃((z − Z_i)/R̃_n) X_i: under the assumptions of Theorem 3.2, the estimate √n(β − β̂) converges in distribution to a normal law with mean zero, while the remaining part, √(k/n) Σ_{i=1}^n W̃((z − Z_i)/R̃_n) X_i, converges in distribution to zero. Thus, from Slutsky's theorem, N_2 tends to zero in distribution. □

We now consider the last case, where the owner model M1 is a nonparametric method and the rival model M2 is a linear parametric model.

3.3. Nonparametric specification for M1 vs parametric modelling for M2

We now consider the owner model M1 to be estimated using k-NN nonparametric regression and the rival model M2 to be a linear parametric model. The encompassing statistic associated with the null M1 E M2 is given by:

$$\hat{\delta}_{m, \gamma} = \hat{\gamma} - \hat{\gamma}(m_n), \tag{7}$$

where γ̂(m_n) is an estimate of the pseudo-true value γ(m) associated with γ̂ under H, defined by γ(m) = (E[ZZ′])⁻¹E[Zm]. We estimate the unknown conditional mean m associated with model M1 using the k-NN regression estimate. The following theorem states the asymptotic normality of the encompassing statistic in relation (7). For precision, we use the assumptions introduced in the previous section for the k-NN regression estimate m_n instead of g_n.

Theorem 3.3. Assume that relations (2) and (3), Assumptions 1–3, and the regularity conditions of linear regression are satisfied.

Then under H, we get:

$$\sqrt{n}\,\hat{\delta}_{m, \gamma} \longrightarrow N(0, \Omega) \quad \text{in distribution as } n \to \infty,$$

where Ω = σ²(E[ZZ′])⁻¹.

Proof of Theorem 3.3.

When the functional parameter m_n is the k-NN regression estimate, we rewrite the associated encompassing statistic as follows:

$$\begin{aligned}
\sqrt{n}\,\hat{\delta}_{m, \gamma} &= \sqrt{n}\big(\hat{\gamma} - \hat{\gamma}(m_n)\big) = \sqrt{n}\left(\Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} Z_i Y_i - \Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} Z_i\, m_n(x_i)\right) \\
&= \sqrt{n}\Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} Z_i\big(Y_i - m(x_i)\big) + \sqrt{n}\Big(\frac{1}{n}\sum_{i=1}^{n} Z_i Z_i'\Big)^{-1} \frac{1}{n}\sum_{i=1}^{n} Z_i\big(m(x_i) - m_n(x_i)\big) \\
&= L_1 + L_2,
\end{aligned} \tag{8}$$

where L_1 corresponds to the first expression on the RHS of equality (8). It coincides with the linear regression of the error ε (with ε_i = Y_i − m(x_i)) on Z.

Under the i.i.d. assumption of Theorem 3.3, L_1 converges in distribution to a normal law with mean zero and variance Ω = σ²(E[ZZ′])⁻¹. For the second expression L_2, we bound it by taking the supremum with respect to x_i: |L_2| ≤ √n S_n D_n sup{|m(x_i) − m_n(x_i)|, x_i ∈ R^d}, where S_n = (1/n)Σ_{i=1}^n |Z_i| and D_n = ((1/n)Σ_{i=1}^n Z_iZ_i′)⁻¹. We remark that √n sup{|m(x_i) − m_n(x_i)|, x_i ∈ R^d} is asymptotically equivalent to the bound of |B| in Equation (4), which converges to zero in probability. Hence the product also vanishes by Slutsky's theorem. This completes the proof. □

4. Illustration

In this section, we illustrate our theoretical results on real data. We focus on socio-economic determinants of life expectancy. As explanatory variables for life expectancy at birth, we consider gross national income per capita (US$), gross domestic product per capita (US$) and government health expenditure per capita (US$). The impact of these variables on life expectancy at birth has long been analyzed in the literature; for regression analyses we may look at Hussain (2002) and Ali and Ahmad (2014). We use cross-sectional data for 169 countries in 2017, collected from the United Nations and World Health Organization websites. To start our empirical study, we compute some basic statistics.

The highest life expectancy reaches a remarkable 84 years, in Japan, while the lowest, around 52 years, belongs to the Central African Republic. The mean life expectancy of 72 years is also noteworthy. Moreover, the median value of 73.69 indicates that around 84 countries have a life expectancy above 73 years, well beyond typical retirement ages.

Among the socio-economic variables, Luxembourg, Switzerland and the USA have the highest GDP, income and health expenditure per capita, respectively. Burundi has the lowest GDP and income per capita, while the Democratic Republic of the Congo registers the lowest government spending on health care. These variables exhibit some common behaviors: the median of each variable is around fifteen times its minimum and one fifteenth of its maximum, and they all display high dispersion. We now analyze their relationship with life expectancy.

Let us compute the correlation coefficients between life expectancy and the predictor variables.

From Table 2, life expectancy has a positive and high correlation with each explanatory variable. Such correlations indicate that higher GDP, income or expenditure on health is associated with longer life expectancy. This preliminary analysis motivates us to explore further statistical and econometric analyses of the relationship between life expectancy and the three socio-economic variables, using the linear and nearest neighbor regression methods. In the sequel, we work with demeaned variables scaled by a factor 1/(max − min).

Table 1. Summary statistics

Table 2. Correlation between life expectancy and the socio-economic variables
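The demeaning and range scaling applied to the variables before regression can be sketched as follows. This is a minimal illustration with a hypothetical function name, not the code used for the study.

```python
import numpy as np

def demean_and_scale(x):
    """Centre a variable and divide by its range: x -> (x - mean(x)) / (max(x) - min(x))."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.max() - x.min())
```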

For the linear regression, we explain life expectancy at birth Y by health expenditure per capita X, gross income per capita Z and GDP per capita W. Considering several combinations of these explanatory variables, we summarize below the regression coefficient estimates with their standard errors in parentheses.

$$\begin{aligned}
M_1:\ & Y_i = \underset{(0.02)}{0.8}\,X_i + \hat{u}_i^1 \\
M_2:\ & Y_i = \underset{(0.006)}{0.76}\,Z_i + \hat{u}_i^2 \\
M_3:\ & Y_i = \underset{(0.008)}{0.89}\,W_i + \hat{u}_i^3 \\
M_4:\ & Y_i = \underset{(0.21)}{0.26}\,X_i + \underset{(0.18)}{0.96}\,Z_i + \hat{u}_i^4 \\
M_5:\ & Y_i = \underset{(0.19)}{0.02}\,X_i + \underset{(0.18)}{0.86}\,W_i + \hat{u}_i^5 \\
M_6:\ & Y_i = \underset{(0.21)}{0.28}\,X_i + \underset{(0.44)}{1.24}\,Z_i - \underset{(0.46)}{0.32}\,W_i + \hat{u}_i^6
\end{aligned} \tag{9}$$

where û^j is an estimate of the error term of model M_j, j = 1, …, 6.
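The coefficient estimates and standard errors reported in (9) come from ordinary least squares without an intercept, the variables being demeaned. A minimal sketch of such a fit, with hypothetical function names and classical (homoskedastic) standard errors, is:

```python
import numpy as np

def ols_no_intercept(y, X):
    """OLS fit of y on X without intercept (variables already demeaned).

    Returns the coefficient estimates and their classical standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * XtX_inv))   # standard errors of the coefficients
    return beta, se
```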

The coefficients of models M1, M2 and M3 are all significant. In contrast, models M4 and M6 reduce to model M2 owing to the non-significance of the X and W coefficient estimates, and M5 reduces to M3 as the coefficient estimate of X is not significant. We therefore focus our analysis on models M1, M2 and M3 and proceed with their diagnostics. Results are reported in Table 3.

Table 3. Regression diagnostics

From Table 3, we accept the homoscedasticity of the residuals and their non-correlation with the predictors. In addition, the residuals of the three models have zero mean. Thus, our three models meet the standard assumptions of linear regression.

M1, M2 and M3 are non-nested models, so the choice among them is based on the encompassing test. A necessary condition is that the encompassing model should fit better than the encompassed model; the encompassing model is therefore expected to have a smaller error variance than its rival. The standard errors of models M_i, i = 1, 2, 3, are σ1 = 0.192, σ2 = 0.179 and σ3 = 0.182, respectively. Among the three models, M1 thus has the worst fit and M2 the best. We report in Table 4 the various encompassing tests associated with models M1, M2 and M3.

Table 4. Encompassing tests for models M1, M2 and M3

From Table 4, we accept the nulls M2 E M3 and M3 E M1, that is, M2 encompasses M3 and M3 encompasses M1. In contrast, we reject M3 E M2 and M1 E M3, so there is no mutual encompassing. Thus, we retain model M2, which also has the smallest standard error. We now re-examine the link between life expectancy and the explanatory variables using nearest neighbor regression.

For the k-NN regression of life expectancy, we need to specify the weighting function w(·) and estimate the parameter k. Two weighting functions have mostly been used in the literature: the exponential function exp(−‖z − Z_(i)‖²) / Σ_{j=1}^{k} exp(−‖z − Z_(j)‖²), with (Z_(i))_{i=1,…,k} the k points nearest to z, and the uniform function 1/k. We consider these two weighting functions.

Assumption 3 states that the number k should satisfy 1 < k = n^α < n^{4/(4+d)}, for n observations and d explanatory variables. Then, as n = 169, the maximum values for k are 60, 30 and 18 for d = 1, d = 2 and d = 3, respectively. We estimate the parameter k by minimizing the root mean squared error (RMSE). Results are summarized in Table 5, where we keep the notation already used in the linear regression: X for health expenditure per capita, Z for gross income per capita and W for GDP per capita.
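The upper bounds on k can be recomputed directly from Assumption 3. The sketch below (hypothetical helper name) evaluates the largest integer strictly below n^{4/(4+d)}:

```python
import math

def max_neighbours(n, d):
    """Largest admissible integer k with k = n**alpha and alpha < 4/(4+d) (Assumption 3)."""
    bound = n ** (4.0 / (4.0 + d))
    k = math.floor(bound)
    # the inequality is strict, so step down when the bound is itself an integer
    return k - 1 if k == bound else k
```

For n = 169 this reproduces the values 60, 30 and 18 quoted in the text.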

Table 5. Specification of k-NN regression estimates

Among models M7 to M11, model M10 has the lowest standard error. We also remark that the standard errors of models M9 and M10 are very close. We check whether model M10 can account for the results of the other models and whether there is mutual encompassing between M10 and M9. We compute the following standardized encompassing statistic using the result developed in Theorem 3.1:

$$\delta_s = \frac{\sqrt{k}\,\hat{\delta}}{\sqrt{\dfrac{\pi^{q/2}}{\Gamma((q+2)/2)} \cdot \operatorname{Var}(\epsilon \mid Z = z) \displaystyle\int w^2(u)\,du}},$$

where δ̂ is the k-NN regression of the residuals ε of the owner model on the explanatory variables Z of the rival model. Results are reported in Table 6.
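For illustration, the standardization of Theorem 3.1 can be sketched as follows, assuming the conditional variance and the integral of w² have been estimated separately (all names are hypothetical):

```python
import math

def standardized_encompassing_stat(delta_hat, k, q, var_eps_given_z, w2_integral):
    """Standardize sqrt(k) * delta_hat by the asymptotic standard deviation
    of Theorem 3.1; var_eps_given_z and w2_integral are assumed given."""
    c = math.pi ** (q / 2) / math.gamma((q + 2) / 2)  # volume of the unit ball in R^q
    return math.sqrt(k) * delta_hat / math.sqrt(c * var_eps_given_z * w2_integral)
```

With q = 1 and the uniform weight w = 1/2 on [-1, 1] (so that the integral of w² equals 1/2), the denominator reduces to the square root of the conditional variance.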

Table 6. Encompassing tests for models M7 to M11

The values in Table 6 are all less than 1.96 in absolute value, except for M9 E M10. We accept the null hypotheses M10 E M8, M8 E M7, M10 E M9 and M10 E M11. In other words, M10 accounts for the information content of the other models. As M9 does not encompass M10, there is no mutual encompassing. Thus, we retain model M10 among all the k-NN regression models.

The next illustration concerns the encompassing test between nonparametric and parametric regression techniques in Theorem 3.3, with the null hypothesis that the nearest neighbor regression M10 encompasses the linear regression M2. Under this null, Theorem 3.3 yields the following statistic:

$$\delta_S = \frac{\sqrt{n}\,\hat{\delta}}{\sqrt{\hat{\Omega}}} = \frac{\sqrt{n}\left(\sum_{i=1}^{169} Z_i^2\right)^{-1} \sum_{i=1}^{169} \hat{e}_i Z_i}{\sqrt{\hat{\sigma}^2 \left(\frac{1}{n}\sum_{i=1}^{169} Z_i^2\right)^{-1}}} = \frac{\sum_{i=1}^{169} \hat{e}_i Z_i}{\sqrt{\hat{\sigma}^2 \sum_{i=1}^{169} Z_i^2}}, \tag{10}$$

where Ω̂ is an estimate of the asymptotic variance Ω, ê_i are the residuals of model M10, and σ̂² is a k-NN regression estimate of the conditional variance σ² = Var(Y | X = x, Z = z).
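The simplified form on the right of relation (10) can be sketched directly; the residuals and the variance estimate are assumed given, and the function name is hypothetical:

```python
import numpy as np

def delta_S(e_hat, Z, sigma2_hat):
    """Simplified form of relation (10): sum(e_i * Z_i) / sqrt(sigma2_hat * sum(Z_i^2))."""
    e_hat = np.asarray(e_hat, dtype=float)
    Z = np.asarray(Z, dtype=float)
    return np.sum(e_hat * Z) / np.sqrt(sigma2_hat * np.sum(Z ** 2))
```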

The absolute value of the standardized encompassing statistic, |δ_S| = 0.01, is less than 1.96. Therefore, we accept the null hypothesis at the 5% risk level, i.e. the nearest neighbor regression M10 encompasses the linear regression M2. We conclude that we may retain the k-NN regression of life expectancy on health expenditure and income.

5. Conclusion

Different approaches to encompassing tests in the literature provide different results. We have considered the encompassing test in an asymptotic way, in line with the encompassing principle announced in the introduction. The work has been conducted for parametric and nonparametric regression techniques.

As stated in Hendry et al. (2008), the work of Bontemps et al. (2008) is the first treatment of encompassing tests for functional parameters based on nonparametric methods. We have extended that work to the nearest neighbor functional parameter estimate under the i.i.d. assumption. Using linear and nearest neighbor regressions as estimators of the conditional expectations, we have established the asymptotic normality of the associated encompassing statistics for independent processes.

Comparing the convergence rate of the asymptotic encompassing statistic for the k-NN regression estimate with that of the kernel regression obtained by Bontemps et al. (2008), the former depends only on the number of neighbors k, while the latter depends on the number of observations n and the bandwidth h_n. The two rates coincide when h_n = k/n.

Moreover, Bontemps et al. (2008) obtained an asymptotic variance of the encompassing statistic associated with kernel regression that depends on the density, which is not the case for the nearest neighbor regression estimate.

The development of encompassing tests for nonparametric methods opens new research directions in theory as well as in practice.

Acknowledgments

The author thanks the anonymous referees and the Editor Professor Hiroshi Shiraishi.

Additional information

Funding

The author received no direct funding for this research.

Notes on contributors

Patrick Rakotomarolahy

Patrick Rakotomarolahy is an assistant professor in the Department of Mathematics and Their Applications at Fianarantsoa University. He completed his BSc and MSc in applied mathematics and received his doctorate from the Panthéon-Sorbonne Paris 1 University. His current research is in statistical model selection and in the modeling of macroeconomic and financial variables, with a particular focus on model selection between parametric and nonparametric techniques. This study is in line with this direction, as the findings on the asymptotic behavior of encompassing tests allow the detection of redundant models.

Notes

1. Kullback-Leibler Information Criterion

References