
A distribution-free test of independence based on a modified mean variance index

Pages 235-259 | Received 23 Feb 2022, Accepted 03 Apr 2023, Published online: 28 Apr 2023

Abstract

Cui and Zhong (2019, Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index for testing independence between a categorical random variable Y with R categories and a continuous random variable X. They ingeniously proved the asymptotic normality of the MV test statistic when R diverges to infinity, which brings many merits to the MV test, including making it more convenient for independence testing when R is large. This paper considers a new test called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem for martingale differences is used to show that the asymptotic null distribution of the standardized IPC test statistic with diverging R is also normal, so that the IPC test shares many merits with the MV test. As an application of this theoretical finding, the IPC test is extended to testing independence between continuous random variables. The finite-sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.

1. Introduction

As a fundamental task in statistical inference and data analysis, testing independence of random variables has been explored for decades in the literature. Many approaches have been proposed, depending on the types of the variables involved. For instance, if one wants to test independence between two categorical random variables, contingency table analysis and the Pearson chi-square test can be used. If both variables are continuous, there are also many important tests, such as Hoeffding (1948), Rosenblatt (1975), Csörgö (1985) and Zhou and Zhu (2018), among others. Testing independence between random vectors has also received much attention in recent years; see, for instance, Székely et al. (2007), Székely and Rizzo (2009), Heller et al. (2012), Zhu et al. (2017), Pfister et al. (2018) and Xu et al. (2020).

It is also important to test independence between a continuous variable and a categorical variable. Suppose $X$ is a continuous variable with support $\mathbb{R}_X$ and $Y \in \{1,\ldots,R\}$ is a categorical variable with $R$ categories. We are interested in the following hypothesis test: $H_0$: $X$ and $Y$ are independent, versus $H_1$: $X$ and $Y$ are not independent. Or, equivalently,
$$H_0:\ F(x) = F_r(x)\ \text{for any}\ x \in \mathbb{R}_X\ \text{and}\ r = 1,\ldots,R, \quad \text{versus} \quad H_1:\ F(x) \neq F_r(x)\ \text{for some}\ x \in \mathbb{R}_X\ \text{and some}\ r, \tag{1}$$
where $F(x) = P(X \le x)$, $p_r = P(Y = r)$, and $F_r(x) = P(X \le x \mid Y = r)$, $r = 1,\ldots,R$. Thus, testing independence between $X$ and $Y$ is equivalent to testing the equality of conditional distributions, which is known as the $k$-sample problem in the literature (see e.g., Jiang et al., 2015).

Recently, Cui and Zhong (2019) proposed the mean variance (MV) test for hypothesis (1), based on a measure of dependence between $X$ and $Y$, the MV index (Cui et al., 2015), defined as
$$\mathrm{MV}(X \mid Y) = E_X\left[\mathrm{Var}_Y\big(F(X \mid Y)\big)\right] = \sum_{r=1}^{R} p_r \int \left[F(x) - F_r(x)\right]^2 dF(x),$$
where $F(x \mid Y) = P(X \le x \mid Y)$. Given a sample $\{(X_i, Y_i), i = 1,\ldots,n\}$ of size $n$, the MV test statistic is
$$n\widehat{\mathrm{MV}}_n(X \mid Y) = n \sum_{r=1}^{R} \hat p_r \int \left[F_n(x) - F_{rn}(x)\right]^2 dF_n(x),$$
where $F_n(x)$, $\hat p_r$ and $F_{rn}(x)$ are the empirical counterparts of $F(x)$, $p_r$ and $F_r(x)$, respectively. An important theoretical finding of Cui and Zhong (2019) is that, when the number of categories of $Y$ is allowed to diverge with the sample size, the standardized MV test statistic is asymptotically standard normal. Cui and Zhong (2019) argued that this finding has many appealing merits; for instance, it makes it convenient to obtain any critical value of the MV test from an approximating normal distribution when $R$ is large.

For any fixed $x \in \mathbb{R}_X$, dividing the integrand of the MV test statistic by $F_n(x)(1 - F_n(x))$ leads to the Pearson chi-square test statistic
$$\chi_n^2(x) = n \sum_{r=1}^{R} \frac{\hat p_r \left[F_n(x) - F_{rn}(x)\right]^2}{F_n(x)(1 - F_n(x))} \tag{2}$$
$$= \sum_{r=1}^{R} \sum_{l=1}^{2} \frac{\left(n_{lr}(x) - n_{l+}(x) n_{+r}/n\right)^2}{n_{l+}(x) n_{+r}/n}, \tag{3}$$
which is widely used in practice to test independence between the indicator $I(X \le x)$ and $Y$. Here $n_{lr}(x)$ ($l = 1, 2$, $r = 1,\ldots,R$) are the counts in a $2 \times R$ contingency table (Table 1) determined as follows:
$$n_{1r}(x) = |\{(X_i, Y_i):\ X_i \le x \text{ and } Y_i = r\}|, \quad r = 1,\ldots,R,$$
$$n_{2r}(x) = |\{(X_i, Y_i):\ X_i > x \text{ and } Y_i = r\}|, \quad r = 1,\ldots,R,$$
where $|A|$ denotes the cardinality of a set $A$, and $n_{l+}(x) = \sum_{r=1}^{R} n_{lr}(x)$, $n_{+r} = \sum_{l=1}^{2} n_{lr}(x)$, for $l = 1, 2$, $r = 1,\ldots,R$. As the Pearson chi-square test is the more widely used test of independence, we can imitate the MV test statistic, take the integral of $\chi_n^2(x)$ with respect to $F_n(x)$, and propose the following test statistic:
$$n\widehat{\mathrm{IPC}}_n(X, Y) = \sum_{i=1}^{n} \sum_{r=1}^{R} \sum_{l=1}^{2} \frac{\left(n_{lr}(X_i) - n_{l+}(X_i) n_{+r}/n\right)^2}{n_{l+}(X_i)\, n_{+r}} = n \sum_{r=1}^{R} \hat p_r \int \frac{\left[F_n(x) - F_{rn}(x)\right]^2}{F_n(x)(1 - F_n(x))} dF_n(x). \tag{4}$$
We call $\widehat{\mathrm{IPC}}_n(X, Y)$ the integral Pearson chi-square (IPC) statistic, and $n\widehat{\mathrm{IPC}}_n(X, Y)$ the IPC test statistic.
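To make the definition concrete, here is a minimal Python sketch of the IPC statistic via the empirical-CDF form on the right-hand side of (4); the function name and the brute-force $O(n^2 R)$ evaluation are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def ipc_statistic(x, y, R):
    """IPC_hat_n(X, Y): empirical-CDF form of equation (4).
    x: 1-D array of continuous observations; y: labels in {1, ..., R}.
    Multiply the result by n to obtain the IPC test statistic T_n."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    n = len(x)
    order = np.argsort(x)
    Fn = np.empty(n)
    Fn[order] = np.arange(1, n + 1) / n            # F_n(X_i)
    total = 0.0
    for r in range(1, R + 1):
        mask = (y == r)
        p_hat = mask.mean()                        # p_hat_r
        if p_hat == 0.0:
            continue
        # F_rn(X_i) = (1/n) #{X_j <= X_i, Y_j = r} / p_hat_r
        Frn = np.array([np.mean((x <= xi) & mask) for xi in x]) / p_hat
        num = p_hat * (Fn - Frn) ** 2
        den = Fn * (1.0 - Fn)
        # The 0/0 = 0 convention of Section 2 via masked division.
        total += np.divide(num, den, out=np.zeros_like(num),
                           where=den > 0).sum()
    return total / n
```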

It is not difficult to see that the IPC test statistic is essentially a restatement of the $k$-sample Anderson–Darling test statistic proposed by Scholz and Stephens (1987). The reader is referred to He et al. (2019) and Ma et al. (2022) for some recent work on this statistic. The asymptotic null distribution of the IPC test statistic for fixed $R$ was established in Scholz and Stephens (1987). The promising performance of the $k$-sample Anderson–Darling statistic (the IPC test statistic) has been verified by many subsequent works in the literature and by a variety of applications in practice. However, to the best of our knowledge, its theoretical properties when the number of categories of $Y$ diverges remain unknown. The main goal of this paper is to fill this gap. In analogy to the MV test, we find that the IPC test also enjoys an appealing property: the asymptotic null distribution of the standardized IPC test statistic when $R$ diverges is a standard normal distribution. This theoretical finding allows the IPC test to share many distinguished merits with the MV test. Our work, together with Cui and Zhong (2019), establishes a solid theoretical foundation and empirical evidence for independence testing between a continuous variable and a categorical variable with a diverging number of categories. As an application of this finding, we also extend the IPC test to testing independence between two continuous random variables. The approach slices one of the variables on its support to obtain a categorical variable, to which the IPC test can then be applied. We allow the slicing scheme to become finer as the sample size increases, which enables satisfactory test power. The slicing technique is widely used across many statistical fields, such as feature screening (Mai & Zou, 2015b; Yan et al., 2018; Zhong et al., 2021) and the $k$-sample test (Jiang et al., 2015). It has also been used for testing independence: for instance, it is common practice to slice two univariate variables into categorical variables and apply the Pearson chi-square test to their independence. Please refer to Zhang et al. (2022) for recent developments on the sliced independence test. Our research enriches the application of the slicing technique in the field of independence testing. The proposed approach also provides a computationally tractable way to compute the p-value efficiently. Simulation studies show that the proposed test has satisfactory power in many scenarios.

Table 1. Empirical bivariate distribution for a fixed x.

The rest of the paper is organized as follows. Section 2 introduces some preliminaries of the IPC test. Section 3 presents the main results, including the asymptotic null distribution of the test statistic when $R$ diverges with the sample size. Simulation studies of the proposed test and a real data application are included in Section 4. Section 5 concludes the paper. Due to limited space, all technical proofs of the theorems are given in the Appendix.

2. Preliminaries

Let $X$ be a continuous random variable with support $\mathbb{R}_X$ and $Y \in \{1,\ldots,R\}$ a categorical variable with $R$ categories. Motivated by the IPC statistic in (4), we define the following IPC index between $X$ and $Y$:
$$\mathrm{IPC}(X, Y) = \sum_{r=1}^{R} p_r \int_{\mathbb{R}_X} \frac{\left[F(x) - F_r(x)\right]^2}{F(x)(1 - F(x))} dF(x). \tag{5}$$
The IPC statistic is a natural estimator of the IPC index. Note that $n_{l+}(X_i)$ in the denominator of the first equality of (4) is zero when $X_i$ is the largest or smallest observation among $\{X_i\}_{i=1}^n$. One solution is to follow Mai and Zou (2015a) and consider the Winsorized empirical CDF
$$\tilde F_n(x) = \begin{cases} b, & \text{if } F_n(x) \ge b;\\ F_n(x), & \text{if } a < F_n(x) < b;\\ a, & \text{if } F_n(x) \le a \end{cases}$$
at a predefined pair of numbers $(a, b)$. The Winsorization introduces bias in estimating the IPC index; such bias vanishes automatically if we let $a \to 0$ and $b \to 1$ as $n \to \infty$, but how to choose $a$ and $b$ properly is beyond the scope of this paper. At the same time, we note that if $X_i$ is the largest or smallest observation, the numerator in the first equality of (4) is also zero. Therefore, we hereafter set $0/0 = 0$, following common practice in the literature (see for example, He et al., 2019; Ma et al., 2022), to avoid confusion. Then we have the following lemmas.

Lemma 2.1

Let $Y \in \{1,\ldots,R\}$ be a categorical variable with $R$ categories and $X$ a continuous variable with support $\mathbb{R}_X$. Then
$$\widehat{\mathrm{IPC}}_n(X, Y) \stackrel{P}{\longrightarrow} \mathrm{IPC}(X, Y) \tag{6}$$
as $n \to \infty$.

Lemma 2.1 shows that $\widehat{\mathrm{IPC}}_n(X, Y)$ is a consistent estimator of the IPC index.

Lemma 2.2

$0 \le \mathrm{IPC}(X, Y) < 1$, and $\mathrm{IPC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

According to Lemma 2.2, the IPC index is an effective measure of dependence between a continuous variable and a categorical variable. Thus we can construct a test of independence via the IPC statistic.

Let $T_n = n\widehat{\mathrm{IPC}}_n(X, Y)$. Note that $T_n$ is essentially the $k$-sample Anderson–Darling test statistic proposed by Scholz and Stephens (1987), so we can directly derive the asymptotic null distribution of $T_n$.

Theorem 2.3

Suppose $X$ is a continuous random variable and $Y$ is a categorical random variable with a fixed number of classes $R$. Under $H_0$,
$$T_n = n\widehat{\mathrm{IPC}}_n(X, Y) \stackrel{d}{\longrightarrow} \sum_{j=1}^{\infty} \frac{1}{j(j+1)}\, \chi_j^2(R-1), \tag{7}$$
where the $\chi_j^2(R-1)$, $j = 1, 2, \ldots$, are independent and identically distributed (i.i.d.) $\chi^2$ random variables with $R - 1$ degrees of freedom, and $\stackrel{d}{\longrightarrow}$ denotes convergence in distribution.

Though Theorem 2.3 gives an explicit form of the asymptotic null distribution, the exact distribution of $\sum_{j=1}^{\infty} [j(j+1)]^{-1}\chi_j^2(R-1)$ is not accessible, since it is a sum of infinitely many chi-square random variables. To address this issue, a widely adopted approach is to approximate $\sum_{j=1}^{\infty} \frac{\chi_j^2(R-1)}{j(j+1)}$ by $D_N + (R-1)/(N+1)$ for a sufficiently large $N$, where $D_N = \sum_{j=1}^{N} \frac{\chi_j^2(R-1)}{j(j+1)}$ and $\frac{R-1}{N+1}$ is the expectation of $\sum_{j=N+1}^{\infty} \frac{1}{j(j+1)}\chi_j^2(R-1)$. However, as a chi-square-type mixture, $D_N$ has no known closed-form cumulative distribution function. In practice, one usually generates many samples from $D_N$ and uses their empirical distribution as a surrogate for the true distribution. One can also use a permutation test or the bootstrap to compute the p-value of the IPC test. Though these numerical methods are valid, they do make the IPC test less convenient for independence testing.
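For illustration, the following sketch (ours; the truncation point $N$ and the number of Monte Carlo draws are arbitrary defaults) simulates $D_N + (R-1)/(N+1)$ to obtain an approximate critical value of $V(R)$.

```python
import numpy as np

def ipc_critical_value(R, alpha=0.05, N=200, n_draws=20_000, seed=0):
    """Monte Carlo approximation of the (1 - alpha) critical value of
    V(R) in Theorem 2.3: simulate D_N and add (R - 1)/(N + 1), the
    mean of the discarded tail of the chi-square mixture."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, N + 1)
    weights = 1.0 / (j * (j + 1))                 # mixture weights 1/[j(j+1)]
    chi2 = rng.chisquare(df=R - 1, size=(n_draws, N))
    samples = chi2 @ weights + (R - 1) / (N + 1)  # D_N plus tail mean
    return np.quantile(samples, 1 - alpha)
```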

Lemma 2.1 states that $\widehat{\mathrm{IPC}}_n(X, Y)$ converges in probability to $\mathrm{IPC}(X, Y)$, a result not discussed in Scholz and Stephens (1987). Furthermore, we have a sharper result on the convergence rate.

Theorem 2.4

Under the conditions of Lemma 2.1, for any $\epsilon > 0$,
$$P\left(\left|\widehat{\mathrm{IPC}}_n(X, Y) - \mathrm{IPC}(X, Y)\right| > \epsilon\right) \le C_1 n R \exp\left(-C_2 n \epsilon^2 / R^2\right) \to 0 \tag{8}$$
as $n \to \infty$. Here $C_1$ is a positive constant, and $C_2 > 0$ depends only on $\min_{1\le r\le R} p_r$.

Theorem 2.4 follows directly from Theorem 3.2 in Section 3.1. The probability inequality in (8) yields a lower bound on the power of the test with a finite sample size. Specifically, according to Theorem 2.3, we compute the critical value $C_\alpha$ for a given significance level $\alpha > 0$. Then under $H_1$, the power satisfies
$$P(T_n \ge C_\alpha \mid H_1) = 1 - P\left(\widehat{\mathrm{IPC}}_n(X, Y) < \frac{C_\alpha}{n} \,\Big|\, H_1\right) = 1 - P\left(\mathrm{IPC}(X, Y) - \widehat{\mathrm{IPC}}_n(X, Y) > \mathrm{IPC}(X, Y) - \frac{C_\alpha}{n} \,\Big|\, H_1\right)$$
$$\ge 1 - P\left(\left|\mathrm{IPC}(X, Y) - \widehat{\mathrm{IPC}}_n(X, Y)\right| > \mathrm{IPC}(X, Y) - \frac{C_\alpha}{n} \,\Big|\, H_1\right) \ge 1 - C_1 n R \exp\left\{-C_2 n \left(\mathrm{IPC}(X, Y) - \frac{C_\alpha}{n}\right)^2 \Big/ R^2\right\}.$$
According to Lemma 2.2, $\mathrm{IPC}(X, Y) > 0$ under $H_1$. Therefore, the power of the test converges to 1 as the sample size increases to infinity; in other words, the IPC test of independence is consistent.

We conclude this section by introducing two relevant recent works on the IPC index. The application of dependence measures in marginal feature screening has received increasing attention. Recently, He et al. (2019) proposed a novel feature screening procedure based on the IPC index (which they referred to as the AD index) for ultrahigh-dimensional discriminant analysis where the response is a categorical variable with a fixed number of classes. The theoretical guarantee for the IPC statistic in He et al. (2019) focused primarily on a concentration inequality rather than the asymptotic distribution. They showed that the proposed screening method is more competitive than many other existing methods. The promising numerical performance of He et al. (2019)'s method soon inspired subsequent work. Later, Ma et al. (2022) extended He et al. (2019)'s work with the help of the slicing technique and proposed an IPC index-based screening procedure that can handle many types of response variables, including continuous variables, categorical variables, and discrete variables taking finitely or infinitely many values. In particular, the slicing technique used in Ma et al. (2022) is further considered in this article to develop a method for testing independence between two continuous random variables. The details are postponed to Section 3.2.

3. Main results

In this section, we allow the number of categories of $Y$ to approach infinity with the sample size $n$ and consider the properties of the IPC test. Research on categorical variables with a diverging number of categories has received increasing attention in the literature. For instance, Cui et al. (2015) established the sure screening property of the MV index for discriminant analysis with a diverging number of response classes; in their setting, the number of categories $R$ is allowed to approach infinity at a slow rate in $n$. Ni and Fang (2016) proposed an entropy-based feature screening for ultrahigh-dimensional multiclass classification that also allows the number of response classes to diverge. Readers are referred to Ni et al. (2017), Yan et al. (2018), Ni et al. (2020) and Ma et al. (2022), among others, for more examples.

Here, we emphasize that it is also important to study tests of independence between a continuous variable and a categorical variable with a diverging number of categories. One application is a feasible approach for testing independence between a continuous variable and a categorical variable taking infinitely many values. To be specific, suppose $Y$ is a categorical variable taking infinitely many values (e.g., a Poisson variable) and $X$ is a continuous variable. To test independence between $X$ and $Y$, we can define a new variable $Y^* = Y \wedge R$ for some $R$, where $a \wedge b = \min(a, b)$. The IPC test is then applied to test independence between $X$ and $Y^*$, which gives us important information about whether $X$ and $Y$ are independent. A natural question is then how to choose an appropriate $R$. A reasonable approach is to allow $R$ to go to infinity with the sample size $n$ so as to obtain satisfactory test power. This is one of the reasons motivating us to study the asymptotic properties of the IPC statistic when $R$ diverges.

3.1. Asymptotic properties when R is diverging

In the following, we establish the large-sample properties of the IPC statistic when $R$ diverges with the sample size $n$. To avoid any ambiguity, in Section 3.1 we actually consider a sequence of problems indexed by $k$, $k = 1, 2, \ldots$. For each $k$, $Y_k \in \{1,\ldots,R_k\}$ denotes the categorical variable with $R_k$ categories, $p_{r,k} = P(Y_k = r)$ for $r = 1,\ldots,R_k$, $X_k$ denotes the continuous variable, and $\{(X_{ki}, Y_{ki}):\ i = 1, 2, \ldots, n_k\}$ is a random sample of size $n_k$ from $(X_k, Y_k)$. The following theorem shows the asymptotic normality of the standardized test statistic if $X_k$ and $Y_k$ are independent for every $k = 1, 2, \ldots$.

Theorem 3.1

Assume that $n_k \to \infty$ as $k \to \infty$. Let $T_{n_k} = n_k \widehat{\mathrm{IPC}}_{n_k}(X_k, Y_k)$. If $\sqrt{R_k}\big/\min_{1\le r\le R_k} p_{r,k} = o(n_k^{3/8})$ and $R_k \to \infty$ as $n_k \to \infty$, and $X_k$ and $Y_k$ are independent for $k = 1, 2, \ldots$, we have
$$\frac{T_{n_k} - (R_k - 1)}{\sqrt{2\left(\frac{\pi^2}{3} - 3\right)(R_k - 1)}} \stackrel{d}{\longrightarrow} N(0, 1) \tag{9}$$
as $k \to \infty$.

If $\min_{1\le r\le R_k} p_{r,k} = O(n_k^{-\gamma})$ where $0 < \gamma < 3/8$, then we derive that $R_k = O(n_k^{\eta})$ for some $0 < \eta < 3/4 - 2\gamma$; namely, we allow the number of categories to go to infinity with the sample size $n$ at a relatively slow rate. Cui and Zhong (2019) gave a similar result for the MV test with diverging $R$.

Let $V(R) = \sum_{j=1}^{\infty} \chi_j^2(R-1)/[j(j+1)]$ denote the asymptotic null distribution in Theorem 2.3 with fixed $R$. A direct application of Theorem 3.1 is that we can use a normal distribution with mean $R - 1$ and variance $2(\pi^2/3 - 3)(R - 1)$ to approximate the asymptotic null distribution of the IPC test (i.e., $V(R)$) when $R$ is large. Denote $W(R) = N\big(R - 1,\ 2(\pi^2/3 - 3)(R - 1)\big)$. To gain more insight into the connection between $W(R)$ and $V(R)$, note that the mean and variance of $V(R)$ are also $R - 1$ and $2(\pi^2/3 - 3)(R - 1)$, respectively. This result is a distinguished merit of the IPC test: it reduces the computational cost, since the critical value of $W(R)$ is much easier to calculate than that of $V(R)$.
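In code, the normal-approximation critical value reduces to one line; a sketch (ours), assuming SciPy is available:

```python
import numpy as np
from scipy.stats import norm

def ipc_normal_critical_value(R, alpha=0.05):
    """(1 - alpha) critical value of T_n under the approximation
    W(R) = N(R - 1, 2 * (pi^2/3 - 3) * (R - 1)) of Theorem 3.1."""
    sd = np.sqrt(2.0 * (np.pi ** 2 / 3.0 - 3.0) * (R - 1))
    return (R - 1) + sd * norm.ppf(1.0 - alpha)
```

For example, with $R = 20$ and $\alpha = 0.05$ this gives roughly $19 + 3.32 \times 1.645 \approx 24.5$.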

To further check the validity of using $W(R)$ as a surrogate for $V(R)$ when computing the critical value of the IPC test for large $R$, we compare the empirical quantiles of the IPC test statistic with the theoretical quantiles of the normal distribution $W(R)$ in (9) and of the asymptotic null distribution $V(R)$ in (7). We generate $Y \in \{1,\ldots,R\}$ with equal probabilities and $X$ independently from $U(0, 1)$. We consider $R = 10, 15, \ldots, 35$. For each $R$, let $n = 40 \times R$, and we repeat the simulation 1000 times to obtain 1000 values of the IPC test statistic $T_n$. We report the 90% and 95% quantiles of the 1000 $T_n$'s (denoted by empirical quantile in Table 2), as these two quantiles are the most widely used in hypothesis testing. The 90% and 95% quantiles of $V(R)$ (denoted by theoretical quantile 1) and of $W(R)$ (denoted by theoretical quantile 2) are also computed. The results are gathered in Table 2. The empirical quantiles are close to the theoretical quantiles of $W(R)$ even when $R = 10$, which further supports our proposal of using the approximating normal distribution to calculate the critical value of the IPC test when $R$ is relatively large. Looking further into Table 2, the empirical quantiles of $T_n$ are almost systematically smaller than the quantiles of $V(R)$ (with the exception of the 95% quantile when $R = 35$) and larger than the quantiles of $W(R)$ (in both cases by a very small amount). Note that the asymptotic distribution $V(R)$ can be viewed as a chi-square-type mixture. Such a mixture follows an asymmetric, positively skewed (right-skewed) distribution, whose left tail is shorter and right tail longer. Specifically, the skewness of $V(R)$ is
$$E\left(V(R) - EV(R)\right)^3 \Big/ \mathrm{Var}(V(R))^{3/2} = \frac{80 - 8\pi^2}{\left(2\pi^2/3 - 6\right)^{3/2}(R-1)^{1/2}} > 0,$$
which tends to zero as $R \to \infty$. The normal distribution $W(R)$, by contrast, is symmetric with skewness 0. Since $V(R)$ is a better approximation of the exact distribution of $T_n$, it makes sense that the 90% and 95% quantiles of both the empirical distribution of $T_n$ and $V(R)$ are slightly larger than those of $W(R)$. It is also interesting that the empirical quantiles of $T_n$ fall between the quantiles of $V(R)$ and those of $W(R)$; this may suggest that the skewness of the exact distribution of $T_n$ is smaller than that of $V(R)$.

Table 2. Comparison of empirical quantiles with two theoretical quantiles.

We further compare the empirical null distribution with $W(R)$. We again generate $Y \in \{1,\ldots,R\}$ with equal probabilities and $X$ independently from $U(0, 1)$. Four scenarios are considered: (a) $R = 5$, $n = 100 \times R = 500$; (b) $R = 10$, $n = 80 \times R = 800$; (c) $R = 20$, $n = 40 \times R = 800$; (d) $R = 50$, $n = 30 \times R = 1500$. We run the simulation 100,000 times for each scenario to obtain 100,000 values of the IPC test statistic $T_n$. We then compare the empirical distribution of the standardized IPC test statistic $[T_n - (R-1)]\big/\sqrt{2(\pi^2/3 - 3)(R-1)}$ with the standard normal distribution $N(0, 1)$ in Figure 1. In scenario (a), where $R = 5$ is too small, the empirical density curve of the standardized IPC test statistic deviates to some extent from the normal density, even though the sample size $n = 500$ is large; the empirical density is positively skewed, with more mass clustered toward the left and a slightly longer right tail. The empirical density curve matches the standard normal density very well as $R$ increases, for example in scenario (c) with $R = 20$. This further emphasizes that $R$ should be large enough (say, larger than 10) for the normal approximation in Theorem 3.1 to hold.

Figure 1. Comparison of the empirical distribution of the standardized IPC test statistic with the standard normal distribution. The blue broken line represents the empirical density and the black solid line the standard normal density. The empirical density is a Gaussian kernel density estimate based on 100,000 values of $T_n$. Each panel also displays the histogram of the standardized IPC test statistic. (a) $R = 5$, $n = 500$. (b) $R = 10$, $n = 800$. (c) $R = 20$, $n = 800$. (d) $R = 50$, $n = 1500$.


The following theorem bounds the deviation of the IPC statistic when $R$ diverges; it parallels Theorem 3.1 in Ma et al. (2022).

Theorem 3.2

Suppose $R_k = O(n_k^{\eta})$ for some $0 \le \eta < 1/2$, and there exists a positive constant $c_1$ such that $c_1/R_k \le p_{r,k}$ for $r = 1,\ldots,R_k$, $k = 1, 2, \ldots$. Then for any $\epsilon \in (0, 1)$,
$$P\left(\left|\widehat{\mathrm{IPC}}_{n_k}(X_k, Y_k) - \mathrm{IPC}(X_k, Y_k)\right| > \epsilon\right) \le C_1 n_k R_k \exp\left(-\frac{C_2 n_k \epsilon^2}{R_k^2}\right), \tag{10}$$
where $C_1$ is a positive constant and $C_2 > 0$ depends only on $c_1$.

Remark 3.1

He et al. (2019) also established a concentration inequality for the IPC statistic. However, their theoretical guarantee relies on a fixed number of categories (i.e., $\eta = 0$). Thus, Theorem 3.2 differs from Lemma 4 in He et al. (2019).

The condition $c_1/R_k \le p_{r,k}$ for $r = 1,\ldots,R_k$, which is also used in Cui et al. (2015) and Cui and Zhong (2019), requires that the proportion of each category of $Y_k$ not be too small. Indeed, the condition can be relaxed so that $c_1$ is allowed to tend to 0 at a slow rate: specifically, if we allow $c_1$ to decay at rate $n_k^{-\tau}$ for some $0 < \tau < 1/2 - \eta$, the probability in (10) still converges to zero, but at a relatively slower rate. Note that Theorem 2.4 is the special case of Theorem 3.2 with $\eta = 0$, i.e., $R_k$ fixed, in which case the condition on $p_{r,k}$ is automatically satisfied.

3.2. Extension of the IPC test

A natural application of Theorem 3.1 is to extend the IPC test to independence testing between two continuous variables via the slicing technique. Consider two continuous random variables $X$ and $Z$. Without loss of generality, assume the supports of $X$ and $Z$ are $\mathbb{R}$. Given a positive integer $R$, we define a partition of the support of $Z$:
$$S = \{[q_{r-1}, q_r):\ q_{r-1} < q_r,\ r = 1,\ldots,R\}, \tag{11}$$
where $q_0 = -\infty$ and $q_R = \infty$. Each interval $[q_{r-1}, q_r)$ is called a slice in the literature (Mai & Zou, 2015b; Yan et al., 2018), and a new random variable is defined accordingly by $Y^S = r$ if and only if $q_{r-1} \le Z < q_r$, $r = 1,\ldots,R$. The IPC test can then be applied to test independence between $X$ and $Y^S$. If the distribution of $Z$ is known, we suggest a uniform slicing with $q_r = F_Z^{-1}(r/R)$ for $r = 1,\ldots,R$, where $F_Z(z)$ is the cumulative distribution function of $Z$. In practice, however, $F_Z(z)$ is usually unknown; given observations $\{(X_i, Z_i), i = 1,\ldots,n\}$ of size $n$, we can estimate $q_r$ by $\hat q_r = \hat F_Z^{-1}(r/R)$, $r = 1,\ldots,R$, where $\hat F_Z(z)$ is the empirical distribution of $Z$, and $\hat S = \{[\hat q_{r-1}, \hat q_r),\ r = 1,\ldots,R\}$ is regarded as an intuitive uniform slicing scheme (Yan et al., 2018). We define $Y_i^{\hat S} = r$ if and only if $Z_i \in [\hat q_{r-1}, \hat q_r)$, for $r = 1,\ldots,R$ and $i = 1,\ldots,n$. Now we compute
$$\widehat{\mathrm{IPC}}_n(X, Y^{\hat S}) := \sum_{r=1}^{R} \tilde p_r \int \frac{\big[F_n(x) - \tilde F_{rn}(x)\big]^2}{F_n(x)(1 - F_n(x))}\, dF_n(x),$$
where $\tilde p_r = \frac{1}{n}\sum_{i=1}^{n} I(Y_i^{\hat S} = r) = 1/R$, and $\tilde F_{rn}(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x, Y_i^{\hat S} = r)\big/\tilde p_r$ is the empirical conditional distribution of $X$ based on the subjects with $\hat q_{r-1} \le Z_i < \hat q_r$. We reject the hypothesis $H_0$: $X$ and $Z$ are independent, if
$$\left(n\widehat{\mathrm{IPC}}_n(X, Y^{\hat S}) - R + 1\right) \Big/ \sqrt{2(\pi^2/3 - 3)(R - 1)} \ \ge\ \Phi^{-1}(1 - \alpha)$$
for a given significance level $\alpha \in (0, 1)$, where $\Phi(x)$ is the standard normal distribution function.
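Putting the pieces together, here is a sketch of the sliced test (ours, reusing the `ipc_statistic` sketch from Section 1; rank-based slicing stands in for the empirical quantiles $\hat q_r$):

```python
import numpy as np
from scipy.stats import norm

def sliced_ipc_test(x, z, R, alpha=0.05):
    """IPC test of H0: X and Z independent (Section 3.2), slicing Z
    into R near-equal-frequency categories via its empirical quantiles."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(z))              # 0, ..., n - 1
    y = (ranks * R) // n + 1                       # slice labels in {1,...,R}
    t_n = n * ipc_statistic(x, y, R)
    sd = np.sqrt(2.0 * (np.pi ** 2 / 3.0 - 3.0) * (R - 1))
    p_value = 1.0 - norm.cdf((t_n - (R - 1)) / sd)  # right-tailed test
    return t_n, p_value, p_value < alpha
```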

Obviously, it is important to choose an appropriate $R$ for testing independence. If $R$ is too large, the sample size in each slice is too small, making the estimate of the IPC index inaccurate; if $R$ is too small, much information about $Z$ may be lost, making the test power poor. In the slicing literature (Mai & Zou, 2015b; Yan et al., 2018; Zhong et al., 2021), a common choice is $R = \lfloor \log n \rfloor$, where $\lfloor x \rfloor$ is the integer part of $x$. According to Theorem 3.1, we can also choose $R < n^{1/4}$. In practice, we recommend choosing $R = \lfloor n/k \rfloor$ for some $20 \le k \le 50$, so that the sample size in each slice is about 20 to 50; a small helper collecting these rules is sketched below.
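A minimal sketch of these rules of thumb (the function name and the default $k$ are ours):

```python
import numpy as np

def choose_num_slices(n, per_slice=30):
    """Candidate slice counts from Section 3.2: R = floor(log n) is the
    common choice in the slicing literature; the authors recommend
    R = floor(n / k) for some 20 <= k <= 50 (k = per_slice here),
    while Theorem 3.1 suggests keeping R below n ** 0.25."""
    r_log = max(2, int(np.log(n)))
    r_ratio = max(2, n // per_slice)
    return r_log, r_ratio
```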

3.3. Comparison with the MV test

In this subsection, we discuss the advantages of the IPC test over the MV test. As explained in Cui and Zhong (2019), the MV index can be considered a weighted average of Cramér–von Mises distances between $F_r(x)$, the conditional distribution of $X$ given $Y = r$, and $F(x)$, the unconditional distribution function of $X$. The IPC index can be viewed as a modification of the MV index obtained by adding the weight function $\{F(x)(1 - F(x))\}^{-1}$. This weight is large for $F(x)$ near 0 and 1, and smaller near $F(x) = 1/2$; hence the IPC test puts more emphasis on the difference between $F_r(x)$ and $F(x)$ in the tails of $F(x)$. Since $F_r(x) - F(x) = \sum_{j=1}^{R} p_j\big(F_r(x) - F_j(x)\big)$, the IPC test is accordingly more sensitive to tail differences among the conditional distributions. In the following, we consider the test of independence between a continuous random variable and a categorical variable with a relatively large number of classes (i.e., large $R$), and the test of independence between two continuous random variables, and illustrate the IPC test's sensitivity to differences in the tails of the conditional distributions through numerical simulations.

1. When R is large or is allowed to diverge. In this case, by Theorem 3.1, we recommend using a normal distribution to approximate the IPC test's null distribution. It is not surprising that, for large $R$, the IPC test retains its sensitivity to tail differences when a normal distribution is used instead of $V(R)$ to calculate the p-value. The following example illustrates this point.

Let $Y \in \{1,\ldots,20\}$ with $P(Y = r) = 1/20$ for $r = 1,\ldots,20$. Given $Y = r$, generate $X = BW + (1 - B)V_r$, where $B \sim \mathrm{Binomial}(1, p)$, $W$ and $V_r$ are independent, $W \sim N(0, 1)$ and $V_r \sim N(10 + r, 1)$. To gain some intuition for this simulation setting, set $p = 0.8$; Figure 2 draws the conditional distributions of $X$ given $Y = 1$ and $Y = 5$, respectively. It is easy to see that the conditional distributions differ from each other only in their right tails. We choose the sample size $n = 400$ and $p = 0.7, 0.75, 0.8, 0.85, 0.9$. We apply the IPC test and the MV test and compute their p-values using their approximating normal distributions. The empirical powers of the two tests, based on 500 replicates at significance level $\alpha = 0.05$, are presented in Table 3. To further examine the robustness of the IPC test against heavy tails, we also consider $W \sim t(1)$ in the above setting; the empirical powers are likewise shown in Table 3. A larger $p$ means that the differences among the conditional distributions occur further out in the right tail, making the dependence between $X$ and $Y$ harder to detect. Table 3 shows that the IPC test is significantly more powerful than the MV test when $p < 0.9$; when $p = 0.9$, neither the IPC nor the MV test has sufficient power to detect the dependence. The simulation validates that the IPC test has better power against tail differences among the conditional distributions. In Example 4.1 we compare with other existing methods to further validate the IPC test's sensitivity to tail differences. A sketch of this data-generating mechanism is given below.
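```python
import numpy as np

def simulate_tail_mixture(n, p, heavy_tail=False, R=20, seed=None):
    """Mixture model of Section 3.3: Y uniform on {1, ..., 20};
    given Y = r, X = B*W + (1 - B)*V_r with B ~ Binomial(1, p),
    W ~ N(0, 1) (or t(1) if heavy_tail) and V_r ~ N(10 + r, 1)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(1, R + 1, size=n)
    b = rng.binomial(1, p, size=n)
    w = rng.standard_t(1, size=n) if heavy_tail else rng.standard_normal(n)
    v = rng.normal(10 + y, 1.0)
    return b * w + (1 - b) * v, y
```

Repeatedly drawing samples from this generator and comparing $n\widehat{\mathrm{IPC}}_n$ against the normal-approximation critical value for $R = 20$ replicates the kind of power comparison reported in Table 3 (this sketch is ours; seeding and defaults are arbitrary).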

Figure 2. Panel (a) shows the pair of conditional distributions. The blue solid line represents the conditional distribution of $X$ given $Y = 1$, that is, $BN(0,1) + (1-B)N(11, 1)$ where $B \sim \mathrm{Binomial}(1, 0.8)$; and the red dot-dash line represents the conditional distribution of $X$ given $Y = 5$, that is, $BN(0,1) + (1-B)N(15, 1)$ where $B \sim \mathrm{Binomial}(1, 0.8)$. Panel (b) shows the corresponding conditional density functions.


Table 3. Test of independence between a continuous variable and a categorical variable with R = 20 classes.

2. Testing independence between continuous random variables. We follow the notation of Section 3.2. Let $X$ and $Z$ be two continuous random variables. It is natural to expect the IPC test to be more powerful than the MV test for detecting tail differences among the conditional distributions of $X$ given $Z$. As a straightforward extension of the IPC index in (5), define the following index between $X$ and $Z$:
$$\mathrm{IPC}(X, Z) = \int\!\!\int \frac{\big[F(x \mid Z = z) - F(x)\big]^2}{F(x)(1 - F(x))}\, dF(x)\, dF_Z(z), \tag{12}$$
where $F(x \mid Z = z)$ is the conditional distribution of $X$ given $Z = z$, and $F(x)$ and $F_Z(z)$ are the distributions of $X$ and $Z$, respectively. Given a positive integer $R$ and the corresponding uniform slicing scheme $S$ defined as in (11) with $q_r = F_Z^{-1}(r/R)$ for $r = 1,\ldots,R$, recall that $Y^S = r$ if and only if $q_{r-1} \le Z < q_r$. Under certain mild conditions, Ma et al. (2022) showed that $\mathrm{IPC}(X, Y^S) \to \mathrm{IPC}(X, Z)$ as $R \to \infty$.

From (12), again, the IPC test of independence puts more emphasis on the difference between $F(x \mid Z = z)$ and $F(x)$ near the tails of $F(x)$. We use a toy example to illustrate this point. Generate $Z \sim \mathrm{Unif}(4, 6)$, and generate $X = BW + 5(1 - B)Z$, where $B \sim \mathrm{Binomial}(1, p)$. We again consider two settings for $W$: (i) $W \sim N(0, 1)$ and (ii) $W \sim t(1)$. Choose the sample size $n = 400$ and $p = 0.7, 0.75, 0.8, 0.85, 0.9$. We follow the steps in Section 3.2 and choose $R = 20$ to conduct the test of independence. Table 4 presents the empirical powers of the IPC and MV tests based on 500 replicates at significance level $\alpha = 0.05$. The IPC test outperforms the MV test in these settings. Note that when $p = 0.8$, the MV test is nearly powerless, whereas the IPC test still has reasonably acceptable power.

Table 4. Test of independence between two continuous random variables.

4. Numerical studies and data application

4.1. Numerical studies

In this section, we assess the finite-sample performance of the IPC test by comparing it with several powerful methods proposed in recent years: the MV test (Cui & Zhong, 2019), the distance correlation (DC) test (Székely et al., 2007), the HHG test (Heller et al., 2012, 2016) and the Hilbert–Schmidt independence criterion (HSIC) test (Gretton et al., 2005, 2007; Pfister et al., 2018). The R packages energy, HHG, and dHSIC are used to implement the DC, HHG and HSIC tests, respectively. Note that the DC test cannot be directly applied to a categorical variable, so in our simulations we transform a categorical variable with $R$ categories into a random vector of $R - 1$ binary dummy variables and apply dcov.test to this dummy vector instead of the original data; a sketch of this encoding is given below. For the DC, HHG, and HSIC tests, a permutation test with $K = 200$ permutations is used to calculate the p-value.
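The dummy encoding step, as we understand it, looks like the following sketch (ours; the actual p-values are computed by dcov.test from the R package energy, so only the encoding is shown):

```python
import numpy as np

def dummy_encode(y, R):
    """Map labels {1, ..., R} to an (n, R - 1) matrix of binary
    dummy variables, dropping the last category."""
    y = np.asarray(y)
    return np.column_stack([(y == r).astype(float) for r in range(1, R)])
```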

Example 4.1

In this example, we evaluate the performance of the IPC test in the large-$R$ case. Let $R = 15$, and consider the following two models.

Model 1.1. Generate $Y \in \{1,\ldots,15\}$ with equal probabilities. Let $\mu = (\mu_1,\ldots,\mu_{15})$, where $\mu_{5j+l} = l + 1$ for $1 \le l \le 3$, and $\mu_{5j+l} = l + 2$ for $l = 4, 5$, $j = 0, 1, 2$. Given $Y = r$, generate $X = BU + (1 - B)(V_{\mu_r} + 20)$, where $B \sim \mathrm{Binomial}(1, p)$, $U \sim \mathrm{Unif}(-20, 20)$, and $V_{\mu_r} \sim \mathrm{Beta}(3, \mu_r)$.

Model 1.2. Generate $Y \sim \mathrm{Unif}(0, 4)$, and let $X = BU + (1 - B)W$, where $W \sim \mathrm{Unif}\big(\cos(Y\pi) + 21,\ \cos(Y\pi) + 24\big)$. Here $B$ and $U$ are the same as in Model 1.1.

Let $n = 400$. In Model 1.2, we uniformly slice $Y$ into a categorical variable with $R = 15$ classes in order to apply the IPC and MV tests. We let $p$ vary from 0 to 1 in both models. The p-value for the IPC test is computed using the asymptotic distribution in Theorem 3.1. The empirical power of each test based on 500 simulations at significance level $\alpha = 0.05$ is shown in Figure 3. Note that when $p = 1$, $X$ is independent of $Y$ in both models; we deliberately report these results, i.e., the type I error rates of each test, in Table 5. The type I error rates of the IPC test (and of the other tests) are close to the nominal significance level $\alpha = 0.05$, which further supports Theorem 3.1. Figure 3 clearly shows that the IPC test outperforms the other competitors, and the power difference between the IPC test and the MV test exceeds 0.25 at $p = 0.6$ in both models.

Figure 3. Comparison of powers of several tests of independence against different p in Example 4.1. In each case, 500 simulations are used to estimate the power. (a) Model 1.1 and (b) Model 1.2.


Table 5. Empirical type I error rates at the significance level α=0.05 in Example 4.1.

Looking further into the models considered in this example: in both Model 1.1 and Model 1.2, the conditional distributions of $X$ given $Y$ differ from each other only in their right tails when $p > 0.5$. A larger $p$ means the conditional distribution functions differ further out in the tail, and when $p = 1$, $X$ and $Y$ are independent. Thus it becomes harder to detect the dependence between $X$ and $Y$ as $p$ grows toward 1. As a result, Figure 3 shows the power of each test decreasing as $p$ grows. Among the tests considered, the DC test and the HSIC test perform worst in both models; their powers rapidly drop to near 0 as $p$ increases to 0.4. The IPC test and the MV test perform better than the other tests, and the IPC test has significantly higher power than the MV test for $p$ between 0.6 and 0.8 in both models. This further supports our observation in Section 3.3 that the IPC test is more sensitive to tail differences.

Example 4.2

This example considers a Poisson regression model. Let $Z \sim \mathrm{Poisson}(u)$, where $u = \exp(0.8X_1 - 0.8X_2 + \log 4)$, $(X_1, X_2) \sim N\big((0, 1)^\top, \Sigma\big)$ with $\Sigma = (0.5^{|i-j|})_{1\le i,j\le 2}$. Let $Y = Z$ if $Z \le 8$ and $Y = 9$ otherwise, so that $Y$ is a 10-category variable. Consider $n = 100, 150, \ldots, 300$. We apply the testing methods to test independence between $Y$ and $X_1$, and between $Y$ and $X_2$; the asymptotic normal distribution in Theorem 3.1 is used to compute the p-value of the IPC test. The empirical powers of each test based on 500 replications are summarized in Table 6. The IPC test has the best power performance in all settings, while the HHG test and the HSIC test perform poorly when the sample size $n \le 150$. A sketch of this data-generating mechanism is given below.
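This sketch is ours; the garbled source leaves the sign of the $X_2$ coefficient and the mean vector $(0, 1)^\top$ somewhat ambiguous, so both are assumptions here.

```python
import numpy as np

def simulate_poisson_example(n, seed=None):
    """Example 4.2: (X1, X2) bivariate normal with Sigma_ij = 0.5**|i-j|
    and assumed mean (0, 1); Z ~ Poisson(exp(0.8*X1 - 0.8*X2 + log 4));
    Y = Z truncated at 9, shifted to labels {1, ..., 10}."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    x12 = rng.multivariate_normal([0.0, 1.0], cov, size=n)
    u = np.exp(0.8 * x12[:, 0] - 0.8 * x12[:, 1] + np.log(4.0))
    z = rng.poisson(u)
    y = np.minimum(z, 9) + 1        # Y = Z wedge 9, shifted to 1..10
    return x12[:, 0], x12[:, 1], y
```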

Table 6. Empirical powers of each test at the significance level α=0.05 against the sample sizes in Example 4.2.

The power of the IPC test is only slightly higher than that of the MV test; however, it is significantly higher than those of the HHG and HSIC tests. The DC test has moderate performance, inferior to the MV test but better than HSIC.

Example 4.3

In this example, we evaluate the power of the IPC test for testing independence between continuous variables. Simulations are carried out with sample size $n = 400$, and we choose $R = 15$ to implement the IPC test. Generating $Z \sim \mathrm{Unif}(-2, 2)$, we consider the following alternatives.

  1. Linear: $X = Z/2 + 12\gamma\epsilon$, where $\gamma$ is a noise parameter ranging from 0 to 1, and $\epsilon \sim \mathrm{Unif}(-2, 2)$ is independent of $Z$.

  2. Quadratic: $X = (1 - 2Z)^2 + 4.5\gamma\epsilon$.

  3. Step function: $X = f(Z) + 25\gamma\epsilon$, where $f$ takes the value $-2$ on $[-2, -1) \cup [0, 1)$ and the value 2 on $[-1, 0) \cup [1, 2]$.

  4. W-shaped: $X = |Z + 1|\, I(Z < 0) + |Z - 1|\, I(Z \ge 0) + 4\gamma\epsilon$.

  5. Sinusoid: $X = \cos(4\pi Z) + 5\gamma\epsilon$.

  6. Ellipse: $X = \sqrt{1 - (Z/2)^2} + 1.5\gamma\epsilon$.

To conduct the IPC test and the MV test, we uniformly slice $Z$ into a categorical variable $Y$ with $R = 15$ classes. The coefficients above are chosen so that a full range of powers can be observed as $\gamma$ varies from 0 to 1. In addition to the test methods mentioned before, in this example we also compare with a further test, the modified Blum–Kiefer–Rosenblatt (MBKR) test (Zhou & Zhu, 2018), which applies to testing independence between continuous variables. Figure 4 presents the empirical power of each test based on 500 simulations at significance level $\alpha = 0.05$. The IPC test performs excellently when the relationship has an oscillatory nature (the W-shaped and sinusoid alternatives). It is also better than the other competitors for the step function, and comparable to the MBKR test for the quadratic function. However, the IPC test performs poorly relative to the other tests for some smooth alternatives, namely the linear and ellipse settings. For the linear function, the MBKR test performs best, and the IPC test is comparable to HSIC. For the ellipse function, the HHG test has the highest power and the DC test the lowest, with the IPC test's performance in between.

Figure 4. Comparison of powers of several tests of independence in Example 4.3. The noise level increases from left to right. In each case, 500 simulations are used to estimate the power of each test. (a) Linear. (b) Quadratic. (c) Step function. (d) W-shaped. (e) Sinusoid and (f) Ellipse.


We give an intuitive explanation for the excellent performance of the IPC test in detecting oscillatory relationships. Denote by $X \mid Y = r$ the random variable following the conditional distribution of $X$ given $Y = r$. A simple calculation shows that if $X$ and $Z$ have an oscillatory relationship, the variances of $X \mid Y = r$ differ from each other more substantially. By comparison, if $X$ and $Z$ have a linear relationship, then $\mathrm{Var}\{X \mid Y = 1\} = \cdots = \mathrm{Var}\{X \mid Y = 15\}$. Consequently, the IPC test has higher power when there is an oscillatory relationship between $X$ and $Z$. A quick numerical check of this claim is sketched below.
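This check is ours; the noise level 0.25 and the slice count are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 4000, 15
z = rng.uniform(-2, 2, size=n)
eps = rng.uniform(-2, 2, size=n)
ranks = np.argsort(np.argsort(z))
y = (ranks * R) // n                               # uniform slices 0..R-1

for name, x in [("linear", z / 2 + 0.25 * eps),
                ("sinusoid", np.cos(4 * np.pi * z) + 0.25 * eps)]:
    v = np.array([x[y == r].var() for r in range(R)])
    # Under the linear model the slice-wise variances are nearly equal;
    # under the sinusoid they vary much more across slices.
    print(f"{name}: conditional-variance spread = {v.max() - v.min():.3f}")
```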

4.2. Real data application

Example 4.4

We consider a data set from AIDS Clinical Trials Group Protocol 175 (ACTG175), available from the R package speff2trial. Many researchers have studied this data set, for example Tsiatis et al. (2008), Zhang et al. (2008), Lu et al. (2013) and Zhou et al. (2020). The data set contains 2139 HIV-infected subjects, all randomized to four treatment groups with equal probability: zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine, and ddI monotherapy. In addition to the treatment indicators, the data contain many other important variables, such as the CD4 count at 20±5 weeks post-baseline (CD420), the CD4 count at baseline (CD40), and the history of intravenous drug use.

In this study, to obtain more refined results, we restrict attention to the subjects in the ZDV + zalcitabine group (524 subjects) in the following analysis. The goal of our study is to check whether the treatment effect in the ZDV + zalcitabine group depends on other covariates. Following Hammer et al. (1996) and Tsiatis et al. (2008), we use the change from baseline to 20±5 weeks in CD4 cell count, i.e., CD420 − CD40, to measure the treatment effect. The covariates of interest are: history of intravenous drug use (0 = no, 1 = yes), gender (0 = female, 1 = male), antiretroviral history (0 = naive, 1 = experienced), age, and CD8 count at baseline (CD80). The first three covariates are categorical, and the last two are continuous. Let $X = \mathrm{CD420} - \mathrm{CD40}$; there are then five candidate variables $Y$. The null hypotheses are listed as follows.

  • H01: X is independent of Y with Y= history of intravenous drug use;

  • H02: X is independent of Y with Y= gender;

  • H03: X is independent of Y with Y= antiretroviral history;

  • H04: X is independent of Y with Y= age;

  • H05: X is independent of Y with Y= CD8 count at baseline.

We apply the IPC, MV, DC, HHG and HSIC tests to these five hypotheses. A permutation test with $K = 1000$ permutations is used for the DC, HHG and HSIC tests to compute the p-values. For $H_{04}$ and $H_{05}$, we follow the approach in Section 3.2 and slice $Y$ into a categorical variable with 15 classes to implement the IPC and MV tests. Table 7 summarizes the p-values of each test. At significance level $\alpha = 0.05$, all the tests reject $H_{03}$, $H_{04}$ and $H_{05}$ and fail to reject $H_{02}$; that is, the treatment effect in the ZDV + zalcitabine group depends on antiretroviral history, age and CD80, but not on gender. Regarding the history of intravenous drug use, the IPC, DC, HHG and HSIC tests declare statistical dependence between this covariate and the treatment effect, whereas the MV test has a p-value larger than 0.05 and thus cannot reject $H_{01}$. We draw the empirical conditional distributions of $X$ given $Y = 0$ and $Y = 1$, together with side-by-side boxplots, in Figure 5, where $Y$ is the history of intravenous drug use. The conditional distributions of $X$ clearly differ across $Y$, but the difference is relatively small and occurs mainly in the right tails; by the discussion in Section 3.3, the IPC test is more powerful in such cases. Moreover, the categories of $Y$ are very unbalanced, with $\#\{Y = 0\} = 448$ and $\#\{Y = 1\} = 76$, making it more difficult for the MV test to detect the dependence between $X$ and $Y$.

Figure 5. The left panel shows the empirical conditional distributions of CD420 − CD40 given Y = 0 and Y = 1. The right panel shows the side-by-side boxplots of CD420 − CD40 against Y = 0 and Y = 1. Here Y = history of intravenous drug use.


Table 7. The p-values of each test in Example 4.4.

5. Discussion

In this paper, we studied the IPC test of independence between a continuous variable $X$ and a categorical variable $Y$. When the number of categories of $Y$ is fixed, the IPC test statistic is in essence the $k$-sample Anderson–Darling test statistic, whose theoretical properties were studied in Scholz and Stephens (1987). Our work focused on two aspects. First, we derived the convergence rate of the IPC statistic to the IPC index, from which a lower bound on the power of the test at a given significance level and finite sample size can be derived. Second, we showed that the standardized test statistic has an asymptotic normal distribution when the number of categories $R$ diverges to infinity with the sample size. The IPC test thereby enjoys a distinguished merit: its critical values can be obtained easily from an approximating normal distribution when $R$ is relatively large. As an application, we extended the IPC test to testing independence between two continuous random variables, uniformly slicing a continuous variable into a discrete one so that the IPC test applies; by allowing more slices as the sample size increases, the IPC test gains more power. The proposed test was compared with the DC, HHG, HSIC and MV tests in many simulation experiments, and the results showed that the IPC test performs better in many scenarios. It is also possible to consider other slicing schemes for independence testing of continuous variables; we leave this for future research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by National Natural Science Foundation of China [Grant numbers 12271286, 11931001 and 11771241].

References

  • Csörgö, S. (1985). Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis, 16(3), 290–299. https://doi.org/10.1016/0047-259X(85)90022-3
  • Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. https://doi.org/10.1080/01621459.2014.920256
  • Cui, H., & Zhong, W. (2018). A distribution-free test of independence and its application to variable selection. Available at arXiv:1801.10559.
  • Cui, H., & Zhong, W. (2019). A distribution-free test of independence based on mean variance index. Computational Statistics & Data Analysis, 139, 117–133. https://doi.org/10.1016/j.csda.2019.05.004
  • Dvoretzky, A., Kiefer, J., & Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3), 642–669. https://doi.org/10.1214/aoms/1177728174
  • Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. In S. Jain, H. U. Simon, & E. Tomita (Eds.), Algorithmic learning theory (pp. 63–77). Springer Berlin Heidelberg.
  • Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. In Proceedings of the 20th International Conference on Neural Information Processing Systems (pp 585–592). Curran Associates Inc. NIPS'07.
  • Hall, P., & Heyde, C. C. (1980). Martingale limit theory and its application. Probability and Mathematical Statistics. Academic Press [Harcourt Brace Jovanovich, Publishers].
  • Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., Hirsch, M. S., & Merigan, T. C. (1996). A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335(15), 1081–1090. https://doi.org/10.1056/NEJM199610103351501
  • He, S., Ma, S., & Xu, W. (2019). A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Computational Statistics & Data Analysis, 137, 155–169. https://doi.org/10.1016/j.csda.2019.02.003
  • Heller, R., Heller, Y., & Gorfine, M. (2012). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503–510. https://doi.org/10.1093/biomet/ass070
  • Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent distribution-free k-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29), 1–54.
  • Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4), 546–557. https://doi.org/10.1214/aoms/1177730150
  • Jiang, B., Ye, C., & Liu, J. S. (2015). Nonparametric k-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510), 642–653. https://doi.org/10.1080/01621459.2014.920257
  • Lu, W., Zhang, H. H., & Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5), 493–504. https://doi.org/10.1177/0962280211428383
  • Ma, W., Xiao, J., Yang, Y., & Ye, F. (2022). Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index. Journal of Statistical Computation and Simulation, 92(15), 3222–3248. https://doi.org/10.1080/00949655.2022.2062358
  • Mai, Q., & Zou, H. (2015a). Sparse semiparametric discriminant analysis. Journal of Multivariate Analysis, 135, 175–188. https://doi.org/10.1016/j.jmva.2014.12.009
  • Mai, Q., & Zou, H. (2015b). The fused Kolmogorov filter: A nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. https://doi.org/10.1214/14-AOS1303
  • Ni, L., & Fang, F. (2016). Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. Journal of Nonparametric Statistics, 28(3), 515–530. https://doi.org/10.1080/10485252.2016.1167206
  • Ni, L., Fang, F., & Shao, J. (2020). Feature screening for ultrahigh dimensional categorical data with covariates missing at random. Computational Statistics & Data Analysis, 142, Article 106824. https://doi.org/10.1016/j.csda.2019.106824
  • Ni, L., Fang, F., & Wan, F. (2017). Adjusted Pearson chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika, 80(6–8), 805–828. https://doi.org/10.1007/s00184-017-0629-9
  • Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 80(1), 5–31. https://doi.org/10.1111/rssb.12235
  • Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1), 1–14. https://doi.org/10.1214/aos/1176342996
  • Scholz, F.-W., & Stephens, M. A. (1987). k-sample Anderson–Darling tests. Journal of the American Statistical Association, 82(399), 918–924. https://doi.org/10.2307/2288805
  • Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312
  • Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505
  • Tsiatis, A. A., Davidian, M., Zhang, M., & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine, 27(23), 4658–4677. https://doi.org/10.1002/sim.3113
  • Xu, K., Shen, Z., Huang, X., & Cheng, Q. (2020). Projection correlation between scalar and vector variables and its use in feature screening with multi-response data. Journal of Statistical Computation and Simulation, 90(11), 1923–1942. https://doi.org/10.1080/00949655.2020.1753057
  • Yan, X., Tang, N., Xie, J., Ding, X., & Wang, Z. (2018). Fused mean-variance filter for feature screening. Computational Statistics & Data Analysis, 122, 18–32. https://doi.org/10.1016/j.csda.2017.10.008
  • Zhang, M., Tsiatis, A. A., & Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64(3), 707–715. https://doi.org/10.1111/j.1541-0420.2007.00976.x
  • Zhang, Y., Chen, C., & Zhu, L. (2022). Sliced independence test. Statistica Sinica, 32(Special online issue), 2477–2496. https://doi.org/10.5705/ss.202021.0203
  • Zhong, W., Wang, J., & Chen, X. (2021). Censored mean variance sure independence screening for ultrahigh dimensional survival data. Computational Statistics & Data Analysis, 159, Article 107206. https://doi.org/10.1016/j.csda.2021.107206
  • Zhou, N., Guo, X., & Zhu, L. (2020). A projection-based model checking for heterogeneous treatment effect. Available at arXiv:2009.10900.
  • Zhou, Y., & Zhu, L. (2018). Model-free feature screening for ultrahigh dimensional data through a modified Blum–Kiefer–Rosenblatt correlation. Statistica Sinica, 28(3), 1351–1370. https://doi.org/10.5705/ss.202016.0264
  • Zhu, L., Xu, K., Li, R., & Zhong, W. (2017). Projection correlation between two random vectors. Biometrika, 104(4), 829–843. https://doi.org/10.1093/biomet/asx043

Appendix

Proof of theorems

This appendix contains the technical proofs of Lemma 2.2 and Theorem 3.1. Lemma 2.1 and Theorem 2.4 are direct corollaries of Theorem 3.2, and the proof of Theorem 3.2 follows from Lemma 4 in Ma et al. (2022); their proofs are thus omitted.

A.1. Notations and preliminaries

Recall that the IPC index of $(X, Y)$, where $X$ is a continuous random variable with support $\mathbb{R}_X$ and $Y \in \{1,\ldots,R\}$ is a categorical variable with $R$ categories, is defined as
$$\mathrm{IPC}(X, Y) = \sum_{r=1}^{R} p_r \int \frac{\left[F(x) - F_r(x)\right]^2}{F(x)(1 - F(x))}\, dF(x) = \sum_{r=1}^{R} \int \frac{\left[p_r F(x) - F(x, r)\right]^2}{F(x)\bar F(x)\, p_r}\, dF(x),$$
where $F(x)$ is the distribution function of $X$, $F_r(x) = P(X \le x \mid Y = r)$, $\bar F(x) = 1 - F(x)$, $p_r = P(Y = r)$ and $F(x, r) = P(X \le x, Y = r)$, $r = 1,\ldots,R$. Given i.i.d. samples $Z_i = (X_i, Y_i)$, $i = 1,\ldots,n$, the IPC statistic is defined as
$$\widehat{\mathrm{IPC}}_n(X, Y) = \sum_{r=1}^{R} \hat p_r \int \frac{\big(F_n(x) - F_{rn}(x)\big)^2}{F_n(x)\bar F_n(x)}\, dF_n(x) = \sum_{r=1}^{R} \int \frac{\big(\hat p_r F_n(x) - F_n(x, r)\big)^2}{F_n(x)\bar F_n(x)\,\hat p_r}\, dF_n(x) = \frac{1}{n}\sum_{r=1}^{R}\sum_{i=1}^{n} \frac{\big(\hat p_r F_n(X_i) - F_n(X_i, r)\big)^2}{F_n(X_i)\bar F_n(X_i)\,\hat p_r},$$
where $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$, $\bar F_n(x) = 1 - F_n(x)$, $\hat p_r = \frac{1}{n}\sum_{i=1}^{n} I(Y_i = r)$, $F_n(x, r) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x, Y_i = r)$, and $F_{rn}(x) = F_n(x, r)/\hat p_r$ for $r = 1,\ldots,R$.

We first provide a proof of Lemma 2.2.

Proof of Lemma 2.2.

It is obvious that $\mathrm{IPC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent. Noting that $\sum_{r=1}^{R} p_r = 1$ and $\sum_{r=1}^{R} F(x, r) = F(x)$, we have
$$\frac{1}{F(x)\bar F(x)}\sum_{r=1}^{R} \frac{\big(p_r F(x) - F(x, r)\big)^2}{p_r} = \frac{1}{F(x)\bar F(x)}\left(\sum_{r=1}^{R}\frac{F^2(x, r)}{p_r} - F^2(x)\right) < \frac{1}{F(x)\bar F(x)}\left(\sum_{r=1}^{R}\frac{F(x, r)\, p_r}{p_r} - F^2(x)\right) = \frac{1}{F(x)\bar F(x)}\left(F(x) - F^2(x)\right) = 1.$$
Hence we have $\mathrm{IPC}(X, Y) < 1$.

Next, we give some preparations for the proof of Theorem 3.1. For a given constant $C > 0$, let $F_{n,C}(x) = F(x) \vee n^{-\frac{1}{2+C}}$, $\bar F_{n,C}(x) = \bar F(x) \vee n^{-\frac{1}{2+C}}$, $F_n^C(x) = F_n(x) \vee n^{-\frac{1}{2+C}}$ and $\bar F_n^C(x) = \bar F_n(x) \vee n^{-\frac{1}{2+C}}$, where $a \vee b = \max(a, b)$. Then we have the following lemmas.

Lemma A.1

Let $\Delta_1 F(x) = F_{n,C}(x) - F_n^C(x)$ and $\Delta_2 F(x) = \bar F_{n,C}(x) - \bar F_n^C(x)$. Then
$$\sup_{x\in\mathbb{R}}|\Delta_1 F(x)| = O_p(n^{-1/2}), \quad\text{and}\quad \sup_{x\in\mathbb{R}}|\Delta_2 F(x)| = O_p(n^{-1/2}).$$

Proof.

It is easy to show that $|F_{n,C}(x) - F_n^C(x)| \le |F(x) - F_n(x)|$.

Hence by the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (Dvoretzky et al., 1956),
$$\sup_x|\Delta_1 F(x)| \le \sup_x|F(x) - F_n(x)| = O_p(n^{-1/2}).$$
Similarly, we have $\sup_x|\Delta_2 F(x)| = O_p(n^{-1/2})$.

Lemma A.2

$$\sup_x\left|\frac{F_{n,C}(x)\bar F_{n,C}(x) - F_n^C(x)\bar F_n^C(x)}{F_n^C(x)\bar F_n^C(x)}\right| = O_p\!\left(n^{-\frac{C}{4+2C}}\right) = o_p(1).$$

Proof.

Note that
$$F_{n,C}(x)\bar F_{n,C}(x) = \big(F_n^C(x) + \Delta_1 F(x)\big)\big(\bar F_n^C(x) + \Delta_2 F(x)\big) = F_n^C(x)\bar F_n^C(x) + \bar F_n^C(x)\Delta_1 F(x) + F_n^C(x)\Delta_2 F(x) + \Delta_1 F(x)\Delta_2 F(x).$$
Then,
$$\sup_x\left|\frac{F_n^C(x)\bar F_n^C(x) - F_{n,C}(x)\bar F_{n,C}(x)}{F_n^C(x)\bar F_n^C(x)}\right| \le \sup_x\left|\frac{\Delta_1 F(x)}{F_n^C(x)}\right| + \sup_x\left|\frac{\Delta_2 F(x)}{\bar F_n^C(x)}\right| + \sup_x\left|\frac{\Delta_1 F(x)\Delta_2 F(x)}{F_n^C(x)\bar F_n^C(x)}\right| = O_p\!\left(n^{-\frac{1}{2}+\frac{1}{2+C}}\right) + O_p\!\left(n^{-\frac{1}{2}+\frac{1}{2+C}}\right) + O_p\!\left(n^{-1+\frac{2}{2+C}}\right) = O_p\!\left(n^{-\frac{C}{4+2C}}\right).$$

A.2. Proof of Theorem 3.1

To avoid any ambiguity, Theorem 3.1 considers a sequence of problems indexed by $(n_k, R_k, p_{1,k}, \ldots, p_{R_k,k})$, $k = 1, 2, \ldots$, where the sample size $n_k \to \infty$, the number of categories $R_k \to \infty$, $Y_k = Y(R_k)$ denotes the categorical variable with $R_k$ categories, and $p_{r,k} = P(Y(R_k) = r)$, $r = 1,\ldots,R_k$. From now on, we omit the subscript $k$ unless specifically mentioned. Moreover, throughout Section A.2 we should keep in mind that $X$ and $Y$ are independent.

A.2.1. Architecture of the proof

Our aim here is to give a general overview of the proof of Theorem 3.1. At a high level, the structure is fairly simple; to make it clear, we divide the proof into three parts.

  1. First, given a positive constant $C$, we substitute $F_{n,C}(x)$, $\bar F_{n,C}(x)$ and $p_r$ for $F_n(x)$, $\bar F_n(x)$ and $\hat p_r$ in the denominator of the IPC statistic, obtaining
$$\widehat{\mathrm{IPC}}_{n,C}(X, Y) := \sum_{r=1}^{R}\frac{1}{p_r}\int \frac{\left[\hat p_r F_n(x) - F_n(x, r)\right]^2}{F_{n,C}(x)\bar F_{n,C}(x)}\, dF_n(x).$$
We then prove that the difference between $n\widehat{\mathrm{IPC}}_n(X, Y)/\sqrt{R}$ and $n\widehat{\mathrm{IPC}}_{n,C}(X, Y)/\sqrt{R}$ is bounded by
$$\frac{n\widehat{\mathrm{IPC}}_{n,C}(X, Y)}{\sqrt{R}} \times O_p\!\left(n^{-\frac{C}{4+2C}} + \frac{\sqrt{R}}{\min_{1\le r\le R} p_r}\, n^{-1/2}\right) + O_p\!\left(n^{-\frac{1}{2+C}}\sqrt{R}\right),$$
provided that $\sqrt{R}\big/\min_{1\le r\le R} p_r = o(n^{1/2})$.

  2. Fixing $C = 6$, let
$$f_i(x, r) = \left[I(X_i \le x) - F(x)\right]\left[I(Y_i = r) - p_r\right], \quad f_{i,n}(x, r) = \frac{f_i(x, r)}{\sqrt{F_{n,6}(x)\bar F_{n,6}(x)}},$$
and define
$$\widetilde{\mathrm{IPC}}_n(X, Y) = \sum_{r=1}^{R}\frac{1}{p_r}\int\left[\frac{1}{n}\sum_{i=1}^{n} f_{i,n}(x, r)\right]^2 dF(x).$$
Under the condition $\sqrt{R}\big/\min_{1\le r\le R} p_r = o(n^{3/8})$, we show that $n\widehat{\mathrm{IPC}}_{n,6}(X, Y)/\sqrt{R}$ is close to $n\widetilde{\mathrm{IPC}}_n(X, Y)/\sqrt{R}$; combined with the first part of the proof, we derive that $n\widehat{\mathrm{IPC}}_n(X, Y) - n\widetilde{\mathrm{IPC}}_n(X, Y) = o_p(\sqrt{R})$.

  3. Finally, consider
$$n\widetilde{\mathrm{IPC}}_n(X, Y) = J_{1n} + J_{2n}, \quad\text{where}\quad J_{1n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}^2(x, r)\, dF(x), \quad J_{2n} = \frac{1}{n}\sum_{i\ne j}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}(x, r) f_{j,n}(x, r)\, dF(x).$$
We show that
$$\frac{J_{1n} - (R-1)}{\sqrt{2\left(\frac{\pi^2}{3} - 3\right)(R-1)}} \stackrel{P}{\longrightarrow} 0, \quad\text{and that}\quad \frac{J_{2n}}{\sqrt{2\left(\frac{\pi^2}{3} - 3\right)(R-1)}}$$
can be viewed as a martingale difference sequence. The proof is then completed by the well-developed central limit theory for martingale differences (Hall & Heyde, 1980).

Combined with Lemmas A.1 and A.2, the proof of part 1 is not difficult, and the proofs of parts 2 and 3 follow from Cui and Zhong (Citation2018) and Cui and Zhong (Citation2019) with small modifications.

A.2.2. Part 1

We summarize the conclusion to be proved in part 1 in the following lemma.

Lemma A.3

For a fixed constant $C>0$, let
$$\widehat{\mathrm{IPC}}_{n,C}(X,Y)=\sum_{r=1}^{R}\frac{1}{p_r}\int\frac{[\hat p_rF_n(x)-F_n(x,r)]^2}{F_{n,C}(x)\bar F_{n,C}(x)}\,dF_n(x).$$
For simplicity, write $\widehat{\mathrm{IPC}}_n=\widehat{\mathrm{IPC}}_n(X,Y)$ and $\widehat{\mathrm{IPC}}_{n,C}=\widehat{\mathrm{IPC}}_{n,C}(X,Y)$. Then if $\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}=o(n^{1/2})$, and under the condition that $X$ and $Y$ are independent, we have
$$\left|\widehat{\mathrm{IPC}}_n-\widehat{\mathrm{IPC}}_{n,C}\right|=O_p\!\left(n^{-\frac{3+C}{2+C}}R\right)+\widehat{\mathrm{IPC}}_{n,C}\left(O_p\!\left(n^{-\frac{C}{4+2C}}\right)+O_p\!\left(\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}\,n^{-1/2}\right)\right).$$

Proof.

Let
$$\widehat{\mathrm{IPC}}_n^{*}=\frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{[F_n(X_i,r)-\hat p_rF_n(X_i)]^2}{F_n(X_i)\bar F_n(X_i)\,p_r}.$$
Then
$$\left|\widehat{\mathrm{IPC}}_n-\widehat{\mathrm{IPC}}_n^{*}\right|\le\max_{1\le r\le R}\left|1-\frac{p_r}{\hat p_r}\right|\cdot\frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_n(X_i)\bar F_n(X_i)\,p_r}=\widehat{\mathrm{IPC}}_n^{*}\max_{1\le r\le R}\left|\frac{\hat p_r-p_r}{\hat p_r}\right|.$$
Since $E(\sqrt{n}(\hat p_r-p_r))^2=p_r(1-p_r)$, we have
$$E\Big(\max_{1\le r\le R}|\hat p_r-p_r|\Big)^2\le E\Big(\sum_{r=1}^{R}|\hat p_r-p_r|\Big)^2\le R\sum_{r=1}^{R}E(\hat p_r-p_r)^2=\frac{R\sum_{r=1}^{R}p_r(1-p_r)}{n}\le\frac{R}{n}.$$
So $\max_{1\le r\le R}|\hat p_r-p_r|=O_p(\sqrt{R/n})$. Then
$$\max_{1\le r\le R}\left|\frac{\hat p_r-p_r}{\hat p_r}\right|=\max_{1\le r\le R}\left|\frac{\hat p_r-p_r}{p_r+\hat p_r-p_r}\right|\le\max_{1\le r\le R}|\hat p_r-p_r|\cdot\max_{1\le r\le R}\frac{1}{p_r+\hat p_r-p_r}.$$
Since $\max_{1\le r\le R}|\hat p_r-p_r|=O_p(\sqrt{R/n})=o_p(\min_{1\le r\le R}p_r)$ under the condition $\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}=o(n^{1/2})$, we have
$$\max_{1\le r\le R}\left|\frac{\hat p_r-p_r}{\hat p_r}\right|=O_p\!\left(\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}\,n^{-1/2}\right)=o_p(1).$$
Hence $\widehat{\mathrm{IPC}}_n=\big(1+O_p\big(\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}n^{-1/2}\big)\big)\widehat{\mathrm{IPC}}_n^{*}$. Next, let
$$\widehat{\mathrm{IPC}}_n^{**}=\frac{1}{n}\sum_{r=1}^{R}\sum_{i=1}^{n}\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_n^C(X_i)\bar F_n^C(X_i)\,p_r}.$$
Let $X_{(1)}\le X_{(2)}\le\cdots\le X_{(n)}$ be the order statistics of $X_1,\dots,X_n$. Since $X$ is continuous, there are no ties among $X_1,\dots,X_n$, and we can assume $X_{(1)}<\cdots<X_{(n)}$. Let $A_n=n^{1-\frac{1}{2+C}}$, and define
$$S_{n1}=\frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_n(X_i)\bar F_n(X_i)\,p_r}\,I(X_i\le X_{(A_n)}),\qquad S_{n2}=\frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_n(X_i)\bar F_n(X_i)\,p_r}\,I(X_i\ge X_{(n-A_n)}).$$
Indeed, we have $0\le\widehat{\mathrm{IPC}}_n^{*}-\widehat{\mathrm{IPC}}_n^{**}\le S_{n1}+S_{n2}$, since the two quantities differ only at sample points whose ranks lie in the two tails of length $A_n$. Conditionally on $X_i=X_{(j)}$, we have $F_n(X_i)=j/n$ and $\bar F_n(X_i)=(n-j)/n$, and a direct calculation under independence gives
$$E\big\{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2\mid X_i=X_{(j)}\big\}=\frac{j(n-j)p_r(1-p_r)}{n^3},\qquad\text{so}\qquad E\left\{\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_n(X_i)\bar F_n(X_i)\,p_r}\,\Big|\,X_i=X_{(j)}\right\}=\frac{1-p_r}{n}.$$
Hence, since $P(X_i=X_{(j)})=1/n$,
$$ES_{n1}=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{A_n}\sum_{r=1}^{R}\frac{1-p_r}{n}\,P(X_i=X_{(j)})=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{A_n}\frac{R-1}{n^2}=\frac{A_n(R-1)}{n^2}.$$
Similarly, we also have $ES_{n2}=\frac{A_n(R-1)}{n^2}$. Therefore,
$$\widehat{\mathrm{IPC}}_n^{*}-\widehat{\mathrm{IPC}}_n^{**}=O_p\!\left(n^{-\frac{3+C}{2+C}}R\right).$$
Finally, according to Lemma A.2,
$$\left|\widehat{\mathrm{IPC}}_n^{**}-\widehat{\mathrm{IPC}}_{n,C}\right|=\frac{1}{n}\left|\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{[\hat p_rF_n(X_i)-F_n(X_i,r)]^2}{F_{n,C}(X_i)\bar F_{n,C}(X_i)\,p_r}\left(\frac{F_{n,C}(X_i)\bar F_{n,C}(X_i)}{F_n^C(X_i)\bar F_n^C(X_i)}-1\right)\right|\le\widehat{\mathrm{IPC}}_{n,C}\,\sup_x\left|\frac{F_{n,C}(x)\bar F_{n,C}(x)}{F_n^C(x)\bar F_n^C(x)}-1\right|=\widehat{\mathrm{IPC}}_{n,C}\,O_p\!\left(n^{-\frac{C}{4+2C}}\right).$$
Hence
$$\left|\widehat{\mathrm{IPC}}_n-\widehat{\mathrm{IPC}}_{n,C}\right|=O_p\!\left(n^{-\frac{3+C}{2+C}}R\right)+\widehat{\mathrm{IPC}}_{n,C}\left(O_p\!\left(n^{-\frac{C}{4+2C}}\right)+O_p\!\left(\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}\,n^{-1/2}\right)\right).$$
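The bound $\max_r|\hat p_r-p_r|=O_p(\sqrt{R/n})$ used in this proof is easy to probe by simulation; in the balanced sketch below the normalized maximum stays bounded (indeed the bound is conservative), with all settings illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_max_phat_error(n, R, reps=300):
    """Monte Carlo estimate of E max_r |hat p_r - p_r| with balanced p_r = 1/R."""
    errs = np.empty(reps)
    for k in range(reps):
        counts = rng.multinomial(n, np.full(R, 1.0 / R))
        errs[k] = np.abs(counts / n - 1.0 / R).max()
    return errs.mean()

for n, R in [(1000, 10), (4000, 10), (4000, 40)]:
    # ratio to the sqrt(R/n) bound stays well below 1 (the bound is crude but valid)
    print(n, R, mean_max_phat_error(n, R) / np.sqrt(R / n))
```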

A.2.3. Part 2

Recall that
$$f_i(x,r)=[I(X_i\le x)-F(x)][I(Y_i=r)-p_r],\qquad f_{i,n}(x,r)=\frac{f_i(x,r)}{\sqrt{F_{n,6}(x)\bar F_{n,6}(x)}},$$
and
$$\widetilde{\mathrm{IPC}}_n(X,Y)=\sum_{r=1}^{R}\frac{1}{p_r}\int\left[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(x,r)\right]^2dF(x).$$
The following lemma is what we want to prove in part 2.

Lemma A.4

If $\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}=o(n^{3/8})$ and $H_0$ holds, that is, $X$ and $Y$ are independent, then
$$\widehat{\mathrm{IPC}}_n(X,Y)-\widetilde{\mathrm{IPC}}_n(X,Y)=O_p\!\left(R\,n^{-9/8}\right)+\frac{R}{\min_{1\le r\le R}p_r}\,O_p\!\left(n^{-11/8}\right)+\widetilde{\mathrm{IPC}}_n(X,Y)\,o_p\!\left(n^{-1/8}\right).$$

Proof.

For simplicity, write $\widetilde{\mathrm{IPC}}_n=\widetilde{\mathrm{IPC}}_n(X,Y)$. Given $C=6$, according to Lemma A.3 and under the condition $\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}=o(n^{3/8})$, we have
$$\widehat{\mathrm{IPC}}_n-\widehat{\mathrm{IPC}}_{n,6}=O_p(n^{-9/8}R)+\widehat{\mathrm{IPC}}_{n,6}\big[O_p(n^{-3/8})+o_p(n^{-1/8})\big]=O_p(n^{-9/8}R)+\widehat{\mathrm{IPC}}_{n,6}\,o_p(n^{-1/8}).\tag{A1}$$
Let
$$\widetilde{\mathrm{IPC}}_{1n}=\sum_{r=1}^{R}\frac{1}{p_r}\int\left[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(x,r)\right]^2dF_n(x).$$
Next, we follow the proof of Lemma A.1 in Cui and Zhong (Citation2019) and show that
$$\widehat{\mathrm{IPC}}_{n,6}-\widetilde{\mathrm{IPC}}_{1n}=\sum_{r=1}^{R}\frac{1}{p_r}\int\frac{1}{F_{n,6}(x)\bar F_{n,6}(x)}\left\{[\hat p_rF_n(x)-F_n(x,r)]^2-\Big[\frac{1}{n}\sum_{i=1}^{n}f_i(x,r)\Big]^2\right\}dF_n(x)=O(n^{1/8})\sum_{r=1}^{R}\frac{1}{p_r}\int\left\{[\hat p_rF_n(x)-F_n(x,r)]^2-\Big[\frac{1}{n}\sum_{i=1}^{n}f_i(x,r)\Big]^2\right\}dF_n(x),$$
where we used $1/(F_{n,6}(x)\bar F_{n,6}(x))\le 2n^{1/8}$, since at most one of $F_{n,6}(x)$ and $\bar F_{n,6}(x)$ can be smaller than $1/2$. Let $\bar f_n(x,r)=\frac{1}{n}\sum_{i=1}^{n}f_i(x,r)$. By the DKW inequality, we have
$$\sup_x\left|[\hat p_rF_n(x)-F_n(x,r)]^2-\bar f_n(x,r)^2\right|\le\sup_x\big|\hat p_rF_n(x)-F_n(x,r)+\bar f_n(x,r)\big|\cdot\Big\{\sup_x|\hat p_rF_n(x)-F_n(x,r)|+\sup_x|\bar f_n(x,r)|\Big\}$$
$$=\sup_x|F_n(x)-F(x)|\,|\hat p_r-p_r|\cdot\Big\{\sup_x|\hat p_rF_n(x)-F_n(x,r)|+\sup_x|\bar f_n(x,r)|\Big\}=O_p(n^{-1/2})\,O_p(n^{-1/2})\,O_p(n^{-1/2})=O_p(n^{-3/2}).$$
Here, the second equality follows from
$$\hat p_rF_n(x)-F_n(x,r)+\bar f_n(x,r)=\Big\{\hat p_rF_n(x)-\frac{1}{n}\sum_{i=1}^{n}I(X_i\le x,Y_i=r)\Big\}+\Big\{\frac{1}{n}\sum_{i=1}^{n}I(X_i\le x,Y_i=r)-F(x)\hat p_r-p_rF_n(x)+p_rF(x)\Big\}=[F_n(x)-F(x)][\hat p_r-p_r],$$
and the last equality follows from
$$\sup_x|\hat p_rF_n(x)-F_n(x,r)|=O_p(n^{-1/2}),\qquad\sup_x|\bar f_n(x,r)|=O_p(n^{-1/2}).$$
Indeed,
$$\sup_x|\hat p_rF_n(x)-F_n(x,r)|\le\sup_x\Big|\frac{1}{n}\sum_{i=1}^{n}[I(X_i\le x)-F(x)]I(Y_i=r)\Big|+\hat p_r\sup_x|F_n(x)-F(x)|=\sup_x\Big|\frac{1}{n}\sum_{i=1}^{n}[I(X_i\le x)-F(x)]I(Y_i=r)\Big|+O_p(n^{-1/2}),$$
and, conditioning on the number $m$ of indices with $Y_i=r$ and applying the DKW inequality to those $m$ observations,
$$E\left[\sup_x\Big|\frac{1}{n}\sum_{i=1}^{n}[I(X_i\le x)-F(x)]I(Y_i=r)\Big|\right]=\sum_{m=1}^{n}\binom{n}{m}p_r^m(1-p_r)^{n-m}\,\frac{m}{n}\,E\left[\sup_x\Big|\frac{1}{m}\sum_{i=1}^{m}[I(X_i\le x)-F(x)]\Big|\right]\le4\sum_{m=1}^{n}\binom{n}{m}p_r^m(1-p_r)^{n-m}\frac{\sqrt{m}}{n}\le4n^{-1/2}.$$
Hence $\sup_x|\hat p_rF_n(x)-F_n(x,r)|=O_p(n^{-1/2})$, and similarly $\sup_x|\bar f_n(x,r)|=O_p(n^{-1/2})$. Therefore we have
$$\widehat{\mathrm{IPC}}_{n,6}-\widetilde{\mathrm{IPC}}_{1n}=\frac{R}{\min_{1\le r\le R}p_r}\,O_p(n^{-11/8}).\tag{A2}$$
Combining (A1) and (A2), we have
$$\widehat{\mathrm{IPC}}_n-\widetilde{\mathrm{IPC}}_{1n}=O_p(Rn^{-9/8})+\frac{R}{\min_{1\le r\le R}p_r}\,O_p(n^{-11/8})+\widetilde{\mathrm{IPC}}_{1n}\,o_p(n^{-1/8}).$$
To complete the proof, we only need to show that
$$\widetilde{\mathrm{IPC}}_{1n}-\widetilde{\mathrm{IPC}}_n=\sum_{r=1}^{R}\frac{1}{p_r}\int\left[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(x,r)\right]^2d[F_n(x)-F(x)]=\frac{R}{\min_{1\le r\le R}p_r}\,O_p(n^{-11/8}).$$
It is enough to show that
$$I_n(r):=\int\left[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(x,r)\right]^2d[F_n(x)-F(x)]=O_p(n^{-11/8}).$$
Without loss of generality, let $F(x)$ be the uniform distribution function, since we can make the transformation $X'=F(X)$ for the continuous random variable $X$. Then
$$I_n(r)=\frac{1}{n}\sum_{j=1}^{n}\Big[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(X_j,r)\Big]^2-\int_0^1\Big[\frac{1}{n}\sum_{i=1}^{n}f_{i,n}(x,r)\Big]^2dx.$$
For any $x,y\in(0,1)$, it can easily be proved that
$$Ef_{i,n}(x,r)f_{j,n}(y,r)=\frac{(x\wedge y-xy)(p_r-p_r^2)}{\sqrt{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}}\,I(i=j),$$
where $x_{(n)}=x\vee n^{-1/8}$ and $(1-x)_{(n)}=(1-x)\vee n^{-1/8}$. Write $\bar f_n(x)=n^{-1}\sum_{i=1}^{n}f_{i,n}(x,r)$; be careful that this $\bar f_n(x)$ is different from the $\bar f_n(x,r)$ defined above. Then
$$EI_n^2(r)=E\left\{\int_0^1\Big[\frac{1}{n}\sum_{j=1}^{n}[\bar f_n(X_j)^2-\bar f_n(x)^2]\Big]dx\right\}^2=E\left\{\int_0^1\!\!\int_0^1\Big[\frac{1}{n}\sum_{j=1}^{n}[\bar f_n(X_j)^2-\bar f_n(x)^2]\Big]\Big[\frac{1}{n}\sum_{j=1}^{n}[\bar f_n(X_j)^2-\bar f_n(y)^2]\Big]dx\,dy\right\}$$
$$=\frac{1}{n}\int_0^1\!\!\int_0^1E\big\{[\bar f_n(X_1)^2-\bar f_n(x)^2][\bar f_n(X_1)^2-\bar f_n(y)^2]\big\}dx\,dy+\frac{n-1}{n}\int_0^1\!\!\int_0^1E\big\{[\bar f_n(X_1)^2-\bar f_n(x)^2][\bar f_n(X_2)^2-\bar f_n(y)^2]\big\}dx\,dy$$
$$=\int_0^1\!\!\int_0^1E\big\{[\bar f_n(X_1)^2-\bar f_n(x)^2][\bar f_n(X_2)^2-\bar f_n(y)^2]\big\}dx\,dy+\frac{1}{n}\Big[E[\bar f_n(X_1)^2\bar f_n(X_1)^2]-E[\bar f_n(X_1)^2\bar f_n(X_2)^2]\Big],$$
where the last step uses the exchangeability of $X_1,\dots,X_n$.

Since $Ef_{i,n}(x,r)=0$ under $H_0$, we have
$$E[f_{i,n}(x,r)f_{j,n}(x,r)f_{k,n}(y,r)f_{l,n}(y,r)]=0$$
under $H_0$ if one of $\{i,j,k,l\}$ is different from the other three. Then we have
$$E[\bar f_n(x)^2\bar f_n(y)^2]=\frac{1}{n^4}\sum_{i,j,k,l}E[f_{i,n}(x,r)f_{j,n}(x,r)f_{k,n}(y,r)f_{l,n}(y,r)]=\frac{1}{n^3}E[f_{1,n}(x,r)^2f_{1,n}(y,r)^2]+\frac{n-1}{n^3}E[f_{1,n}^2(x,r)]E[f_{2,n}^2(y,r)]+\frac{2(n-1)}{n^3}\{E[f_{1,n}(x,r)f_{1,n}(y,r)]\}^2$$
$$=\frac{1}{n^3}\cdot\frac{E[f_1(x,r)^2f_1(y,r)^2]}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}+\frac{n-1}{n^3}\cdot\frac{E[f_1^2(x,r)]E[f_2^2(y,r)]}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}+\frac{2(n-1)}{n^3}\cdot\frac{\{E[f_1(x,r)f_1(y,r)]\}^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}$$
$$=O(n^{-11/4})+\frac{(p_r-p_r^2)^2}{n^2}\cdot\frac{xy(1-x)(1-y)+2(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}.$$
And also, we have
$$E[\bar f_n(X_1)^2\bar f_n(y)^2]=\frac{1}{n^4}\sum_{i,j,k,l}E[f_{i,n}(X_1,r)f_{j,n}(X_1,r)f_{k,n}(y,r)f_{l,n}(y,r)]=O(n^{-11/4})+\frac{(p_r-p_r^2)^2}{n^2}\int_0^1\frac{xy(1-x)(1-y)+2(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}dx,$$
$$E[\bar f_n(x)^2\bar f_n(X_2)^2]=\frac{1}{n^4}\sum_{i,j,k,l}E[f_{i,n}(x,r)f_{j,n}(x,r)f_{k,n}(X_2,r)f_{l,n}(X_2,r)]=O(n^{-11/4})+\frac{(p_r-p_r^2)^2}{n^2}\int_0^1\frac{xy(1-x)(1-y)+2(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}dy,$$
$$E[\bar f_n(X_1)^2\bar f_n(X_2)^2]=\frac{1}{n^4}\sum_{i,j,k,l}E[f_{i,n}(X_1,r)f_{j,n}(X_1,r)f_{k,n}(X_2,r)f_{l,n}(X_2,r)]=O(n^{-11/4})+\frac{(p_r-p_r^2)^2}{n^2}\int_0^1\!\!\int_0^1\frac{xy(1-x)(1-y)+2(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}dx\,dy,$$
and
$$E[\bar f_n(X_1)^2\bar f_n(X_1)^2]=\frac{1}{n^4}\sum_{i,j,k,l}E[f_{i,n}(X_1,r)f_{j,n}(X_1,r)f_{k,n}(X_1,r)f_{l,n}(X_1,r)]=O(n^{-11/4})+\frac{(p_r-p_r^2)^2}{n^2}\int_0^1\frac{x^2(1-x)^2+2(x-x^2)^2}{(x_{(n)}(1-x)_{(n)})^2}dx.$$
Hence,
$$E[I_n(r)^2]=\int_0^1\!\!\int_0^1E[\bar f_n(X_1)^2\bar f_n(X_2)^2]dx\,dy-\int_0^1\!\!\int_0^1E[\bar f_n(X_1)^2\bar f_n(y)^2]dx\,dy-\int_0^1\!\!\int_0^1E[\bar f_n(x)^2\bar f_n(X_2)^2]dx\,dy+\int_0^1\!\!\int_0^1E[\bar f_n(x)^2\bar f_n(y)^2]dx\,dy+\frac{1}{n}\Big[E[\bar f_n(X_1)^2\bar f_n(X_1)^2]-E[\bar f_n(X_1)^2\bar f_n(X_2)^2]\Big]=O(n^{-11/4}),$$
since the four leading $\frac{(p_r-p_r^2)^2}{n^2}$ terms cancel. So $I_n(r)=O_p(n^{-11/8})$, and
$$\widehat{\mathrm{IPC}}_n-\widetilde{\mathrm{IPC}}_n=\widehat{\mathrm{IPC}}_n-\widetilde{\mathrm{IPC}}_{1n}+\widetilde{\mathrm{IPC}}_{1n}-\widetilde{\mathrm{IPC}}_n=O_p(Rn^{-9/8})+\frac{R}{\min_{1\le r\le R}p_r}\,O_p(n^{-11/8})+\widetilde{\mathrm{IPC}}_n\,o_p(n^{-1/8}).$$

A.2.4. Part 3

Now, we will complete the proof of Theorem 3.1.

Proof of Theorem 3.1.

Let $\widetilde T_n=n\widetilde{\mathrm{IPC}}_n$, and recall that $T_n=n\widehat{\mathrm{IPC}}_n$ denotes the IPC test statistic. Without loss of generality, we assume that $X\sim\mathrm{Unif}(0,1)$, so that $F(x)=x$ for $0\le x\le1$. According to Lemma A.4, we have
$$T_n-\widetilde T_n=O_p(Rn^{-1/8})+O_p\!\left(\frac{R}{\min_{1\le r\le R}p_r}\,n^{-3/8}\right)+o_p(\widetilde T_n n^{-1/8}).$$
Then under the condition $\frac{\sqrt{R}}{\min_{1\le r\le R}p_r}=o(n^{3/8})$, we have $R=o(n^{1/4})$, and thus $T_n-\widetilde T_n=o_p(\sqrt{R})+\widetilde T_n\,o_p(n^{-1/8})$, i.e.,
$$\frac{T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}-\frac{\widetilde T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}=o_p(1)+\frac{\widetilde T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}\,o_p(n^{-1/8})+o_p(\sqrt{R}\,n^{-1/8})=\frac{\widetilde T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}\,o_p(n^{-1/8})+o_p(1).$$
Hence, we only need to prove that
$$\frac{\widetilde T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}\xrightarrow{d}N(0,1),\quad\text{as }n\to\infty.$$
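Before verifying the facts below, the target normal limit can be probed by simulation: reusing the `ipc_statistic` sketch from above, the standardized statistic $(T_n-(R-1))/\sqrt{2(\pi^2/3-3)(R-1)}$ should look roughly standard normal under $H_0$. The sample sizes and the balanced design below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n, R, reps = 2000, 20, 300
z = np.empty(reps)
for k in range(reps):
    x = rng.normal(size=n)                  # X independent of Y, i.e. H0
    y = rng.integers(1, R + 1, size=n)      # roughly balanced categories
    Tn = n * ipc_statistic(x, y)
    z[k] = (Tn - (R - 1)) / np.sqrt(2 * (np.pi ** 2 / 3 - 3) * (R - 1))
# mean near 0 and variance near 1; a histogram of z looks roughly N(0, 1)
print(z.mean(), z.var())
```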

Recall that
$$f_{i,n}(x,r)=\frac{(I(X_i\le x)-x)(I(Y_i=r)-p_r)}{\sqrt{x_{(n)}(1-x)_{(n)}}},$$
where $x_{(n)}=x\vee n^{-1/8}$ and $(1-x)_{(n)}=(1-x)\vee n^{-1/8}$. We first give some important facts:

  (i) $E[f_{i,n}(x,r)f_{i,n}(y,s)]=\dfrac{(x\wedge y-xy)(p_r\delta_{rs}-p_rp_s)}{\sqrt{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}}$;

  (ii) $E[f_{i,n}^2(x,r)f_{i,n}^2(y,s)]\le C\,n^{1/8}\,\big(p_r\delta_{rs}+p_rp_s(p_r+p_s)\big)$,

for all $1\le i\le n$ and $1\le r,s\le R$, where $C$ is a constant and $\delta_{rs}=1$ if $r=s$ and $\delta_{rs}=0$ otherwise.
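Fact (i) factorizes into an $X$-part and a $Y$-part and is easy to spot-check by Monte Carlo. The sketch below checks the untruncated numerator identity $E[f_i(x,r)f_i(y,s)]=(x\wedge y-xy)(p_r\delta_{rs}-p_rp_s)$; the truncation only divides by a deterministic factor, and all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

n_mc = 2_000_000
p = np.array([0.2, 0.3, 0.5])
x0, y0, r, s = 0.3, 0.6, 0, 1              # fixed points; 0-based categories
X = rng.uniform(size=n_mc)                 # F uniform, as in the proof
Y = rng.choice(len(p), size=n_mc, p=p)     # independent of X, i.e. H0
f_xr = ((X <= x0) - x0) * ((Y == r) - p[r])
f_ys = ((X <= y0) - y0) * ((Y == s) - p[s])
emp = np.mean(f_xr * f_ys)
thy = (min(x0, y0) - x0 * y0) * (p[r] * (r == s) - p[r] * p[s])
print(emp, thy)                            # both approximately -0.0072
```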

We prove (ii). Without loss of generality, we assume that $x\le y$. Then
$$E[f_{i,n}^2(x,r)f_{i,n}^2(y,s)]=\frac{E\{[I(X_i\le x)-x]^2[I(X_i\le y)-y]^2\}\,E\{[I(Y_i=r)-p_r]^2[I(Y_i=s)-p_s]^2\}}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le\big[p_r\delta_{rs}+p_rp_s(p_r+p_s)\big]\cdot\frac{E\{[I(X_i\le x)-x]^2[I(X_i\le y)-y]^2\}}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}.$$
And
$$\frac{E\{[I(X_i\le x)-x]^2[I(X_i\le y)-y]^2\}}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}=\frac{E\{[I(X_i\le x)-2xI(X_i\le x)+x^2][I(X_i\le y)-2yI(X_i\le y)+y^2]\}}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}=\frac{x(1-y)(1-y-2x+3xy)}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le\frac{x(1-y)}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le4n^{1/8}.$$
The last inequality holds because: if $1/2\le x\le y$, then
$$\frac{x(1-y)}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le\frac{4}{(1-x)_{(n)}}\le4n^{1/8};$$
if $x\le y\le1/2$, then
$$\frac{x(1-y)}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le\frac{4}{y_{(n)}}\le4n^{1/8};$$
and if $x\le1/2\le y$, then
$$\frac{x(1-y)}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\le4.$$

  (iii) $\displaystyle\sum_{r,s,t,q=1}^{R}\frac{(p_r\delta_{rs}-p_rp_s)(p_r\delta_{rt}-p_rp_t)(p_t\delta_{tq}-p_tp_q)(p_s\delta_{sq}-p_sp_q)}{p_rp_sp_tp_q}=O(R)$. This result can be found in Cui and Zhong (Citation2018) and Cui and Zhong (Citation2019).

Write
$$\widetilde T_n=\frac{1}{n}\sum_{r=1}^{R}\frac{1}{p_r}\int_0^1\Big[\sum_{i=1}^{n}f_{i,n}(x,r)\Big]^2dx=:J_{1n}+J_{2n},$$
where
$$J_{1n}=\frac{1}{n}\sum_{i=1}^{n}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}^2(x,r)\,dx,\qquad J_{2n}=\frac{1}{n}\sum_{i\ne j}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}(x,r)f_{j,n}(x,r)\,dx.$$
Note that
$$EJ_{1n}=\sum_{r=1}^{R}\frac{1}{p_r}\int\frac{E(I(X_i\le x)-x)^2\,E(I(Y_i=r)-p_r)^2}{x_{(n)}(1-x)_{(n)}}dx=\sum_{r=1}^{R}(1-p_r)\int_0^1\frac{x(1-x)}{x_{(n)}(1-x)_{(n)}}dx=(R-1)(1-n^{-1/8}),$$
and
$$\mathrm{Var}(J_{1n})=\frac{1}{n}\mathrm{Var}\Big(\sum_{r=1}^{R}\frac{1}{p_r}\int f_{1,n}^2(x,r)dx\Big)\le\frac{1}{n}E\Big(\sum_{r=1}^{R}\frac{1}{p_r}\int f_{1,n}^2(x,r)dx\Big)^2=\frac{1}{n}\Big(\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{1,n}^2(x,r)f_{1,n}^2(y,s)]dx\,dy\Big)\le\frac{Cn^{1/8}}{n}\sum_{r,s}\frac{p_r\delta_{rs}+p_rp_s(p_r+p_s)}{p_rp_s}\le\frac{C}{n^{7/8}}\Big(\frac{R}{\min_{1\le r\le R}p_r}+R\Big)=O(n^{-3/8})=o(1).$$
Hence,
$$E\left(\frac{J_{1n}-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}\right)^2=C'\big\{\mathrm{Var}(J_{1n})/(R-1)+[EJ_{1n}-(R-1)]^2/(R-1)\big\}=C'\,\mathrm{Var}(J_{1n})/(R-1)+C'(R-1)n^{-1/4}=o(1),$$
where $C'$ is a constant and the last step uses $R=o(n^{1/4})$. Next, we only need to show that
$$\frac{J_{2n}}{\sqrt{2(\pi^2/3-3)(R-1)}}\xrightarrow{d}N(0,1).$$
Note that $EJ_{2n}=0$, and
$$\mathrm{Var}(J_{2n})=E(J_{2n}^2)=\frac{1}{n^2}\sum_{i\ne j}\sum_{k\ne l}\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{i,n}(x,r)f_{j,n}(x,r)f_{k,n}(y,s)f_{l,n}(y,s)]dx\,dy=\frac{2n(n-1)}{n^2}\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int\{E[f_{1,n}(x,r)f_{1,n}(y,s)]\}^2dx\,dy$$
$$=\frac{2n(n-1)}{n^2}\sum_{r,s}\frac{(p_r\delta_{rs}-p_rp_s)^2}{p_rp_s}\int\!\!\int\frac{(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}dx\,dy=\Big(1-\frac{1}{n}\Big)(R-1)\Big[2\int\!\!\int\frac{(x\wedge y-xy)^2}{x(1-x)y(1-y)}dx\,dy+O(n^{-1/8})\Big]=\Big(1-\frac{1}{n}\Big)(R-1)\big[2(\pi^2/3-3)+O(n^{-1/8})\big],$$
where we used $\sum_{r,s}(p_r\delta_{rs}-p_rp_s)^2/(p_rp_s)=R-1$, and the last equality holds because
$$\int_0^1\!\!\int_0^1\frac{(x\wedge y-xy)^2}{x(1-x)y(1-y)}dx\,dy=\frac{\pi^2}{3}-3.$$
Let $\mathcal F_i=\sigma\{(X_1,Y_1),\dots,(X_i,Y_i)\}$ be the $\sigma$-field generated by the set of random variables $\{(X_1,Y_1),\dots,(X_i,Y_i)\}$, $i=1,\dots,n$. We see that
$$\frac{J_{2n}}{\sqrt{2(\pi^2/3-3)(R-1)}}=\sum_{i=2}^{n}\frac{\frac{2}{n}\sum_{j=1}^{i-1}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}(x,r)f_{j,n}(x,r)dx}{\sqrt{2(\pi^2/3-3)(R-1)}}=:\sum_{i=2}^{n}Z_{ni}$$
is the summation of a martingale difference sequence with $E(Z_{ni})=0$ and $\mathrm{Var}(\sum_{i=2}^{n}Z_{ni})=(1-\frac{1}{n})(1+O(n^{-1/8}))\to1$. According to Hall and Heyde (Citation1980), we need to prove $\sum_{i=2}^{n}E[Z_{ni}^2\mid\mathcal F_{i-1}]\xrightarrow{P}1$, together with a Lyapunov-type condition on $\sum_{i=2}^{n}E(Z_{ni}^4)$ verified at the end of the proof. Now,
$$E[Z_{ni}^2\mid\mathcal F_{i-1}]=\frac{1}{2(\pi^2/3-3)(R-1)}\Big(\frac{2}{n}\Big)^2\sum_{j,k\le i-1}\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{i,n}(x,r)f_{i,n}(y,s)]f_{j,n}(x,r)f_{k,n}(y,s)dx\,dy.$$
Thus we have $\sum_{i=2}^{n}E[Z_{ni}^2\mid\mathcal F_{i-1}]=J_{3n}+J_{4n}$, where
$$J_{3n}=\frac{1}{2(\pi^2/3-3)(R-1)}\Big(\frac{2}{n}\Big)^2\sum_{j=1}^{n-1}(n-j)\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{i,n}(x,r)f_{i,n}(y,s)]f_{j,n}(x,r)f_{j,n}(y,s)dx\,dy,$$
$$J_{4n}=\frac{2}{2(\pi^2/3-3)(R-1)}\Big(\frac{2}{n}\Big)^2\sum_{j<k\le n-1}(n-k)\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{i,n}(x,r)f_{i,n}(y,s)]f_{j,n}(x,r)f_{k,n}(y,s)dx\,dy.$$
Since $E(J_{3n})\to1$, and
$$\mathrm{Var}(J_{3n})=\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,\mathrm{Var}\Big(\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{i,n}(x,r)f_{i,n}(y,s)]f_{j,n}(x,r)f_{j,n}(y,s)dx\,dy\Big)$$
$$\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,E\Big(\sum_{r,s}\frac{p_r\delta_{rs}-p_rp_s}{p_rp_s}\int\!\!\int\frac{x\wedge y-xy}{\sqrt{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}}f_{j,n}(x,r)f_{j,n}(y,s)dx\,dy\Big)^2$$
$$\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,R^2\,E\Big\{\sum_{r,s}\Big(\frac{p_r\delta_{rs}-p_rp_s}{p_rp_s}\Big)^2\Big(\int\!\!\int\frac{x\wedge y-xy}{\sqrt{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}}f_{j,n}(x,r)f_{j,n}(y,s)dx\,dy\Big)^2\Big\}$$
$$\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,R^2\,E\Big\{\sum_{r,s}\Big(\frac{p_r\delta_{rs}-p_rp_s}{p_rp_s}\Big)^2\int\!\!\int\frac{(x\wedge y-xy)^2}{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}f_{j,n}^2(x,r)f_{j,n}^2(y,s)dx\,dy\Big\}$$
$$\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,R^2\sum_{r,s}\Big(\frac{p_r\delta_{rs}-p_rp_s}{p_rp_s}\Big)^2\int\!\!\int E[f_{j,n}^2(x,r)f_{j,n}^2(y,s)]dx\,dy\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,R^2\sum_{r,s}\Big(\frac{p_r\delta_{rs}-p_rp_s}{p_rp_s}\Big)^2n^{1/8}\big(p_r\delta_{rs}+p_rp_s(p_r+p_s)\big)$$
$$\le\frac{C}{(R-1)^2n^4}\sum_{j=1}^{n-1}(n-j)^2\,R^2\,n^{1/8}\,\frac{R}{\min_{1\le r\le R}p_r}=O\Big(n^{-7/8}\frac{R}{\min_{1\le r\le R}p_r}\Big)=O(n^{-3/8}),$$
where $C$ denotes a generic constant, we conclude that $J_{3n}\xrightarrow{P}1$. And $E(J_{4n})=0$, and
$$\mathrm{Var}(J_{4n})=\frac{C}{R^2n^4}\sum_{j<k}\sum_{l<m}(n-k)(n-m)\sum_{r,s}\sum_{t,q}E\Big\{\frac{1}{p_rp_sp_tp_q}\int\!\!\int E[f_{i,n}(x,r)f_{i,n}(y,s)]f_{j,n}(x,r)f_{k,n}(y,s)dx\,dy\times\int\!\!\int E[f_{i,n}(x',t)f_{i,n}(y',q)]f_{l,n}(x',t)f_{m,n}(y',q)dx'\,dy'\Big\}$$
$$=\frac{C}{R^2n^4}\sum_{j<k}(n-k)^2\sum_{r,s}\sum_{t,q}\frac{(p_r\delta_{rs}-p_rp_s)(p_t\delta_{tq}-p_tp_q)}{p_rp_sp_tp_q}\int\!\!\int\!\!\int\!\!\int\frac{(x\wedge y-xy)(x'\wedge y'-x'y')\,E[f_{j,n}(x,r)f_{k,n}(y,s)f_{j,n}(x',t)f_{k,n}(y',q)]}{\sqrt{x_{(n)}(1-x)_{(n)}y_{(n)}(1-y)_{(n)}}\sqrt{x'_{(n)}(1-x')_{(n)}y'_{(n)}(1-y')_{(n)}}}dx\,dy\,dx'\,dy'$$
$$\le\frac{C'}{R^2n^4}\sum_{j<k}(n-k)^2\sum_{r,s,t,q=1}^{R}\frac{(p_r\delta_{rs}-p_rp_s)(p_t\delta_{tq}-p_tp_q)(p_r\delta_{rt}-p_rp_t)(p_s\delta_{sq}-p_sp_q)}{p_rp_sp_tp_q}=\frac{C'}{R^2n^4}\sum_{k=2}^{n}(k-1)(n-k)^2\,O(R)=O(1/R),$$
where only the terms with $(l,m)=(j,k)$ survive in the second equality, and the last step uses fact (iii). Thus $J_{4n}\xrightarrow{P}0$. On the other hand,
$$\sum_{i=2}^{n}E(Z_{ni}^4)\le\sum_{i=2}^{n}\frac{C}{n^4R^2}E\Big[\sum_{j=1}^{i-1}\sum_{r=1}^{R}\frac{1}{p_r}\int f_{i,n}(x,r)f_{j,n}(x,r)dx\Big]^4\le\sum_{i=2}^{n}\frac{C}{n^4R^2}\Big(6\binom{i-1}{2}+i-1\Big)E\Big[\sum_{r=1}^{R}\frac{1}{p_r}\int f_{1,n}(x,r)f_{2,n}(x,r)dx\Big]^4$$
$$\le\frac{C'}{nR^2}E\Big[\sum_{r=1}^{R}\frac{1}{p_r}\int f_{1,n}(x,r)f_{2,n}(x,r)dx\Big]^4=\frac{C'}{nR^2}E\left\{\Big[\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int f_{1,n}(x,r)f_{1,n}(y,s)f_{2,n}(x,r)f_{2,n}(y,s)dx\,dy\Big]^2\right\}$$
$$\le\frac{C'}{nR^2}E\left\{\Big[\sum_{r,s}\frac{1}{p_rp_s}\Big(\int\!\!\int f_{1,n}^2(x,r)f_{1,n}^2(y,s)dx\,dy\Big)^{1/2}\Big(\int\!\!\int f_{2,n}^2(x,r)f_{2,n}^2(y,s)dx\,dy\Big)^{1/2}\Big]^2\right\}\le\frac{C'}{nR^2}\Big(\sum_{r,s}\frac{1}{p_rp_s}\int\!\!\int E[f_{1,n}^2(x,r)f_{1,n}^2(y,s)]dx\,dy\Big)^2$$
$$\le\frac{C'}{nR^2}\Big(\sum_{r,s}\frac{p_r\delta_{rs}+p_rp_s(p_r+p_s)}{p_rp_s}\,n^{1/8}\Big)^2=\frac{C''}{n^{3/4}R^2}\Big(\frac{R}{\min_{1\le r\le R}p_r}+2R\Big)^2=O\Big(\frac{1}{n^{3/4}(\min_{1\le r\le R}p_r)^2}\Big)=o(1/R),$$
where $C$, $C'$ and $C''$ are constants, the fourth step uses the independence of $(X_1,Y_1)$ and $(X_2,Y_2)$, and the last step uses $(\sqrt{R}/\min_{1\le r\le R}p_r)^2=o(n^{3/4})$. By the central limit theorem for martingale differences (Hall & Heyde, Citation1980), we have
$$\frac{\widetilde T_n-(R-1)}{\sqrt{2(\pi^2/3-3)(R-1)}}\xrightarrow{d}N(0,1),$$
as $n\to\infty$. This completes the proof.