Full article: A distribution-free test of independence based on a modified mean variance index

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

Abstract

Cui and Zhong (2019), (Computational Statistics & Data Analysis, 139, 117–133) proposed a test based on the mean variance (MV) index to test independence between a categorical random variable Y with R categories and a continuous random variable X. They ingeniously proved the asymptotic normality of the MV test statistic when R diverges to infinity, which brings many merits to the MV test, including making it more convenient for independence testing when R is large. This paper considers a new test called the integral Pearson chi-square (IPC) test, whose test statistic can be viewed as a modified MV test statistic. A central limit theorem of the martingale difference is used to show that the asymptotic null distribution of the standardized IPC test statistic when R is diverging is also a normal distribution, rendering the IPC test sharing many merits with the MV test. As an application of such a theoretical finding, the IPC test is extended to test independence between continuous random variables. The finite sample performance of the proposed test is assessed by Monte Carlo simulations, and a real data example is presented for illustration.

Keywords:

1. Introduction

As a fundamental task in statistical inference and data analysis, testing independence of random variables has been explored for decades in the literature. Based on different types of random variables, many approaches to test independence have been proposed. For instance, if one wants to test independence between two categorical random variables, then the contingency table analysis and the Pearson chi-square test can be used. If both variables are continuous, there are also many important tests, such as, Hoeffding (Citation1948), Rosenblatt (Citation1975), Csörgö (Citation1985) and Zhou and Zhu (Citation2018), among others. Testing independence between random vectors has also received much attention in recent years, for instance, Székely et al. (Citation2007), Székely and Rizzo (Citation2009), Heller et al. (Citation2012), Zhu et al. (Citation2017), Pfister et al. (Citation2018) and Xu et al. (Citation2020).

It is also important to test independence between a continuous variable and a categorical variable. Suppose X is a continuous variable with support $R_{X}$ and $Y \in {1, \dots, R}$ is a categorical variable with R categories. We are interested in the following test of hypothesis: $H_{0} : X a n d Y a r e i n d e p e n d e n t, v e r s u e H_{1} : X a n d Y a r e n o t i n d e p e n d e n t .$ Or, equivalently, (1) $\begin{aligned} H_{0} : F (x) = F_{r} (x), f o r a n y x \in R_{X} a n d r = 1, \dots, R, \\ v e r s u e & H_{1} : F (x) \neq F_{r} (x), f o r s o m e x \in R_{X} a n d r = 1, \dots, R, \end{aligned}$ (1) where $F (x) = P (X \leq x)$ , $p_{r} = P (Y = r)$ , and $F_{r} (x) = P (X \leq x ∣ Y = r)$ , $r = 1, \dots, R$ . Thus, testing independence between X and Y is equivalent to testing the equality of conditional distributions, which is known as the k-sample problem in the literature (see e.g., Jiang et al., Citation2015).

Recently, Cui and Zhong (Citation2019) proposed the mean variance (MV) test based on a new measure of dependence between X and Y, the MV index (Cui et al., Citation2015), to test hypothesis (Equation1(1) $\begin{aligned} H_{0} : F (x) = F_{r} (x), f o r a n y x \in R_{X} a n d r = 1, \dots, R, \\ v e r s u e & H_{1} : F (x) \neq F_{r} (x), f o r s o m e x \in R_{X} a n d r = 1, \dots, R, \end{aligned}$ (1) ). The MV index is defined as $MV (X ∣ Y) = E_{X} [{V a r}_{Y} (F (X ∣ Y))] = \sum_{r = 1}^{R} p_{r} \int {[F (x) - F_{r} (x)]}^{2} d F (x),$ where $F (x ∣ Y) = P (X \leq x ∣ Y)$ . Given ${(X_{i}, Y_{i}), i = 1, \dots, n}$ with sample size n, the MV test statistic is proposed: $n {\hat{MV}}_{n} (X ∣ Y) = n \int \sum_{r = 1}^{R} {\hat{p}}_{r} {[F_{n} (x) - F_{r n} (x)]}^{2} d F_{n} (x),$ where $F_{n} (x)$ , ${\hat{p}}_{r}$ and $F_{r n} (x)$ are the empirical counterparts of $F (x)$ , $p_{r}$ and $F_{r} (x)$ , respectively. An important theoretical finding of Cui and Zhong (Citation2019) is that when the number of categories of Y is allowed to diverge with the sample size, the standardized MV test statistic is a standard normal distribution. Cui and Zhong (Citation2019) has argued many appealing merits of this finding. For instance, this makes it convenient for obtaining any critical value of the MV test by using an approximated normal distribution when R is large.

For any fixed $x \in R_{X}$ , dividing MV test statistic's integrand by $F_{n} (x) (1 - F_{n} (x))$ leads to the Pearson chi-square test statistic (2) $\begin{aligned} χ_{n}^{2} (x) & = n \sum_{r = 1}^{R} {\hat{p}}_{r} \frac{{[F_{n} (x) - F_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} \end{aligned}$ (2) (3) $\begin{aligned} = \sum_{r = 1}^{R} \sum_{l = 1}^{2} \frac{{(n_{l r} (x) n - n_{l +} (x) n_{+ r})}^{2}}{n_{l +} (x) n_{+ r} n}, \end{aligned}$ (3) which is widely used in practice to test independence between the indicator function $I (X \leq x)$ and Y. Here $n_{l r} (x)$ ( $l = 1, 2, r = 1, \dots, R$ ) are the counts in a $2 \times R$ contingency table (Table ) determined in the following way $\begin{aligned} n_{1 r} (x) = | {(X_{i}, Y_{i}) : X_{i} \leq x a n d Y_{i} = r} |, f o r r = 1, \dots, R, \\ n_{2 r} (x) = | {(X_{i}, Y_{i}) : X_{i} > x a n d Y_{i} = r} |, f o r r = 1, \dots, R, \end{aligned}$ where $| A |$ denotes the cardinality of a set A, and $n_{l +} (x) = \sum_{r = 1}^{R} n_{l r} (x)$ , $n_{+ r} = \sum_{l = 1}^{2} n_{l r} (x)$ , for $l = 1, 2, r = 1, \dots, R$ . As the Pearson chi-square test is more widely used in testing independence, we can imitate the MV test statistic to take the integral of $χ_{n}^{2} (x)$ with respect to $F_{n} (x)$ , and propose the following test statistic: (4) $\begin{aligned} n {\hat{IPC}}_{n} (X, Y) & = \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{l = 1}^{2} \frac{{(n_{l r} (X_{i}) n - n_{l +} (X_{i}) n_{+ r})}^{2}}{n_{l +} (X_{i}) n_{+ r} n} \\ = n \sum_{r = 1}^{R} {\hat{p}}_{r} \int \frac{{[F_{n} (x) - F_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} d F_{n} (x) . \end{aligned}$ (4) We call ${\hat{IPC}}_{n} (X, Y)$ as the integral Pearson chi-squared (IPC) statistic, and $n {\hat{IPC}}_{n} (X, Y)$ as the IPC test statistic.

It is not difficult to see that the IPC test statistic is essentially a reestablishment of the k-sample Anderson Darling test statistic proposed by Scholz and Stephens (Citation1987). The reader is referred to He et al. (Citation2019) and Ma et al. (Citation2022) for some recent work on this statistics. The asymptotic null distribution of the IPC test statistic when R is fixed was established in Scholz and Stephens (Citation1987). The promising performance of the k-sample Anderson Darling statistic (IPC test statistic) has been verified by many subsequent works in the literature and a variety of applications in practice. However, to our best knowledge, its theoretical property when the number of categories of Y is diverging remains unknown. The main goal of this paper is to fill in gaps in this area. In analogy to the MV test, we find that the IPC test also enjoys an appealing property, that is, the asymptotic null distribution of the standardized IPC test statistic when R is diverging is a standard normal distribution. This important theoretical finding allows the IPC test to share many distinguished merits with the MV test. Our work, together with Cui and Zhong (Citation2019), establishes a solid theoretical foundation and empirical evidence for independence testing between a continuous variable and a categorical variable with a diverging number of categories. As an application of such a theoretical finding, we also extend the IPC test to test independence between two continuous random variables. The approach is carried out by slicing one of the variables on its support to get a categorical variable, and then the IPC test can be applied. We allow the slicing scheme to be finer as the sample size increases, which ensures us to obtain a satisfactory test power. Slicing technique is widely used across many statistical fields, such as feature screening (Mai & Zou, Citation2015b; Yan et al., Citation2018; Zhong et al., Citation2021) and k-sample test (Jiang et al., Citation2015). It has also been used for testing independence. For instance, it is commonly seen in practice to slice two univariate variables into categorical variables and apply Pearson chi-squared test to test their independence. Please refer to Zhang et al. (Citation2022) for more recent development of sliced independence test. Our research enriches the application of the slicing skill in the field of independence testing. The proposed approach also provides a computationally tractable way to compute the p-value efficiently. Simulation studies show that the proposed test has satisfactory test power in many scenarios.

Table 1. Empirical bivariate distribution for a fixed x.

Display Table

The rest of the paper is organized as follows. Section 2 introduces some preliminaries of the IPC test. Section 3 presents the main results, including the asymptotic null distribution of the test statistic when R is diverging with the sample size. Simulation studies of the proposed test and a real data application are included in Section 4. Section 5 concludes the paper. Due to the limited space, all the technical proofs of theorems are given in Appendix.

2. Preliminaries

Let X be a continuous random variable with support $R_{X}$ , $Y \in {1, \dots, R}$ be a categorical variable with R categories. Motivated by the IPC statistic in (Equation4(4) $\begin{aligned} n {\hat{IPC}}_{n} (X, Y) & = \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{l = 1}^{2} \frac{{(n_{l r} (X_{i}) n - n_{l +} (X_{i}) n_{+ r})}^{2}}{n_{l +} (X_{i}) n_{+ r} n} \\ = n \sum_{r = 1}^{R} {\hat{p}}_{r} \int \frac{{[F_{n} (x) - F_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} d F_{n} (x) . \end{aligned}$ (4) ), we define the following IPC index between X and Y. (5) $IPC (X, Y) = \sum_{r = 1}^{R} p_{r} \int_{R_{X}} \frac{{[F (x) - F_{r} (x)]}^{2}}{F (x) (1 - F (x))} d F (x) .$ (5) The IPC statistic is a natural estimator of the IPC index. Note that the $n_{l +} (X_{i})$ in the denominator of the right-hand side of the first equality of (Equation4(4) $\begin{aligned} n {\hat{IPC}}_{n} (X, Y) & = \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{l = 1}^{2} \frac{{(n_{l r} (X_{i}) n - n_{l +} (X_{i}) n_{+ r})}^{2}}{n_{l +} (X_{i}) n_{+ r} n} \\ = n \sum_{r = 1}^{R} {\hat{p}}_{r} \int \frac{{[F_{n} (x) - F_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} d F_{n} (x) . \end{aligned}$ (4) ) will take zero when $X_{i}$ is the largest or smallest one among all ${X_{i}}_{i = 1}^{n}$ . A solution is to follow Mai and Zou (Citation2015a) and consider the Winsorized empirical CDF ${\tilde{F}}_{n} (x) = {\begin{cases} b, & i f F_{n} (x) \geq b; \\ F_{n} (x), & i f a < F_{n} (x) < b; \\ a, & i f F_{n} (x) \leq a \end{cases}$ at a predefined pair of number $(a, b)$ . The Winsorization will cause bias in estimating the IPC index. Though such bias can automatically vanish if we let $a \to 0$ and $b \to 1$ as $n \to \infty$ . However, how to properly choose a and b is beyond the scope of this paper. At the same time we notice that, if $X_{i}$ is the largest or smallest one, the numerator of the first equality of (Equation4(4) $\begin{aligned} n {\hat{IPC}}_{n} (X, Y) & = \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{l = 1}^{2} \frac{{(n_{l r} (X_{i}) n - n_{l +} (X_{i}) n_{+ r})}^{2}}{n_{l +} (X_{i}) n_{+ r} n} \\ = n \sum_{r = 1}^{R} {\hat{p}}_{r} \int \frac{{[F_{n} (x) - F_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} d F_{n} (x) . \end{aligned}$ (4) ) will also take zero. Therefore, we hereafter denote $0 / 0 = 0$ following the common practice in the literature (see for example, He et al., Citation2019; Ma et al., Citation2022) to avoid confusion. Then we have the following lemmas.

Lemma 2.1

Let $Y \in {1, \dots, R}$ be a categorical variable with R categories and X a continuous variable with support $R_{X}$ , (6) ${\hat{IPC}}_{n} (X, Y) \overset{P}{\to} IPC (X, Y),$ (6) as $n \to \infty$ .

Lemma 2.1 shows that ${\hat{IPC}}_{n} (X, Y)$ is a consistent estimate of the IPC index.

Lemma 2.2

$0 \leq IPC (X, Y) < 1$ and $IPC (X, Y) = 0$ if and only if X and Y are independent.

According to Lemma 2.2, the IPC index is an effective measure of dependence between a continuous variable and a categorical variable. Thus we can construct test of independence via the IPC statistic.

Let $T_{n} = n {\hat{IPC}}_{n} (X, Y)$ . Note that $T_{n}$ is essentially the k-sample Anderson Darling test statistic proposed by Scholz and Stephens (Citation1987), and then we can directly derive the asymptotic null distribution of $T_{n}$ .

Theorem 2.3

Suppose X is a continuous random variable and Y is a categorical random variable with a fixed class number R. Under $H_{0}$ , (7) $T_{n} = n {\hat{IPC}}_{n} (X, Y) \overset{d}{\to} \sum_{j = 1}^{\infty} \frac{1}{j (j + 1)} χ_{j}^{2} (R - 1),$ (7) where $χ_{j}^{2} (R - 1)$ 's, $j = 1, 2, \dots$ , are identically and independent distributed (i.i.d.) $χ^{2}$ random variables with R−1 degree of freedom, and $\overset{d}{\to}$ denotes the convergence in distribution.

Though Theorem 2.3 gives an explicit form of the asymptotic null distribution, the exact distribution of $\sum_{j = 1}^{\infty} [j (j + 1)]^{- 1} χ_{j}^{2} (R - 1)$ is not accessible since it is a summation of infinitely many chi-square random variables. To address this issue, a widely adopted approach is to approximate $\sum_{j = 1}^{\infty} \frac{χ_{j}^{2} (R - 1)}{j (j + 1)}$ by $D_{N} + (R - 1) / (N + 1)$ for a sufficiently large N, where $D_{N} = \sum_{j = 1}^{N} \frac{χ_{j}^{2} (R - 1)}{j (j + 1)}$ , and $\frac{R - 1}{N + 1}$ is the expectation of $\sum_{j = N + 1}^{\infty} \frac{1}{j (j + 1)} χ_{j}^{2} (R - 1)$ . However, as a chi-square type mixture, $D_{N}$ 's cumulative distribution function does not have a known closed form. In practice, we usually generate many samples from $D_{N}$ and then use the empirical distribution as a surrogate of the true distribution. We can also use permutation test or bootstrap to compute the p-value for the IPC test. However, though these numerical methods are valid, they do make the IPC test less convenient for independence testing.

Lemma 2.1 declares that ${\hat{IPC}}_{n} (X, Y)$ converges in probability to $IPC (X, Y)$ , which is a new result not discussed in Scholz and Stephens (Citation1987). Furthermore, we have a better result about the convergence rate.

Theorem 2.4

Under the conditions of Lemma 2.1, for any $ϵ > 0$ , (8) $P (| {\hat{IPC}}_{n} (X, Y) - IPC (X, Y) | > ϵ) \leq C_{1} n R \exp (- C_{2} n ϵ^{2} / R^{2}) \to 0,$ (8) as $n \to 0$ . Here $C_{1}$ is a positive constant, and $C_{2} > 0$ depends only on $min_{1 \leq r \leq R} p_{r}$ .

Theorem 2.4 follows directly from Theorem 3.2 in Section 3.1. The probability inequality in (Equation8(8) $P (| {\hat{IPC}}_{n} (X, Y) - IPC (X, Y) | > ϵ) \leq C_{1} n R \exp (- C_{2} n ϵ^{2} / R^{2}) \to 0,$ (8) ) allows us to give a lower bound of the power of the test with finite sample size. In specific, according to Theorem 2.3, we compute the critical value $C_{α}$ for a given significance level $α > 0$ . Then under $H_{1}$ , the power is $\begin{aligned} P (T_{n} \geq C_{α} | H_{1}) & = 1 - P ({\hat{IPC}}_{n} (X, Y) < \frac{C_{α}}{n} | H_{1}) \\ = 1 - P (IPC (X, Y) - {\hat{IPC}}_{n} (X, Y) > IPC (X, Y) - \frac{C_{α}}{n} | H_{1}) \\ \geq 1 - P (| IPC (X, Y) - {\hat{IPC}}_{n} (X, Y) | > IPC (X, Y) - \frac{C_{α}}{n} | H_{1}) \\ \geq 1 - C_{1} n R \exp {- C_{2} n {(IPC (X, Y) - \frac{C_{α}}{n})}^{2} / R^{2}} . \end{aligned}$ According to Lemma 2.2, we have $IPC (X, Y) > 0$ under $H_{1}$ . Therefore, the power of the test converges to 1 as the sample size increases to infinity. In other words, this ensures that the IPC test of independence is a consistent test.

We would like to conclude this section by introducing two relevant recent work in the literature on IPC index. The application of the dependence measure in marginal feature screening has received increasing attention. Recently, He et al. (Citation2019) proposed a novel feature screening procedure based on the IPC index (which they referred to as the AD index) for ultrahigh-dimensional discriminant analysis where the response is a categorical variable with a fixed number of classes. The theoretical guarantee of the IPC statistic in He et al. (Citation2019) has focused primarily on concentration inequality, rather than the asymptotic distribution. They showed that the proposed screening method is more competitive than many other existing methods. The promising numerical performance of He et al. (Citation2019)'s method soon inspired subsequent work. Later, Ma et al. (Citation2022) extended He et al. (Citation2019)'s work with the help of slicing technique, and proposed an IPC index-based screening procedure which can handle many types of response variable, including continuous variable, categorical variable and discrete variable taking finite or infinite values. Especially, the slicing technique used in Ma et al. (Citation2022) is further considered in this article to develop method for testing independence between two continuous random variables. The details are postponed in Section 3.2.

3. Main results

In this section, we allow the number of categories of Y to approach infinity with the sample size n, and consider the properties of the IPC test. Research on the categorical variable with a diverging number of categories has received increasing attention in the literature. For instance, Cui et al. (Citation2015) established the sure screening property of the MV index for discriminant analysis with a diverging number of response classes. In their setting, they allow the number of categories R to approach infinity at a slow rate of n. And Ni and Fang (Citation2016) also proposed an entropy-based feature screening for ultrahigh dimensional multiclass classification allowing the number of response classes to diverge. Readers are also referred to Ni et al. (Citation2017), Yan et al. (Citation2018), Ni et al. (Citation2020) and Ma et al. (Citation2022), among others, for more examples.

Here, we emphasize that it is also important to study test of independence between a continuous variable and a categorical variable with a diverging number of categories. One of its applications is to provide a feasible approach for testing independence between a continuous variable and a categorical variable taking infinite values. To be specific, suppose Y is a categorical variable taking infinite values (e.g., Poisson variable) and X is a continuous variable. To test independence between X and Y, we can define a new variable $Y^{'} = Y \land R$ for some R, where $a \land b = min (a, b)$ . The IPC test is then applied to test independence between X and $Y^{'}$ , which gives us important information about whether X and Y are independent. Then a natural question is how to choose an appropriate R. A reasonable approach is to allow R to go to infinity with the sample size n so as to obtain satisfactory test power. This is one of the reasons that motivates us to study the asymptotic properties of the IPC statistic when R is diverging.

3.1. Asymptotic properties when R is diverging

In the following, we establish the large sample properties of the IPC statistic when R is diverging with the sample size n. To avoid any ambiguity, in Section 3.1, we actually consider a sequence of problems indexed by k, $k = 1, 2, \dots$ . For each k, $Y_{k} \in {1, \dots, R_{k}}$ denotes the categorical variable with $R_{k}$ categories, $p_{r, k} = P (Y_{k} = r)$ , for $r = 1, \dots, R_{k}$ , $X_{k}$ denotes the continuous variable, and ${(X_{k i}, Y_{k i}) : i = 1, 2, \dots, n_{k}}$ is a random sample with sample size $n_{k}$ from $(X_{k}, Y_{k})$ . The following theorem shows the asymptotic normality of the standardized test statistic if $X_{k}$ and $Y_{k}$ are independent for any $k = 1, 2, \dots$ .

Theorem 3.1

Assume that $n_{k} \to \infty$ as $k \to \infty$ . Let $T_{n_{k}} = n_{k} {\hat{IPC}}_{n_{k}} (X_{k}, Y_{k})$ . If $\sqrt{R_{k}} / min_{1 \leq r \leq R_{k}} p_{r, k} = o (n_{k}^{3 / 8})$ and $R_{k} \to \infty$ as $n_{k} \to \infty$ , and $X_{k}$ and $Y_{k}$ are independent for $k = 1, 2, \dots$ , we have (9) $\frac{T_{n_{k}} - (R_{k} - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R_{k} - 1)}} \overset{d}{\to} N (0, 1),$ (9) as $k \to \infty$ .

If $min_{1 \leq r \leq R_{k}} p_{r, k} = O (n_{k}^{- γ})$ where $0 < γ < 3 / 8$ , then we derive that $R_{k} = O (n_{k}^{η})$ for some $0 < η < 3 / 4 - 2 γ$ , namely, we allow the number of categories to go to infinity with the sample size n at the relatively slow rate. Cui and Zhong (Citation2019) also gave a similar result for the MV test with R diverging.

Let $V (R) = \sum_{j = 1}^{\infty} χ_{j}^{2} (R - 1) / [j (j + 1)]$ be the asymptotic null distribution in Theorem 2.3 where R is fixed. A direct application of Theorem 3.1 is that we can use a normal distribution with mean R−1 and variance $2 (π^{2} / 3 - 3) (R - 1)$ to approximate the asymptotic null distribution of the IPC test (i.e., $V (R)$ ) when R is large. Denote $W (R) = N (R - 1, 2 (π^{2} / 3 - 3) (R - 1))$ . To gain more insight into the connection between the normal distribution $W (R)$ and $V (R)$ , one can notice that the mean and the variance of $V (R)$ are also R−1 and $2 (π^{2} / 3 - 3) (R - 1)$ , respectively. This result is a distinguished merit of the IPC test. It enables us to reduce the computational cost since it is more easy to calculate the critical value of $W (R)$ than of $V (R)$ .

To further check the validity of using $W (R)$ as a surrogate for $V (R)$ to compute the critical value of the IPC test when R is large, we compare the empirical quantiles of the IPC test statistic with the theoretical quantiles of the normal distribution $W (R)$ in (Equation9(9) $\frac{T_{n_{k}} - (R_{k} - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R_{k} - 1)}} \overset{d}{\to} N (0, 1),$ (9) ) and the asymptotic null distribution $V (R)$ in (Equation7(7) $T_{n} = n {\hat{IPC}}_{n} (X, Y) \overset{d}{\to} \sum_{j = 1}^{\infty} \frac{1}{j (j + 1)} χ_{j}^{2} (R - 1),$ (7) ). We generate $Y \in {1, \dots, R}$ with equal probabilities and X independently from $U (0, 1)$ . We consider $R = 10, 15, \dots, 35$ . For each R, let $n = 40 \times R$ , and we repeat the simulation 1000 times to obtain 1000 values of the IPC test statistic $T_{n}$ . We report the $90 %$ and $95 %$ quantiles of 1000 $T_{n}$ 's (denoted by empirical quantile in Table ), as these two quantiles are most widely used in hypothesis testing. The $90 %$ and $95 %$ quantiles of $V (R)$ (denoted by theoretical quantile 1) and $W (R)$ (denoted by theoretical quantile 2) are also computed. The results are gathered in Table . The empirical quantiles are close to the theoretical quantiles of $W (R)$ even when R = 10, which further supports our proposed method of using the approximated normal distribution to calculate the critical value of the IPC test when R is relatively large. Looking further into the results in Table , we can see that $T_{n}$ 's empirical quantiles seem to be almost systematically smaller than the quantiles of $V (R)$ (with the exception of the $95 %$ quantile when R = 35), while larger than the quantiles of $W (R)$ (both by a very small amount). Note that the asymptotic distribution $V (R)$ can be viewed as a chi-square-type mixture. Such chi-square-type mixture follows an asymmetrical, positively skewed (or right-skewed) distribution, in which the left tail is shorter while the right tail is longer. To be specific, the skewness of $V (R)$ is $E (V (R) - E V (R))^{3} / V a r (V (R))^{3 / 2} = (80 - 8 π^{2}) / {(2 π^{2} / 3 - 6)^{3 / 2} (R - 1)^{1 / 2}} > 0$ , which will tend to zero as R goes to infinity. While the normal distribution $W (R)$ is symmetric, its skewness is 0. Since $V (R)$ is a better approximation of the exact distribution of $T_{n}$ , it makes sense that the $90 %$ and $95 %$ quantiles of both the $T_{n}$ 's empirical distribution and $V (R)$ will be slightly larger than that of $W (R)$ . It is also interesting that the $T_{n}$ 's empirical quantiles fall between the quantiles of $V (R)$ and the quantiles of $W (R)$ . This may implicate that the skewness of the exact distribution of $T_{n}$ seems to be smaller than that of $V (R)$ .

Table 2. Comparison of empirical quantiles with two theoretical quantiles.

Download CSV Display Table

We further compare the empirical null distribution with $W (R)$ . Still generate $Y \in {1, \dots, R}$ with equal probabilities and X independently from $U (0, 1)$ . Consider four scenarios: (a) R = 5, $n = 100 \times R = 500$ ; (b) R = 10, $n = 80 \times R = 800$ ; (c) R = 20, $n = 40 \times R = 800$ ; (d) R = 50, $n = 30 \times R = 1500$ . We run the simulation 100000 times for each scenario to obtain 100000 values of the IPC test statistic $T_{n}$ . Then we compare the empirical distribution of the standardized IPC test statistic $[T_{n} - (R - 1)] / \sqrt{2 (π^{2} / 3 - 3) (R - 1)}$ with the standard normal distribution $N (0, 1)$ in Figure . In scenario (a) when R = 5 is too small, the empirical density curve of the standardized IPC test statistic deviates to some extent from the normal density function, even though the sample size n = 500 is large. Also, when R = 5, the empirical density is positively skewed, with more values clustered around the left tail while the right tail is slightly longer. The empirical density curve, however, is very well matched to the standard normal density curve when R increases, such as in scenario (c) when R = 20. This further emphasizes that R should be large enough (say, larger than 10) to ensure the normal approximation in Theorem 3.1 to hold.

Figure 1. Comparing the empirical distribution of the standardized IPC test statistic with the standard normal distribution. The blue broken line represents the empirical density and the black solid line represents the standard normal density. The empirical density is a kernel density estimate using Gaussian kernels based on 100000 values of $T_{n}$ . In each panel, the histogram of the standardized IPC test statistic is also displayed. (a) R = 5, n = 500. (b) R = 10, n = 800. (c) R = 20, n = 800 and (d) R = 50, n = 1500.

The following theorem allows us to bound the deviation of the IPC statistic when R is diverging, which is parallel to Theorem 3.1 in Ma et al. (Citation2022).

Theorem 3.2

Suppose $R_{k} = O (n_{k}^{η})$ for some $0 \leq η < 1 / 2$ and there exists a positive constant $c_{1}$ such that $c_{1} / R_{k} \leq p_{r, k}$ for $r = 1, \dots, R_{k}$ , $k = 1, 2, \dots$ . Then for any $ϵ \in (0, 1)$ , (10) $P (| {\hat{IPC}}_{n_{k}} (X_{k}, Y_{k}) - IPC (X_{k}, Y_{k}) | > ϵ) \leq C_{1} n_{k} R_{k} \exp (\frac{- C_{2} n_{k} ϵ^{2}}{R_{k}^{2}}),$ (10) where $C_{1}$ is a positive constant and $C_{2} > 0$ depends only on $c_{1}$ .

Remark 3.1

He et al. (Citation2019) has also established a concentration inequality for the IPC statistic. However, their theoretical guarantee relies on a fixed number of categories (i.e., $η = 0$ ). Thus, Theorem 3.2 is different to Lemma 4 in He et al. (Citation2019).

The condition $c_{1} / R_{k} \leq p_{r, k}$ for $r = 1, \dots, R_{k}$ , which is also used in Cui et al. (Citation2015) and Cui and Zhong (Citation2019), requires that the proportion of each category of $Y_{k}$ can not be too small. Indeed, the condition can be relaxed in a way that $c_{1}$ is allowed to tend to 0 at a slow rate. Specifically, if we assume $c_{1} = o (n_{k}^{- τ})$ for some $0 < τ < 1 / 2 - η$ , then the probability in (Equation10(10) $P (| {\hat{IPC}}_{n_{k}} (X_{k}, Y_{k}) - IPC (X_{k}, Y_{k}) | > ϵ) \leq C_{1} n_{k} R_{k} \exp (\frac{- C_{2} n_{k} ϵ^{2}}{R_{k}^{2}}),$ (10) ) will still converge to zero, but the convergence rate will be relatively slower. Note that Theorem 2.4 is a special case of Theorem 3.2 when $η = 0$ , i.e., $R_{k}$ is fixed, and the condition on $p_{r, k}$ is automatically satisfied.

3.2. Extension of the IPC test

A natural application of Theorem 3.1 is to extend the IPC test to test independence between two continuous variables via the slicing technique. Consider two continuous random variables X and Z. Without loss of generality, we assume that the supports of X and Z are $R$ . We define a partition of the support of Z with a given positive integer R: (11) $S = {[q_{r - 1}, q_{r}) : q_{r - 1} < q_{r}, r = 1, \dots, R},$ (11) where $q_{0} = - \infty$ , $q_{R} = \infty$ . Each interval $[q_{r - 1}, q_{r})$ is called a slice in the literature (Mai & Zou, Citation2015b; Yan et al., Citation2018). And a new random variable can be accordingly defined as $Y^{S} = r$ if and only if $q_{r - 1} \leq Z < q_{r}$ for $r = 1, \dots, R$ . The IPC test can be applied to test independence between X and $Y^{S}$ . If the distribution of Z is known, we suggest a uniform slicing to partition Z such that $q_{r} = F_{Z}^{- 1} (r / R)$ for $r = 1, \dots, R$ , where $F_{Z} (z)$ is the cumulative distribution function of Z. However, in practice, $F_{Z} (z)$ is usually unknown. But given observations ${(X_{i}, Z_{i}), i = 1, \dots, n}$ with sample size n, we can use ${\hat{q}}_{r} = {\hat{F}}_{Z}^{- 1} (r / R)$ to estimate $q_{r}$ for $r = 1, \dots, R$ , where ${\hat{F}}_{Z} (z)$ is the empirical distribution of Z. And $\hat{S} = {[{\hat{q}}_{r - 1}, {\hat{q}}_{r}), r = 1, \dots, R}$ is regarded as an intuitive uniform slicing scheme (Yan et al., Citation2018). We also define $Y_{i}^{\hat{S}} = r$ if and only if $Z_{i} \in [{\hat{q}}_{r - 1}, {\hat{q}}_{r})$ for $r = 1, \dots, R$ , $i = 1, \dots, n$ . Now, we compute ${\hat{IPC}}_{n} (X, Y^{\hat{S}})$ as ${\hat{IPC}}_{n} (X, Y^{\hat{S}}) := \sum_{r = 1}^{R} {\tilde{p}}_{r} \int \frac{{[F_{n} (x) - {\tilde{F}}_{r n} (x)]}^{2}}{F_{n} (x) (1 - F_{n} (x))} d F_{n} (x),$ where ${\tilde{p}}_{r} = \frac{1}{n} \sum_{i = 1}^{n} I (Y_{i}^{\hat{S}} = r) = 1 / R$ , and ${\tilde{F}}_{r n} (x) = \frac{1}{n} \sum_{i = 1}^{n} I (X_{i} \leq x, Y_{i}^{\hat{S}} = r) / {\tilde{p}}_{r}$ is the empirical conditional distribution of X based on the subjects for which ${\hat{q}}_{r - 1} \leq Z_{i} < {\hat{q}}_{r}$ . We reject hypothesis $H_{0} : X a n d Z a r e i n d e p e n d e n t$ , if $(n {\hat{IPC}}_{n} (X, Y^{\hat{S}}) - R + 1) / \sqrt{2 (π^{2} / 3 - 3) (R - 1)} \geq Φ^{- 1} (1 - α)$ for some given significance value $α \in (0, 1)$ , where $Φ (x)$ is the standard normal distribution function.

Obviously, it is important to choose an appropriate R for testing independence. If R is too large, then the sample size in each slice is too small, making the estimate of the IPC index inaccurate. And if R is too small, then much information of Z may be lost, making the test power poor. In the slicing literature (Mai & Zou, Citation2015b; Yan et al., Citation2018; Zhong et al., Citation2021), a common choice is to set $R = ⌊ \log n ⌋$ , where $⌊ x ⌋$ is the integer part of x. And according to Theorem 3.1, we can also choose $R < ⌊ n^{1 / 4} ⌋$ . In practice, we recommend choosing $R = ⌊ n / k ⌋$ for some $20 \leq k \leq 50$ , so that the sample size in each slice is about 20 to 50.

3.3. Comparison with the MV test

In this subsection, we would like to discuss the advantages of the IPC test compared to the MV test. As explained in Cui and Zhong (Citation2019), the MV index can be considered as the weighted average of Cram $\overset{´}{e}$ r-von Mises distances between $F_{r} (x)$ , the conditional distribution of X given Y = r, and $F (x)$ , the unconditional distribution function of X. Note that the IPC index can be viewed as a modification of the MV index by adding a weight function ${F (x) (1 - F (x))^{- 1}}$ . Such weight function is large for $F (x)$ near 0 and 1, and smaller near $F (x) = 1 / 2$ . Hence, the IPC test emphasizes more on the difference between $F_{r} (x)$ and $F (x)$ near the tail of $F (x)$ . As it is known, $F_{r} (x) - F (x) = \sum_{j = 1}^{R} p_{j} (F_{r} (x) - F_{j} (x))$ . Accordingly, the IPC test is more sensitive to tail differences among the conditional distributions. In the following, we consider the test of independence between a continuous random variable and a categorical variable with a relatively large number of classes (i.e., R is large) and the test of independence for two continuous random variables, and further illustrate the IPC test's sensitivity to differences in the tails of the conditional distributions through numerical simulations.

1. When R is large or is allowed to diverge. In this case, we recommend using a normal distribution to approximate the IPC test's null distribution due to Theorem 3.1. It is not surprising that given a large R, IPC test still retains sensitivity to tail differences when using a normal distribution instead of $V (R)$ to calculate p-value. The following example is used to illustrate this issue.

Let $Y \in {1, \dots, 20}$ with $P (Y = r) = 1 / 20$ , for $r = 1, \dots, 20$ . When Y = r, generate $X \sim B W + (1 - B) V_{r}$ , where $B \sim B i n o m i a l (1, p)$ , W and $V_{r}$ are independent, $W = N (0, 1)$ and $V_{r} = N (10 + r, 1)$ . To intuitively gain some understanding of our simulation setting, set p = 0.8. We draw the conditional distributions of X given Y = 1 and Y = 5, respectively in Figure . It is easy to see that the conditional distributions differ from each other only at their right tails. We choose the sample size n = 400, and p = 0.7, 0.75, 0.8, 0.85, 0.9. We apply the IPC test and the MV test, and compute the p-values for these two tests by using their approximated normal distributions. The empirical powers of these two tests based on 500 replicates at the significance level $α = 0.05$ are presented in Table . To further validate the robustness of the IPC test against heavy-tails, we further consider $W \sim t (1)$ in the above setting. The empirical powers are also shown in Table . A larger p indicates that the differences among the conditional distributions occur in a more extreme right tail end, and thus are more difficult to detect the dependence between X and Y. We can see from Table that the IPC test is significantly more powerful than the MV test when p<0.9. When p = 0.9, neither the IPC nor the MV has sufficient statistical power to detect the dependence between X and Y. The simulation validates that the IPC test has a better power to tail differences among the conditional distributions. In Example 4.1 we will compare with other existing methods to further validate the IPC test's sensitivity towards tail differences.

Figure 2. Panel (a) shows the pair of conditional distributions. The blue solid line represents the conditional distribution of X given Y = 1, that is, $B N (0, 1) + (1 - B) N (11, 1)$ where $B \sim B i n o m i a l (1, 0.8)$ ; and the red dot-dash line represents the conditional distribution of X given Y = 5, that is, $B N (0, 1) + (1 - B) N (15, 1)$ where $B \sim B i n o m i a l (1, 0.8)$ . Panel (b) shows the corresponding conditional density functions.

Table 3. Test of independence between a continuous variable and a categorical variable with R = 20 classes.

Display Table

2. Testing independence between continuous random variables. We follow the notation in Section 3.2. Let X and Z be two continuous random variables. It is natural to expect that the IPC test will be more powerful than the MV test to detect the tail differences among the conditional distribution of X given Z. Consider a straightforward extension of the IPC index in (Equation5(5) $IPC (X, Y) = \sum_{r = 1}^{R} p_{r} \int_{R_{X}} \frac{{[F (x) - F_{r} (x)]}^{2}}{F (x) (1 - F (x))} d F (x) .$ (5) ) and define the following index between X and Z: (12) $IPC (X, Z) = \int \int \frac{{[F (x ∣ Z = z) - F (x)]}^{2}}{F (x) (1 - F (x))} d F (x) d F_{Z} (z),$ (12) where $F (\cdot ∣ Z = z)$ is the conditional distribution of X given Z = z, and $F (x)$ and $F_{Z} (z)$ are the distributions of X and Z, respectively. Given a positive integer R and a corresponding uniform slicing scheme $S$ defined as in (Equation11(11) $S = {[q_{r - 1}, q_{r}) : q_{r - 1} < q_{r}, r = 1, \dots, R},$ (11) ) with $q_{r} = F_{Z}^{- 1} (r / R)$ for $r = 1, \dots, R$ , recall that $Y^{S} = r$ if and only if $q_{r - 1} \leq Z < q_{r}$ . Under certain mild conditions, Ma et al. (Citation2022) has shown that $IPC (X, Y^{S}) \to IPC (X, Z)$ , as $R \to \infty$ .

From (Equation12(12) $IPC (X, Z) = \int \int \frac{{[F (x ∣ Z = z) - F (x)]}^{2}}{F (x) (1 - F (x))} d F (x) d F_{Z} (z),$ (12) ), again, we have some insights that the IPC test of independence emphasizes more on the difference between $F (x ∣ Z = z)$ and $F (x)$ near the tail of $F (x)$ . We use a toy sample to further illustrate this issue. Generate $Z \sim U n i f (4, 6)$ , and generate $X = B W + 5 (1 - B) Z$ , where $B \sim B i n o m i a l (1, p)$ . We still consider two settings of W: (i) $W \sim N (0, 1)$ and (ii) $W \sim t (1)$ . Choose the sample size n = 400, and p = 0.7, 0.75, 0.8, 0.85, 0.9. We follow the step in Section 3.2 and choose R = 20 to conduct the test of independence. Table presents the empirical powers of IPC and MV tests based on 500 replicates at the significance level $α = 0.05$ . IPC test outperforms the MV test in these settings. Note that when p = 0.8, the MV test is almost invalid. However, the IPC test still has a reasonably acceptable power.

Table 4. Test of independence between two continuous random variables.

Display Table

4. Numerical studies and data application

4.1. Numerical studies

In this section, we assess the finite-sample performance of the IPC test by comparing with some powerful methods proposed in recent years: the MV test (Cui & Zhong, Citation2019), the distance correlation (DC) test (Székely et al., Citation2007), the HHG test (Heller et al., Citation2012, Citation2016) and the Hilbert-Schmidt independence criterion (HSIC) test (Gretton et al., Citation2005, Citation2007; Pfister et al., Citation2018). The R packages energy, HHG, and dHSIC are used to implement the DC test, the HHG test and the HSIC test, respectively. Note that the DC test can not be directly applied to a categorical variable, so in our simulations we will transfer a categorical variable with R categories into a random vector with R−1 binary dummy variables and apply dcov.test to this dummy vector instead of the original data. For the DC, HHG, and HSIC tests, the permutation test with K = 200 is used to calculate the p-value.

Example 4.1

In this example, we evaluate the performance of IPC test for the large-R case. Let R = 15, and we consider the following two cases.

Model 1.1. Generate $Y \in {1, \dots, 15}$ with equal probabilities. And let $μ = (μ_{1}, \dots, μ_{15})$ , where $μ_{5 j + l} = l + 1$ for $1 \leq l \leq 3$ , and $μ_{5 j + l} = l + 2$ for l = 4, 5, j = 0, 1, 2. For Y = r, generate $X = B U + (1 - B) (V_{μ_{r}} + 20)$ , where $B \sim B i n o m i a l (1, p)$ , $U \sim U n i f (- 20, 20)$ , $V_{μ_{r}} \sim B e t a (3, μ_{r})$ .

Model 1.2. Generate $Y \sim U n i f (0, 4)$ . And let $X \sim B U + (1 - B) W$ , where $W \sim U n i f (\cos (Y π) + 21, \cos (Y π) + 24)$ . B, U are the same as in Model 1.1.

Let n = 400. In Model 1.2, we uniformly slice Y into a categorical variable with R = 15 classes in order to apply the IPC and MV tests. Let p vary from 0 to 1 in both two models. We compute the p-value for the IPC test by using the asymptotic distribution in Theorem 3.1. The empirical power of each test based on 500 simulations at the significance level $α = 0.05$ is shown in Figure . Note that, when p = 1, X is independent with Y in both models. We deliberately report the results, i.e., the type I error rates of each test, in Table . The type I error rates of the IPC test (and other tests) are close to the nominal significance level $α = 0.05$ , which further supports Theorem 3.1. Figure clearly shows that the IPC test outperforms other competitors. And the power differences between IPC test and MV test exceed 0.25 when p = 0.6 for both models.

Figure 3. Comparison of powers of several tests of independence against different p in Example 4.1. In each case, 500 simulations are used to estimate the power. (a) Model 1.1 and (b) Model 1.2.

Table 5. Empirical type I error rates at the significance level $α = 0.05$ in Example 4.1.

Display Table

Looking further into the models considered in this example. In both Model 1.1 and Model 1.2, the conditional distributions of X given Y differ from each other only in their right tails when p>0.5. A larger p indicates that the conditional distribution functions differ from each other in a more extreme tail end. And when p = 1, X and Y are independent. Thus it could be more difficult to detect the dependence between X and Y for a larger p<1. As a result, we can see from Figure that the power of each test decreases with the growth of p. Among the tests considered, the DC test and the HSIC test perform the worst in both models. Their powers rapidly decrease to near 0 when p increases to 0.4. It can be seen that the IPC test and the MV test have a better performance compared to other tests. Furthermore, the IPC test has a significant higher power than the MV test when p is between 0.6 and 0.8 in both models. This further supports our observation in Section 3.3 that the IPC test is more sensitive to tail differences.

Example 4.2

This example considers a Poisson regression model. Let $Z \sim P o i s s o n (u)$ , where $u = \exp (0.8 X_{1} - 0.8 X_{2} + \log 4)$ , $(X_{1}, X_{2}) \sim N ((0, 1)^{⊤}, Σ)$ , $Σ = ({0.5}^{| i - j |})_{1 \leq i, j \leq 2}$ . Let Y = Z if $Y \leq 8$ ; otherwise Y = 9. As a consequence, Y is a 10-categories variable. Consider $n = 100, 150, \dots, 300$ . We apply the testing methods to test independence between Y and $X_{1}$ , Y and $X_{2}$ , respectively. And the asymptotic normal distribution in Theorem 3.1 is used to compute p-value for the IPC test. The empirical powers of each test based on 500 replications are summarized in Table . The IPC test has most excellent power performances in all settings. The HHG test and the HSIC test perform poorly when the sample size $n \leq 150$ .

Table 6. Empirical powers of each test at the significance level $α = 0.05$ against the sample sizes in Example 4.2.

Display Table

The power of the IPC test is only slightly higher than that of the MV test. However, it is significantly higher than that of HHG and HSIC. The DC test has moderate performance, inferior to the MV test, but better than HSIC.

Example 4.3

In this example, we evaluate the power of the IPC test in testing independence between continuous variables. Simulations are carried out with sample size n = 400. We choose R = 15 to implement the IPC test. Generating $Z \sim U n i f (- 2, 2)$ , the following alternatives are considered.

Linear: $X = Z / 2 + 12 γ ϵ$ , where γ is a noise parameter ranging from 0 to 1, and $ϵ \sim U n i f (- 2, 2)$ is independent of Z.
Quadratic: $X = (\frac{1}{2} Z)^{2} + 4.5 γ ϵ$ .
Step function: $X = f (Z) + 25 γ ϵ$ , where f takes value 2 in interval $[- 2, - 1) \cup [0, 1)$ and value $- 2$ in $[- 1, 0) \cup [1, 2]$ .
W-shaped: $X = | Z + 1 | I (Z < 0) + | Z - 1 | I (Z \geq 0) + 4 γ ϵ$ .
Sinusoid: $X = \cos (4 π Z) + 5 γ ϵ$ .
Ellipse: $X = \sqrt{1 - (Z / 2)^{2}} + 1.5 γ ϵ$ .

To conduct the IPC test and the MV test, we uniformly slice Z into a categorical variable Y with R = 15 classes. The choices of the coefficients in all of the above are to make sure that a full range of powers can be observed when γ varies from 0 to 1. In addition to the test methods mentioned before, in this example, we further consider a comparison with a new test, the modified Blum-Kiefer-Rosenblatt (MBKR) test (Zhou & Zhu, Citation2018) which is applied for testing independence between continuous variables. Figure presents the empirical power of each test based on 500 simulations at the significance level $α = 0.05$ . We see from the figure that the IPC test performs quite excellent when the relationship has an oscillatory nature (the W-shaped and the sinusoid). It is also better than other competitors for the step function, and comparably well to the MBKR test for the quadratic function. However, the IPC test has poor performance compared to other tests for some smooth alternatives: the linear and the ellipse. For the linear function, the MBKR test has the highest performance. IPC test has comparable performance to HSIC. For the ellipse function, HHG test has the highest power and DC test performs the poorest. The performance of the IPC test, on the other hand, is moderate.

Figure 4. Comparison of powers of several tests of independence in Example 4.3. The noise level increases from left to right. In each case, 500 simulations are used to estimate the power of each test. (a) Linear. (b) Quadratic. (c) Step function. (d) W-shaped. (e) Sinusoid and (f) Ellipse.

We give an intuitive explanation here for the excellent performance of the IPC test in detecting oscillatory relationships. Denote $X ∣ Y = r$ as the random variable which follows the conditional distribution of X given Y = r. By simple calculation, we find that if X and Z have an oscillatory relationship, then the variances of $X ∣ Y = r$ differ from each other more significantly. As a comparison, if X and Z have a linear relationship, then $V a r {X ∣ Y = 1} = \dots = V a r {X ∣ Y = 15}$ . Consequently, the IPC test has a higher test power when there is an oscillatory relationship between X and Z.

4.2. Real data application

Example 4.4

We consider a data set from AIDS Clinical Trials Group Protocol 175 (ACTG175), which is available from the R package speff2trial. Many researchers have studied this data set, such as Tsiatis et al. (Citation2008), Zhang et al. (Citation2008), Lu et al. (Citation2013) and Zhou et al. (Citation2020). The data set contains 2139 HIV-infected subjects. And all the subjects were randomized to four different treatment groups with equal probability: zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, and ddI monotherapy. In addition to the treatment indicators indicating which group each subject was assigned to, the data contains many other important variables, such as the CD4 count at $20 \pm 5$ weeks post-baseline (CD420), the CD4 count at baseline (CD40), the history of intravenous drug use, et al.

In this study, in order to get more elaborated results, we only consider the subjects under ZDV+zalcitabine groups (524 subjects) in the following analysis. The goal of our study is to check whether the treatment effect under ZDV + zalcitabine groups is dependent on some other covariates. Following Hammer et al. (Citation1996) and Tsiatis et al. (Citation2008), we use the change from baseline to $20 \pm 5$ weeks in CD4 cell count, i.e., CD420−CD40, to measure the treatment effect. And the covariates of interest are listed below: history of intravenous drug use ( $0 =$ no, $1 =$ yes), gender ( $0 =$ female, $1 =$ male), antiretroviral history ( $0 =$ naive, $1 =$ experienced), age, and CD8 count at baseline (CD80). Thus the first three covariates are categorical, and the last two are continuous covariates. Let $X = C D 420 - C D 40$ , and then there are 5 candidates Y. The null hypotheses are listed as follows.

$H_{0}^{1}$ : X is independent of Y with $Y =$ history of intravenous drug use;
$H_{0}^{2}$ : X is independent of Y with $Y =$ gender;
$H_{0}^{3}$ : X is independent of Y with $Y =$ antiretroviral history;
$H_{0}^{4}$ : X is independent of Y with $Y =$ age;
$H_{0}^{5}$ : X is independent of Y with $Y =$ CD8 count at baseline.

We apply the IPC, MV, DC, HHG and HSIC tests to these five hypotheses. The permutation test with K = 1000 permutated times is used for DC, HHG and HSIC tests to compute the p-values. And for $H_{0}^{4}$ and $H_{0}^{5}$ , we follow the approach in Section 3.2 to slice Y into a categorical variable with 15 classes to implement the IPC test and MV test. Table summarizes the p-values of each test. If we only consider the significance level $α = 0.05$ , then we observe that all the tests reject $H_{0}^{3}$ , $H_{0}^{4}$ and $H_{0}^{5}$ , and accept $H_{0}^{2}$ . That is, the treatment effect under the ZDV+zalcitabine group depends on antiretroviral history, age and CD80, but not on gender. Regarding the history of intravenous drug use, the IPC, DC, HHG and HSIC tests declare statistical dependence between this and the treatment effect. However, the MV test has a p-value larger than 0.05, and thus it can not reject $H_{0}^{1}$ . We draw the empirical conditional distributions of X given Y = 0 and 1 as well as the side-by-side boxplots in Figure , where $Y = h i s t o r y o f i n t r a v e n o u s d r u g u s e$ . We see that the conditional distributions of X are different across different Y. However, the difference is relatively small and mainly occurs in the right tails. According to the discussion in Section 3.3, IPC test will be more powerful in such case. Also, the categories of Y are very unbalanced with $# {Y = 0} = 448$ and $# {Y = 1} = 76$ , making the MV test more difficult to detect the dependence between X and Y.

Figure 5. The left panel shows the empirical conditional distributions of $C D 420 - C D 40$ given Y = 0 and Y = 1. And the right panel shows the side-by-side boxplots of $C D 420 - C D 40$ against Y = 0 and Y = 1. Here $Y =$ history of intravenous drug use.

Table 7. The p-values of each test in Example 4.4.

Download CSV Display Table

5. Discussion

In this paper, we studied the IPC test of independence between a continuous variable X and a categorical variable Y. When the number of categories of Y is fixed, the IPC test statistic is in essence the k-sample Anderson Darling test statistic, and its theoretical properties were studied in Scholz and Stephens (Citation1987). Our work mainly focused on two aspects. First, we derived the convergence rate of the IPC statistic to the IPC index and thus a lower bound of the power of the test at a given significance level with a finite sample size could be derived. Second, we showed that the standardized test statistic has an asymptotic normal distribution when the number of categories R diverges to infinity with the sample size. A distinguished merit is thereby shared by the IPC test, that is, its critical values can be easily obtained by using an approximated normal distribution when R is relatively large. As an application, we extended the IPC test to test independence between two continuous random variables. We uniformly slice a continuous variable into a discrete variable in order to apply the IPC test. And by allowing more slices as the sample size increases, the IPC test is allowed to gain more test power. The proposed test was compared to the DC test, HHG test, HSIC test and MV test on many simulation experiments. The results showed that the IPC test has a better performance in many scenarios. It is also possible to consider more different slicing schemes for independence testing of continuous variables. We left it for further research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by National Natural Science Foundation of China [Grant numbers 12271286, 11931001 and 11771241].

References

Csörgö, S. (1985). Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis, 16(3), 290–299. https://doi.org/10.1016/0047-259X(85)90022-3
Web of Science ®Google Scholar
Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. https://doi.org/10.1080/01621459.2014.920256
PubMed Web of Science ®Google Scholar
Cui, H., & Zhong, W. (2018). A distribution-free test of independence and its application to variable selection. Available at arXiv:1801.10559.
Google Scholar
Cui, H., & Zhong, W. (2019). A distribution-free test of independence based on mean variance index. Computational Statistics & Data Analysis, 139, 117–133. https://doi.org/10.1016/j.csda.2019.05.004
Web of Science ®Google Scholar
Dvoretzky, A., Kiefer, J., & Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27(3), 642–669. https://doi.org/10.1214/aoms/1177728174
Google Scholar
Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005). Measuring statistical dependence with hilbert-schmidt norms. In S. Jain, H. U. Simon, & E. Tomita (Eds.), Algorithmic learning theory (pp. 63–77). Springer Berlin Heidelberg.
Google Scholar
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., & Smola, A. J. (2007). A kernel statistical test of independence. In Proceedings of the 20th International Conference on Neural Information Processing Systems (pp 585–592). Curran Associates Inc. NIPS'07.
Google Scholar
Hall, P., & Heyde, C. C (1980). Martingale limit theory and its application, Probability and mathematical statistics, Inc, Academic Press [Harcourt Brace Jovanovich, Publishers].
Google Scholar
Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., Hirsch, M. S., & Merigan, T. C. (1996). A trial comparing nucleoside monotherapy with combination therapy in hiv-infected adults with cd4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine, 335(15), 1081–1090. https://doi.org/10.1056/NEJM199610103351501
PubMed Web of Science ®Google Scholar
He, S., Ma, S., & Xu, W. (2019). A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Computational Statistics & Data Analysis, 137, 155–169. https://doi.org/10.1016/j.csda.2019.02.003
Web of Science ®Google Scholar
Heller, R., Heller, Y., & Gorfine, M. (2012). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2), 503–510. https://doi.org/10.1093/biomet/ass070
Web of Science ®Google Scholar
Heller, R., Heller, Y., Kaufman, S., Brill, B., & Gorfine, M. (2016). Consistent distribution-free k-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29), 1–54.
Google Scholar
Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4), 546–557. https://doi.org/10.1214/aoms/1177730150
Google Scholar
Jiang, B., Ye, C., & Liu, J. S. (2015). Nonparametric k-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510), 642–653. https://doi.org/10.1080/01621459.2014.920257
Web of Science ®Google Scholar
Lu, W., Zhang, H. H., & Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5), 493–504. https://doi.org/10.1177/0962280211428383
PubMed Web of Science ®Google Scholar
Ma, W., Xiao, J., Yang, Y., & Ye, F. (2022). Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index. Journal of Statistical Computation and Simulation, 92(15), 3222–3248. https://doi.org/10.1080/00949655.2022.2062358
Web of Science ®Google Scholar
Mai, Q., & Zou, H. (2015a). Sparse semiparametric discriminant analysis. Journal of Multivariate Analysis, 135, 175–188. https://doi.org/10.1016/j.jmva.2014.12.009
Web of Science ®Google Scholar
Mai, Q., & Zou, H. (2015b). The fused Kolmogorov filter: A nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. https://doi.org/10.1214/14-AOS1303
Web of Science ®Google Scholar
Ni, L., & Fang, F. (2016). Entropy-based model-free feature screening for ultrahigh-dimensional multiclass classification. Journal of Nonparametric Statistics, 28(3), 515–530. https://doi.org/10.1080/10485252.2016.1167206
Web of Science ®Google Scholar
Ni, L., Fang, F., & Shao, J. (2020). Feature screening for ultrahigh dimensional categorical data with covariates missing at random. Computational Statistics & Data Analysis, 142, Article 106824. https://doi.org/10.1016/j.csda.2019.106824
Web of Science ®Google Scholar
Ni, L., Fang, F., & Wan, F. (2017). Adjusted Pearson chi-square feature screening for multi-classification with ultrahigh dimensional data. Metrika, 80(6–8), 805–828. https://doi.org/10.1007/s00184-017-0629-9
Web of Science ®Google Scholar
Pfister, N., Bühlmann, P., Schölkopf, B., & Peters, J. (2018). Kernel-based tests for joint independence. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 80(1), 5–31. https://doi.org/10.1111/rssb.12235
Web of Science ®Google Scholar
Rosenblatt, M. (1975). A quadratic measure of deviation of two-dimensional density estimates and a test of independence. The Annals of Statistics, 3(1), 1–14. https://doi.org/10.1214/aos/1176342996
Web of Science ®Google Scholar
Scholz, F.-W., & Stephens, M. A. (1987). k-sample Anderson–Darling tests. Journal of the American Statistical Association, 82(399), 918–924. https://doi.org/10.2307/2288805
Web of Science ®Google Scholar
Székely, G. J., & Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, 3(4), 1236–1265. https://doi.org/10.1214/09-AOAS312
Web of Science ®Google Scholar
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505
Web of Science ®Google Scholar
Tsiatis, A. A., Davidian, M., Zhang, M., & Lu, X. (2008). Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine, 27(23), 4658–4677. https://doi.org/10.1002/sim.3113
PubMed Web of Science ®Google Scholar
Xu, K., Shen, Z., Huang, X., & Cheng, Q. (2020). Projection correlation between scalar and vector variables and its use in feature screening with multi-response data. Journal of Statistical Computation and Simulation, 90(11), 1923–1942. https://doi.org/10.1080/00949655.2020.1753057
Web of Science ®Google Scholar
Yan, X., Tang, N., Xie, J., Ding, X., & Wang, Z. (2018). Fused mean-variance filter for feature screening. Computational Statistics & Data Analysis, 122, 18–32. https://doi.org/10.1016/j.csda.2017.10.008
Web of Science ®Google Scholar
Zhang, M., Tsiatis, A. A., & Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64(3), 707–715. https://doi.org/10.1111/j.1541-0420.2007.00976.x
PubMed Web of Science ®Google Scholar
Zhang, Y., Chen, C., & Zhu, L. (2022). Sliced independence test. Statistica Sinica, 32(Special onlline issue), 2477–2496. https://doi.org/10.5705/ss.202021.0203
Google Scholar
Zhong, W., Wang, J., & Chen, X. (2021). Censored mean variance sure independence screening for ultrahigh dimensional survival data. Computational Statistics & Data Analysis, 159, Article 107206. https://doi.org/10.1016/j.csda.2021.107206
Web of Science ®Google Scholar
Zhou, N., Guo, X., & Zhu, L. (2020). A projection-based model checking for heterogeneous treatment effect. Available at arXiv:2009.10900.
Google Scholar
Zhou, Y., & Zhu, L. (2018). Model-free feature screening for ultrahigh dimensional datathrough a modified Blum-Kiefer-Rosenblatt correlation. Statistica Sinica, 28(3), 1351–1370. https://doi.org/10.5705/ss.202016.0264
Web of Science ®Google Scholar
Zhu, L., Xu, K., Li, R., & Zhong, W. (2017). Projection correlation between two random vectors. Biometrika, 104(4), 829–843. https://doi.org/10.1093/biomet/asx043
PubMed Web of Science ®Google Scholar

Appendix

Proof of theorems

This appendix contains the technical proofs of Lemma 2.2 and Theorem 3.1. Lemma 2.1 and Theorem 2.4 are direct corollaries of Theorem 3.2, and the proof of Theorem 3.2 follows from Lemma 4 in Ma et al. (Citation2022), and thus their proofs are omitted.

A.1. Notations and preliminaries

Recall that the IPC index of

(X, Y)

, where X is a continuous random variable with support

R_{X}

and

Y \in {1, \dots, R}

is a categorical variable with R categories is defined as

\begin{aligned} IPC (X, Y) & = \sum_{r = 1}^{R} p_{r} \int \frac{{[F (x) - F_{r} (x)]}^{2}}{F (x) (1 - F (x))} d F (x) \\ = \sum_{r = 1}^{R} \int \frac{{[p_{r} F (x) - F (x, r)]}^{2}}{F (x) \bar{F} (x) p_{r}} d F (x), \end{aligned}

where

F (x)

is the distribution function of X,

F_{r} (x) = P (X \leq x ∣ Y = r)

\bar{F} (x) = 1 - F (x)

p_{r} = P (Y = r)

and

F (x, r) = P (X \leq x, Y = r)

r = 1, \dots, R

. And given i.i.d. samples

Z_{i} = (X_{i}, Y_{i})

for

i = 1, \dots, n

, the IPC statistic is defined as

\begin{aligned} {\hat{IPC}}_{n} (X, Y) & = \sum_{r = 1}^{R} {\hat{p}}_{r} \int \frac{{(F_{n} (x) - F_{r n} (x))}^{2}}{F_{n} (x) {\bar{F}}_{n} (x)} d F_{n} (x) \\ = \sum_{r = 1}^{R} \int \frac{{({\hat{p}}_{r} F_{n} (x) - F_{n} (x, r))}^{2}}{F_{n} (x) {\bar{F}}_{n} (x) {\hat{p}}_{r}} d F_{n} (x) \\ = \frac{1}{n} \sum_{r = 1}^{R} \sum_{i = 1}^{n} \frac{{({\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r))}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) {\hat{p}}_{r}}, \end{aligned}

where

F_{n} (x) = \frac{1}{n} \sum_{i = 1}^{n} I (X_{i} \leq x)

{\bar{F}}_{n} (x) = 1 - F_{n} (x)

{\hat{p}}_{r} = \frac{1}{n} \sum_{i = 1}^{n} I (Y_{i} = r)

F_{n} (x, r) = \frac{1}{n} \sum_{i = 1}^{n} I (X_{i} \leq x, Y_{i} = r)

, and

F_{r n} (x) = F_{n} (x, r) / {\hat{p}}_{r}

for

r = 1, \dots, R

We first provide a proof of Lemma 2.2.

Proof of Lemma 2.2.

It is obvious that $IPC (X, Y) = 0$ if and only if X and Y are independent. By noticing that $\sum_{r = 1}^{R} p_{r} = 1$ and $\sum_{r = 1}^{R} F (x, r) = F (x)$ , we have $\begin{aligned} \frac{1}{F (x) \bar{F} (x)} \sum_{r = 1}^{R} \frac{{(p_{r} F (x) - F (x, r))}^{2}}{p_{r}} & = \frac{1}{F (x) \bar{F} (x)} (\sum_{r = 1}^{R} \frac{F^{2} (x, r)}{p_{r}} - F^{2} (x)) \\ < \frac{1}{F (x) \bar{F} (x)} (\sum_{r = 1}^{R} \frac{F (x, r) p_{r}}{p_{r}} - F^{2} (x)) \\ = \frac{1}{F (x) \bar{F} (x)} (\sum_{r = 1}^{R} F (x, r) - F^{2} (x)) \\ = 1. \end{aligned}$ Hence we have $IPC (X, Y) < 1$ .

Next, we give some preparations for the proof of Theorem 3.1. For given constant C>0, let $F^{n, C} (x) = F (x) \lor n^{- \frac{1}{2 + C}}$ , ${\bar{F}}^{n, C} (x) = \bar{F} (x) \lor n^{- \frac{1}{2 + C}}$ , $F_{n}^{C} (x) = F_{n} (x) \lor n^{- \frac{1}{2 + C}}$ and ${\bar{F}}_{n}^{C} (x) = {\bar{F}}_{n} (x) \lor n^{- \frac{1}{2 + C}}$ . Then we have the following lemmas.

Lemma A.1

Let $Δ_{1} F (x) = F^{n, C} (x) - F_{n}^{C} (x)$ and $Δ_{2} F = {\bar{F}}^{n, C} (x) - {\bar{F}}_{n}^{C} (x)$ . Then $sup_{x \in R} | Δ_{1} F (x) | = O_{p} (n^{- 1 / 2}), a n d sup_{x \in R} | Δ_{2} F (x) | = O_{p} (n^{- 1 / 2}) .$

Proof.

It is easy to show that $| F^{n, C} (x) - F_{n}^{C} (x) | \leq | F (x) - F_{n} (x) | .$

Hence by Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (Dvoretzky et al., Citation1956), $sup_{x} | Δ_{1} F (x) | \leq sup_{x} | F (x) - F_{n} (x) | = O_{p} (n^{- 1 / 2}) .$ Similarly, we have $sup_{x} | Δ_{2} F (x) | = O_{p} (n^{- 1 / 2})$ .

Lemma A.2

$sup_{x} | \frac{F^{n, C} (x) {\bar{F}}^{n, C} (x) - F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x)}{F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x)} | = O_{p} (n^{- \frac{C}{4 + 2 C}}) = o_{p} (1)$ .

Proof.

Note that $\begin{aligned} F^{n, C} (x) {\bar{F}}^{n, C} (x) & = (F_{n}^{C} (x) + Δ_{1} F (x)) ({\bar{F}}_{n}^{C} (x) + Δ_{2} F (x)) \\ = F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x) + {\bar{F}}_{n}^{C} (x) Δ_{1} F (x) + F_{n}^{C} (x) Δ_{2} F (x) + Δ_{1} F (x) Δ_{2} F (x) . \end{aligned}$ Then, $\begin{aligned} sup_{x} | \frac{F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x) - F^{n, C} (x) {\bar{F}}^{n, C} (x)}{F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x)} | & \leq sup_{x} | \frac{Δ_{1} F (x)}{F_{n}^{C} (x)} | + sup_{x} | \frac{Δ_{2} F (x)}{{\bar{F}}_{n}^{C} (x)} | + sup_{x} | \frac{Δ_{1} F (x) Δ_{2} F (x)}{F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x)} | \\ = O_{p} (n^{- 1 / 2 + \frac{1}{2 + C}}) + O_{p} (n^{- 1 / 2 + \frac{1}{2 + C}}) + O_{p} (n^{- 1 + \frac{1}{2 + C}}) \\ = O_{p} (n^{- \frac{C}{4 + 2 C}}) . \end{aligned}$

A.2. Proof of Theorem 3.1

To avoid any ambiguity, Theorem 3.1 considers a sequence of problems indexed by $(n_{k}, R_{k}, p_{1, k}, \dots, p_{R_{k}, k})$ , $k = 1, 2, \dots,$ where the sample size $n_{k} \to \infty$ , the number of categories $R_{k} \to \infty$ , and let $Y_{k} = Y (R_{k})$ denote the categorical variable with $R_{k}$ categories and $p_{r, k} = P (Y (R_{k}) = r)$ , $r = 1, \dots, R_{k}$ . From now on, we shall omit the subscript unless specifically mentioned. Moreover, in Section A.2, we should keep in mind that X and Y are independent.

A.2.1. Architecture of the proof

Our aim here is to provide a general overview of the proof of Theorem 3.1. At a high level, the general structure is fairly simple. And to make the structure clear, we divide the proof into three parts.

First, given a positive constant C, we substitute $F^{n, C} (x)$ , ${\bar{F}}^{n, C}$ and $p_{r}$ for $F_{n} (x)$ , ${\bar{F}}_{n} (x)$ and ${\hat{p}}_{r}$ in the denominator of the IPC statistic, and thereby obtain ${\hat{IPC}}_{n, C} (X, Y) := \sum_{r = 1}^{R} \int \frac{1}{p_{r}} \frac{{[{\hat{p}}_{r} F_{n} (x) - F_{n} (x, r)]}^{2}}{F^{n, C} (x) {\bar{F}}^{n, C} (x)} d F_{n} (x) .$ And then prove that the difference between $n {\hat{IPC}}_{n} (X, Y) / \sqrt{R}$ and $n {\hat{IPC}}_{n, C} (X, Y) / \sqrt{R}$ is bounded by $n {\hat{IPC}}_{n, C} (X, Y) / \sqrt{R} \times O_{p} (n^{- \frac{C}{4 + 2 C}} + \frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} n^{- 1 / 2}) + O_{p} (n^{- \frac{1}{2 + C}} \sqrt{R})$ , provided that $\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} = o (n^{1 / 2})$ .
Fixing C = 6, let $f_{i} (x, r) = [I (X_{i} \leq x) - F (x)] [I (Y_{i} = r) - p_{r}],$ and $f_{i, n} (x, r) = \frac{f_{i} (x, r)}{\sqrt{F^{n, 6} (x) {\bar{F}}^{n, 6} (x)}},$ and define ${\tilde{IPC}}_{n} (X, Y) = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d F (x) .$ Under the condition $\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} = o (n^{3 / 8})$ , showing that $n {\hat{IPC}}_{n, 6} (X, Y) / \sqrt{R}$ is close to $n {\tilde{IPC}}_{n} (X, Y) / \sqrt{R}$ and combined with the first part of the proof, we can derive that $n {\hat{IPC}}_{n} (X, Y) - n {\tilde{IPC}}_{n} (X, Y) = o_{p} (\sqrt{R}) .$
Finally, consider $n {\tilde{IPC}}_{n} (X, Y) = J_{1 n} + J_{2 n},$ where $J_{1 n} = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n}^{2} (x, r) d F (x),$ and $J_{2 n} = \frac{1}{n} \sum_{i \neq j} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n} (x, r) f_{j, n} (x, r) d F (x) .$ We show that $\frac{J_{1 n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} \overset{P}{\to} 0, a n d \frac{J_{2 n}}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}}$ can be viewed as a martingale difference sequence. Then by the well-developed theory of central limit theorem of the martingale difference (Hall & Heyde, Citation1980), we can complete the proof.

Combined with Lemmas A.1 and A.2, the proof in part 1 is not difficult. And the proofs in part 2 and part 3 follow from Cui and Zhong (Citation2018) and Cui and Zhong (Citation2019) with a small modification.

A.2.2. Part 1

We summarize the conclusion we want to prove in part 1 into the following lemma.

Lemma A.3

For a fixed constant C, let ${\hat{IPC}}_{n, C} (X, Y) = \sum_{r = 1}^{R} \int \frac{1}{p_{r}} \frac{{[{\hat{p}}_{r} F_{n} (x) - F_{n} (x, r)]}^{2}}{F^{n, C} (x) {\bar{F}}^{n, C} (x)} d F_{n} (x) .$ For simplicity, write ${\hat{IPC}}_{n} = {\hat{IPC}}_{n} (X, Y)$ , and ${\hat{IPC}}_{n, C} = {\hat{IPC}}_{n, C} (X, Y)$ . Then if $\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} = o (n^{1 / 2})$ , and under condition that X and Y are independent, we have $\begin{aligned} | {\hat{IPC}}_{n} - {\hat{IPC}}_{n, C} | & = O_{p} (n^{- \frac{3 + C}{2 + C}} R) + {\hat{IPC}}_{n, C} (O_{p} (n^{- \frac{C}{4 + 2 C}}) + O_{p} (\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} n^{- 1 / 2})) . \end{aligned}$

Proof.

Let ${\hat{IPC}}_{n}^{'} = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[F_{n} (X_{i}, r) - {\hat{p}}_{r} F_{n} (X_{i})]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} .$ Then $\begin{aligned} | {\hat{IPC}}_{n} - {\hat{IPC}}_{n}^{'} | & \leq max_{1 \leq r \leq R} | 1 - \frac{p_{r}}{{\hat{p}}_{r}} | \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} \\ = {\hat{IPC}}_{n}^{'} max_{1 \leq r \leq R} | 1 - \frac{{\hat{p}}_{r}}{p_{r}} | . \end{aligned}$ Since $E {(\sqrt{n} ({\hat{p}}_{r} - p_{r}))}^{2} = p_{r} (1 - p_{r}),$ we have $\begin{aligned} E {(max_{1 \leq r \leq R} | {\hat{p}}_{r} - p_{r} |)}^{2} & \leq E {(\sum_{r = 1}^{R} | {\hat{p}}_{r} - p_{r} |)}^{2} \\ \leq R \sum_{r = 1}^{R} E {({\hat{p}}_{r} - p_{r})}^{2} \\ = R \sum_{r = 1}^{R} \frac{p_{r} (1 - p_{r})}{n} \leq \frac{R}{n} . \end{aligned}$ So, $max_{1 \leq r \leq R} | {\hat{p}}_{r} - p_{r} | = O_{p} (\sqrt{R / n})$ . Then $max_{1 \leq r \leq R} | \frac{{\hat{p}}_{r} - p_{r}}{{\hat{p}}_{r}} | = max_{1 \leq r \leq R} | \frac{{\hat{p}}_{r} - p_{r}}{p_{r} + {\hat{p}}_{r} - p_{r}} | \leq max_{1 \leq r \leq R} | {\hat{p}}_{r} - p_{r} | max_{1 \leq r \leq R} \frac{1}{p_{r} + {\hat{p}}_{r} - p_{r}} .$ Since ${\hat{p}}_{r} - p_{r} = O_{p} (\sqrt{\frac{R}{n}}) = o_{p} (min_{1 \leq r \leq R} p_{r})$ , we have $max_{1 \leq r \leq R} | \frac{{\hat{p}}_{r} - p_{r}}{{\hat{p}}_{r}} | = O_{p} (\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} n^{- 1 / 2}) = o_{p} (1) .$ Hence, ${\hat{IPC}}_{n} = (1 + O_{p} (\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} n^{- 1 / 2})) {\hat{IPC}}_{n}^{'}$ . Next, let ${\hat{IPC}}_{n}^{*} = \frac{1}{n} \sum_{r = 1}^{R} \sum_{i = 1}^{n} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n}^{C} (X_{i}) {\bar{F}}_{n}^{C} (X_{i}) p_{r}} .$ Let $X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$ be the ordered statistics of $X_{1}, \dots, X_{n}$ . Since X is continuous, there are no ties among $X_{1}, \dots, X_{n}$ . We can assume that $X_{(1)} < \dots < X_{(n)}$ . Let $A_{n} = ⌊ n^{1 - \frac{1}{2 + C}} ⌋$ , and define $\begin{aligned} S_{n 1} & = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} I (X_{i} \leq X_{(A_{n})}), \\ S_{n 2} & = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} I (X_{i} \geq X_{(n - A_{n})}) . \end{aligned}$ Indeed, we have $0 \leq {\hat{IPC}}_{n}^{'} - {\hat{IPC}}_{n}^{*} \leq S_{n 1} + S_{n 2}$ . And $\begin{aligned} E S_{n 1} & = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{j = 1}^{n} E {\frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} I (X_{i} \leq X_{(A_{n})}) I (X_{i} = X_{(j)})} \\ = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \sum_{j = 1}^{A_{n}} E {\frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n} (X_{i}) {\bar{F}}_{n} (X_{i}) p_{r}} I (X_{i} = X_{(j)})} \\ = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j = 1}^{A_{n}} \sum_{r = 1}^{R} \frac{1}{j} {\frac{(n - j) [(n - 1) p_{r} + 1]}{n^{2}} - \frac{2 (n - j)}{n^{2}} [(n - 1) p_{r} + 1] + \frac{(n - j - 1) p_{r} + 1}{n}} \\ = \frac{1}{n} \sum_{i = 1}^{n} \sum_{j = 1}^{A_{n}} \frac{R - 1}{n^{2}} \\ = \frac{A_{n} (R - 1)}{n^{2}} . \end{aligned}$ Similarly, we also have $E S_{n 2} = \frac{A_{n}}{n^{2}} (R - 1)$ . Therefore, ${\hat{IPC}}_{n}^{'} - {\hat{IPC}}_{n}^{*} = O_{p} (n^{- \frac{3 + C}{2 + C}} R) .$ Finally, according to Lemma A.2, $\begin{aligned} | {\hat{IPC}}_{n}^{*} - {\hat{IPC}}_{n, C} | & = \frac{1}{n} | \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F^{n, C} (X_{i}) {\bar{F}}^{n, C} (X_{i}) p_{r}} - \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F_{n}^{C} (X_{i}) {\bar{F}}_{n}^{C} (X_{i}) p_{r}} | \\ = \frac{1}{n} | \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{{[{\hat{p}}_{r} F_{n} (X_{i}) - F_{n} (X_{i}, r)]}^{2}}{F^{n, C} (X_{i}) {\bar{F}}^{n, C} (X_{i}) p_{r}} \times (\frac{F^{n, C} (X_{i}) {\bar{F}}^{n, C} (X_{i})}{F_{n}^{C} (X_{i}) {\bar{F}}_{n}^{C} (X_{i})} - 1) | \\ \leq {\hat{IPC}}_{n, C} \times sup_{x} | \frac{F^{n, C} (x) {\bar{F}}^{n, C} (x)}{F_{n}^{C} (x) {\bar{F}}_{n}^{C} (x)} - 1 | \\ = {\hat{IPC}}_{n, C} O_{p} (n^{- \frac{C}{4 + 2 C}}) . \end{aligned}$ Hence $\begin{aligned} | {\hat{IPC}}_{n} - {\hat{IPC}}_{n, C} | & = O_{p} (n^{- \frac{3 + C}{2 + C}} R) + {\hat{IPC}}_{n, C} (O_{p} (n^{- \frac{C}{4 + 2 C}}) + O_{p} (\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} n^{- 1 / 2})) . \end{aligned}$

A.2.3. Part 2

Recall that $f_{i} (x, r) = [I (X_{i} \leq x) - F (x)] [I (Y_{i} = r) - p_{r}], f_{i, n} (x, r) = \frac{f_{i} (x, r)}{\sqrt{F^{n, 6} (x) {\bar{F}}^{n, 6} (x)}},$ and ${\tilde{IPC}}_{n} (X, Y) = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d F (x) .$ The following lemma is what we want to prove in part 2.

Lemma A.4

If $\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} = o (n^{3 / 8})$ , and Under $H_{0}$ : X and Y are independent, then $\begin{aligned} {\hat{IPC}}_{n} (X, Y) - {\tilde{IPC}}_{n} (X, Y) & = O_{p} (R n^{- 9 / 8}) + O_{p} (\frac{R n^{- 5 / 4}}{min_{1 \leq r \leq R} p_{r}}) + {\tilde{IPC}}_{n} (X, Y) o_{p} (n^{- 1 / 8}) . \end{aligned}$

Proof.

For simplicity, write ${\tilde{IPC}}_{n} = {\tilde{IPC}}_{n} (X, Y)$ . Given C = 6, according to Lemma A.3, and under the condition that $\frac{\sqrt{R}}{min_{1 \leq r \leq R} p_{r}} = o (n^{3 / 8})$ , we have (A1) $\begin{aligned} {\hat{IPC}}_{n} - {\hat{IPC}}_{n, 6} & = O_{p} (n^{- 9 / 8} R) + {\hat{IPC}}_{n, 6} [O_{p} (n^{- 3 / 8}) + o_{p} (n^{- 1 / 8})] \\ = O_{p} (n^{- 9 / 8} R) + {\hat{IPC}}_{n, 6} o_{p} (n^{- 1 / 8}) . \end{aligned}$ (A1) Let ${\tilde{IPC}}_{1 n} = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d F_{n} (x) .$ Next, we follow the proof of Lemma A.1 in Cui and Zhong (Citation2019), and show that $\begin{aligned} {\hat{IPC}}_{n, 6} - {\tilde{IPC}}_{1 n} & = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int \frac{1}{F^{n, 6} (x) {\bar{F}}^{n, 6} (x)} {{[{\hat{p}}_{r} F_{n} (x) - F_{n} (x, r)]}^{2} - {[\frac{1}{n} \sum_{i = 1}^{n} f_{i} (x, r)]}^{2}} d F_{n} (x) \\ = O (n^{\frac{1}{8}}) \sum_{r = 1}^{R} \frac{1}{p_{r}} \int {{[{\hat{p}}_{r} F_{n} (x) - F_{n} (x, r)]}^{2} - {[\frac{1}{n} \sum_{i = 1}^{n} f_{i} (x, r)]}^{2}} d F_{n} (x) . \end{aligned}$ Let ${\bar{f}}_{n} (x, r) = \frac{1}{n} \sum_{i = 1}^{n} f_{i} (x, r)$ . By the DKW inequality, we have $\begin{aligned} sup_{x} | {[{\hat{p}}_{r} F_{n} (x) - F_{n} (x, r)]}^{2} - {[\frac{1}{n} \sum_{i = 1}^{n} f_{i} (x, r)]}^{2} | & = sup_{x} | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) - {\bar{f}}_{n} (x, r) | | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) + {\bar{f}}_{n} (x, r) | \\ = sup_{x} | F_{n} (x) - F (x) | | {\hat{p}}_{r} - p_{r} | {sup_{x} | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) | + sup_{x} | {\bar{f}}_{n} (x, r) |} \\ = O_{p} (n^{- 1 / 2}) O_{p} (n^{- 1 / 2}) O_{p} (n^{- 1 / 2}) = O_{p} (n^{- 3 / 2}) . \end{aligned}$ Here, the second equality follows by $\begin{aligned} {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) - {\bar{f}}_{n} (x, r) & = {\frac{1}{n} \sum_{i = 1}^{n} I (X_{i} \leq x, Y_{i} = r) - F_{n} (x) {\hat{p}}_{r}} \\ - {\frac{1}{n} \sum_{i = 1}^{n} I (X_{i} \leq x, Y_{i} = r) - F (x) {\hat{p}}_{r} - p_{r} F_{n} (x) + p_{r} F (x)} \\ = - [F_{n} (x) - F (x)] [{\hat{p}}_{r} - p_{r}], \end{aligned}$ and the last equality follows by $\begin{aligned} sup_{x} | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) | = O_{p} (n^{- 1 / 2}), \\ sup_{x} | {\bar{f}}_{n} (x, r) | = O_{p} (n^{- 1 / 2}) . \end{aligned}$ Indeed, $\begin{aligned} sup_{x} | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) | & \leq sup_{x} | \frac{1}{n} \sum_{i = 1}^{n} [I (X_{i} \leq x) - F (x)] I (Y_{i} = r) | \\ + sup_{x} | F (x) \frac{1}{n} \sum_{i = 1}^{n} [I (Y_{i} = r) - p_{r}] | + | {\hat{p}}_{r} - p_{r} | + sup_{x} | F_{n} (x) - F (x) | \\ = sup_{x} | \frac{1}{n} \sum_{i = 1}^{n} [I (X_{i} \leq x) - F (x)] I (Y_{i} = r) | + sup_{x} F (x) | {\hat{p}}_{r} - p_{r} | + O_{p} (n^{- 1 / 2}) \\ = sup_{x} | \frac{1}{n} \sum_{i = 1}^{n} [I (X_{i} \leq x) - F (x)] I (Y_{i} = r) | + O_{p} (n^{- 1 / 2}), \end{aligned}$ and $\begin{aligned} E [sup_{x} | \frac{1}{n} \sum_{i = 1}^{n} [I (X_{i} \leq x) - F (x)] I (Y_{i} = r) |] & = \sum_{m = 1}^{n} E [sup_{x} | \frac{1}{n} \sum_{i = 1}^{n} [I (X_{i} \leq x) - F (x)] I (Y_{i} = r) |, m Y_{i}^{'} s = r] \\ = \sum_{m = 1}^{n} (\binom{n}{m}) p_{r}^{m} {(1 - p_{r})}^{n - m} \frac{\sqrt{m}}{n} E [sup_{x} | \frac{1}{\sqrt{m}} \sum_{i = 1}^{m} [I (X_{i} \leq x) - F (x)] |] \\ \leq 4 \sum_{m = 1}^{n} (\binom{n}{m}) p_{r}^{m} {(1 - p_{r})}^{n - m} \frac{\sqrt{m}}{n} \\ \leq 4 n^{- 1 / 2}, \end{aligned}$ where the first inequality follows by the DKW inequality. Hence, $sup_{x} | {\hat{p}}_{r} F_{n} (x) - F_{n} (x, r) | = O_{p} (n^{- 1 / 2})$ and similarly $sup_{x} | {\bar{f}}_{n} (x, r) | = O_{p} (n^{- 1 / 2})$ . Therefore, we have (A2) ${\hat{IPC}}_{n, 6} - {\tilde{IPC}}_{1 n} = \frac{R}{min_{1 \leq r \leq R} p_{r}} O_{p} (n^{- 11 / 8}) .$ (A2) Combining (EquationA1(A1) $\begin{aligned} {\hat{IPC}}_{n} - {\hat{IPC}}_{n, 6} & = O_{p} (n^{- 9 / 8} R) + {\hat{IPC}}_{n, 6} [O_{p} (n^{- 3 / 8}) + o_{p} (n^{- 1 / 8})] \\ = O_{p} (n^{- 9 / 8} R) + {\hat{IPC}}_{n, 6} o_{p} (n^{- 1 / 8}) . \end{aligned}$ (A1) ) and (EquationA2(A2) ${\hat{IPC}}_{n, 6} - {\tilde{IPC}}_{1 n} = \frac{R}{min_{1 \leq r \leq R} p_{r}} O_{p} (n^{- 11 / 8}) .$ (A2) ), we have $\begin{aligned} {\hat{IPC}}_{n} - {\tilde{IPC}}_{1 n} & = O_{p} (R n^{- 9 / 8}) + \frac{R}{min_{1 \leq r \leq R} p_{r}} O_{p} (n^{- 11 / 8}) + {\tilde{IPC}}_{1 n} o_{p} (n^{- 1 / 8}) . \end{aligned}$ To complete the proof, we only need to show that $\begin{aligned} {\tilde{IPC}}_{1 n} - {\tilde{IPC}}_{n} & = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d [F_{n} (x) - F (x)] \\ = \frac{R}{min_{1 \leq r \leq R} p_{r}} O_{p} (n^{- 11 / 8}) . \end{aligned}$ It is enough to show that $I_{n} (r) := \int {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d [F_{n} (x) - F (x)] = O_{p} (n^{- 11 / 8}) .$ Without loss of generality, let $F (x)$ be the uniform distribution function, since we can make the transformation $X^{'} = F (X)$ for the continuous random variable X. And $I_{n} (r) = \frac{1}{n} \sum_{j = 1}^{n} {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (X_{j}, r)]}^{2} - \int_{0}^{1} {[\frac{1}{n} \sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d x .$ For any $x, y \in (0, 1)$ , it can be easily proved that $E f_{i, n} (x, r) f_{j, n} (y, r) = \frac{x \land y - x y}{\sqrt{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}}} (p_{r} - p_{r}^{2}) I (i = j),$ where $x^{(n)} = x \lor n^{- 1 / 8}$ and $(1 - x)^{(n)} = (1 - x) \lor n^{- 1 / 8}$ . Then $\begin{aligned} E I_{n}^{2} (r) & = E {\int_{0}^{1} [\frac{1}{n} \sum_{j = 1}^{n} [{\bar{f}}_{n} {(X_{j})}^{2} - {\bar{f}}_{n} {(x)}^{2}]] d x}^{2} \\ = E {\int_{0}^{1} \int_{0}^{1} [\frac{1}{n} \sum_{j = 1}^{n} [{\bar{f}}_{n} {(X_{j})}^{2} - {\bar{f}}_{n} {(x)}^{2}]] [\frac{1}{n} \sum_{j = 1}^{n} [{\bar{f}}_{n} {(X_{j})}^{2} - {\bar{f}}_{n} {(y)}^{2}]] d x d y} \\ = \frac{1}{n} \int_{0}^{1} \int_{0}^{1} E {[{\bar{f}}_{n} {(X_{1})}^{2} - {\bar{f}}_{n} {(x)}^{2}] [{\bar{f}}_{n} {(X_{1})}^{2} - {\bar{f}}_{n} {(y)}^{2}]} d x d y \\ + \frac{n - 1}{n} \int_{0}^{1} \int_{0}^{1} E {[{\bar{f}}_{n} {(X_{1})}^{2} - {\bar{f}}_{n} {(x)}^{2}] [{\bar{f}}_{n} {(X_{2})}^{2} - {\bar{f}}_{n} {(y)}^{2}]} d x d y \\ = \int_{0}^{1} \int_{0}^{1} E {[{\bar{f}}_{n} {(X_{1})}^{2} - {\bar{f}}_{n} {(x)}^{2}] [{\bar{f}}_{n} {(X_{2})}^{2} - {\bar{f}}_{n} {(y)}^{2}]} d x d y \\ + \frac{1}{n} [E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{1})}^{2}] - E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{2})}^{2}]], \end{aligned}$ where ${\bar{f}}_{n} (x) = n^{- 1} \sum_{i = 1}^{n} f_{i, n} (x, r)$ . And be careful here that ${\bar{f}}_{n} (x)$ is different from ${\bar{f}}_{n} (x, r)$ defined above.

Since $E f_{i, n} (x, r) = 0$ under $H_{0}$ , we have $E [f_{i, n} (x, r) f_{j, n} (x, r) f_{k, n} (y, r) f_{l, n} (y, r)] = 0$ under $H_{0}$ if one of ${i, j, k, l}$ is different from the other three. Then we have $\begin{aligned} E [{\bar{f}}_{n} {(x)}^{2} {\bar{f}}_{n} {(y)}^{2}] & = \frac{1}{n^{4}} \sum_{i, j} \sum_{k, l} E [f_{i, n} (x, r) f_{j, n} (x, r) f_{k, n} (y, r) f_{l, n} (y, r)] \\ = \frac{1}{n^{3}} E [f_{1, n} {(x, r)}^{2} f_{1, n} {(y, r)}^{2}] + \frac{n - 1}{n^{3}} E [f_{1, n}^{2} (x, r)] E [f_{2, n}^{2} (y, r)] \\ + \frac{2 (n - 1)}{n^{3}} {E [f_{1, n} (x, r) f_{1, n} (y, r)]}^{2} \\ = \frac{1}{n^{3}} \frac{1}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} E [f_{1} {(x, r)}^{2} f_{1} {(y, r)}^{2}] \\ + \frac{n - 1}{n^{3}} \frac{1}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} E [f_{1}^{2} (x, r)] E [f_{2}^{2} (y, r)] \\ + \frac{2 (n - 1)}{n^{3}} \frac{1}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} {E [f_{1} (x, r) f_{1} (y, r)]}^{2} \\ = O (n^{- 11 / 4}) + \frac{{(p_{r} - p_{r}^{2})}^{2}}{n^{2}} \frac{[x y (1 - x) (1 - y) + 2 {(x \land y - x y)}^{2}]}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} . \end{aligned}$ And also, we have $\begin{aligned} E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(y)}^{2}] & = \frac{1}{n^{4}} \sum_{i, j} \sum_{k, l} E [f_{i, n} (X_{1}, r) f_{j, n} (X_{1}, r) f_{k, n} (y, r) f_{l, n} (y, r)] \\ = O (n^{- 11 / 4}) + \frac{{(p_{r} - p_{r}^{2})}^{2}}{n^{2}} \int_{0}^{1} \frac{x y (1 - x) (1 - y) + 2 {(x \land y - x y)}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} d x, \\ E [{\bar{f}}_{n} {(x)}^{2} {\bar{f}}_{n} {(X_{2})}^{2}] & = \frac{1}{n^{4}} \sum_{i, j} \sum_{k, l} E [f_{i, n} (x, r) f_{j, n} (x, r) f_{k, n} (X_{2}, r) f_{l, n} (X_{2}, r)] \\ = O (n^{- 11 / 4}) + \frac{{(p_{r} - p_{r}^{2})}^{2}}{n^{2}} \int_{0}^{1} \frac{x y (1 - x) (1 - y) + 2 {(x \land y - x y)}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} d y, \\ E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{2})}^{2}] & = \frac{1}{n^{4}} \sum_{i, j} \sum_{k, l} E [f_{i, n} (X_{1}, r) f_{j, n} (X_{1}, r) f_{k, n} (X_{2}, r) f_{l, n} (X_{2}, r)] \\ = O (n^{- 11 / 4}) + \frac{{(p_{r} - p_{r}^{2})}^{2}}{n^{2}} \int_{0}^{1} \frac{x y (1 - x) (1 - y) + 2 {(x \land y - x y)}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} d x d y, \end{aligned}$ and $\begin{aligned} E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{1})}^{2}] & = \frac{1}{n^{4}} \sum_{i, j} \sum_{k, l} E [f_{i, n} (X_{1}, r) f_{j, n} (X_{1}, r) f_{k, n} (X_{1}, r) f_{l, n} (X_{1}, r)] \\ = O (n^{- 11 / 4}) + \frac{{(p_{r} - p_{r}^{2})}^{2}}{n^{2}} \int_{0}^{1} \frac{x^{2} {(1 - x)}^{2} + 2 {(x - x^{2})}^{2}}{{(x^{(n)} {(1 - x)}^{(n)})}^{2}} d x . \end{aligned}$ Hence, $\begin{aligned} E [I_{n} {(r)}^{2}] & = \int_{0}^{1} \int_{0}^{1} E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{2})}^{2}] d x d y - \int_{0}^{1} \int_{0}^{1} E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(y)}^{2}] d x d y \\ - \int_{0}^{1} \int_{0}^{1} E [{\bar{f}}_{n} {(x)}^{2} {\bar{f}}_{n} {(X_{1})}^{2}] d x d y + \int_{0}^{1} \int_{0}^{1} E [{\bar{f}}_{n} {(x)}^{2} {\bar{f}}_{n} {(y)}^{2}] d x d y \\ + \frac{1}{n} [E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{1})}^{2}] - E [{\bar{f}}_{n} {(X_{1})}^{2} {\bar{f}}_{n} {(X_{2})}^{2}]] \\ = O (n^{- 11 / 4}) . \end{aligned}$ So, $\begin{aligned} {\hat{IPC}}_{n} - {\tilde{IPC}}_{n} & = {\hat{IPC}}_{n} - {\tilde{IPC}}_{1 n} + {\tilde{IPC}}_{1 n} - {\tilde{IPC}}_{n} \\ = O_{p} (R n^{- 9 / 8}) + \frac{R}{min_{1 \leq r \leq R} p_{r}} O_{p} (n^{- 11 / 8}) + {\tilde{IPC}}_{n} o_{p} (n^{- 1 / 8}) . \end{aligned}$

A.2.4. Part 3

Now, we will complete the proof of Theorem 3.1.

Proof of Theorem 3.1.

Let ${\tilde{T}}_{n} = n {\tilde{IPC}}_{n}$ . Without loss of generality, we assume that $X \sim U n i f (0, 1)$ . Then $F (x) = x$ for $0 \leq x \leq 1$ . According to Lemma A.4, we have $T_{n} - {\tilde{T}}_{n} = O_{p} (R n^{- 1 / 8}) + O_{p} (\frac{R n^{- 3 / 8}}{min_{1 \leq r \leq R} p_{r}}) + o_{p} ({\tilde{T}}_{n} n^{- 1 / 8}) .$ Then under the condition $\sqrt{R} / min_{1 \leq r \leq R} p_{r} = o (n^{3 / 8})$ , we have $R = o (n^{1 / 4})$ , and thus $T_{n} - {\tilde{T}}_{n} = o_{p} (\sqrt{R}) + {\tilde{T}}_{n} o_{p} (n^{- 1 / 8})$ , i.e., $\begin{aligned} \frac{T_{n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} - \frac{{\tilde{T}}_{n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} & = o_{p} (1) + \frac{{\tilde{T}}_{n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} o_{p} (n^{- 1 / 8}) + o_{p} (\sqrt{R} n^{- 1 / 8}) \\ = \frac{{\tilde{T}}_{n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} o_{p} (n^{- 1 / 8}) + o_{p} (1) . \end{aligned}$ Hence, we only need to prove that $\frac{{\tilde{T}}_{n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}} \overset{d}{\to} N (0, 1),$ as $n \to \infty$ .

Recall that $f_{i, n} (x, r) = \frac{(I (X_{i} \leq x) - x) (I (Y_{i} = r) - p_{r})}{\sqrt{x^{(n)} (1 - x)^{(n)}}}$ , where $x^{(n)} = x \lor n^{- 1 / 8}$ and $(1 - x)^{(n)} = (1 - x) \lor n^{- 1 / 8}$ . We first give some important facts:

$E [f_{i, n} (x, r) f_{i, n} (y, s)] = \frac{(x \land y - x y) (p_{r} δ_{r s} - p_{r} p_{s})}{\sqrt{x^{(n)} (1 - x)^{(n)} y^{(n)} (1 - y)^{(n)}}}$ ;
$E [f_{i, n}^{2} (x, r) f_{i, n}^{2} (y, s)] \leq C n^{1 / 8} (p_{r} δ_{r s} + p_{r} p_{s} (p_{r} + p_{s}))$ ,

for all $1 \leq i \leq n$ , $1 \leq r, s \leq R$ , where C is a constant and $δ_{r s} = 1$ if r = s and $δ_{r s} = 0$ , otherwise.

We prove (ii). Without loss of generality, we assume that $x \leq y$ . $\begin{aligned} E [f_{i, n}^{2} (x, r) f_{i, n}^{2} (y, s)] & = [p_{r} δ_{r s} + p_{r} p_{s} (p_{r} + p_{s})] E \frac{{[I (X_{i} \leq x) - x]}^{2} {[I (X_{i} \leq y) - y]}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} . \end{aligned}$ And $\begin{aligned} E \frac{{[I (X_{i} \leq x) - x]}^{2} {[I (X_{i} \leq y) - y]}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} & = \frac{E [I (X_{i} \leq x) - 2 x I (X_{i} \leq x) + x^{2}] [I (X_{i} \leq y) - 2 y I (X_{i} \leq y) + y^{2}]}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} \\ = \frac{x (1 - y) (1 - y - 2 x + 3 x y)}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} \\ \leq \frac{x (1 - y)}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} \\ \leq 4 n^{1 / 8} . \end{aligned}$ The last inequality is because, if $1 / 2 \leq x \leq y$ , then $\frac{x (1 - y)}{x^{(n)} (1 - x)^{(n)} y^{(n)} (1 - y)^{(n)}} \leq 4 \frac{x}{(1 - x)^{(n)}} \leq 4 n^{1 / 8}$ ; if $x \leq y \leq 1 / 2$ , then $\frac{x (1 - y)}{x^{(n)} (1 - x)^{(n)} y^{(n)} (1 - y)^{(n)}} \leq 4 \frac{1 - y}{y^{(n)}} \leq 4 n^{1 / 8}$ ; if $x \leq 1 / 2 \leq y$ , then $\frac{x (1 - y)}{x^{(n)} (1 - x)^{(n)} y^{(n)} (1 - y)^{(n)}} \leq 4$ .

(iii) $\sum_{r, s, t, q = 1}^{R} \frac{(p_{r} δ_{r s} - p_{r} p_{s}) (p_{r} δ_{r t} - p_{r} p_{t}) (p_{t} δ_{t q} - p_{t} p_{q}) (p_{s} δ_{s q} - p_{s} p_{q})}{p_{r} p_{s} p_{t} p_{q}} = O (R)$ . This result can be found in Cui and Zhong (Citation2018) and Cui and Zhong (Citation2019).

Write ${\tilde{T}}_{n} = \frac{1}{n} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int_{0}^{1} {[\sum_{i = 1}^{n} f_{i, n} (x, r)]}^{2} d x =: J_{1 n} + J_{2 n},$ where $J_{1 n} = \frac{1}{n} \sum_{i = 1}^{n} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n}^{2} (x, r) d x,$ and $J_{2 n} = \frac{1}{n} \sum_{i \neq j} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n} (x, r) f_{j, n} (x, r) d x .$ Note that, $\begin{aligned} E J_{1 n} & = \sum_{r = 1}^{R} \frac{1}{p_{r}} \int E \frac{{(I (X_{i} \leq x) - x)}^{2} {(I (Y_{i} = r) - p_{r})}^{2}}{x^{(n)} {(1 - x)}^{(n)}} d x \\ = \sum_{r = 1}^{R} (1 - p_{r}) \int_{0}^{1} \frac{x (1 - x)}{x^{(n)} {(1 - x)}^{(n)}} d x \\ = (R - 1) (1 - n^{- 1 / 8}), \end{aligned}$ and $\begin{aligned} V a r (J_{1 n}) & = \frac{1}{n} V a r (\sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{1, n}^{2} (x, r) d x) \leq \frac{1}{n} E {(\sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{1, n}^{2} (x, r) d x)}^{2} \\ = \frac{1}{n} (\sum_{r, s} \frac{1}{p_{r} p_{s}} \int E [f_{1, n}^{2} (x, r) f_{1, n}^{2} (y, s)] d x d y) \\ \leq \frac{C n^{1 / 8}}{n} \sum_{r, s}^{R} \frac{p_{r} δ_{r s} + p_{r} p_{s} (p_{r} + p_{s})}{p_{r} p_{s}} \\ \leq \frac{C}{n^{7 / 8}} (\frac{R}{min p_{r}} + R) = O (n^{- 3 / 8}) = o (1) . \end{aligned}$ Hence, $\begin{aligned} E {(\frac{J_{1 n} - (R - 1)}{\sqrt{2 (\frac{π^{2}}{3} - 3) (R - 1)}})}^{2} & = C {V a r (J_{1 n}) / (R - 1) + {[E J_{1 n} - (R - 1)]}^{2} / (R - 1)} \\ = C V a r (J_{1 n}) / (R - 1) + C (R - 1) n^{- 1 / 4} = o (1), \end{aligned}$ where C is a constant. Next, we only need to show that $\frac{J_{2 n}}{\sqrt{2 (π^{2} / 3 - 3) (R - 1)}} \overset{d}{\to} N (0, 1) .$ Note that $E J_{2 n} = 0$ , and $\begin{aligned} V a r (J_{2 n}) & = E (J_{2 n}^{2}) \\ = \frac{1}{n^{2}} \sum_{i \neq j} \sum_{k \neq l} \sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int E [f_{i, n} (x, r) f_{j, n} (x, r) f_{k, n} (y, s) f_{l, n} (y, s)] d x d y \\ = \frac{2 n (n - 1)}{n^{2}} \sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int {E [f_{1, n} (x, r) f_{1, n} (y, s)]}^{2} d x d y \\ = \frac{2 n (n - 1)}{n^{2}} \sum_{r, s}^{R} \frac{{(p_{r} δ_{r s} - p_{r} p_{s})}^{2}}{p_{r} p_{s}} \int \frac{{(x \land y - x y)}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} d x d y \\ = (1 - \frac{1}{n}) (R - 1) [2 \int \frac{{(x \land y - x y)}^{2}}{x (1 - x) y (1 - y)} d x d y + O (n^{- 1 / 8})] \\ = (1 - \frac{1}{n}) (R - 1) [2 (π^{2} / 3 - 3) + O (n^{- 1 / 8})] . \end{aligned}$ The last equality holds because $\int_{0}^{1} \int_{0}^{1} \frac{{(x \land y - x y)}^{2}}{x (1 - x) y (1 - y)} d x d y = \frac{π^{2}}{3} - 3.$ Let $F_{i} = σ {(X_{1}, Y_{1}), \dots, (X_{i}, Y_{i})}$ be the σ-field generated by a set of random variables ${(X_{1}, Y_{1}), \dots, (X_{i}, Y_{i})}$ , $i = 1, \dots, n$ . We see that $\begin{aligned} \frac{J_{2 n}}{\sqrt{2 (π^{2} / 3 - 3) (R - 1)}} & = \frac{\sum_{i = 2}^{n} [\frac{2}{n} \sum_{j = 1}^{i - 1} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n} (x, r) f_{j, n} (x, r) d x]}{\sqrt{2 (π^{2} / 3 - 3) (R - 1)}} \\ =: \sum_{i = 2}^{n} Z_{n i} \end{aligned}$ is the summation of a martingale difference sequence with $E (Z_{n i}) = 0$ and $V a r (\sum_{i = 2}^{n} Z_{n i}) = (1 - \frac{1}{n}) (1 + O (n^{- 1 / 8})) \to 1$ . According to Hall and Heyde (Citation1980), we need to prove $\sum_{i = 2}^{n} E [Z_{n i}^{2} ∣ F_{i - 1}] \overset{P}{\to} 1$ . $\begin{aligned} E [Z_{n i}^{2} ∣ F_{i}] = \frac{1}{2 (π^{2} / 3 - 3) (R - 1)} {(\frac{2}{n})}^{2} \times \sum_{j, k}^{i - 1} \sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{k, n} (y, s) d x d y . \end{aligned}$ Thus we have $\sum_{i = 2}^{n} E [Z_{n i}^{2} ∣ F_{i - 1}] = J_{3 n} + J_{4 n},$ where $\begin{aligned} J_{3 n} & = \frac{1}{2 (π^{2} / 3 - 3) (R - 1)} {(\frac{2}{n})}^{2} \times \sum_{j = 1}^{n - 1} (n - j) \sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{j, n} (y, s) d x d y, \end{aligned}$ and $\begin{aligned} J_{4 n} & = \frac{2}{2 (π^{2} / 3 - 3) (R - 1)} {(\frac{2}{n})}^{2} \times \sum_{j < k \leq n} (n - k) \sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{k, n} (y, s) d x d y . \end{aligned}$ Since $E (J_{3 n}) \to 1$ , and $\begin{aligned} V a r (J_{3 n}) & = \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} \times V a r (\sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{j, n} (y, s) d x d y) \\ \leq \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} \times E {(\sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{j, n} (y, s) d x d y)}^{2} \\ = \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} \times E {(\sum_{r, s}^{R} \frac{p_{r} δ_{r s} - p_{r} p_{s}}{p_{r} p_{s}} \int \int \frac{x \land y - x y}{\sqrt{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}}} f_{j, n} (x, r) f_{j, n} (y, s) d x d y)}^{2} \\ \leq \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} R^{2} \\ \times E {\sum_{r, s}^{R} {(\frac{p_{r} δ_{r s} - p_{r} p_{s}}{p_{r} p_{s}})}^{2} {(\int \int \frac{x \land y - x y}{\sqrt{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}}} f_{j, n} (x, r) f_{j, n} (y, s) d x d y)}^{2}} \\ \leq \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} \\ \times R^{2} E {\sum_{r, s}^{R} {(\frac{p_{r} δ_{r s} - p_{r} p_{s}}{p_{r} p_{s}})}^{2} \int \int \frac{{(x \land y - x y)}^{2}}{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)}} f_{j, n}^{2} (x, r) f_{j, n}^{2} (y, s) d x d y} \\ \leq \frac{C}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} R^{2} \sum_{r, s}^{R} {(\frac{p_{r} δ_{r s} - p_{r} p_{s}}{p_{r} p_{s}})}^{2} \int \int E [f_{j, n}^{2} (x, r) f_{j, n}^{2} (y, s)] d x d y \\ \leq \frac{C^{'}}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} R^{2} \sum_{r, s}^{R} {(\frac{p_{r} δ_{r s} - p_{r} p_{s}}{p_{r} p_{s}})}^{2} n^{1 / 8} (p_{r} δ_{r s} + p_{r} p_{s} (p_{r} + p_{s})) \\ \leq \frac{C^{'}}{{(R - 1)}^{2} n^{4}} \sum_{j = 1}^{n - 1} {(n - j)}^{2} R^{2} n^{1 / 8} \frac{R}{min p_{r}} \\ = O (n^{- 7 / 8} \frac{R}{min p_{r}}) = O (n^{- 3 / 8}), \end{aligned}$ where C and $C^{'}$ are constants. Thus $J_{3 n} \to 1$ . And $E (J_{4 n}) = 0$ , and $\begin{aligned} V a r (J_{4 n}) & = \frac{C}{R^{2} n^{4}} \sum_{j < k, l < m} (n - k) (n - m) \\ \times \sum_{r, s}^{R} \sum_{t, q}^{R} E {\frac{1}{p_{r} p_{s} p_{t} p_{q}} \int \int E [f_{i, n} (x, r) f_{i, n} (y, s)] f_{j, n} (x, r) f_{k, n} (y, s) d x d y \\ \times \int \int E [f_{i, n} (x^{'}, t) f_{i, n} (y^{'}, q)] f_{l, n} (x^{'}, t) f_{m, n} (y^{'}, q) d x^{'} d y^{'}} \\ = \frac{C}{R^{2} n^{4}} \sum_{j < k, l < m} (n - k) (n - m) \sum_{r, s}^{R} \sum_{t, q}^{R} \frac{(p_{r} δ_{r s} - p_{r} p_{s}) (p_{t} δ_{t q} - p_{t} p_{q})}{p_{r} p_{s} p_{t} p_{q}} \\ \times \int \frac{(x \land y - x y) (x^{'} \land y^{'} - x^{'} y^{'})}{\sqrt{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)} {(x^{'})}^{(n)} {(1 - x^{'})}^{(n)} {(y^{'})}^{(n)} {(1 - y^{'})}^{(n)}}} \\ \times E [f_{j, n} (x, r) f_{k, n} (y, s) f_{l, n} (x^{'}, t) f_{m, n} (y^{'}, q)] d x d y d x^{'} d y^{'} \\ = \frac{C}{R^{2} n^{4}} \sum_{j < k} (n - k) (n - k) \sum_{r, s}^{R} \sum_{t, q}^{R} \frac{(p_{r} δ_{r s} - p_{r} p_{s}) (p_{t} δ_{t q} - p_{t} p_{q})}{p_{r} p_{s} p_{t} p_{q}} \\ \times \int \frac{(x \land y - x y) (x^{'} \land y^{'} - x^{'} y^{'})}{\sqrt{x^{(n)} {(1 - x)}^{(n)} y^{(n)} {(1 - y)}^{(n)} {(x^{'})}^{(n)} {(1 - x^{'})}^{(n)} {(y^{'})}^{(n)} {(1 - y^{'})}^{(n)}}} \\ \times E [f_{j, n} (x, r) f_{k, n} (y, s) f_{j, n} (x^{'}, t) f_{k, n} (y^{'}, q)] d x d y d x^{'} d y^{'} \\ \leq \frac{C}{R^{2} n^{4}} \sum_{j < k} (n - k) (n - k) \times \sum_{r, s}^{R} \sum_{t, q}^{R} \frac{(p_{r} δ_{r s} - p_{r} p_{s}) (p_{t} δ_{t q} - p_{t} p_{q}) (p_{r} δ_{r t} - p_{r} p_{t}) (p_{s} δ_{s q} - p_{s} p_{q})}{p_{r} p_{s} p_{t} p_{q}} \\ = \frac{C}{R^{2} n^{4}} \sum_{k = 2}^{n} (k - 1) {(n - k)}^{2} O (R) = O (1 / R) . \end{aligned}$ Thus, $J_{4 n} \overset{P}{\to} 0$ . On the other hand $\begin{aligned} \sum_{i = 2}^{n} E (Z_{n i}^{4}) & \leq \sum_{i = 2}^{n} \frac{C}{n^{4} R^{2}} E {[\sum_{j = 1}^{i - 1} \sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{i, n} (x, r) f_{j, n} (x, r) d x]}^{4} \\ \leq \sum_{i = 2}^{n} \frac{C}{n^{4} R^{2}} (6 (\binom{i - 1}{2}) + i - 1) E {[\sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{1, n} (x, r) f_{2, n} (x, r) d x]}^{4} \\ \leq \frac{C^{'}}{n R^{2}} E {[\sum_{r = 1}^{R} \frac{1}{p_{r}} \int f_{1, n} (x, r) f_{2, n} (x, r) d x]}^{4} \\ = \frac{C^{'}}{n R^{2}} E {[\sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int f_{1, n} (x, r) f_{1, n} (y, s) f_{2, n} (x, r) f_{2, n} (y, s) d x d y]}^{2} \\ \leq \frac{C^{'}}{n R^{2}} E {[\sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} {(\int f_{1, n}^{2} (x, r) f_{1, n}^{2} (y, s) d x d y)}^{1 / 2} {(\int f_{2, n}^{2} (x, r) f_{2, n}^{2} (y, s) d x d y)}^{1 / 2}]}^{2} \\ \leq \frac{C^{'}}{n R^{2}} {(\sum_{r, s}^{R} \frac{1}{p_{r} p_{s}} \int E [f_{1, n}^{2} (x, r) f_{1, n}^{2} (y, s)] d x d y)}^{2} \\ \leq \frac{C^{''}}{n R^{2}} {(\sum_{r, s}^{R} \frac{p_{r} δ_{r s} + p_{r} p_{s} (p_{r} + p_{s})}{p_{r} p_{s}} n^{1 / 8})}^{2} \\ = \frac{C^{''}}{n^{3 / 4} R^{2}} {(\frac{R}{min p_{r}} + R + 2)}^{2} = O (\frac{1}{n^{3 / 4} {(min p_{r})}^{2}}) = o (1 / R), \end{aligned}$ where C, $C^{'}$ and $C^{''}$ are constants. By the central limit theorem of the martingale difference (Hall & Heyde, Citation1980), we have $\frac{{\tilde{T}}_{n} - (R - 1)}{\sqrt{2 (π^{2} / 3 - 3) (R - 1)}} \overset{d}{\to} N (0, 1),$ as $n \to \infty$ . This completes the proof.

A distribution-free test of independence based on a modified mean variance index

Abstract

1. Introduction

Table 1. Empirical bivariate distribution for a fixed x.

2. Preliminaries

3. Main results

3.1. Asymptotic properties when R is diverging