Original Articles

Testing Conditional Independence Restrictions


Abstract

We propose a nonparametric test of the hypothesis of conditional independence between variables of interest based on a generalization of the empirical distribution function. This hypothesis is of interest both for model specification purposes, parametric and semiparametric, and for nonmodel-based testing of economic hypotheses. We allow for both discrete variables and estimated parameters. The asymptotic null distribution of the test statistic is a functional of a Gaussian process. A bootstrap procedure is proposed for calculating the critical values. Our test has power against alternatives at distance n^{-1/2} from the null, and this result holds independently of dimension. Monte Carlo simulations provide evidence on size and power.

JEL Classification:

ACKNOWLEDGEMENTS

We both thank seminar participants for their helpful comments. The first author would like to thank Tilburg University for its hospitality and Pierre Chaussé for research assistance. Financial support from the National Science Foundation and the North Atlantic Treaty Organization is gratefully acknowledged.

Notes

See Phillips (1988) for a discussion of the difference between independence and conditional independence.

When X, Y, Z are jointly normal with mean μ and covariance matrix Σ = (σ_{ij}), Y⊥⊥X is equivalent to σ_{YX} = 0, while Y⊥⊥X | Z is equivalent to σ^{YX} = 0, where the concentration matrix Σ^{-1} = (σ^{ij}). In this case, there are simple parametric tests of both independence and conditional independence. For categorical data, there are also numerous tests of independence and conditional independence; see Agresti (1990, p. 228).
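As a small illustration of the Gaussian case described in this note, the sketch below builds one covariance matrix by hand and inverts it exactly. The example is an assumption chosen for illustration, not taken from the article: Y = Z + e1 and X = Z + e2 with Z, e1, e2 independent standard normal, and `inverse_3x3` is a hypothetical helper name. The covariance between Y and X is nonzero, while the corresponding entry of the concentration matrix is zero.

```python
from fractions import Fraction


def inverse_3x3(m):
    """Invert a 3x3 matrix exactly using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = [[Fraction(v) for v in row] for row in m]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


# Assumed illustrative model (not from the article), variables ordered (Y, X, Z):
# Var(Y) = Var(X) = 2, Var(Z) = 1, and all covariances equal 1.
sigma = [
    [2, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
concentration = inverse_3x3(sigma)

print("Cov(Y, X):", sigma[0][1])
print("(Y, X) entry of the concentration matrix:", concentration[0][1])
```

In this constructed example Y and X are marginally dependent because both load on Z, yet the zero in the concentration matrix reflects that they are conditionally independent given Z.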

We need to exclude the case where C is the entire support, since Equation (6) would then reduce to P(AB) = P(A)P(B), which implies Y⊥⊥X; but it is well known that conditional independence does not imply independence (Chow and Teicher, 1998, p. 221).

Note that Equation (3) itself is not directly testable because one cannot estimate ϵ consistently.

This was pointed out to us by Jon Wellner to whom we are grateful.

There has also been some work using multivariate half spaces, i.e., hyperplanes; see Beran and Millar (1986).

Formally speaking, the sets we examine are of the form A = {V ∈ 𝔅(y) × (−∞, ∞)^{m+k}} ⊂ ℝ^d, B = {V ∈ 𝔅(x) × (−∞, ∞)^{l+k}} ⊂ ℝ^d, and C = {V ∈ 𝔅(z) × (−∞, ∞)^{l+m}} ⊂ ℝ^d. Then, for example, ABC = {V ∈ 𝔅(v)}.

In our discrete example, the dependence is uncovered by this choice of events, since clearly Pr(Y = 1, X = 0) ≠ Pr(Y = 1)Pr(X = 0).

This is because the class of rectangles of a given width separates probability measures. That is, if two probability measures P_1 and P_2 agree on the class of all rectangles of given width, then they agree on all Borel sets.

A case where H_0 would imply A(v | θ_0, P) = 0 is when f(z) is constant for all z in 𝔅(z) and P(𝔅(y)|z) or P(𝔅(x)|z) is constant for all z in 𝔅(z).

Although CM_n and KS_n are desirable from a computational point of view, they can perform poorly in small samples when d is large, because the evaluation points are not representative enough. In practice, the following statistics may work better with large d and small n,

where {t_i; i = 1,…, m} is a fixed or random grid of points. The number of evaluation points, m, is under the control of the practitioner but should increase with the sample size; see Beran and Millar (1986) for justification of this device. In the simulations presented in Section 5, we used a random grid of points based on the observations.

Also note that the rate of convergence to the limiting distributions in Theorem 1 is n^{1/2} independently of dimension, which implies that the size distortion is of order n^{-1/2} independently of dimension. See Csörgó and Faraway (1996).

See Härdle and Linton (1994) for discussion of smoothing methods and Horowitz (1997) for background on the bootstrap.

For such h, and K the uniform distribution, we get a k-nearest neighbor smooth distribution with probability 1/k for all X_j, j = 1,…, n, such that Z_j ∈ 𝒩_k(Z_i), where 𝒩_k(Z_i) denotes a k-neighborhood of Z_i.
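The k-nearest neighbor resampling described in this note can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's procedure: Z is taken to be scalar, closeness is measured by absolute distance, the neighborhood includes observation i itself, and the function names `knn_indices` and `knn_bootstrap_draw` are hypothetical. For each i, one value is drawn with probability 1/k from the X_j whose Z_j lies in the k-neighborhood of Z_i.

```python
import random


def knn_indices(z, i, k):
    """Indices of the k observations whose Z is closest to z[i] (i included)."""
    order = sorted(range(len(z)), key=lambda j: (abs(z[j] - z[i]), j))
    return order[:k]


def knn_bootstrap_draw(values, z, k, rng):
    """For each i, draw one value uniformly (probability 1/k each) from the
    observations whose Z lies in the k-neighborhood of z[i]."""
    return [values[rng.choice(knn_indices(z, i, k))] for i in range(len(z))]


# Toy data (assumed for illustration): two well-separated clusters in Z.
z = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
x = [1, 2, 3, 10, 20, 30]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
x_star = knn_bootstrap_draw(x, z, k=3, rng=rng)
print(x_star)
```

With these toy data, every resampled value for the first three observations comes from the first cluster and every value for the last three from the second, so the draw depends on the data only through the neighborhood of Z_i.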

Additional simulations with (Y, X, Z) trivariate normal are reported in Linton and Gozalo (1995).

Note that for j = 1,…, p, and .

Given the large value of m using this procedure for n = 500, we decided to evaluate the test at only 10% of the points in each interval . This cut m to a more manageable 3510 points on average.
