
Geometric Classifier for Multiclass, High-Dimensional Data

Pages 279-294 | Received 30 Jul 2014, Accepted 31 May 2015, Published online: 14 Aug 2015

Abstract

In this article, we consider a geometric classifier that is applicable to multiclass classification for high-dimensional data. We show the consistency property and the asymptotic normality of the geometric classifier under certain mild conditions. We discuss sample size determination so that the geometric classifier can ensure that its misclassification rates are less than prespecified thresholds. We give a two-stage procedure to estimate the sample sizes required in such a geometric classifier and propose a misclassification rate–adjusted classifier (MRAC) based on the geometric classifier. We evaluate the performance of the MRAC theoretically and numerically. Finally, we demonstrate the MRAC in actual data analyses by using a microarray data set.


1. INTRODUCTION

High-dimensional data situations occur in many areas of modern science, such as genetic microarrays, medical imaging, text recognition, finance, chemometrics, and so on. A common feature of high-dimensional data is that the data dimension is high but the sample size is relatively low. This is the so-called HDLSS or “large p, small n” situation where p/n → ∞; here p is the data dimension and n is the sample size. Aoshima and Yata (2011a,b) provided a variety of statistical inference procedures for high-dimensional data, such as given-bandwidth confidence regions, two-sample tests, classification, variable selection, regression, pathway analysis, and so on. They considered sample size determination to ensure prespecified high accuracy for high-dimensional, non-Gaussian inference and developed the theory of Stein's (1945, 1949) two-stage procedure that was originally given for inference on the univariate Gaussian mean. Aoshima and Yata (2015a) verified the asymptotic normality of statistics appearing in inference on high-dimensional mean vectors under certain mild conditions. In this article, we focus on high-dimensional classification and attempt to give a multiclass classifier that keeps misclassification rates below prespecified thresholds.

Suppose we have independent and p-variate populations, πi, i = 1,…, k, having an unknown mean vector μi and an unknown covariance matrix Σi (> O) for each i. We assume that lim sup ||μi − μj||2/p < ∞ as p → ∞ for all i ≠ j, where ||·|| denotes the Euclidean norm. Also, we assume that tr(Σi)/p ∈ (0, ∞) as p → ∞ for i = 1,…, k. Here, for a function, f(·), “f(p) ∈ (0, ∞) as p → ∞” implies lim inf f(p) > 0 and lim sup f(p) < ∞ as p → ∞. We do not assume that Σ1 = … = Σk. The eigen-decomposition of Σi is given by Σi = HiΛiHiT, where Λi is a diagonal matrix of eigenvalues, λi1 ≥ … ≥ λip > 0, and Hi is an orthogonal matrix of the corresponding eigenvectors. We have independent and identically distributed (i.i.d.) observations, xi1,…, xini, from each πi. Let xij = HiΛi1/2zij + μi, where zij is considered as a sphered data vector from a distribution with the zero mean vector and the identity covariance matrix. We assume ni ≥ 2, i = 1,…, k. We estimate μi and Σi by the sample mean vector x̄ini = ∑j=1ni xij/ni and the sample covariance matrix Sini = ∑j=1ni (xij − x̄ini)(xij − x̄ini)T/(ni − 1).
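The estimators above are the usual sample mean vector and the (unbiased) sample covariance matrix. A minimal sketch in Python follows; the function name and array layout are chosen for illustration only, and the geometric classifier below in fact requires only x̄ini and tr(Sini), so the full p × p matrix need not be formed.

    import numpy as np

    def sample_estimates(X):
        """Sample mean vector and sample covariance matrix for one class.

        X : (n_i, p) array whose rows are the observations x_i1, ..., x_ini.
        Returns (xbar, S), with S the unbiased estimator (divisor n_i - 1).
        """
        n_i = X.shape[0]
        xbar = X.mean(axis=0)
        centered = X - xbar
        S = centered.T @ centered / (n_i - 1)
        # The geometric classifier only needs tr(S), which equals
        # np.sum(centered ** 2) / (n_i - 1) and avoids forming the p x p matrix.
        return xbar, S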

As for population πi, i = 1,…, k, we make the following assumption:

(A-i) Let yij, j = 1,…, ni, be i.i.d. random qi-vectors having E(yij) = 0 and Var(yij) = Iqi for each i (= 1,…, k), where qi ≥ p. Let yij = (yi1j,…, yiqij)T in which lim sup E(yirj4) < ∞ as p → ∞ for all r, and E(yirj2yisj2) = 1 and E(yirjyisjyitjyiuj) = 0 for all r ≠ s, t, u. Then, the observations, xijs, from each πi (i = 1,…, k) are given by xij = Γiyij + μi, (1.1) where Γi is a p × qi matrix such that ΓiΓiT = Σi.

Here, Iqi denotes the identity matrix of dimension qi. Note that (1.1) includes the case that qi = p, Γi = HiΛi1/2, and yij = zij. Also, note that (A-i) is met when πi is Np(μi, Σi) for i = 1,…, k. In addition, we make the following assumption on the Σis as necessary:

(A-ii) and as p → ∞ for i, j, l = 1,…, k.

Note that “ as p → ∞” is equivalent to the condition that “ as p → ∞”. Also, the sphericity condition such as “ as p → ∞ for i = 1,…, k” holds under (A-ii).

Remark 1.1

If all λijs are bounded such as λij ∈ (0, ∞) as p → ∞, (A-ii) trivially holds. For a spiked model such as λij = aijpαij (j = 1,…, ti) and λij = cij (j = ti + 1,…, p) with positive constants, aijs, cijs and αijs, and positive integers tis, (A-ii) holds under the condition that αij < 1/2 for j = 1,…, ti(< ∞); i = 1,…, k.

Let x0 be an observation vector of an individual belonging to one of the k populations. When k = 2, a typical classification rule is based on a discriminant function involving the inverse matrix of Sini: one classifies the individual into π1 or π2 according to its sign. However, the inverse matrix of Sini does not exist in the HDLSS context (p > ni). Dudoit et al. (2002) considered substituting the inverse of the diagonal matrix formed by the diagonal elements of Sini. Chan and Hall (2009) and Aoshima and Yata (2014) considered distance-based classifiers. In particular, Aoshima and Yata (2014) gave a distance-based classifier for multiclass, non-Gaussian, high-dimensional data and considered sample size determination to keep misclassification rates below prespecified thresholds. When k = 2, the distance-based classifier is simplified as follows: One classifies the individual into π1 if (1.2) holds and into π2 otherwise. Here, −tr(S1n1)/(2n1) + tr(S2n2)/(2n2) is a bias-correction term. Aoshima and Yata (2014) showed that the classifier holds a consistency property in which misclassification rates go to zero as p → ∞ even when (A-i) is not met. In that sense, the classifier is quite robust and applicable to actual high-dimensional data. On the other hand, Aoshima and Yata (2011a) considered substituting {tr(Sini)/p}Ip for Sini in order to use a geometric representation of HDLSS data from each πi and gave a two-class quadratic classifier, called the geometric classifier, as follows: One classifies the individual into π1 if (1.3) holds and into π2 otherwise. Here, −p/n1 + p/n2 is a bias-correction term. Aoshima and Yata (2014, 2015b) showed that the classifier holds the consistency property even when μ1 = μ2. Recently, Aoshima and Yata (2015b) provided a general theory of quadratic classifiers for high-dimensional data in non-sparse settings.

In this article, we extend the geometric classifier given by (1.3) to multiclass classification with k (≥ 2) classes. In Section 2, we show the consistency property and the asymptotic normality of the geometric classifier for multiclass high-dimensional data. In Section 3, we discuss sample size determination so that the geometric classifier can ensure that its misclassification rates are less than prespecified thresholds. We give a two-stage procedure to estimate the sample sizes required in such a geometric classifier and propose a misclassification rate–adjusted classifier (MRAC) based on the geometric classifier. In Section 4, we evaluate the performance of the MRAC numerically as well. Finally, in Section 5, we demonstrate the MRAC in actual data analyses by using a microarray data set.

2. ASYMPTOTIC PROPERTIES OF THE GEOMETRIC CLASSIFIER

Let Wi(x0|ni) = p||x0 − x̄ini||2/tr(Sini) + p log {tr(Sini)/p} − p/ni (2.1) for i = 1,…, k. We consider the geometric classifier when k (≥ 2) as follows: One classifies the individual into πi if i = max {argminj=1,…, kWj(x0|nj)}. (2.2)

When argminj=1,…, kWj(x0|nj) = {i1,…, il} with integers l ∈ [2, k] and i1 < … <il, we have max {argminj=1,…, kWj(x0|nj)} = il. Note that the difference, W1(x0|n1) − W2(x0|n2), is equivalent to (1.3).
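The rule can be coded directly. The sketch below assumes that Wi(x0|ni) takes the scaled-distance, log-trace, and bias-correction form displayed in (2.1); the function names and the 0-based class indexing are illustrative, and tr(Sini) is computed without forming the p × p matrix Sini.

    import numpy as np

    def geometric_scores(x0, class_samples):
        """W_i(x0 | n_i) for each class, following the form in (2.1):
        a squared distance scaled by p / tr(S_i), a log-trace term,
        and the bias correction -p / n_i."""
        scores = []
        for X in class_samples:                    # X is an (n_i, p) array
            n_i, p = X.shape
            xbar = X.mean(axis=0)
            tr_S = np.sum((X - xbar) ** 2) / (n_i - 1)   # tr(S_i) without forming S_i
            W = (p / tr_S) * np.sum((x0 - xbar) ** 2) + p * np.log(tr_S / p) - p / n_i
            scores.append(W)
        return np.array(scores)

    def geometric_classify(x0, class_samples):
        """Rule (2.2): minimize W_i; on ties, take the largest index (0-based here)."""
        W = geometric_scores(x0, class_samples)
        minimizers = np.flatnonzero(np.isclose(W, W.min()))
        return int(minimizers.max())

The tie-breaking line mirrors the convention max {argminj Wj(x0|nj)} noted above.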

2.1. Consistency Property

Let Δij(1) = ||μi − μj||2 and Δij(2) = tr(Σi) − tr(Σj) + tr(Σj) log {tr(Σj)/tr(Σi)} for all i ≠ j. Note that Δij(2) ≥ 0 (i ≠ j) with equality if and only if tr(Σi) = tr(Σj). Let Δij = p{Δij(1) + Δij(2)}/tr(Σj) for all i ≠ j. We assume the following conditions as p → ∞ either when ni is fixed or ni → ∞ for i = 1,…, k:

(A-iii) and for all i ≠ j;

(A-iv) for all i ≠ j.

We denote the error rate of misclassifying an individual from πi (into another class) by e(i). Then, we have the following result.

Theorem 2.1

Under (A-i), (A-iii), and (A-iv), it holds that e(i) → 0 as p → ∞ for i = 1,…, k.

Remark 2.1

When k = 2, Aoshima and Yata (2014) gave partial results of Theorem 2.1 under different conditions.

Remark 2.2

If as p → ∞ for all i ≠ j, (A-iii) and (A-iv) naturally hold. Then, one can claim Theorem 2.1 even when ni is fixed for i = 1,…, k.

2.2. Asymptotic Normality

Let for all i ≠ j. Note that Wi(x0|ni) − Wj(x0|nj) is equivalent to with Σi = Sini and Σj = Sjnj for all i ≠ j. We have that when x0 ∈ πi for all i ≠ j. Under (A-i), it holds that when x0 ∈ πi for all i ≠ j. Let for all i ≠ j. We assume the following extra conditions as p → ∞ and ni → ∞, i = 1,…, k:

(A-v) and for all i ≠ j.

Note that under (A-ii) it holds for all i ≠ j, so that tr(Σi)/tr(Σj) → 1 as p → ∞ for all i ≠ j under (A-ii) and (A-v). Then, we have the following results.

Theorem 2.2

Assume that Δij(1)/tr(Σj) → 0 as p → ∞ for all i ≠ j. Under (A-i), (A-ii) and (A-v), it holds that as p → ∞ and ni → ∞, i = 1,…, k where “ ⇒ ” denotes the convergence in distribution and Yij denotes a random variable distributed as the standard normal distribution.

Remark 2.3

When k = 2, Aoshima and Yata (2011a) gave the asymptotic normality under stronger conditions.

Corollary 2.1

Assume that Δij(1)/tr(Σj) → 0 as p → ∞ for all i ≠ j. Under (A-i), (A-ii), and (A-v), the classification rule by (2.2) has that as p → ∞ and ni → ∞, i = 1,…, k where Φ(·) denotes the cumulative distribution function of the standard normal distribution.

Remark 2.4

When k = 2, the above result is given as

3. SAMPLE SIZE DETERMINATION TO CONTROL MISCLASSIFICATION RATES

Let Δij* = {tr(Σj)/p}Δij = Δij(1) + Δij(2) for all i ≠ j. Let Δi* = min j(≠i)=1,…, kmin {Δij*, Δji*} for i = 1,…, k. We are interested in determining the sample size for (2.2) to ensure the requirement that e(i) ≤ αi for i = 1,…, k, where αi ∈ (0, 1/2) and Δi*L (> 0), i = 1,…, k, are prespecified constants. We assume Δi* ≥ Δi*L, i = 1,…, k.

3.1. Sample Size Determination

Let zα be the upper α point of the standard normal distribution. We consider nis satisfying (3.1) for all i ≠ j, where Δ(ij) = p max {Δi*L, Δj*L}/max {tr(Σi), tr(Σj)} (i ≠ j). Note that Δ(ij) = Δ(ji) and Δ(ij) ≤ min {Δij, Δji} for all i ≠ j. Under (3.1), we have that so that from Theorem 2.2 it follows that for i = 1,…, k under (3.1) and the assumptions of Theorem 2.2. First, we consider the case when lim inf |tr(Σi)/tr(Σj) − 1| > 0 as p → ∞ for i ≠ j. In that case, it holds . Under (A-i) and (A-ii), from Theorem 2.1 we have that even if nis are fixed for i ≠ j. Next, we consider the case when tr(Σ1) = … = tr(Σk). Let for i = 1,…, k. From the fact that (i ≠ j), it holds that for i ≠ j

Let us write σ(i) = max j(≠i)=1,…, kσj and α(i) = min j(≠i)=1,…, kαj for i = 1,…, k. From the above arguments, we can find ni, i = 1,…, k, to satisfy (3.1) by (3.2)

Note that ni → ∞, i = 1,…, k, as p → ∞. For example, when k = 2, tr(Σ1) = tr(Σ2), and Δ1*L = Δ2*L, the smallest integers (n1, n2) satisfying (3.2) enjoy the following optimality:

According to (3.2), we take samples from each πi and calculate Wi(x0|ni), i = 1,…, k, in (2.1). We consider the following classification procedure based on the misclassification rate–adjusted classifier by Aoshima and Yata (2014); a sketch of the step structure is given after Step 4:

Misclassification rate–adjusted classifier (MRAC)

Step 1: Set i = 0.

Step 2: Put i = i + 1. If i = k, go to Step 4; otherwise go to Step 3.

Step 3: If it holds that for all j = i + 1,…, k, go to Step 4; otherwise go to Step 2.

Step 4: Classify x0 into πi.
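The loop structure of Steps 1–4 is sketched below. The inequality checked in Step 3 is abstracted into a caller-supplied predicate, so the function is a skeleton of the procedure rather than a complete implementation; the name passes_step3 is illustrative.

    def mrac_classify(k, passes_step3):
        """Step structure of the MRAC (Steps 1-4).

        k            : number of classes.
        passes_step3 : callable (i, j) -> bool standing in for the adjusted
                       comparison between pi_i and pi_j checked in Step 3.
        Returns the (1-based) index i of the class into which x0 is classified.
        """
        i = 0                                                        # Step 1
        while True:
            i += 1                                                   # Step 2
            if i == k:
                return i                                             # Step 4
            if all(passes_step3(i, j) for j in range(i + 1, k + 1)): # Step 3
                return i                                             # Step 4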

We have the following result.

Theorem 3.1

Under (A-i) to (A-iii), for the MRAC with (3.2), it holds that, as p → ∞, e(i) ≤ αi + o(1) for i = 1,…, k. (3.3)

3.2. Designing a Lower Bound, Δi*L

First, we consider a lower bound of Δij(1). Let . By using the two-sample test by Aoshima and Yata (2015a) under certain regularity conditions, it holds that as p → ∞ and ni → ∞, i = 1,…, k where Yij denotes a random variable distributed as the standard normal distribution and having Winis defined by (9) in Yata and Aoshima (2013). Here, Wini is an unbiased estimator of and as p → ∞ and ni → ∞ under (A-i). See Aoshima and Yata (2014) for the details. It follows that for given α′ ∈ (0, 1/2). Thus, one may design a lower bound of Δij(1) by (3.4) for sufficiently small α′. Next, we consider a lower bound of Δij(2). For i ≠ j it holds that with equality if and only if tr(Σi) = tr(Σj). We note that as p → ∞ and ni → ∞, i = 1,…, k under (A-i). Thus, one may design a lower bound of Δij(2) by for i ≠ j. Let Δij*L = Δij(1)L + Δij(2)L for all i ≠ j. Note that Δij*L = Δji*L for i ≠ j. Finally, we choose a lower bound, Δi*L, by Δi*L = min j(≠i)=1,…, kΔij*L for sufficiently small α′.

3.3. Two-Stage Procedure

In order to estimate Cis in (3.2), we proceed with the following two steps:

1.

Choose mi (≥ 4) satisfying (3.5) for i = 1,…, k. Note that (3.5) holds when mi/Ci ∈ (0, 1) as p → ∞. Take pilot samples, xij, j = 1,…, mi, of size mi from each πi. Then, calculate Wimi for each πi according to (9) in Yata and Aoshima (2013). Let and for i = 1,…, k. Define the total sample size for each πi by (3.6) where ⌈ x ⌉ denotes the smallest integer ≥ x.

2.

For each i, if Ni = mi, do not take any additional samples from πi, and otherwise (that is, if Ni > mi) take additional samples, xij, j = mi + 1,…, Ni, of size Ni − mi from πi. By combining the initial samples and the additional samples, calculate x̄iNi and SiNi, i = 1,…, k. Then, follow the MRAC by using Wi(x0|Ni) and tr(SiNi) instead of Wi(x0|ni) and tr(Sini), as sketched below.
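A sketch of the two-stage determination of the Nis follows. It assumes that (3.6) reduces to Ni = max{mi, ⌈Ĉi⌉}, where Ĉi is an estimate of Ci in (3.2) computed from the pilot quantities (e.g., Wimi and tr(Simi)); since the exact expression is not reproduced here, the estimate is left as a caller-supplied function, and both names are illustrative.

    import math

    def two_stage_sizes(pilot_samples, estimate_C):
        """Two-stage determination of the total sample sizes N_i.

        pilot_samples : list of (m_i, p) arrays, the pilot samples from each pi_i.
        estimate_C    : callable (pilot_samples, i) -> estimate of C_i in (3.2),
                        computed from the pilot quantities; it stands in for the
                        expression in (3.6).
        Returns the list of N_i; if N_i = m_i, no additional samples are taken,
        and otherwise N_i - m_i additional samples are drawn from pi_i.
        """
        sizes = []
        for i, X in enumerate(pilot_samples):
            m_i = X.shape[0]
            N_i = max(m_i, math.ceil(estimate_C(pilot_samples, i)))
            sizes.append(N_i)
        return sizes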

Theorem 3.2

Under (A-i) to (A-iii), (3.3) holds for the MRAC with (3.5) and (3.6).

Remark 3.1

When k = 2, Aoshima and Yata (2011a) gave a two-stage classification rule based on the geometric classifier. See Theorem 4.3 in Aoshima and Yata (2011a) for the details. We emphasize that the MRAC can claim (3.3) for k ≥ 2 even under milder conditions than the original one by Aoshima and Yata (2011a).

Remark 3.2

Under (A-i), (A-ii), and (3.5), it holds that Ni/Ci = 1 + oP(1) as p → ∞, which falls in the HDLSS situation, in the sense that Ni/p = oP(1), under the condition that Ci/p → 0 as p → ∞.

Remark 3.3

Even when mi/Ci > 1 for some i, the assertion in Theorem 3.2 still holds. However, it may cause oversampling in the sense that Ni/Ci > 1 w.p.1.

4. SIMULATION

In order to examine the performance of the MRAC with (3.5) and (3.6), we used computer simulations. First, we considered two classes having Gaussian distributions. Independent pseudorandom observations were generated from πi: Np(μi, Σi), i = 1, 2. We considered Σ1 = B{(−1)|i−j|0.3|i−j|1/3}B and Σ2 = c{(−1)|i−j|0.4|i−j|1/3}, where B = diag[{0.5 + 1/(p + 1)}1/2,…, {0.5 + p/(p + 1)}1/2]. Note that tr(Σ1) = p and tr(Σ2) = cp. We set μ1 = (1,…, 1, 0,…, 0)T whose first 30 elements are 1 and μ2 = (0,…, 0)T, so that Δ12(1) = ||μ1 − μ2||2 = 30. We prespecified Δ1*L = Δ2*L = Δ12(1) = 30. We set (α1, α2) = (0.05, 0.15) and mi = ⌈ 0.5 × (Ci − 1) ⌉ + 1, i = 1, 2, where Ci is defined by (3.2). We considered four cases: (a) p = 500 when c = 1, (b) p = 1,000 when c = 1, (c) p = 500 when c = 1.2, and (d) p = 1,000 when c = 1.2. By averaging the outcomes from 2,000 (= R, say) replications, the findings are summarized in Table 1. Under a fixed scenario, suppose that the rth replication ends with Ni = nir (i = 1, 2) observations for r = 1,…, R. Let and . At the end of the rth replication, we checked whether the classifier does (or does not) classify x0 from πi correctly and defined Pir = 0 (or 1) accordingly for each i. We calculated R−1∑r=1RPir for each i as an estimate of e(i). Their estimated standard errors were given by for each i, where . As observed in Table 1, the two-class MRAC with (3.5) and (3.6) gave adequate performance in all the cases when those standard errors were taken into account. Especially, when tr(Σ1) ≠ tr(Σ2) as in (c) and (d), the MRAC gave good performance because Δi* > Δi*L, i = 1, 2.
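The covariance and mean construction of this two-class setting can be coded as below, reading the (s, t) entry of the bracketed matrices as (−1)|s−t|0.3|s−t|1/3 (and 0.4 in place of 0.3 for Σ2); the identities tr(Σ1) = p and tr(Σ2) = cp provide a numerical check on this reading, and the function name is illustrative.

    import numpy as np

    def two_class_setting(p, c):
        """Covariance matrices and mean vectors of the two-class Gaussian setting."""
        idx = np.arange(1, p + 1)
        d = np.abs(idx[:, None] - idx[None, :])           # |s - t| for entry (s, t)
        B = np.diag(np.sqrt(0.5 + idx / (p + 1)))
        Sigma1 = B @ (((-1.0) ** d) * 0.3 ** (d ** (1 / 3))) @ B
        Sigma2 = c * (((-1.0) ** d) * 0.4 ** (d ** (1 / 3)))
        mu1 = np.zeros(p)
        mu1[:30] = 1.0                                    # first 30 elements equal to 1
        mu2 = np.zeros(p)
        return Sigma1, Sigma2, mu1, mu2

    # Observations from pi_i : N_p(mu_i, Sigma_i) can then be drawn, for example, with
    # np.random.default_rng(0).multivariate_normal(mu1, Sigma1, size=n1).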

Table 1. Accuracy of the two-class MRAC with (3.5) and (3.6)

Next, we considered three classes having non-Gaussian distributions generated by yijl = (8/10)1/2wijl, where the wijl, j = 1,…, p (l = 1, 2,…), are independently distributed as the t-distribution with 10 degrees of freedom for each πi (i = 1, 2, 3). Note that E(yijl) = 0, Var(yijl) = 1, and yijl, j = 1,…, p (i = 1, 2, 3; l = 1, 2,…), are independent. Let xil = Γiyil + μi, where yil = (yi1l,…, yipl)T and Γi is a p × p matrix such that ΓiΓiT = Σi. Then, the distribution of xil satisfies (A-i) for each πi. We considered Σ1 = B{(−1)|i−j|0.3|i−j|1/3}B, Σ2 = B{(−1)|i−j|0.4|i−j|1/3}B, and Σ3 = 1.2{(−1)|i−j|0.4|i−j|1/3}. We set μ1 = (1,…, 1, 0,…, 0)T whose first 40 elements are 1, μ2 = (0,…, 0, 1,…, 1, 0,…, 0)T whose 21st to 60th elements are 1, and μ3 = (0,…, 0)T. Then, we had Δi* ≥ 40 for i = 1, 2, 3. We prespecified Δi*L = 40, i = 1, 2, 3. We set mi = ⌈ 0.5 × (Ci − 1) ⌉ + 1 for each πi. We considered four cases: (a) p = 500 when (α1, α2, α3) = (0.1, 0.1, 0.1), (b) p = 1,000 when (α1, α2, α3) = (0.1, 0.1, 0.1), (c) p = 500 when (α1, α2, α3) = (0.05, 0.1, 0.15), and (d) p = 1,000 when (α1, α2, α3) = (0.05, 0.1, 0.15). By averaging the outcomes from 2,000 (= R) replications, the findings are summarized in Table 2. Throughout, the three-class MRAC with (3.5) and (3.6) gave adequate performance in all cases when those standard errors were taken into account.
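The coordinate generation for this three-class setting is sketched below; the scaling (8/10)1/2 makes the variance of a t10 variate equal to one, and the particular square-root choice for Γi mentioned in the trailing comment is ours.

    import numpy as np

    def generate_y(n, p, rng):
        """Coordinates y_ijl = (8/10)^{1/2} w_ijl with w_ijl ~ t_10, so that
        E(y_ijl) = 0 and Var(y_ijl) = (8/10) * (10/8) = 1."""
        return np.sqrt(8 / 10) * rng.standard_t(df=10, size=(n, p))

    # An observation x_il is then formed as Gamma_i @ y_il + mu_i for some Gamma_i
    # with Gamma_i Gamma_i^T = Sigma_i; a symmetric square root of Sigma_i is one
    # such choice.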

Table 2. Accuracy of the three-class MRAC with (3.5) and (3.6)

5. EXAMPLE

We analyzed gene expression data given by Armstrong et al. (2002) in which the data set consists of 12,582 (= p) genes. We had three classes of leukemia subtypes; that is, π1: acute lymphoblastic leukemia (24 samples), π2: mixed-lineage leukemia (20 samples), and π3: acute myeloid leukemia (28 samples). We used the MRAC and compared the geometric classifier with (3.5) and (3.6) with the distance-based classifier by Aoshima and Yata (2014). The total sample size, Ni*, of the distance-based classifier is defined for each πi in terms of Δi(1) = min j(≠i)=1,…, kΔij(1), i = 1,…, k, and a lower bound Δi(1)L of Δi(1) such that Δi(1) ≥ Δi(1)L. Since Δi* ≥ Δi(1), the Ni*s are larger than the Nis in (3.6) w.p.1 when Δi*L > Δi(1)L.

We prespecified (α1, α2, α3) = (0.05, 0.15, 0.1), so that α(1) = 0.1, α(2) = 0.05, and α(3) = 0.05. We set m1 = m2 = m3 = 10. According to Section 3.2, by setting α′ = 0.05 and ni = mi (= 10), i = 1, 2, 3, we had Δ12*L = 6.11 × 109, Δ13*L = 2.45 × 1010, and Δ23*L = 8.09 × 109. Thus, we prespecified Δ1*L = min (Δ12*L, Δ13*L) = 6.11 × 109, Δ2*L = min (Δ12*L, Δ23*L) = 6.11 × 109, and Δ3*L = min (Δ13*L, Δ23*L) = 8.09 × 109. Also, we had Δ12(1)L = 5.96 × 109, Δ13(1)L = 2.37 × 1010, and Δ23(1)L = 7.81 × 109 according to (3.4). Thus, we prespecified Δ1(1)L = 5.96 × 109, Δ2(1)L = 5.96 × 109, and Δ3(1)L = 7.81 × 109.

By using pilot samples of size m1 = m2 = m3 = 10, we calculated W1m1 = 2.59 × 1019, W2m2 = 2.16 × 1019, and W3m3 = 2.51 × 1019. From (3.6), the total sample size for π1 was calculated as N1 = 19.

Similarly, we had N2 = 16 and N3 = 12. We considered constructing the geometric classifier, Wi(x0|Ni), i = 1, 2, 3, from (N1, N2, N3) = (19, 16, 12) samples and checking the accuracy of the MRAC by using the remaining (24 − N1, 20 − N2, 28 − N3) = (5, 4, 16) samples. We randomly split the data set from each πi into training sets of sizes (N1, N2, N3) = (19, 16, 12) and test sets of sizes (5, 4, 16). We constructed Wi(x0|Ni), i = 1, 2, 3, from the training sets and checked the accuracy of the MRAC by using the test sets. We repeated this procedure 100 times and obtained the averages of the misclassification rates for π1, π2, and π3. Also, for the distance-based classifier by Aoshima and Yata (2014), we calculated the total sample sizes as (N1*, N2*, N3*) = (20, 17, 12) and obtained the corresponding average misclassification rates. Similarly, for various settings of the αis, we investigated the performance of the geometric classifier and the distance-based classifier in the MRAC. Throughout, we used the same settings, m1 = m2 = m3 = 10 and (Δ1*L, Δ2*L, Δ3*L) = (6.11 × 109, 6.11 × 109, 8.09 × 109) or (Δ1(1)L, Δ2(1)L, Δ3(1)L) = (5.96 × 109, 5.96 × 109, 7.81 × 109). We summarized the results in Table 3. Both classifiers seem to give adequate performance in such an HDLSS situation. The geometric classifier would save more observations compared to the distance-based classifier, especially in small sample size settings. On the other hand, the distance-based classifier is very versatile and it holds (3.3) under milder conditions than the geometric classifier. See Sections 3 and 4 in Aoshima and Yata (2014) for details.
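The repeated random-split evaluation described above can be sketched as follows; the classifier is passed in as a function (for example, the MRAC built on the geometric classifier), the class labels are 0-based for convenience, and the function name is illustrative.

    import numpy as np

    def average_error_rates(class_data, train_sizes, classify, n_repeats=100, seed=0):
        """Average misclassification rates over repeated random training/test splits.

        class_data  : list of (n_total_i, p) arrays, all samples from each class.
        train_sizes : training-set sizes, e.g. (19, 16, 12).
        classify    : callable (x0, training_sets) -> predicted class index (0-based).
        """
        rng = np.random.default_rng(seed)
        k = len(class_data)
        errors = np.zeros(k)
        counts = np.zeros(k)
        for _ in range(n_repeats):
            train, test = [], []
            for X, N in zip(class_data, train_sizes):
                perm = rng.permutation(X.shape[0])
                train.append(X[perm[:N]])
                test.append(X[perm[N:]])
            for i, T in enumerate(test):
                for x0 in T:
                    errors[i] += classify(x0, train) != i
                    counts[i] += 1
        return errors / counts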

Table 3. Average misclassification rates of the MRAC by the geometric classifier with (3.5) and (3.6) and by the distance-based classifier by Aoshima and Yata (2014). We set m1 = m2 = m3 = 10 and (Δ1*L, Δ2*L, Δ3*L) = (6.11 × 109, 6.11 × 109, 8.09 × 109) or (Δ1(1)L, Δ2(1)L, Δ3(1)L) = (5.96 × 109, 5.96 × 109, 7.81 × 109). When αi ≤ 0.05 for at least two πis, the result was not available within the data sets

FUNDING

Research of the first author was partially supported by Grants-in-Aid for Scientific Research (B) and Challenging Exploratory Research, Japan Society for the Promotion of Science (JSPS), under Contract Numbers 22300094 and 26540010. Research of the second author was partially supported by Grant-in-Aid for Young Scientists (B), Japan Society for the Promotion of Science (JSPS), under Contract Number 26800078.

ACKNOWLEDGMENT

The authors thank the Editor-in-Chief, Professor Nitis Mukhopadhyay, for giving us the opportunity to contribute to Stein's (1945) 70-Year Celebration Issue.

Notes

Recommended by Nitis Mukhopadhyay

REFERENCES

  • Aoshima, M., and Yata, K., 2011a. Authors’ Response, Sequential Analysis 30, pp. 432–440.
  • Aoshima, M., and Yata, K., 2011b. Two-Stage Procedures for High-Dimensional Data (Editor's special invited paper), Sequential Analysis 30, pp. 356–399.
  • Aoshima, M., and Yata, K., 2014. A Distance-Based, Misclassification Rate Adjusted Classifier for Multiclass, High-Dimensional Data, Annals of the Institute of Statistical Mathematics 66, pp. 983–1010.
  • Aoshima, M., and Yata, K., 2015a. Asymptotic Normality for Inference on Multisample, High-Dimensional Mean Vectors under Mild Conditions, Methodology and Computing in Applied Probability 17, pp. 419–439.
  • Aoshima, M., and Yata, K., 2015b. High-Dimensional Quadratic Classifiers in Non-Sparse Settings, arXiv:1503.04549.
  • Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., Sallan, S. E., Lander, E. S., Golub, T. R., and Korsmeyer, S. J., 2002. MLL Translocations Specify a Distinct Gene Expression Profile That Distinguishes a Unique Leukemia, Nature Genetics 30, pp. 41–47.
  • Chan, Y.-B., and Hall, P., 2009. Scale Adjustments for Classifiers in High-Dimensional, Low Sample Size Settings, Biometrika 96, pp. 469–478.
  • Dudoit, S., Fridlyand, J., and Speed, T. P., 2002. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data, Journal of the American Statistical Association 97, pp. 77–87.
  • Stein, C., 1945. A Two-Sample Test for a Linear Hypothesis Whose Power Is Independent of the Variance, Annals of Mathematical Statistics 16, pp. 243–258.
  • Stein, C., 1949. Some Problems in Sequential Estimation (abstract), Econometrica 17, pp. 77–78.
  • Yata, K., and Aoshima, M., 2013. Correlation Tests for High-Dimensional Data Using Extended Cross-Data-Matrix Methodology, Journal of Multivariate Analysis 117, pp. 313–331.

Appendix

Proof of Theorem 2.1

Under (A-iv), it holds that and for all i, j. Note that for all i ≠ j, under (A-iv). Then, it holds that for all i ≠ j, under (A-iii) and (A-iv). Thus, by using Chebyshev's inequality, under (A-iii) and (A-iv) we obtain that when x0 ∈ πi for all i ≠ j. Under (A-i) and (A-iv) we have that and for all i ≠ j, so that tr(Sini) = tr(Σi) + oPij) and when x0 ∈ πi for all i ≠ j. Note that tr(Σi)/p ∈ (0, ∞) as p → ∞ for i = 1,…, k. Then, under (A-i), (A-iii), and (A-iv), we have that (A.1) when x0 ∈ πi for all i ≠ j. Hence, we conclude the results.

Proof of Theorem 2.2

We note that for all i ≠ j under (A-ii). Also, note that for all i ≠ j under (A-ii) and (A-v) since δji/(njδij) = o(1) for all i ≠ j under (A-ii). Let for i ≠ j. Then, similar to (A.1), under (A-i), (A-ii), (A-v), and Δij(1)/tr(Σj) = o(1) for all i ≠ j, we have that (A.2) when x0 ∈ πi for all i ≠ j since tr(Sini)/tr(Σi) − 1 = OPij/p) = oP(1). Here, we note that , under (A-ii) from the fact that under (A-ii). It holds that when x0 ∈ πi and , i = 1,…, k. Then, under (A-ii) and (A-v), we have that (A.3) when x0 ∈ πi for all i ≠ j. On the other hand, under (A-i) and (A-ii), it holds that (A.4) for all i ≠ j. Then, by combining (A.2) with (A.3) and (A.4), under the assumptions of Theorem 2.2 we have that when x0 ∈ πi. Note that for all i ≠ j under (A-ii). Then, in a way similar to the proof of Theorem 3 in Aoshima and Yata (2014), under (A-i) and (A-ii) we can claim that ω(x0|ni, nj)/δij ⇒ Yij for all i ≠ j. This concludes the result.

Proof of Corollary 2.1

By using Theorem 2.2 and Bonferroni's inequality, we have that when x0 ∈ πi. This concludes the proof.

Proof of Theorem 3.1

From (3.2), it holds that δij ≤ 2Δ(ij){1 + o(1)}/(zαi/(k − 1) + zαj/(k − 1)) when tr(Σi)/tr(Σj) = 1 + o(1) for all i ≠ j. We denote the error of misclassifying an individual from πi into πj by e(j|i) for i ≠ j. Then, under (3.2) and the assumptions of Theorem 2.2, we have that

when x0 ∈ πi for i ≠ j, where Yij denotes a random variable distributed as the standard normal distribution. We note that (A-v) holds under (A-iii) when for all i ≠ j. On the other hand, when δij/Δij = o(1) for i ≠ j, from Theorem 2.1 it holds that for x0 ∈ πi

under (A-i) to (A-iii) without (A-v). We note that δij/Δij = o(1) for i ≠ j under (A-ii) when it holds that or . Thus, one can claim e(j|i) ≤ αi/(k − 1) + o(1) for all i ≠ j under (3.2) and (A-i) to (A-iii). Then, from Bonferroni's inequality, we have that when x0 ∈ πi. This concludes the proof.

Proof of Theorem 3.2

Let CiL = ⌊Ci − (ωCi)1/2⌋, i = 1,…, k, where ω (> 0) is a variable such that ω → 0 as p → ∞. Then, from the proof of Theorem 5 in Aoshima and Yata (2014), it holds that max {mi, CiL} ≤ Ni < Ci + (ωCi)1/2 as p → ∞ w.p.1. Then, in a way similar to the proofs of Theorems 2.4 and 2.5 in Aoshima and Yata (2011a), under (A-i) to (A-iii) we have that for all i ≠ j where ω(x0|Ni, Nj) is given in the proof of Theorem 2.2. Similar to the proof of Theorem 2.2, under (A-i) to (A-iii) we have that when x0 ∈ πi for all i ≠ j. Then, in a way similar to the proof of Theorem 3.1, we can conclude the result.