Theory and Methods

Feature Screening with Conditional Rank Utility for Big-Data Classification

Pages 1385-1395 | Received 20 Jul 2022, Accepted 15 Mar 2023, Published online: 18 Apr 2023
 

Abstract

Feature screening is a commonly used strategy to eliminate irrelevant features in high-dimensional classification. For big datasets with both high dimensionality and huge sample size, conventional screening methods become computationally costly or even infeasible. In this article, we introduce a novel screening utility, the Conditional Rank Utility (CRU), and propose a distributed feature screening procedure for big-data classification. The proposed CRU effectively quantifies the significance of a numerical feature for the categorical response. Because CRU is constructed from the ratio of the mean conditional rank to the mean unconditional rank of a feature, it is robust against model misspecification and the presence of outliers. Structurally, CRU can be expressed as a simple function of a few component parameters, each of which can be distributively estimated using a natural unbiased estimator from the data segments. Under mild conditions, we show that the distributed estimator of CRU is fully efficient in terms of the probability convergence bound and the mean squared error rate, and that the corresponding distributed screening procedure enjoys the sure screening and ranking properties. The promising performance of CRU-based screening is supported by extensive numerical examples. Supplementary materials for this article are available online.
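
To make the rank-ratio idea concrete, the following is a minimal Python sketch and not the article's estimator. The abstract does not give the exact definition of CRU, so the utility here is an assumption: for each class, it takes the ratio of the mean within-class rank to the mean overall rank of a feature, and it aggregates the absolute deviations of these ratios from 1, weighted by class proportions. The function names `rank_ratio_utility`, `distributed_scores`, and `screen` are hypothetical. The distributed step simply averages per-segment scores, whereas the article combines distributed estimates of component parameters.

```python
import numpy as np


def rank_ratio_utility(x, y):
    """Illustrative rank-ratio score for one feature (an assumed form, not CRU itself).

    For each class, compute (mean rank within the class) / (mean rank overall),
    then sum the class-proportion-weighted absolute deviations from 1.
    """
    # Ranks 1..n; ties are broken by position, which is fine for continuous data.
    ranks = np.argsort(np.argsort(x)) + 1
    mean_rank = ranks.mean()
    score = 0.0
    for c in np.unique(y):
        mask = y == c
        ratio = ranks[mask].mean() / mean_rank
        score += mask.mean() * abs(ratio - 1.0)
    return score


def distributed_scores(X, y, n_segments):
    """Score every feature on each data segment, then average across segments.

    Assumption: plain averaging of segment-level scores stands in for the
    article's component-wise distributed estimation.
    """
    parts = np.array_split(np.arange(X.shape[0]), n_segments)
    per_segment = [
        [rank_ratio_utility(X[idx, j], y[idx]) for j in range(X.shape[1])]
        for idx in parts
    ]
    return np.mean(per_segment, axis=0)


def screen(scores, k):
    """Return the indices of the k features with the largest scores."""
    return np.argsort(scores)[::-1][:k]


# Synthetic demonstration: only feature 0 depends on the class label.
rng = np.random.default_rng(0)
n, p = 2000, 20
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
X[:, 0] += 2.0 * y

scores = distributed_scores(X, y, n_segments=10)
selected = screen(scores, k=3)
print(selected)
```

Because the score depends on the data only through ranks, rescaling a feature or making a few observations more extreme without changing their order leaves it unchanged, which is the intuition behind the robustness claim in the abstract.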

Supplementary Materials

The supplementary materials contain proofs of all technical results in the article.

Disclosure Statement

The authors report there are no competing interests to declare.

Acknowledgments

The authors thank the editor, the associate editor, and two anonymous referees for their constructive feedback and suggestions.

Additional information

Funding

This work was supported in part by the National Key Research and Development Program of China under grant 2022YFA1003804, the Major Key Project of Peng Cheng Laboratory under grant PCL2023AS1-2, and the Natural Sciences and Engineering Research Council of Canada under grant RGPIN-2016-05024.
