Abstract
This article addresses the problem of classification when some of the covariates may have missing parts. Both the training sample and the new unclassified observation are allowed to have missing parts in their covariates. In fact, it is shown in Remark 3.3 that, in classification, reconstructing or imputing the missing part of a new unclassified observation (the one to be classified) can be counter-productive in terms of error rates. Furthermore, unlike many results in the literature, where covariate fragments are usually assumed to be missing completely at random, we do not impose such an assumption here. Given the observed parts of the covariates, we construct a kernel-type classifier that is straightforward to implement. The proposed classifier is based on d-dimensional covariate vectors obtained from the original covariates by moving from their original space to another space, where d itself is a parameter that has to be estimated. To estimate the various parameters, we employ an easy-to-implement data-splitting approach.
Acknowledgments
This work was supported by the NSF under Grant DMS-1916161 of Majid Mojirsheibani.
Data availability statement
The Share Price Increase data set used in Section 4.2, and a description of it, is available at http://www.timeseriesclassification.com/dataset.php
Additionally, a copy of the R code used to carry out the analysis in Section 4.2 is posted in the GitHub repository at https://github.com/mynhinguyen/Statistical-classification-with-incomplete-covariates-via-filtering
Disclosure statement
No potential conflict of interest was reported by the author(s).