Abstract
Investigating the differential effect of treatments in groups defined by patient characteristics is of paramount importance in personalized medicine research. In some studies, participants are first classified as having or not having the characteristic of interest by diagnostic tools, but such classifiers may not be perfectly accurate. The impact of diagnostic misclassification on statistical inference has recently been investigated in parametric model contexts and shown to introduce severe bias in estimating treatment effects and to give grossly inaccurate inferences. This article aims to address these problems in a fully nonparametric setting. Methods for consistently estimating and testing meaningful yet nonparametric treatment effects are developed. Along the way, we also construct estimators for misclassification error rates and investigate their asymptotic properties. The proposed methods are applicable to outcomes measured on ordinal, discrete, or continuous scales. They do not require assumptions such as the existence of moments. Simulation results show significant advantages of the proposed methods in bias reduction, coverage probability, and power. The applications of the proposed methods are illustrated with gene expression profiling of bronchial airway brushing in asthmatic and healthy control subjects. Supplementary materials for this article are available online.
Supplementary Materials
The supplementary materials consist of: (a) a document (Supplementary.pdf) containing four sections: Section A, technical lemmas (and their proofs) needed to prove the main results of the article; Section B, the proofs of Propositions 3.1, 3.2, 4.1, and 4.2 and Theorems 3.1 and 4.1 in Sections 3 and 4 of the article; Section C, additional simulation results to supplement the results reported in Section 5 of the article; and Section D, a second real data example from a sleep deprivation study; (b) R code for replicating the numerical results reported in Section 5 of the main article and Section C of the supplementary document; (c) anonymized and cleaned versions of the asthma and sleep deprivation datasets analyzed in Section 6 of the main article and Section D of the supplementary document, respectively; and (d) R code used to analyze the asthma and sleep deprivation datasets.
Acknowledgments
The authors are grateful to the editor, associate editor, and anonymous reviewers for their valuable comments, which substantially improved the manuscript. They also extend their gratitude to Tesfaye B. Mersha (Cincinnati Children’s Hospital Medical Center, University of Cincinnati College of Medicine) and Joseph Beyene (McMaster University) for the fruitful discussion about the transcriptomic data. Furthermore, the authors wish to express their gratitude to Drs. Hans P. A. Van Dongen and Brieann C. Satterfield of Washington State University for sharing the sleep deprivation data and granting permission for its use. The authors also thank Prof. Richard Kryscio of the University of Kentucky for his valuable insights during the revision of the manuscript.
Disclosure Statement
The authors report there are no competing interests to declare.