Abstract
In modern data science, the higher criticism (HC) method is effective for detecting rare and weak signals. Its computation, however, has long been an issue when the number of p-values combined (K) and/or the number of repeated HC tests (N) is large. Several computing methods have been developed, but all have significant shortcomings, especially when a stringent significance level is required. In this article, we propose an accurate and highly efficient computing strategy for four variations of HC. Specifically, we propose an unbiased cross-entropy-based importance sampling method to benchmark all existing computing methods, and we develop a modified SetTest method (MST) that resolves numerical issues in the existing SetTest approach. We further develop an ultra-fast approach (UFI) that combines pre-calculated statistical tables with cubic spline interpolation. Finally, guided by extensive simulations, we provide a computing strategy integrating MST, UFI, and other existing methods in the R package “HCp” for virtually any K and small p-values. The method is applied to a COVID-19 disease surveillance example for spatio-temporal outbreak detection from daily case counts over 804 days in 3342 counties in the United States. The results confirm the viability of the computing strategy for large-scale inference. Supplementary materials for this article are available online.
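For readers unfamiliar with the statistic being computed, the following is a minimal, hedged sketch of one standard HC variant (the Donoho–Jin form applied to sorted p-values); it illustrates the general idea only and is not necessarily the exact variant or implementation used in HCp.

```python
import numpy as np

def hc_stat(pvals):
    """Compute a standard higher criticism statistic from a vector of p-values.

    This follows the common Donoho-Jin form: standardize each sorted p-value
    against its expected rank under the null, then take the maximum deviation.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    K = p.size
    i = np.arange(1, K + 1)
    # Standardized deviation of the i-th order statistic from i/K
    z = np.sqrt(K) * (i / K - p) / np.sqrt(p * (1 - p))
    # Maximizing over the first half of the order statistics is one
    # common convention; other variants restrict the range differently.
    return z[: K // 2].max()
```

The computational burden the article addresses arises because the null distribution of such a maximum has no simple closed form, so tail p-values of the HC statistic itself must be obtained numerically.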
Supplementary Materials
Additional results and proofs: include the analytic derivations of formulas and extensive simulation results. (Appendix.pdf, PDF file)
Data and code: contain the R package (also available on GitHub), the R code and data (in the “Simulation” sub-folder) to reproduce the results presented in the article, and a README file (README.md, Markdown file) giving detailed instructions on how to install the package and descriptions of each function. (HCp.zip, zipped file)
Disclosure Statement
The authors report there are no competing interests to declare.