References
- Ayres-de Campos, D., Bernardes, J., Garrido, A., Marques-de Sa, J., and Pereira-Leite, L. (2000), “SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms,” Journal of Maternal-Fetal Medicine, 9, 311–318.
- Blanco, J. L., and Rai, P. K. (2014), “nanoflann: a C++ header-only fork of FLANN, a library for nearest neighbor (NN) with kd-trees,” available at https://github.com/jlblancoc/nanoflann.
- Bowden, G. J., Maier, H. R., and Dandy, G. C. (2002), “Optimal Division of Data for Neural Network Models in Water Resources Applications,” Water Resources Research, 38, 2-1–2-11. DOI: https://doi.org/10.1029/2001WR000266.
- Breiman, L. (2001), “Random Forests,” Machine Learning, 45, 5–32. DOI: https://doi.org/10.1023/A:1010933404324.
- Brooks, T. F., Pope, D. S., and Marcolini, M. A. (1989), Airfoil Self-noise and Prediction, Washington, DC: NASA.
- Chen, W. Y., Mackey, L., Gorham, J., Briol, F.-X., and Oates, C. (2018), “Stein Points,” in Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research (Vol. 80), eds. J. Dy and A. Krause, Stockholm, Sweden: PMLR, pp. 844–853.
- Chen, Y., Welling, M., and Smola, A. (2010), “Super-samples From Kernel Herding,” Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pp. 109–116.
- Dua, D., and Graff, C. (2017), “UCI Machine Learning Repository,” available at http://archive.ics.uci.edu/ml.
- Elo, I., Rodriguez, G., and Lee, H. (2001), “Racial and Neighborhood Disparities in Birthweight in Philadelphia,” paper presented at the Annual Meeting of the Population Association of America, Washington, DC.
- Evett, I. W., and Spiehler, E. J. (1989), “Rule Induction in Forensic Science,” in Knowledge Based Systems, ed. P. H. Duffin, New York, NY: Halsted Press, pp. 152–160.
- Fang, K. T., and Wang, Y. (1994), Number-Theoretic Methods in Statistics, Boca Raton, FL: Chapman & Hall.
- Faraway, J. J. (2015), Linear Models with R (2nd ed.), Boca Raton, FL: CRC Press.
- Fisher, R. A. (1936), “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics, 7, 179–188. DOI: https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.
- Flury, B. (1990), “Principal Points,” Biometrika, 77, 33–41. DOI: https://doi.org/10.1093/biomet/77.1.33.
- Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization Paths for Generalized Linear Models Via Coordinate Descent,” Journal of Statistical Software, 33, 1–22. DOI: https://doi.org/10.18637/jss.v033.i01.
- Galvão, R. K. H., Araujo, M. C. U., José, G. E., Pontes, M. J. C., Silva, E. C., and Saldanha, T. C. B. (2005), “A Method for Calibration and Validation Subset Partitioning,” Talanta, 67, 736–740. DOI: https://doi.org/10.1016/j.talanta.2005.03.025.
- Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York: Springer.
- Hickernell, F. J. (1999), “Goodness-of-fit Statistics, Discrepancies and Robust Designs,” Statistics and Probability Letters, 44, 73–78. DOI: https://doi.org/10.1016/S0167-7152(98)00293-4.
- Joseph, V. R., Dasgupta, T., Tuo, R., and Wu, C. F. J. (2015), “Sequential Exploration of Complex Surfaces Using Minimum Energy Designs,” Technometrics, 57, 64–74. DOI: https://doi.org/10.1080/00401706.2014.881749.
- Kennard, R. W., and Stone, L. A. (1969), “Computer Aided Design of Experiments,” Technometrics, 11, 137–148. DOI: https://doi.org/10.1080/00401706.1969.10490666.
- Liaw, A., and Wiener, M. (2002), “Classification and Regression by randomForest,” R News, 2, 18–22.
- Mak, S. (2019), support: Support Points, R package version 0.1.4, available at https://cran.r-project.org/src/contrib/Archive/support.
- Mak, S., and Joseph, V. R. (2018a), “Projected Support Points: A New Method for High-dimensional Data Reduction,” arXiv preprint: 1708.06897.
- Mak, S., and Joseph, V. R. (2018b), “Support Points,” The Annals of Statistics, 46, 2562–2592.
- May, R., Maier, H., and Dandy, G. (2010), “Data Splitting for Artificial Neural Networks Using SOM-based Stratified Sampling,” Neural Networks, 23, 283–294. DOI: https://doi.org/10.1016/j.neunet.2009.11.009.
- Nash, W. J., Sellers, T. L., Talbot, S. R., Cawthorn, A. J., and Ford, W. B. (1994), The Population Biology of Abalone (Haliotis Species) in Tasmania. I. Blacklip Abalone (H. rubra) From the North Coast and Islands of Bass Strait, Technical Report, 48, Tasmania: Sea Fisheries Division, p. 411.
- Niederreiter, H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, Philadelphia, PA: SIAM.
- Owen, A. B. (2013), Monte Carlo Theory, Methods and Examples, available at https://statweb.stanford.edu/~owen/mc/.
- Reitermanová, Z. (2010), “Data Splitting,” WDS’10 Proceedings of Contributed Papers, Part I, pp. 31–36.
- Snee, R. D. (1977), “Validation of Regression Models: Methods and Examples,” Technometrics, 19, 415–428. DOI: https://doi.org/10.1080/00401706.1977.10489581.
- Stevens, A., and Ramirez-Lopez, L. (2020), An Introduction to the prospectr Package, R package version 0.2.1.
- Stone, M. (1974), “Cross-validatory Choice and Assessment of Statistical Predictions,” Journal of the Royal Statistical Society, Series B, 36, 111–146. DOI: https://doi.org/10.1111/j.2517-6161.1974.tb00994.x.
- Street, W. N., Wolberg, W. H., and Mangasarian, O. L. (1993), “Nuclear Feature Extraction for Breast Tumor Diagnosis,” in Biomedical Image Processing and Biomedical Visualization, Vol. 1905, eds. R. S. Acharya and D. B. Goldgof, pp. 861–870. San Jose, CA: International Society for Optics and Photonics.
- Székely, G. J., and Rizzo, M. L. (2013), “Energy Statistics: A Class of Statistics Based on Distances,” Journal of Statistical Planning and Inference, 143, 1249–1272. DOI: https://doi.org/10.1016/j.jspi.2013.03.018.
- Thodberg, H. H. (1993), “Ace of Bayes: Application of Neural Networks With Pruning,” Technical report, Roskilde, Denmark.
- Tibshirani, R. (1996), “Regression Shrinkage and Selection Via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.
- Wang, H., Yang, M., and Stufken, J. (2019), “Information-based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: https://doi.org/10.1080/01621459.2017.1408468.
- Wu, C. F. J., and Hamada, M. S. (2011), Experiments: Planning, Analysis, and Optimization, (2nd ed.) Hoboken, NJ: Wiley.
- Xu, Y., and Goodacre, R. (2018), “On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning,” Journal of Analysis and Testing, 2, 249–262. DOI: https://doi.org/10.1007/s41664-018-0068-2.
- Yeh, I.-C. (1998), “Modeling of Strength of High-performance Concrete Using Artificial Neural Networks,” Cement and Concrete Research, 28, 1797–1808. DOI: https://doi.org/10.1016/S0008-8846(98)00165-3.
- Zador, P. (1982), “Asymptotic Quantization Error of Continuous Signals and the Quantization Dimension,” IEEE Transactions on Information Theory, 28, 139–149. DOI: https://doi.org/10.1109/TIT.1982.1056490.