Original Articles

Experiences with big data: Accounts from a data scientist’s perspective

