References
- Aggarwal, G., et al. 2005. Anonymizing tables. In Database Theory - ICDT 2005, ed. T. Eiter and L. Libkin, pp. 246–58.
- Aggarwal, C. C., and P. S. Yu. 2008. Privacy-preserving data mining: Models and algorithms. Berlin: Springer Publication. doi:https://doi.org/10.1007/978-0-387-70992-5.
- Al-Zobbi, M., S. Shahrestani, and C. Ruan. 2017. Improving MapReduce privacy by implementing multi-dimensional sensitivity-based anonymization. Journal of Big Data 4 (1):45. doi:https://doi.org/10.1186/s40537-017-0104-5.
- Amit, K., and S. Neeraj. 2016. Privacy preservation in big data using k-anonymity algorithm with privacy key. International Journal Of Computer Applications 153 (5):0975–8887.
- Bayardo, R., and R. Agrawal. 2005. Data privacy through optimal k-anonymization, in: Proceedings of 21st International Conference on Data Engineering (ICDE), Tokyo, 5-8 April 2005, pp. 217–28. https://doi.org/10.1109/ICDE.2005.42.
- Chamikara, M. A. P., P. Bertok, D. Liu, S. Camtepe, and I. Khalil. 2019. Efficient privacy preservation of big data for accurate data mining, Information Sciences. doi:https://doi.org/10.1016/j.ins.2019.05.053.
- Chamikara, M. A. P., P. Bertok, D. Liu, S. Camtepe, and I. Khalil. 2020. Efficient privacy preservation of big data for accurate data mining. Information Sciences 527:420–43. doi:https://doi.org/10.1016/j.ins.2019.05.053.
- Chen, K., G. Sun, and L. Liu. 2007. Towards attack-resilient geometric data perturbation, in: Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA, pp. 78–89. doi: https://doi.org/10.1137/1.9781611972771.8.
- Chen, K., and L. Liu. 2009. Privacy-preserving multiparty collaborative mining with geometric data perturbation. IEEE Transactions on Parallel and Distributed Systems 20 (12):1764–76. doi:https://doi.org/10.1109/TPDS.2009.26.
- Chen, K., and L. Liu. 2011. Geometric data perturbation for privacy preserving outsourced data mining, Springer-Knowl. Inf. Syst 29:657–95. doi:https://doi.org/10.1007/s10115-010-0362-4.
- Chi-Wing Wong, R., J. Li, A. Wai-Chee Fu, and K. Wang. (2006). (α, K)-Anonymity: An enhanced k-anonymity model for privacy preserving data publishing. ACM Digital Library, Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, USA, pp. 754–59.
- Whiteson, D. 2016. HEPMASS Data Set. https://archive.ics.uci.edu/ml/datasets/HEPMASS#
- David, R. M., J. Forne, and J. Domingo-Ferrer. 2010. From t-closeness-like privacy to post randomization via information theory. IEEE Transactions on Knowledge and Data Engineering 22 (11):1623–36. doi:https://doi.org/10.1109/TKDE.2009.190.
- Aha, D. W. 1988. Heart Disease Data Set. http://archive.cs.uci.edu/ml/datasets/Heart+Disease
- Dean, J., and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. OSDI.
- Dua, D. and Graff, C. 2019. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml
- Eyupoglu, C., M. A. Aydin, A. H. Zaim, and A. Sertbas. 2018. An efficient big data anonymization algorithm based on chaos and perturbation techniques. Entropy 20 (5):373. doi:https://doi.org/10.3390/e20050373. PMID: 33265463; PMCID: PMC7512893.
- Fahad, A., Z. Tari, A. Almalawi, A. Goscinski, I. Khalil, and A. Mahmood. 2014. PPFSCADA: Privacy preserving framework for SCADA data publishing. Future Generation Computer Systems 37:496–511. doi:https://doi.org/10.1016/j.future.2014.03.002.
- Fung, B. C. M., K. Wang, R. Chen, and P. S. Yu. 2010. Privacy preserving data publishing: A survey on recent developments. ACM Computing Surveys 42 (4):14:1–14:53. doi:https://doi.org/10.1145/1749603.1749605.
- Ghinita, G., P. Kalnis, and Y. Tao. 2011. Anonymous publication of sensitive transactional data. IEEE Transactions on Knowledge and Data Engineering 23 (2):161–74. doi:https://doi.org/10.1109/TKDE.2010.101.
- Girka, A., V. Terziyan, M. Gavriushenko, and A. Gontarenko. 2021. Anonymization as homeomorphic data space transformation for privacy-preserving deep learning. Procedia Computer Science 180:867–76. doi:https://doi.org/10.1016/j.procs.2021.01.337.
- Govinda, K., and E. Sathiyamoorthy. 2012. Identity anonymization and secure data storage using group signature in private cloud. Procedia Technology 4:495–99. doi:https://doi.org/10.1016/j.protcy.2012.05.079.
- Goyal, V., O. Pandey, A. Sahai, and B. Waters. 2006. Attribute-based encryption for fine-grained access control of encrypted data. Proceedings of the 13th ACM Conference on Computer and Communications Security - CCS ’06. Alexandria, VA, USA. doi:https://doi.org/10.1145/1180405.1180418
- Gu, R., X. Yang, J. Yan, Y. Sun, B. Wang, C. Yuan, and Y. Huang. 2014. SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. J. Parallel Distrib. Comput 74 (3):2166–79. doi:https://doi.org/10.1016/j.jpdc.2013.10.003.
- Hadoop, (2009). http://hadoop.apache.org
- Han, J., and M. Kamber. 2006. Data mining concepts and techniques. Morgan Kaufmann Publishers. Imprint of Elsevier. 225 Wyman Street, Waltham, MA 02451, USA.
- Hayward, R., and C. Chiang. 2015. Parallelizing fully homomorphic encryption for a cloud environment. Journal of Applied Research and Technology 13 (2):245–52. doi:https://doi.org/10.1016/j.jart.2015.06.004.
- HIPAA. (1999). Health insurance portability and accountability act of 1999. http://www.hhs.gov/ocr/privacy/hipaa/administrative/privacyrule (accessed 20.06.15).
- Hongwei, T., and Z. Weining. 2011. Extending l-diversity to generalize sensitive data. Elsevier Journal of Data and Knowledge Engineering 70 (1):101–26. doi:https://doi.org/10.1016/j.datak.2010.09.001.
- Islam, M. Z., and L. Brankovic. 2011. Privacy preserving data mining: A noise addition framework using a novel clustering technique. Knowl.-Based Syst 24 (8):1214–23. doi:https://doi.org/10.1016/j.knosys.2011.05.011.
- Jain, P., M. Gyanchandani, and N. Khare. 2016. Big data privacy: A technological perspective and review. J Big Data 3 (1):25. doi:https://doi.org/10.1186/s40537-016-0059-y.
- Jason Catlett. 1995. Statlog (Shuttle) Data Set. https://archive.ics.uci.edu/ml/datasets/Statlog+%28Shuttle%29
- Vanschoren, J., J. N. van Rijn, B. Bischl, and L. Torgo. 2013. OpenML: Networked science in machine learning. SIGKDD Explorations 15 (2):49–60.
- Khan, S., K. Iqbal, S. Faizullah, M. Fahad, J. Ali, and W. Ahmed. 2019. Clustering based privacy preserving of big data using fuzzification and anonymization operation. International Journal of Advanced Computer Science and Applications 10 (12). doi: https://doi.org/10.14569/IJACSA.2019.0101239.
- Lammel, R. 2008. Google’s MapReduce programming model-revisited. Sci Comput Progr 70 (1):1–30. doi:https://doi.org/10.1016/j.scico.2007.07.001.
- Lauter, K., M. Naehrig, and V. Vaikuntanathan. 2011. Can homomorphic encryption be practical?, 113–24. Chicago, IL: The 3rd ACM Workshop on Cloud Computing Security.
- LeFevre, K., D. J. DeWitt, and R. Ramakrishnan. 2005. Incognito: Efficient full domain k-anonymity. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA, 14–16 June 2005, pp. 49–60.
- LeFevre, K., D. J. DeWitt, and R. Ramakrishnan. 2006. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), Atlanta, GA, USA, IEEE.
- Li, N., T. Li, and S. Venkatasubramanian. 2007. t-Closeness: Privacy beyond k-anonymity and l-diversity. IEEE 23rd International Conference on Data Engineering (2007), Istanbul, Turkey, pp. 106–15. https://doi.org/10.1109/ICDE.2007.367856.
- Li, N., W. Qardaji, and D. Su (2012). On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, Seoul, Korea, 2–4; pp. 32–33.
- Lichman, M., (2013). UCI Machine Learning Repository, http://archive.ics.uci.edu/ml (accessed 20.06.15).
- Lindell, Y., and B. Pinkas. 2009. Secure multiparty computation for privacy-preserving data mining. J. Priv. Confidentiality 1:59–98.
- Liu, H., X. Huang, and J. K. Liu. 2015. Secure sharing of personal health records in cloud computing: Ciphertext-policy attribute-based signcryption. Future Gener. Comput. Syst 52:67–76. doi:https://doi.org/10.1016/j.future.2014.10.014.
- Machanavajjhala, A., D. Kifer, J. Gehrke, and M. Venkitasubramaniam. 2007. L-diversity. ACM Transactions on Knowledge Discovery from Data 1 (1):1. doi:https://doi.org/10.1145/1217299.1217302.
- Machanavajjhala, A., J. Gehrke, D. Kifer, and M. Venkitasubramaniam. 2006. L-diversity: Privacy beyond k-anonymity. 22nd International Conference on Data Engineering (ICDE’06), Atlanta, GA, USA. doi:https://doi.org/10.1109/icde.2006.1.
- Mehmood, A., I. Natgunanathan, Y. Xiang, G. Hua, and S. Guo. 2016. Protection of Big Data Privacy. IEEE Access 4. 1-1. doi:https://doi.org/10.1109/ACCESS.2016.2558446.
- Meyerson, A., and R. Williams. 2004. On the complexity of optimal K-Anonymity. In: Proc. of the ACM Symp. on Principles of Database Systems. Paris France. June 14 - 16, 2004.
- Nayahi, J. J. V., and V. Kavitha. 2015. An efficient clustering for anonymizing data and protecting sensitive labels. Int. J. Uncertain. Fuzziness Knowl.-Based Syst 23 (5):685–714. doi:https://doi.org/10.1142/S0218488515500300.
- Nayahi, J. J. V., and V. Kavitha. 2017. Privacy and utility preserving data clustering for data anonymization and distribution on Hadoop. Future Gener. Comput. Syst 74:393–408. doi:https://doi.org/10.1016/j.future.2016.10.022.
- Ogburn, M., C. Turner, and P. Dahal. 2013. Homomorphic encryption. Procedia Computer Science 20:502–09. doi:https://doi.org/10.1016/j.procs.2013.09.310.
- Orsini, M., M. Pacchioni, A. Malagoli, and G. Guaraldi. 2017. My smart age with HIV: An innovative mobile and IoMT framework for patient’s empowerment, in: Proc. IEEE International Forum on Research and Technologies for Society and Industry (RTSI), Modena, Italy, pp. 1–6.
- Moutafis, P., G. Mavrommatis, M. Vassilakopoulos, and S. Sioutas. 2019. Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework. Data & Knowledge Engineering 121:42–70. doi:https://doi.org/10.1016/j.datak.2019.04.003.
- Pinkas, B. 2002. Cryptographic techniques for privacy-preserving data mining. ACM SIGKDD Explor. Newsl 4 (2):12–19. doi:https://doi.org/10.1145/772862.772865.
- Potey, M. M., C. A. Dhote, and D. H. Sharma. 2016. Homomorphic encryption for security of cloud data. Procedia Computer Science 79:175–81. doi:https://doi.org/10.1016/j.procs.2016.03.023.
- Qian, J., M. Xia, and X. Yue. 2018. Parallel knowledge acquisition algorithms for big data using MapReduce. International Journal of Machine Learning and Cybernetics 9 (6):1007–21. doi:https://doi.org/10.1007/s13042-016-0624-x.
- Rahul, M., H. A. Alhumyani, and M. Muntjir. 2017. An improved homomorphic encryption for secure cloud data storage. International Journal of Advanced Computer Science and Application 8 (12):12. doi:https://doi.org/10.14569/IJACSA.2017.081258.
- Reddy, C. K., and C. C. Aggarwal. (Eds.). 2015. Healthcare data analytics (1st ed.), In Data mining and knowledge discovery series. Chapman & Hall/CRC. doi:https://doi.org/10.1201/b18588.
- Ronny, K. and Barry, B. 2019. Adult Data Set. http://archive.ics.uci.edu/ml/datasets/Adult
- Saadoon, M., S. H. A. Hamid, H. Sofian, H. Altarturi, N. Nasuha, Z. H. Azizul, A. A. Sani, and A. Asemi. 2021. Experimental analysis in Hadoop MapReduce: A closer look at fault detection and recovery techniques. Sensors 21 (11):3799. doi:https://doi.org/10.3390/s21113799.
- Samarati, P. 2001. Protecting respondents’ identities in microdata release. IEEE Transactions on Knowledge and Data Engineering 13 (6):1010–27. doi:https://doi.org/10.1109/69.971193.
- Samarati, P., and L. Sweeney. 1998. Protecting privacy when disclosing information: K-anonymity and its enforcement through generalization and suppression, in: Proceedings of IEEE Symposium on Research in Security and Privacy, Oakland, CA, USA. pp. 188–206.
- Scikit Learn Tutorial. 2006. https://www.tutorialspoint.com/scikit_learn/index.htm
- Sedayao, J., R. Bhardwaj, and N. Gorade. 2014. Making big data, privacy, and anonymization work together in the enterprise: Experiences and issues. 2014 IEEE International Congress on Big Data. doi:https://doi.org/10.1109/bigdata.congress.2014.92.
- Shafer, J., S. Rixner, and A. L. Cox. 2010. The hadoop distributed file system: Balancing portability and performance, in: IEEE International Symposium on Performance Analysis of Systems & Software, White Plains, NY, USA, pp. 122–33. https://doi.org/10.1109/ISPASS.2010.5452045.
- Shvachko, K., H. Kuang, S. Radia, and R. Chansler (2010). The hadoop distributed file system, in: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies, Incline Village, NV, USA, pp. 1–10. doi: https://doi.org/10.1109/MSST.2010.5496972.
- Soria-comas, J., J. Domingo-Ferrer, D. Sanchez, and S. Martinez. 2015. t-closeness through microaggregation: Strict privacy with enhanced utility preservation. IEEE Transactions on Knowledge and Data Engineering 27 (11):3098–110. doi:https://doi.org/10.1109/TKDE.2015.2435777.
- Soria-Comas, J., J. Domingo-Ferrer, D. Sánchez, and S. Martínez. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. VLDB J 23 (5):771–94. doi:https://doi.org/10.1007/s00778-014-0351-4.
- Sowmya, Y., and M. Nagaratna. 2016. Parallelizing K-Anonymity algorithm for privacy preserving knowledge discovery from big data. International Journal of Applied Engineering Research 11 (2):1314–21. http://www.ripublication.com.
- Sumit, S., C. Laclau, M. Amini, G. Vandelle, and Andre. 2017. KASANDR: A Large-Scale Dataset with Implicit Feedback for Recommendation. https://archive.ics.uci.edu/ml/datasets/KASANDR
- Sweeney, L. (1997). Guaranteeing anonymity when sharing medical data, the datafly system. Proceedings: a conference of the American Medical Informatics Association. AMIA Fall Symposium, Opryland Hotel, Nashville, TN, 51–55.
- Sweeney, L. 1998. Datafly: A system for providing anonymity in medical data. In Database security XI. IFIP advances in information and communication technology, ed. T. Y. Lin and S. Qian, pp. 356–81. Boston, MA: Springer. doi:https://doi.org/10.1007/978-0-387-35285-5_22.
- Sweeney, L. 2002a. Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowl.-Based Syst 10 (5):571–88. doi:https://doi.org/10.1142/S021848850200165X.
- Sweeney, L. 2002b. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst 10 (5):557–70. doi:https://doi.org/10.1142/S0218488502001648.
- Tankard, C. 2012. Big data security. Netw. Secur 2012:5–8.
- UT Dallas Data Security and Privacy Lab, UTD Anonymization Toolbox, 2010. http://cs.utdallas.edu/dspl/cgi-bin/toolbox/index.php (accessed 20.06.15).
- Venugopal, V., and S. Vigila. 2018. Implementing big data privacy with mapreduce for multidimensional sensitive data. International Journal of Applied Engineering Research 13 (15):11824–29.
- Wong, R., J. Li, A. Fu, and K. Wang. 2009. (α, k)-anonymous data publishing. Journal of Intelligent Information Systems 33 (2):209–34. doi:https://doi.org/10.1007/s10844-008-0075-2.
- Wong, R. C., J. Li, A. W. Fu, and K. Wang (2006). (α, k) Anonymity: An enhanced k-anonymity model for privacy-preserving data publishing, in: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia PA USA, pp. 733–44. doi: https://doi.org/10.1145/1150402.1150499
- Xiaoxun, S., L. Min, and W. Hua. 2011. A family of enhanced (L,α) diversity models for privacy preserving data publishing. Elsevier Journal of Future Generation Computer System 27 (3):348–56. doi:https://doi.org/10.1016/j.future.2010.07.007.
- Xu, J., W. Wang, J. Pei, X. Wang, B. Shi, and A. W.-C. Fu (2006b). Utility-based anonymization using local recoding. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’06. Philadelphia PA USA. 8 (2):21–30. doi:https://doi.org/10.1145/1150402.1150504.
- Xu, J., W. Wang, J. Pei, X. Wang, B. Shi, and Fu. (2006a). Utility-based anonymization for privacy preservation with less information loss, UBDM’06, Philadelphia, Pennsylvania, USA.
- Zhang, X., W. Dou, J. Pei, S. Nepal, C. Yang, C. Liu, and J. Chen. 2015. Proximity-aware local recoding anonymization with MapReduce for scalable big data privacy preservation in cloud. IEEE Transactions on Computers 64 (8):2293–307. doi:https://doi.org/10.1109/TC.2014.2360516.