Abstract
Preserving the confidentiality of sensitive data while permitting knowledge discovery is an important goal in privacy-preserving data mining. This paper investigates the effectiveness of data shuffling for classification tree and regression analysis. We compare data shuffling with the tree-based data perturbation method, which was developed specifically for data mining. Results suggest that data shuffling provides higher levels of data security and preserves data mining knowledge more effectively than the tree-based data perturbation method.
Additional information
Notes on contributors
Han Li
Han Li is currently an assistant professor in the School of Business Administration at Minnesota State University Moorhead. She received her doctorate in Management Information Systems from Oklahoma State University. She has published in Decision Support Systems, Operations Research, European Journal of Information Systems, Journal of Computer Information Systems, Information Management & Computer Security, and Journal of Information Privacy and Security. Her current research interests include health IT, privacy and confidentiality, data and information security, and the adoption of information technology.
Krishnamurty Muralidhar
Krish Muralidhar is Gatton Research Professor at the School of Management, University of Kentucky. He received his PhD from Texas A&M University. His primary research interest is in data privacy and related areas. His research has appeared in journals such as ACM Transactions on Database Systems, Information Systems Research, Journal of Management, Management Science, and Operations Research.
Rathindra Sarathy
Rathindra Sarathy is Ardmore Chair and Professor of Information Systems in the Department of Management Science and Information Systems at Oklahoma State University. He received his PhD from Texas A&M University. His research interests include database confidentiality, distributed databases, and e-commerce. His work has appeared in journals such as ACM Transactions on Database Systems, Decision Sciences, Decision Support Systems, European Journal of Operational Research, Information Systems Research, Management Science, and Operations Research.