Abstract
Opinion spammers exploit consumer trust by posting false or deceptive reviews that may have a negative impact on both consumers and businesses. These dishonest posts are difficult to detect because of complex interactions between several user characteristics, such as review velocity, volume, and variety. We propose a novel hierarchical supervised-learning approach to increase the likelihood of detecting anomalies by analyzing several user features and then characterizing their collective behavior in a unified manner. Specifically, we model user characteristics and the interactions among them as univariate and multivariate distributions. We then stack these distributions using several supervised-learning techniques, such as logistic regression, support vector machines, and k-nearest neighbors, yielding robust meta-classifiers. We perform a detailed evaluation of these methods and then develop empirical insights. This approach is of interest to online business platforms because it can help reduce false reviews and increase consumer confidence in the credibility of their online information. Our study contributes to the literature by incorporating distributional aspects of features in machine-learning techniques, which can improve the performance of fake reviewer detection on digital platforms.
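The stacking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distributions, features, or training procedure, so everything below is an assumption made for illustration. The sketch fits one class-conditional Gaussian per feature (univariate only; the multivariate interactions mentioned in the abstract are omitted), turns each feature into a log-likelihood ratio, and trains a small logistic-regression meta-classifier on those ratios. The feature values (velocity, volume, variety) are synthetic, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def fit_gaussian(values):
    """Fit a univariate Gaussian; returns (mean, std) with a floor on std."""
    return mean(values), max(pstdev(values), 1e-6)

def log_pdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def fit_base_models(X, y):
    """Per feature, fit one Gaussian for spammers (1) and one for genuine users (0)."""
    models = []
    for j in range(len(X[0])):
        spam = fit_gaussian([row[j] for row, label in zip(X, y) if label == 1])
        genuine = fit_gaussian([row[j] for row, label in zip(X, y) if label == 0])
        models.append((spam, genuine))
    return models

def meta_features(row, models):
    """Log-likelihood ratio of each feature: spammer vs. genuine distribution."""
    return [log_pdf(x, *spam) - log_pdf(x, *genuine)
            for x, (spam, genuine) in zip(row, models)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.1, epochs=50):
    """Train the meta-classifier with plain stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
            b -= lr * err
    return w, b

def predict(row, models, w, b):
    """Return 1 (spammer) or 0 (genuine) for one user's feature row."""
    z = meta_features(row, models)
    return 1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5 else 0

def sample_user(label, rng):
    """Synthetic (velocity, volume, variety) values; purely illustrative."""
    if label == 1:
        return [rng.gauss(5, 1), rng.gauss(20, 4), rng.gauss(2, 0.5)]
    return [rng.gauss(1, 0.5), rng.gauss(5, 2), rng.gauss(6, 1.5)]

rng = random.Random(0)
train = [(sample_user(label, rng), label) for label in [0, 1] * 100]
test = [(sample_user(label, rng), label) for label in [0, 1] * 50]
rng.shuffle(train)

X_train, y_train = [r for r, _ in train], [l for _, l in train]
models = fit_base_models(X_train, y_train)
w, b = fit_logistic([meta_features(r, models) for r in X_train], y_train)

accuracy = mean(1.0 if predict(r, models, w, b) == l else 0.0 for r, l in test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic classes are deliberately well separated, the accuracy printed here says nothing about performance on real review data. In practice a library meta-learner (for example a support vector machine or k-nearest neighbors, as named in the abstract) could replace the hand-written logistic regression.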
Notes
1. These online platforms include e-commerce websites such as Amazon; social media sites such as Facebook, Twitter, and Foursquare; and recommendation and review websites such as Yelp, TripAdvisor, and Expedia.
2. See http://usa.chinadaily.com.cn/life/2012-12/13/content_16013662.htm (accessed on August 9, 2017).
3. See http://www.technologyreview.com/view/426174/undercover-researchers-expose-chinese-internet-water-army/ (accessed on August 9, 2017).
4. See http://www.clearmymail.com/guides/viagra_spam_emails.aspx (accessed on August 10, 2017).
5. For readability, we use the term “spammer” to mean opinion spammer.
Additional information
Notes on contributors
Naveen Kumar
Naveen Kumar is an assistant professor in the Department of Business Information and Technology (formerly Management Information Systems) at the Fogelman College of Business and Economics, University of Memphis. He received his Ph.D. from the University of Washington. His research focuses on deep learning and analytics in social media, information systems, and health care. Before joining academia, he worked as a researcher in the high-tech industry, solving complex problems in information technologies, finance, and manufacturing using machine-learning techniques.
Deepak Venugopal
Deepak Venugopal is an assistant professor in the Department of Computer Science at the University of Memphis. He received his Ph.D. in computer science from the University of Texas at Dallas. His research interests focus on probabilistic models and statistical relational models. His work has been published in the proceedings of conferences, including those of the Association for the Advancement of Artificial Intelligence, Conference on Neural Information Processing, and others.
Liangfei Qiu
Liangfei Qiu (corresponding author) is an assistant professor in the Department of Information Systems and Operations Management at the Warrington College of Business, University of Florida. He received his Ph.D. in economics from the University of Texas at Austin. His research focuses on economics of information systems, prediction markets, social media, and telecommunications policy. His work has been published in Decision Support Systems, Information Systems Research, Journal of Management Information Systems, MIS Quarterly, and others.
Subodha Kumar
Subodha Kumar is the Laura Carnell Chair Professor and director of the Center for Data Analytics at the Fox School of Business, Temple University. He earned his Ph.D. from the University of Texas at Dallas. He has published numerous papers in various journals. He is the deputy editor and a department editor of Production and Operations Management and has served as a senior editor of Decision Sciences.