Abstract
We develop a method to assess the quality of peer-produced content in knowledge repositories using their development and coordination histories. We also develop a process to identify relevant features for quality assessment models and algorithms for processing datasets in large-scale knowledge repositories. Models using these features, applied to English-language Wikipedia articles, outperform existing methods for quality assessment. We achieve an overall accuracy of 81 percent, a 7 percent improvement over existing models. In addition, our features improve the precision and recall of each class by up to 9 percent and 17 percent, respectively. Finally, our models are robust under ten-fold cross-validation and across the classification techniques used. Overall, our research provides a comprehensive design science framework for identifying and efficiently extracting features related to development and coordination activities and for assessing quality using these features. We also detail a potential implementation of a quality assessment system for knowledge repositories.
Notes
5. Map-reduce is a computational framework that performs a large-scale processing job by splitting it into multiple smaller jobs, coordinating their execution, and combining their outputs; we use it to extract editor actions (details are in Appendix B).
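To illustrate the map-reduce pattern described above, the following is a minimal single-machine sketch, not the authors' actual pipeline (which is detailed in Appendix B). The revision records and the counting task are hypothetical stand-ins for the richer editor-action extraction performed on Wikipedia histories.

```python
from collections import defaultdict

# Hypothetical revision records: (article_id, editor) pairs standing in
# for the full revision history data processed in Appendix B.
revisions = [
    ("a1", "alice"), ("a1", "bob"), ("a2", "alice"),
    ("a2", "alice"), ("a1", "carol"), ("a2", "bob"),
]

def map_phase(records):
    # Map: emit one (key, 1) pair per revision, keyed by editor.
    for _, editor in records:
        yield (editor, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group intermediate pairs by key and
    # sum the counts for each editor.
    counts = defaultdict(int)
    for editor, n in pairs:
        counts[editor] += n
    return dict(counts)

edit_counts = reduce_phase(map_phase(revisions))
print(edit_counts)  # {'alice': 3, 'bob': 2, 'carol': 1}
```

In a real map-reduce deployment the map and reduce phases run in parallel across many machines, with the framework handling the grouping (shuffle) step between them; the logic per record is the same as above.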
7. We also tested the standard statistical measures of skewness and found that the results of our predictive modeling were not sensitive to the choice of measure. We therefore use the ratio of median to mean, since it is much simpler to interpret. The results of predictive modeling using the other two measures are shown in Tables 14 and 15 in Appendix C.
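As an illustration of the median-to-mean ratio as a simple skewness indicator, the following sketch uses hypothetical feature values (e.g., edits per editor), not data from the study:

```python
import statistics

# Hypothetical right-skewed feature values: a few very active
# editors pull the mean well above the median.
values = [1, 1, 2, 2, 3, 3, 4, 50]

ratio = statistics.median(values) / statistics.mean(values)
# A ratio well below 1 signals right skew; a ratio near 1
# signals a roughly symmetric distribution.
print(round(ratio, 3))
```

Here the median is 2.5 while the mean is 8.25, giving a ratio of about 0.303, which flags the distribution as strongly right-skewed.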
8. AUC values apply only to binary classification. All other columns evaluate multi-class classification, which is the primary focus of our evaluation. We use AUCs only to demonstrate the power of our features in differentiating the highest-quality articles (FA) from the rest.
Additional information
Notes on contributors
Srikar Velichety
Srikar Velichety ([email protected]; corresponding author) is an Assistant Professor of Business Information and Technology at the Fogelman College of Business and Economics at the University of Memphis. He received his Ph.D. in Management Information Systems from the Eller College of Management, University of Arizona. Dr. Velichety’s research interests lie in social media and social networks, user-generated content, and predictive analytics.
Sudha Ram
Sudha Ram ([email protected]) is the Anheuser-Busch Endowed Professor of MIS, Entrepreneurship and Innovation in the Eller College of Management, and director of INSITE: Center for Business Intelligence and Analytics at the University of Arizona. She received her Ph.D. from the University of Illinois at Urbana-Champaign. Dr. Ram’s research focuses on business intelligence, large scale networks, data mining, and Big Data analytics, using such methods as machine learning, statistical approaches, ontologies, and conceptual modeling. Dr. Ram has published more than 200 articles in such journals as Information Systems Research, Management Science, MIS Quarterly, Journal of Management Information Systems, IEEE Transactions on Knowledge and Data Engineering, and Communications of the ACM.
Jesse Bockstedt
Jesse Bockstedt ([email protected]) is an associate professor of Information Systems and Operations Management in the Goizueta Business School at Emory University. He received his Ph.D. from the Carlson School of Management at the University of Minnesota. He studies user behavior and economic issues in environments that rely on information technology. His research has appeared in a variety of journals, including Information Systems Research, MIS Quarterly, Journal of Management Information Systems, Journal of Operations Management, and Production and Operations Management.