Tri-training and MapReduce-based massive data learning

Pages 355-380 | Received 16 Feb 2009, Accepted 12 Apr 2009, Published online: 10 Mar 2011

