Original Articles

An Empirical Comparison of Four Text Mining Methods

Pages 1-10 | Received 16 Nov 2009, Accepted 01 Mar 2010, Published online: 11 Dec 2015

References

  • Ahrendt, P., Goutte, C., and Larsen, J., “Co-occurrence models in music genre classification”, IEEE International workshop on Machine Learning for Signal Processing, 2005, 247–252.
  • Aldous, D., “Exchangeability and related topics”, Ecole d'Ete de Probabilites de Saint-Flour XII, Springer Lecture Notes in Mathematics, 1117, 1985, 1–198.
  • Androutsopoulos, I., Koutsias, J., Chandrinos, K., and Spyropoulos, C., “An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages”, ACM New York, NY, USA, 2000, 160–167.
  • Bao, S., Xu, S., Zhang, L., Yan, R., Su, Z., Han, D., and Yu, Y., “Joint Emotion-Topic Modeling for Social Affective Text Mining”, Data Mining, 2009. ICDM '09. Ninth IEEE International Conference, 2009, 699–704.
  • Bellegarda, J., Naik, D., and Silverman, K., “Automatic junk e-mail filtering based on latent content”, 2003, 465–470.
  • Bergholz, A., Chang, J., Paaß, G., Reichartz, F., and Strobel, S., “Improved phishing detection using model-based features”, 2008.
  • Bíró, I., Szabó, J., and Benczúr, A., “Latent dirichlet allocation in web spam filtering”, ACM New York, NY, USA, 2008, 29–32.
  • Blei, D.M., and Lafferty, J.D., “A Correlated Topic Model of Science”, The Annals of Applied Statistics, 1 (1), 2007, 17–35.
  • Blei, D.M., and Lafferty, J.D., “Topic Models”, Ashok N. Srivastava and Mehran Sahami (eds.), CRC Press, 2009.
  • Blei, D.M., Ng, A.Y., and Jordan, M.I., “Latent Dirichlet Allocation”, Journal of Machine Learning Research, 3, 2003, 993–1022.
  • Bosch, A., Zisserman, A., and Munoz, X., “Scene classification via pLSA”, Lecture Notes in Computer Science, 3954, 2006, 517–530.
  • Boyd-Graber, J., Blei, D., and Zhu, X., “A topic model for word sense disambiguation”, 2007, 1024–1033.
  • Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., and Blei, D., “Reading Tea Leaves: How Humans Interpret Topic Models”, Neural Information Processing Systems, 2009, 1–9.
  • Chen, Q., Tai, X., Jiang, B., Li, G., and Zhao, J., “Medical Image Retrieval Based on Latent Semantic Indexing”, Proceedings of the 2008 International Conference on Computer Science and Software Engineering, IEEE Computer Society, 2008, 561–564.
  • Cheung, K., Kwok, J.T., Law, M.H., and Tsui, K., “Mining customer product ratings for personalized marketing”, Decision Support Systems, 35, 2003, 231–243.
  • Chou, T.-C., and Chen, M.C., “Using Incremental PLSI for Threshold-Resilient Online Event Analysis”, Knowledge and Data Engineering, IEEE Transactions on, 20 (3), 2008, 289–299.
  • Das, S.R., and Chen, M.Y., “Yahoo! for Amazon: Sentiment extraction from small talk on the web”, Management Science, 53 (9), 2007, 1375–1388.
  • Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R., “Indexing by Latent Semantic Analysis”, Journal of the American Society for Information Science, 41 (6), 1990, 391–407.
  • Ding, C.H.Q., “A probabilistic model for Latent Semantic Indexing: Research Articles”, Journal of the American Society for Information Science and Technology, 56 (6), 2005, 597–608.
  • Fuhr, N., “Probabilistic models in information retrieval”, The Computer Journal, 35 (3), 1992, 243–255.
  • Gansterer, W., Janecek, A., and Neumayer, R., “Spam filtering based on latent semantic indexing”, Survey of Text Mining II: Clustering, Classification, and Retrieval, 2008, 165–183.
  • Girolami, M., and Kaban, A., “On an equivalence between PLSI and LDA”, ACM New York, NY, USA, 2003, 433–434.
  • Greif, T., Horster, E., and Lienhart, R., “Correlated topic models for image retrieval”, Technical Report TR2008-09, University of Augsburg, 2008.
  • Herdiyeni, Y., Nurdiati, S., and Daud, I.A., “Image Semantic Extraction Using Latent Semantic Indexing on Image Retrieval Automatic-Annotation”, Proceedings of the 2009 International Conference of Soft Computing and Pattern Recognition, IEEE Computer Society, 2009, 283–288.
  • Hofmann, T., “Probabilistic latent semantic indexing”, SIGIR-99, ACM New York, NY, USA, 1999, 50–57.
  • Hofmann, T., “Unsupervised learning by probabilistic latent semantic analysis”, Machine Learning, 42 (1), 2001, 177–196.
  • Hofmann, T., Puzicha, J., and Jordan, M., “Unsupervised learning from dyadic data”, Advances in Neural Information Processing Systems, 11, 1999.
  • Ide, N., and Veronis, J., “Introduction to the special issue on word sense disambiguation: the state of the art”, Comput. Linguist., 24 (1), 1998, 2–40.
  • Kakkonen, T., Myller, N., and Sutinen, E., “Applying latent Dirichlet allocation to automatic essay grading”, Lecture Notes in Computer Science, 4139, 2006, 110–120.
  • Kakkonen, T., Myller, N., Sutinen, E., and Timonen, J., “Comparison of Dimension Reduction Methods for Automated Essay Grading”, Educational Technology & Society, 11 (3), 2008, 275–288.
  • Koller, D., and Friedman, N., Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
  • Kongthon, A., Haruechaiyasak, C., and Thaiprayoon, S., “Expert Identification for Multidisciplinary R&D Project Collaboration”, PICMET 2009 Proceedings, 2009.
  • Kontostathis, A., and Pottenger, W., “A framework for understanding Latent Semantic Indexing (LSI) performance”, Information Processing and Management, 42 (1), 2006, 56–73.
  • Landauer, T.K., Foltz, P.W., and Laham, D., “An introduction to latent semantic analysis”, Discourse processes, 25, 1998, 259–284.
  • Larsen, K.R., Monarchi, D.E., Hovorka, D.S., and Bailey, C.N., “Analyzing unstructured text data: using latent categorization to identify intellectual communities in information systems”, Decision Support Systems, 45, 2008, 884–896.
  • Magatti, D., Calegari, S., Ciucci, D., and Stella, F., “Automatic Labeling Of Topics”, Ninth International Conference on Intelligent Systems Design and Applications, 2009, 1227–1232.
  • McCallum, A., Wang, X., and Corrada-Emmanuel, A., “Topic and role discovery in social networks with experiments on enron and academic email”, Journal of Artificial Intelligence Research, 30 (1), 2007, 249–272.
  • Mølgaard, L., Larsen, J., and Goutte, C., “Temporal analysis of text data using latent variable models”, IEEE International Workshop on Machine Learning for Signal Processing, 2009.
  • Papadimitriou, C.H., Raghavan, P., Tamaki, H., and Vempala, S., “Latent semantic indexing: A probabilistic analysis”, Journal of Computer and System Sciences, 61, 2000, 217–235.
  • Pons-Porrata, A., Berlanga-Llavori, R., and Ruiz-Shulcloper, J., “Topic discovery based on text mining techniques”, Information Processing and Management, 43 (3), 2007, 752–768.
  • Rodriguez, M., Ali, S., and Kanade, T., “Tracking in Unstructured Crowded Scenes”, The 12th IEEE International Conference on Computer Vision, 2009.
  • Romberg, S., Horster, E., and Lienhart, R., “Multimodal pLSA on visual features and tags”, The Institute of Electrical and Electronics Engineers Inc., 2009, 414–417.
  • Rosenfeld, R., “Two decades of statistical language modeling: where do we go from here?”, Proceedings of the IEEE, 88 (8), 2000, 1270–1278.
  • Salton, G., Automatic text processing: the transformation, analysis, and retrieval of information by computer, Addison-Wesley Longman Publishing Co., Inc. Boston, MA, USA, 1989.
  • Santhiappan, S., Gopalan, V.P., and Valarmathi, B., “Topic models based personalized spam filter”, Proceedings of ISCF, 2006, 199–203.
  • Sanz, E.P., Hidalgo, J.M.G., and Perez, J.C.C., “Email Spam Filtering”, Advances in Computers, 74, 2008, 45–109.
  • Sidorova, A., Evangelopoulos, N., Valacich, J., and Ramakrishnan, T., “Uncovering the intellectual core of the information systems discipline”, MIS Quarterly, 32 (3), 2008, 467–482.
  • Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., and Freeman, W.T., “Discovering objects and their location in images”, International Conference on Computer Vision (ICCV 2005), 2005.
  • Strunk Jr, W., The elements of style, Filiquarian Publishing, LLC., 2007.
  • Sun, J., Zhang, Q., Yuan, Z., Huang, W., Yan, X., and Dong, J., “Research of Spam Filtering System Based on LSA and SHA”, Springer, 2008, 340.
  • Tetlock, P.C., Saar-Tsechansky, M., and Macskassy, S., “More than words: Quantifying language to measure firms' fundamentals”, Journal of Finance, 63 (3), 2008, 1437–1467.
  • Titov, I., and McDonald, R., “A joint model of text and aspect ratings for sentiment summarization”, Urbana, 51, 2008, 308–316.
  • Wu, H., Wang, Y., and Cheng, X., “Incremental probabilistic latent semantic analysis for automatic question recommendation”, ACM New York, NY, USA, 2008, 99–106.
  • Xu, W., Liu, D., Guo, J., Cai, Y., and Hu, R., “Supervised Dual-PLSA for Personalized SMS Filtering”, Springer, 2009, 254–264.
  • Yang, W., and Dia, J., “Discovering cohesive subgroups from social networks for targeted advertising”, Expert Systems with Applications, 34, 2008, 2029–2038.
  • Zhai, H., Guo, J., Wu, Q., Cheng, X., Sheng, H., and Zhang, J., “Query Classification Based on Regularized Correlated Topic Model”, 2009.
