Teacher's Corner

Text Analysis in R

