Communication Methods and Measures Volume 11, 2017 - Issue 4

Teacher's Corner

Text Analysis in R

Kasper Welbers, Institute for Media Studies, University of Leuven, Leuven, Belgium (corresponding author)

Wouter Van Atteveldt, Department of Communication Science, VU University Amsterdam, Amsterdam, The Netherlands

Kenneth Benoit, Department of Methodology, London School of Economics and Political Science, London, UK

http://orcid.org/0000-0002-0797-564X

Pages 245-265 | Published online: 02 Nov 2017

https://doi.org/10.1080/19312458.2017.1387238

References

Arnold, T. (2017a). cleanNLP: A tidy data model for natural language processing [Computer software manual] (R package version 1.9.0). Retrieved from https://CRAN.R-project.org/package=cleanNLP
Arnold, T. (2017b). kerasR: R interface to the keras deep learning library [Computer software manual] (R package version 0.6.1). Retrieved from https://CRAN.R-project.org/package=kerasR
Arnold, T., & Tilton, L. (2016). coreNLP: Wrappers around Stanford CoreNLP tools [Computer software manual] (R package version 0.4-2). Retrieved from https://CRAN.R-project.org/package=coreNLP
Aue, A., & Gamon, M. (2005). Customizing sentiment classifiers to new domains: A case study. In Proceedings of Recent Advances in Natural Language Processing (RANLP). Retrieved from http://research.microsoft.com/pubs/65430/new_domain_sentiment.pdf
Bates, D., & Maechler, M. (2015). Matrix: Sparse and dense matrix classes and methods [Computer software manual] (R package version 1.2-3). Retrieved from https://CRAN.R-project.org/package=Matrix
Benoit, K., & Matsuo, A. (2017). spacyr: R wrapper to the spaCy NLP library [Computer software manual] (R package version 0.9.0). Retrieved from https://CRAN.R-project.org/package=spacyr
Benoit, K., Watanabe, K., Nulty, P., Obeng, A., Wang, H., Lauderdale, B., & Lowe, W. (2017). quanteda: Quantitative analysis of textual data [Computer software manual] (R package version 0.99). Retrieved from http://quanteda.io
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Bouchet-Valat, M. (2014). SnowballC: Snowball stemmers based on the C libstemmer UTF-8 library [Computer software manual] (R package version 0.5.1). Retrieved from https://CRAN.R-project.org/package=SnowballC
Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. doi:10.1080/21670811.2015.1096598
Crone, S. F., Lessmann, S., & Stahlbock, R. (2006). The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research, 173(3), 781–800. doi:10.1016/j.ejor.2005.07.023
De Smedt, T., & Daelemans, W. (2012). “Vreselijk mooi!” (terribly beautiful): A subjectivity lexicon for Dutch adjectives. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, May 2012, 3568–3572.
Feinerer, I., & Hornik, K. (2017). tm: Text mining package [Computer software manual] (R package version 0.7-1). Retrieved from https://CRAN.R-project.org/package=tm
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233. doi:10.1037/h0057532
Fox, J., & Leanage, A. (2016). R and the Journal of Statistical Software. Journal of Statistical Software, 73(2), 1–13.
Gagolewski, M. (2017). R package stringi: Character string processing facilities [Computer software manual]. Retrieved from http://www.gagolewski.com/software/stringi/
Gardner, M. J., Lutes, J., Lund, J., Hansen, J., Walker, D., Ringger, E., & Seppi, K. (2010). The topic browser: An interactive tool for browsing topic models. In NIPS Workshop on Challenges of Data Visualization. Retrieved from http://cseweb.ucsd.edu/~lvdmaaten/workshops/nips2010/papers/gardner.pdf
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. In Proceedings of the National Academy of Sciences, 5228–5235. doi:10.1073/pnas.0307752101
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. doi:10.1093/pan/mps028
Grün, B., & Hornik, K. (2011). topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), 1–30. doi:10.18637/jss.v040.i13
Günther, E., & Quandt, T. (2016). Word counts and topic models: Automated text analysis methods for digital journalism research. Digital Journalism, 4(1), 75–88. doi:10.1080/21670811.2015.1093270
Jurka, T. P., Collingwood, L., Boydstun, A. E., Grossman, E., & Van Atteveldt, W. (2014). RTextTools: Automatic text classification via supervised learning [Computer software manual] (R package version 1.4.2). Retrieved from https://CRAN.R-project.org/package=RTextTools
Lang, D. T., & the CRAN Team. (2017). XML: Tools for parsing and generating XML within R and S-plus [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=XML
Leopold, E., & Kindermann, J. (2002). Text categorization with support vector machines. How to represent texts in input space? Machine Learning, 46(1), 423–444. doi:10.1023/a:1012491419635
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge, UK: Cambridge University Press. doi:10.1017/cbo9780511809071
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60). doi:10.3115/v1/p14-5010
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. doi:10.3758/brm.42.2.381
McLuhan, M. (1964). Understanding media: The extensions of man. New York: Penguin Press.
Meyer, D., Hornik, K., & Feinerer, I. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. doi:10.18637/jss.v025.i05
Michalke, M. (2017). koRpus: An R package for text analysis [Computer software manual] (Version 0.10-2). Retrieved from https://reaktanz.de/?c=hacking&s=koRpus
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of International Conference of Learning Representations. arXiv preprint arXiv:1301.3781, Scottsdale, Arizona, May 2013.
Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications, 40(10), 4241–4251. doi:10.1016/j.eswa.2013.01.019
Mullen, L. (2016a). textreuse: Detect text reuse and document similarity [Computer software manual] (R package version 0.1.4). Retrieved from https://CRAN.R-project.org/package=textreuse
Mullen, L. (2016b). tokenizers: A consistent interface to tokenize natural language text [Computer software manual] (R package version 0.1.4). Retrieved from https://CRAN.R-project.org/package=tokenizers
Ooms, J. (2014). The jsonlite package: A practical and consistent mapping between JSON data and R objects [Computer software manual]. Retrieved from https://arxiv.org/abs/1403.2805
Ooms, J. (2017a). antiword: Extract text from Microsoft Word documents [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=antiword
Ooms, J. (2017b). pdftools: Text extraction, rendering and converting of PDF documents [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=pdftools
Porter, M. F. (2001). Snowball: A language for stemming algorithms. Retrieved from http://snowball.tartarus.org/texts/introduction.html
Proksch, S.-O., & Slapin, J. B. (2009). How to avoid pitfalls in statistical analysis of political texts: The case of Germany. German Politics, 18(3), 323–344. doi:10.1080/09644000903055799
Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51–59. doi:10.1089/big.2013.1508
R Core Team. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., … Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082. doi:10.1111/ajps.12103
rOpenSci Text Workshop. (2017). tif: Text interchange format [Computer software manual]. Retrieved from https://github.com/ropensci/tif
Schuck, A. R., Xezonakis, G., Elenbaas, M., Banducci, S. A., & De Vreese, C. H. (2011). Party contestation and Europe on the news agenda: The 2009 European Parliamentary Elections. Electoral Studies, 30(1), 41–52. doi:10.1016/j.electstud.2010.09.021
Schultz, F., Kleinnijenhuis, J., Oegema, D., Utz, S., & Van Atteveldt, W. (2012). Strategic framing in the BP crisis: A semantic network analysis of associative frames. Public Relations Review, 38(1), 97–107. doi:10.1016/j.pubrev.2011.08.003
Selivanov, D. (2016). text2vec: Modern text mining framework for R [Computer software manual] (R package version 0.4.0). Retrieved from https://CRAN.R-project.org/package=text2vec
Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1, 3. doi:10.21105/joss.00037
Slapin, J. B., & Proksch, S.-O. (2008). A scaling model for estimating time-series party positions from texts. American Journal of Political Science, 52(3), 705–722. doi:10.1111/j.1540-5907.2008.00338.x
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267–307. doi:10.1162/coli_a_00049
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. doi:10.1177/0261927X09351676
TIOBE. (2017). The R programming language. Retrieved from https://www.tiobe.com/tiobe-index/r/
Van Atteveldt, W. (2008). Semantic Network Analysis: Techniques for Extracting, Representing, and Querying Media Content (Dissertation). Charleston, SC: BookSurge.
Van Atteveldt, W., Sheafer, T., Shenhav, S. R., & Fogel-Dror, Y. (2017). Clause analysis: Using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008–2009 Gaza War. Political Analysis, 25(2), 207–222. doi:10.1017/pan.2016.12
Vliegenthart, R., Boomgaarden, H. G., & Van Spanje, J. (2012). Anti-immigrant party support and media visibility: A cross-party, over-time perspective. Journal of Elections, Public Opinion & Parties, 22(3), 315–358. doi:10.1080/17457289.2012.693933
Watanabe, K. (2017). The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis. The Journal of International Communication, 23(1), 138–158. doi:10.1080/13216597.2017.1287750
Welbers, K., & Van Atteveldt, W. (2016). corpustools: Tools for managing, querying and analyzing tokenized text [Computer software manual] (R package version 0.201). Retrieved from http://github.com/kasperwelbers/corpustools
Welbers, K., Van Atteveldt, W., Kleinnijenhuis, J., & Ruigrok, N. (2016). A Gatekeeper among Gatekeepers: News Agency Influence in Print and Online Newspapers in the Netherlands. Journalism Studies, 1–19 (online first). doi:10.1080/1461670x.2016.1190663
Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. doi:10.18637/jss.v059.i10
Wickham, H., & Bryan, J. (2017). readxl: Read Excel files [Computer software manual]. Retrieved from https://CRAN.R-project.org/package=readxl
Wild, F. (2017). CRAN task view: Natural language processing. CRAN. Version: 2017-01-17. Retrieved from https://CRAN.R-project.org/view=NaturalLanguageProcessing
Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML) (pp. 412–420), Nashville, TN, July 1997.
