Communication Methods and Measures Volume 11, 2017 - Issue 4

Teacher's Corner

Text Analysis in R

Kasper Welbers, Institute for Media Studies, University of Leuven, Leuven, Belgium. Correspondence: [email protected]

Wouter Van Atteveldt, Department of Communication Science, VU University Amsterdam, Amsterdam, The Netherlands

Kenneth Benoit, Department of Methodology, London School of Economics and Political Science, London, UK

http://orcid.org/0000-0002-0797-564X

Pages 245-265 | Published online: 02 Nov 2017

https://doi.org/10.1080/19312458.2017.1387238

ABSTRACT

Computational text analysis has become an exciting research field with many applications in communication research. It can be a difficult method to apply, however, because it requires knowledge of various techniques, and the software required to perform most of these techniques is not readily available in common statistical software packages. In this teacher’s corner, we address these barriers by providing an overview of general steps and operations in a computational text analysis project, and demonstrate how each step can be performed using the R statistical software. As a popular open-source platform, R has an extensive user community that develops and maintains a wide range of text analysis packages. We show that these packages make it easy to perform advanced text analytics.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Notes

¹ The term “data science” is a popular buzzword related to “data-driven research” and “big data” (Provost & Fawcett, 2013).

² Other programming environments have similar archives, such as pip for Python. However, CRAN stands out for being strictly maintained, with elaborate checks that packages need to pass before they are accepted.

³ The London School of Economics and Political Science recently hosted a workshop (http://textworkshop17.ropensci.org/), forming the beginnings of an rOpenSci special interest group for text analysis.

⁴ For example, the tif (Text Interchange Formats) package (rOpenSci Text Workshop, 2017) describes and validates standards for common text data formats.

⁵ https://github.com/kasperwelbers/text_analysis_in_R.

⁶ For a more extensive list of packages that is also maintained over time, a good source is the CRAN Task View for Natural Language Processing (Wild, 2017). CRAN Task Views are expert-curated and maintained lists of R packages on the Comprehensive R Archive Network, and are available for various major methodological topics.

⁷ http://www.tidyverse.org/.

⁸ Notably, there are techniques for automatically expanding a dictionary based on the semantic space of a text corpus (see, e.g., Watanabe, 2017). This can be said to add an inductive layer to the approach, because the coding rules (i.e., the dictionary) are to some extent learned from the data.

⁹ The term n-grams can be used more broadly to refer to sequences, and is also often used for sequences of individual characters. In this teacher’s corner we strictly use n-grams to refer to sequences of words.

¹⁰ To see how to cite a package, use the citation function: for example, citation("quanteda") for citing quanteda, or citation() for citing the R project. This returns either the citation details supplied by the package developer or, if none are supplied, auto-generated details.
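The citation lookup described in note 10 can be sketched as follows. This is a minimal illustration, not part of the original article: the variable names are hypothetical, and the fallback to the utils package is an assumption made only so that the snippet runs on a machine where quanteda is not installed.

```r
# citation() and toBibtex() ship with base R (utils package),
# so no additional installation is needed for these calls.

# Citation details for the R project itself.
r_cite <- citation()
print(r_cite)

# Citation details for a package. quanteda is used when it is installed;
# otherwise fall back to "utils" (an assumption, so the example still runs).
pkg_name <- if (requireNamespace("quanteda", quietly = TRUE)) "quanteda" else "utils"
pkg_cite <- citation(pkg_name)
print(pkg_cite)

# Convert the citation to BibTeX, e.g. for import into a reference manager.
pkg_bibtex <- toBibtex(pkg_cite)
print(pkg_bibtex)
```

Printing the citation object shows a plain-text reference; the BibTeX form is usually the more convenient one to copy into a bibliography file.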

References

Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51–59. doi:10.1089/big.2013.1508

rOpenSci Text Workshop. (2017). tif: Text interchange format [Computer software manual]. Retrieved from https://github.com/ropensci/tif

Wild, F. (2017). CRAN task view: Natural language processing. CRAN. Version: 2017-01-17. Retrieved from https://CRAN.R-project.org/view=NaturalLanguageProcessing

Watanabe, K. (2017). The spread of the Kremlin’s narratives by a western news agency during the Ukraine crisis. The Journal of International Communication, 23(1), 138–158. doi:10.1080/13216597.2017.1287750
