Peering into the Internet Abyss: Using Big Data Audience Analysis to Understand Online Comments

John R. Gallagher, Yinyin Chen, Kyle Wagner, Xuan Wang, Jingyi Zeng, & Alyssa Lingyi Kong
Pages 155-173 | Published online: 25 Jun 2019
 

ABSTRACT

This article offers a methodology for conducting large-scale audience analysis called “big data audience analysis” (BDAA). BDAA uses distant reading and thin description to examine a large corpus of text data from online audiences. In this article, that corpus is approximately 450,000 online reader comments. We analyze this corpus through sentiment analysis, statistical analysis, and geolocation to identify trends and patterns in large datasets. BDAA can better prepare TPC researchers for large-scale audience studies.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. As Dush (2015) notes, “content exists as digital assets that are full of potential, characterized not by being finished or published, but rather by their availability for repurposing, mining, and other future uses. Content has a core conditional quality, fluidity in terms of what shape it may take and where it may travel, and indeterminacy in terms of who may use it, to what ends, and how various uses may come to be valued” (p. 176).

2. While outside the scope of this article, distant reading might also be used to look at more granular aspects of texts and discourse. For example, distant readers might look at the ways that comments begin or end, examining linguistic features of introductory phrases across a dataset.

3. For more on how web scraping can be used to answer humanities-oriented questions, see Black (2016).

4. To generate this wordcloud, we used the text mining (tm) package in R. The data were cleaned as follows: all text was converted to lower case so that words were consistent in format; stop words were removed, including pronouns (I, you, he, she, we, they, it) and prepositions (or, by, off, with); and numbers and punctuation were removed. We used the standard stop word list in tm.

5. The bank of swear words used was as follows: “shit,” “bitch,” “damn,” “fuck,” “asshole,” and “ass.”

6. We chose the top 300 commenters, and there was a six-way tie for commenters ranked from 297 to 302.
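
The cleaning, swear-word counting, and top-commenter steps described in notes 4 through 6 can be sketched in a few lines of code. The sketch below is illustrative only: it is written in Python rather than the R tm package the authors used, its stop word list contains only the examples given in note 4 (the standard tm list is far longer), and the function names are hypothetical. Matching swear words as whole tokens and keeping every commenter tied at the cutoff are assumptions, since the notes do not say how either was handled.

```python
import string
from collections import Counter

# Illustrative stop words only: the examples listed in note 4.
# The standard English list shipped with tm is much longer.
STOP_WORDS = {"i", "you", "he", "she", "we", "they", "it", "or", "by", "off", "with"}

# Swear word bank as listed in note 5.
SWEAR_WORDS = {"shit", "bitch", "damn", "fuck", "asshole", "ass"}

# Translation table that deletes ASCII punctuation and digits.
_STRIP = str.maketrans("", "", string.punctuation + string.digits)


def clean_comment(text):
    """Lowercase the text, delete numbers and punctuation, drop stop words."""
    lowered = text.lower().translate(_STRIP)
    return [tok for tok in lowered.split() if tok not in STOP_WORDS]


def count_swears(tokens):
    """Count tokens that exactly match an entry in the swear word bank.

    Assumption: whole-token matching, so "class" does not count as "ass".
    """
    return sum(1 for tok in tokens if tok in SWEAR_WORDS)


def top_commenters(comment_counts, n):
    """Return the n most active commenters, keeping everyone tied at the cutoff.

    Assumption: ties at rank n are all retained, so the result can be
    longer than n (as with a tie spanning ranks 297 to 302 for n = 300).
    """
    ranked = sorted(comment_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(name, count) for name, count in ranked if count >= cutoff]


if __name__ == "__main__":
    tokens = clean_comment("Damn, I read 3 of THESE comments with you!")
    print(tokens)
    print(count_swears(tokens))
    # Word frequencies of the kind that would feed a word cloud.
    print(Counter(tokens).most_common(3))
    print(top_commenters({"a": 5, "b": 3, "c": 3, "d": 1}, n=2))
```

Deleting punctuation outright, rather than replacing it with a space, merges contractions into single tokens ("don't" becomes "dont"); a different choice there would change the resulting word counts.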

Additional information

Notes on contributors

John R. Gallagher

John R. Gallagher is an assistant professor of English at The University of Illinois, Urbana-Champaign (UIUC), specializing in digital rhetoric, interfaces, and participatory audience theory.

Yinyin Chen

Yinyin Chen is a doctoral student in Statistics at UIUC. Her research focuses on Bayesian statistics and model identifiability. She received her B.S. degree from Zhejiang University. She has worked at Adobe and AB InBev as a data scientist.

Kyle Wagner

Kyle Wagner is a doctoral candidate in educational psychology at the University of Minnesota. His research interests include writing measures and interventions, educational measurement, and optimizing practice for acquisition and retention of academic skills.

Xuan Wang

Xuan Wang received a B.S. degree in Biological Science from Tsinghua University and M.S. degrees in Biochemistry and Statistics from UIUC. Since 2017, she has been pursuing a doctoral degree in Computer Science at UIUC. Her research interests include text mining, data mining, natural language processing, and machine learning.

Jingyi Zeng

Jingyi Zeng is an M.S. student in the Statistics Department at UIUC. She is interested in machine learning for medical diagnostics.

Alyssa Lingyi Kong

Alyssa Lingyi Kong completed her M.S. degree in Statistics at UIUC. She joined the Tax Technology and Transformation group at Ernst and Young after graduation.
