Abstract
As computer-assisted research on voluminous datasets becomes more pervasive, so does criticism of its epistemological, methodological, and ethical/normative inadequacies. This article proposes a hybrid approach that combines the scale of computational methods with the depth of qualitative analysis. It uses simple natural language processing algorithms to extract purposive samples from large textual corpora, which can then be analyzed with interpretive techniques. This approach helps research become more theoretically grounded and contextually sensitive, remedying two major failings of typical “Big Data” studies. At the same time, it allows qualitative scholars to examine datasets that are otherwise too large to study manually and brings more rigor to the process of sampling. The method is illustrated with two case studies: one examines the inaugural addresses of U.S. presidents, and the other investigates the news coverage of two shootings at an army camp in Texas.
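The sampling step described above can be illustrated with a minimal sketch. It assumes the corpus is held in memory as a dictionary mapping document IDs to raw text, and it uses Python's standard `re` module rather than NLTK. The function name `purposive_sample`, the keyword list, the `min_hits` threshold, and the three toy documents are all hypothetical and are not taken from the article's case studies.

```python
import re

def purposive_sample(documents, keywords, min_hits=1):
    """Return the IDs of documents that mention the keywords at least min_hits times.

    documents: dict mapping a document ID to its raw text (hypothetical input format).
    keywords: iterable of words to search for as whole words, ignoring case.
    """
    # One compiled pattern that matches any keyword as a whole word.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    sample = []
    for doc_id, text in documents.items():
        # Count all keyword matches in this document.
        hits = len(pattern.findall(text))
        if hits >= min_hits:
            sample.append(doc_id)
    return sample

# Made-up toy corpus for illustration only.
corpus = {
    "doc1": "The senator spoke about the economy and jobs.",
    "doc2": "Troops at the camp were honored after the shooting.",
    "doc3": "The shooting prompted new questions about security at the camp.",
}
print(purposive_sample(corpus, ["shooting", "camp"], min_hits=2))  # ['doc2', 'doc3']
```

Raising `min_hits` narrows the sample to documents in which the topic is more prominent; the selected documents would then be read closely using interpretive techniques.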
Notes
1 In recent years, panels and sessions focusing on Big Data scholarship and pedagogy have become commonplace at academic conferences in the disciplines of journalism and communication studies, including the annual conventions of the International Communication Association (ICA) and the Association for Education in Journalism and Mass Communication (AEJMC). Several journals in these and other social scientific disciplines have published special issues on or related to Big Data research, such as The ANNALS of the American Academy of Political and Social Science (2015), Digital Journalism (2015), International Journal of Communication (2014), and Journal of Broadcasting and Electronic Media (2013).
2 Python is a general-purpose programming language whose code is concise, simple, and highly readable. It can be downloaded from www.python.org for a number of operating systems. NLTK (the Natural Language Toolkit) is a Python library that provides ready-made tools for processing text, which makes working with language data in Python easier. It can be downloaded from www.nltk.org. Bird, Klein, and Loper’s (2009) book, Natural Language Processing with Python, is recommended for scholars interested in learning how to use Python with NLTK. It is available online at www.nltk.org/book.
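To give a sense of the kind of task NLTK automates, the following sketch tokenizes a sentence and counts word frequencies using only Python's standard library (`re` and `collections.Counter`). NLTK offers more careful counterparts, such as `nltk.word_tokenize` and `nltk.FreqDist`; the simple `tokenize` function and the sample sentence here are hypothetical illustrations, not NLTK code.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters (with an optional internal
    # apostrophe, as in "don't") as tokens. This is a rough stand-in for
    # nltk.word_tokenize, which handles punctuation far more carefully.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# Made-up sample sentence for illustration only.
text = "We the people, the people of this nation, renew our pledge."
tokens = tokenize(text)

# Counter plays a role similar to nltk.FreqDist: it maps each token to its count.
freq = Counter(tokens)
print(freq.most_common(2))
```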
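3 “C:\Users\Desktop” is a Windows file path; in practice, the path usually includes the account name (e.g., “C:\Users\username\Desktop”). For Mac users, the corresponding file path has the form “/Users/username/Desktop.”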
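Rather than typing a platform-specific path, a script can build the path to the Desktop in a way that works on both Windows and Mac. This minimal sketch uses Python's standard `pathlib` module and assumes the Desktop folder sits directly inside the user's home folder, which is the default on both systems but may differ on some machines (for example, when the Desktop is synced to cloud storage). The helper name `desktop_file` and the file name `corpus.txt` are hypothetical.

```python
from pathlib import Path

def desktop_file(filename):
    # Path.home() returns the current user's home folder on any OS,
    # e.g. C:\Users\username on Windows or /Users/username on a Mac.
    # Assumption: the Desktop folder sits directly inside the home folder.
    return Path.home() / "Desktop" / filename

path = desktop_file("corpus.txt")  # "corpus.txt" is a hypothetical file name
print(path)

# Once the file exists, its contents could be read with:
# text = path.read_text(encoding="utf-8")
```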
4 Scholars who want to learn more about concordances may consult Section 1.3, “Searching Text,” of the first chapter of Natural Language Processing with Python, available online (www.nltk.org/book/ch01.html).
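A concordance lists every occurrence of a word together with its surrounding context. In NLTK this is typically done with `nltk.Text(tokens).concordance(word)`; the sketch below is a simplified, standard-library keyword-in-context display written for illustration, not NLTK's implementation. The function name, the character-based `width` parameter, and the sample text are assumptions.

```python
import re

def concordance(text, word, width=30):
    """Return one line per occurrence of `word`, with `width` characters of context on each side.

    A simplified keyword-in-context (KWIC) display, similar in spirit to
    NLTK's Text.concordance() but not its implementation.
    """
    # Collapse line breaks and repeated spaces so the context lines stay aligned.
    text = " ".join(text.split())
    lines = []
    for match in re.finditer(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE):
        start, end = match.span()
        # Pad the left context on the left and the right context on the right,
        # so every occurrence of the keyword lines up in a single column.
        left = text[max(0, start - width):start].rjust(width)
        right = text[end:end + width].ljust(width)
        lines.append(left + match.group() + right)
    return lines

# Made-up sample text for illustration only.
sample = "Freedom is not free. We defend freedom abroad and freedom at home."
lines = concordance(sample, "freedom", width=15)
for line in lines:
    print(line)
```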
5 Scholars who want to learn more about regular expressions may consult Section 3.4, “Regular Expressions for Detecting Word Patterns,” of the third chapter of Natural Language Processing with Python, available online (http://www.nltk.org/book/ch03.html).
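As an illustration of using a regular expression to detect a family of related word forms, the sketch below uses Python's standard `re` module to keep only the sentences that mention some form of “shoot.” The pattern, the `find_pattern` helper, and the sample sentences are invented for illustration; they are not taken from the article.

```python
import re

# Whole-word matches for "shoot", "shoots", "shooting(s)", "shooter(s)", and "shot(s)",
# ignoring case. This list of word forms is an illustrative assumption.
SHOOT_PATTERN = re.compile(r"\b(?:shoot(?:s|ings?|ers?)?|shots?)\b", re.IGNORECASE)

def find_pattern(sentences, pattern=SHOOT_PATTERN):
    # Keep only the sentences in which the pattern occurs at least once.
    return [s for s in sentences if pattern.search(s)]

# Made-up sentences for illustration only.
sentences = [
    "The shooter was detained by police.",
    "Officials described the shootings as a tragedy.",
    "Soldiers returned to training on Monday.",
    "A bystander heard a shot near the gate.",
]
for s in find_pattern(sentences):
    print(s)
```

Compiling the pattern once and reusing it is a common idiom when the same search runs over many documents; the `\b` word boundaries keep unrelated words such as “shootout” from matching.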