Research Article

Critical computation: mixed-methods approaches to big language data analysis

Pages 62-78 | Received 01 Apr 2022, Accepted 28 Aug 2022, Published online: 03 Apr 2023
 

ABSTRACT

In this theoretical piece, we discuss the limitations of using purely computational techniques to study big language data produced by people online. Instead, we advocate for mixed-methods approaches that can more critically evaluate the individual and social impact of this data. We propose one such approach, which combines qualitative, traditional quantitative, and computational methods for the study of language and text. Such approaches leverage the speed and efficiency of computational tools while highlighting the value of qualitative methods in critically assessing computational results. We also highlight two considerations for communication scholars using big data: (1) the need to account for greater language variation and (2) the importance of self-reflexivity when conducting big language data research. We conclude with additional recommendations for researchers seeking to adopt this framework in their own research.

Acknowledgement

This work was inspired by a recent conference submission prepared in collaboration with Ryan Gallagher, a network scientist with Northeastern University’s Communication Media and Marginalization Lab.

Notes

1 Kate Crawford, Mary L. Gray, and Kate Miltner, “Big Data| Critiquing Big Data: Politics, Ethics, Epistemology| Special Section Introduction,” International Journal of Communication 8 (2014): 10.

2 Emese Domahidi, JungHwan Yang, Julia Niemann-Lenz, and Leonard Reinecke, “Computational Communication Science| Outlining the Way Ahead in Computational Communication Science: An Introduction to the IJoC Special Section on ‘Computational Methods for Communication Science: Toward a Strategic Roadmap’,” International Journal of Communication 13 (2019): 9.

3 Buomsoo Kim, Jinsoo Park, and Jihae Suh, “Transparency and Accountability in AI Decision Support: Explaining and Visualizing Convolutional Neural Networks for Text Information,” Decision Support Systems 134 (2020): 113302; Hiroshi Kuwajima, Masayuki Tanaka, and Masatoshi Okutomi, “Improving Transparency of Deep Neural Inference Process,” Progress in Artificial Intelligence 8, no. 2 (2019): 273–85.

4 Klaus Krippendorff, Content Analysis: An Introduction to its Methodology, 4th ed. (SAGE, 2019): 280.

5 While some scholars in the positivist tradition regard quantitative methods as objective and empirical, we approach the “problem of objectivity” from the understanding that nothing is ever truly objective. See, for example, Christian Fuchs, “From Digital Positivism and Administrative Big Data Analytics Towards Critical Digital and Social Media Research!,” European Journal of Communication 32, no. 1 (2017): 37–49. We further posit that critical and positivist approaches can mutually inform one another rather than be held in opposition (see Srividya Ramasubramanian, and Omotayo O. Banjo, “Critical Media Effects Framework: Bridging Critical Cultural Communication and Media Effects Through Power, Intersectionality, Context, and Agency,” Journal of Communication 70, no. 3 (2020): 379–400).

6 Emma Uprichard, “Sampling: Bridging Probability and Non-Probability Designs,” International Journal of Social Research Methodology 16, no. 1 (2013): 1–11. https://doi.org/10.1080/13645579.2011.633391

7 Royce Singleton, and Bruce C. Straits, Approaches to Social Research, 6th ed. (Oxford, U.K.: Oxford University Press, 2017). https://global.oup.com/ushe/product/approaches-to-social-research-9780190614249?cc=us&lang=en&

8 For book-length treatments of technological and algorithmic bias, see, for example: Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (New York University Press, 2018); Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Broadway Books, 2016); Caroline Criado Perez, Invisible Women: Data Bias in a World Designed for Men (Abrams, 2019).

9 Humam Khalid Yaseen, and Ahmed Mahdi Obaid, “Big Data: Definition, Architecture & Applications,” JOIV: International Journal on Informatics Visualization 4, no. 1 (2020): 45–51.

10 Jiming Hu, and Yin Zhang, “Discovering the Interdisciplinary Nature of Big Data Research Through Social Network Analysis and Visualization,” Scientometrics 112, no. 1 (2017): 91–109. https://doi.org/10.1007/s11192-017-2383-1; Daphne R. Raban, and Avishag Gordon, “The Evolution of Data Science and Big Data Research: A Bibliometric Analysis,” Scientometrics 122, no. 3 (2020): 1563–81. https://doi.org/10.1007/s11192-020-03371-2

11 Christian Fuchs, “From Digital Positivism and Administrative Big Data Analytics Towards Critical Digital and Social Media Research!,” European Journal of Communication 32, no. 1 (2017): 37–49.

12 Ossi Ylijoki, and Jari Porras, “Perspectives to Definition of Big Data: A Mapping Study and Discussion,” Journal of Innovation Management 4, no. 1 (2016): 69–91. https://doi.org/10.24840/2183-0606_004.001_0006

13 See, for example, Monerah Al-Mekhlal, and Amir Ali Khwaja, “A Synthesis of Big Data Definition and Characteristics,” in 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC) (IEEE, 2019), 314–22. https://doi.org/10.1109/CSE/EUC.2019.00067; José María Cavanillas, Edward Curry, and Wolfgang Wahlster, New Horizons for a Data-Driven Economy: A Roadmap for Usage and Exploitation of Big Data in Europe (Springer Nature, 2016).

14 See, for example, Al-Mekhlal and Khwaja, Synthesis; Wo L. Chang, and Nancy Grady, “NIST Big Data Interoperability Framework: Volume 1, Big Data Definitions,” (2015); Andrea De Mauro, Marco Greco, and Michele Grimaldi, “A Formal Definition of Big Data based on its Essential Features,” Library Review (2016). https://doi.org/10.1108/LR-06-2015-0061; Jonathan Stuart Ward, and Adam Barker, “Undefined by Data: A Survey of Big Data Definitions,” arXiv preprint arXiv:1309.5821 (2013). http://arxiv.org/abs/1309.5821; Yaseen and Obaid, Big Data.

15 Ylijoki and Porras, Perspectives, 79.

16 De Mauro, Greco, and Grimaldi, Formal Definition.

17 Yaseen and Obaid, Big Data, 46.

18 Maddalena Favaretto, Eva De Clercq, Christophe Olivier Schneble, and Bernice Simone Elger, “What is Your Definition of Big Data? Researchers’ Understanding of the Phenomenon of the Decade,” PloS One 15, no. 2 (2020): e0228987. https://doi.org/10.1371/journal.pone.0228987

19 Cavanillas, Curry, and Wahlster, New Horizons.

20 Ward and Barker, Undefined.

21 Dagmar M. Schuller, and Björn W. Schuller, “A Review on Five Recent and Near-Future Developments in Computational Processing of Emotion in the Human Voice,” Emotion Review 13, no. 1 (2021): 44–50. https://doi.org/10.1177/1754073919898526

22 James Paul Gee, An Introduction to Discourse Analysis: Theory and Method (Routledge, 2004). https://www.routledge.com/An-Introduction-to-Discourse-Analysis-Theory-and-Method/Gee/p/book/9780415725569

23 To paraphrase the validity concerns: existing dictionaries cannot be applied successfully to all cases, while building a dictionary without the approaches we advocate in this article necessarily relies on researcher assumptions.

24 Michael X. Delli Carpini, “Breaking Boundaries: Can We Bridge the Quantitative Versus Qualitative Divide Through the Study of Entertainment and Politics?,” International Journal of Communication 7 (2013): 21.

25 Delli Carpini, Breaking Boundaries; Catherine d’Ignazio, and Lauren F. Klein, “Feminist Data Visualization,” Workshop on Visualization for the Digital Humanities (VIS4DH) (Baltimore: IEEE, 2016). https://www.semanticscholar.org/paper/Feminist-Data-Visualization-D%27Ignazio-Klein/2e3e2eb1bdc1cab5b0fab515266bb8849d416f33; Alexis Lothian, and Amanda Phillips, “Can Digital Humanities Mean Transformative Critique?,” Journal of E-Media Studies 3, no. 1 (2013): 1–25. https://doi.org/10.1349/PS1.1938-6060.A.425; Yotam Ophir, Dror Walter, and Eleanor R. Marchant, “A Collaborative Way of Knowing: Bridging Computational Communication Research and Grounded Theory Ethnography,” Journal of Communication 70, no. 3 (2020): 447–72. https://doi.org/10.1093/joc/jqaa013

26 Ophir, Walter, and Marchant, Collaborative Knowing, 448, emphasis in original.

27 Ibid.

28 Web scraping is the use of automated tools to collect content from a website or forum. Topic modeling, which typically represents each text as a “bag of words” (word counts with word order discarded), is a type of unsupervised machine learning that uncovers the thematic structure of texts in a dataset. Network analysis explores the structural relationships of knowledge through “shared meaning and symbols” (Marya L. Doerfel, and George A. Barnett, “A Semantic Network Analysis of the International Communication Association,” Human Communication Research 25, no. 4 (1999): 589–603, 589), for example by applying statistical probabilities and extracting the relationships between objects in a text (Wouter van Atteveldt, Semantic Network Analysis: Techniques for Extracting, Representing, and Querying Media Content (2008)).

29 Here, we do not imply that any research is ever truly “objective,” since we always bring our ways of being in the world to bear on research design decisions. Nor do we intend to imply that qualitative scholarship is not empirical.

30 Thomas R. Lindlof, and Bryan C. Taylor, Qualitative Communication Research Methods (Sage Publications, 2002), 18.

31 Matthew Kirschenbaum, “What is ‘Digital Humanities,’ and Why Are They Saying Such Terrible Things About It?,” Differences 25, no. 1 (2014): 46–63.

32 Lothian and Phillips, Transformative Critique; Roopika Risam, “Beyond the Margins: Intersectionality and the Digital Humanities,” Digital Humanities Quarterly 9, no. 2 (2015).

33 d’Ignazio and Klein, Feminist Data.

34 Ibid.

35 Ibid., 2.

36 Ibid., 2.

37 Ibid., 3.

38 Ibid., 3.

39 Ibid., 3.

40 Jessica Enoch, and Jean Bessette, “Meaningful Engagements: Feminist Historiography and the Digital Humanities,” College Composition and Communication (2013): 634–60.

41 d’Ignazio and Klein, Feminist Data, 3.

42 Michelle Rodino, “Breaking Out of Binaries: Reconceptualizing Gender and Its Relationship to Language in Computer-Mediated Communication,” Journal of Computer-Mediated Communication 3, no. 3 (1997): JCMC333.

43 For an example from fields adjacent to political communication (e.g., political science and political sociology) that informs the same point, see Daniel J. Levine, and David M. McCourt, “Why Does Pluralism Matter When We Study Politics? A View From Contemporary International Relations,” Perspectives on Politics 16, no. 1 (2018): 92–109.

44 The mission statement of the ACSJ, in fact, embraces several of these principles, stating, “The Activism, Communication, and Social Justice (ACSJ) Interest Group promotes research and teaching in the intersections of three key aspects of contemporary life as captured in its name. It strives for diversity in the representation of its membership and embraces pluralism and boldness in theory and methodology. It pushes the boundaries between theory and practice and between scholarship and activism by encouraging and facilitating dialogues and engagements” (“Interest Groups: Activism, Communication and Social Justice,” International Communication Association, accessed August 13, 2022, https://www.icahdq.org/group/activism).

45 See, for example, Loyd S. Pettegrew, “The Importance of Context in Applied Communication Research,” Southern Speech Communication Journal 53, no. 4 (1988): 331–38.

46 For a breakdown of research approaches to knowledge production, see Marton Demeter, and Manuel Goyanes, “A World-Systemic Analysis of Knowledge Production in International Communication and Media Studies: The Epistemic Hierarchy of Research Approaches,” The Journal of International Communication 27, no. 1 (2021): 38–58.

47 Philip N. Howard, “Network Ethnography and the Hypermedia Organization: New Media, New Organizations, New Methods,” New Media & Society 4, no. 4 (2002): 550–74. https://doi.org/10.1177/146144402321466813

48 Ibid., 550.

49 Kirschenbaum, Digital Humanities.

50 Klaus Krippendorff, “The Changing Landscape of Content Analysis: Reflections on Social Construction of Reality and Beyond,” Communication & Society 47 (2019): 1.

51 Joseph N. Cappella, “Vectors into the Future of Mass and Interpersonal Communication Research: Big Data, Social Media, and Computational Social Science,” Human Communication Research 43, no. 4 (2017): 545–58. https://doi.org/10.1111/hcre.12114

52 Marco Guerini, Carlo Strapparava, and Oliviero Stock, “Corps: A Corpus of Tagged Political Speeches For Persuasive Communication Processing,” Journal of Information Technology & Politics 5, no. 1 (2008): 19–32. https://doi.org/10.1080/19331680802149616

53 Fatemeh Torabi Asr, Mohammad Mazraeh, Alexandre Lopes, Vasundhara Gautam, Junette Gonzales, Prashanth Rao, and Maite Taboada, “The Gender Gap Tracker: Using Natural Language Processing to Measure Gender Bias in Media,” PloS One 16, no. 1 (2021): e0245533. https://doi.org/10.1371/journal.pone.0245533

54 Jennifer A. Manganello, Vani R. Henderson, Amy Jordan, Nicole Trentacoste, Suzanne Martin, Michael Hennessy, and Martin Fishbein, “Adolescent Judgment of Sexual Content on Television: Implications for Future Content Analysis Research,” Journal of Sex Research 47, no. 4 (2010): 364–73. https://doi.org/10.1080/00224490903015868

55 Edward C. Malthouse, and Hairong Li, “Opportunities for and Pitfalls of Using Big Data in Advertising Research,” Journal of Advertising 46, no. 2 (2017): 227–35. https://doi.org/10.1080/00913367.2017.1299653

56 Krippendorff, Changing Landscape.

57 R and Python are free, open-source programming languages commonly used in computational methods like machine learning. MAXQDA and NVivo are paid data-analysis software packages commonly used by qualitative and mixed-method researchers.

58 Mirian Oliveira, Claudia Bitencourt, Eduardo Teixeira, and Ana Clarissa Santos, “Thematic Content Analysis: Is there a Difference Between the Support Provided by the MAXQDA® and NVivo® Software Packages,” Revista de Administração Da UFSM 9, no. 1 (2016): 72–82. https://doi.org/10.5902/1983465911213

59 QDA Miner is a paid tool used for qualitative data analysis. WordStat, developed by the same company, is used for content analysis and text mining. It has both R and Python integrations.

60 See, for example, Lindlof and Taylor, Qualitative Communication Research Methods.

61 Damian Trilling, and Jeroen G. F. Jonkman, “Scaling up Content Analysis,” Communication Methods and Measures 12, no. 2–3 (2018): 158–74. https://doi.org/10.1080/19312458.2018.1447655

62 Andrea Ceron, Luigi Curini, and Stefano M. Iacus, “Using Sentiment Analysis to Monitor Electoral Campaigns: Method Matters—Evidence from the United States and Italy,” Social Science Computer Review 33, no. 1 (2015): 3–20. https://doi.org/10.1177/0894439314521983

63 Daniel Maier, Annie Waldherr, Peter Miltner, Gregor Wiedemann, Andreas Niekler, Alexa Keinert, Barbara Pfetsch et al., “Applying LDA Topic Modeling in Communication Research: Toward A Valid and Reliable Methodology,” Communication Methods and Measures 12, no. 2–3 (2018): 93–118. https://doi.org/10.1080/19312458.2018.1430754; Maria Y. Rodriguez, and Heather Storer, “A Computational Social Science Perspective on Qualitative Data Exploration: Using Topic Models for the Descriptive Analysis of Social Media Data,” Journal of Technology in Human Services 38, no. 1 (2020): 54–86. https://doi.org/10.1080/15228835.2019.1616350

64 Justin Grimmer, and Brandon M. Stewart, “Text As Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts,” Political Analysis 21, no. 3 (2013): 267–97. https://doi.org/10.1093/pan/mps028

65 Christian Baden, Christian Pipal, Martijn Schoonvelde, and Mariken A. C. G. van der Velden, “Three Gaps in Computational Text Analysis Methods for Social Sciences: A Research Agenda,” Communication Methods and Measures 16, no. 1 (2022): 1–18. https://doi.org/10.1080/19312458.2021.2015574

66 Human-in-the-loop (HITL) approaches are generally models that require some form of human interaction. Here, we do not mean to imply that researchers should coerce the data to fit a priori assumptions; instead, we use HITL as a metaphor for the cybortic approach we propose.

67 d’Ignazio and Klein, Feminist Data.

68 David R. Thomas, “A General Inductive Approach for Analyzing Qualitative Evaluation Data,” American Journal of Evaluation 27, no. 2 (2006): 237–46.

69 Dror Walter, and Yotam Ophir, “News Frame Analysis: An Inductive Mixed-Method Computational Approach,” Communication Methods and Measures 13, no. 4 (2019): 248–66. https://doi.org/10.1080/19312458.2019.1639145

70 d’Ignazio and Klein, Feminist Data.

71 Melanie Birks, Ysanne Chapman, and Karen Francis, “Memoing in Qualitative Research: Probing Data and Processes,” Journal of Research in Nursing 13, no. 1 (2008): 68–75. https://doi.org/10.1177/1744987107081254

72 d’Ignazio and Klein, Feminist Data, 2.

73 Kimberly A. Neuendorf, The Content Analysis Guidebook (Sage, 2017). https://doi.org/10.4135/9781071802878

74 Ward van Zoonen, and Toni G. L. A. van der Meer, “Social Media Research: The Application of Supervised Machine Learning in Organizational Communication Research,” Computers in Human Behavior 63 (2016): 132–41. https://doi.org/10.1016/j.chb.2016.05.028

75 Björn Burscher, Daan Odijk, Rens Vliegenthart, Maarten De Rijke, and Claes H. De Vreese, “Teaching the Computer to Code Frames in News: Comparing Two Supervised Machine Learning Approaches to Frame Analysis,” Communication Methods and Measures 8, no. 3 (2014): 190–206. https://doi.org/10.1080/19312458.2014.937527

76 Claire Lauer, Eva Brumberger, and Aaron Beveridge, “Hand Collecting and Coding Versus Data-Driven Methods in Technical and Professional Communication Research,” IEEE Transactions on Professional Communication 61, no. 4 (2018): 389–408. https://doi.org/10.1109/TPC.2018.2870632

77 Norman L. Fairclough, “Critical and Descriptive Goals in Discourse Analysis,” Journal of Pragmatics 9, no. 6 (1985): 739–63. https://doi.org/10.1016/0378-2166(85)90002-5

78 d’Ignazio and Klein, Feminist Data.

79 Ariadna Matamoros-Fernández, and Johan Farkas, “Racism, Hate Speech, and Social Media: A Systematic Review and Critique,” Television & New Media 22, no. 2 (2021): 205–24. https://doi.org/10.1177/1527476420982230

80 Martin Emmer, and Marlene Kunst, “‘Digital Citizenship’ Revisited: The Impact of ICTs on Citizens’ Political Communication Beyond the Western State,” International Journal of Communication 12 (2018): 21.

81 Alexandre Magueresse, Vincent Carles, and Evan Heetderks, “Low-Resource Languages: A Review of Past Work and Future Challenges,” arXiv preprint arXiv:2006.07264 (2020). http://arxiv.org/abs/2006.07264

82 “Our Mission,” Center for Open Science, accessed August 13, 2022, https://www.cos.io/about/mission

83 “Open science and its role in universities: A roadmap for cultural change,” League of European Research Universities, 2018, accessed August 13, 2022, https://www.leru.org/publications/open-science-and-its-role-in-universities-a-roadmap-for-cultural-change

84 Ethics statements answer difficult and often complex questions about the collection, storage, and dissemination of data used in big language data research. The Association of Internet Researchers (AoIR) ethical guidelines we mention detail such questions, drawing on work by Annette Markham, Aline Shakti Franzke, and others.

85 Annette Markham, and Elizabeth Buchanan, “Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0),” accessed August 13, 2022, https://aoir.org/reports/ethics2.pdf

86 Ophir, Walter, and Marchant, Collaborative Knowing.

87 Deen Freelon, and David Karpf, “Of Big Birds and Bayonets: Hybrid Twitter Interactivity in the 2012 Presidential Debates,” Information, Communication & Society 18, no. 4 (2015): 390–406. https://doi.org/10.1080/1369118X.2014.952659

88 Yiping Xia, Josephine Lukito, Yini Zhang, Chris Wells, Sang Jung Kim, and Chau Tong, “Disinformation, Performed: Self-Presentation of a Russian IRA Account On Twitter,” Information, Communication & Society 22, no. 11 (2019): 1646–64. https://doi.org/10.1080/1369118X.2019.1621921

89 See, e.g., Stevie Chancellor, Eric P. S. Baumer, and Munmun De Choudhury, “Who Is the ‘Human’ in Human-Centered Machine Learning: The Case of Predicting Mental Health from Social Media,” Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW (2019): 1–32. https://doi.org/10.1145/3359249

90 For more on the role of the researcher, see Elizabeth Halpern, and Ligia Costa Leite, “The Role of the Researcher When Using the Socio-Anthropological Method to Understand the Phenomenon of Alcoholism,” Open Journal of Social Sciences 3, no. 5 (2015): 76. https://doi.org/10.4236/jss.2015.35011
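Note 23 raises validity concerns with dictionary-based analysis. A minimal Python sketch (Python being one of the languages named in note 57) can make that concern concrete. The lexicon `NEGATIVE_TERMS`, the helper names, and the example posts are hypothetical and exist only for illustration. The sketch shows that a fixed word list silently skips spellings and variants it was not built for, which is the kind of gap a qualitative review of computational output is meant to catch.

```python
import re

# Hypothetical mini-lexicon for illustration only; published studies rely on
# validated dictionaries or build their own, which is where the validity
# concerns in note 23 arise.
NEGATIVE_TERMS = {"angry", "hate", "terrible", "sad"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into tokens of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def dictionary_score(text: str, lexicon: set[str]) -> tuple[int, list[str]]:
    """Count tokens that appear in the lexicon and return the tokens it missed."""
    tokens = tokenize(text)
    hits = [t for t in tokens if t in lexicon]
    misses = [t for t in tokens if t not in lexicon]
    return len(hits), misses


if __name__ == "__main__":
    for post in ["I hate this, so sad", "i h8 this, so saaad"]:
        score, missed = dictionary_score(post, NEGATIVE_TERMS)
        print(f"{post!r}: score={score}, unmatched={missed}")
```

The first post scores 2 and the second scores 0, because `h8` and `saaad` are not in the lexicon. Listing the unmatched tokens is one simple way to surface such gaps for a human reader.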
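Note 28 mentions two representations that many computational text methods build on: the “bag of words” that topic models take as input, and the word-to-word relationships studied in semantic network analysis. The Python sketch below shows both in their simplest form. It is not a topic model; fitting one, such as LDA, would normally use a dedicated library. Counting document-level co-occurrence is also only one of several ways to define semantic network edges. The function names, example documents, and stopword list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Represent a text as word counts; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cooccurrence_edges(docs: list[str], stopwords: set[str]) -> Counter:
    """Count, for each pair of non-stopword words, how many documents contain both.

    Each weighted pair can be read as an edge in a simple semantic network.
    """
    edges: Counter = Counter()
    for doc in docs:
        vocab = sorted(set(bag_of_words(doc)) - stopwords)
        # Sorting the vocabulary gives each pair one canonical (alphabetical) order.
        for pair in combinations(vocab, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    docs = [
        "the protest was peaceful",
        "a peaceful protest downtown",
        "police met the protest",
    ]
    stop = {"the", "was", "a", "met"}
    print(bag_of_words(docs[0]))
    for (w1, w2), weight in cooccurrence_edges(docs, stop).most_common():
        print(f"{w1} -- {w2}: {weight}")
```

Here `peaceful` and `protest` form the heaviest edge because they appear together in two of the three documents.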
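Note 66 uses human-in-the-loop (HITL) as a metaphor rather than as a specific procedure. For readers unfamiliar with the literal technique, one common concrete form routes a classifier's low-confidence labels to a human coder for review. The Python sketch below shows only that routing step. The `Prediction` record, the helper names, the 0.8 confidence threshold, and the example labels are all hypothetical, and no particular classifier is assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prediction:
    text: str
    label: str
    confidence: float  # model's probability for its chosen label (0.0 to 1.0)
    human_reviewed: bool = False


def route_for_review(preds: list[Prediction], threshold: float = 0.8):
    """Split predictions into auto-accepted ones and a queue for human review."""
    accepted = [p for p in preds if p.confidence >= threshold]
    review = [p for p in preds if p.confidence < threshold]
    return accepted, review


def apply_human_labels(review: list[Prediction], human_labels: dict[str, str]) -> list[Prediction]:
    """Apply researcher labels (keyed by text); items without one keep the model label."""
    return [
        replace(
            p,
            label=human_labels.get(p.text, p.label),
            human_reviewed=p.text in human_labels,
        )
        for p in review
    ]


if __name__ == "__main__":
    preds = [
        Prediction("great turnout today", "positive", 0.95),
        Prediction("well that went great", "positive", 0.55),  # possible sarcasm
    ]
    accepted, review = route_for_review(preds)
    corrected = apply_human_labels(review, {"well that went great": "negative"})
    for p in accepted + corrected:
        print(p)
```

Keeping the `human_reviewed` flag makes it possible to report how much of the final coding came from the model and how much from the researcher.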