Noise Pollution: A Multi-Step Approach to Assessing the Consequences of (Not) Validating Search Terms on Automated Content Analyses
