Research Article

From corpus creation to formative discovery: the power of big-data-rhetoric teams and methods

Pages 38-61 | Received 01 Apr 2022, Accepted 11 Aug 2022, Published online: 03 Apr 2023
 

ABSTRACT

Rhetoric has been slow to adopt big-data techniques, but that is changing. In this article, we describe the formative work of our rhetoric-data science team on an ideographic analysis of state veteran laws. Our interdisciplinary approach enabled us to build a corpus of more than 7,000 files, segment that corpus into likely public and private laws, and develop dictionaries for discerning individual entitlements, such as waived fees for gun permits. Early results show state-level trends in the number of veteran laws, proportion of veteran laws concerning disabled veterans, and proportion of veteran/disability laws affording individual entitlements. While this article presents early findings, its broader purpose is to contribute to discussions of corpus building, data cleaning, formative analysis, and the value of big-data-rhetoric collaborations. Our experience provides five insights: (1) big-data collection methods can save a public rhetoric project when customary retrieval methods fail; (2) big-data-rhetoric work starts conceptually and becomes concretized; (3) formative big-data rhetoric work can problematize fundamental research assumptions, such as what should be included in a corpus; (4) big-data methods can produce interesting results early, yielding a roadmap for future work; and (5) big-data-rhetoric teams need more guidance from the field.

Acknowledgement

The authors thank the anonymous reviewers for their insightful, constructive feedback.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Matthew Brook O’Donnell, Emily B. Falk, and Matthew D. Lieberman, “Social in, Social out: How the Brain Responds to Social Language with More Social Language,” Communication Monographs 82, no. 1 (2015): 31–63.

2 Alex P. Leith, “Parasocial Cues: The Ubiquity of Parasocial Relationships on Twitch,” Communication Monographs 88, no. 1 (2021): 111–29.

3 René Weber, J. Michael Mangus, Richard Huskey, Frederic R. Hopp, Ori Amir, Reid Swanson, Andrew Gordon, Peter Khooshabeh, Lindsay Hahn, and Ron Tamborini, “Extracting Latent Moral Information from Text Narratives: Relevance, Challenges, and Solutions,” Communication Methods and Measures 12, no. 2–3 (2018): 119–39.

4 Lei Guo, Kate Mays, Yiyan Zhang, Derry Wijaya, and Margrit Betke, “What Makes Gun Violence a (Less) Prominent Issue? A Computational Analysis of Compelling Arguments and Selective Agenda Setting,” Mass Communication and Society 24, no. 5 (2021): 651–75; Hyunjin Seo, “Visual Propaganda in the Age of Social Media: An Empirical Analysis of Twitter Images During the 2012 Israeli–Hamas Conflict,” Visual Communication Quarterly 21, no. 3 (2014): 150–61; Saif Shahin, “When Scale Meets Depth: Integrating Natural Language Processing and Textual Analysis for Studying Digital Corpora,” Communication Methods and Measures 10, no. 1 (2016): 28–50; Saif Shahin, “Facing up to Facebook: How Digital Activism, Independent Regulation, and Mass Media Foiled a Neoliberal Threat to Net Neutrality,” Information, Communication & Society 22, no. 1 (2019): 1–17; Fatemeh Shayesteh and Hyunjin Seo, “Competing Frames on Social Media: Analysis of English and Farsi Tweets on Iran Plane Crash,” The Journal of International Communication 28, no. 1 (2022): 47–69.

5 An example of a study incorporating NLP is: Frederic R. Hopp, Jacob T. Fisher, Devin Cornell, Richard Huskey, and René Weber, “The Extended Moral Foundations Dictionary (eMFD): Development and Applications of a Crowd-sourced Approach to Extracting Moral Intuitions from Text,” Behavior Research Methods 53, no. 1 (2021): 232–46.

6 An example of a study incorporating fMRI is: O’Donnell, Falk, and Lieberman, “Social in, Social out,” 32. This study does not meet all definitions of big data, particularly with respect to sample size (i.e., 19 participants). However, each functional run produced multidimensional image files that greatly exceeded the size of a typical text document because they contained more data. According to O’Donnell, Falk, and Lieberman, this bigger data capacity yields “a method to examine multiple processes simultaneously during idea exposure; fMRI can reveal implicit and explicit factors leading to successful communication that may not be apparent from self-report measures or other experimental methods alone.”

7 Roderick P. Hart, “Genre and Automated Text Analysis: A Demonstration.” In Rhetoric and the Digital Humanities, ed. Jim Ridolfo and William Hart-Davidson (Chicago, IL: University of Chicago Press, 2015), 152–68; David Hoffman and Don Waisanen, “At the Digital Frontier of Rhetoric Studies: An Overview of Tools and Methods for Computer-aided Textual Analysis.” In Rhetoric and the Digital Humanities, ed. Jim Ridolfo and William Hart-Davidson (Chicago, IL: University of Chicago Press, 2015), 169–83.

8 Martin Hinton, “Corpus Linguistics Methods in the Study of (Meta)argumentation,” Argumentation 35, no. 3 (2021): 435–55; Hoffman and Waisanen, “At the Digital Frontier;” Douglas Walton and Thomas F. Gordon, “How Computational Tools Can Help Rhetoric and Informal Logic with Argument Invention,” Argumentation 33, no. 2 (2019): 269–95.

9 Ibid.

10 On how Communication is wrestling with how to select the best method for a given text, context, and research purpose: Frederic R. Hopp and René Weber, “Reflections on Extracting Moral Foundations from Media Content,” Communication Monographs 88, no. 3 (2021a): 371–79; Rong Wang and Wenlin Liu, “Different Pathways to Identify Moral Framing from Media Content: A Response to Hopp and Weber,” Communication Monographs 88, no. 3 (2021a): 380–88.

11 For discussions of public argument and social movement studies affecting diverse U.S. territories and citizens, see Celeste Condit Railsback, “The Contemporary American Abortion Controversy: Stages in the Argument,” Quarterly Journal of Speech 70, no. 4 (1984): 410–24; M. Linda Miller, “Public Argument and Legislative Debate in the Rhetorical Construction of Public Policy: The Case of Florida Midwifery Legislation,” Quarterly Journal of Speech 85, no. 4 (1999): 361–79. Beyond public rhetoric applications, this big-data discussion is an invitation to explicate our diverse specializations and research approaches for audiences operating in a big-data world.

12 On early computer-facilitated text analysis software: Hart, “Genre and Automated Text;” Hoffman and Waisanen, “At the Digital Frontier.”

13 Ibid.; Roderick P. Hart, “Systematic Analysis of Political Discourse: The Development of DICTION.” In Political Communication Yearbook: 1984, ed. Keith R. Sanders, Lynda Lee Kaid, and Dan Nimmo (Carbondale, IL: Southern Illinois University Press, 1985), 97–134; Roderick P. Hart, “Redeveloping DICTION: Theoretical Considerations.” In Theory, Method and Practice of Computer Content Analysis, ed. Mark D. West (Westport, CT: Ablex, 2001), 43–60.

14 Hoffman and Waisanen, “At the Digital Frontier.”

15 Ibid., 177.

16 Zoltan P. Majdik, “A Computational Approach to Assessing Rhetorical Effectiveness: Agentic Framing of Climate Change in the Congressional Record, 1994–2016,” Technical Communication Quarterly 28, no. 3 (2019): 207–22; S. Scott Graham, Zoltan P. Majdik, and Dave Clark, “Methods for Extracting Relational Data from Unstructured Texts Prior to Network Visualization in Humanities Research,” Journal of Open Humanities Data 6, no. 1 (2020).

17 Majdik, “A Computational Approach.”

18 Ibid., 217.

19 Ibid.

20 Roderick P. Hart, Verbal Style and the Presidency: A Computer-based Analysis (Orlando, FL: Academic Press, 1984).

21 On DICTION’s limitations in Hart's 1984 work: Robert L. Ivie, “Book Reviews: The Complete Criticism of Political Rhetoric,” Quarterly Journal of Speech 73, no. 1 (1987): 98–107; William W. Lammers, “The Sound of Leadership: Presidential Communication in the Modern Age [book review],” The American Political Science Review 82, no. 3 (1988): 990–91.

22 Examples of studies that used DICTION software to analyze relatively small corpora: Michelle C. Bligh and Jill L. Robinson, “Was Gandhi ‘Charismatic’? Exploring the Rhetorical Leadership of Mahatma Gandhi,” The Leadership Quarterly 21, no. 5 (2010): 844–55; Theodore F. Sheckels, “The Rhetoric of Nelson Mandela: A Qualified Success,” Howard Journal of Communication 12, no. 2 (2001): 85–99.

23 Roderick P. Hart, “Why Trump Lost and How? A Rhetorical Explanation,” American Behavioral Scientist 66, no. 1 (2022): 7–27.

24 For examples of Hart's discussion of emerging software capabilities, see Hart, “Genre and Automated Text;” Hart, “Redeveloping DICTION;” Hart, “Why Trump Lost.”

25 Essays that have discussed large corpus construction, data cleaning, and metadata issues: E. Johanna Hartelius, Jessica H. Lu, Damien Smith Pfister, and Carly S. Woods, “Digitality, Diversity, and the Future of Rhetoric and Public Address,” Rhetoric and Public Affairs 24, no. 1–2 (2021): 253–68; Majdik, “A Computational Approach;” Pamela VanHaitsma, “Between Archival Absence and Information Abundance: Reconstructing Sallie Holley's Abolitionist Rhetoric through Digital Surrogates and Metadata,” Quarterly Journal of Speech 106, no. 1 (2020): 25–47.

26 Majdik, “A Computational Approach.”

27 Guo et al., “What Makes Gun Violence.”

28 Douglas Walton, “Some Artificial Intelligence Tools for Argument Evaluation: An Introduction,” Argumentation 30, no. 3 (2016): 317–40.

29 Ibid.

30 Ibid.; Henry Prakken, “An Abstract Framework for Argumentation with Structured Arguments,” Argument & Computation 1, no. 2 (2010): 93–124.

31 Walton and Gordon, “How Computational Tools.”

32 Ibid., 292.

33 Hinton, “Corpus Linguistics Methods.”

34 Ibid.

35 Frederic R. Hopp and René Weber, “Rejoinder: How Methodological Decisions Impact the Validity of Moral Content Analyses,” Communication Monographs 88, no. 3 (2021b): 389–93.

36 Rong Wang and Wenlin Liu, “Moral Framing and Information Virality in Social Movements: A Case Study of #HongKongPoliceBrutality,” Communication Monographs 88, no. 3 (2021b): 350–70.

37 Ibid.

38 Ibid., 357.

39 Ibid., 359.

40 Ibid.

41 Hopp and Weber, “Reflections on Extracting.”

42 Wang and Liu, “Different Pathways.”

43 Hopp and Weber, “Rejoinder.”

44 Rhetorical analysis, content analysis, and textual analysis are neither synonymous nor mutually exclusive. As Edward Schiappa argued with respect to “sophistic rhetoric,” the boundaries, practices, and traditions of the myriad methods are unsettled. Edward Schiappa, “Sophistic Rhetoric: Oasis or Mirage?,” Rhetoric Review 10, no. 1 (1991): 5–18.

45 Hopp and Weber, “Reflections on Extracting.”

46 Wang and Liu, “Different Pathways.”

47 Ibid.; Rong Wang, “Marginality and Team Building in Collaborative Crowdsourcing,” Online Information Review 44, no. 4 (2020): 827–46.

48 Wang and Liu, “Different Pathways.”

49 Hopp et al., “The Extended Moral.”

50 Wang and Liu, “Different Pathways.”

51 Hopp and Weber, “Rejoinder,” 390.

52 Ibid., 391.

53 Ibid., 392.

54 Hopp and Weber, “Reflections on Extracting,” 374.

55 Ibid., 380.

56 Wang and Liu, “Different Pathways.”

57 Ibid., 381; see Seth C. Lewis, Rodrigo Zamith, and Alfred Hermida, “Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods,” Journal of Broadcasting & Electronic Media 57, no. 1 (2013): 34–52.

58 Wang and Liu, “Different Pathways,” 382.

59 Hopp and Weber, “Reflections on Extracting,” 374–75.

60 Wang and Liu, “Different Pathways,” 383.

61 Hopp and Weber, “Rejoinder;” Wang and Liu, “Different Pathways.”

62 Darrell Wanzer-Serrano, “Delinking Rhetoric, or Revisiting McGee's Fragmentation Thesis through Decoloniality,” Rhetoric and Public Affairs 15, no. 4 (2012): 647–57.

63 Diane Belcher and Gayle Nelson, Critical and Corpus-based Approaches to Intercultural Rhetoric (Ann Arbor, MI: University of Michigan Press, 2013), 1.

64 Ibid., 1.

65 Michael Calvin McGee, “The ‘Ideograph:’ A Link between Rhetoric and Ideology,” Quarterly Journal of Speech 66, no. 1 (1980): 1–16.

66 Matthew Houdek, “Racial Sedimentation and the Common Sense of Racialized Violence: The Case of Black Church Burnings,” Quarterly Journal of Speech 104, no. 3 (2018): 279–306, 286.

67 Examples of critical scholarship that has deepened both rhetorical theory and applied rhetoric: Dana L. Cloud, “‘To Veil the Threat of Terror:’ Afghan Women and the <Clash of Civilizations> in the Imagery of the U.S. War on Terrorism,” Quarterly Journal of Speech 90, no. 3 (2004): 285–306; Richard D. Pineda and Stacey K. Sowards, “Flag Waving as Visual Argument: 2006 Immigration Demonstrations and Cultural Citizenship,” Argumentation and Advocacy 43, no. 3–4 (2007): 164–74; Wanzer-Serrano, “Delinking Rhetoric.”

68 Miller, “Public Argument.”

69 Ibid., 363–64.

70 Ibid., 370.

71 Ibid., 372–73.

72 Ibid., 373.

73 Ibid.

74 Ibid., 376.

75 Catherine L. Langford, “On Making <Person>s: Ideographs of Legal <Person>hood,” Argumentation and Advocacy 52, no. 2 (2015): 125–40. Note that Langford described Judge Parker's rhetoric as having the force of law. Our attorney-rhetorician disagrees and views the language as nonbinding dicta, since it is extraneous to the issues before the court.

76 Ibid., 137.

77 Ibid., 137.

78 Miller, “Public Argument;” see Edward Schiappa, “Analyzing Argumentative Discourse from a Rhetorical Perspective: Defining ‘Person’ and ‘Human Life’ in Constitutional Disputes over Abortion,” Argumentation 14, no. 3 (2000): 315–32.

79 Ibid., 138.

80 Ibid., 137–38.

81 Ibid., 138.

82 Ibid.

83 Ibid.

84 Paul Achter, “Rhetoric and the Permanent War,” Quarterly Journal of Speech 102, no. 1 (2016): 79–94, 93.

85 Defense Manpower Data Center, Number of Military and DoD Appropriated Fund (APF) Civilian Personnel Permanently Assigned (Washington, DC: DMDC, August 7, 2021), https://dwp.dmdc.osd.mil/dwp/app/dod-data-reports/workforce-reports

86 Achter, “Rhetoric and the Permanent War;” Paul Achter, “Unruly Bodies: The Rhetorical Domestication of Twenty-First-Century Veterans of War,” Quarterly Journal of Speech 96, no. 1 (2010): 46–68; Rebecca Izzo, “In Need of Correction: How the Army Board for Correction of Military Records is Failing Veterans with PTSD,” Yale Law Journal 123, no. 5 (2014): 1587–606; Robert L. Wilkie, Guidance to Military Discharge Review Boards and Boards for Correction of Military/Naval Records Regarding Equity, Injustice, or Clemency Determinations (Washington, DC: United States Department of Defense, 2018), https://arba.army.pentagon.mil/documents/Wilke20180725JusticeEquityClemency.pdf

87 John W. Brooker, Evan R. Seamone, and Leslie C. Rogall, “Beyond TBD: Understanding VA's Evaluation of a Former Service Member's Benefit Eligibility following Involuntary or Punitive Discharge from the Armed Forces,” Military Law Review 214, no. 1 (2012): 1–328; Umar Moulta-Ali and Sidath Viranga Panangala, Veterans’ Benefits: The Impact of Military Discharges on Basic Eligibility (Washington, DC: Congressional Research Service, 2015), https://sgp.fas.org/crs/misc/R43928.pdf

88 Margaret Kuzma, Dana Montalto, Elizabeth R. Gwin, and Daniel L. Nagin, Military Discharge Upgrade Legal Practice Manual (Chicago, IL: American Bar Association, 2021).

89 Ibid.

90 Arkansas General Assembly, Act 1253: An Act to Provide Lifetime Hunting Licenses and Fishing Licenses to Certain Disabled Veterans; and for Other Purposes (Little Rock, AR: Arkansas General Assembly, 2013a).

91 Arkansas General Assembly, Act 444: An Act to Extend Veterans Preference in Hiring to School Districts; to Clarify the Veterans Preference Law; and for Other Purposes (Little Rock, AR: Arkansas General Assembly, 2013b).

92 Elishewah Weisz, “Stolen Valor: The People Who Commit Military Impersonation,” Ph.D. diss. (Sam Houston State University, 2016).

93 Alabama Secretary of State, Legislative Acts, https://www.sos.alabama.gov/government-records/legislative-acts

94 United States Census Bureau, U.S. and World Population Clock, https://www.census.gov/popclock/

95 U.S. Department of Veterans Affairs, National Center for Veterans Analysis and Statistics: Veteran Population, https://www.va.gov/vetdata/veteran_population.asp

96 Ronald K. Snell, State Experiences with Annual and Biennial Budgeting (Denver, CO: National Conference of State Legislatures, 2011).

97 Hemlata Shelar, Gagandeep Kaur, Neha Heda, and Poorva Agrawal, “Named Entity Recognition Approaches and their Comparison for Custom NER Model,” Science & Technology Libraries 39, no. 3 (2020): 324–37.

98 Friedemann Vogel, Hanjo Hamann, and Isabelle Gauer, “Computer-assisted Legal Linguistics: Corpus Analysis as a New Tool for Legal Studies,” Law & Social Inquiry 43, no. 4 (2018): 1340–63.

99 Our AI/ML team member had previously noted the relationship between named entity and personal pronoun scores. She then ran a Pearson's product-moment correlation test in RStudio (i.e., cor.test(subdata$majorprop, subdata$personprop)) and found a statistically significant positive correlation between the two (i.e., cor = 0.3001994, t = 4.7417, df = 227, p-value = 3.744e-06; 95% confidence interval: 0.1774648 to 0.4137386).

100 Defense Manpower Data Center, Number of Military and DoD Appropriated Fund (APF) Civilian Personnel [Duty State/Country column] (Washington, DC: DMDC, December 31, 2021), https://dwp.dmdc.osd.mil/dwp/app/dod-data-reports/workforce-reports

101 Ibid. 

102 Jessica Learish and Elisha Fieldstadt, “Gun Map: Ownership by State,” CBS News, April 14, 2022, https://www.cbsnews.com/pictures/gun-ownership-rates-by-state/51/

103 Thomas S. Kuhn, The Structure of Scientific Revolutions [50th anniversary ed.] (Chicago, IL: The University of Chicago Press, 2012).

104 Wang and Liu, “Different Pathways.”

105 On modern memorialization rhetoric in the U.S., including legislator speeches, see Bradford Vivian, “Neoliberal Epideictic: Rhetorical Form and Commemorative Politics on September 11, 2002,” Quarterly Journal of Speech 92, no. 1 (2006): 1–26.
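
The correlation statistics reported in note 99 can be cross-checked from the published numbers alone. The sketch below is not the authors' code (they report using R's cor.test); it is a Python illustration that assumes the standard relationships for a Pearson test: t = r·sqrt(df)/sqrt(1 − r²), n = df + 2, and a confidence interval built on the Fisher z-transformation. The function name pearson_t_and_ci is hypothetical and introduced here only for illustration.

```python
import math
from statistics import NormalDist


def pearson_t_and_ci(r, df, level=0.95):
    """Recompute the t statistic and Fisher-z confidence interval
    for a Pearson correlation r with df = n - 2 degrees of freedom.

    Hypothetical helper for illustration; assumes the textbook
    formulas rather than any specific software's internals.
    """
    n = df + 2
    # t statistic for testing the null hypothesis of zero correlation
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # Fisher z-transformation and its approximate standard error
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value from the standard normal distribution
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (math.tanh(z - crit * se), math.tanh(z + crit * se))
    return t, ci


# Values taken from note 99
t_stat, (ci_low, ci_high) = pearson_t_and_ci(r=0.3001994, df=227)
print(f"t = {t_stat:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

Under these assumptions, the recomputed t statistic and interval should agree with the values in note 99 to roughly three decimal places, which suggests the reported figures are internally consistent. The p-value is not recomputed here because the Python standard library does not provide the t distribution.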
