Statistical Fallacies in Claims about “Massive and Widespread Fraud” in the 2020 Presidential Election: Examining Claims Based on Aggregate Election Results

Article: 2289529 | Received 03 Sep 2022, Accepted 23 Nov 2023, Published online: 10 Jan 2024
 

Abstract

Years after the election, a substantial portion of the electorate, including a significant majority of Republican voters and numerous Republican officials, continue to believe that the 2020 election was stolen. This essay reviews claims of alleged massive electoral fraud in the 2020 U.S. presidential election. These claims are based on analyses of aggregate-level election data. Although the underlying data in each of the 13 claims we review are accurately described, our review reveals that the interpretations of the election data, which suggest massive fraud, are based on invalid statistical or illogical reasoning. In summary, the conclusions about fraud derived from these statistical analyses are categorically incorrect. We believe this article will serve as a valuable educational tool for the press, the public, and students. It underscores the dangers of misusing statistical inference and emphasizes the importance of accurate statistical analysis in political discourse. By discussing statistical fallacies in a nontechnical manner, we aim to make our critiques accessible to a broad, nonspecialist audience. This significantly contributes to the understanding of misinformation and its impact on democracy and public trust in electoral processes. Supplementary materials for this article are available online.

Supplementary Materials

Supplementary material includes all data and replication code needed to reproduce tables, figures, and statistical analyses in this paper. Please go to www.tandfonline.com/r/JSPP.

Acknowledgments

This research received partial support from the Peltason Chair of Democracy Studies at the University of California, Irvine. The opinions presented are exclusively those of the authors. We extend our gratitude to Sean Birch for his invaluable assistance, and to Dan Silverman and John Chin for their meticulous line-by-line revisions. Our thanks also go to the reviewers and editor for their constructive suggestions. The article benefited significantly from insights gained at the Carnegie Mellon Institute for Strategy & Technology’s Political Science Research Workshop. Additionally, the European Political Science Association panel, especially discussant Jeff Gill, provided crucial feedback.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1 For example, see Figure 3. Donald Trump has a history of making claims about fraud in elections. After the 2016 election, in which he was victorious, he claimed that millions of non-citizens voted, preventing him from winning the popular vote. Those claims were also false (Cottrell, Herron, and Westwood, 2018). Before that, Trump promoted the claim that President Obama was not born in the United States, and thus not eligible to be president (Reeve, 2012; Jardina and Traugott, 2019). This, too, is a false claim.

2 Evidence suggests that these voters sincerely hold these beliefs (Cuthbert and Theodoridis, 2022; Fahey, 2023; see Peterson and Iyengar, 2021).

3 Over 170 federal or statewide candidates who denied the legitimacy of the 2020 U.S. presidential election won their general election contests in November 2022. Thus, although many of the more than 291 election-denying candidates for these offices who emerged from the primaries lost in the general election, some won, and some can exert direct or indirect influence on future election administration. See, for example, Blanco, Wolfe, and Gardner (2022).

4 Nearly three in 10 Americans reacted positively to the statement “If elected leaders will not protect America, the people must do it themselves even if it requires taking violent actions” (Cox, 2021).

5 “Of the 64 cases brought by Trump and his supporters, twenty were dismissed before a hearing on the merits, fourteen were voluntarily dismissed by Trump and his supporters before a hearing on the merits, and 30 cases included a hearing on the merits. Only in one Pennsylvania case involving far too few votes to overturn the results did Trump and his supporters prevail” (Danforth et al., 2022, p. 3).

6 Former President Trump and some of his supporters are now indicted in two cases, one in federal court and one in state court. The first of these cases is in federal court in Washington, D.C., and includes four counts. Prosecutors argue that the claims made by Trump “were false, and [Trump] the defendant knew that they were false” (Indictment, p. 7). They also charge that he “deliberately disregarded the truth.” The second case is in state court in Georgia (Sullivan, Ax, and Lynch, 2023). That indictment details various alleged offenses by Trump and his associates, including giving false testimony to legislators about election fraud and pressuring state officials to breach their official duties by nullifying the election outcomes.

7 In April of 2023, Dominion, a company that manufactures voting machines, reached a settlement of $787.5 million in a defamation lawsuit against Fox News for the propagation of claims that Dominion machines changed votes from Trump to Biden (Poniewozik, 2023). To prove defamation, one must show “‘actual malice’–that the statement was made with knowledge of its falsity or with reckless disregard of whether it was true or false.” New York Times v. Sullivan, 376 U.S. 254 (1964). In documents that were revealed during the discovery phase of the trial, Fox News hosts admitted that they did not believe the claims they were making on television but repeated them because “[o]ur viewers are good people and they believe it” (Poniewozik, 2023).

8 We hold the view that statements supported by statistical evidence can be especially pernicious. This is demonstrated by how challenges to established scientific consensus can sway public opinion, as indicated in the study by Lewandowsky, Gignac, and Vaughan (2013).

9 Describing the Russian model of propaganda, RAND coined the phrase “firehose of falsehood,” so called because of the distinctive pattern of “high numbers of channels and messages and a shameless willingness to disseminate partial truths or outright fictions” (Paul and Matthews, 2016). We prefer to think of such claims as a hydra-headed monster: for every piece of misinformation corrected, multiple new ones are created (“Since the Hydra could replace old heads with multiple new ones…logic requires that the creature boasted different numbers of heads at different times.” Ogden, 2013).

10 We would also note that, in the four years before the election, the claim that the 2020 election would be stolen by the Democrats was also repeatedly asserted by President Trump and his supporters, thus preparing the way for the post-election claims of fraud.

11 See, for example, Bump (2022). We will, however, make a few points about misperceptions and misinformation in the concluding section of this article.

12 We do, however, believe that the framework of classifying election-related statistical fallacies that we offer may prove useful to other scholars.

13 We would call particular attention to the excellent survey and analysis of 2020 presidential election fallacies by Eggers, Garro, and Grimmer (2021). We began writing our own essay before we were familiar with this article in its online form, but we have learned a lot from it and cite it herein. Their detailed rebuttal of fallacies is more technical than ours, and we cover some fallacies that they do not. There are also various journalistic surveys of election fraud claims (Corasaniti, Epstein, and Rutenberg, 2020; Feldman, 2020; Alba and Frenkel, 2021), but these primarily include claims that do not fall within the scope of this essay.

14 Moreover, in understanding Electoral College outcomes, we need also to look at the geographic location of each candidate’s support. See Box 2.

15 As of 2020, there were 3143 total counties and county equivalents. In our data, we have 3153 because Alaska reports election results from Electoral Districts, of which there were 10 more than their county equivalents.

16 Data are estimated, as no official national election results are compiled. Dave Leip’s Atlas of Presidential Elections, a well-reputed compiler of election results, reports totals that do not match the certified, official federal election results produced by the FEC. The Brookings Institution reports the differential to be 2588 Trump counties to 551 Biden counties (Frey, 2021).

17 The number of votes cast in Los Angeles County surpassed the total votes in 39 states!

18 For a nice overview of political graphics of different kinds, including a cartogram of the 2020 presidential election at the county level, see Bliss and Patino (2020). See also Cartographic Views of the 2020 US Presidential Election - Worldmapper (2020).

19 The bubble size gradations on a bubble map make this task easier than the color variations on a cartogram, which usually has a limited number of victory margin categories. Note, though, that the circle size in our map is scaled using the square root of the vote total, so the circle size itself is not a linear scale.
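
The square-root scaling described in note 19 can be sketched as follows. This is a minimal illustration, not the code used for our map: it assumes that "circle size" means the circle's radius, and the function name `bubble_radius`, the `max_radius` parameter, and the vote totals are hypothetical. Under that assumption, a circle's area, rather than its radius, is proportional to the vote total.

```python
import math

def bubble_radius(votes, max_votes, max_radius=20.0):
    """Radius proportional to sqrt(votes); area is then proportional to votes.

    Hypothetical helper: max_radius is the radius given to the largest unit.
    """
    return max_radius * math.sqrt(votes / max_votes)

# Hypothetical vote totals: one unit has four times the votes of the other.
r_small = bubble_radius(250_000, 1_000_000)
r_large = bubble_radius(1_000_000, 1_000_000)

radius_ratio = r_large / r_small        # radius only doubles
area_ratio = (r_large / r_small) ** 2   # area quadruples, matching the votes
```

Readers comparing circles by width will therefore understate differences in vote totals, which is why the scale should be read as nonlinear.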

21 Top row is the proportion of the electorate for each demographic group.

22 Academics and journalists, including ourselves, often limit our analysis of election results to the two major parties. To do so, we convert results into the two-party vote share and eliminate all non-major party candidates. But doing so can obscure important insights, such as those highlighted in this section.
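
The two-party conversion mentioned in note 22 can be sketched as below. The function name and the vote counts are hypothetical and serve only to show how dropping non-major-party votes changes the reported share.

```python
def two_party_share(dem_votes, rep_votes):
    """Democratic share of the two-party vote (third-party votes excluded)."""
    total = dem_votes + rep_votes
    if total == 0:
        raise ValueError("no major-party votes cast")
    return dem_votes / total

# Hypothetical jurisdiction: 4,800 Democratic, 4,700 Republican, 500 other.
dem, rep, other = 4800, 4700, 500

share_all_votes = dem / (dem + rep + other)   # share of all ballots cast
share_two_party = two_party_share(dem, rep)   # share after the conversion
```

In this hypothetical case the Democratic candidate holds a plurality of all votes but appears to hold a majority after the conversion, and the 500 votes cast for other candidates disappear from view.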

23 As we noted earlier, we are providing a compendium; this and other claims of Dr. Cicchetti have already been rebutted elsewhere (see, e.g., http://web.archive.org/web/20220416221931/https://reason.com/volokh/2020/12/09/more-on-statistical-stupidity-at-scotus/ and http://web.archive.org/web/20220416221815/https://statmodeling.stat.columbia.edu/2020/12/08/the-p-value-is-4-76x10%E2%88%92264-1-in-a-quadrillion/). Dr. Cicchetti’s report has also been devastatingly critiqued in the expert witness report of Gary King in the same case. Claims about election fraud in Texas v. Pennsylvania, including Dr. Cicchetti’s report, can be found here: https://www.supremecourt.gov/DocketPDF/22/22O155/163048/20201208132827887_TX-v-State-ExpedMot%202020-12-07%20FINAL.pdf

24 In the hearings by the House Select Committee investigating January 6, this discrepancy between the patterns in early and late votes was referred to as a “red mirage.”

25 Perhaps this is because states with more Democrats allow for more widespread use of mail-in balloting. For example, eight states conduct general elections entirely by mail: California, Colorado, Hawaii, Nevada, Oregon, Utah, Vermont, and Washington. See Voting Outside the Polling Place: Absentee, All-Mail and Other Voting at Home Options (2022). All of these, except Utah, awarded their electors to both Democratic nominees, Clinton (2016) and Biden (2020).

26 Two states, Connecticut (Conn. Gen. Stat. §9-150a) and Ohio (Ohio Rev. Code §3509.06), do not specify when counting may begin. For more information, see Voting Outside the Polling Place: Absentee, All-Mail and Other Voting at Home Options (2022; Table 16).

27 This study compares the assumption of a uniform distribution of birth year and of names with actual data and shows that assuming a uniform distribution over a 64-year interval and over the names in the dataset tends to underestimate the prevalence of birth-year matching by about 12%, that is, (487 − 433)/433 (see p. 119). The more there are birth bulges (as in the post-WWII baby boom), the more likely it is that two randomly chosen individuals will share the same year of birth, and some names are much more prevalent than others.
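
The direction of the bias described in note 27 can be illustrated with a short calculation. The probability that two randomly chosen individuals share a birth year is the sum of the squared year shares, which is smallest when every year is equally likely. The sketch below is only illustrative: the function name and the "bulge" weights are hypothetical and are not taken from the study's data.

```python
def match_probability(weights):
    """Probability that two independent draws fall in the same category."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)

# Uniform assumption: 64 equally likely birth years.
uniform = [1.0] * 64

# Hypothetical birth bulge: 16 of the 64 years are 50% more common.
bulge = [1.0] * 48 + [1.5] * 16

p_uniform = match_probability(uniform)   # exactly 1/64
p_bulge = match_probability(bulge)       # larger than 1/64

relative_understatement = (p_bulge - p_uniform) / p_uniform
```

Any departure from uniformity raises the match probability, so an analyst who assumes uniform birth years will expect fewer coincidental matches than actually occur.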

28 Moreover, as a further complication, we must consider the degree of heterogeneity in the distribution of names. Calculating this probability from the name distribution in the population requires calculating an average based on name and surname. Names vary greatly in their frequency and vary geographically. The U.S. Bureau of the Census publishes a list of surname frequencies based on national values, but the distribution of names (and their degree of heterogeneity) will vary with the racial/ethnic composition of the political unit (Grofman and Garcia, 2014, 2015).

29 Birth year and name are not fully independent of one another. Changing patterns of immigration affect the relative surname shares in different generations, and some rather dramatic changes over time in the popularity of first names mean that first name probabilities are not independent of the year of birth. Also, first and last names are far from independent since both are linked to ethnicity. Indeed, we could, in principle, estimate birth decade probabilities just by looking at the prevalence of first and last names of those born during the decade (see Grofman and Garcia, 2014, 2015).

30 Note that the probability of finding such a match from among a set of n voters must not be confused with the probability of any specific name + birthday + birth year combination being repeated. For obvious reasons, the likelihood of a given combination being repeated will depend, ceteris paribus, on how common the name is. We can further simplify by assuming a uniform distribution across the first two factors and assess the likelihood of two randomly chosen individuals bearing the same name from the name distribution in empirical data.
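
The distinction drawn in note 30 is the same one that underlies the familiar birthday problem. The sketch below is a simplified illustration, assuming k equally likely and independent combinations; real name and birth-date distributions are not uniform, and the function names are hypothetical.

```python
def prob_any_match(n, k):
    """P(at least two of n individuals share a combination), k equally likely."""
    if n > k:
        return 1.0
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (k - i) / k
    return 1.0 - p_no_match

def prob_specific_pair_match(k):
    """P(two particular individuals share a combination), k equally likely."""
    return 1.0 / k

# Classic illustration with 365 equally likely birthdays.
p_group = prob_any_match(23, 365)        # just over one half
p_pair = prob_specific_pair_match(365)   # about 0.003
```

A match somewhere in a large voter file is therefore close to certain even when a match for any particular pair of records is extremely unlikely.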

31 And, of course, this assumes that the probabilities are independent of each other.

32 For further compatibility with the McDonald and Levitt (2008) study, in the model presented below, we also took our birth year time-period as a 64-year span.

33 Mebane (2020) notes that “It is widely understood that the first digits of precinct vote counts are not useful for trying to diagnose election frauds.”

35 See, for example, Deckert, Myagkov, and Ordeshook (2011). Even Mebane, who has repeatedly applied the Law as a fraud detector in elections in multiple countries, has reiterated that while its violation might be taken as suggesting an anomaly, a violation of the supposed law does not prove fraud. Fraud would need to be directly investigated (Mebane, 2020).
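
For readers unfamiliar with the digit test discussed in notes 33 and 35, the sketch below assumes that "the Law" refers to Benford's law, under which a leading digit d is expected with probability log10(1 + 1/d). The precinct totals are hypothetical and chosen only to show why units of similar size need not follow that pattern; the function names are likewise illustrative.

```python
import math

def benford_first_digit_probs():
    """Expected first-digit frequencies under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(counts):
    """Observed share of each leading digit among positive counts."""
    positive = [c for c in counts if c > 0]
    shares = {d: 0.0 for d in range(1, 10)}
    for c in positive:
        shares[int(str(c)[0])] += 1 / len(positive)
    return shares

expected = benford_first_digit_probs()

# Hypothetical precincts of similar size (a few hundred votes each).
precinct_votes = [412, 388, 455, 397, 501, 430, 366, 478]
observed = first_digit_shares(precinct_votes)
```

Here no count begins with 1, although Benford's law expects about 30% to do so. The departure arises simply because the precincts are of similar size, which is one reason a violation by itself cannot establish fraud.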

36 In Pennsylvania, this included pointing viewers to a website developed by the Department of State. Ads featuring prominent actors and athletes from the state explained how to find a location to cast a ballot or how to cast a ballot by mail.

37 Trump’s White House Press Secretary Kayleigh McEnany falsely claimed that “[a] mere survey of the facts shows that voters in blue counties like Philadelphia County were given certain privileges that voters in red counties were not afforded. The Constitution’s equal protection clause requires uniform standards, but Democrat election officials created disparities depending on where citizens lived and where they voted in the state.” However, the PA Department of State gave guidance to all counties that they could allow voters to make corrections to rejected mail-in ballots. Some counties, predominantly Trump-majority in their partisanship, chose not to.

38 See Morris Fiorina (2017), “Unstable Majorities,” and Frances Lee (2016), “Insecure Majorities.” Modern elections are very likely to result in divided government, and control of any branch of government is often won or lost in the margins. Partisan bias, such as that introduced by malapportionment or gerrymandering, can also affect the ability to carry marginal House or Senate seats.

39 For more details on Biden’s overperformance compared to U.S. House Democratic candidates, see William A. Galston (2020).

40 House Democrats nationally fell short of their 2018 performance, which accounts for the fourteen net seats gained by the Republicans. Relative to the 115th Congress (2017–2019), the 117th Congress (2021–2023) has 28 more Democrats.

41 Figure omitted for space reasons.

42 An increase of 22 from 2016.

43 A decrease by 19 from 2016.

44 This “thread” is filled with statistics purportedly showing how Biden is the historic underdog going into the 2020 election (by this account, Trump should have been heavily favored), for example, “Incumbents are 6/6 when facing re-election during civil unrest”. See more: https://twitter.com/davidchapman141/status/1315440579485069314?s=20 Internal links to this claim on Twitter say that the first primary was in 1912 and that Trump had received a higher percentage of the primary vote than Eisenhower, Nixon, Clinton, and Obama. And it is noted that only five incumbents have received at least 90% of their primary vote.

45 He unnecessarily complicates this by plotting on the y-axis the difference between the split-ticket vote and the straight-ticket vote, as explained in the following paragraphs.

46 This rebuttal to Ayyadurai (2020) is presented in Eggers, Garro, and Grimmer (2021), who acknowledge its previous elucidation by Kabir (2020) and Parker (2020), each of whom shows empirical evidence of the party-independence of results. However, the main rebuttal by Eggers, Garro, and Grimmer to the Ayyadurai (2020) claim uses the logic of latent variable analysis, demonstrating how regression-to-the-mean effects lead to negatively sloped regression lines in the situation posited by Ayyadurai.

47 Not shown due to space constraints. Can be seen at the 39 min mark in his original video.

48 For more information about Lott’s research in his own words, see his blog post at RealClearPolitics titled “New Peer-Reviewed Research Finds Evidence of 2020 Voter Fraud” (Lott, 2020). To the best of our knowledge, this research has not yet been published in a peer-reviewed journal.