Regular Articles

Wikipedia: a challenger’s best friend? Utilizing information-seeking behaviour patterns to predict US congressional elections

Hamza Salem & Fabian Stephany
Pages 174-200 | Received 23 Oct 2020, Accepted 07 Jun 2021, Published online: 28 Jun 2021
 

ABSTRACT

Election prediction is a perennial topic in the political science literature. Traditionally, such efforts have drawn on polling aggregates, economic indicators, partisan affiliation, and campaign effects to predict aggregate voting outcomes. With the increasing secondary use of online-generated data in the social sciences, researchers have begun to consult metadata from widely used web platforms such as Facebook, Twitter, Google Trends and Wikipedia to calibrate forecasting models. These platforms allow voters to retrieve detailed campaign-related information, and allow researchers to study the popularity of campaigns and the public sentiment surrounding them. However, past contributions have often overlooked the interaction between conventional election variables and information-seeking behaviour patterns. In this work, we aim to unify traditional and novel methodology by considering how information retrieval differs between incumbent and challenger campaigns, and how perceived candidate viability and media coverage affect Wikipedia’s predictive ability. To test our hypotheses, we use data from the 2016 and 2018 United States Congressional (Senate and House) elections. We demonstrate that Wikipedia data, as a proxy for information-seeking behaviour patterns, is particularly useful for predicting the success of well-funded challengers who receive comparatively little media coverage. More generally, our findings underline the importance of a mixed-data approach to predictive analytics in computational social science.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 We do not assume that all pageviews translate into potential voters, and we acknowledge that a proportion of visits may come from non-standard editors (Göbel & Munzert, 2018). Notwithstanding such visitors, we believe that observing pageviews and vote share across two congressional election cycles should be indicative of a potential relationship between pageviews and voting outcomes.

2 The United States Congress consists of 100 elected senators, two from each of the 50 US states, and 435 voting members of the House of Representatives, each representing a single congressional district (plus six non-voting delegates, for 441 House members in total).

3 Election Day falls on the Tuesday after the first Monday in November of even-numbered years, i.e. between 2 and 8 November.
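As a minimal sketch of the rule in note 3, the Python function below computes Election Day for a given even year using only the standard library. The function name `election_day` is a hypothetical illustration, not part of the authors' materials.

```python
import calendar
import datetime


def election_day(year: int) -> datetime.date:
    """Return US federal Election Day: the Tuesday after the first Monday in November."""
    if year % 2 != 0:
        raise ValueError("regular congressional elections are held in even-numbered years")
    # Locate the first Monday in November (date.weekday(): Monday == 0).
    first_of_november = datetime.date(year, 11, 1)
    days_to_monday = (calendar.MONDAY - first_of_november.weekday()) % 7
    first_monday = first_of_november + datetime.timedelta(days=days_to_monday)
    # Election Day is the Tuesday immediately following that Monday.
    return first_monday + datetime.timedelta(days=1)


for y in (2016, 2018):
    print(y, election_day(y).isoformat())
```

For the two cycles studied, this yields 8 November 2016 and 6 November 2018. Anchoring on the first Monday, rather than on "the first Tuesday after 1 November", makes the 2 to 8 November window explicit.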

4 The Federal Election Commission (FEC) is the regulatory body that monitors and enforces campaign finance laws and regulations in the US.

5 The News Archive is a large database of captioned television news broadcasts, segmented into consecutive 15-second clips.

6 We included CNN, MSNBC, Fox News, and local CBS affiliates.

7 For instance, a candidate with 6,500 pageviews at the end of the year, in a race where all candidates' pageviews totalled 78,000, would have a pageview ratio of 6,500/78,000 ≈ 0.083, or roughly 8 percent.
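The normalisation in note 7 can be sketched as follows, assuming pageviews have already been aggregated per candidate for a single race. The function `pageview_ratios` and the candidate labels are hypothetical illustrations, not the authors' code or data.

```python
def pageview_ratios(pageviews: dict[str, int]) -> dict[str, float]:
    """Return each candidate's share of all candidate pageviews in one race."""
    total = sum(pageviews.values())
    if total == 0:
        raise ValueError("race has no pageviews to normalise")
    # Dividing by the race total makes ratios comparable across races
    # with very different overall Wikipedia traffic.
    return {candidate: views / total for candidate, views in pageviews.items()}


# Hypothetical race reproducing the figures in note 7 (78,000 views in total).
race = {"Candidate A": 6_500, "Candidate B": 71_500}
ratios = pageview_ratios(race)
print(round(ratios["Candidate A"], 3))  # → 0.083
```

Because the ratios within a race sum to one, they can be compared directly with candidates' vote shares in that race.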

8 For House candidates, adjusted R² = .46, p < .001; for Senate candidates, adjusted R² = .50, p < .001.
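Note 8 does not specify the regression model behind these figures. For reference, the sketch below computes adjusted R² with the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The function name `adjusted_r2` and the toy data are hypothetical.

```python
def adjusted_r2(y: list[float], y_hat: list[float], n_predictors: int) -> float:
    """Adjusted R² for fitted values y_hat from a model with n_predictors regressors."""
    n = len(y)
    if len(y_hat) != n:
        raise ValueError("y and y_hat must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("y has zero variance")
    r2 = 1 - ss_res / ss_tot
    # Penalise R² for the number of predictors relative to the sample size.
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy example: R² = 0.99 with n = 5 and k = 1.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(adjusted_r2(y, y_hat, n_predictors=1), 4))  # → 0.9867
```

The adjustment matters when comparing House and Senate models, since the two samples differ greatly in size.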

Additional information

Notes on contributors

Hamza Salem

Hamza Salem is a researcher focused on election prediction using unconventional methods. He holds an MSc in Social Data Science from the University of Oxford and a BA from New York University (NYU).

Fabian Stephany

Dr. Fabian Stephany is a Lead Researcher in Computational Social Science at the Oxford Internet Institute (OII), University of Oxford, and a Research Affiliate at the Humboldt Institute for Internet and Society in Berlin. He develops digital policies in fields such as digital skills, migration, innovation, and e-governance. With the iLabour project at the OII, he studies the global dynamics of online labour markets.