ABSTRACT
Automatic sentiment analysis is used extensively in political science. The digitization of legislative transcripts has increased the potential application of established tools for the automated analyses of emotion in text. Unlike in writing, however, expressing emotion in speech involves intonation, facial expressions, and body language. Drawing on a new dataset of annotated texts and videos from the Canadian House of Commons, this paper does three things. First, we examine whether transcripts capture the emotional content of speeches. We find that transcripts capture sentiment, but not emotional arousal. Second, we compare strategies for the automated analysis of sentiment in text. We find that leading approaches performed reasonably well, but sentiment dictionaries generated using word embeddings surpassed these other approaches. Finally, we test the robustness of the approach based on word embeddings. Although the methodology is reasonably robust to alternative specifications, we find that dictionaries created using word embeddings are sensitive to the choice of seed words and to training corpus size. We conclude by discussing the implications for analyses of political speech.
Acknowledgments
This paper was improved by feedback received at the Centre for the Study of Democratic Citizenship at McGill, the Department of Political Science at Université Laval, the Department of Political Science at Western University, the 2nd Annual Politics and Computational Social Science Conference, the 115th Conference of the American Political Science Association, the 2019 Conference of the Canadian Political Science Association, and, especially, from comments by Jacob Montgomery, Bryce Dietrich, J. Scott Matthews, Sven-Oliver Proksch, François Pétry, Yannick Dufresne, and David Armstrong. We are also grateful for the exceptionally detailed and constructive criticism from the Journal’s anonymous reviewers. We also thank Meghan Snider, Pierre-Oliver Bonin, Jason Vandenbeukel, Katie Moez, Stefan Ferraro, and Justin Savoie for their excellent assistance coding. We are responsible for any remaining errors.
Disclosure Statement
No potential conflict of interest was reported by the authors.
Data Availability Statement
The data described in this article are openly available in the Open Science Framework at https://doi.org/10.17605/OSF.IO/VUTW4.
Open Scholarship
This article has earned the Center for Open Science badges for Open Data, Open Materials and Preregistered. The data and materials are openly accessible at https://doi.org/10.17605/OSF.IO/VUTW4.
Correction Statement
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Notes
1. In three cases, there were minor distortions in the indicated videos (e-mail dings). The previous sentence was used for those cases as well.
2. The missing speech implied that government ministers were taking bribes, and we suspect it was withdrawn from Hansard by the Member.
3. Although video coders may have coded the same video more than twice, we restricted the analysis to their first two scores because the text coders coded each snippet only twice.
4. For general summaries of these methods and their applications, see, for instance, Quinn et al. (2010), Cambria et al. (2013), Grimmer and Stewart (2013), Wilkerson and Casas (2017), and Benoit (2019).
5. There are two popular variants. The Continuous Bag of Words (CBOW) algorithm assigns vectors to maximize the likelihood of a word appearing, given its context. The Skip Gram algorithm assigns vectors to maximize the likelihood of contexts appearing, given each word. We use the CBOW algorithm.
6. Pang and Lee (2008) summarize the history of this development.
7. Amir et al. (2015) also used word embeddings to predict the sentiment of Twitter terms using a labeled set of words and phrases, but using a regression-based approach.
8. For a detailed examination of the relevance of arithmetic operations performed on word embeddings, see Ethayarajh et al. (2018).
9. For each approach, unclassified sentences are not included in the calculations of accuracy and R-squared. We exclude them because their inclusion as “neutral” classifications substantially reduces the performance of these dictionaries, and, for the purposes of comparing established dictionaries to the approach based on word embeddings, we wanted to represent the established dictionaries in the strongest possible way. In response to a helpful suggestion, we also experimented by including in the Lexicoder analysis the entire paragraph surrounding the sentences that we extracted. In effect, this meant that a greater number of words would align with the Lexicoder dictionary, which could conceivably result in a better classification of the context of the sentence in our analysis. In following this approach, we found a small decrease in the accuracy of Lexicoder’s classification and in the amount of variance that it explained, but an appreciable decrease, from 31% to 10%, in the proportion of our sentences that Lexicoder was unable to classify.
10. We are confident that supervised models trained on annotated parliamentary text would represent an excellent strategy for analyzing sentiment in parliamentary corpora, provided that there was enough annotated data with which to train the models. Parliamentary data are not normally annotated for sentiment, however, and the process of annotating them is time consuming and costly.
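As an illustration of the scoring rule described in note 9, the following minimal Python sketch computes accuracy over classified sentences only and reports the share of sentences left unclassified. It also shows why counting unclassified sentences as "neutral" lowers measured accuracy. The function names, the label coding (1 = positive, -1 = negative, 0 = neutral, None = unclassified), and the toy data are assumptions made for this example; they are not taken from the replication materials.

```python
from typing import Optional, Sequence, Tuple


def accuracy_excluding_unclassified(
    predicted: Sequence[Optional[int]], gold: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy among classified sentences, share unclassified).

    A prediction of None marks a sentence the dictionary could not classify
    (hypothetical coding chosen for this sketch).
    """
    if len(predicted) != len(gold) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep only the sentences that received a classification.
    classified = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    share_unclassified = (len(predicted) - len(classified)) / len(predicted)
    if not classified:
        return float("nan"), share_unclassified
    correct = sum(1 for p, g in classified if p == g)
    return correct / len(classified), share_unclassified


def accuracy_with_neutral(
    predicted: Sequence[Optional[int]], gold: Sequence[int]
) -> float:
    """Accuracy when unclassified sentences are scored as neutral (0)."""
    if len(predicted) != len(gold) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    filled = [0 if p is None else p for p in predicted]
    return sum(1 for p, g in zip(filled, gold) if p == g) / len(gold)


# Toy data, invented for illustration only.
predicted = [1, None, -1, 1, None, -1, 1, -1, 1, None]
gold = [1, 1, -1, -1, -1, -1, 1, 1, 1, -1]

acc, missing = accuracy_excluding_unclassified(predicted, gold)
acc_neutral = accuracy_with_neutral(predicted, gold)
print(f"accuracy (classified only): {acc:.3f}")
print(f"share unclassified: {missing:.2f}")
print(f"accuracy (unclassified as neutral): {acc_neutral:.3f}")
```

Reporting the unclassified share alongside accuracy keeps the trade-off visible: a dictionary can appear more accurate simply because it declines to classify more sentences.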
Additional information
Notes on contributors
Christopher Cochrane
Christopher Cochrane is an Associate Professor in the Department of Political Science, University of Toronto.
Ludovic Rheault
Ludovic Rheault is an Associate Professor in the Department of Political Science, University of Toronto.
Jean-François Godbout
Jean-François Godbout is a Professor in the Department of Political Science, Université de Montréal.
Tanya Whyte
Tanya Whyte is a Ph.D. recipient from the Department of Political Science, University of Toronto.
Michael W.-C. Wong
Michael W.-C. Wong (M.Phil., Oxford) is a Research Assistant in the Department of Political Science at the University of Toronto Scarborough.
Sophie Borwein
Sophie Borwein is a Ph.D. recipient from the Department of Political Science, University of Toronto.