ABSTRACT
In this work, we present an intelligent system for the automatic categorization of political documents, specifically the documents containing the parliamentary questions collected during the weekly Question Times at the Chamber of Deputies of the Italian Republic. The system relies on text classification models to categorize the documents. It is intended to support and facilitate the research activities of political science scholars, who carry out comparative and longitudinal analyses of thousands of documents. To select the best classification models for this specific task, we experimentally compared several classical machine learning and deep learning-based text classification models.
Acknowledgments
The authors of this article wish to thank Linda Basile, Enrico Borghetto and Francesco Visconti for their help in manually checking the output of the QTIS Web application generated using the 222 unlabelled PQs as input.
Declaration of interest statement
The authors declare there are no conflicts of interest.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
A copy of the dataset can be accessed here: https://tinyurl.com/5n8mfkeh.
Notes
7 The CAP coding scheme is based on the scheme developed in the early 1990s within the US Policy Agendas Project (www.policyagendas.org). Over time, some major topics were folded together, which is why major topic numbers 11 and 22 are missing.
8 The two datasets adopted in this work are still not publicly available. The Italian Team of the CAP plans to release a new version of the dataset after the termination of the current legislative period.
9 http://hlt.isti.cnr.it/wordembeddings/skipgram_wiki_window10_size300_neg-samples10.tar.gz
10 http://hlt.isti.cnr.it/wordembeddings/glove_wiki_window10_size300_iteration50.tar.gz
17 The QTIS web application will be publicly available upon acceptance.
19 http://redis.io/
Additional information
Funding
Notes on contributors
A. Cavalieri
A. Cavalieri is a research fellow at the University of Torino, Italy.
Pietro Ducange
P. Ducange is an associate professor at the Department of Information Engineering of the University of Pisa, Italy.
S. Fabi
S. Fabi is a software engineer at NBS srl, San Benedetto del Tronto, Ascoli Piceno, Italy.
F. Russo
F. Russo is an associate professor at the University of Salento, Lecce, Italy.
Nicola Tonellotto
N. Tonellotto is an assistant professor at the Department of Information Engineering of the University of Pisa.