ABSTRACT

In this work, we present an intelligent system for the automatic categorization of political documents, specifically the parliamentary questions collected during the weekly Question Times at the Chamber of Deputies of the Italian Republic. The system uses text classification models to categorize the documents. It is intended to support the research of political science scholars who carry out comparative and longitudinal analyses of thousands of documents. To select the best classification models for this task, we experimentally compared several classical machine learning and deep learning-based text classification models.

Acknowledgments

The authors of this article wish to thank Linda Basile, Enrico Borghetto and Francesco Visconti for their help in manually checking the output of the QTIS Web application generated using the 222 unlabelled PQs as input.

Declaration of interest statement

The authors declare there are no conflicts of interest.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

A copy of the dataset can be accessed here: https://tinyurl.com/5n8mfkeh.

Notes

7 The CAP coding scheme is based on the scheme developed in the early 1990s within the US Policy Agendas Project (www.policyagendas.org). Over time, some major topics were merged, which is why major topics number 11 and 22 are missing.

8 The two datasets used in this work are not yet publicly available. The Italian Team of the CAP plans to release a new version of the dataset after the current legislative term ends.

9 http://hlt.isti.cnr.it/wordembeddings/skipgram_wiki_window10_size300_neg-samples10.tar.gz

10 http://hlt.isti.cnr.it/wordembeddings/glove_wiki_window10_size300_iteration50.tar.gz

17 The QTIS web application will be publicly available upon acceptance.

19 http://redis.io/

Additional information

Funding

This work is partially supported by the Italian Ministry of Education and Research (MIUR) under the CrossLab project (Departments of Excellence); and the University of Pisa under the AUTENS project (Sustainable Energy Autarky).

Notes on contributors

A. Cavalieri

A. Cavalieri is a research fellow at the University of Torino, Italy.

Pietro Ducange

P. Ducange is an associate professor at the Department of Information Engineering of the University of Pisa, Italy.

S. Fabi

S. Fabi is a software engineer at NBS srl, San Benedetto del Tronto, Ascoli Piceno, Italy.

F. Russo

F. Russo is an associate professor at the University of Salento, Lecce, Italy.

Nicola Tonellotto

N. Tonellotto is an assistant professor at the Department of Information Engineering of the University of Pisa.
