ABSTRACT
Since 2005, the deep learning community has been able to use graphs as input to its models, and the natural language processing (NLP) community has since adopted this technique to process text. However, one challenge that graph neural networks (GNNs) may face is sensitivity to the representation format. Because different graphs can represent the same text, a model's performance may change depending on the representation used. Although many practitioners share this intuition, few works address this aspect of GNNs. We therefore explore twelve text representation strategies that build graphs from text and apply them to the same GNN to investigate how different graphs affect the results. We divide these strategies into four groups: reading order, dependency-based, binary tree, and graph of words. Of these groups, the binary tree group is the one we created for this paper. Nevertheless, in our tests, the dependency-based representations tended to achieve better performance: they allow us to stay competitive on five relevant datasets and to beat the state of the art on another dataset. These results suggest that representation tuning can be a valuable technique for improving a deep learning model.
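To illustrate how the same text can yield different graphs, the following minimal sketch builds two simplified representations of one sentence. It is an illustration under stated assumptions, not the construction used in our experiments: the function names `reading_order_edges` and `graph_of_words_edges`, the whitespace tokenization, and the window size of 3 are all hypothetical choices made for this example.

```python
def reading_order_edges(tokens):
    """Chain graph (assumed simplification of a reading-order strategy):
    nodes are token positions, and each position links to the next one."""
    return {(i, i + 1) for i in range(len(tokens) - 1)}


def graph_of_words_edges(tokens, window=3):
    """Co-occurrence graph (assumed simplification of a graph of words):
    nodes are unique words, and two distinct words are linked when they
    appear together inside a span of `window` consecutive tokens."""
    edges = set()
    for i, left in enumerate(tokens):
        # Look at the next (window - 1) tokens after position i.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Sort the pair so the undirected edge is stored once.
                edges.add(tuple(sorted((left, right))))
    return edges


# Hypothetical example sentence with naive whitespace tokenization.
tokens = "the cat sat on the mat".split()

chain = reading_order_edges(tokens)
gow = graph_of_words_edges(tokens, window=3)

print("reading order:", len(tokens), "nodes,", len(chain), "edges")
print("graph of words:", len(set(tokens)), "nodes,", len(gow), "edges")
```

The two graphs differ in both node set and edge set even though they encode the same sentence: the chain keeps one node per token position, whereas the graph of words merges repeated words into a single node. A GNN receiving these inputs therefore aggregates information over different neighborhoods.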
Acknowledgements
This research was financed with funds from the National Development Fund (FNDE, in Portuguese) and the Ministry of Education (MEC, in Portuguese) of the Federal Government of Brazil, and was carried out by the Center for Scientific Computing and Free Software (C3SL, in Portuguese) of the Federal University of Paraná (UFPR, in Portuguese). We also thank the Coordination for the Improvement of Higher Education Personnel (CAPES) - Program of Academic Excellence (PROEX) (in Portuguese, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Programa de Excelência Acadêmica (PROEX)).
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 https://github.com/daiquocnguyen/Graph-Transformer - Last accessed on March 7, 2023
2 https://github.com/HenriqueVarellaEhrenfried/B2W-Datasets - Last accessed on March 7, 2023
3 https://tblock.github.io/10kGNAD/ - Last accessed on March 7, 2023
4 http://www.cs.cornell.edu/people/pabo/movie-review-data/ - Last accessed on March 7, 2023
5 http://disi.unitn.it/moschitti/corpora.htm - Last accessed on March 7, 2023