Abstract
Span classification is a popular method for nested named entity recognition. To recognise full-span named entities, span-based models must enumerate and verify every possible entity span in a sentence, which leads to serious problems of computational complexity and data imbalance. In this study, we propose a boundary regression model to support full-span named entity recognition, in which a regression operation refines the spatial locations of entity spans in a sentence. Instead of exhaustively enumerating all possible spans, we therefore need to verify only a small number of them, and their boundaries are regressed to find all possible named entities in the sentence. Furthermore, to better represent long named entities, a multi-granule sentence representation is adopted to encode semantic features at different semantic granularities. In our experiments, even when enumerating only a small number of entity spans, our model remains competitive, achieving F1 scores of 87.35% and 80.85% on the ACE2005 and GENIA datasets, respectively. Analytical experiments show that our model is able to find all named entities in a sentence without exhaustively verifying all possible entity spans, and that it is effective in mitigating the computational complexity and data imbalance problems in full-span named entity recognition.
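The contrast between exhaustive enumeration and boundary regression can be illustrated with a minimal Python sketch. It is not the model described in this paper: the seed widths, the function names, and the hand-supplied offsets are assumptions made for illustration only, whereas in a trained model the offsets would be predicted by a regression head. The sketch shows only that a sentence of n tokens has n(n+1)/2 candidate spans, and that a smaller set of seed spans can be moved onto entity boundaries by adding start and end offsets.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, both inclusive


def enumerate_all_spans(n_tokens: int) -> List[Span]:
    """Exhaustive enumeration: n * (n + 1) / 2 spans for n tokens."""
    return [(i, j) for i in range(n_tokens) for j in range(i, n_tokens)]


def seed_spans(n_tokens: int, widths: Sequence[int] = (1, 2, 4)) -> List[Span]:
    """Hypothetical seed set: one span per start position for a few widths."""
    spans = []
    for w in widths:
        for i in range(n_tokens - w + 1):
            spans.append((i, i + w - 1))
    return spans


def regress_span(span: Span, d_start: float, d_end: float, n_tokens: int) -> Span:
    """Shift both boundaries by (assumed, model-predicted) offsets.

    The result is rounded to token positions and clipped so that
    0 <= start <= end <= n_tokens - 1.
    """
    start = min(max(round(span[0] + d_start), 0), n_tokens - 1)
    end = min(max(round(span[1] + d_end), start), n_tokens - 1)
    return (start, end)


if __name__ == "__main__":
    n = 10
    print(len(enumerate_all_spans(n)))  # all candidate spans
    print(len(seed_spans(n)))           # smaller seed set
    print(regress_span((2, 3), -1.2, 2.9, n))  # seed moved to a longer span
```

The number of seed spans grows linearly with sentence length for a fixed set of widths, while exhaustive enumeration grows quadratically. This is the source of the complexity reduction the abstract refers to, under the assumptions stated above.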
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
4 Strictly speaking, an entity span is a substring of a sentence. A textual box is an abstract representation of an entity span in a feature map layer. In this paper, the two terms are interchangeable unless explicitly stated.