Research Article

Full-span named entity recognition with boundary regression

Article: 2181483 | Received 26 Oct 2022, Accepted 12 Feb 2023, Published online: 02 Mar 2023
 

Abstract

Span classification is a popular method for nested named entity recognition. To recognise full-span named entities, span-based models must enumerate and verify every possible entity span in a sentence, which causes serious computational complexity and data imbalance problems. In this study, we propose a boundary regression model to support full-span named entity recognition, in which a regression operation refines the spatial locations of entity spans in a sentence. Instead of exhaustively enumerating all possible spans, the model therefore needs to verify only a small number of them, and the boundaries of these spans are regressed to find all possible named entities in the sentence. Furthermore, to better represent long named entities, a multi-granule sentence representation is adopted to encode semantic features at different semantic granularities. In our experiments, even when enumerating only a small number of entity spans, our model remains competitive, achieving F1 scores of 87.35% and 80.85% on the ACE2005 and GENIA datasets, respectively. Analytical experiments show that our model is able to find all named entities in a sentence without exhaustively verifying all possible entity spans. It is effective in mitigating the computational complexity and data imbalance problems in full-span named entity recognition.
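
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It counts how many contiguous spans an exhaustive span-based model would have to verify, and shows one plausible way a candidate span could be shifted by regressed boundary offsets. The function names, the inclusive token-index convention, and the additive offset form are all assumptions made for illustration; the abstract does not specify how the offsets are parameterised.

```python
def count_all_spans(n_tokens: int) -> int:
    """Number of contiguous spans in a sentence of n_tokens tokens."""
    # Every (start, end) pair with start <= end is one candidate span.
    return n_tokens * (n_tokens + 1) // 2


def refine_span(span, offsets, n_tokens):
    """Shift a candidate span by predicted boundary offsets.

    Hypothetical parameterisation: offsets are added to the inclusive
    start and end token indices, then rounded and clipped to the sentence.
    """
    start, end = span
    d_start, d_end = offsets
    new_start = min(max(round(start + d_start), 0), n_tokens - 1)
    # The refined end may not fall before the refined start.
    new_end = min(max(round(end + d_end), new_start), n_tokens - 1)
    return new_start, new_end


if __name__ == "__main__":
    n = 30
    # Exhaustive enumeration grows quadratically with sentence length.
    print(count_all_spans(n))
    # A coarse candidate span is moved toward a (hypothetical) entity boundary.
    print(refine_span((4, 6), (-1.2, 0.9), n))
```

For a 30-token sentence the exhaustive count is 465 spans, which illustrates why verifying only a small set of candidates and regressing their boundaries reduces both the computation and the number of negative examples.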

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

4 Strictly speaking, an entity span is a substring of a sentence. A textual box is an abstract representation of an entity span in a feature map layer. In this paper, the two terms are interchangeable unless explicitly stated.

Additional information

Funding

This work is supported by the Joint Funds of the National Natural Science Foundation of China [grant numbers 62166007, 62066007, 62066008, 62050194, 62037001].