Full-span named entity recognition with boundary regression

Junhui Yu (a, b); Yanping Chen (a, b, e; corresponding author); Qinghua Zheng (c); Yuefei Wu (c); Ping Chen (d)

a State Key Laboratory of Public Big Data, Guizhou University, Guiyang, People's Republic of China

b School of Computer Science and Technology, Guizhou University, Guiyang, People's Republic of China

c Xi'an Jiaotong University, Xi'an, People's Republic of China

d University of Massachusetts, Boston, MA, USA

e Engineering Research Center of Text Computing & Cognitive Intelligence laboratory, Guiyang, People's Republic of China

Abstract

Span classification is a popular method for nested named entity recognition. To recognise full-span named entities, span-based models must enumerate and verify all possible entity spans in a sentence. This leads to serious problems of computational complexity (the number of candidate spans grows quadratically with sentence length) and data imbalance (most candidate spans are not entities). In this study, we propose a boundary regression model to support full-span named entity recognition, in which a regression operation refines the spatial locations of entity spans in a sentence. Therefore, instead of exhaustively enumerating all possible spans, we need to verify only a small number of them; their boundaries are then regressed to find all possible named entities in the sentence. Furthermore, to better represent long named entities, a multi-granule sentence representation is adopted to encode semantic features at different semantic granularities. In our experiments, even when enumerating only a small number of entity spans, our model remains competitive, achieving F1 scores of 87.35% and 80.85% on the ACE2005 and GENIA datasets, respectively. Analytical experiments show that our model can find all named entities in a sentence without exhaustively verifying all possible entity spans, which makes it effective in mitigating the computational complexity and data imbalance problems of full-span named entity recognition.
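The contrast between exhaustive span enumeration and boundary regression can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a model predicts additive offsets for the start and end token indices of a small set of seed spans, and the function names (`count_all_spans`, `seed_spans`, `regress_span`) and the seed lengths are hypothetical. The paper's actual box parameterization may differ.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def count_all_spans(n: int, max_len: Optional[int] = None) -> int:
    """Number of contiguous spans a full-span enumerator must verify."""
    if max_len is None or max_len >= n:
        return n * (n + 1) // 2  # grows quadratically with sentence length
    # Spans of length L start at n - L + 1 positions.
    return sum(n - length + 1 for length in range(1, max_len + 1))


def seed_spans(n: int, lengths: Tuple[int, ...] = (1, 3)) -> List[Span]:
    """Hypothetical sparse seed set: spans of a few fixed lengths only."""
    spans: List[Span] = []
    for length in lengths:
        for start in range(0, n - length + 1):
            spans.append((start, start + length - 1))
    return spans


def regress_span(start: int, end: int, d_start: float, d_end: float, n: int) -> Span:
    """Shift a seed span by predicted boundary offsets (assumed additive),
    round to token indices, and clip the result to the sentence."""
    new_start = max(0, min(round(start + d_start), n - 1))
    new_end = max(0, min(round(end + d_end), n - 1))
    if new_start > new_end:  # keep the span well-formed
        new_start, new_end = new_end, new_start
    return new_start, new_end


if __name__ == "__main__":
    n = 10  # a 10-token sentence
    print(count_all_spans(n))               # all spans to verify exhaustively
    print(len(seed_spans(n, lengths=(1, 3))))  # far fewer seeds to verify
    print(regress_span(2, 3, -1.2, 1.4, n))    # seed (2, 3) widened by regression
```

For a 10-token sentence, exhaustive enumeration yields 55 candidate spans, whereas the hypothetical seed set of length-1 and length-3 spans has only 18; regression then moves seed boundaries (for example, from (2, 3) to (1, 4)) to reach entities that were never enumerated directly.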


Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 http://nlp.stanford.edu/data/glove.6B.zip

2 https://github.com/cambridgeltl/BioNLP-2016

3 https://nlp.stanford.edu/software/lex-parser.shtml

4 Strictly speaking, an entity span is a substring of a sentence, whereas a textual box is an abstract representation of an entity span in a feature map layer. In this paper, the two terms are used interchangeably unless explicitly stated otherwise.

Additional information

Funding

This work is supported by the Joint Funds of the National Natural Science Foundation of China [grant numbers 62166007, 62066007, 62066008, 62050194, 62037001].