Research Article

Towards Malay named entity recognition: an open-source dataset and a multi-task framework

Article: 2159014 | Received 15 Aug 2022, Accepted 06 Dec 2022, Published online: 28 Dec 2022

Figures & data

Table 1. Malay NER datasets.

Table 2. Data source.

Figure 1. Dataset construction process. It consists of two parts: preliminary construction and iterative optimisation.

Table 3. Audit guideline.

Table 4. An example of MS-NER.

Figure 2. The MTBR framework structure. Due to space limitations, [B-P, I-P, B-L, I-L, B-O, I-O, O] in the figure stands for [B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, OTHER].

Figure 3. Probability alignment. If w_i is the first token of the detected entity, the probabilities follow the black arrow; otherwise, they follow the red arrow.

Table 5. Model settings.

Table 6. Main performance.

Table 7. Performance of different modules.

Table 8. BRE analysis.

Table 9. Case study. MTBR gives all correct predictions in these cases.

Table 10. MTBR performance on more languages.