Research Article

A study on the evaluation of tokenizer performance in natural language processing

Article: 2175112 | Received 17 Jun 2022, Accepted 27 Jan 2023, Published online: 09 Feb 2023

Figures & data

Figure 1. Illustration of the 10-fold cross-validation used in this study.

Figure 2. Illustration of the overall research flow.

Table 1. Accuracy of SentencePiece and Mecab-Ko for each classification algorithm.

Table 2. Precision of SentencePiece and Mecab-Ko for each classification algorithm.

Table 3. Recall of SentencePiece and Mecab-Ko for each classification algorithm.

Table 4. F1-score of SentencePiece and Mecab-Ko for each classification algorithm.

Figure 3. Accuracy of each learning algorithm by number of SentencePiece tokens.

Figure 4. F1-score of each learning algorithm by number of SentencePiece tokens.
