IETE Journal of Research Volume 70, 2024 - Issue 3

Computers and Computing

A 2-Tier Bengali Dataset for Evaluation of Hard and Soft Classification Approaches

Debapratim Das Dawn, Department of Computer Science and Engineering, University of Calcutta, Kolkata, India

https://orcid.org/0000-0001-9275-8027

Abhinandan Khan, Product Development and Diversification Division, ARP Engineering, Kolkata, India

https://orcid.org/0000-0002-0338-8325

Soharab Hossain Shaikh, Department of Computer Science and Engineering, Brij Mohan Lal Munjal University, Gurgaon 122 413, India

https://orcid.org/0000-0003-3409-8467

Rajat Kumar Pal, Department of Computer Science and Engineering, University of Calcutta, Kolkata, India

https://orcid.org/0000-0001-9838-6500

Pages 2430-2452 | Published online: 20 Feb 2023

https://doi.org/10.1080/03772063.2023.2173672

References

V. Korde, and C. N. Mahender, “Text classification and classifiers: A survey,” Int. J. Artif. Intell. Appl., Vol. 3, no. 2, pp. 85, 2012.
G. Fenk-Oczlon, “Word frequency and word order in freezes,” 1989.
P. Castells, M. Fernandez, and D. Vallet, “An adaptation of the vector-space model for ontology-based information retrieval,” IEEE Trans. Knowl. Data Eng., Vol. 19, no. 2, pp. 261–72, 2006. doi:10.1109/TKDE.2007.22
Y. Jia, M. Salzmann, and T. Darrell, “Learning cross-modality similarity for multinomial data,” in 2011 International Conference on Computer Vision, IEEE, 2011, pp. 2407–14.
L. M. Manevitz, and M. Yousef, “One-class SVMs for document classification,” J. Mach. Learn. Res., Vol. 2, no. Dec, pp. 139–54, 2001.
S.-C. Lin, C.-L. Tsai, L.-F. Chien, K.-J. Chen, and L.-S. Lee, “Chinese language model adaptation based on document classification and multiple domain-specific language models,” in Fifth European Conference on Speech Communication and Technology, 1997.
Y. Li, and J. Shawe-Taylor, “Using KCCA for Japanese–English cross-language information retrieval and document classification,” J. Intell. Inf. Syst., Vol. 27, no. 2, pp. 117–33, 2006. doi:10.1007/s10844-006-1627-y
S. Al-Harbi, A. Almuhareb, A. Al-Thubaity, M. S. Khorsheed, and A. Al-Rajeh, “Automatic Arabic text classification,” 2008, pp. 77–83.
A. R. Ali, and M. Ijaz, “Urdu text classification,” in Proc. 7th International Conference on Frontiers of Information Technology, 2009, pp. 1–7.
H. Ragas, and C. H. A. Koster, “Four text classification algorithms compared on a Dutch corpus,” in Proc. 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998, pp. 369–70.
M. Taher Pilevar, H. Feili, and M. Soltani, “Classification of Persian textual documents using learning vector quantization,” in 2009 International Conference on Natural Language Processing and Knowledge Engineering, IEEE, 2009, pp. 1–6.
K. Kafle, D. Sharma, A. Subedi, and A. K. Timalsina, “Improving Nepali document classification by neural network,” in Proceedings of IOE Graduate Conference, 2016, pp. 317–22.
V. Gupta, and V. Gupta, “Algorithm for Punjabi text classification,” Int. J. Comput. Appl., Vol. 37, no. 11, pp. 30–5, 2012.
P. Bolaj, and S. Govilkar, “Text classification for Marathi documents using supervised learning methods,” Int. J. Comput. Appl., Vol. 155, no. 8, pp. 6–10, 2016.
S. Puri, and S. Prakash Singh, “Hindi text document classification system using SVM and fuzzy: A survey,” Int. J. Rough Sets Data Anal. (IJRSDA), Vol. 5, no. 4, pp. 1–31, 2018. doi:10.4018/IJRSDA
A. K. Durga, and A. Govardhan, “Ontology based text categorization-Telugu document,” Int. J. Sci. Eng. Res., Vol. 2, no. 9, pp. 1–4, 2011.
J. Sarmah, N. Saharia, and S. K. Sarma, “A novel approach for document classification using Assamese WordNet,” in 6th International Global Wordnet Conference, 2012, pp. 324–9.
K. Rajan, V. Ramalingam, M. Ganesan, S. Palanivel, and B. Palaniappan, “Automatic classification of Tamil documents using vector space model and artificial neural network,” Expert. Syst. Appl., Vol. 36, no. 8, pp. 10914–18, 2009. doi:10.1016/j.eswa.2009.02.010
H. P. Luhn, “A statistical approach to mechanized encoding and searching of literary information,” IBM J. Res. Dev., Vol. 1, no. 4, pp. 309–17, 1957. doi:10.1147/rd.14.0309
A. B. Parves, A. Al Imran, and M. R. Rahman, “Incorporating supervised learning algorithms with NLP techniques to classify Bengali language forms,” in Proceedings of the International Conference on Computing Advancements, 2020, pp. 1–7.
H. Borko, and M. Bernick, “Automatic document classification,” J. ACM (JACM), Vol. 10, no. 2, pp. 151–62, 1963. doi:10.1145/321160.321165
D. D. Dawn, S. H. Shaikh, and R. K. Pal, “A comprehensive review of Bengali word sense disambiguation,” Artif. Intell. Rev., Vol. 53, no. 6, pp. 4183–213, 2020. doi:10.1007/s10462-019-09790-9
A. Dhar, H. Mukherjee, N. S. Dash, and K. Roy, “Automatic categorization of web text documents using fuzzy inference rule,” Sādhanā, Vol. 45, no. 1, pp. 1–22, 2020.
N. S. Dash, “Compound nouns and adjectives in Bangla: Some empirical observations,” 2011.
M. Mansur, “Analysis of n-gram based text categorization for Bangla in a newspaper corpus,” PhD thesis, BRAC University, 2006.
A. K. Mandal, and R. Sen, “Supervised learning methods for Bangla web document categorization,” arXiv preprint arXiv:1410.2045, 2014.
M. S. Islam, F. E. M. Jubayer, and S. I. Ahmed, “A support vector machine mixed with TF-IDF algorithm to categorize Bengali document,” in 2017 International Conference Electrical, Computer and Communication Engineering (ECCE), IEEE, 2017, pp. 191–6.
M. Rajib Hossain, and M. M. Hoque, “Automatic Bengali document categorization based on deep convolution nets,” in Emerging Research in Computing, Information, Communication and Applications, Springer, 2019, pp. 513–25.
A. Dhar, H. Mukherjee, N. S. Dash, and K. Roy, “CESS-a system to categorize Bangla web text documents,” ACM Trans. Asian Low Res. Language Inform. Process. (TALLIP), Vol. 19, no. 5, pp. 1–18, 2020. doi:10.1145/3398070
N. Romim, M. Ahmed, H. Talukder, and M. S. Islam, “Hate speech detection in the Bengali language: A dataset and its baseline evaluation,” in Proceedings of International Joint Conference on Advances in Computational Intelligence, Springer, 2021, pp. 457–68.
Available: https://en.wikipedia.org/wiki/Languages_used_on_the_Internet, Last accessed on 2022-12-23.
D. G. Altman, and J. Martin Bland, “Statistics notes: Detecting skewness from summary information,” BMJ, Vol. 313, no. 7066, pp. 1200, 1996. doi:10.1136/bmj.313.7066.1200
G. Upton, and I. Cook, Understanding statistics. Oxford: Oxford University Press, 1996.
J. A. Hartigan, and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” J. R. Stat. Soc. Ser. C (Appl. Stat.), Vol. 28, no. 1, pp. 100–8, 1979.
D. Arthur, and S. Vassilvitskii, “k-means++: The advantages of careful seeding,” Technical report, Stanford, 2006.
A. Aizawa, “An information-theoretic perspective of tf–idf measures,” Inf. Process. Manag., Vol. 39, no. 1, pp. 45–65, 2003. doi:10.1016/S0306-4573(02)00021-3
R. R. Chowdhury, M. T. Nayeem, T. T. Mim, M. Chowdhury, S. Rahman, and T. Jannat, “Unsupervised abstractive summarization of Bengali text documents,” arXiv preprint arXiv:2102.04490, 2021.
S. Ismail, and M. S. Rahman, “Bangla word clustering based on n-gram language model,” in Proc. International Conference on Electrical Engineering and Information & Communication Technology, IEEE, 2014, pp. 1–5.
M. A. Helal, and M. Mouhoub, “Topic modelling in Bangla language: An LDA approach to optimize topics and news classification,” Comput. Inform. Sci., Vol. 11, no. 4, pp. 77–83, 2018. doi:10.5539/cis.v11n4p77
A. Dhar, N. S. Dash, and K. Roy, “A fuzzy logic-based Bangla text classification for web text documents,” J. Adv. Linguist. Stud., Vol. 7, no. 1–2, pp. 159–187, 2018.
M. Ahmed, P. Chakraborty, and T. Choudhury, “Bangla document categorization using deep RNN model with attention mechanism,” in Cyber Intelligence and Information Retrieval, Springer, 2022, pp. 137–47.
M. R. Hossain, M. M. Hoque, N. Siddique, and I. H. Sarker, “Bengali text document categorization based on very deep convolution neural network,” Expert. Syst. Appl., Vol. 184, p. 115394, 2021. doi:10.1016/j.eswa.2021.115394
D. Chicco, and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, Vol. 21, no. 1, pp. 6, 2020. doi:10.1186/s12864-019-6413-7
F. Verhein, and S. Chawla, “Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets,” in Seventh IEEE International Conference on Data Mining (ICDM 2007), IEEE, 2007, pp. 679–84.
P. Bermejo, J. A. Gámez, and J. M. Puerta, “Improving the performance of naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets,” Expert. Syst. Appl., Vol. 38, no. 3, pp. 2072–80, 2011. doi:10.1016/j.eswa.2010.07.146
