Cogent Engineering Volume 11, 2024 - Issue 1

Open access

Computer Science

Text mining and machine learning for crime classification: using unstructured narrative court documents in police academic

Ezdihar Bifari (a, b), Arwa Basbrain (a), Rsha Mirza (a), Alaa Bafail (a), Somayah Albaradei (a), Wadee Alhalabi (a, b)

(a) Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
(b) Immersive Virtual Reality Research Group, King Abdulaziz University, Jeddah, Saudi Arabia

Correspondence: Ezdihar Bifari, [email protected]

Article: 2359850 | Received 16 Oct 2023, Accepted 21 May 2024, Published online: 03 Jun 2024

https://doi.org/10.1080/23311916.2024.2359850

Figures & data

Table 1. Summary of studies related to crime classification.

Table 2. Summary of studies using legal documents.

Figure 1. The proposed framework architecture.

Figure 2. Flowchart of data collection from the CAP website.

Table 3. Statistics of the CAP dataset used for experimentation.

Table 4. An example of the different information contained in the court document.

Table 5. Crime dictionary (list of crime tools and associated vocabulary).

Figure 3. An example of BoW representation with TF-IDF for five documents from the CAP dataset.

Table 6. Statistics of the dataset splitting for the experiment.

Table 7. The selected value of the Random State parameter ($N$) used in our experiment.

Figure 4. Confusion matrix for the classification model.

Figure 5. Statistics of crime documents (%) in the CAP dataset by crime tools. (a) Crime type: Beating. (b) Crime type: Shooting. (c) Crime type: Stabbing. (d) Crime type: Strangulation.

Figure 6. Heatmap of crime documents (%) in the CAP dataset. (a) Crime type: Beating. (b) Crime type: Shooting. (c) Crime type: Stabbing. (d) Crime type: Strangulation.

Figure 7. Confusion matrix of ML models using different algorithms. (a) CSE model; (b) CT model.

Figure 8. Classification of documents from the CAP dataset according to our experiment. (a) Crime scene existence model. (b) Crime type model.

Table 8. Results of CSE model using different algorithms.

Table 9. Comparison of the CSE model results with a previous study.

Table 10. Results of CT model using different algorithms.

Table 11. Comparison of the CT model results with previous studies.
