Computer Science

Text mining and machine learning for crime classification: using unstructured narrative court documents in police academic

Article: 2359850 | Received 16 Oct 2023, Accepted 21 May 2024, Published online: 03 Jun 2024

Figures & data

Table 1. Summary of studies related to crime classification.

Table 2. Summary of studies using legal documents.

Figure 1. The proposed framework architecture.

Figure 2. Flowchart of data collection from CAP website.

Table 3. Statistics of the CAP dataset used for experimentation.

Table 4. An example of the different information contained in the court document.

Table 5. Crime dictionary (list of crime tools and associated vocabulary).

Figure 3. An example of BoW representation with TF-IDF for five documents from the CAP dataset.

Table 6. Statistics of the dataset splitting for the experiment.

Table 7. The selected value of the Random State parameter (N) used in our experiment.

Figure 4. Confusion matrix for the classification model.

Figure 5. Statistics of crime documents (%) in the CAP dataset by crime tools. (a) Crime type: Beating. (b) Crime type: Shooting. (c) Crime type: Stabbing. (d) Crime type: Strangulation.

Figure 6. Heatmap of crime documents (%) in the CAP dataset. (a) Crime type: Beating. (b) Crime type: Shooting. (c) Crime type: Stabbing. (d) Crime type: Strangulation.

Figure 7. Confusion matrix of ML models using different algorithms. (a) CSE model; (b) CT model.

Table 8. Results of CSE model using different algorithms.

Table 9. Comparison of CSE model results with a previous study.

Table 10. Results of CT model using different algorithms.

Table 11. Comparison of CT model results with previous studies.

Figure 8. Classification of documents from the CAP dataset according to our experiment. (a) Crime scene existence model. (b) Crime type model.

Availability of data and materials

The dataset supporting the conclusions of this article is available in the Harvard Law School Collection repository, https://case.law/about/.