Research Article

Time anomaly detection in the duration of civil trials in Italian justice

Article: 2283394 | Received 02 Oct 2023, Accepted 09 Nov 2023, Published online: 16 Nov 2023

Abstract

Through the digitalisation of Civil Trials and the implementation of the Telematic Civil Process framework, the Italian Ministry of Justice has amassed a wealth of data covering all facets of modern Trials. While data availability has surged, the focus now lies in actively analysing this data to optimise Trials and curtail their duration. This paper, with a strong emphasis on its outcomes, delves into the analysis of data from the Court of Livorno. It seeks to pinpoint specific events within the Trial workflow that significantly extend Trial durations. Notably, Domain Experts have identified a set of events, intending to validate their pivotal role in recognising critical Trials. Leveraging Machine Learning techniques, the paper evaluates multiple binary classifiers to proactively identify potentially critical Trials, empowering Judges to mitigate future issues. The study has yielded a remarkable 80% accuracy rate in predicting Trials exceeding acceptable duration thresholds.

1. Introduction

The Italian government has been pushing for the digitalisation of public services in the past decade, and most public services are now paper-free. The digitalisation of civil trials in 2015 through the Telematic Civil Process (PCT) has simplified the work of judges and chancellors and made a great amount of data available for analysis. Management Control projects (Massimo Orlando, 2020) have been implemented in Italian Civil Trials to develop planning and monitoring workflows that manage resources more effectively and reduce the duration of trials. The duration of trials is a problem that affects Italian courts, and the European Community has provided Key Performance Indicators to compare trials in different contexts. One such index is the Disposition Time, defined by the Commission for the Efficiency of Justice of the Council of Europe (CEPEJ) in European Commission for the Efficiency of Justice – CEPEJ (2018). Another official index is the Inventory formula (or Average Stock), reported in Equation (1), which is also used by the Italian Istituto Nazionale di Statistica (ISTAT):

(1) AvStock = 365 × ((InitialPending + FinalPending) / 2) / ((Started + Finished) / 2)

This means that, if there are 30 Trials pending at the beginning of a year and 33 at the end, while 58 new Trials have been started and 45 have reached their completion within the same year, then the Inventory Formula states that the average duration of a Trial is approximately 223.3 days. Such duration measures cannot predict whether a trial will last longer than expected. To do this, past events must be analysed to determine the possible future course. The PCT records all trial events and relevant information in databases, including the parties involved and the legal personnel.
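The Inventory formula of Equation (1) can be sketched as a small Python function (the function and variable names below are ours, not part of the official definition):

```python
def average_stock(initial_pending, final_pending, started, finished):
    """Inventory formula (Equation 1): estimated average trial duration in days."""
    avg_pending = (initial_pending + final_pending) / 2
    avg_throughput = (started + finished) / 2
    return 365 * avg_pending / avg_throughput

# Example from the text: 30 trials pending at the start of the year,
# 33 at the end, 58 started and 45 finished during the year.
duration = average_stock(30, 33, 58, 45)  # ≈ 223.3 days
```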

In this paper, such elements are exploited to characterise the duration of the examined Trials through Machine Learning algorithms. In particular, relevant events and characteristics are examined, and a binary classifier has been developed which distinguishes between regular Trials and those that last more than a specified period of time. A set of specific Events has been addressed in this work, thanks to the input of Domain Experts, to verify whether they could be considered sufficient for the recognition of critical Trials. Classical Machine Learning algorithms have been chosen in order to provide immediate feedback to the experts, without employing more complex techniques, which will be taken into consideration in future works. This work is to be considered a preliminary step towards developing more accurate algorithms that can predict future events in a Trial and notify Court Presidents of potentially critical situations long before they actually occur.

The remainder of this paper is organised as follows: Section 2 reports some of the works that have already been carried out on this topic or similar ones; Section 3 describes the dataset that has been analysed to create the classifiers, while Section 4 introduces the methodology that has been applied to solve the prediction problem and reports how the methodology has been applied and the consequent results obtained by the developed solution; finally, Section 5 closes the paper with final remarks and pointers to future works.

2. Related works

Identifying anomalies in time and event series is not a novelty, and many different approaches have been applied to solve this kind of problem. The work presented in Zheng et al. (2020) analyses relevant data and compares them to an existing data set, used as a baseline for reference. In this way, it can investigate anomalies and divergences between the real stream of data and the theorised one. This kind of approach cannot be directly applied to the domain examined in this paper, as Civil Trials do not have a unique flow of events, and even Trials of the same kind can follow different paths.

The approaches followed in Bhowmick and Narvekar (2018) and Djenouri et al. (2018) are much more interesting for our case, as they both examine historical archives of events to detect irregularities with respect to average behaviour. The aforementioned works focus on irregular events that cause traffic congestion; while there are strong similarities with the objective of the present work, the nature of the events to be analysed in the two domains makes it impossible to reapply the same solutions. Previous work on the analysis of Civil Trials and the identification of temporal outliers has been carried out in Martino et al. (2021), where statistical and Process Mining techniques were applied to Civil Trial durations to single out potential events that could cause delays. In this paper a different approach, based on machine learning techniques, is applied to determine whether a Trial is at risk of lasting longer than average ones, starting from the analysis of an archive of finished Trials containing all the events that happened during each of them.

The approach closest to the one applied in this paper is described in Gruginskie and Vaccaro (2018), where the authors applied basic machine learning approaches to determine the duration of Trials. They considered the outcomes of their predictions at different times after the start of the Trials, but they also started from well-defined features, while in the present manuscript one of the objectives was to confirm the validity of the features provided by the Domain Experts.

More recent work has been presented in de Oliveira et al. (2022), where the authors applied Regression algorithms to predict the duration of Trials in the Brazilian Law system. Apart from the differences in the duration metrics applied, which depend on the Law systems that have been examined, another divergence regards the selected features: while the present work considers the events that characterise a Trial, de Oliveira et al. (2022) analysed other characteristics, such as the starting date, the number of involved lawyers, and the number of involved parties.

The work presented in Alghazzawi et al. (2022) proposes an interesting Deep Learning approach to the forecasting of Trial verdicts, which combines Long Short-Term Memory and Convolutional Network approaches. While the objectives of our work are surely different, it is important to note that the authors of Alghazzawi et al. (2022) conducted their research in two steps, first determining the optimal set of features to analyse and then applying complex Deep Learning algorithms. This is in line with our current work, as we first determine the best features for predicting a Trial's duration, and will then consider more complex algorithms to solve the prediction problem.

3. Structure of the dataset

In order to correctly apply any kind of Machine Learning algorithm, it is necessary to understand how the available data are organised.

The diagram in Figure 1 shows the most significant entities and relationships in the database. As can be seen from the diagram, a Process, or Procedure, is characterised by its association with a Dossier, which in turn contains information regarding the Process's state. Whenever a procedural document is filed by an actor, such as a Judge's order or an appeal by a lawyer, the document is automatically recorded in the system, and this operation generates a new Event in what is referred to as the State/Event machine. This machine is a finite-state automaton in which each event can move the Process to a different state. Events determine the Process's evolution and can cause a transition from one state to another in the State/Event machine; many events, however, do not affect the Process's state and are not significant for this analysis. Processes can be classified by Role, Matter, and Object, with each Object defining the Rite and having its own State/Event machine. The Event log is crucial: it contains the temporal sequence of events, with the unique Process ID, Event ID, Event Date, Registration Date, Actor, and the Process's state before and after each event.
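The State/Event machine described above can be sketched as a transition table replayed over an event log. The states and event names below are invented placeholders, not the actual codes used by the Ministry's systems:

```python
# Hypothetical fragment of a State/Event machine: (state, event) -> next state.
# Pairs absent from the table leave the state unchanged (non-significant events).
TRANSITIONS = {
    ("FILED", "FIRST_HEARING_SCHEDULED"): "HEARING",
    ("HEARING", "EVIDENCE_ADMITTED"): "INSTRUCTION",
    ("INSTRUCTION", "FINAL_DECISION_FILED"): "CLOSED",
}

def apply_events(initial_state, events):
    """Replay an event log and return the sequence of traversed states."""
    state, history = initial_state, [initial_state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)  # unknown pairs: no transition
        history.append(state)
    return history

apply_events("FILED", ["FIRST_HEARING_SCHEDULED", "NOTE_FILED", "EVIDENCE_ADMITTED"])
# -> ["FILED", "HEARING", "HEARING", "INSTRUCTION"]
```

Note how the non-significant event `NOTE_FILED` is recorded in the log but does not change the state, mirroring the behaviour described in the text.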

Figure 1. ER Diagram of the Domain.


4. Applied methodologies

The methodology applied to define the machine learning model used in this research consists of five phases. The first phase consists of identifying the objective of the system, and the second of selecting the relevant features to address that objective. The third phase involves considering basic characteristics, such as object and event date, to understand the temporal evolution of processes. In the fourth phase, four different classification algorithms are applied and compared using measures such as accuracy, precision, and recall, to determine the best-performing ones. The methodology ultimately aims to classify trials as slow or normal based on the pre-selected characteristics. A Grid Search approach, coupled with K-folding and Random Oversampling, was applied to identify the best parameters for each classifier and to reduce the imbalance problem in the dataset.

In particular, four algorithms were compared: Logistic regression (Logit), Gaussian Naive Bayes (NB), Decision Trees, and Linear Support Vector Machine (SVC).

As a result of the first round of comparisons, which is better elaborated in Section 4.3, it was evident that all the classifiers achieved high accuracy but with very low precision and recall on one of the classes: this was caused by an imbalance between the number of samples in the two classes, which was solved through a re-sampling procedure, the so-called Random Oversampling.

As a fifth and last phase, once the dataset had been balanced and the algorithms had been compared again to understand which one performed better, a Dimensionality Reduction approach, Principal Component Analysis (PCA), was applied. This was done because, despite the good number of patterns to be evaluated, the dimension of the feature set was still large and there was a high risk of overfitting. After the application of the PCA, the number of features was drastically reduced without altering the accuracy of the Classifiers, as reported in Section 4.4.

In the following, the methodology is applied and the different algorithms are compared before and after the application of the PCA.

4.1. Definition of the target and restriction of the dataset

In order to proceed with the analysis of the dataset, and to choose a possible Machine Learning algorithm to apply, we needed to set a specific goal. This goal consisted of determining whether a trial case exceeded the time duration limit of three years set by Italian legislation. The trials were divided into two classes, Normal and Slow. However, it was found that the average duration of trials varied greatly depending on the matter they focussed on. An adaptive threshold was therefore used: a trial was considered Slow if its duration exceeded the average duration for the Object it belonged to by 80%. Equation (2) explicitly shows the two classification conditions:

(2) (TrialDuration ≥ ObjectAvg + 80% × ObjectAvg) or (TrialDuration ≥ 3 yrs) → Slow
    (TrialDuration < ObjectAvg + 80% × ObjectAvg) and (TrialDuration < 3 yrs) → Normal

The 80% value was not chosen arbitrarily, but was discussed and selected in accordance with the domain experts, who considered it a good compromise: initial tests were run with varying thresholds, ranging from 30% to 150%, but it was determined that higher values would not have helped in recognising many slow Trials, while lower values were considered too restrictive.
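The labelling rule of Equation (2) can be expressed as a short Python function (a minimal sketch; the function name and the day-based durations are our own convention):

```python
THREE_YEARS = 3 * 365  # legal duration limit, in days

def label_trial(trial_duration, object_avg, threshold=0.80):
    """Apply the classification conditions of Equation (2).

    A trial is Slow if its duration exceeds the average duration for its
    Object by the given threshold (80% by default) OR exceeds the
    three-year limit set by Italian legislation; otherwise it is Normal.
    """
    if trial_duration >= object_avg * (1 + threshold) or trial_duration >= THREE_YEARS:
        return "Slow"
    return "Normal"

label_trial(trial_duration=900, object_avg=400)   # 900 ≥ 720 → "Slow"
label_trial(trial_duration=500, object_avg=400)   # below both limits → "Normal"
```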

The problem was thus reduced to a Binary Classification Problem, with only two classes, Normal and Slow.

One of the main characteristics of Trials is their variability, which is also demonstrated by the fact that the number of events occurring within a trial depends on the matter and object it belongs to. A specific Rite, focussing on General Civil Controversies, was selected to analyse a set of 9797 finished processes, ranging from 7 to over 100 events per trial. Despite this restriction, there are over 550 event types that can occur in a trial, leading to a very large space of possible event sequences.

4.2. Feature selection

Domain experts were asked to identify specific characteristics that could be analysed to better understand the problem. The experts identified 77 events that typically slow down trials, as well as three other factors (the number of parties, lawyers, and technical consultants involved) that can influence trial duration. These findings could help define informed approaches to addressing the problem. In total, the features considered for our purposes were 80, consisting of the 77 slowing events and the three additional characteristics suggested by the experts. The target is instead represented by the classes Slow and Normal.

Figure 2 shows the first lines of the original dataset.

Figure 2. The original Dataset.


Trials were initially represented as long sequences of events, but this representation was unsuitable for a classification problem. Instead, each trial was converted into a vector of 81 integers, with the first 77 integers representing the number of occurrences of the specific events and the following three integers representing the number of parties, consultants, and lawyers involved. This new vector was considered more suitable for analysis. The last element represents the target class, with 0 being the Slow class and 1 the Normal class, calculated on the basis of the Trial Duration and of the reference Average Duration, using the formula reported in Equation (2).
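The conversion from an event sequence to the 81-integer vector can be sketched as follows. The event identifiers (`E00`–`E76`) are hypothetical placeholders for the 77 slowing events selected by the experts, and the function name is ours:

```python
from collections import Counter

# Hypothetical identifiers for the 77 slowing events selected by the experts.
SLOWING_EVENTS = [f"E{i:02d}" for i in range(77)]

def trial_to_vector(event_sequence, n_parties, n_consultants, n_lawyers, target):
    """Convert a trial's event sequence into the 81-integer vector:
    77 event-occurrence counts + 3 involvement counts + 1 target class
    (0 = Slow, 1 = Normal)."""
    counts = Counter(event_sequence)
    features = [counts.get(event, 0) for event in SLOWING_EVENTS]
    return features + [n_parties, n_consultants, n_lawyers, target]

vec = trial_to_vector(["E01", "E01", "E05"], n_parties=2, n_consultants=1,
                      n_lawyers=3, target=1)
len(vec)  # 81
```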

4.3. Application of the classification algorithms

In order to select an appropriate Classification algorithm, we applied different classifiers and then compared their average accuracy. In particular, the Gaussian Naive Bayes, Decision Tree, Linear Support Vector Machine and Logistic Regression Classifiers were tested. Table 1 reports the best Precision, Recall, F1 Score and Accuracy calculated for each of the tested classifiers after the application of the Grid Search approach, through the GridSearchCV module of the Scikit-Learn Python library (Kramer, 2016). All the subsequent experiments were carried out in this way, and the reported results are obtained with the best parameters identified by the Grid Search module. A K-fold approach, with K equal to 5, was coupled with the search to reduce the possibility of overfitting. Listing 1 reports an example of the Grid Search approach, applied to the Decision Tree Classifier.
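A sketch along the lines of Listing 1 is shown below. The parameter grid is purely illustrative (the paper does not report the actual ranges searched), and a synthetic stand-in dataset replaces the Court of Livorno data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 80))   # stand-in for the 80-feature vectors
y = rng.integers(0, 2, size=200)         # stand-in for the Slow/Normal target

# Illustrative grid; the actual parameter ranges are not reported in the paper.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5),  # the K-fold approach with K = 5
)
search.fit(X, y)
print(search.best_params_)
```

`GridSearchCV` refits the estimator on the whole training set with the best parameter combination, so `search` can be used directly for prediction afterwards.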

Table 1. Comparison of the selected Classifiers with the best parameters identified through Grid Search.

While the accuracy of the different classifiers is quite high, it is evident from the analysis of the Precision, Recall and F1-Score that the Slow class is almost not recognised at all by any of the classifiers, except for the Gaussian Naive Bayes which instead recognises everything as Slow. This is due to a strong imbalance in the number of samples available for the two classes: while the Normal class contains 8889 entries, the Slow class only contains 908.

Figure 3 shows the Confusion Matrices for the different classifiers, confirming that the imbalance in the dataset prevents the applied algorithms from correctly recognising the target classes.

Figure 3. Confusion Matrices for the tested Classifiers. (a) Logit. (b) Gaussian NB. (c) Decision Tree and (d) Linear SVC.


It is evident that the dataset needs to be re-balanced. Since the number of available samples belonging to the Slow class is quite small, it would not be beneficial to apply downsampling by simply cutting elements from the Normal class. Instead, a Random Oversampling of the Slow class was applied: samples from the less populated class were randomly selected and replicated until their number matched that of the larger class. By using the RandomOverSampler function offered by the imbalanced-learn library, which extends Scikit-learn, we obtained a balanced dataset, which was then used to train the different classifiers. The Oversampling was only applied to the Training set and, in order to do so without manually resampling each training set within the K-fold, the imbalanced-learn module was used. Listing 2 shows an excerpt of the code using such an approach.

Figure 4 and Table 2 report the normalised confusion matrices and the values of Precision, Recall and F1 Score for the different classifiers after the re-balancing of the dataset. The situation is clearly much better than before, with all the classifiers able to distinguish between the two classes much more clearly than on the unbalanced dataset. The accuracy of all the classifiers is lower than before: the algorithms no longer achieve very high accuracy by simply assigning everything to the most common class, with low precision and recall on the rarer one, but now genuinely try to guess the class of each test sample, even if they sometimes pick the wrong one. Comparing the overall performances, the Decision Tree classifier seems to be the best one, as it has very high values of precision and recall for both classes while keeping a very good accuracy, with an 80% score.

Figure 4. Confusion Matrices for the tested Classifiers after the Random Oversampling. (a) Logit. (b) Gaussian NB. (c) Decision Tree and (d) Linear SVC.


Table 2. Comparison of different Classifiers after Resampling.

4.4. Dimensionality reduction through PCA

The dataset on which the different classifiers have been tested contains a substantial number of samples, but those samples are described by a high number of features. To reduce the dimension of the feature set, which could improve the performance of the classifiers if enough information is retained after the reduction, Principal Component Analysis (PCA) was applied to the dataset. PCA works on data without assuming previous knowledge of the class each sample belongs to (it is an unsupervised approach), but it can be configured by setting how much information has to be retained after the reduction.

By applying the PCA to the dataset, and then calculating the variance of the data according to the number of components that have been selected, it is possible to trace the curve shown in Figure 5, representing the cumulative explained variance of the PCA.

Figure 5. Cumulative Explained Variance.


The greater the cumulative variance attained by a given number of components, the more effectively that number of components represents the information content. The curve attains 100% of the original information with the first 30 components alone, demonstrating their sufficiency in representing the dataset. Consequently, a substantial reduction in the number of features under consideration can be made. Executing the classification again with the reduced feature set yields the same results as those shown in Figure 4 and Table 2. This confirms that the 30 new components synthesised from the existing 80 are enough to classify the balanced classes.
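The cumulative-explained-variance curve of Figure 5 can be reproduced in a few lines of Scikit-learn. The snippet below is a sketch on synthetic data: a low-rank stand-in dataset is generated so that, as in the paper, the curve saturates well before the 80th component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 80 observed features generated from 30 latent factors,
# so that only ~30 principal components carry all the variance.
latent = rng.normal(size=(500, 30))
mixing = rng.normal(size=(30, 80))
X = latent @ mixing

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components retaining (almost) all the variance,
# analogous to the 30 components reported in the text.
n_components = int(np.searchsorted(cumulative, 0.999)) + 1
```

Plotting `cumulative` against the component index gives a curve of the same shape as Figure 5, and `PCA(n_components=n_components)` can then be used to transform the dataset before re-running the classifiers.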

5. Conclusions and future work

In this initial phase of our research, we have designed and implemented a binary classifier that leverages various Machine Learning classification algorithms. The primary goal of this classifier is to identify Trials that exhibit the potential to extend beyond their expected duration by analysing their historical progress. Specifically, our research has concentrated on a set of features, which were identified by Domain Experts as pivotal in distinguishing between Slow and Normal Trials. This analysis was conducted to validate the experts' assertion regarding the significance of these features in Trial classification.

Due to a substantial class imbalance between Normal and Slow Trials, we employed a Random Oversampling technique to ensure that the classifier does not exhibit bias toward the statistically more frequent class. The tuning of classifier parameters, in conjunction with the Oversampling technique, was carried out using Grid Search and K-folding. Among the various classifiers compared, the Decision Tree classifier proved to be the most effective for the given scenario: it achieved superior Precision and Recall for both classes compared to the alternative methods and attained an accuracy rate of 80% after applying the oversampling technique, the highest among the classifiers under consideration. Additionally, we applied Principal Component Analysis to the balanced dataset with its numerous features and found that it did not have an adverse impact on the system's accuracy, thus demonstrating that the feature count can be efficiently reduced. In future works, we plan to move in three different directions. First, we will apply different feature selection methodologies to identify the events that best characterise a Trial, and we will compare them to the ones identified by the experts. Second, we will extend the experiment to different Matters, without focussing on a specific one: different features may be indicated by the Domain Experts, and very different results could be obtained with various Matters, as each of them follows a specific Rite, which profoundly differs from the others. Finally, we will experiment with more complex classification systems, such as a Neural Network-based binary classifier, compare them with the results obtained via the Decision Tree implementation, and then apply them to the classification of incomplete sequences of events in a process. In this way, we will be able to identify potentially critical Trials from the early stages of their evolution.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The work described in this paper has been supported by the Project VALERE “SSCeGov - Semantic, Secure and Law Compliant e-Government Processes”; Università degli Studi della Campania Luigi Vanvitelli [Valere - SSCeGov].

References

  • Alghazzawi, D., Bamasag, O., Albeshri, A., Sana, I., Ullah, H., & Asghar, M. Z. (2022). Efficient prediction of court judgments using an LSTM + CNN neural network model with an optimal feature set. Mathematics, 10(5), 683. https://doi.org/10.3390/math10050683
  • Bhowmick, K., & Narvekar, M. (2018). Trajectory outlier detection for traffic events: A survey. In intelligent computing and information and communication (pp. 37–46). Springer.
  • de Oliveira, R. S., Reis Jr, A. S., & Sperandio Nascimento, E. G. (2022). Predicting the number of days in court cases using artificial intelligence. PloS One, 17(5), e0269008. https://doi.org/10.1371/journal.pone.0269008
  • Djenouri, Y., Zimek, A., & Chiarandini, M. (2018). Outlier detection in urban traffic flow distributions. In 2018 IEEE international conference on data mining (ICDM) (pp. 935–940). IEEE.
  • European Commission for the Efficiency of Justice – CEPEJ (2018). Revised saturn guidelines for judicial time management – (3rd revision). https://rm.coe.int/cepej-2018-20-e-cepej-saturn-guidelines-time-management-3rd-revision/16808ff3ee.
  • Gruginskie, L. A. d. S., & Vaccaro, G. L. R. (2018). Lawsuit lead time prediction: Comparison of data mining techniques based on categorical response variable. PloS One, 13(6), e0198122. https://doi.org/10.1371/journal.pone.0198122
  • Kramer, O. (2016). Scikit-learn. In Machine learning for evolution strategies (pp. 45–53). Springer.
  • Martino, B. D., Cante, L. C., Esposito, A., Lupi, P., & Orlando, M. (2021). Temporal outlier analysis of online civil trial cases based on graph and process mining techniques. International Journal of Big Data Intelligence, 8(1), 31–46. https://doi.org/10.1504/IJBDI.2021.118746
  • Massimo Orlando, G. V. (2020). Il controllo di gestione negli uffici giudiziari: il “laboratorio” livorno. Questione Giustizia Trimestrale promosso da Magistratura democratica – Fascicolo 1. Eguaglianza e diritto civile.
  • Zheng, B., Rizzo, P., & Nasrollahi, A. (2020). Outlier analysis of nonlinear solitary waves for health monitoring applications. Structural Health Monitoring, 19(4), 1160–1174. https://doi.org/10.1177/1475921719876089