ABSTRACT
This study systematically assesses the process mining scenario from 2005 to 2014. An analysis of 705 papers showed that ‘discovery’ (71%) is the main type of process mining addressed and ‘categorical prediction’ (25%) the main mining task solved. The most applied traditional techniques are the ‘graph structure-based’ ones (38%). Regarding computational intelligence and machine learning techniques specifically, we conclude that they have received little attention: the most applied are ‘evolutionary computation’ (9%) and ‘decision tree’ (6%), respectively. Process mining challenges, such as balancing robustness, simplicity, accuracy and generalization, could benefit from wider use of such techniques.
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplemental data
Supplemental data for this article can be accessed here
Notes
1. We looked for papers undoubtedly related to process mining, referring to some type of review; i.e., a secondary study (systematic or not). The string used to search through Scopus was: TITLE(("process mining" OR "processes mining" OR "workflow mining" OR "workflows mining" OR "mining process" OR "mining processes" OR "mining workflow" OR "mining workflows" OR "mining of business processes" OR "mining of processes" OR "mining of workflows" OR (("business process" OR "business processes" OR workflow OR workflows) AND "data mining")) AND ("review" OR "study" OR "survey" OR "roadmap" OR "challenges" OR "state-of-the-art" OR "state-of-art" OR "state of the art" OR "state of art" OR "open issues" OR "lessons" OR "opportunities" OR "trends" OR "research" OR "reflections" OR "overview" OR "limitations" OR "outlook")).
4. Referred to simply as ‘study’ henceforth in this paper.
5. This percentage is calculated from the detailed authorship data of all studies, in which a single study may be authored by researchers from different countries. Therefore, this percentage cannot be derived solely from the data in the related table.
6. The same remark as in the previous footnote applies; in this case, however, for different institutions.
7. The same remark as in the previous footnotes applies; in this case, however, for different authors.
8. This data was obtained empirically by searching the Scopus database.
9. Some clustering tests were performed with data generated by applying the stemming phase. However, the quality of these clusterings in terms of quantization error did not improve, and the interpretability of the cluster meanings was impaired. Thus, the stemming phase was excluded from pre-processing.
10. Some clustering tests were performed considering only and the non-normalised relation. However, the smallest quantization errors were obtained with the normalised relation.
11. In the context of this paper, as defined in Section 4.2.2.
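Notes 9 and 10 compare clustering runs by their quantization error. As a minimal sketch, assuming the common definition (the mean distance from each item to its nearest cluster prototype) and Euclidean distance, neither of which these notes confirm, the measure could be computed as follows. The function name and the toy vectors are hypothetical and serve only as illustration.

```python
import math


def quantization_error(samples, prototypes):
    """Mean Euclidean distance from each sample to its nearest prototype.

    Assumption: this is the usual definition of quantization error;
    the notes do not state which formula the authors used.
    """
    if not samples or not prototypes:
        raise ValueError("samples and prototypes must be non-empty")
    # For every sample, keep only the distance to the closest prototype.
    nearest = [min(math.dist(s, p) for p in prototypes) for s in samples]
    return sum(nearest) / len(nearest)


# Hypothetical toy data: three 2-D items and two cluster prototypes.
samples = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0)]
prototypes = [(0.0, 1.0), (4.0, 0.0)]
qe = quantization_error(samples, prototypes)
```

A lower value means the prototypes represent the items more closely, which is the sense in which the notes compare pre-processing variants.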