Discovery of effective infrequent sequences based on maximum probability path

Pages 63-82 | Received 24 Jan 2021, Accepted 01 Jul 2021, Published online: 19 Jul 2021
 

Abstract

Process discovery usually analyses the frequent behaviour in event logs to gain an intuitive understanding of processes. In practice, however, some infrequent behaviours are effective and help to improve business processes. Most existing studies either ignore them or treat them as harmful. To distinguish effective infrequent sequences from noisy activities, this paper proposes an algorithm that analyses the distribution states of activities and the strong transfer relationships between behaviours based on maximum probability paths. The algorithm divides episodic traces into two categories, harmful and useful episodes, that is, noisy activities and effective sequences. First, the infrequent logs are pre-processed using conditional probability entropy to remove individual noisy activities whose distribution in the traces is extremely irregular. Effective sequences are then extracted from the logs based on the state transfer information of the activities. The algorithm is implemented with PM4Py and validated on synthetic and real logs. The results show that it preserves the key structure of the model, reduces noisy activities, and improves the quality of the model.
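The two ingredients named in the abstract, an entropy over activity transitions and a path of most probable transitions, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the sketch assumes that transition probabilities are estimated from directly-follows counts, that the entropy is the Shannon entropy of each activity's successor distribution, and that the path is built greedily. The function names, the toy log, and the `max_len` parameter are hypothetical, and plain Python is used rather than PM4Py.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate P(next | current) from directly-follows counts (assumed estimator)."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {b: n / total for b, n in followers.items()}
    return probs


def successor_entropy(probs):
    """Shannon entropy (in bits) of each activity's successor distribution."""
    return {
        a: -sum(p * math.log2(p) for p in dist.values())
        for a, dist in probs.items()
    }


def greedy_max_probability_path(probs, start, max_len=20):
    """Follow the most probable successor until a repeat, a dead end, or max_len.

    Greedy choice is a simplification: it does not guarantee the globally
    most probable path.
    """
    path = [start]
    seen = {start}
    while len(path) < max_len:
        dist = probs.get(path[-1])
        if not dist:
            break  # no observed successor
        nxt = max(dist, key=dist.get)
        if nxt in seen:
            break  # avoid looping forever
        path.append(nxt)
        seen.add(nxt)
    return path


# Hypothetical toy log: each trace is a list of activity labels.
traces = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "d"],
    ["a", "b", "x", "d"],
    ["a", "c", "d"],
]

probs = transition_probabilities(traces)
entropy = successor_entropy(probs)
path = greedy_max_probability_path(probs, "a")
```

In this toy log, activity `c` is always followed by `d`, so its successor entropy is zero, whereas `a` branches to both `b` and `c` and has a higher entropy. How such values would be thresholded to flag noisy activities is left open here, since the abstract does not specify it.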

Acknowledgments

We gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation, China [grant numbers 61572035, 61402011], and the Natural Science Foundation of Anhui Province, China [grant number 2008085QD178].