Research Article

Discovery of process variants based on trace context tree

Article: 2190499 | Received 11 Nov 2022, Accepted 19 Mar 2023, Published online: 11 Apr 2023

Abstract

Process variants usually exhibit a high degree of internal heterogeneity, in the sense that the executions of the process differ widely from each other due to contextual factors, human factors, or deliberate business decisions. Understanding differences among process variants helps analysts and managers to make informed decisions as to how to standardise or otherwise improve a business process. Existing process variant mining approaches typically fall short of fully supporting semantic process variability mining; in particular, they rarely take activity behaviour relationships and trace context semantics into consideration. Here, we propose a semantic process variant discovery method aimed at distinguishing similar-but-different behaviours directly from event logs. More specifically, we adapt the concepts of benchmark log and trace context tree to formalise the context semantics of an event log and to partition the benchmark logs into several parts; the clustered trace cohorts are then mapped to configurable process variants. In the experimental part, several performance metrics of the proposed method are evaluated on real-world event logs, supporting its usefulness. The experimental results show that the proposed method is able to distinguish similar-but-different behaviours and is superior to the characteristic trace clustering method using conventional neural networks.

1. Introduction

Owing to inevitable software maintenance and the adaptability of process models, many process models in practical applications are derived from the same base model in order to match the increasing individualisation of customer demands. This kind of configurable process model is gaining importance because it offers benefits such as reusability and flexibility compared to traditional predefined business process models. Configurable process models derived from the same base model usually have a high level of similarity, and cannot be differentiated from each other using state-of-the-art similarity measurements. This is especially true in the scenario of process variant mining directly from event logs, without relying on any a priori reference model.

Process variant analysis is a set of techniques to analyse event logs produced during the execution of a process, in order to identify and explain the differences between two or more process models (van der Aalst, Citation2022). The goal of process variant analysis is to help business analysts or stakeholders to understand why and how multiple variants of a process differ (Lopez-Martinez-Carrasco et al., Citation2021).

In this setting, a process variant is a subset of executions of a business process that can be distinguished from others based on some characteristics (Taymouri et al., Citation2021). In this work, we refer to sets of execution logs that have a high degree of similarity as similar-but-different process variants. For example, an organisation may orchestrate a given business process in different ways, such as product sales processes in different countries (say C1, C2, C3, C4), or accounting processes in different branches (say C1, C2, C3, C4). Since the actual executions of the same process may vary with time and geography, we obtain 4 similar-but-different process variants: one for each of these countries or branches. In these variants, some relevant event data such as location, business modules, products, and customer types could change, but the main process models are similar and can be divided into differentiated clusters. The sub-models of the clusters are functionally homogeneous, but can be differentiated from each other by a number of partial variations, and these similar models can be formalised, understood, and expressed as process variants.

Process variants have proven to be a mainstream development technology for flexible business systems adapting to different markets, and a wide range of methods for process variant analysis have been proposed in the past decade, such as configurable BPMN (Business Process Modelling Notation) and configurable Petri nets (Van Den Ingh et al., Citation2021). The latest process variant mining or discovery techniques can be divided into three categories:

  1. Process variability modelling methods, which mainly derive process variants by applying configurable operations to the base model;

  2. Configurable process mining methods, which discover process variants through semantic trace fragmentation or slicing operations;

  3. Trace clustering methods based on machine learning, which utilise characteristic data clustering methods to extract process variants.

Due to the interdisciplinary nature of this field, the existing methods and the types of differences they can identify vary widely. The challenges encountered in process variant discovery relate to model creation and configuration. Recently, process mining has offered advanced techniques to discover configurable process models, check their conformance, and enhance them using a collection of event logs that captures traces during the execution of process variants (Bettina et al., Citation2022). However, existing works in configurable process mining lack the incorporation of semantics in the resulting model. Historically, semantic process mining has been applied to event logs to improve process discovery with respect to semantics (De Leoni et al., Citation2016; Khannat et al., Citation2021).

This paper integrates the advantages of configurable process mining and trace clustering methods, and presents a log-based process variant discovery method. The main contributions of this paper are as follows:

  1. The formalisation of the behaviour semantics of an event log, enriching the collection of event logs with configurable benchmark log concepts that capture the variability of elements present in the logs. This is an important step towards discovering semantically enriched process variants. First, the concepts of benchmark log and trace context tree are formalised to describe the behaviour semantics of an event log, where the kth-strict order relationship between activities is highlighted. Then, a weighted frequency cosine similarity measurement is presented in order to select the representative activity nodes and their neighbourhood length in the benchmark log. Finally, the context tree of the event log is constructed in the form of a frequent pattern tree (abbreviated as FP tree).

  2. The construction of a configurable process variant discovery method based on the trace context tree, named the semantic α splitting method. The semantic α splitting method discovers trace clusters directly from the event log by combining behaviour profiles and trace clustering techniques. In the experimental part, results show that the semantic α splitting method can identify process variants that cannot be distinguished by existing methods, and that it achieves higher fitness and higher precision than characteristic trace clustering methods using conventional neural network (abbreviated as CNN), especially in scenarios where different variants have different probability distributions.

The trace semantic α splitting method starts from configurable process mining and improves on characteristic trace clustering in fitness and precision. It has three advantages. First, it constructs a configurable process mining framework incorporating behaviour profiles, which enriches the process variant mining techniques available in configurable process mining. Second, it builds a trace similarity measurement incorporating the various behaviour relationships of activities in the event log, which extends the classical similarity measurements in machine learning. Third, it simplifies the traces of the event log to the greatest extent while preserving the behavioural relationships of activities.

In addition to these three advantages, the main innovation of the proposed method is the extraction of both short and long dependencies among activities as trace semantics, expressed as a trace context tree with neighbourhood length $k$ ($k \geq 1$).

The remainder of this paper is structured as follows. Section 2 reviews some basic concepts and notations, and Section 3 introduces the related work. Section 4 presents an illustrative motivation example. Section 5 introduces the proposed method of this work, named semantic α splitting method based on the activity context of event log. Section 6 conducts experiments and analyses the experimental results. Finally, Section 7 concludes this paper.

2. Preliminaries

In this section, we briefly review a couple of terminologies such as events, traces, event log, and log behaviour profiles based on previous work (Agostinelli et al., Citation2023; Lu et al., Citation2022; Wang et al., Citation2022), in order to ease the readability of this paper.

A business process is a set of activities executed in a given setting to achieve a predefined business objective. An activity is an expression of the form $A(a_1, a_2, \ldots, a_{n_A})$, where $A$ is the activity name and each $a_i$ is an attribute name. We call $n_A$ the arity of $A$. The attribute names of an activity are all distinct, but different activities may contain attributes with matching names (Agostinelli et al., Citation2023).

We assume a finite set $Act$ of activities, all with distinct names; thus, activities can be identified by their name, instead of by the whole tuple. Every attribute $a_i$ of an activity $A$ is associated with a type $D_A(a_i)$, i.e. the set of values that can be assigned to $a_i$ when the activity is executed (Agostinelli et al., Citation2023).

An event is the execution of an activity and is formally captured by an expression of the form $e = A(v_1, v_2, \ldots, v_{n_A})$, where $A \in Act$ is an activity name and $v_i \in D_A(a_i)$ (Agostinelli et al., Citation2023). The set of events is denoted as $Event$.

A trace is formally defined as a finite sequence of events $\sigma = \langle e_1, e_2, \ldots, e_n \rangle$ with $e_i = A_i(v_1, v_2, \ldots, v_{n_{A_i}})$. Traces model process executions, i.e. the sequences of activities performed by a process instance CID. A finite collection of executions gathered into a set $L$ of traces is called an event log (Agostinelli et al., Citation2023).

Take the event log in Table 1 as an example. The event log contains 10 traces, with 3 instances of trace C1 (cf. column “CID”), 2 instances of C2, etc. Here, $\sigma = \langle A, B, C, E \rangle$ refers to any of the three instances of C1, such that $E_\sigma = \{e_0, e_1, e_2, e_4\}$ and $\psi(e_0) = A, \psi(e_1) = B, \ldots, \psi(e_4) = E$.

Table 1. An example of event logs.

Definition 2.1

Weak order relationship of events (Fang et al., Citation2020; Lu et al., Citation2022)

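Let $L$ be an event log and $\sigma_i = \langle e_1, e_2, \ldots, e_m \rangle$ be any trace in $L$. The weak order relationship $\succ_L \subseteq (Event \times Event)$ contains all event pairs $(x, y)$ such that $\exists j, k \in \{1, \ldots, m\}$ with $j < k \leq m$, $e_j = x$ and $e_k = y$.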

Definition 2.2

Log behaviour profiles (Fang et al., Citation2020; Lu et al., Citation2022)

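Let $L$ be an event log. An event pair $(x, y) \in (Event \times Event)$ is in at most one of the following relations:

  1. the strict order relation $\rightarrow$, if $x \succ_L y \wedge y \nsucc_L x$.

  2. the interleaving order relation $\parallel$, if $x \succ_L y \wedge y \succ_L x$.

  3. the strict reverse order relation $\rightarrow^{-1}$, if $y \succ_L x \wedge x \nsucc_L y$.

The set $B_L = \{\rightarrow, \parallel, \rightarrow^{-1}\}$ is the log behaviour profile of $L$.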

The exclusiveness relation does not appear in the behaviour profile $B_L$ of $L$: if two events are exclusive to each other, they never occur in the same trace. However, the converse does not necessarily hold, so we cannot deduce the exclusiveness relationship from the event log alone.
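
Definitions 2.1 and 2.2 can be illustrated with a minimal Python sketch. It assumes that events are identified by their activity names and that a log is a list of tuples; the function names `weak_order` and `behaviour_profile` and the second trace of the sample log are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def weak_order(log):
    """Return all pairs (x, y) such that x occurs before y in at least one trace."""
    pairs = set()
    for trace in log:
        # combinations over positions guarantees j < k
        for j, k in combinations(range(len(trace)), 2):
            pairs.add((trace[j], trace[k]))
    return pairs


def behaviour_profile(log):
    """Label every ordered pair that co-occurs in some trace (Definition 2.2)."""
    weak = weak_order(log)
    profile = {}
    for (x, y) in weak:
        if (y, x) in weak:
            profile[(x, y)] = "interleaving"
        else:
            profile[(x, y)] = "strict"
            profile[(y, x)] = "reverse"
    return profile


# Hypothetical two-trace log: the second trace swaps B and C.
log = [("A", "B", "C", "E"), ("A", "C", "B", "E")]
profile = behaviour_profile(log)
```

Pairs that never co-occur in a trace get no label at all, which mirrors the remark that exclusiveness cannot be deduced from the log alone.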

Definition 2.3

Frequent Pattern Tree (FP tree) (Borah & Nath, Citation2018)

A triple $FPT = (Tr, P, H)$ that meets the following conditions is called a frequent pattern tree, where:

  1. Tr is a root node of the tree;

  2. P is the item prefix subtree, its node item is denoted as tr=(tname,count,nodelink), where tname represents the identifier of the node item, count represents the number of subpaths from root to it, and nodelink represents the next node with the same identifier tname in the prefix subtree;

  3. H=(iname,hnodelink) is a frequent item header table, where iname stands for frequent item identification domain, and hnodelink is a pointer to the first frequent item node with the same item identifier in the prefix subtree.
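
The structure in Definition 2.3 can be sketched as a prefix tree with a header table. The sketch below is an assumption-laden illustration, not the construction of Borah and Nath: it omits the support-based filtering and item reordering of a standard FP tree and keeps items in their original order. The names `Node` and `build_prefix_tree` and the sample sequences are hypothetical.

```python
class Node:
    """Node item (tname, count, nodelink) of the prefix subtree P."""

    def __init__(self, tname, parent=None):
        self.tname = tname      # identifier of the node item
        self.count = 0          # number of sequences passing through this node
        self.parent = parent
        self.children = {}      # tname -> child Node
        self.nodelink = None    # next node carrying the same tname


def build_prefix_tree(sequences):
    """Build (root, header) from (sequence, frequency) pairs."""
    root = Node(None)
    header = {}   # iname -> first node with that name (hnodelink)
    last = {}     # iname -> last node in the nodelink chain
    for seq, freq in sequences:
        node = root
        for item in seq:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                if item in last:
                    last[item].nodelink = child   # extend the chain
                else:
                    header[item] = child          # first occurrence
                last[item] = child
            child.count += freq
            node = child
    return root, header


# Hypothetical input: three sequences with their frequencies.
sequences = [(("a", "b", "c"), 3), (("a", "b", "d"), 2), (("b", "c"), 1)]
root, header = build_prefix_tree(sequences)
```

Following `header["b"]` and then its `nodelink` visits every node named `b`, which is what makes lookups by item name cheap.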

3. Related work

Process variant analysis is a rather broad topic, and the main research question (RQ for short) of this field can be summarised as “given a set of executions of two or more process variants, how to identify and explain the differences among variants?” (Taymouri et al., Citation2021). The RQ can be tackled from three different perspectives: process variability modelling, configurable process mining and trace clustering.

3.1. Process variability modelling methods

From the perspective of process variability modelling, process variants take the form of a cluster of process models, and different operational management actions applied to the base model can generate process cohorts (Döhring et al., Citation2014; Li et al., Citation2011; Rosa et al., Citation2017; Taymouri et al., Citation2021; Van Den Ingh et al., Citation2021). In this scenario, the above-mentioned RQ can be narrowed to a fine-grained problem named RQ1.

  1. Given a set of two or more process variants models, how to identify and explain the difference among variants?

Suppose that the base model of a specific business process is known. Process variant clusters or families can then be obtained by applying personalised configuration operations to the base model. These configurable operations are performed by a stakeholder, process manager, or end-user, and can be expressed in the form of Not-Functional-Requirement (Taymouri et al., Citation2021), reasonable process fragments (Schunselaar et al., Citation2012), declarative variability rules (van Beest et al., Citation2019), etc.

Although the digital and the physical worlds are closely aligned, and it is possible to track operational management processes in detail to some extent, identifying these variants remains challenging (Pourbafrani et al., Citation2020). When model-based comparison is applied to process variants, the key problem is that the variants are compared in terms of their model structure, whereas we aim to compare their behaviour. A low-level behavioural representation such as transition systems is therefore preferred over high-level process modelling languages such as BPMN or Petri nets. However, low-level modelling methods suffer from state explosion. Moreover, existing process variability modelling methods mostly take the control-flow perspective, so another drawback of model-based approaches is that they are unable to detect differences in terms of frequency or other perspectives. Therefore, additional comprehensive techniques could be taken into consideration to support advanced process variability modelling.

3.2. Configurable process mining methods

Process mining (van der Aalst, Citation2022) is a body of methods and tools to analyse business execution logs (named event logs), and many organisations have adopted process mining tools that use these event data for the discovery and analysis of the actual execution of their business process. In this context, an event log describes all that occurred during the execution of the relevant system by the end-users, such as events, activities, time stamps, case instance, etc.

From the perspective of process mining, the process variants are finally formalised as a cluster of process models, so the RQ defined above can be further refined into a fine-grained problem, named RQ2.

  1. Given a set of event logs of two or more process variants executions, how to identify and explain the differences among process variants?

This kind of process variant mining method starts directly from event logs and does not depend on any a priori knowledge about the business process model; the first step is usually to split the event logs into cohorts using trace merging or splitting operations. Chan et al. used process mining technology to mine configurable process models from event log collections, and proposed a frequency-based method to guide the configuration process for discovering process variants (Chan et al., Citation2014). Folino et al. proposed an automatic discovery method from the perspective of the control flow of event logs (Folino et al., Citation2015). This method generates collections of workflow patterns of process models, where each workflow pattern describes a trace cluster that can be further used to discover process variants. Bolt et al. integrated the control flow perspective and the performance perspective of the event log, so as to detect related process variants in an interactive manner (Bolt et al., Citation2017). To address the problem that existing works in configurable process discovery lack the incorporation of semantics in the resulting model, Khannat et al. proposed a novel method to enrich the collection of event logs with configurable process ontology concepts by introducing semantic annotations, which can capture the variability of elements in the event log (Khannat et al., Citation2021).

Besides the studies mentioned above, there are other related studies on process fragmentation or slicing for process variants. Because process variants are composed of several process fragments with commonalities and differences, they are usually highly similar. These similarities can be used to merge a cluster of variants together. Hasankiyadeh et al. used a process slicing algorithm to identify process fragments in the event log (Hasankiyadeh et al., Citation2014). Later, Pourmasoumi et al. proposed an algorithm to extract morphological fragments from the event log (Pourmasoumi et al., Citation2017). Reusing the extracted morphological fragments is expected to reduce the cost of designing a new process model and to speed up the design progress. Hmami et al. introduced configurable process models and the variability concept in change mining approaches, and proposed an approach for merging and filtering a collection of event logs from the same family with respect to variability (Hmami et al., Citation2021). This method aims to enhance change mining from a collection of event logs and to detect changes in variable fragments of the obtained event log.

Historically, semantic process mining has been applied to event logs to improve process discovery with respect to semantics. At present, the greatest challenge in the study of configurable process mining is how to introduce semantics into the mining procedure so that similar-but-different variants can be distinguished effectively. This motivates why, in this paper, we have opted for a context tree that combines frequency and behaviour relationships as the semantic expression of an event log.

3.3. Trace clustering methods

From the perspective of machine learning, the process variants are finally formalised as a cluster of process traces, so the RQ defined above is equivalent to RQ2.

Machine learning is the systematic design, analysis and study of algorithms and systems that learn from past experience. Machine learning is inherently a multidisciplinary field. As for RQ2, trace clustering is a suitable technique to divide event log into several clusters or cohorts.

Most of the state-of-the-art process model discovery methods focus on how to find well-structured and understandable process models (De Leoni et al., Citation2016; Zandkarimi et al., Citation2020). However, due to the highly flexible and complex nature of processes, it may be particularly difficult to find the actual process models being executed in some real-life environments, such as health-care, product development, customer support and other processes. When processing such unstructured processes, process mining algorithms can generate an incomprehensible “spaghetti-like” process model (De Koninck et al., Citation2021). One of the main reasons is the diversity of event logs, that is, there are local, non-significant differences between several process execution instances. There are many implicit process variants in these process execution instances, and the model of each variant is more suitable to describe some personalised logs than the complete process model.

Therefore, in order to solve the problems caused by the local diversity of event logs, researchers have proposed many different techniques. In addition to event log filtering, event log conversion, event log sampling and specially developed process mining algorithm (De Leoni et al., Citation2016), another method to overcome this limitation is trace clustering (De Koninck et al., Citation2021; Delias et al., Citation2023; Luengo & Sepúlveda, Citation2011; Tariq et al., Citation2021; Tavares et al., Citation2022; Vertuam Neto et al., Citation2021; Xu & Liu, Citation2019; Zandkarimi et al., Citation2020).

Trace clustering techniques divide the log into more homogeneous subsets by reducing the number of process log instances that participate in the analysis at one time, and combining the similarity metrics to measure the similarity between process instances. The obtained research conclusions show that trace clustering techniques would definitely enhance process mining, for the reason that all traces in each cluster can be analysed independently in flexible environment. In the subsequent research, many researchers tried to improve the trace clustering algorithms from the perspectives of feature extraction, trace coding and distance measurement, in order to enable the process mining algorithm to generate more accurate process models.

Trace clustering not only has advantages in model discovery, but also can be used in the fields of conformance checking (De Koninck et al., Citation2021), compliance detection (Tavares et al., Citation2022; Xu & Liu, Citation2019), log repairing (Tariq et al., Citation2021; Vertuam Neto et al., Citation2021), concept drift detection (Richetti et al., Citation2022), and process monitoring prediction (Tang et al., Citation2022).

3.4. Summary of related work

As outlined in the previous subsections, a wide range of methods have been proposed to tackle the problem of process variant mining. However, because of the heterogeneous nature of the underlying algorithms, there exist some deficiencies and challenges, shown in Table 2, that need further study and improvement. The state-of-the-art log-based process variant mining methods, from either configurable process mining or trace clustering, are designed to distinguish similar-but-different behaviours of process variants, but only a few studies take the relationships of activities or event semantics into consideration.

Table 2. The comparisons of different process variants discovery methods.

Behaviour profile has been proved to be an effective technique to evaluate the relationships between activities in event logs (Tang et al., Citation2022). Therefore, based on our previous work (Fang et al., Citation2020), this paper takes the relationships between activities named activity context semantics into consideration, and proposes a new log-based approach to discover process variants, incorporating the concept of Frequent Pattern tree in pattern mining field (Borah & Nath, Citation2018), in order to distinguish similar-but-different behaviours among process variant clusters.

4. Motivation

In order to provide a better understanding of process variants mining method, we first introduce an example to illustrate the motivation of this paper.

Assume that there are two event logs $L_1 = \{\langle E,A,B \rangle^{274}, \langle E,F,B \rangle^{375}, \langle I,G,J,H \rangle^{476}, \langle I,C,D,J,H \rangle^{875}\}$ and $L_2 = \{\langle E,A,B,J \rangle^{274}, \langle E,F,B,J \rangle^{300}, \langle E,X,B,J \rangle^{75}, \langle I,G,J,H \rangle^{276}, \langle E,C,D,I,H \rangle^{675}, \langle E,G,H \rangle^{400}\}$. Model $M_1$, shown in Figure 1(a), is mined from $L_1$ using the inductive mining algorithm on the pm4py platform (pm4py is the leading open source process mining platform written in Python). Model $M_2$, shown in Figure 1(b), is obtained by applying four personalised operations to model $M_1$: $Move(M_1, E, start, I)$, $Move(M_1, J, B, end)$, $Insert(M_1, X, E, B)$, and $Move(M_1, I, D, H)$. Here, $Insert(M_1, X_1, X_2, X_3)$ inserts activity $X_1$ after $X_2$ and before $X_3$ in model $M_1$, and $Move(M_1, X_1, X_2, X_3)$ moves activity $X_1$ to the position after $X_2$ and before $X_3$ in model $M_1$.

Figure 1. A reference model and its process variant. (a) The mined reference model M1 for log L1 using inductive mining method (b) A process variant model M2 from M1 by personalised operations.


Suppose that $L_1$ and $L_2$ are known, and the a priori models $M_1$ and $M_2$ remain unknown. Let $L = L_1 \cup L_2$, that is, the two event logs are mixed together. We apply inductive mining on pm4py to log $L$ and obtain the mined model shown in Figure 2. The fitness of the model for log $L$ is 1; however, the precision is only 0.41.

Figure 2. The mined model of log L through inductive mining.


Trace clustering is a common method that can enhance the discovery of process models, and has been widely used in the field of process mining. Here, we use a trace clustering method with the event frequency coding technique, which yields the two model clusters shown in Figure 3. In order to further evaluate the effectiveness of different methods in identifying similar-but-different process variants, we conduct a series of trace clustering and process mining experiments on log $L$; the related experimental results are listed in Table 3.
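
To make the encoding step concrete, the following minimal sketch assumes that event frequency coding means counting how often each activity occurs in a trace; the clustering step itself is omitted. The function name `frequency_encode` is hypothetical, and the three sample variants are taken from the logs of this example.

```python
from collections import Counter


def frequency_encode(trace, alphabet):
    """Encode a trace as a vector of activity counts (order is discarded)."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]


# Three trace variants from the motivating logs.
variants = [("E", "A", "B"), ("E", "A", "B", "J"), ("I", "G", "J", "H")]
alphabet = sorted({a for t in variants for a in t})   # A, B, E, G, H, I, J
vectors = [frequency_encode(t, alphabet) for t in variants]

# Two hypothetical traces that differ only in ordering get the same code.
same_code = frequency_encode(("A", "B"), alphabet) == frequency_encode(("B", "A"), alphabet)
```

Because the encoding ignores the order of activities, traces that differ only in their behavioural relationships become indistinguishable, which is one plausible reason why such encodings struggle with similar-but-different variants.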

Figure 3. Two clusters deduced from log L through tracing clustering method (Xu & Liu, Citation2019). (a) The first cluster model through tracing clustering method (b) The second cluster model through tracing clustering method.


Table 3. Experimental results comparison among different methods for log $L = L_1 \cup L_2$.

Trace clustering methods clearly enhance process mining approaches (as shown in Table 3), as the fitness and precision of each cluster are higher than those obtained using the process mining method alone. It is noteworthy that these 4 characteristic clustering methods share a common bottleneck: fitness and precision are both equal to 1 in one cluster, whereas precision is relatively low in the other cluster. This indicates that characteristic trace clustering methods cannot distinguish the above-mentioned similar-but-different behaviours of process variants. The experimental results in Table 3 also verify that characteristic trace clustering and process variant mining methods differ from each other, especially in the scenario of distinguishing similar-but-different behaviours.

Motivated by this example, this paper proposes a novel approach to discover process variants, named the trace semantic α splitting method. It is assumed that no reference model is known, and the process variants are mined directly from the event log. The intuitive idea of our approach is to extract the benchmark log from the event log by trace compression. Trace compression preserves properties such as the behaviour profile of activities and trace frequencies, while the obtained benchmark log simplifies the initial log to the greatest extent and thereby reduces the complexity of the corresponding log processing algorithm.

5. Proposed trace semantic α splitting method

5.1. Benchmark log extraction

In order to discover process variants from event log, this section formalises the concept of context log, and uses it as the basis of benchmark log extraction.

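Let $L = \{\sigma_i : i \geq 1\}$ be the available event log set, which is a simplified formalisation of $L = \{CID, \sigma, L_{attr}\{1, 2, \ldots, m\}\}$, and let $A = \{A_i : i \geq 1\}$ be the activities set. For a given activity $a \in A$, a sub-log related to the activity $a$ is denoted as $L_{sub}(a) = \{\pi_{context(a)}(\sigma_i) \mid i \geq 1\}$, where $context(a) = \{\{a\} \cup \{x\} \mid x \in A \wedge (x, a) \in B_L\}$, and $\pi_X(\sigma_i)$ represents the projection of event sequence $\sigma_i$ on set $X$.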

Definition 5.1

kth-strict order relationship

In the event log L={σi:i1}, two activities x and y are in kth-strict order relationship, denoted as xky, if and only if (xy)(t=a1a2anL:ai=xai+k=y:1in)(tp=a1a2anL:tpt,ai=xai+l=yk<l).

As the relationship $x \rightarrow y$ means that there is a flow relationship between activity $x$ and activity $y$, and the kth-strict order relationship has more relaxed preconditions than the strict order relationship, we can choose a reasonable value of $k$ in the relationship $x \rightarrow_k y$ to limit the neighbourhood length of the activity $x$.
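
The positional part of Definition 5.1 can be sketched as follows. The sketch assumes only that $x$ strictly precedes $y$ (that is, $y$ never occurs before $x$ in any trace) and that $y$ occurs exactly $k$ positions after $x$ in at least one trace; any further side conditions of the definition are not modelled. The function names are hypothetical. The sample log consists of the four trace variants (bcizde), (cbizdh), (cibzlm) and (ibczlm) that appear in the context log example of this section.

```python
from collections import defaultdict


def kth_strict_order(log):
    """Map each strictly ordered pair (x, y) to the set of distances k observed."""
    follows = set()                 # weak order: x somewhere before y
    distances = defaultdict(set)    # (x, y) -> {k : y is k positions after x}
    for trace in log:
        for i, x in enumerate(trace):
            for j in range(i + 1, len(trace)):
                pair = (x, trace[j])
                follows.add(pair)
                distances[pair].add(j - i)
    # keep only pairs in strict order, i.e. the reverse pair never occurs
    return {(x, y): ks for (x, y), ks in distances.items() if (y, x) not in follows}


def neighbours(orders, a, k):
    """Activities in kth-strict order with activity a, in either direction."""
    out = set()
    for (x, y), ks in orders.items():
        if k in ks and a in (x, y):
            out.add(y if x == a else x)
    return sorted(out)


log = [tuple("bcizde"), tuple("cbizdh"), tuple("cibzlm"), tuple("ibczlm")]
orders = kth_strict_order(log)
first = neighbours(orders, "z", 1)    # ['b', 'c', 'd', 'i', 'l']
second = neighbours(orders, "z", 2)   # ['b', 'c', 'e', 'h', 'i', 'm']
```

Under this reading an activity can be in kth-strict order with $z$ for several values of $k$, which is why $b$, $c$ and $i$ appear in both lists.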

Definition 5.2

Context Log

Let $A_a^k$ be the kth-context alphabet corresponding to the activity $a$, and $L_a^k$ be the context log of the activity $a$, $L_a^k = \{t_i \mid t_i \in Pr(L, A_a^k)\}$, where:

  1. Aak={a}{ai:aiA} satisfies (alaiaila,1lk)(aai)(a1ai);

  2. $Pr(L, A)$ is a mapping function extended from $\pi_A(\sigma_i)$, which represents the projection of event log $L$ on activity set $A$, $Pr(L, A_a^k) = \{\pi_{A_a^k}(\sigma_i) \mid \sigma_i \in L,\ 1 \leq i \leq |L|\}$.

The event log in Table 4 is used to illustrate the concept of context log, with activity $z$ selected as an example. According to the kth-strict order relationship, activities $b, c, i, d, l$ and activity $z$ are in 1st-strict order relationship, and activities $b, c, e, h, m, i$ and activity $z$ are in 2nd-strict order relationship. Therefore, according to Definition 5.2, $b \rightarrow_1 z$ and $z \rightarrow_2 m$ can be obtained. If the length is limited to 2, the 2nd-context alphabet corresponding to activity $z$ is $A_z^2 = \{z, b, c, d, e, l, h, i, m\}$, and the context log is $L_z^2 = \{(bcizde)^5, (cbizdh)^3, (cibzlm)^2, (ibczlm)^2\}$.

Table 4. An example of event log L to illustrate context log.
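
The projection $Pr(L, A)$ used to build a context log can be sketched in a few lines. The sketch assumes a log stored as a mapping from trace variants to frequencies; the function names and the surrounding activities `s` and `t` in the sample log are hypothetical and serve only to show what the projection removes.

```python
from collections import Counter


def project(trace, alphabet):
    """Keep only the activities of the trace that belong to the alphabet."""
    return tuple(a for a in trace if a in alphabet)


def context_log(log, alphabet):
    """Project every trace and merge the frequencies of identical results."""
    out = Counter()
    for trace, freq in log.items():
        out[project(trace, alphabet)] += freq
    return out


# 2nd-context alphabet of activity z, as given in the example above.
A_z2 = set("zbcdelhim")

# Hypothetical log: 's' and 't' lie outside the context alphabet.
log = {
    ("s", "b", "c", "i", "z", "d", "e"): 3,
    ("b", "c", "i", "z", "d", "e", "t"): 2,
    ("s", "c", "b", "i", "z", "d", "h", "t"): 3,
}
L_z2 = context_log(log, A_z2)
```

Traces that differ only outside the context alphabet collapse into one variant whose frequency is the sum of the originals, which is how the context log compresses the initial log.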

Since there may be more than one context log for a given activity, the concept of weighted frequency cosine similarity is proposed on the basis of cosine similarity, which helps to select the most suitable context log as the benchmark log.

Definition 5.3

Benchmark log

Let $L = \{\sigma_i : i \geq 1\}$ be the available event log set, $A = \{A_i : i \geq 1\}$ be the activities set, $A_a^k$ be the kth-context alphabet corresponding to the activity $a$, and $L_a^k$ be the context log of the activity $a$, $L_a^k = \{t_i \mid t_i \in Pr(L, A_a^k)\}$. The benchmark log is defined as a set of $L_a^k$, denoted as $Bench_L = \{L_a^k \mid a \in A \wedge k \geq 1\}$.

Definition 5.4

Weighted frequency cosine similarity

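Let $X(x_j^{\rightarrow}, x_j^{\parallel}, x_j^{\rightarrow^{-1}})$ and $Y(y_j^{\rightarrow}, y_j^{\parallel}, y_j^{\rightarrow^{-1}})$ be two three-dimensional vectors, $x_j^{\alpha}$ ($\alpha \in \{\rightarrow, \parallel, \rightarrow^{-1}\}$) be the frequency of each trace in the context log with $\alpha$ relationship, $y_j^{\alpha}$ ($\alpha \in \{\rightarrow, \parallel, \rightarrow^{-1}\}$) be the frequency of each trace in the original event log with $\alpha$ relationship, $\omega_j^{\alpha}$ ($\alpha \in \{\rightarrow, \parallel, \rightarrow^{-1}\}$) be the weight distribution of the $\alpha$ relationship, and $p_i$ be the percentage of each trace frequency in the event log $L$. Then the weighted frequency cosine similarity between $X$ and $Y$ is denoted as $COS(X, Y)$:

$$COS(X,Y) = \sum_{i=1}^{n} p_i \, \frac{\sum_{j=1}^{3} x_j^{\alpha} \, y_j^{\alpha} \, (\omega_j^{\alpha})^2}{\sqrt{\sum_{j=1}^{k} (x_j^{\alpha})^2 (\omega_j^{\alpha})^2} \; \sqrt{\sum_{j=1}^{k} (y_j^{\alpha})^2 (\omega_j^{\alpha})^2}} \tag{1}$$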

If $COS(X, Y) = 1$, then $X$ and $Y$ match exactly; conversely, if $COS(X, Y) = 0$, then $X$ and $Y$ do not match at all. The closer the weighted frequency cosine similarity is to 1, the higher the matching degree. In this paper, we therefore use $COS(X, Y)$ as the metric to select the benchmark log.
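
A minimal numerical sketch of Equation (1) is given below. It rests on one reading of the formula: each trace contributes a weighted cosine between its two three-dimensional frequency vectors, and these contributions are averaged with the trace percentages $p_i$; the denominator sums are taken over the same three components as the numerator. The function names and all sample values are hypothetical.

```python
import math


def weighted_cosine(x, y, w):
    """Weighted cosine of two vectors; the weights enter squared."""
    num = sum(xj * yj * wj ** 2 for xj, yj, wj in zip(x, y, w))
    norm_x = math.sqrt(sum((xj * wj) ** 2 for xj, wj in zip(x, w)))
    norm_y = math.sqrt(sum((yj * wj) ** 2 for yj, wj in zip(y, w)))
    if norm_x == 0 or norm_y == 0:
        return 0.0   # convention chosen here for an all-zero vector
    return num / (norm_x * norm_y)


def weighted_frequency_cosine(xs, ys, w, p):
    """Sum of per-trace weighted cosines, weighted by trace percentages p."""
    return sum(pi * weighted_cosine(x, y, w) for pi, x, y in zip(p, xs, ys))


# Hypothetical values: two traces, components ordered as
# (strict order, interleaving, strict reverse order).
w = (0.5, 0.3, 0.2)
p = (0.6, 0.4)
xs = [(3, 1, 2), (4, 0, 1)]
identical = weighted_frequency_cosine(xs, xs, w, p)
orthogonal = weighted_frequency_cosine([(1, 0, 0)], [(0, 1, 0)], w, (1.0,))
```

With percentages that sum to 1, identical vectors give a similarity of 1 (up to rounding) and vectors with no shared component give 0, matching the interpretation of the two extreme values.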

5.2. Context tree construction based on FP tree

Generally, different process variants share a set of common activities, and process variants are usually realised through individual orchestration and configuration of these common activities. Therefore, this subsection starts from the common activities of traces, and defines the trace context and the context tree of an event log to capture the semantic contexts of activities and traces.

Definition 5.5

Trace context

Let $L=\{\sigma_i : i \geq 1\}$ be the available event log set, where $\sigma_i$ is a trace, and let $LCP$ be the longest common prefix of the traces in log $L$. $SP$ is called the context of the trace $\sigma_i$ if and only if $SP=\{d \in 2^{\sigma_i} \mid \sigma_i = LCP\,|\,d\}$, where the symbol “|” represents the concatenation operator.

As the activity common prefix can be represented by a prefix tree, in order to effectively identify the semantic context, a novel prefix tree structure named context tree is introduced here on the basis of the frequent pattern tree (Definition 2.6).

Definition 5.6

Context Tree

A triple CT=(Tr,P,H) that fulfils the following conditions is called a context tree:

  1. Tr is the root node of the context tree;

  2. P is the context prefix subtree, in which each node has the form t=(tname,count,nodelink), where tname is the activity name of the node, count is the number of subpaths from the root to the node, and nodelink points to the next node with the same identifier tname in the prefix subtree (or is null if there is none);

  3. H=(iname,hnodelink) is the context header table, where iname is an activity name and hnodelink is a pointer to the first node with that activity name in the prefix subtree.

The context tree corresponding to the event log in Table 2 is shown in Figure 4.

Figure 4. The context tree corresponding to the event logs in Table 2.

It can be concluded that each trace in the event log is represented as a branch of the context tree (as shown in Figure 4). The context tree has a top-down layout, and traces with the same prefix share a branch starting at the root node. In addition, the context header table speeds up retrieval during the dynamic construction and querying of the tree.
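The structure in Definition 5.6 can be sketched in Python as a prefix tree with a header table. This is a minimal illustration rather than Algorithm 2 itself: it assumes that activities are single-character labels, that traces are inserted in their original activity order, and that count accumulates the frequency of every trace passing through a node. The class and function names are hypothetical.

```python
class Node:
    """One node t = (tname, count, nodelink) of the context prefix subtree."""
    def __init__(self, name, parent=None):
        self.name = name        # tname: activity name (None for the root)
        self.count = 0          # number of trace occurrences through this node
        self.nodelink = None    # next node carrying the same activity name
        self.parent = parent
        self.children = {}

def build_context_tree(log):
    """Build the tree from a dict {trace string: frequency}."""
    root = Node(None)
    header = {}   # iname -> first node with that name (hnodelink)
    last = {}     # iname -> most recently created node with that name
    for trace, freq in log.items():
        node = root
        for act in trace:
            child = node.children.get(act)
            if child is None:
                child = Node(act, node)
                node.children[act] = child
                if act in last:
                    last[act].nodelink = child   # extend the node-link chain
                else:
                    header[act] = child          # first node with this name
                last[act] = child
            child.count += freq
            node = child
    return root, header

def nodes_for(header, name):
    """Walk the node-link chain of one activity via the header table."""
    node = header.get(name)
    while node is not None:
        yield node
        node = node.nodelink
```

Traces with the same prefix reuse the same nodes, so only the diverging suffix creates new branches, and the header table gives direct access to all nodes of one activity without traversing the whole tree.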

5.3. Selection of activity node and neighbourhood length

Depending on the choice of activity node and neighbourhood length, the context logs extracted from the same event log are likely to differ, which in turn results in different benchmark logs. If $A_a^k$ were calculated for every activity node in the activity set, the number of generated context logs would be very large, and selecting the benchmark log would become computationally expensive.

To reduce this computational cost, the activity node and its neighbourhood length should be selected carefully.

First, it is reasonable to narrow the nodes in the activity set to a manageable range. Activity nodes are selected mainly according to their frequency, i.e. the number of times they occur in the event log, and are sorted by frequency in descending order. The average frequency of the activity nodes is then calculated, and the nodes that occur less frequently than the average are excluded, which narrows the selection range. The remaining activity nodes can be further divided into clusters based on frequency, and a representative activity node is selected from each cluster for the subsequent steps.

Second, choosing the neighbourhood length is also crucial for selecting an appropriate context log. For the selected activity node $a$, the corresponding kth-context alphabet $A_a^k$ is determined; this alphabet contains the context activities of the activity node.

In summary, for the selected activity node $a$, an important step is to determine the value of $k$ in $A_a^k$, i.e. the neighbourhood length.
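The first step, narrowing the activity set by frequency, might look as follows in Python. The sketch assumes single-character activity labels and a log given as a dict of trace frequencies. Grouping the remaining activities by identical frequency and taking the alphabetically smallest activity as the representative are simplifying assumptions made here for determinism. All names are hypothetical.

```python
from collections import Counter

def frequent_activities(log):
    """Return activities at or above the average frequency, most frequent first."""
    freq = Counter()
    for trace, n in log.items():
        for act in trace:
            freq[act] += n
    avg = sum(freq.values()) / len(freq)
    kept = [(act, f) for act, f in freq.most_common() if f >= avg]
    return kept, avg

def representatives(kept):
    """Group the kept activities by frequency and pick one per group."""
    groups = {}
    for act, f in kept:
        groups.setdefault(f, []).append(act)
    # assumption: the alphabetically smallest activity stands for its group
    return {f: min(acts) for f, acts in groups.items()}

toy_log = {"abc": 3, "abd": 2, "ab": 1}
kept, avg = frequent_activities(toy_log)
```

In the toy log, activities a and b each occur 6 times, c occurs 3 times and d occurs twice, so the average is 4.25 and only a and b remain as candidate activity nodes.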

5.4. Algorithms and complexity analysis

Here, we propose the semantic α trace splitting method to discover process variants directly from an event log. Three algorithms (Algorithms 1–3) formalise the detailed procedure of this method.

5.4.1. Algorithms

Algorithm 1 extracts the benchmark log from the initial event log. Four parameters must be given as constants in advance: the threshold ϕ and the three weights $\omega_j^{\alpha}$ ($\alpha \in \{\rightarrow, \parallel, \rightarrow^{-1}\}$). The threshold ϕ is user defined, and a value between 0.6 and 0.8 is suggested. Algorithm 2 constructs the context tree on the basis of the benchmark log, and Algorithm 3 takes the benchmark log and the context tree as input to discover process variants. Note that the process variants finally mined are represented as Petri nets.

5.4.2. Complexity analysis

Given an event log L, let n be the number of activities contained in the log and m be the number of traces. With L as input, the complexity of each algorithm is analysed as follows.

The core of Algorithm 1 is extracting the benchmark log: the average frequency avg of all activities in the event log is calculated, the activity nodes whose frequency is lower than avg are deleted from the alphabet, and the remaining activities are classified into clusters. After these operations, p×k benchmark logs are obtained, so the time complexity of extracting the benchmark log is $O(pk)$. The core of Algorithm 2 is constructing the context tree: assuming that the number of activities in the longest common prefix is z and the number of activities in the remaining sequence is q, the corresponding time complexity is $O(m(z+q))$. The core of Algorithm 3 is mining process variants: suppose that there are x clusters and that the traces in the benchmark log are added to the clusters using the context tree, so the time complexity is $O(mx)$. In addition, the complexity of mapping the clusters of the benchmark log to their counterparts in the original event log is $O(x)$. Therefore, the total time complexity of mining process variants from the event log is $O(pk+m(z+q)+mx+x)$, which is of the same order as $O(n^2)$.

6. Experiment and evaluation

In this section, we apply the proposed method to three kinds of real, complex industrial logs. Although the method is designed to distinguish similar-but-different behaviours, such as those in a bank's credit authorisation system (BPIC 2012) (Bautista et al., 2012), it can also provide insights and reveal some deficiencies of existing methods. In particular, the proposed process variant mining method can compensate for the weakness of trace clustering in variant mining, especially when the variants have imbalanced distributions. The first part of this section describes the variant mining procedure for the credit authorisation system (BPIC 2012), the second part provides a comparative analysis of this work and existing methods, and the third part provides an in-depth discussion of the findings and potential limitations.

6.1. Case study

In practice, a bank may handle credit authorisation under different policies for different types of lenders. Event logs are generated under these policies during the execution of the credit business process from BPIC 2012, which is taken as a case study to validate the effectiveness of the proposed method. Using the credit model alignment technology (Borah & Nath, 2018) for change operations, the operational logs were extracted from the CPN Tools platform; the extracted event logs are shown in Table 5. There are 23 activities in the event log, and the activity label table is N={a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w}. The labels in Table 5 have the following meanings: a (accepting credit applications), b (starting to process credit applications), c (registering customer information), d (checking customer credit), e (contacting the bank), f (checking company funds), g (check interest), h (stop inspection), i (verify residential area type), j (check property information), k (request the mortgage insurer), l (verify the information is qualified), m (approving loan type), n (approval request), o (check income), p (archive request), q (contact customer), r (check documents), s (verify the loan amount), t (verify the system funds), u (end the verification), v (end the inspection phase), w (the loan is successful).

Table 5. Event logs of credit disposal processes.

In Algorithm 1, lines 1–2 are executed to obtain the frequency of each activity in the activity label table N; the frequencies are listed in Table 6 in descending order. The average frequency avg of these activities is 166.87, which is used as the threshold value.

Table 6. Frequency of activities.

According to Algorithm 1, every activity node whose frequency is below avg is filtered out in order to reduce the number of activity nodes in the activity alphabet N, which yields the filtered activity alphabet N={a,b,w,c,d,e,h,n,o,p}. Line 5 of Algorithm 1 then classifies the nodes of the filtered alphabet into two clusters according to their frequencies, i.e. 350 and 200, and line 7 randomly selects a representative node from each cluster, namely b and h, for the subsequent operations.

After that, lines 10–11 of Algorithm 1 are executed to calculate the context activities corresponding to the kth-strict order relationship of the representative nodes b and h for each length k. The resulting activity nodes $A_i^k$ are listed in Table 7.

Table 7. kth-strict order relationships of nodes b and h.

To avoid underfitting caused by too few activity nodes, the number of activity nodes is kept within a certain range by the condition in line 12 of Algorithm 1. Lines 13–14 of Algorithm 1 keep the number of activities contained in the activity relationship table within a threshold range, which is a user-defined parameter (in this paper, 60%–80%). Lengths whose activity alphabets fall outside this range are therefore not considered, which keeps the complexity of the context alphabets reasonable. Since there are 23 activities in total in the activity alphabet, the number of activity nodes for a length k must lie between 13.8 (23 × 0.6) and 18.4 (23 × 0.8); this range is used to obtain $A_i^k$, where i = b or h.

From Tables 8 and 9, it can be deduced that the neighbourhood length selected for node b is k = 5, and that the length for node h can be k = 5, k = 6 or k = 7 according to the range 13.8–18.4. However, since the activity alphabets corresponding to k = 6 and k = 7 of node h are the same, k is set to 6 for node h.
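The range check described above can be sketched as a small Python helper. The alphabets used below are invented placeholders, not the contents of Tables 8 and 9, and the rule of keeping only the smallest k among lengths with identical alphabets is an assumption drawn from the treatment of k = 6 and k = 7 for node h. The function name is hypothetical.

```python
def candidate_lengths(alphabets, total_activities, low=0.6, high=0.8):
    """Return the lengths k whose context alphabet size lies in the allowed range.

    alphabets maps each length k to the set of activities in its context alphabet.
    Lengths that repeat an already accepted alphabet are skipped.
    """
    lower, upper = low * total_activities, high * total_activities
    chosen, seen = [], set()
    for k in sorted(alphabets):
        alphabet = frozenset(alphabets[k])
        if lower <= len(alphabet) <= upper and alphabet not in seen:
            seen.add(alphabet)
            chosen.append(k)
    return chosen

# Placeholder alphabets with 12, 14, 16, 16 and 20 activities respectively.
toy_alphabets = {
    4: set("abcdefghijkl"),
    5: set("abcdefghijklmn"),
    6: set("abcdefghijklmnop"),
    7: set("abcdefghijklmnop"),
    8: set("abcdefghijklmnopqrst"),
}
lengths = candidate_lengths(toy_alphabets, 23)
```

With 23 activities the admissible alphabet sizes are 14 to 18, so the placeholder lengths 4 and 8 are rejected and length 7 is dropped because its alphabet equals that of length 6.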

Table 8. Activity alphabet with node b of length k.

Table 9. Activity alphabet with node h length k.

The context logs of nodes b and h are extracted from the event log by mapping L to $A_i^k$ (line 17). The details are shown in Table 10, where each row represents a trace in the context log.
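One way to picture this mapping is as a projection of each trace onto the context alphabet. The Python sketch below assumes that the mapping keeps only the activities contained in the alphabet, preserves their order, merges traces that become identical and drops traces that become empty; the paper's own operator may differ in these details. Activity labels are single characters and the names are hypothetical.

```python
def project(trace, alphabet):
    """Keep only the activities of the trace that belong to the alphabet."""
    return "".join(act for act in trace if act in alphabet)

def context_log(log, alphabet):
    """Project every trace of {trace: frequency} and merge equal results."""
    result = {}
    for trace, freq in log.items():
        projected = project(trace, alphabet)
        if projected:  # assumption: empty projections are discarded
            result[projected] = result.get(projected, 0) + freq
    return result

toy_log = {"abcd": 2, "abd": 3, "c": 1}
toy_context = context_log(toy_log, {"a", "b", "d"})
```

In the toy example the first two traces both project to abd, so the context log contains a single trace with the combined frequency 5.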

Table 10. Context logs for nodes b and h.

The context logs in Table 10 are then used as the basis for calculating Equation (1) (lines 18–20). The weights of the three behavioural relations are set as follows: 45% for the strict order relation, 30% for the interleaving order relation, and 25% for the strict inverse order relation. The calculation results are listed in Table 11.

Table 11. Weighted frequency cosine similarity calculation results.

As indicated in Table 11, $L_h^6$ is selected as the benchmark log. Based on it, line 3 of Algorithm 2 splits each trace in the benchmark log $L_h^6$ into two parts: the longest common prefix of the traces and the remaining activity sequence. Taking the first trace in $L_h^6$ as an illustration, ab is the longest common prefix and the remaining activity sequence is $d_1=\langle cdehijlkmnopw \rangle$. After that, Algorithm 2 iteratively updates the context tree with new activities, and the final context tree is shown in Figure 5.
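The split into longest common prefix and remaining sequence can be reproduced with a few lines of Python. The sketch uses the eight distinct traces of the benchmark log that are listed with the clusters in this section, treats each activity as a single character, and relies on the standard library function os.path.commonprefix, which compares strings character by character. The helper name is hypothetical.

```python
import os

def split_on_lcp(traces):
    """Return the longest common prefix and the remainder of every trace."""
    lcp = os.path.commonprefix(list(traces))
    return lcp, [trace[len(lcp):] for trace in traces]

benchmark_traces = [
    "abcdehijlkmnopw", "abdcehijklmnoqw", "abcdehijlkmnop",
    "abhw", "abcdfhnw", "abdcfhnw", "abhoqw", "abhopw",
]
lcp, remainders = split_on_lcp(benchmark_traces)
```

Here lcp is ab and the first remainder is cdehijlkmnopw, in line with the illustration given for the first trace.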

Figure 5. Context tree of $L_h^6$.

In Algorithm 3, each trace in the benchmark log initially forms its own cluster. The trace distance $distance(bt_i, bt_j) = 1 - |\pi_A(bt_i) \cap \pi_A(bt_j)| / |A|$ is then used to iteratively merge the two nearest traces into the same cluster until the number of clusters is less than or equal to a value set in advance. After the traces in the benchmark log have been clustered, the counterpart of each benchmark trace is identified in the original event log. Finally, the mined process variant models are represented as Petri nets. Based on the context tree in Figure 5, the benchmark log is clustered into four clusters: $L_{h1}^{6}=\{\langle abcdehijlkmnopw \rangle^{47}, \langle abdcehijklmnoqw \rangle^{28}, \langle abcdehijlkmnop \rangle^{25}\}$, $L_{h2}^{6}=\{\langle abhw \rangle^{50}\}$, $L_{h3}^{6}=\{\langle abcdfhnw \rangle^{48}, \langle abdcfhnw \rangle^{52}\}$ and $L_{h4}^{6}=\{\langle abhoqw \rangle^{48}, \langle abhopw \rangle^{52}\}$. The last three lines of Algorithm 3 map the benchmark log to the original event log, and the final process variant models are mined in the form of Petri nets, as shown in Figure 6.
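The merging loop can be illustrated with a short agglomerative sketch in Python. Several points are assumptions rather than details taken from Algorithm 3: $\pi_A$ is read as the set of activities of a trace restricted to the alphabet A, the distance between two clusters is the smallest distance between their members (single linkage), and ties are broken by cluster order. The context tree is not used here, so the sketch is not expected to reproduce the four clusters reported above. All names are hypothetical.

```python
from itertools import combinations

def distance(t1, t2, alphabet):
    """1 - |shared activities| / |A| for two traces over the alphabet A."""
    shared = set(t1) & set(t2) & alphabet
    return 1 - len(shared) / len(alphabet)

def cluster(traces, alphabet, target):
    """Merge the two nearest clusters until at most `target` clusters remain."""
    target = max(target, 1)
    clusters = [[trace] for trace in traces]
    while len(clusters) > target:
        # single linkage: cluster distance is the closest pair of members
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                distance(a, b, alphabet)
                for a in clusters[pair[0]]
                for b in clusters[pair[1]]
            ),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

toy_traces = ["abc", "abd", "xyz", "xyw"]
toy_clusters = cluster(toy_traces, set("abcdxyzw"), 2)
```

In the toy run the traces abc and abd share two of the eight activities and are merged first, after which xyz and xyw form the second cluster.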

Figure 6. Process variants (a), (b), (c), (d) found in the event log. (a) Housing loan process (b) Student loan process (c) Commercial loan process (d) Small loan process.

Since each activity in Figure 6 has a specific meaning, the credit processes can be interpreted as housing loan, student loan, commercial loan and small loan processes. These four process variants clearly share many commonalities while differing to a certain degree. Distinguishing the behaviours of such similar-but-different process variants can enhance process mining and facilitate subsequent management by organisational managers.

6.2. Comparative analysis

To validate the feasibility of our approach, we evaluated it on two real-life event logs and compared it with the activity recommendation approach (Chan et al., 2014) and the trace clustering method (Xu & Liu, 2019) using CNN coding (Pan et al., 2020).

The activity recommendation approach (Chan et al., 2014) develops process variants by using event logs to recommend activities in the process model, provided that the process model is known. The main deficiencies of this method are that (1) it requires an a priori known process model and (2) it has a large computational complexity. By comparison, the semantic α trace splitting method proposed in this paper overcomes these two problems. Table 12 compares the two methods, where $n_A$ represents the number of activities, $n_P$ represents the number of business process variants, n represents the maximum number of public activities located on a layer, and k represents the number of layers considered.

Table 12. Comparison of process variants discovery methods.

To validate the complexity analysis of the method in this paper, the dataset used for the experiments in this section consists of two parts: the event logs of the bank credit process and those of the library book checkout and return process. The data can be obtained from the website given in footnote 2. The main indices of the event logs are shown in Table 13.

Table 13. Holistic information of datasets.

For an effective comparison, we calculate (1) the average number of configuration steps required to derive the process variants; (2) the proportion of traces in the log that the process variants can replay completely; and (3) the accuracy of the model. The results are shown in Table 14.

Table 14. Performance indicator comparison with Chan et al. (2014).

Furthermore, we conduct another comparison experiment with the trace clustering method using CNN, based on the same two datasets. The experimental results are listed in Table 15, and graphical comparisons are shown in Figure 7.

Table 15. Performance indicator comparison with Xu and Liu (2019).

Figure 7. Fitness and precision comparisons based on dataset1 and dataset2. (a) Performance comparisons with Chan et al. (2014) (b) Performance comparisons with Xu and Liu (2019).

From Tables 14 and 15, the following conclusions can be drawn.

  1. Compared with the work of Chan et al. (2014), the average number of configuration steps required to derive the process variants is similar for both methods, with our method being slightly better than the activity recommendation method. Furthermore, the differences between our work and Chan et al. (2014) in fitness and accuracy are statistically significant, as the p-values of the Wilcoxon test are equal to 0.0017 for both indicators. Our proposed method is therefore superior to the activity recommendation method on these indicators.

  2. Compared with the work of Xu and Liu (2019), the differences in fitness and accuracy are not statistically significant: the p-value of the Wilcoxon test is 0.27 for fitness and 0.74 for accuracy, both of which are greater than the significance level (0.05). However, the mean fitness and the mean accuracy of our method are both higher than those of Xu and Liu (2019).

To further examine the differences between our work and the trace clustering method of Xu and Liu (2019), the next subsection presents two more experiments on event logs with different distributions of variants.

6.3. Further experimental discussions

In some scenarios, the distribution of variants may be extremely non-uniform, as in BPIC2015. In the event logs of BPIC2015, most trace instances occur only once, so the number of variants is very close to the number of traces. To evaluate the validity of our method on this kind of event log, we conduct two comparative experiments. The first is based on a set of small-scale handmade event logs in which each trace instance occurs only once; the second is based on the event logs of BPIC2015.

As shown in Section 4 (Motivation), two sets of event logs are used to illustrate the novelty of this work. Here, we convert these two sets into new ones in which each trace occurs only once, i.e. $L_1=\{\langle E,A,B \rangle^1, \langle E,F,B \rangle^1, \langle I,G,J,H \rangle^1, \langle I,C,D,J,H \rangle^1\}$ and $L_2=\{\langle E,A,B,J \rangle^1, \langle E,F,B,J \rangle^1, \langle E,X,B,J \rangle^1, \langle I,G,J,H \rangle^1, \langle E,C,D,I,H \rangle^1, \langle E,G,H \rangle^1\}$. The experimental results for the event log $L'=L_1 \cup L_2$ are listed in Table 16.

Table 16. Comparisons among different methods for log $L'=L_1 \cup L_2$.

Similarly, we conduct another experiment on the BPIC 2015 dataset. There are 5 sub logs in BPIC2015, named BPIC2015-1, BPIC2015-2, BPIC2015-3, BPIC2015-4 and BPIC2015-5; BPIC2015 denotes the union of the 5 sub logs. Taking BPIC2015-1 as an example, it contains 1199 trace instances, and the frequency of each trace is equal to 1. We apply the inductive mining method, the trace clustering method and the process variant discovery method proposed in this work to BPIC2015, and the results are shown in Table 17.

Table 17. Comparisons among different methods for BPIC 2015 log.

Figure 8 gives a comprehensive graphical comparison for the two kinds of event logs. From Tables 16–17 and Figure 8, it can be seen that almost all mining methods reach the same fitness level, including inductive mining, trace clustering by SOM, trace clustering by K-means, trace clustering by agglomerative clustering, and our proposed method; the exception is heuristics mining. Compared with fitness, precision is a more discriminating indicator. Tables 16 and 17 and Figure 8 show that our method has a higher precision level in all sub logs of BPIC 2015 and in the artificial log L′. These results provide evidence that the proposed method is superior to the characteristic trace clustering method in tackling event logs with an imbalanced variant distribution. The reason is that the proposed method considers not only the frequency information of traces but also the activity relationships in the log, so it can effectively capture the behavioural differences between variants.

Figure 8. Fitness and precision comparisons for event logs with different variant distributions. (a) fitness of L′ (b) precision of L′ (c) fitness of BPIC2015 (d) precision of BPIC2015.

Admittedly, because the proposed method requires additional procedures for calculating the activity context within neighbourhood length k, its running time is longer than that of the characteristic clustering methods. Taking the BPIC2015 event log as an example, the characteristic trace clustering method using CNN takes 3.97 seconds, while our method takes 16.87 seconds. A detailed execution time comparison based on the three medium-scale datasets mentioned above is shown in Figure 9.

Figure 9. Execution time of different methods.

7. Conclusions and future work

Process variants are a set of models or execution logs that have a high degree of similarity, although the behaviour of each process variant differs from that of the others. When only event logs are given, mining process variants remains an open and difficult problem. At present, the latest trace clustering methods can significantly improve process mining in terms of fitness and precision. However, for process variant mining, state-of-the-art trace clustering methods cannot work effectively because of the inherently high similarity of the variants. To the best of our knowledge, only a few studies aim at discovering process variants directly from the perspective of configurable process mining without relying on any a priori process model.

In this work, we propose a semantic α splitting method based on the activity context of the event log, which effectively discovers process variants directly from event logs and obtains the process variant clusters. Unlike previous work based on configuration operations, the proposed method combines the advantages of configurable process mining and trace clustering, and presents a trace similarity measurement that incorporates the behaviour profiles of the event log. The main innovation of the proposed method is that it extracts trace semantics in the form of a trace context tree, in which the short and long dependencies of activities are expressed as the kth-strict order relationship. The kth-strict order relationship is a kind of behaviour profile; it helps to simplify the event logs and hence to reduce the computational complexity. The paper achieves two main objectives:

  1. A framework for semantic process variant mining is constructed, which represents the semantics of the event log as a context tree and simplifies the event log by using the activity alphabet within neighbourhood length k. This approach processes traces directly without converting them into any other form, and shortens the traces to be handled in the event log.

  2. A process variant discovery method is presented that can effectively discover the process cohorts. The mined process variants are hard to identify because of their inherently high similarity. We conduct a series of experiments on real-life datasets and compare the proposed work with methods from configurable process mining and trace clustering. The experiments and discussions demonstrate that the proposed method works effectively with high fitness and precision; above all, it can discover variants that cannot be mined by the characteristic trace clustering method.

While the proposed method has the above-mentioned advantages, it also has some shortcomings. For example, its execution time is longer than that of characteristic trace clustering methods, although it remains within an acceptable range. In future work, we will focus on improving the performance of configurable process mining algorithms and on building a development environment as a plug-in component for ProM (Process Mining Framework). Exploring comprehensive configurable process mining methods that involve multi-perspective event attributes, such as resources and organisations, is another key direction for future work.

Acknowledgments

We also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 61902002].

Notes

References

  • Agostinelli, S., Chiariello, F., Maggi, F. M., Marrella, A., & Patrizi, F. (2023). Process mining meets model learning: Discovering deterministic finite state automata from event logs for business process analysis. Information Systems, 114, 102180. https://doi.org/10.1016/j.is.2023.102180
  • Bautista, A. D., Wangikar, L., & Akbar, S. M. K. (2012). Process mining-driven optimization of a consumer loan approvals process. In BPM 2012: Business process management workshops (pp. 219–220). BPI Challenge.
  • Bettina, F., Sergio, F., Filippo, F., & Luigi, P. (2022). Process mining meets argumentation: Explainable interpretations of low-level event logs via abstract argumentation. Information Systems, 107, 101987. https://doi.org/10.1016/j.is.2022.101987
  • Bolt, A., van der Aalst, W. M., & Leoni, M. D. (2017, October 23–27). Finding process variants in event logs: (Short Paper). In On the move to meaningful Internet systems. OTM 2017 conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, Proceedings, Part I (pp. 45–52). Springer International Publishing.
  • Borah, A., & Nath, B. (2018). Fp-tree and its variants: Towards solving the pattern mining challenges. In Proceedings of first international conference on smart system, innovations and computing, Singapore (pp. 535–543). Springer.
  • Chan, N. N., Yongsiriwit, K., Gaaloul, W., & Mendling, J. (2014). Mining event logs to assist the development of executable process variants. In International conference on advanced information systems engineering (pp. 548–563). Springer International Publishing.
  • De Koninck, P., Nelissen, K., Baesens, B., Snoeck, M., & De Weerdt, J. (2021). Expert-driven trace clustering with instance-level constraints. Knowledge and Information Systems, 63(5), 1197–1220. https://doi.org/10.1007/s10115-021-01548-6
  • De Leoni, M., van der Aalst, W. M., & Dees, M. (2016). A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Information Systems, 56, 235–257. https://doi.org/10.1016/j.is.2015.07.003
  • Delias, P., Doumpos, M., Grigoroudis, E., Manolitzas, P., & Matsatsinis, N. (2015). Supporting healthcare management decisions via robust clustering of event logs. Knowledge-Based Systems, 84, 203–213. https://doi.org/10.1016/j.knosys.2015.04.012
  • Delias, P., Doumpos, M., Grigoroudis, E., & Matsatsinis, N. (2023). Improving the non-compensatory trace-clustering decision process. International Transactions in Operational Research, 30(3), 1387–1406. https://doi.org/10.1111/itor.v30.3
  • Döhring, M., Reijers, H. A., & Smirnov, S. (2014). Configuration vs. adaptation for business process variant maintenance: An empirical study. Information Systems, 39, 108–133. https://doi.org/10.1016/j.is.2013.06.002
  • Fang, H., Jin, P. P., Fang, X. W., & Wang, L. L. (2020). Process variants cluster mining method based on causal behavioral profiles. Computer Integrated Manufacturing System, 26(6), 1538–1547. https://doi.org/10.13196/j.cims.2020.06.010
  • Folino, F., Guarascio, M., & Pontieri, L. (2015). Mining multi-variant process models from low-level logs. In International conference on business information systems (pp. 165–177). Springer International Publishing.
  • Hasankiyadeh, A. P., Kahani, M., Bagheri, E., & Asadi, M. (2014). Mining common morphological fragments from process event logs. In Proceedings of 24th annual international conference on computer science and software engineering (pp. 179–191). IBM Corp.
  • Hmami, A., Sbai, H., & Fredj, M. (2021). Enhancing change mining from a collection of event logs: Merging and filtering approaches. Journal of Physics: Conference Series, 1743(1), 012020. https://doi.org/10.1088/1742-6596/1743/1/012020
  • Khannat, A., Sbai, H., & Kjiri, L. (2021). Configurable process mining: Semantic variability in event logs. In ICEIS (pp. 768–775). SCITEPRESS.
  • Li, C., Reichert, M., & Wombacher, A. (2011). Mining business process variants: Challenges, scenarios, algorithms. Data & Knowledge Engineering, 70(5), 409–434. https://doi.org/10.1016/j.datak.2011.01.005
  • Lopez-Martinez-Carrasco, A., Juarez, J. M., Campos, M., & Canovas-Segura, B. (2021). A methodology based on trace-based clustering for patient phenotyping. Knowledge-Based Systems, 232, 107469. https://doi.org/10.1016/j.knosys.2021.107469
  • Lu, K., Fang, X., Fang, N., & Asare, E. (2022). Discovery of effective infrequent sequences based on maximum probability path. Connection Science, 34(1), 63–82. https://doi.org/10.1080/09540091.2021.1951667
  • Luengo, D., & Sepúlveda, M. (2011). Applying clustering in process mining to find different versions of a business process that changes over time. In International conference on business process management (pp. 153–158). Springer.
  • Medeiros, A. K. A. D., Guzzo, A., Greco, G., Van der Aalst, W. M., Weijters, A., Dongen, B. F. V., & Sacca, D. (2007). Process mining based on clustering: A quest for precision. In International conference on business process management (pp. 17–29). Springer.
  • Pan, Y., Zhang, L., & Li, Z. (2020). Mining event logs for knowledge discovery based on adaptive efficient fuzzy Kohonen clustering network. Knowledge-Based Systems, 209, 106482. https://doi.org/10.1016/j.knosys.2020.106482
  • Pourbafrani, M., van Zelst, S., & Aalst, W. (2020). Supporting automatic system dynamics model generation for simulation in the context of process mining. In 23rd International conference on business information systems (pp. 249–263). Springer International Publishing.
  • Pourmasoumi, A., Kahani, M., & Bagheri, E. (2017). Mining variable fragments from process event logs. Information Systems Frontiers, 19(6), 1423–1443. https://doi.org/10.1007/s10796-016-9662-x
  • Richetti, P., Jazbik, L. S., Baiao, F. A., & Campos, M. (2022). Deviance mining with treatment learning and declare-based encoding of event logs. Expert Systems with Applications, 187, 115962. https://doi.org/10.1016/j.eswa.2021.115962
  • Rosa, M. L., Aalst, W. M. V. D., Dumas, M., & Milani, F. P. (2017). Business process variability modeling: A survey. ACM Computing Surveys (CSUR), 50(1), 1–45. https://doi.org/10.1145/3041957
  • Schunselaar, D. M., Verbeek, E., Van Der Aalst, W. M., & Reijers, H. A. (2012). Creating sound and reversible configurable process models using CoSeNets. In International conference on business information systems (pp. 24–35). Springer Berlin Heidelberg.
  • Tang, Y., Li, T., Zhu, R., Liu, C., & Zhang, S. (2022). A hybrid genetic service mining method based on trace clustering population. IEICE Transactions on Information and Systems, E105D(8), 1443–1455. https://doi.org/10.1587/transinf.2021EDP7190
  • Tariq, Z., Charles, D., McClean, S., McChesney, I., & Taylor, P. (2021). An event-level clustering framework for process mining using common sequential rules. In International conference for emerging technologies in computing (pp. 147–160). Springer International Publishing.
  • Tavares, G. M., Barbon Junior, S., Damiani, E., & Ceravolo, P. (2022). Selecting optimal trace clustering pipelines with meta-learning. In J. C. Xavier-Junior & R. A. Rios (Eds.), Intelligent systems (pp. 150–164). Springer International Publishing.
  • Taymouri, F., La Rosa, M., Dumas, M., & Maggi, F. M. (2021). Business process variant analysis: Survey and classification. Knowledge-Based Systems, 211, 106557. https://doi.org/10.1016/j.knosys.2020.106557
  • van Beest, N., Groefsema, H., García-Bañuelos, L., & Aiello, M. (2019). Variability in business processes: Automatically obtaining a generic specification. Information Systems, 80, 36–55. https://doi.org/10.1016/j.is.2018.09.005
  • Van Den Ingh, L., Eshuis, R., & Gelper, S. (2021). Assessing performance of mined business process variants. Enterprise Information Systems, 15(5), 676–693. https://doi.org/10.1080/17517575.2020.1746405
  • Van der Aalst, W. (2011). Process mining: Discovery, conformance and enhancement of business processes (2nd ed.). Springer.
  • van der Aalst, W. M. P. (2022). Process mining: A 360 degree overview. Springer International Publishing.
  • Vertuam Neto, R., Tavares, G., Ceravolo, P., & Barbon, S. (2021). On the use of online clustering for anomaly detection in trace streams. In XVII Brazilian symposium on information systems (pp. 1–8). Association for Computing Machinery.
  • Wang, Q., Shao, C., Fang, X., & Zhang, H. (2022). Business process recommendation method based on cost constraints. Connection Science, 34(1), 2520–2537. https://doi.org/10.1080/09540091.2022.2133083
  • Xu, J., & Liu, J. (2019). A profile clustering based event logs repairing approach for process mining. IEEE Access, 7, 17872–17881. https://doi.org/10.1109/ACCESS.2019.2894905
  • Zandkarimi, F., Rehse, J. R., Soudmand, P., & Hoehle, H. (2020). A generic framework for trace clustering in process mining. In 2020 2nd International conference on process mining (ICPM) (pp. 177–184). IEEE.