
An improved Monte Carlo Tree Search approach to workflow scheduling

Pages 1221-1251 | Received 22 Dec 2021, Accepted 08 Mar 2022, Published online: 07 Apr 2022

Abstract

Workflow computing has become an essential part of many scientific and engineering fields, while workflow scheduling has long been a well-known NP-complete research problem. Major previous works can be classified into two categories: heuristic-based and guided random-search-based workflow scheduling methods. Monte Carlo Tree Search (MCTS) is a recently proposed search methodology with great success in AI research on game playing, such as Computer Go. However, researchers found that MCTS also has potential application in many other domains, including combinatorial optimization, task scheduling, planning, and so on. In this paper, we present a new workflow scheduling approach based on MCTS, which is still a rarely explored direction. Several new mechanisms are developed for the major steps in MCTS to improve workflow execution schedules further. Experimental results show that our approach outperforms previous methods significantly in terms of execution makespan.

I. Introduction

As the computational structure of modern scientific and engineering applications grows ever more complex and large-scale, workflow computing has become an indispensable parallel processing model. To improve computational performance, workflow scheduling (Topcuoglu et al., Citation2002) has become an important research topic on modern parallel processing platforms, including various high-performance computing platforms and cloud computing environments. Many methods have been proposed in the literature to tackle this challenging NP-complete scheduling problem. Most of them are heuristic-based algorithms such as HEFT (Topcuoglu et al., Citation2002), PEFT (Arabnejad & Barbosa, Citation2014), and IPPTS (Djigal et al., Citation2021). Another well-known category of workflow scheduling approaches is based on meta-heuristic methods (Keshanchi et al., Citation2017; Xu et al., Citation2014), e.g. the genetic algorithm (Srinivas & Patnaik, Citation1994) and other evolutionary algorithms.

Search-based approaches received less attention in previous workflow scheduling studies because of the search space explosion problem, especially for large-scale workflows and parallel computing platforms. However, recent developments in advanced search methods have made search-based approaches a potentially promising direction for workflow scheduling. Monte Carlo Tree Search (MCTS) (Browne et al., Citation2012) is one of the most well-known advanced search methods and has shown great performance in several application domains (Chaslot et al., Citation2006).

Traditionally, MCTS has mostly been used in computer programs playing turn-based strategy games, such as Go, chess, shogi, and poker. However, some previous research has applied it to specific optimization problems, such as production management (Chaslot et al., Citation2006), and found that it has the potential to outperform commonly used heuristic-based algorithms. Recently, MCTS was applied to the workflow scheduling problem in (Liu et al., Citation2019). In addition to the traditional MCTS method, the authors also developed a pruning algorithm to reduce the MCTS search space and improve search efficiency and effectiveness. The proposed approach leverages the idea of branch-and-bound and is called MCTS-BB (Liu et al., Citation2019).

The experimental results in (Liu et al., Citation2019) show that MCTS outperforms well-known heuristic algorithms for workflow scheduling, including HEFT, PEFT and CPOP (Topcuoglu et al., Citation2002). However, the experimental results also reveal that in general, the performance of MCTS-BB is worse than the original MCTS approach, indicating that the proposed pruning algorithm is not as effective as expected. The pruning algorithm in (Liu et al., Citation2019) relies on the makespan of the current partial schedule and the length of the critical path among unscheduled tasks to determine the upper bound guiding the search process. In this paper, we present new pruning and heuristic-guiding mechanisms to further improve the performance of MCTS in workflow scheduling. Experimental results show that our new approaches can achieve better performance than the original MCTS and MCTS-BB methods proposed in (Liu et al., Citation2019).

The remainder of this paper is organized as follows. In the next section, we formulate the workflow scheduling problem. Section III discusses related work on heuristic-based and guided random-search-based workflow scheduling methods. Section IV illustrates how MCTS can be applied to the workflow scheduling problem and presents the MCTS-BB method of (Liu et al., Citation2019). Section V describes our new heuristic-guided MCTS approach for workflow scheduling. Section VI evaluates the individual mechanisms of our improved MCTS approach, and Section VII presents the overall performance evaluation with a comparison to well-known previous methods in the literature. Section VIII concludes this paper and sheds some light on promising future research directions.

II. Workflow scheduling problem

In most workflow scheduling research (Arabnejad & Barbosa, Citation2014; Liu et al., Citation2019; Topcuoglu et al., Citation2002), a workflow is represented by a Directed Acyclic Graph (DAG), i.e. G=(V,E), describing the inter-task structure, where V is the set of nodes and E the set of edges. Figure 1 shows a workflow example represented by a DAG. Each node in the workflow represents a computational task, with the annotated value indicating its required execution time. The directed edges between tasks represent the computational or data dependence among them, which reduces potential task parallelism and enforces a specific execution order of the tasks. The number next to an edge indicates the required data transfer time between tasks. The workflow scheduling problem on a parallel computing platform is to determine the start time and allocated processor of each task in a workflow, aiming to optimize a specific goal, e.g. execution makespan.

Figure 1. Workflow structure represented by a DAG.


When speed heterogeneity of processors is considered in workflow scheduling, the required execution time of each task in the DAG is usually set to the average of its execution times across all processors in the system (Arabnejad & Barbosa, Citation2014; Topcuoglu et al., Citation2002) as follows, where $w_{i,j}$ is the required execution time of task $i$ on processor $j$ and there are $q$ processors in total.

(1) $\bar{w}_i = \frac{\sum_{j=1}^{q} w_{i,j}}{q}$

For two tasks $i$ and $j$ executed on processors $m$ and $n$ respectively, the required communication time $c_{i,j}$ of the edge connecting nodes $n_i$ and $n_j$ in the DAG is usually obtained by summing the latency $L_{m,n}$ and the data transfer time, i.e. the data size $data_{i,j}$ divided by the data transfer rate (network bandwidth) $B_{m,n}$ between the two processors running tasks $i$ and $j$.

(2) $c_{i,j} = L_{m,n} + \frac{data_{i,j}}{B_{m,n}}$

When a heterogeneous computing environment is considered, the weight of an edge $c_{i,j}$ in the workflow's DAG can be represented by the average communication time of transferring data of size $data_{i,j}$ across all possible network links in the system as follows.

(3) $\bar{c}_{i,j} = \bar{L} + \frac{data_{i,j}}{\bar{B}}$

The Earliest Start Time (EST) and Earliest Finish Time (EFT) of a task are the earliest possible times at which it can start and finish its execution, respectively. EST and EFT depend on both data and resource availability, while EFT also depends on the task's workload. A workflow is assumed to have exactly one entry node without any precedent tasks and one exit node with no following tasks. The EST of the entry node is defined to be zero.

(4) $EST(n_{entry}, p_j) = 0$

The ESTs of the other non-entry nodes allocated onto specific processors are computed recursively from the entry node as follows, where $avail[j]$ is the earliest available time of processor $j$ and $AFT(\cdot)$ is a function returning the actual finish time of a task in a partial schedule.

(5) $EST(n_i, p_j) = \max\left\{avail[j],\ \max_{n_m \in pred(n_i)}\left(AFT(n_m) + c_{m,i}\right)\right\}$

The EFT of a task on a specific processor is the sum of its EST and its required execution time on that processor.

(6) $EFT(n_i, p_j) = w_{i,j} + EST(n_i, p_j)$

The makespan of a workflow is defined to be the actual finish time of the exit task, which is a common optimization goal of workflow scheduling algorithms.
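
For concreteness, the following is a minimal sketch of how equations (5) and (6) can be evaluated for one candidate placement of a task; the container names (`pred`, `aft`, `avail`, `w`) and the `comm` helper are illustrative assumptions, not part of the paper.

```python
# Sketch of EST/EFT (equations (5)-(6)) for placing task i on processor j.
# pred[i]: predecessors of task i; aft[m]: actual finish time of an already
# scheduled task m; avail[j]: earliest idle time of processor j;
# w[i][j]: execution time of task i on processor j; comm(m, i): communication
# cost c_{m,i} (zero if both tasks run on the same processor).

def est(i, j, pred, aft, avail, comm):
    data_ready = max((aft[m] + comm(m, i) for m in pred[i]), default=0.0)
    return max(avail[j], data_ready)

def eft(i, j, pred, aft, avail, comm, w):
    return est(i, j, pred, aft, avail, comm) + w[i][j]
```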

III. Related works

Minimization of execution makespan has long been an important research topic in workflow scheduling research. Recently, as new parallel and distributed computing platforms emerge, such as cloud and fog computing, there are also research works exploring new issues on such platforms (Li et al., Citation2021; Wang & Zuo, Citation2021; Wu et al., Citation2020; Yadav et al., Citation2021, Citation2022). Some of the recent research works also deal with multi-objective scheduling problems, considering extra optimization goals, such as energy-saving, in addition to workflow execution makespan (Hu et al., Citation2021; Wu et al., Citation2020; Yadav et al., Citation2021, Citation2022).

In this paper, we focus on the most fundamental workflow scheduling problem, which aims to minimize workflow execution makespan on a parallel processor platform. Most previously proposed approaches for such scheduling problems can be roughly classified into two categories: heuristic-based (Arabnejad & Barbosa, Citation2014; Djigal et al., Citation2021; Topcuoglu et al., Citation2002) and guided random-search-based (Keshanchi et al., Citation2017) methods.

A. HEFT

Heterogeneous Earliest Finish Time (HEFT, Topcuoglu et al., Citation2002) is one of the most well-known heuristic-based workflow scheduling algorithms, featuring simplicity, efficiency, and effectiveness. HEFT starts with a task prioritizing phase that calculates task priorities using an upward ranking mechanism, followed by a processor selection phase that allocates each task onto the processor minimizing its finish time. The upward rank of a task is calculated recursively as follows.

(7) $rank_u(n_i) = \bar{w}_i + \max_{n_j \in succ(n_i)}\left\{\bar{c}_{i,j} + rank_u(n_j)\right\}$
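
As a rough illustration, the upward rank of equation (7) can be computed with a simple memoized recursion over the DAG; `succ`, `w_avg` and `c_avg` below are assumed data structures, not code from the paper.

```python
import functools

# Sketch of the upward rank in equation (7). succ[i]: successors of task i;
# w_avg[i]: average execution time of task i; c_avg(i, j): average
# communication cost of edge (i, j).

def upward_ranks(tasks, succ, w_avg, c_avg):
    @functools.lru_cache(maxsize=None)
    def rank_u(i):
        tail = max((c_avg(i, j) + rank_u(j) for j in succ[i]), default=0.0)
        return w_avg[i] + tail
    return {i: rank_u(i) for i in tasks}
```

HEFT then considers tasks in non-increasing rank order and places each one on the processor that minimizes its EFT.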

B. PEFT

Predict Earliest Finish Time (PEFT, Arabnejad & Barbosa, Citation2014) is an efficient workflow scheduling algorithm improved from HEFT. PEFT adopts a lookahead mechanism by introducing an Optimistic Cost Table (OCT). PEFT outperforms HEFT as shown in (Arabnejad & Barbosa, Citation2014) while retaining the same computational time complexity as HEFT, i.e. $O(v^2)$. Instead of the upward rank used in HEFT, PEFT ranks tasks using the OCT value defined as follows, which estimates the minimum processing time of the longest path from the current task to the exit task, assuming the current task is allocated onto a specific processor. The OCT values of the exit node on all processors are defined to be zero.

(8) $OCT(t_i, p_k) = \max_{t_j \in succ(t_i)}\left[\min_{p_w \in P}\left\{OCT(t_j, p_w) + w(t_j, p_w) + \bar{c}_{i,j}\right\}\right], \quad \bar{c}_{i,j} = 0 \text{ if } p_w = p_k$

Based on the OCT values, the rank of each task is calculated by

(9) $rank_{OCT}(t_i) = \frac{\sum_{k=1}^{|P|} OCT(t_i, p_k)}{|P|}$

In the processor selection phase, PEFT uses $O_{EFT}$ instead of the EFT mechanism in HEFT, which is defined as

(10) $O_{EFT}(t_i, p_j) = EFT(t_i, p_j) + OCT(t_i, p_j)$
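
The sketch below illustrates, under stated assumptions, how the OCT table of equation (8) and the rank of equation (9) can be filled in by one reverse-topological sweep; `rev_topo_order`, `procs`, `succ`, `w` and `c_avg` are illustrative names.

```python
# Sketch of the Optimistic Cost Table (equation (8)) and OCT rank
# (equation (9)). rev_topo_order lists tasks so that every successor is
# processed before its predecessors (the exit task comes first).

def build_oct(rev_topo_order, procs, succ, w, c_avg):
    oct_ = {}
    for t in rev_topo_order:
        for p in procs:
            if not succ[t]:                      # exit task: OCT = 0
                oct_[(t, p)] = 0.0
                continue
            oct_[(t, p)] = max(
                min(oct_[(s, q)] + w[s][q] + (0.0 if q == p else c_avg(t, s))
                    for q in procs)
                for s in succ[t])
    return oct_

def rank_oct(t, procs, oct_):
    return sum(oct_[(t, p)] for p in procs) / len(procs)
```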

C. ResourceAwareEFT

A Resource Aware Earliest Finish Time algorithm (ResourceAwareEFT) was proposed in (Chen & Huang, Citation2019, July) for workflow scheduling, which is also an improved algorithm based on HEFT. While adopting the same EFT mechanism as HEFT in the processor selection phase, in the task prioritizing phase ResourceAwareEFT ranks tasks based on the observation of different DAG structure patterns. Figure 2 shows the four major patterns utilized in ResourceAwareEFT, whose rank calculation mechanisms are defined in equations (11), (14), (15), and (16), respectively.

Figure 2. Common inter-task relationships in workflow structure (Chen & Huang, Citation2019).


For the pattern in Figure 2(a), the communication cost between tasks A and B is ignored since both tasks involve only a single edge. The $rank_{spc}(t_i)$ of this structure pattern is defined by

(11) $rank_{spc}(t_i) = 0$

In (12) and (13), $|ParentTask(t_B)|$ is the number of parents of $t_B$, $|ChildrenTask(t_A)|$ is the number of children of $t_A$, and $|P|$ is the total number of processors in the parallel system.

(12) $a = \min(|ChildrenTask(t_A)|, |P|)$

(13) $b = \min(|ParentTask(t_B)|, |P|)$

Figure 2(b) shows a join pattern among tasks. The $rank_{spc}(t_i)$ of this structure pattern is defined by

(14) $rank_{spc}(t_i) = \frac{b-1}{b} \times \bar{c}_{A,B}$

Figure 2(c) illustrates a fork pattern among tasks. The $rank_{spc}(t_i)$ of this structure pattern is defined by

(15) $rank_{spc}(t_i) = \frac{a-1}{a} \times \bar{c}_{A,B}$

Figure 2(d) shows a more complicated pattern mixing those of Figure 2(a–c). The $rank_{spc}(t_i)$ of this structure pattern is defined by

(16) $rank_{spc}(t_i) = \frac{a-1}{a} \times \frac{1}{a} \times \frac{b-1}{b} \times \bar{c}_{A,B}$

Based on the above definitions, the task rank in ResourceAwareEFT is calculated as follows.

(17) $rank_{STR}(n_i) = \bar{w}_i + \max_{n_j \in succ(n_i)}\left\{rank_{SPC}(n_j) + rank_{STR}(n_j)\right\}$

D. IPPTS

Improved Predict Priority Task Scheduling (IPPTS) was proposed in (Djigal et al., Citation2021), which is the latest heuristic-based workflow scheduling algorithm. IPPTS maintains the same time complexity as previous well-known heuristic algorithms (Arabnejad & Barbosa, Citation2014; Topcuoglu et al., Citation2002; Xie et al., Citation2015), while outperforming them in terms of workflow execution makespan.

IPPTS is an improved version of PPTS (Djigal et al., Citation2019) based on a Predict Cost Matrix (PCM). The PCM equation (18) is similar to the OCT equation (8) in PEFT (Arabnejad & Barbosa, Citation2014), except that PCM also takes into account the computation cost of the current task.

(18) $PCM(t_i, p_k) = \max_{t_j \in succ(t_i)}\left[\min_{p_w \in P}\left\{PCM(t_j, p_w) + w(t_i, p_w) + w(t_j, p_w) + \bar{c}_{i,j}\right\}\right], \quad \bar{c}_{i,j} = 0 \text{ if } p_w = p_k$

In the task prioritization phase, the average PCM value of each task, denoted by $rank_{PCM}(t_i)$, is defined as follows.

(19) $rank_{PCM}(t_i) = \frac{\sum_{k=1}^{|P|} PCM(t_i, p_k)}{|P|}$

The rank of each task, denoted by $Prank(t_i)$, is computed by multiplying $rank_{PCM}(t_i)$ by its out-degree $outd(t_i)$.

(20) $Prank(t_i) = rank_{PCM}(t_i) \times outd(t_i)$

In the processor selection phase, IPPTS uses the Looking-ahead Earliest Finish Time $L_{head}EFT$ instead of the EFT mechanism (Sinnen, Citation2007). Computing $L_{head}EFT$ involves the Looking-ahead Exit Time (LHET) and EFT. $LHET(t_i, p_j)$ and $L_{head}EFT(t_i, p_j)$ are defined by

(21) $LHET(t_i, p_j) = PCM(t_i, p_j) - w(t_i, p_j)$

(22) $L_{head}EFT(t_i, p_j) = EFT(t_i, p_j) + LHET(t_i, p_j)$
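
Put together, IPPTS's processor choice amounts to minimizing the look-ahead EFT; a minimal sketch, assuming a precomputed `pcm` table and an `eft` callable (both illustrative names), is shown below.

```python
# Sketch of IPPTS processor selection (equations (21)-(22)): choose the
# processor minimizing EFT(t, p) + LHET(t, p), with LHET = PCM - w.

def select_processor_ippts(t, procs, pcm, w, eft):
    def lhead_eft(p):
        lhet = pcm[(t, p)] - w[t][p]      # equation (21)
        return eft(t, p) + lhet           # equation (22)
    return min(procs, key=lhead_eft)
```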

E. Genetic Algorithm

The Genetic Algorithm (GA, Srinivas & Patnaik, Citation1994) is an evolutionary algorithm commonly used for optimization problems. Previous works (Keshanchi et al., Citation2017; Sardaraz & Tahir, Citation2020; Xu et al., Citation2014) have applied GA to workflow scheduling, where a gene in a chromosome represents a task and each chromosome encodes a possible task execution sequence for a specific workflow, as shown in Figure 3. A population contains a pre-defined number of chromosomes representing various possible workflow execution schedules.

Figure 3. GA chromosome structure for workflow scheduling.


The approach proposed in (Keshanchi et al., Citation2017) starts with population initialization based on both heuristic and random chromosome generation. In the first step, heuristic chromosome generation is applied, where the schedules produced by heuristic workflow scheduling algorithms, such as HEFT-B (Topcuoglu et al., Citation2002), HEFT-T (Topcuoglu et al., Citation2002), and HEFT-L (Xu et al., Citation2014), are used as seed chromosomes. Then, in the second step of random chromosome generation, more chromosomes are generated randomly until the pre-defined population size is reached.

After population initialization, the GA approach enters a loop of three essential operations, i.e. selection, crossover, and mutation, until a pre-defined termination condition is reached. In the selection phase, chromosomes are randomly selected for the crossover and mutation operations. In the crossover phase, the elitist chromosome, i.e. the chromosome with the best fitness value, is marked and preserved unchanged. Other chromosomes are selected for one-point or two-point crossover operations to produce offspring as new chromosomes. Figure 4 shows the one-point crossover operation used in (Keshanchi et al., Citation2017).

Figure 4. Single-point crossover operation.


The fitness of a chromosome is the resultant execution makespan of the schedule represented by it. In (Keshanchi et al., Citation2017), the execution makespan of a chromosome is calculated by applying the Earliest Finish Time (EFT, Sinnen, Citation2007) processor allocation policy to the task execution sequence represented in it. Therefore, a smaller value indicates better fitness.

In the mutation phase, chromosomes are modified according to the mutation rule. For example, in (Xu et al., Citation2014) a task-swapping mutation was used, where a mutation point and a random point in a chromosome are selected randomly and the corresponding two tasks represented by the two genes are swapped if the swap does not violate the inter-task dependence. The task-swapping mutation operation is shown in Figure 5.

Figure 5. Mutation operation – task swapping.


Figure 6. Mutation operation – processor allocation modification.


In our implementation of the GA-based workflow scheduling approach used in this paper for performance comparison with the MCTS approaches, two chromosomes are used to represent a workflow execution schedule, one for the task execution sequence and the other for the processor allocation, as shown in Figure 6. Therefore, when performing the mutation operation, in addition to task swapping, the processor allocation of the genes at the selected mutation and random points is also changed randomly, as shown in Figure 6.
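
A minimal sketch of this mutation step, assuming the two-chromosome encoding just described, is given below; the names `sequence`, `alloc`, `preds` and `procs` are illustrative, and the precedence check is one simple way to reject infeasible swaps.

```python
import random

# Sketch of the mutation used in our GA implementation: swap two randomly
# chosen tasks if no precedence constraint is violated, and re-draw the
# processor allocation at both positions.

def mutate(sequence, alloc, preds, procs, rng=random):
    i, j = rng.sample(range(len(sequence)), 2)
    candidate = sequence[:]
    candidate[i], candidate[j] = candidate[j], candidate[i]
    position = {t: k for k, t in enumerate(candidate)}
    # Accept the swap only if every task still comes after all predecessors.
    if all(position[p] < position[t] for t in candidate for p in preds[t]):
        sequence = candidate
    new_alloc = dict(alloc)
    for t in (sequence[i], sequence[j]):
        new_alloc[t] = rng.choice(procs)      # random processor re-allocation
    return sequence, new_alloc
```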

At the end of each iteration of the loop, the fitness value of each chromosome is calculated. The termination condition of the loop is that all chromosomes have the same fitness. The elitist chromosome after the loop finishes is returned as the best workflow execution schedule found by the GA approach. The entire GA workflow scheduling process is shown in Figure 7.

Figure 7. The process of GA-based workflow scheduling.


IV. Workflow scheduling based on MCTS

A. MCTS applied to workflow scheduling

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm consisting of four major steps. The entire search process of MCTS is illustrated in Figure 8. When we apply MCTS to workflow scheduling, each node in the MCTS tree represents a possible partial or complete schedule of a workflow to be executed on a set of processors. Nodes with complete schedules are called terminal nodes. The children of a node represent all possibilities of choosing an unscheduled ready task and allocating it to one of the processors. The root is a special node with an empty schedule where no tasks have been scheduled. MCTS starts with a single root node, and the tree gradually grows as the search process proceeds, as illustrated in Figure 8.

Figure 8. Monte Carlo Tree Search (Browne et al., Citation2012).


The four major steps in MCTS for workflow scheduling are described as follows; a minimal code sketch of the resulting search loop is given after the list.

  1. Selection. Starting from the root node, a child is recursively selected at each visited node to proceed according to a selection policy until reaching a leaf node.

  2. Expansion. If the selected leaf node in the previous step is not a terminal node, all its child nodes are created and their visit counts and reward values initialized. Then, one of the expanded child nodes is randomly selected for the following simulation step.

  3. Simulation. Starting with the chosen node, a simulation strategy is used to extend the current partial schedule to a complete schedule and calculate the execution makespan of the resultant schedule.

  4. Backpropagation. The visit counts and reward values of the nodes on the path from the chosen node back to the root node are updated based on the completed schedule. The reward value of a node represents the shortest makespan found so far among the simulations passing through it.
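
The following is a minimal sketch of how these four steps fit together for workflow scheduling; `Node`, `select_child`, `rollout` and `budget_left` are assumed placeholders (a node keeps its partial schedule, children, parent, visit count and best-makespan reward), not the paper's implementation.

```python
import random

# Sketch of the MCTS loop: selection, expansion, simulation, backpropagation.
# The reward stored in a node is the best (smallest) makespan seen in
# simulations passing through it, as described above.

def mcts(root, budget_left, select_child, rollout):
    best = float("inf")
    while budget_left():
        node = root
        while node.children and not node.is_terminal():    # 1. selection
            node = select_child(node)
        if not node.is_terminal():                          # 2. expansion
            node.expand_all_children()
            node = random.choice(node.children)
        makespan = rollout(node)                            # 3. simulation
        best = min(best, makespan)
        while node is not None:                             # 4. backpropagation
            node.visits += 1
            node.reward = min(node.reward, makespan)
            node = node.parent
    return best
```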

B. MCTS-BB

MCTS-BB is a recent MCTS approach for workflow scheduling, which is equipped with a pruning method based on the branch-and-bound concept. The proposed pruning method is used to prevent MCTS from exploring some unpromising subtrees. Tree pruning is performed in the selection step, where the selected maximum UCT node will be marked as disabled if it meets the pruning criterion, to prevent it from further exploration. Once tree pruning occurs, the search process would start over again from the root node. Otherwise, the search process will continue as shown in Algorithm 1.

The pruning criterion is that the estimated makespan going through the selected node exceeds the shortest execution makespan found so far, i.e. β in Algorithm 1. The estimated makespan M of the selected node is calculated by summing the makespan of the current partial schedule and the length of the critical path $CP_{last}$ among the unscheduled tasks, as follows, where $\max_{p_j \in P}\{avail[j]\}$ is the makespan of the current partial schedule.

(23) $M = \max_{p_j \in P}\{avail[j]\} + \sum_{n_i \in CP_{last}} \min_{p_j \in P}\{w_{i,j}\}$

The critical path in equation (23), i.e. $CP_{last}$, can be determined by calculating, for each task, the sum of its upward rank in equation (7) and its downward rank in equation (24); the tasks with the largest rank sum belong to the critical path.

(24) $rank_d(n_i) = \max_{n_j \in pred(n_i)}\left\{rank_d(n_j) + \bar{w}_j + \bar{c}_{j,i}\right\}$
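
As an illustration, the pruning test of equation (23) reduces to a simple comparison against the incumbent bound; `avail`, `cp_last`, `w` and `beta` below are illustrative names.

```python
# Sketch of the MCTS-BB pruning test: disable the selected node if the
# optimistic estimate in equation (23) already exceeds beta, the shortest
# makespan found so far.

def should_prune(avail, cp_last, w, procs, beta):
    partial_makespan = max(avail[j] for j in procs)
    cp_lower_bound = sum(min(w[i][j] for j in procs) for i in cp_last)
    return partial_makespan + cp_lower_bound > beta
```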

Since the entire MCTS process is usually controlled by a pre-defined total search time or simulation count limit, a good pruning algorithm can avoid wasting time on exploring those nodes which look unlikely to lead to a better schedule than ever found, and thus increase the probability of finding a better schedule within the same search time limit. However, on the other hand, a flawed pruning algorithm might run the risk of missing some nodes which actually would result in better schedules. The experimental results in (Liu et al., Citation2019) showed that MCTS-BB achieved worse makespan than MCTS on average, indicating that the pruning algorithm adopted in MCTS-BB leaves room for further improvement.

V. Our improved MCTS approach for workflow scheduling

This section presents the methods we propose to improve the MCTS process for workflow scheduling.

A. Pre-calculated upper bound of makespan

The first improvement we made is a new pruning algorithm utilizing existing workflow scheduling heuristics (Arabnejad & Barbosa, Citation2014; Chen & Huang, Citation2019, July; Djigal et al., Citation2021; Topcuoglu et al., Citation2002). Instead of the critical path length of unscheduled tasks used in MCTS-BB, our approach adopts the minimum makespan achieved by HEFT, PEFT, ResourceAwareEFT, CPOP, and IPPTS as the initial upper bound before searching for the best schedule via MCTS, i.e. lines 1–2 in Algorithm 2.
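
In other words, the bound is simply the best makespan any of the baseline heuristics can achieve; a one-line sketch is shown below, where `heuristics` is an assumed iterable of scheduling functions returning their makespans.

```python
# Sketch of the pre-calculated upper bound (lines 1-2 of Algorithm 2):
# `heuristics` is an iterable of functions such as implementations of HEFT,
# PEFT, CPOP, ResourceAwareEFT and IPPTS, each returning the makespan of the
# schedule it produces for the given workflow and processor set.

def initial_upper_bound(workflow, processors, heuristics):
    return min(h(workflow, processors) for h in heuristics)
```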

B. Estimated makespan of unscheduled tasks

Instead of the critical path length used in MCTS-BB, our approach estimates the minimum makespan of all possible schedules extended from a node allocated onto a specific processor by adding the node's EFT to the sum of all its successors' shortest required execution times divided by the number of processors in the system, i.e. equation (25). Based on this makespan estimation, our new pruning algorithm avoids the risk of disabling promising nodes for further exploration.

(25) $M = EFT(n_i, p_j) + \frac{\sum_{n_k \in AllSucc(n_i)} \min_{p_j \in P}\{w_{k,j}\}}{|P|}$
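
A minimal sketch of this estimate is given below; `eft_value`, `all_succ` and `w` are illustrative names, and a node whose estimate exceeds the current upper bound would be marked as disabled, as in the pruning step of our algorithm.

```python
# Sketch of the estimated makespan of equation (25): the node's EFT plus the
# cheapest execution times of all of its (transitive) successors spread
# evenly over the |P| processors.

def estimated_makespan(task, eft_value, all_succ, w, procs):
    remaining = sum(min(w[k][j] for j in procs) for k in all_succ[task])
    return eft_value + remaining / len(procs)
```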

C. Node selection policy

At the selection step, the UCT equation of (Kocsis & Szepesvári, Citation2006) is the most widely used node selection policy. In (Liu et al., Citation2019), a variant of the original UCT equation, shown in equation (26), was used in the selection step, where $Q(s,a)$ is the reward value of the current tree node and $N(s,a)$ its visit count. In this paper, we evaluate other possible node selection policies, which are described in the performance evaluation section.

(26) $UCT(s,a) = \arg\max_a\left(Q(s,a) + U(s,a)\right), \quad U(s,a) = C_{puct}\frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}$
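
For illustration, a selection step based on equation (26) might look like the sketch below, where `q_value` and `visits` are assumed node attributes and `c_puct` is the exploration constant.

```python
import math

# Sketch of the node selection policy in equation (26): pick the child
# maximizing Q(s,a) + U(s,a) with a PUCT-style exploration term.

def select_child(node, c_puct=15.0):
    total_visits = sum(child.visits for child in node.children)

    def uct(child):
        u = c_puct * math.sqrt(total_visits) / (1 + child.visits)
        return child.q_value + u

    return max(node.children, key=uct)
```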

D. Node expansion strategy

At the expansion step, we propose a new heuristic strategy for guiding the expansion aiming to improve the search effectiveness. In MCTS-BB, a random expansion policy was used. In this paper, we propose and evaluate three new expansion policies based on a task’s upward rank (Topcuoglu et al., Citation2002), OCT-rank (Arabnejad & Barbosa, Citation2014) and STR-rank (Chen & Huang, Citation2019, July), respectively. The task with the highest rank value would be chosen for expansion as described in line 10 of Algorithm.

E. Simulation strategy

At the simulation step, we propose a new strategy aiming to improve the simulation effectiveness. In MCTS-BB, a random simulation strategy was used, which repeatedly selects an unscheduled task randomly until all tasks are scheduled. Regarding processor allocation, in (Liu et al., Citation2019), a probability parameter Rthreshold was set, and a task has Rthreshold probability to be allocated randomly and (1−Rthreshold) probability to be allocated to a processor chosen by the EFT mechanism (Sinnen, Citation2007). In this paper, we propose and evaluate three policies for selecting unscheduled tasks based on HEFT, OCTrank+EFT and ResourceAwareEFT, respectively. We also adopt the Rthreshold concept for processor allocation. In the experiments, we will evaluate the effectiveness of different values for Rthreshold. The overall process of our improved MCTS approach and the major steps in it are presented in the following Algorithm 2 to Algorithm.
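
A minimal sketch of such a guided simulation, assuming a schedule object with the obvious helpers (`ready_tasks`, `best_eft_processor`, `place`, `makespan`) and a heuristic `rank` function, is shown below; these names are illustrative.

```python
import random

# Sketch of the guided simulation step: ready tasks are chosen by a heuristic
# rank (e.g. HEFT's upward rank); each task is allocated to a random processor
# with probability r_threshold and to the EFT-best processor otherwise.

def simulate(schedule, rank, r_threshold=0.2, rng=random):
    while not schedule.is_complete():
        task = max(schedule.ready_tasks(), key=rank)
        if rng.random() < r_threshold:
            proc = rng.choice(schedule.processors)
        else:
            proc = schedule.best_eft_processor(task)
        schedule.place(task, proc)
    return schedule.makespan()
```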

VI. Performance evaluation

This section presents the performance evaluation of the proposed MCTS-based workflow scheduling approach, compared to well-known previous methods.

A. Experiment setup and performance metrics

There are several toolkits available for generating workflow structures to be used in evaluating workflow scheduling algorithms. WorkflowGenerator (Juve, Citation2014) is a toolkit developed in the Pegasus project, which can generate the workflow structures of various real-world applications, including CyberShake, Montage, Epigenomics, Inspiral and Sipht, described in the Pegasus DAX format. DAGGEN (Suter, Citation2015) is a software tool developed by Suter et al. for generating a wide range of random workflow structures based on user input parameters such as the number of tasks and the fatness and density of the DAG, which makes it useful for evaluating workflow scheduling algorithms. DAGGEN supports both dot and text output formats. Both toolkits were used in the following experiments.

The experiments were conducted on WorkflowSim (Chen, Citation2015; Chen & Deelman, Citation2012), a workflow scheduling simulation toolkit developed by Chen based on CloudSim (Calheiros et al., Citation2011). We wrote a simple program to convert the dot format files produced by DAGGEN into the XML input format used by WorkflowSim.

DAGGEN has the following parameters to set for generating workflows of various structures and properties.

  • n: the number of tasks in a workflow

  • fat: regarding the DAG shape, ranging in [0,1]. A value close to 1 means a fat DAG with greater parallelism, while a value close to 0 indicates a thin DAG with less parallelism.

  • density: controlling the inter-dependence of tasks in adjacent levels of a DAG.

  • regularity: controlling the distribution of tasks between different levels of a DAG.

  • jump: regarding the allowed distance between a task and its parent in terms of levels in a DAG.

  • ccr: the ratio of communication cost to computation cost.

Table 1 lists the parameter values used in the following experiments for generating workflows of various structures.

Table 1. Parameter values for DAG generation.

In the experiments, we also used WorkflowGenerator to generate the real-world applications’ workflow structures, such as CyberShake, Epigenomics, and Montage, to evaluate the performance of different workflow scheduling algorithms.

Table 2 shows the parameter values used in the GA-based workflow scheduling method in the following experiments.

Table 2. Parameter values for GA-based method.

The following performance metrics were used in our simulation experiments for performance evaluation; a short code sketch of the first three metrics is given after the list.

  • Normalized Makespan. Equation (27) is used to normalize the makespan when comparing the performance of different workflow scheduling algorithms. The normalized makespan ranges between 0 and 1; a value close to 1 means relatively worse performance, while a value near 0 indicates superior performance.

  (27) $normalized\ makespan = \frac{makespan - makespan_{min}}{makespan_{max} - makespan_{min}}$

  • Schedule Length Ratio (SLR). As shown in equation (28), SLR is the makespan divided by the sum of the shortest required execution times of all tasks on the critical path of the workflow, i.e. the makespan normalized to its lower bound.

  (28) $SLR = \frac{makespan}{\sum_{n_i \in CP} \min_{p_j \in P}\{w_{i,j}\}}$

  • Speedup. Speedup is defined as the minimum cumulative computation cost of all tasks in a workflow divided by the parallel execution time, i.e. the makespan, as shown in equation (29).

  (29) $Speedup = \frac{\sum_{n_i \in N} \min_{p_j \in P}\{w_{i,j}\}}{makespan}$

  • Algorithm Run Time. This is the measure used to compare the computational overheads of different workflow scheduling algorithms.
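
As a reference, the three schedule-quality metrics above can be computed as in the sketch below; `cp_tasks` (the tasks on the critical path), `tasks` and `w` are illustrative names.

```python
# Sketch of the metrics in equations (27)-(29).

def normalized_makespan(makespan, makespan_min, makespan_max):
    return (makespan - makespan_min) / (makespan_max - makespan_min)

def slr(makespan, cp_tasks, w, procs):
    return makespan / sum(min(w[i][j] for j in procs) for i in cp_tasks)

def speedup(makespan, tasks, w, procs):
    return sum(min(w[i][j] for j in procs) for i in tasks) / makespan
```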

B. Experimental results and discussions

This section presents the simulation experiments conducted to evaluate the pruning algorithm and guided heuristics proposed in this paper. Four sets of experiments were conducted to evaluate different types of pruning algorithms and guiding heuristics. We conducted the experiments for the simulation strategies first to determine the best Rthreshold for use in other sets of experiments. In each experiment, MCTS would stop when the simulation count reaches 100,000. Equation (25) was applied in all cases except for MCTS-BB. Workflows of three different CCR values, 0.1, 1, and 10, were used as input data. MCTS-1111 represents the standard MCTS algorithm.

C. Experiment set 1: simulation strategies

To determine the best simulation strategy in MCTS and the corresponding Rthreshold, we conducted experiments for the random policy in MCTS-BB and the simulation policies guided by HEFT, OCTrank+EFT, and ResourceAwareEFT, with Rthreshold set to 0, 20 and 30. Tables 3–6 show the experimental results.

  • MCTS-1111-0: The simulation policy in MCTS-BB, Rthreshold is set to 0.

  • MCTS-1111-20: The simulation policy in MCTS-BB, Rthreshold is set to 20.

  • MCTS-1111-30: The simulation policy in MCTS-BB, Rthreshold is set to 30.

  • MCTS-1112-0: The simulation policy guided by HEFT, Rthreshold is set to 0.

  • MCTS-1112-20: The simulation policy guided by HEFT, Rthreshold is set to 20.

  • MCTS-1112-30: The simulation policy guided by HEFT, Rthreshold is set to 30.

  • MCTS-1113-0: The simulation policy guided by OCTrank+EFT, Rthreshold is set to 0.

  • MCTS-1113-20: The simulation policy guided by OCTrank+EFT, Rthreshold is set to 20.

  • MCTS-1113-30: The simulation policy guided by OCTrank+EFT, Rthreshold is set to 30.

  • MCTS-1114-0: The simulation policy guided by ResourceAwareEFT, Rthreshold is set to 0.

  • MCTS-1114-20: The simulation policy guided by ResourceAwareEFT, Rthreshold is set to 20.

  • MCTS-1114-30: The simulation policy guided by ResourceAwareEFT, Rthreshold is set to 30.

Table 3. Experiment 1: Normalized makespan for different workflow sizes – CCR = 0.1.

Table 4. Experiment 1: Normalized makespan for different workflow sizes – CCR = 1.

Table 5. Experiment 1: Normalized makespan for different workflow sizes – CCR = 10.

According to Table 6, on average the simulation policy guided by HEFT with Rthreshold set to 20% (MCTS-1112-20) achieved the best performance among all simulation policies and configurations by a clear margin, while the random simulation policy (MCTS-1111) used in MCTS-BB resulted in the worst performance. Moreover, the heuristic simulation policies (MCTS-1112, MCTS-1113, MCTS-1114) with Rthreshold = 0 resulted in poorer performance than with Rthreshold = 20 and Rthreshold = 30, indicating the importance of introducing randomness in the simulation step. Therefore, we used Rthreshold = 20% in the following experiments.

Table 6. Experiment 1: Normalized makespan for overall performance.

D. Experiment set 2: pruning methods

This section evaluates the performance of our pruning method.

  • MCTS-1111: No pruning performed.

  • MCTS-1211: Our pruning method in Section V.

Figures 9–11 show the experimental results in terms of speedup for workflows of different CCR values, respectively. Our pruning method delivers performance improvements in all cases. Moreover, the improvement is even larger for large-scale workflows.

Figure 9. Experiment 2: Speedup comparison – CCR = 0.1.


Figure 10. Experiment 2: Speedup comparison – CCR = 1.


Figure 11. Experiment 2: Speedup comparison – CCR = 10.


E. Experiment set 3: selection policies

This section evaluates different selection policies.

  • MCTS-1111: using equation (26) in the selection step, with $C_{puct} = 15$. $Q(s,a)$ in MCTS-BB (Liu et al., Citation2019) is defined as

  (30) $Q(s,a) = Q(s,a) + makespan(s,a)$

  • MCTS-2111: similar to MCTS-1111 except that the reciprocal of the makespan is used for $Q(s,a)$.

  (31) $Q(s,a) = \frac{1}{makespan(s,a)}$

  • MCTS-3111: using the traditional UCT equation (Kocsis & Szepesvári, Citation2006) in the selection step, with $C_{puct} = 15$. $Q(s,a)$ is also set to the reciprocal of the makespan.

  (32) $U(s,a) = C_{puct}\frac{\sqrt{\sum_b N(s,b)}}{N(s,a)}$

Figures 12–14 show the experimental results. MCTS-1111, using the UCT equation (26), achieves the best performance in Figure 12. MCTS-2111, using the reciprocal of the makespan, and MCTS-3111, using the traditional UCT equation (Kocsis & Szepesvári, Citation2006), achieve better performance for workflows of larger sizes, e.g. 60 and 80 tasks, and larger CCR values, e.g. 1 and 10, as shown in Figures 13 and 14.

Figure 12. Experiment set 3: Speedup comparison – CCR = 0.1.


Figure 13. Experiment set 3: Speedup comparison – CCR = 1.


Figure 14. Experiment set 3: Speedup comparison – CCR = 10.


F. Experiment set 4: expansion policies

This section presents experimental results evaluating different expansion policies.

  • MCTS-1111: the random policy in MCTS-BB.

  • MCTS-1121: using the upward rank in HEFT to choose a node for expansion. The task with the highest rank value would be chosen.

  • MCTS-1131: using the OCT-rank in PEFT to choose a node for expansion. The task with the highest rank value would be chosen.

  • MCTS-1141: using the STR-rank in ResourceAwareEFT to choose a node for expansion. The task with the highest rank value would be chosen.

Tables 7–10 present the experimental results. In general, the random expansion policy (MCTS-1111) achieves the best performance for workflows with a smaller or larger CCR value, e.g. CCR = 0.1 and CCR = 10, as shown in Tables 7 and 9. On the other hand, the expansion policy based on the upward rank (MCTS-1121) outperforms the others for workflows with a medium CCR value, e.g. CCR = 1, as shown in Table 8. Considering all CCR values, the random expansion policy (MCTS-1111) achieved the best performance on average, as shown in Table 10.

Table 7. Experiment 4: Normalized makespan comparison – CCR = 0.1.

Table 8. Experiment 4: Normalized makespan comparison – CCR = 1.

Table 9. Experiment 4: Normalized makespan comparison – CCR = 10.

Table 10. Experiment 4: Overall normalized makespan comparison.

VII. Overall performance evaluation

In this section, we present the overall performance evaluation of our improved MCTS approach for workflow scheduling, which combines the policies proposed in the previous sections. Based on the previous experimental results, the following three variants of our approach were compared to the standard MCTS approach (MCTS-1111), MCTS-BB, IPPTS, and the GA-based workflow scheduling method. All three variants adopt our pruning method and the same simulation policy guided by HEFT but differ in their selection and expansion policies.

  • MCTS-1212: using equation (26) in the selection policy and using the random node expansion policy.

  • MCTS-3212: using equation (31) in the selection policy and using the random node expansion policy.

  • MCTS-3222: using equation (32) in the selection policy and using the node expansion policy based on the upward rank in HEFT.

The following presents the performance evaluation of our MCTS workflow scheduling approaches compared to the standard MCTS method and MCTS-BB in a homogeneous computing environment. Tables 11–13 show the experimental results for workflows of different CCR values, respectively. In general, our approaches outperform the standard MCTS method and MCTS-BB. MCTS-1222 achieves the best performance across all three CCR values.

Table 11. Normalized makespan – CCR = 0.1.

Table 12. Normalized makespan – CCR = 1.

Table 13. Normalized makespan – CCR = 10.

The following presents the performance evaluation of our MCTS workflow scheduling approaches in a parallel computing environment with speed-heterogeneous processors. Table 14 shows the performance comparison in terms of normalized makespan. Figures 15 and 16 compare different scheduling approaches in terms of SLR, while Figure 17 makes the comparison in terms of speedup. Figure 18 compares the algorithm runtime of the different approaches. All the above comparisons were made on synthetic workflows generated by DAGGEN.

Figure 15. Average SLR for workflow size 20, 40, 60.


Figure 16. Average SLR for workflow size 80, 100.


Figure 17. Average speedup for different workflow sizes.


Figure 18. Average runtime of different scheduling approaches.


Table 14. Normalized makespan.

In terms of normalized makespan, MCTS-1222 achieved the best average performance. MCTS-1222 also performed very well in terms of SLR, speedup and runtime, as shown in Figures 15–18.

Table 15 evaluates the scheduling approaches using the real-world application workflow structures generated by WorkflowGenerator. On average, MCTS-1222 achieved the best performance. Figures 19–23 show the detailed performance comparison for each specific real-world workflow application structure, where each scheduling approach exhibited superiority over the others in different scenarios.

Figure 19. Average speedup of CyberShake workflow.


Figure 20. Average speedup of Epigenomics workflow.


Figure 21. Average speedup of Inspiral workflow.


Figure 22. Average speedup of Montage workflow.


Figure 23. Average speedup of Sipht workflow.


Table 15. Normalized makespan of real-world workflow applications.

In (Liu et al., Citation2019), MCTS-based methods were only compared to heuristic workflow scheduling algorithms. In the following, we compare the proposed MCTS-based method to both major categories of workflow scheduling approaches, including a GA-based method (Keshanchi et al., Citation2017) and the latest heuristic-based algorithm IPPTS (Djigal et al., Citation2021), which has been shown to outperform well-known previous heuristic algorithms such as HEFT and PEFT.

To make a fair comparison between MCTS and GA, we first record the total number of chromosomes generated before the GA-based approach converges. Then, we let the MCTS-based scheduling methods stop when their simulation count reaches the recorded number. To conduct a more comprehensive comparison, we also evaluated the performance of the MCTS-based approach with different simulation counts, set to specific ratios of the number recorded for the GA-based method. For example, if the GA-based method generated 10,000 chromosomes for a specific input workflow DAG-01.dot, we would evaluate MCTS-1222 with simulation counts of 10,000×1 = 10,000, 10,000×0.1 = 1,000, 10,000×0.01 = 100, and 10,000×0.001 = 10. The average recorded number for the GA-based method and the corresponding simulation counts for the MCTS-based approach are shown in Table 16.

Table 16. Average simulation counts for different workflow sizes.

Figures 24 and 25 present the performance comparison in terms of average speedup. For smaller workflows, e.g. 20 tasks in Figure 24, our approach MCTS-1222 outperforms all the others even with a smaller simulation count, only 1% of the total number of chromosomes generated by the GA-based method. The experimental results also show that more simulations effectively improve the workflow execution schedules produced by the MCTS-based approach. Figures 26 and 27 compare the scheduling time required by the different methods. In general, the MCTS-based approaches take longer to produce their schedules, and the time grows considerably with the workflow size. This points out an important future research topic for MCTS-based workflow scheduling: developing efficient mechanisms to reduce the required search time.

Figure 24. Average speedup of workflow size 20 and 40.


Figure 25. Average speedup of workflow size 60, 80 and 100.


Figure 26. Average runtime of different scheduling approaches.


Figure 27. Average runtime of MCTS-1222 under different simulation count limits.


According to the above experimental results, our improved MCTS-based workflow scheduling approach not only outperforms MCTS-BB (Liu et al., Citation2019) in terms of schedule quality but also requires less scheduling time. In the following, we present an experimental analysis of the advantages of our approach. Benefiting from the new pruning method and the other proposed mechanisms for the selection, expansion, and simulation steps in MCTS, our approach can disable more unpromising nodes than MCTS-BB, as shown in Figure 28.

Figure 28. Average number of disabled nodes.


Table 17 takes a specific input workflow, n-0040-ccr-1.0-fat-0.8-regular-0.2-density-0.8-jump-2-000001.dot, as a case study, comparing the workflow execution makespan, the required scheduling time and the resultant MCTS trees of MCTS-BB and our approach. In this case, our approach produces a schedule with a shorter makespan while spending less time on scheduling. Table 17 also shows that our approach grows a smaller MCTS tree and disables more nodes in the tree, which explains its superiority over MCTS-BB. Figure 29 compares the MCTS trees of our approach and MCTS-BB in terms of the number of nodes at different tree depths. The comparison shows that our approach builds an MCTS tree with narrower widths at the top levels, demonstrating the advantage of focusing on promising directions quickly.

Figure 29. The node distribution of MCTS-1222 and MCTS-BB.


Table 17. Comparison of MCTS tree structure.

VIII. Conclusion and future work

Monte Carlo Tree Search (MCTS) is a promising direction for workflow scheduling but was less explored in previous studies. In this paper, we present and evaluate several new mechanisms to further improve the effectiveness of MCTS when applied to workflow scheduling, including a new pruning algorithm and new heuristics for guiding the selection, expansion and simulation steps in MCTS. Performance evaluation was conducted with simulation experiments based on both synthetic and real-world workflow application structures. The experimental results show that our new approaches can achieve better workflow execution schedules than the previous method MCTS-BB in (Liu et al., Citation2019).

In addition to MCTS-BB, we also compared our approach to the two major categories of workflow scheduling methods, i.e. heuristic-based and meta-heuristic based methods, including the latest heuristic based algorithm IPPTS (Djigal et al., Citation2021) and a recent GA-based meta-heuristic approach in (Keshanchi et al., Citation2017). The experimental results show significant performance superiority of our approach over both of them in terms of workflow execution makespan, indicating the great potential of MCTS for workflow scheduling.

Based on our experience in the research work presented in this paper, there are several promising and desirable future research directions for applying MCTS to workflow scheduling problems. Firstly, according to Figures 12–14, there is currently no single selection equation that consistently achieves the best performance across different workflow configurations and properties. Therefore, an important piece of future work is to develop a new selection equation that consistently delivers the best performance. Secondly, Figures 24 and 25 show that our MCTS approach outperforms the GA-based scheduling method for small and medium workflows, while the GA-based method exhibits an advantage for large workflows. Further studies on the root cause are therefore desirable, and the development of hybrid approaches might be a worthwhile research direction. Lastly, MCTS can be viewed as an online learning approach. To further improve workflow scheduling performance, it is promising to integrate MCTS with offline learning and reinforcement learning methods, as demonstrated in many AI research domains such as game playing.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Arabnejad, H., & Barbosa, J. G. (2014). List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Transactions on Parallel and Distributed Systems, 25(3), 682–694. https://doi.org/10.1109/tpds.2013.57
  • Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., & Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1–43. https://doi.org/10.1109/TCIAIG.2012.2186810
  • Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose, C. A., & Buyya, R. (2011). Cloudsim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1), 23–50. https://doi.org/10.1002/spe.995
  • Chaslot, G. M. J. B., de Jong, S., Saito, J. T., & Uiterwijk, J. W. H. M. (2006). Monte-Carlo tree search in production management problems. In P. Y. Schobbens, W. Vanhoof, & G. Schwanen (Eds.), BNAIC’06: Proceedings of the 18th Belgium-Netherlands Conference on Artificial Intelligence (pp. 91–98). University of Namur.
  • Chen, K.-F., & Huang, K.-C. (2019, July). Workload- and resource-aware list-based workflow scheduling. National Taichung University of Education. http://ntcuir.ntcu.edu.tw/handle/987654321/14490
  • Chen, W. (2015). WorkflowSim: A toolkit for simulating scientific workflows in distributed environments [Computer software]. https://github.com/WorkflowSim/WorkflowSim-1.0
  • Chen, W., & Deelman, E. (2012). WorkflowSim: A toolkit for simulating scientific workflows in distributed environments. 2012 IEEE 8th International Conference on E-Science. https://doi.org/10.1109/escience.2012.6404430
  • Djigal, H., Feng, J., & Lu, J. (2019). Task scheduling for heterogeneous computing using a predict cost matrix. Proceedings of the 48th International Conference on Parallel Processing: Workshops. https://doi.org/10.1145/3339186.3339206
  • Djigal, H., Feng, J., Lu, J., & Ge, J. (2021). IPPTS: An efficient algorithm for scientific workflow scheduling in heterogeneous computing systems. IEEE Transactions on Parallel and Distributed Systems, 32(5), 1057–1071. https://doi.org/10.1109/tpds.2020.3041829
  • Hu, B., Cao, Z., & Zhou, M. (2021). Energy-minimized scheduling of real-time parallel workflows on heterogeneous distributed computing systems. IEEE Transactions on Services Computing. https://doi.org/10.1109/TSC.2021.3054754
  • Juve, G. (2014). Synthetic Workflow Generators [Computer software]. https://github.com/pegasus-isi/WorkflowGenerator
  • Keshanchi, B., Souri, A., & Navimipour, N. J. (2017). An improved genetic algorithm for task scheduling in the cloud environments using the priority queues: Formal verification, simulation, and statistical testing. Journal of Systems and Software, 124, 1–21. https://doi.org/10.1016/j.jss.2016.07.006
  • Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. Machine Learning: ECML, 2006, 282–293. https://doi.org/10.1007/11871842_29
  • Li, H., Wang, B., Yuan, Y., Zhou, M., Fan, Y., & Xia, Y. (2021). Scoring and dynamic hierarchy-based NSGA-II for Multiobjective workflow scheduling in the cloud. IEEE Transactions on Automation Science and Engineering, 1–12. https://doi.org/10.1109/TASE.2021.3054501
  • Liu, K., Wu, Z., Wu, Q., & Cheng, Y. (2019, December). Smart DAG task scheduling with efficient pruning-based MCTS method. In 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (pp. 348–355). IEEE. https://doi.org/10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00058
  • Sardaraz, M., & Tahir, M. (2020). A parallel multi-objective genetic algorithm for scheduling scientific workflows in cloud computing. International Journal of Distributed Sensor Networks, 16(8). https://doi.org/10.1177/1550147720949142
  • Sinnen, O. (2007). Task scheduling. In Task scheduling for parallel systems (Vol. 60, pp. 74–107). John Wiley & Sons. https://doi.org/10.1002/0470121173
  • Srinivas, M., & Patnaik, L. M. (1994). Genetic algorithms: A survey. Computer, 27(6), 17–26. https://doi.org/10.1109/2.294849
  • Suter, F. (2015). DAGGEN: A synthetic task graph generator. https://github.com/frs69wq/daggen
  • Topcuoglu, H., Hariri, S., & Wu, M.-Y. (2002). Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, 13(3), 260–274. https://doi.org/10.1109/71.993206
  • Wang, Y., & Zuo, X. (2021). An effective Cloud Workflow scheduling approach combining PSO and idle time slot-aware rules. IEEE/CAA Journal of Automatica Sinica, 8(5), 1079–1094. https://doi.org/10.1109/JAS.2021.1003982
  • Wu, Q., Zhou, M. C., Zhu, Q., Xia, Y., & Wen, J. (2020). MOELS: Multiobjective evolutionary list scheduling for Cloud Workflows. IEEE Transactions on Automation Science and Engineering, 17(1), 166–176. https://doi.org/10.1109/TASE.2019.2918691
  • Xie, G., Li, R., & Li, K. (2015). Heterogeneity-driven end-to-end synchronized scheduling for precedence constrained tasks and messages on networked embedded systems. Journal of Parallel and Distributed Computing, 83, 1–12. https://doi.org/10.1016/j.jpdc.2015.04.005
  • Xu, Y., Li, K., Hu, J., & Keqin, L. (2014). A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues. Information Sciences, 270, 255–287. https://doi.org/10.1016/j.ins.2014.02.122
  • Yadav, A. M., Tripathi, K. N., & Sharma, S. C. (2021). An enhanced multi-objective fireworks algorithm for task scheduling in fog computing environment. Cluster Computing, https://doi.org/10.1007/s10586-021-03481-3
  • Yadav, A. M., Tripathi, K. N., & Sharma, S. C. (2022). A bi-objective task scheduling approach in fog computing using hybrid fireworks algorithm. The Journal of Supercomputing, 78(3), 4236–4260. https://doi.org/10.1007/s11227-021-04018-6