Research Article

Analysis of dependence of grey wolf optimizer to shift-transformations and its shift-invariant improved methods adaptively controlling the search areas

Keiji Tatsumi & Nao Kinoshita
Pages 66-79 | Received 25 Sep 2023, Accepted 22 Jan 2024, Published online: 28 Feb 2024

ABSTRACT

As a metaheuristic method for continuous optimization problems, the grey wolf optimizer (GWO) has attracted much attention from researchers because the method is reported to be superior to other methods. However, some works show that the GWO is too specialized to problems having the zero-optimal solution, which can lead to a significant deterioration of its efficiency for other problems. In this paper, we first theoretically prove the shift-dependence of the GWO, which is the underlying cause of its over-specialization, and experimentally analyze the property by using a larger number of problems. Secondly, we propose a shift-invariant GWO, GWO-SR, and modify the GWO-SR by adding two methods, a technique for adjusting the size of the search area and a mutation process to enhance the diversity of the search (GWO-AS). Finally, we show the advantages of the two proposed GWOs by comparing them with other metaheuristic methods.

1. Introduction

The grey wolf optimizer (GWO) is one of the nature-inspired metaheuristic methods, which exploits the hunting techniques and social hierarchy of grey wolves in order to solve global optimization problems [Citation1, Citation2]. The GWO has attracted significant attention from researchers because, despite its simple algorithm, it has been reported in many studies to have high performance in seeking high-quality solutions [Citation3–8], and thus it has been applied to a wide range of fields [Citation9–12].

However, a previous work [Citation13] shows that the GWO is not invariant to shift-transformations in the orthogonal coordinate representation of the optimization problem. In addition, some works [Citation14, Citation15] show that the performance of the GWO is often considerably low when the optimal solution is far from the origin. Moreover, a work [Citation16] showed through numerical experiments that the search behavior of the GWO is too specialized for problems having the optimal solution at the origin, and that the updating systems of the GWO depend strongly on the distance from the origin to the best solution found so far. This property can lead to a significant deterioration of the efficiency of the search for problems whose optimal solution is not at the origin. Thus, the many improved GWOs that have been investigated [Citation3–8] can have the same drawback if their updating systems, like those of the original GWO, are not shift-invariant. Furthermore, in that work, the authors proposed a shift-invariant method, the GWO with reference points (GWO-R), to overcome the drawback, in which grey wolves are updated by shift-invariant systems exploiting reference points instead of the origin.

In this paper, we first analyze the underlying causes of the over-specialized properties of the GWO in more detail. We theoretically show that the GWO is not invariant to shift-transformations, and experimentally clarify the resulting ineffective search of the GWO by using more shifted problems than the previous work [Citation16]. These results show that the performance evaluations of the GWO in many studies are not fair for a general-purpose metaheuristic method for global optimization problems.

Secondly, we propose a modified GWO-R, GWO-SR, to improve the insufficient intensive search of GWO-R in the final stages, in which the size of the search areas is adjusted more reasonably with respect to the distance between the reference point and the best solutions. Then, we experimentally show that the search is effective if the reference point is chosen appropriately. Moreover, we improve the GWO-SR by adding a method for adjusting the search areas and a mutation process in order to strengthen the diversity of the search, which is called the GWO with adjustment of the search area (GWO-AS). Finally, we show the good performance of the proposed GWOs, GWO-SR and GWO-AS, by comparing them with other metaheuristic methods such as particle swarm optimization (PSO) [Citation17, Citation18] and the firefly algorithm (FA) [Citation19, Citation20].

This paper is organized as follows. In Section 2, we introduce the GWO, and in Section 3, we analyze the property of its search process, and point out the shift-dependence and the over-specialization of its search process. In Sections 4 and 5, we propose new modified GWOs exploiting a reference point or an adjustment of search area and a mutation process, which overcome the above drawbacks. In Section 6, through numerical experiments, we verify the effectiveness of the proposed methods. Finally, we conclude this paper in Section 7.

2. Grey wolf optimizer

The GWO is one of the metaheuristic methods for continuous global optimization, in which the hunting technique and the social hierarchy of grey wolves are mathematically exploited in order to solve global optimization problems [Citation1, Citation2]. Despite its simple algorithm, the GWO is reported to have high performance in finding desirable solutions. In this section, we introduce the standard GWO.

Throughout this paper, we mainly consider the following continuous optimization problem, which has many local minima and a rectangular constraint:

(P1) $\min_x\ f(x)$ s.t. $x \in X := \prod_{j=1}^{n} [x_j^l, x_j^u]$.

Here, $f:\mathbb{R}^n \to \mathbb{R}$ denotes the objective function to be minimized, and $X \subset \mathbb{R}^n$ denotes the feasible region. In order to solve the problem, the GWO uses a number of search agents called grey wolves. The position of each grey wolf $i \in W$ denotes a candidate solution of the problem, which is represented as $x^{(i)}(t) \in \mathbb{R}^n$ at iteration $t$, where $W$ denotes the index set of all grey wolves. These positions are updated in a way that models the social hierarchy of the grey wolves and their tracking, encircling, and attacking of prey, as follows:

(1) $d^{(p,i)}(t) := |2 r^{(p,1)} \otimes x^{(p)}(t) - x^{(i)}(t)|$,

(2) $\Delta x^{(p,i)}(t) := a(t) \otimes (2 r^{(p,2)} - \mathbf{1}) \otimes d^{(p,i)}(t)$, $p \in \{\alpha, \beta, \delta\}$,

(3) $x^{(i)}(t+1) := x^{(i)}(t) + \frac{1}{3} \sum_{p \in \{\alpha,\beta,\delta\}} \Delta x^{(p,i)}(t)$,

where $x^{(\alpha)}(t)$, $x^{(\beta)}(t)$ and $x^{(\delta)}(t)$ denote the first, second and third best solutions obtained by grey wolves until iteration $t$, respectively, which are called the alpha, beta and delta wolves. In addition, $\mathbf{1}$ denotes the vector whose components are all one, $|u|$ for a vector $u \in \mathbb{R}^n$ denotes the vector whose components are $|u_i|$, $i \in N := \{1,\ldots,n\}$, and $\otimes$ denotes the element-wise product of two vectors. The vector $a(t) \in \mathbb{R}^n$ is a coefficient vector whose components are linearly decreased from 2 to 0 over the course of the iterations, and the components of $r^{(p,1)}$, $r^{(p,2)}$ are random numbers which are independently and uniformly chosen from $(0,1)$ for each $p$ and $i$. In the Matlab code [Citation21], the vector $a(t)$ is given by $a(t) = 2(1 - t/t_{\max})\mathbf{1}$, where $t_{\max}$ denotes the maximal number of iterations. We also used this linear function $a(t)$ in the numerical experiments in Section 3.3.

In the numerical experiments in this paper, in order to keep the feasibility of $x^{(i)}(t)$, $i \in W$, the point $x^{(i)}(t+1)$ obtained by (3) is projected onto the feasible region $X$ of (P1), for both the original GWO and the proposed GWOs. In the GWO, the size of the search area of each wolf $i$ depends on $d^{(p,i)}(t)$ and $\Delta x^{(p,i)}(t)$ through the updating systems (1)–(3). In (2), $\Delta x^{(p,i)}$ is uniformly reduced by using $a(t)$ for all $i \in W$, $p \in \{\alpha,\beta,\delta\}$, which works to enhance the diversification of the search at the early stages and the intensification at the last stages, according only to the iteration count $t$.
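To make the updating systems concrete, the following is a minimal Python sketch of one GWO iteration implementing (1)–(3), with the projection onto $X$ realized as a componentwise clip; the function and variable names are our own, and, for brevity, the alpha, beta and delta wolves are taken from the current population rather than from the whole search history.

```python
import numpy as np

def gwo_step(X, f, lb, ub, t, t_max, rng):
    """One GWO iteration, Eqs. (1)-(3). X: (m, n) array of wolf positions."""
    m, n = X.shape
    a = 2.0 * (1.0 - t / t_max)                  # a(t): decreases linearly from 2 to 0
    # alpha, beta, delta wolves (here: the three best of the current population)
    leaders = X[np.argsort([f(x) for x in X])[:3]]
    X_new = np.empty_like(X)
    for i in range(m):
        step = np.zeros(n)
        for x_p in leaders:                      # p in {alpha, beta, delta}
            r1, r2 = rng.random(n), rng.random(n)
            d = np.abs(2.0 * r1 * x_p - X[i])    # Eq. (1)
            step += a * (2.0 * r2 - 1.0) * d     # Eq. (2)
        X_new[i] = X[i] + step / 3.0             # Eq. (3)
    return np.clip(X_new, lb, ub)                # projection onto X
```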

On the other hand, $d^{(p,i)}$ is determined by (1) so as to adjust the search area of each grey wolf $i$ according to its own position and those of the alpha, beta and delta wolves: when the random variables $r_j^{(p,1)}$, $j \in N$ are all close to 0.5, then $d_j^{(p,i)}(t) = |x_j^{(p)}(t) - x_j^{(i)}(t)|$, and thus wolf $i$ searches for solutions around $x^{(p)}(t)$, $p \in \{\alpha,\beta,\delta\}$. Conversely, when $r_j^{(p,1)}$, $j \in N$ are all close to 1 or 0, then $d^{(p,i)}(t) = |2x^{(p)}(t) - x^{(i)}(t)|$ or $d^{(p,i)}(t) = |x^{(i)}(t)|$, respectively, which means that wolf $i$ can explore a wider area. In many studies, the GWO is reported to have an excellent intensification ability, while it is relatively weak in the extensive search; thus, various improvements have been proposed, such as improved updating mechanisms, new operators, encoding schemes of the solutions, and modified social hierarchies of grey wolves [Citation3–8].

However, the GWO is not invariant to the shift-transformations and is too specialized only for problems having the zero-optimal solution, which can cause the ineffectiveness of the search for other problems. Therefore, in the next section, we point out its drawbacks and analyze the properties.

3. Analysis of the search behavior of GWO

3.1. Shift-dependence of original GWO

In this subsection, we show that the GWO is not invariant to shift-transformations in the orthogonal coordinate representation of the optimization problem, and compare it with other metaheuristic methods. While the previous work [Citation13] shows that the GWO is not invariant to affine transformations, here we show more precisely that the GWO is not invariant even to shift-transformations, a special case of affine transformations.

First, let us consider the orthogonal coordinate system which is shifted by a constant vector $s \neq 0 \in \mathbb{R}^n$ from the original orthogonal coordinate system of (P1). In the shifted coordinate system, the position $x$ in the original coordinate system is represented as $\bar{x} = x + s$. Then, problem (P1) can be expressed in the shifted coordinate system as follows:

(P2) $\min_{\bar{x}}\ f_s(\bar{x}) = f(\bar{x} - s)$ s.t. $\bar{x} \in \prod_{i=1}^{n} [x_i^l + s_i, x_i^u + s_i]$,

where note that problems (P1) and (P2) are essentially the same except for the coordinate system.

Now, we define the invariance of a metaheuristic method with respect to shift-transformations of the coordinate representation of optimization problems. If its updating system is invariant to any shift-transformation, the metaheuristic method is said to be shift-invariant, where, for the sake of simplicity, we assume that all random variables used in the updating system are the same in the original and shifted systems.

Definition 3.1

Suppose that for a constant vector $s \neq 0$ and all search agents $i \in I$ of a metaheuristic method, the positions $x^{(i)}$ and $\bar{x}^{(i)}$ of search agent $i$ are feasible solutions for (P1) and (P2), respectively, such that

(4) $\bar{x}^{(i)} = x^{(i)} + s$, $i \in I$,

where $I$ denotes the index set of all search agents. In addition, $x_{\mathrm{new}}^{(i)}$ and $\bar{x}_{\mathrm{new}}^{(i)}$ are the next points obtained by updating $x^{(i)}$ and $\bar{x}^{(i)}$, respectively. If $x_{\mathrm{new}}^{(i)}$ and $\bar{x}_{\mathrm{new}}^{(i)}$ satisfy the relation

(5) $\bar{x}_{\mathrm{new}}^{(i)} = x_{\mathrm{new}}^{(i)} + s$, $i \in I$,

then the metaheuristic method is said to be shift-invariant.

The shift-invariance guarantees that the method searches for solutions in exactly the same way, independently of shift-transformations. Conversely, if a metaheuristic method is not invariant to shift-transformations, its search performance can vary depending on the choice of the coordinate representation of even the same problem, which can make an efficient search difficult for individual problems. Therefore, the property is significant for a general-purpose metaheuristic method. Moreover, many metaheuristic methods, such as variants of PSO, FA, DE and others, have the property because, in those methods, $x^{(i)}(t+1) - x^{(i)}(t)$ is determined only by difference vectors between two points, which are independent of the vector $s$. On the other hand, the GWO is not shift-invariant, as shown below:

Theorem 3.2

The GWO is not shift-invariant in the coordinate representation of the optimization problem.

Proof.

Now, we focus on a grey wolf $i$ and assume that $x^{(i)}$, $\bar{x}^{(i)}$, $x^{(p)}$ and $\bar{x}^{(p)}$, $p \in \{\alpha,\beta,\delta\}$ satisfy (4). In addition, $r^{(p,1)}$ and $r^{(p,2)}$, $p \in \{\alpha,\beta,\delta\}$ are the same constant vectors in the two coordinate systems. Then, from (1), we can derive that

$d^{(p,i)} = |2 r^{(p,1)} \otimes x^{(p)} - x^{(i)}|$,
$\bar{d}^{(p,i)} = |2 r^{(p,1)} \otimes \bar{x}^{(p)} - \bar{x}^{(i)}| = |(2 r^{(p,1)} \otimes x^{(p)} - x^{(i)}) + (2 r^{(p,1)} - \mathbf{1}) \otimes s|$, $p \in \{\alpha,\beta,\delta\}$.

We can easily show that $d^{(p,i)}(t) \neq \bar{d}^{(p,i)}(t)$ unless the random variables $r^{(p,1)}$ are all 0.5 or $r_j^{(p,1)} = (2x_j^{(i)} + s_j)/(4x_j^{(p)} + 2s_j)$, $j \in N$. In addition, from (2) and (3), we have that

(6) $x_{\mathrm{new}}^{(i)} - \bar{x}_{\mathrm{new}}^{(i)} = x^{(i)} - \bar{x}^{(i)} + \frac{1}{3}\sum_{p\in\{\alpha,\beta,\delta\}} (\Delta x^{(p,i)} - \Delta \bar{x}^{(p,i)}) = -s + \frac{1}{3}\sum_{p\in\{\alpha,\beta,\delta\}} a(t) \otimes (2 r^{(p,2)} - \mathbf{1}) \otimes (d^{(p,i)} - \bar{d}^{(p,i)})$.

The second term of (6) is not zero unless the random variables $r^{(p,2)}$ are all 0.5 or $d^{(p,i)} = \bar{d}^{(p,i)}$, $p \in \{\alpha,\beta,\delta\}$. As a result, we can see that (5) does not necessarily hold. Thus, the GWO is not shift-invariant.
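Theorem 3.2 can also be illustrated numerically with the sketch of Section 2: running one update from the original and the shifted populations with identical random numbers almost surely breaks relation (5). A small illustration under the same assumptions (identical seeds stand in for identical random variables; the objective and shift below are arbitrary examples):

```python
import numpy as np

n, m, lb, ub = 5, 8, -100.0, 100.0
s = np.full(n, 30.0)                             # shift vector s
f = lambda x: np.sum((x - 3.0) ** 2)             # any test objective for (P1)
f_s = lambda x: f(x - s)                         # objective of (P2)

X = np.random.default_rng(1).uniform(lb, ub, (m, n))
X1 = gwo_step(X, f, lb, ub, 10, 100, np.random.default_rng(0))
X2 = gwo_step(X + s, f_s, lb + s, ub + s, 10, 100, np.random.default_rng(0))
print(np.allclose(X2, X1 + s))                   # False: relation (5) is violated
```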

3.2. Search area control of GWO

Next, let us consider the fact that the updating system of the GWO is also too specialized for optimization problems having the zero-optimal solution. In the previous work [Citation16], the inefficient search of the GWO was analyzed through the relation between the size of the search area and the degree of degradation of the search.

Figure 1. Relations of $d^{(p,i)}$, $x^{(p)}$ and $x^{(i)}$.

Now, we focus on the size of the search area of grey wolf $i \in W$ at iteration $t$. Since the size is mainly determined by $d^{(p,i)}(t) = |2 r^{(p,1)} \otimes x^{(p)}(t) - x^{(i)}(t)|$, $p \in \{\alpha,\beta,\delta\}$, let us consider the upper bound $\hat{d}_j^{(p,i)}(t)$ of the $j$th component of $d^{(p,i)}(t)$ at iteration $t$, which is shown in Figure 1. Here, we define $u^{(p,i)}(t) := x^{(p)}(t) - x^{(i)}(t)$. We can see that the $j$th component of $2 r^{(p,1)} \otimes x^{(p)}$ varies in $(0, 2x_j^{(p)}(t))$ or $(2x_j^{(p)}(t), 0)$, $j \in N$, which is represented by the light blue areas in Figure 1. Therefore, the upper bound $\hat{d}_j^{(p,i)}(t)$ is given by

(7) $\hat{d}_j^{(p,i)}(t) = \max\{|x_j^{(i)}(t)|,\ |2x_j^{(p)}(t) - x_j^{(i)}(t)|\} = \max\{|x_j^{(p)}(t) - u_j^{(p,i)}(t)|,\ |x_j^{(p)}(t) + u_j^{(p,i)}(t)|\}$

(8) $= \begin{cases} x_j^{(p)}(t) + |u_j^{(p,i)}(t)| & \text{if } x_j^{(p)}(t) \geq 0, \\ -x_j^{(p)}(t) + |u_j^{(p,i)}(t)| & \text{if } x_j^{(p)}(t) < 0, \end{cases} \;=\; |x_j^{(p)}(t)| + |u_j^{(p,i)}(t)|, \quad j \in N$.

From these results, we can see that $\hat{d}_j^{(p,i)}(t)$ increases as $|x_j^{(p)}(t)|$ or $|u_j^{(p,i)}(t)|$ increases. It is reasonable to choose a large $d^{(p,i)}(t)$ for a large $u^{(p,i)}(t)$, because a solution $x^{(i)}(t)$ which is far from the three best solutions $x^{(p)}(t)$, $p\in\{\alpha,\beta,\delta\}$ should be changed drastically. However, it is not well-founded to choose a large $d^{(p,i)}(t)$ for a large $x^{(p)}(t)$, because $x^{(p)}(t)$ merely represents the distance from the origin, which is simply a reference point of the coordinate representation, and thus $x^{(p)}(t)$ does not contain any useful information for the search. This property can cause an inefficient search, as discussed in Ref. [Citation16].

From these results, we can see that when the problem has the global optimal solution at the origin, an intensive search around the origin and an extensive search far from the origin can work effectively in the GWO. On the other hand, when the optimal solution is far from the origin, the intensive and extensive searches cannot be expected to work so effectively. In the next subsection, we experimentally clarify the ineffective search caused by this property, using a larger number of shifted problems and in more detail than the previous work [Citation16].
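As a quick sanity check of the bound (7)–(8), one can sample $r_j^{(p,1)}$ and confirm that $d_j^{(p,i)}$ stays below $|x_j^{(p)}| + |u_j^{(p,i)}|$ and approaches it; the component values below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x_p, x_i = 4.0, -1.5                    # example components of x^(p) and x^(i)
u = x_p - x_i                           # u^(p,i) = x^(p) - x^(i) = 5.5
d = np.abs(2.0 * rng.random(100000) * x_p - x_i)   # Eq. (1), componentwise
print(d.max() <= abs(x_p) + abs(u))     # True: Eq. (8) bounds d, d.max() ~ 9.5
```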

3.3. Numerical experiments

In this subsection, we investigate through numerical experiments how the search efficiency of the GWO for (P2) changes as the shift vector $s$ is varied. In particular, we analyze quantitative relations such as the relationship between the size of the shift vector and the mean size of the search areas, namely the means of $\|d^{(\alpha,i)}(t,l)\|$ and $\|\Delta x^{(\alpha,i)}(t,l)\|$ over 50 trials at iteration $t$:

(9) $d_m^{(\alpha)}(t) = \dfrac{\sum_{l=1}^{50}\sum_{i\in W} \|d^{(\alpha,i)}(t,l)\|}{50\,|W|}$, $\quad \Delta x_m^{(\alpha)}(t) = \dfrac{\sum_{l=1}^{50}\sum_{i\in W} \|\Delta x^{(\alpha,i)}(t,l)\|}{50\,|W|}$,

where $d^{(\alpha,i)}(t,l)$ and $\Delta x^{(\alpha,i)}(t,l)$ denote the vectors obtained at iteration $t$ on trial $l$, respectively. In addition, we observed the mean of the best function values obtained over the 50 trials at iteration $t$:

(10) $f_m^{(\alpha)}(t) = \dfrac{\sum_{l=1}^{50} f_s(x^{(\alpha)}(t,l))}{50}$,

where $x^{(\alpha)}(t,l)$ denotes the best solution obtained at iteration $t$ on trial $l$. The maximal number of iterations $t_{\max}$ was 5000, the number of wolves $|W|$ was 80, and we used $a(t) = 2(1 - t/t_{\max})\mathbf{1}$, as mentioned in Section 2.

We used 18 basic functions of the CEC'17 benchmark problems [Citation22] (C1: Bent Cigar, C2: Sum of Different Power, C3: Zakharov, C4: Rosenbrock, C5: Rastrigin, C6: Expanded Schaffer F6, C7: Levy, C8: Modified Schwefel, C9: High Conditioned Elliptic, C10: Discus, C11: Ackley, C12: Weierstrass, C13: Griewank, C14: Katsuura, C15: HappyCat, C16: HGBat, C17: Expanded Griewank plus Rosenbrock, C18: Schaffer F7). In the experiments, relatively simple functions were used in order to compare the behavior of the original GWO under shift-transformations. All functions have a hypercube constraint whose sides are 100, and functions C4 and C15–C17 have the optimal solution at $\mathbf{1}$ or $-\mathbf{1} \in \mathbb{R}^n$, while the others have the zero-optimal solution. In addition, we used two other problems, O1: 2n-minima function and O2: Schwefel function, which have the optimal solution far from the origin, as shown in Table 1. All problems have a hypercube constraint with the same side constant $c_s$, namely, $-x_i^l = x_i^u = c_s$, $i \in N$. As shift vectors $s$, the four vectors $\rho c_s u$, $\rho \in \{0.0, 0.1, 0.5, 0.9\}$ were used, where $u$ is a randomly selected unit vector; thus, $\rho$ determines the relative size of $s$ with respect to $c_s$. Note that (P2) with $\rho = 0$ is equivalent to (P1).
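The shifted instances used here can be generated as in the following sketch, where the generator name and interface are our own; it builds $f_s$ of (P2) with $s = \rho c_s u$ for a random unit vector $u$:

```python
import numpy as np

def make_shifted_problem(f, c_s, n, rho, rng):
    """Return the objective and box bounds of (P2) for s = rho * c_s * u."""
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)              # random unit direction u
    s = rho * c_s * u                   # shift vector, relative size rho
    f_s = lambda x: f(x - s)            # shifted objective f_s of (P2)
    return f_s, -c_s + s, c_s + s, s    # objective, lower/upper bounds, shift
```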

Table 1. Benchmark problems.

Figure 2 depicts the results for seven problems for which typical transitions of (9) and (10) were observed; they were selected from all the results obtained by applying the GWO to the 20 benchmark problems with the four shift vectors.

Figure 2. Means of $\|\Delta x^{(\alpha,i)}(t)\|$, $\|d^{(\alpha,i)}(t)\|$ and $f(x^{(\alpha)}(t))$ for four shift vectors.

Furthermore, Table 2 shows the mean function values and their standard deviations (SD) obtained by the GWO for the 20 benchmark problems with the four shift vectors over 50 trials, where the first and second best values of the means or SDs obtained for the same problem with the four shift vectors are indicated by bold and italic numbers, respectively.

Table 2. Comparison of results obtained by GWO for 20 benchmark problems (P1) and (P2) with three kinds of shift vectors (ρ=0.1,0.5,0.9).

First, we focus on the results for problems C1–C18, which have the optimal solution at or close to the origin. From the figure and table, we can see that, for most of the problems with a small $\rho$, the GWO finds exactly the global optimal solutions or considerably high-quality solutions, and that $\|\Delta x^{(\alpha,i)}(t)\|$ and $\|d^{(\alpha,i)}(t)\|$ are large at the early stages of the search, gradually decrease, and finally approach zero, which shows that the adjustment of the size of the search area is appropriate. On the other hand, the GWO finds considerably low-quality solutions for the problems with a large $\rho$. In this case, although $\|\Delta x^{(\alpha,i)}(t)\|$ and $\|d^{(\alpha,i)}(t)\|$ are decreasing, the rate of decrease becomes slower as $\rho$ becomes larger. In particular, $\|d^{(\alpha,i)}(t)\|$ remains relatively large until the final stage, which means that the search is not sufficiently intensified. Moreover, the function value does not decrease as smoothly as for the problems with smaller $\rho$, and the finally obtained function values are considerably worse for the problems with a larger $\rho$. These results show that the adjustment of the search area is not appropriate, which is consistent with the previous discussion.

Next, let us consider the two problems O1 and O2, in which the optimal solution is far from the origin. In these cases, the mean of the obtained function values is smallest for (P2) with $\rho = 0.9$, and the obtained solutions are not of overwhelmingly high quality, which is considerably different from the case of problems having the zero-optimal solution. In these cases, we find that the search of the GWO is not so efficient regardless of the shift-transformations.

From the above observations, the following conclusion can be derived. Since, in general, many global optimization problems can be expected to have optimal and local solutions located away from the origin, the above technique of adjusting the search area in the GWO can be regarded as a fatal drawback for a metaheuristic method, and it is difficult for the GWO to find high-quality solutions to many real-world problems. Therefore, in the subsequent sections, we propose modified GWOs to overcome the drawbacks pointed out in this section.

4. Shift-invariant GWO

4.1. Modification of GWO-R

In this subsection, we first introduce the GWO-R, which was proposed in Ref. [Citation16] to overcome the drawback of the original GWO, and show that it is shift-invariant. Secondly, we theoretically and experimentally point out that the intensification of the GWO-R is insufficient in the final stages, and we propose a modified method whose search areas are more reasonably controlled than those of GWO-R. Then, we theoretically show the shift-invariance of both GWO-R and the modified GWO-R.

In the GWO-R, a point $\mu(t)$ called the reference point is used to adaptively adjust the search area of each grey wolf $i \in W$ on the basis of the distance of $\mu(t)$ to $x^{(p)}(t)$, $p \in \{\alpha,\beta,\delta\}$, or to $x^{(i)}(t)$:

(11) $d^{(p,i)}(t) := |2 c_m b(t) r^{(p,1)} \otimes (x^{(p)}(t) - \mu(t)) - (x^{(i)}(t) - \mu(t))|$,

(12) $\xi^{(p,i)}(t) := x^{(i)}(t) + 2b(t)(2r^{(p,2)} - \mathbf{1}) \otimes d^{(p,i)}(t)$, $p \in \{\alpha,\beta,\delta\}$,

(13) $x^{(i)}(t+1) := \frac{1}{3}\sum_{p\in\{\alpha,\beta,\delta\}} \xi^{(p,i)}(t)$,

where (11) was derived from (1) by replacing $2r^{(p,1)}$ with $2 c_m b(t) r^{(p,1)}$ and measuring the positions relative to $\mu(t)$, $b(t)$ is given by $b(t) = 1 - t/t_{\max}$, and $c_m$ is a positive constant for adjusting the search area. The GWO using (11)–(13) is called the GWO with reference points (GWO-R).

The reference point $\mu(t)$ is selected by using information obtained through the search process until iteration $t$. The following three candidates were proposed:

  1. Centroid of all pbests: $\mu^{(1)}(t) = \frac{1}{|W|}\sum_{i\in W} p^{(i)}(t)$,

  2. Centroid of all grey wolves: $\mu^{(2)}(t) = \frac{1}{|W|}\sum_{i\in W} x^{(i)}(t)$,

  3. Centroid of the three best solutions: $\mu^{(3)}(t) = \frac{1}{3}\left(x^{(\alpha)}(t) + x^{(\beta)}(t) + x^{(\delta)}(t)\right)$,

where $p^{(i)}(t)$, $i \in W$ denotes the personal best (pbest), that is, the best solution found by grey wolf $i$ until iteration $t$:

$p^{(i)}(t) = \arg\min\{f(x) \mid x \in \{x^{(i)}(\tau)\}_{\tau \in \{1,\ldots,t\}}\}$.

The pbests have been used in PSO, which exploits the information obtained by each search agent. In addition, as reference points, the authors introduced the internally dividing point of two reference points $\mu^{(i)}(t)$ and $\mu^{(j)}(t)$ selected from the three kinds of candidates, whose weights vary linearly:

$\mu^{(i,j)}(t) = \frac{1}{t_{\max}}\left((t_{\max} - t)\mu^{(i)}(t) + t\,\mu^{(j)}(t)\right)$, $i \neq j \in \{1,2,3\}$.

By using these kinds of reference points, the updating systems (11)–(13) of the GWO-R are shift-invariant, as shown at the end of this subsection.
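The candidate reference points can be computed as in the following sketch, assuming the pbests $p^{(i)}(t)$, the positions $x^{(i)}(t)$ and the three leaders are stored as arrays (the array and function names are our own):

```python
import numpy as np

def reference_point(P, X, leaders, t, t_max, kind=(1, 2)):
    """mu^(1), mu^(2), mu^(3) and the internally dividing point mu^(i,j)."""
    mu = {1: P.mean(axis=0),            # centroid of all pbests
          2: X.mean(axis=0),            # centroid of all grey wolves
          3: leaders.mean(axis=0)}      # centroid of alpha, beta, delta wolves
    i, j = kind                         # e.g. (1, 2) realizes mu^(1,2)
    # the weight moves linearly from mu^(i) at t = 0 to mu^(j) at t = t_max
    return ((t_max - t) * mu[i] + t * mu[j]) / t_max
```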

Moreover, in Ref. [Citation16], the relation between the upper bound of $d^{(p,i)}(t)$ and the difference vector between the best solution and the reference point, namely $v^{(p)}(t) := x^{(p)}(t) - \mu(t)$, was derived in a similar way to Section 3.2 as follows:

(14) $\hat{d}_j^{(p,i)}(t) = \max\{|v_j^{(p)}(t) - u_j^{(p,i)}(t)|,\ |(2c_m b(t) - 1)v_j^{(p)}(t) + u_j^{(p,i)}(t)|\}$.

This relation indicates that the search process can be classified into three stages: in the first stage, the upper bound $\hat{d}_j^{(p,i)}(t)$ increases in proportion to $|v_j^{(p)}|$ with a large proportionality constant, which can be considered an extensive search, while in the second and third stages it increases with a small proportionality constant, which means an intensive search. These results show that, independently of the distance between the optimal solution and the origin, GWO-R gradually shifts from a diverse to an intensive search as the search progresses. At the same time, an effective search by GWO-R was reported through numerical experiments in Ref. [Citation16].

Furthermore, we can observe that in the first and second stages, under the assumption that $u^{(p,i)}(t) = x^{(p)}(t) - x^{(i)}(t)$ is fixed, $\hat{d}_j^{(p,i)}(t)$ is minimal when $x_j^{(p)}(t) = \mu_j(t)$. This means that the search area is minimal when the best solution is equal to the reference point. Such behavior is reasonable because if, additionally, $u^{(p,i)}(t) = 0$ holds, namely, $x^{(i)}(t) = x^{(p)}(t) = \mu(t)$, the size of the search area is zero.

On the other hand, under the same assumption, in the third stage only, $\hat{d}_j^{(p,i)}(t)$ is not minimal even if $x_j^{(p)}(t) = \mu_j(t)$, which is not reasonable. In such cases, the detailed search may be insufficient.

Therefore, in this paper, we modify the updating system (11) such that $\hat{d}_j^{(p,i)}(t)$ is minimal when $x^{(p)}(t) = \mu(t)$, as follows:

(15) $d^{(p,i)}(t) := |(c_m b(t) + 1)\, r^{(p,1)} \otimes (x^{(p)}(t) - \mu(t)) - (x^{(i)}(t) - \mu(t))|$.

The GWO with the updating systems (12), (13) and (15) is called the simplified GWO-R (GWO-SR), in which the search process can be classified into two stages, and, under the same assumptions as above, the upper bound $\hat{d}_j^{(p,i)}(t)$ is minimal in all stages when $x^{(i)}(t)$, $x^{(p)}(t)$ and $\mu(t)$ are equal. These properties can be shown as follows.
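Under our reading of (15), the step-size vectors of GWO-SR and GWO-R differ only in how the coefficient of $x^{(p)}(t) - \mu(t)$ is randomized, as the following sketch shows (names are illustrative):

```python
import numpy as np

def d_gwo_sr(x_p, x_i, mu, c_m, b_t, rng):
    """d^(p,i)(t) of GWO-SR, Eq. (15): coefficient ranges over (0, c_m*b(t) + 1)."""
    r1 = rng.random(x_i.size)
    return np.abs((c_m * b_t + 1.0) * r1 * (x_p - mu) - (x_i - mu))

def d_gwo_r(x_p, x_i, mu, c_m, b_t, rng):
    """d^(p,i)(t) of GWO-R, Eq. (11): coefficient ranges over (0, 2*c_m*b(t))."""
    r1 = rng.random(x_i.size)
    return np.abs(2.0 * c_m * b_t * r1 * (x_p - mu) - (x_i - mu))
```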

Now, we analyze the search area of each grey wolf. First, let us consider the upper bound $\hat{d}_j^{(p,i)}(t)$ of $d_j^{(p,i)}(t)$ in (15). In the same way as the derivation of (14), we have that

(16) $\hat{d}_j^{(p,i)}(t) = \max\{|x_j^{(i)}(t) - \mu_j(t)|,\ |(c_m b(t)+1)(x_j^{(p)}(t) - \mu_j(t)) - (x_j^{(i)}(t) - \mu_j(t))|\} = \max\{|v_j^{(p)}(t) - u_j^{(p,i)}(t)|,\ |c_m b(t) v_j^{(p)}(t) + u_j^{(p,i)}(t)|\}$.

Thus, $\hat{d}_j^{(p,i)}(t)$ can be evaluated by dividing into the following cases:

(17) If $c_m b(t) = 1$, then $\hat{d}_j^{(p,i)}(t) = |v_j^{(p)}(t)| + |u_j^{(p,i)}(t)|$.

(18) If $\delta_j^{(p,i)}(t) v_j^{(p)}(t) \in \left[\frac{2|u_j^{(p,i)}|}{1 - c_m b(t)}, 0\right]$ and $c_m b(t) > 1$, or $\delta_j^{(p,i)}(t) v_j^{(p)}(t) \notin \left[0, \frac{2|u_j^{(p,i)}|}{1 - c_m b(t)}\right]$ and $c_m b(t) < 1$, then $\hat{d}_j^{(p,i)}(t) = |v_j^{(p)}(t) - u_j^{(p,i)}(t)| \leq |v_j^{(p)}(t)| + |u_j^{(p,i)}(t)|$.

(19) If $\delta_j^{(p,i)}(t) v_j^{(p)}(t) \notin \left[\frac{2|u_j^{(p,i)}|}{1 - c_m b(t)}, 0\right]$ and $c_m b(t) > 1$, or $\delta_j^{(p,i)}(t) v_j^{(p)}(t) \in \left[0, \frac{2|u_j^{(p,i)}|}{1 - c_m b(t)}\right]$ and $c_m b(t) < 1$, then $\hat{d}_j^{(p,i)}(t) = |c_m b(t) v_j^{(p)}(t) + u_j^{(p,i)}(t)| \leq |c_m b(t) v_j^{(p)}(t)| + |u_j^{(p,i)}(t)|$,

where we define $\delta_j^{(p,i)}(t) := u_j^{(p,i)}(t)/|u_j^{(p,i)}(t)|$.

Note that $c_m b(t) > 1$ and $c_m b(t) < 1$ correspond to the first and second halves of the search, $t < t_s$ and $t > t_s$, respectively, where $t_s := (1 - \frac{1}{c_m}) t_{\max}$ is the iteration at which the two kinds of searches are switched, as mentioned below.

From (17), (18) and (19), we can see that in the first half, $\hat{d}_j^{(p,i)}(t)$ increases with the proportionality constant $c_m b(t)$ $(>1)$ for large $|v_j^{(p)}(t)|$ and with 1 for small $|v_j^{(p)}(t)|$, which widens the search area at a relatively high rate according to $v^{(p)}(t)$, and that in the second half of the search, $\hat{d}_j^{(p,i)}(t)$ increases with the proportionality constant 1 for large $|v_j^{(p)}(t)|$ and with $c_m b(t)$ $(<1)$ for small $|v_j^{(p)}(t)|$, which means that the effect of $v^{(p)}(t)$ on the search area is relatively small. These relations are depicted in Figure 3, in which $c_m$ was chosen to be 3.0, so that the switching occurs at $t_s = (1 - \frac{1}{c_m})t_{\max} = \frac{2}{3}t_{\max}$. The same $c_m$ was used in the numerical experiments of the next subsection. The figure coincides with the above analysis. Furthermore, note that the upper bound $\hat{d}_j^{(p,i)}(t)$ is always minimal at $x_j^{(p)}(t) = \mu_j(t)$ under the assumption that $u^{(p,i)}(t)$ is fixed, which means that the intensive search of GWO-SR is more reasonable than that of GWO-R.
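The closed form (16) can be verified against (15) by maximizing over sampled $r^{(p,1)}$; the component values below are illustrative, with $c_m b(t) = 1.5 > 1$ (first half of the search):

```python
import numpy as np

rng = np.random.default_rng(0)
c_m, b_t = 3.0, 0.5                     # c_m * b(t) = 1.5 > 1, i.e. t < t_s
v, u = 2.0, -0.8                        # v_j^(p) and u_j^(p,i); x_j^(i) - mu_j = v - u
d = np.abs((c_m * b_t + 1.0) * rng.random(10**6) * v - (v - u))   # Eq. (15)
d_hat = max(abs(v - u), abs(c_m * b_t * v + u))                   # Eq. (16)
print(d.max(), d_hat)                   # sampled maximum approaches d_hat = 2.8
```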

Figure 3. Relation between $\hat{d}^{(p,i)}(t)$ and $v^{(p)}(t)$ when $c_m = 3$ and $u^{(p,i)} = 1.5$.

As discussed above, the search of GWO-SR aims to suitably adjust the search area in a reasonable way for more general problems. Note that $c_m$ determines not only the proportionality constant $c_m b(t)$ of the search but also the switching iteration $t_s$. Thus, an appropriate selection of $c_m$ can tune the balance between diversification and intensification of the search.

Finally, we theoretically show the shift-invariance of GWO-R and GWO-SR. Since, in the updating systems (11)–(13) of GWO-R and (12), (13) and (15) of GWO-SR, the position of grey wolf $i$ is updated by using only difference vectors between $\mu(t)$ and $x^{(i)}$ or $x^{(p)}$, we can show their shift-invariance as follows:

Theorem 4.1

GWO-R and GWO-SR are shift-invariant in the coordinate representation of the optimization problem.

Proof.

In the same way as Theorem 3.2, we assume that $x^{(i)}$, $\bar{x}^{(i)}$, $x^{(p)}$ and $\bar{x}^{(p)}$, $p \in \{\alpha,\beta,\delta\}$ satisfy (4). In addition, $\mu(t)$ and $\bar{\mu}(t)$ denote the reference points in the original and shifted coordinate systems, namely for (P1) and (P2), respectively. Then, since by its definition the reference point $\mu(t)$ is a weighted sum of points satisfying the relation (4), we have that $\bar{\mu}(t) = \mu(t) + s$, which means that $x^{(p)}(t) - \mu(t) = \bar{x}^{(p)}(t) - \bar{\mu}(t)$, $p \in \{\alpha,\beta,\delta\}$, and $x^{(i)}(t) - \mu(t) = \bar{x}^{(i)}(t) - \bar{\mu}(t)$, $i \in W$. Therefore, the relation $d^{(p,i)}(t) = \bar{d}^{(p,i)}(t)$ always holds if all random variables are the same in both systems. As a result, the position of any grey wolf satisfies the relation (5) at $t+1$.

4.2. Numerical experiments

In this subsection, we apply GWO-R and GWO-SR to the same problems (P1) with no shift-transformation as in Section 3.3, namely, the twenty 50-dimensional problems. Note that even if they are applied to (P2) with different shift vectors, the same results are obtained because of their shift-invariance. We compared GWO-SRs with different reference points and GWO-R with $\mu^{(1,2)}$; in Ref. [Citation16], GWO-R with $\mu^{(1,2)}$ was shown to have better performance than the other variants. We compared the mean function values of the best solutions obtained by GWO-R and GWO-SR over 50 trials. The maximal number of iterations $t_{\max}$ was 5000, the number of wolves $|W|$ was 80, the constant $c_m$ of GWO-R was 1.5, and $c_m$ of GWO-SR was 3, which was selected in preliminary experiments.

Table 3. Comparison of results obtained by GWO-SRs with seven different reference points and GWO-R with μ(1,2) for 20 benchmark problems (P1).

Table 3 shows the results obtained by GWO-R using $\mu^{(1,2)}$ and by GWO-SRs using $\mu^{(i)}$, $i = 1, 2, 3$ and the four internally dividing points $\mu^{(1,2)}$, $\mu^{(2,1)}$, $\mu^{(1,3)}$ and $\mu^{(3,1)}$, which gave the first to fourth best results among the GWO-SRs using internally dividing points of all combinations of two points $\mu^{(i,j)}$, $i \neq j \in \{1,2,3\}$. We can see that the GWO-SRs with $\mu^{(1)}$, $\mu^{(1,2)}$ and $\mu^{(1,3)}$ obtained better solutions than the others, which demonstrates that it is advantageous to use the centroid of the pbests for the diversification in the first half of the search, and the centroid of $x^{(i)}$, $i \in W$ or of $x^{(p)}$, $p \in \{\alpha,\beta,\delta\}$ for the intensification in its second half. At the same time, the results indicate that the selection of the reference point is considerably important because the reference point is a significant criterion for adjusting the size of the search area.

Next, comparing GWO-R with GWO-SR using the same $\mu^{(1,2)}$, we can observe that GWO-SR is superior to GWO-R, which indicates that the method of reasonably adjusting the search area via (15) works effectively in GWO-SR. In addition, since most of the results of GWO-SR are better than those of the original GWO for (P2) with the shift vectors ($\rho \neq 0$) shown in Table 2, we can conclude that the proposed GWO-SR is more useful than the original GWO. These results and the reasonable control of the search areas in GWO-SR show that, if the reference point is regarded as a temporal prey, GWO-SR can be considered to search for solutions reasonably around the prey, which means that GWO-SR simulates the hunting mechanism of grey wolves more naturally and effectively than the original GWO or GWO-R.

On the other hand, GWO-SR has a search ability that is almost comparable to other metaheuristic methods, as shown in Section 6, while it does not have significant advantages over them. This limitation can be mainly attributed to the fact that the search areas are controlled only on the basis of the distance from the reference points, and there is room for improvement in the method. Therefore, in the next section, we add two techniques to the proposed GWO, which can control the search areas adaptively as the search progresses.

5. GWO with adjustment of search area

In this section, we first propose a method which adaptively selects the size of the search area for each grey wolf by additionally evaluating the objective function values at candidates for its next point. Its updating system is given as follows:

(20) $d^{(k,i)}(t) = |(c_k b(t) + 1)\, r^{(k,1)} \otimes (x^{(\alpha)}(t) - \mu(t)) - (x^{(i)}(t) - \mu(t))|$,

(21) $\xi^{(k,i)}(t) = x^{(i)}(t) + 2b(t)(2r^{(k,2)} - \mathbf{1}) \otimes d^{(k,i)}(t)$, $k \in \{1,2,3\}$,

(22) $x^{(i)}(t+1) = \arg\min\{f(x) \mid x \in \{\xi^{(1,i)}(t), \xi^{(2,i)}(t), \xi^{(3,i)}(t)\}\}$.

In this method, first, $d^{(k,i)}(t)$, $k \in \{1,2,3\}$ are calculated by (20) for grey wolf $i$ by using the best solution $x^{(\alpha)}(t)$ with different constants $c_k$, $k \in \{1,2,3\}$, such that $c_1 < c_2 < c_3$. Then, three candidate points $\xi^{(k,i)}(t)$, $k \in \{1,2,3\}$ for wolf $i$ are selected by (21) from the search areas determined by $d^{(k,i)}(t)$, $k \in \{1,2,3\}$, respectively. Next, among the three candidate points, the point with the smallest objective function value is selected as the next point of $x^{(i)}(t)$ by (22). Namely, the next point of each grey wolf is the best of three candidates selected from search areas with different sizes.

Here, note that since the function values of three candidate points are evaluated for each grey wolf per iteration, this method requires three times as many function evaluations as GWO-SR. In addition, although selection from more candidates based on all of $x^{(p)}(t)$, $p \in \{\alpha,\beta,\delta\}$ might improve the current solutions more significantly, it would require a larger amount of computational resources. Thus, this method uses only three candidates based on $x^{(\alpha)}(t)$.

Next, recall that $c_m$ in (15), which corresponds to $c_k$ in (20), determines the switching iteration $t_s = (1 - 1/c_m)t_{\max}$ between the extensive and intensive searches, as discussed in Section 4.1. Since $t_s^k := (1 - 1/c_k)t_{\max}$ is different for each $c_k$, the balance between extensive and intensive searches is also different for each $c_k$. Hence, we can expect varied searches by selecting different $c_k$. In the numerical experiments in the next section, we selected $(c_1, c_2, c_3) = (3.0, 4.5, 6.0)$, for which $t_s^1 = \frac{2}{3}t_{\max}$, $t_s^2 = \frac{7}{9}t_{\max}$ and $t_s^3 = \frac{5}{6}t_{\max}$.

Secondly, we introduce a mutation procedure for a more extensive search, because the diversity of the search may be relatively weakened by selecting the next point from three candidates based on the same $x^{(\alpha)}$. At each iteration $t$, the mutation procedure is executed with probability $1 - t/t_{\max}$ as follows: a grey wolf $i \in W$ and an index $k \in \{1,2,3\}$ are selected uniformly at random, and the $j$th component of $\xi^{(k,i)}(t)$ is also selected uniformly at random from $\{1,\ldots,n\}$. Then, after $\xi^{(k,i)}(t)$ is updated by (21), a mutational modification of $\Delta x_m r_m$ is added as follows:

(23) $\xi_j^{(k,i)}(t) = \xi_j^{(k,i)}(t) + \Delta x_m r_m$,

where $r_m$ is a uniform random variable on $(-1,1)$, and $\Delta x_m$ is the upper bound of the modification amount.
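Combining (20)–(23), one GWO-AS position update can be sketched as follows. For brevity, the mutation is drawn inside the per-wolf update here, whereas in GWO-AS a single wolf, candidate and component are selected per iteration; $\mu(t)$ and $x^{(\alpha)}(t)$ are assumed to be maintained elsewhere, and all names are illustrative.

```python
import numpy as np

def gwo_as_update(x_i, x_alpha, mu, f, t, t_max,
                  cs=(3.0, 4.5, 6.0), dx_m=0.6, rng=None):
    """One GWO-AS update of wolf i, Eqs. (20)-(23). dx_m: mutation bound."""
    rng = rng if rng is not None else np.random.default_rng()
    b_t = 1.0 - t / t_max
    n = x_i.size
    xis = []
    for c_k in cs:                                   # three search-area sizes
        r1, r2 = rng.random(n), rng.random(n)
        d = np.abs((c_k * b_t + 1.0) * r1 * (x_alpha - mu) - (x_i - mu))  # Eq. (20)
        xis.append(x_i + 2.0 * b_t * (2.0 * r2 - 1.0) * d)                # Eq. (21)
    if rng.random() < b_t:                           # mutation prob. 1 - t/t_max
        k, j = rng.integers(3), rng.integers(n)      # random candidate and component
        xis[k][j] += dx_m * rng.uniform(-1.0, 1.0)   # Eq. (23)
    return min(xis, key=f)                           # Eq. (22): best candidate
```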

This modified GWO-SR is called the GWO with adjustment of the search area (GWO-AS). Note that we can easily show that GWO-AS is also invariant to shift-transformations in the same way as Theorem 4.1, because the updating systems (20)–(23) include only difference vectors between $\mu(t)$ and $x^{(i)}$ or $x^{(\alpha)}$, similarly to GWO-SR.

6. Numerical experiments

6.1. Experimental conditions

In this section, we apply the proposed GWOs, GWO-SR and GWO-AS, and two other metaheuristic methods, FA and PSO, to the 18 basic functions of the CEC'17 benchmark problems and the other two problems used in Sections 3.3 and 4.2. Note that in order to evaluate the characteristics of the proposed methods, the search behavior was investigated by comparing them with the original metaheuristic methods rather than with their various modifications. Since the four methods differ in the number of function evaluations per iteration, we selected the maximal number of iterations for each method such that the total number of function evaluations $f_{\max}$ was equal across the four methods. We set $f_{\max} = 1200000$ and $4800000$ for the 50- and 200-dimensional problems, respectively, where 50 trials were performed for the 50-dimensional problems and 30 trials for the 200-dimensional ones. All methods used 80 search agents (particles, fireflies or grey wolves). In addition, the parameter values of PSO were the recommended ones [Citation23], and those of FA and the proposed methods were selected by preparatory experiments. For GWO-SR, $c_m = 3.0$ and the reference point was $\mu^{(1,2)}$, while for GWO-AS, $\Delta x_m = 0.6 c_s$, $(c_1, c_2, c_3) = (3.0, 4.5, 6.0)$ and the reference point was $\mu^{(2,1)}$.

First, we compared the means and standard deviations of the function values obtained over 50 trials by the four methods for the twenty 50-dimensional benchmark problems, and over 30 trials for the nineteen 200-dimensional benchmark problems, where none of the four methods obtained a computable objective function value for the 200-dimensional problem C2. The results are shown in Tables 4 and 5, respectively.

Table 4. Comparison of PSO, FA, GWO-SR and GWO-AS for 50-dimensional problems (fmax=1200000).

Table 5. Comparison of PSO, FA, GWO-SR and GWO-AS for 200-dimensional problems (fmax=4800000).

Next, we evaluated the transitions of the best function values obtained by the four methods with $f_{\max} = 1200000$ for all 50-dimensional problems, and selected six typical results, which are shown in Figure 4. In the figures, the horizontal axis denotes the function evaluation count per agent, and the vertical axis the mean of the best function values in the search process of each method over 50 trials. Finally, we focus on the number of times each $\xi^{(k,i)}(t)$, $k \in \{1,2,3\}$ is selected in (22) of GWO-AS at iteration $t+1$, namely $W^{(k)}(t+1) := \{i \in W \mid x^{(i)}(t+1) = \xi^{(k,i)}(t)\}$, $k \in \{1,2,3\}$, and compare the means of $|W^{(k)}(t+1)|$ over 50 trials at each iteration in GWO-AS for four typical results, which are shown in Figure 5.

Figure 4. Means of the best function values obtained by the four methods, PSO, FA, GWO-SR and GWO-AS, against the function evaluation count per agent for six 50-dimensional benchmark problems.

Figure 5. Means of $|W^{(k)}(t+1)|$, $k \in \{1,2,3\}$ for the three candidates over 50 trials at each count when GWO-AS is applied to four 50-dimensional benchmark problems.

6.2. Discussion of results

First of all, we discuss the function values obtained by the four methods, shown in Tables 4 and 5. Comparing these results with those of the original GWO with $\rho \neq 0$, shown in Table 2, we can see that the four methods obtained better solutions than the original GWO with $\rho \neq 0$ for many problems, and especially GWO-SR and GWO-AS did. These results clearly demonstrate the disadvantages of the original GWO and the advantages of GWO-SR and GWO-AS. Next, concerning the comparison of the three methods PSO, FA and GWO-SR, the numbers of problems for which each method obtained the least objective function value are almost the same, which indicates that GWO-SR is not only invariant to shift-transformations by just introducing the reference point, but also has a search ability similar to that of the other two methods. In addition, let us compare the four methods PSO, FA, GWO-SR and GWO-AS. From the two tables, we can see that GWO-AS has a better search ability than the other three methods: GWO-AS obtained the least function value for ten or eleven problems, and the second least function value for five. From these results, we can conclude that even though the GWO has serious drawbacks, they can be overcome by applying the proposed methods, which implies that the basic concept of the search in the GWO is still promising compared with other metaheuristic methods.

Secondly, we focus on the two proposed methods, GWO-SR and GWO-AS. From preparatory experiments on the methods with different reference points, shown in Table 3 for GWO-SR, we observed that the appropriate reference points for GWO-SR were $\mu^{(1)}$ and $\mu^{(1,2)}$, while those for GWO-AS were $\mu^{(2)}$ and $\mu^{(2,1)}$. This means that the pbests are effective as the reference point for the search in GWO-SR, while the three best solutions are effective in GWO-AS. The difference can be attributed to the different search characteristics of the two methods: although GWO-AS using the three best solutions may strengthen the intensive search more than GWO-SR does, the balance between diversification and intensification is considered to be kept by the proposed adaptive selection and mutation. Moreover, GWO-AS has better performance than GWO-SR in Tables 4 and 5, which indicates that the two proposed techniques, the adaptive selection of search areas and the mutation procedure, are effective.

Thirdly, we discuss the relationship between computational complexity and search ability. In Figure 4, we can observe that all four methods decrease the objective function values steadily as the search progresses. In the early stage of the search, GWO-AS decreases more slowly than the others; this slow decline can be explained by the fact that GWO-AS requires three times as many function evaluations per iteration as the others, as mentioned in the previous section. In the middle to final stages of the search, however, GWO-AS decreases the function value significantly faster for many problems. In addition, as discussed above, the final function value obtained by GWO-AS was the least for many problems. These results suggest that the search ability of GWO-AS per unit of computational cost is higher than that of the others.

Finally, we analyze the mean number of selections of $\xi^{(k,i)}(t)$ at each count in GWO-AS, as shown in Figure 5, in which we can see that the three kinds of candidates $\xi^{(k,i)}(t)$, $k \in \{1,2,3\}$ are selected in varying proportions as the search progresses: the candidate $\xi^{(k,i)}(t)$ which is most often selected differs between the first and second halves of the search, and the switching count is between 5000 and 8000. In the first half of the search, $\xi^{(1,i)}(t)$ with the relatively small $c_1 = 3$ was most often selected, while in the second half, $\xi^{(3,i)}(t)$ with $c_3 = 6$ was most often selected. From these results, we can see that the three kinds of search areas are adaptively selected in GWO-AS for each problem as the search progresses.

7. Conclusion

In this paper, we have theoretically shown that the original GWO is not invariant to shift-transformations of the problem, and have experimentally shown that its search is too specialized for problems with the optimal solution at the origin, which can significantly reduce the effectiveness of the search for other problems.

Next, we have proposed a modification of GWO-R (GWO-SR) that is not only invariant to shift-transformations but can also adjust its search areas reasonably. In addition, we have experimentally shown that its search area is appropriately varied as the search progresses, so that GWO-SR is more efficient than GWO-R and as efficient as PSO and FA. Moreover, we have proposed the GWO-AS by adding two methods, an adaptive selection method for the size of the search area and a mutation procedure, to GWO-SR. Through numerical experiments, we have verified that the GWO-AS outperforms the other methods. From these results, we can conclude that the basic concept of the search inspired by the hunting mechanism of grey wolves is promising, despite the serious drawbacks of the original GWO, because those drawbacks can be easily overcome by the proposed methods.

Furthermore, since various improvements have been investigated for the original GWO [Citation3–8], the proposed GWOs can also be combined with some of them. As a future issue, the performance of the proposed GWOs with such modifications should be verified. In addition, note that the whale optimization algorithm (WOA) [Citation24], which is also a popular metaheuristic method, uses basically the same updating systems as the GWO; thus, it has the same drawbacks, as mentioned in Ref. [Citation15]. It is possible to add shift-invariance to the WOA and to improve its search ability as a general-purpose metaheuristic method by applying the techniques used in GWO-SR and GWO-AS. Therefore, we would like to verify how useful those techniques are when applied to the WOA, and which variations are particularly effective.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Keiji Tatsumi

Keiji Tatsumi received the Ph.D. degree from Kyoto University, Japan, in 2006. In 1998, he joined Osaka University, where he is currently an Associate Professor in the Graduate School of Engineering. His research interests comprise metaheuristics for global optimization and machine learning.

Nao Kinoshita

Nao Kinoshita received his M.S. degree from Osaka University, Japan, in 2022. He is currently working for Nippon Steel Corp. His research interests are in metaheuristics for global optimization.

References

  • Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Softw. 2014;69:46–61. doi: 10.1016/j.advengsoft.2013.12.007
  • Muro C, Escobedo R, Spector L, et al. Wolf-pack (Canis lupus) hunting strategies emerge from simple rules in computational simulations. Behav Processes. 2011;88:192–197. doi: 10.1016/j.beproc.2011.09.006
  • Mosav SK, Jalalian E, Gharahchopog FS. A comprehensive survey of grey wolf optimizer algorithm and its application. Int J Adv Robot & Expert Syst (JARES). 2018;1(6):23–45.
  • Faris H, Mirjalili S. Grey wolf optimizer: a review of recent variants and applications. Neural Comput Appl. 2018;30:413–435. doi: 10.1007/s00521-017-3272-5
  • Panda M, Dan B. Grey wolf optimizer and its applications: a survey. In: Proceedings of the Third International Conference on Microelectronics, Computing and Communication Systems. 2019.p. 179–194.
  • Ali S, Sharma A, Jadon S. A survey on grey wolf optimizer. J Emerg Technol Innov Res (JETIR). 2020;7(11):789–790.
  • Almufti SM, Ahmad HB, Marqas RB, et al. Grey wolf optimizer: overview, modifications and applications. Int Res J Sci Technol Educ Manage. 2021;1(1):44–56.
  • Wu G, Mallipeddi R, Suganthan P. Ensemble strategies for population-based optimization algorithms – A survey. Swarm Evol Comput. 2019;44:695–711. doi: 10.1016/j.swevo.2018.08.015
  • Mirjalili S. How effective is the grey wolf optimizer in training multi-layer perceptrons. Appl Intell. 2015;43:150–161. doi: 10.1007/s10489-014-0645-7
  • Muangkote N, Sunat K, Chiewchanwattana S. An improved grey wolf optimizer for training q-Gaussian Radial Basis Functional-link nets, 2014 International Computer Science and Engineering Conference (ICSEC); 2014.
  • Saremi S, Mirjalili SZ, Mirjalili SM. Evolutionary population dynamics and grey wolf optimizer. Neural Comput Appl. 2015;26:1257–1263. doi: 10.1007/s00521-014-1806-7
  • Shankar K, Eswaran P. A secure visual secret share (VSS) creation scheme in visual cryptography using elliptic curve cryptography with optimization technique. Aust J Basic Appl Sci. 2015;9(36):150–163.
  • Jian Z, Zhu G. Affine invariance of meta-heuristic algorithms. Inf Sci (Ny). 2021;576:37–53. doi: 10.1016/j.ins.2021.06.062
  • Niu P, Niu S, Liu N, et al. The defect of the grey wolf optimization algorithm and its verification method. Knowl Based Syst. 2019;171:37–43. doi: 10.1016/j.knosys.2019.01.018
  • Askari Q, Younas I, Saeed M. Emphasizing the importance of shift invariance in metaheuristics by using whale optimization algorithm as a test bed. Soft Comput. 2021;25:14209–14225. doi: 10.1007/s00500-021-06101-9
  • Tatsumi K, Kinoshita N. Shift-invariant grey wolf optimizer exploiting reference points and random selection of step-sizes. Proc SICE Ann Conf. 2022;2022:1201–1206.
  • Kennedy J, Eberhart RC. Particle swarm optimization. Proc IEEE Int Jt Conf Neural Netw. 1995;4:1942–1948.
  • Poli R, Kennedy J, Blackwell T. Particle swarm optimization – an overview. Swarm Intell. 2007;1:33–57. doi: 10.1007/s11721-007-0002-0
  • Yang XS. Nature-Inspired metaheuristic algorithms. Frome: Luniver Press; 2008.
  • Fister I, Fister I Jr, Yang XS, et al. A comprehensive review of firefly algorithms. Swarm Evol Comput. 2013;13:34–46. doi: 10.1016/j.swevo.2013.06.001
  • Grey Wolf Optimizer (GWO) version 1.6 (1.85 MB) by Mirjalili S. GWO is a novel meta-heuristic algorithm for global optimization. https://www.mathworks.com/matlabcentral/fileexchange/44974-grey-wolf-optimizer-gwo.
  • Awad NH, Ali MZ, Suganthan PN, et al. Problem definitions and evaluation criteria for the CEC 2017 special session and competition on single objective real-parameter numerical optimization, Technical Report of Nanyang Technological Univ., Jordan Univ. and Zhengzhou Univ., 2016.
  • Clerc M. Particle swarm optimization. London: ISTE Publishing; 2006.
  • Mirjalili S, Lewis A. The whale optimization algorithm. Adv Eng Softw. 2016;95:51–67. doi: 10.1016/j.advengsoft.2016.01.008