Figures & data
Pheromone communication is carried out between populations A and B, and the Markov decision process of reinforcement learning is used to reward or punish each population.
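The reward-or-punish exchange between the two populations can be sketched roughly as follows. This is a hypothetical illustration only: the reward, penalty, and mixing factors, and the function name `exchange_and_reinforce`, are assumptions not given in the paper, which uses a Markov decision process rather than fixed factors.

```python
def exchange_and_reinforce(tau_a, tau_b, best_a, best_b,
                           reward=1.1, penalty=0.9, mix=0.1):
    """Hypothetical sketch (factors are assumptions, not from the paper):
    the population with the better best-so-far cost is rewarded, the
    other penalised, and each blends in the other's pheromone trails."""
    fa, fb = (reward, penalty) if best_a <= best_b else (penalty, reward)
    # Scale each population's pheromone matrix by its reward/penalty
    # factor, then mix in a fraction of the other population's trails.
    new_a = [[fa * a + mix * b for a, b in zip(ra, rb)]
             for ra, rb in zip(tau_a, tau_b)]
    new_b = [[fb * b + mix * a for a, b in zip(ra, rb)]
             for ra, rb in zip(tau_a, tau_b)]
    return new_a, new_b
```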
Figures (a) and (b) show the location distribution of the 24 suppliers’ demand points and 3 distribution centres in the R101 and C101 instances, respectively.
In Figure (a), the fairness index of instance R101 converges after about 100 iterations, with a convergence value of 0.083; in Figure (b), the fairness index of instance C101 converges after about 230 iterations, with a convergence value of 0.054.
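One plausible form of such a fairness index, consistent with smaller values indicating a more even workload, is the coefficient of variation of the vehicle loads. This is an assumption for illustration; the text does not define the index it plots.

```python
import statistics

def fairness_index(loads):
    """Assumed fairness measure (the paper's exact definition is not
    given here): coefficient of variation of the vehicle loads,
    where 0 means a perfectly even workload distribution."""
    return statistics.pstdev(loads) / statistics.mean(loads)
```

For example, three identically loaded vehicles give an index of 0, and the index grows as the loads become more unbalanced.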
In Figure (a), the ACS algorithm is used to screen the optimal α and β factors, giving α = 2 and β = 5.5; in Figure (b), the MMAS algorithm gives α = 2.5 and β = 5; in Figure (c), the ACS algorithm is used to screen ρ, giving an optimal ρ = 0.2; in Figure (d), the MMAS algorithm gives an optimal ρ = 0.8; Figure (e) compares ant-population sizes of 12, 16, 24, 30, and 36: a population of 24 ants achieves the best objective value and converges the fastest.
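The screened α and β enter the standard ACS state-transition rule, where α weights the pheromone trail and β the heuristic desirability. A minimal sketch of that rule follows; the exploitation threshold `q0` is an assumed standard ACS parameter, not a value reported in the text.

```python
import random

def acs_next_city(current, unvisited, tau, eta, alpha=2.0, beta=5.5, q0=0.9):
    """Standard ACS transition rule with the screened alpha/beta;
    q0 (exploitation probability) is an assumption."""
    scores = {j: (tau[current][j] ** alpha) * (eta[current][j] ** beta)
              for j in unvisited}
    if random.random() < q0:
        return max(scores, key=scores.get)        # exploitation: best edge
    total = sum(scores.values())
    r, acc = random.random() * total, 0.0
    for j, s in scores.items():                   # biased exploration
        acc += s
        if acc >= r:
            return j
    return j
```

The evaporation rate ρ screened in Figures (c) and (d) would then appear in the pheromone update, e.g. `tau[i][j] = (1 - rho) * tau[i][j] + rho * delta`.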
Figure (a) shows the optimal vehicle assignment route of the R101 instance, detailed in Table 9; Figure (b) shows the optimal vehicle assignment route of the C101 instance, detailed in Table 10.
Figures (a) and (b) compare the ACS-MMAS, ACS, and MMAS algorithms on the R101 and C101 instances, respectively. The results show that the Z2 value obtained by the ACS-MMAS algorithm is smaller than those obtained by the ACS and MMAS algorithms.
Figures (a) and (b) show that, for the R101 and C101 instances respectively, the ACS-MMAS algorithm obtains more Pareto solutions on the Pareto front, and its controllable solution space is larger than that of the ACS and MMAS algorithms.
a1, a2, and a3 are the three Pareto solutions, and the hypervolume value is the sum of the volumes of the hypercubes bounded by each solution and the reference point.
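For a bi-objective minimisation front such as (Z1, Z2), that sum of box volumes reduces to rectangle areas between each solution and the reference point. A minimal sketch, assuming two objectives and a dominated reference point:

```python
def hypervolume_2d(pareto, ref):
    """Hypervolume of a 2-D minimisation Pareto front: sweep the
    solutions in ascending first objective so each contributes a
    non-overlapping rectangle up to the reference point."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(pareto):
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```

For instance, the front {(1, 3), (2, 2), (3, 1)} with reference point (4, 4) yields rectangles of area 3, 2, and 1, for a hypervolume of 6; a larger hypervolume indicates a better spread front.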
Data availability statement
Data available on request from the authors.