Abstract
This paper addresses the control of the sensitivity parameters (control gains) of automated driving vehicles in an open heterogeneous traffic flow system. The automated driving vehicles are assumed to be equipped with adaptive cruise control and connectivity, while the conventional vehicles are characterized by a stochastic safe time headway. To optimize the sensitivity parameters, the natural policy gradient reinforcement learning algorithm is used to search for the best policy. Two performance indices are considered: the traffic breakdown probability and fuel consumption. Extensive simulations show that, for maximum performance, the sensitivity parameters should depend on both the flow and the penetration rate. In particular, even a low penetration rate of 5% can improve traffic performance. A comparison with other algorithms suggests that natural policy gradient and Q-learning yield a good approximation and significantly reduce the computational cost.
Acknowledgments
This work was supported by the National Key R&D Program of China (No. 2018YFB1600900) and the National Natural Science Foundation of China (Nos. 71971015, 71621001, and 71931002).
Disclosure statement
No potential conflict of interest was reported by the author(s).