
Distributed learning and multi-objectivity in traffic light control

Pages 65-83 | Received 01 Sep 2013, Accepted 19 Nov 2013, Published online: 13 Mar 2014
 

Abstract

Traffic jams and suboptimal traffic flows are ubiquitous in modern societies, and they create enormous economic losses each year. Delays at traffic lights alone account for roughly 10% of all delays in US traffic. Because most traffic light scheduling systems currently in use are static, set up by human experts rather than adaptive, interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning (RL) approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Distributed constraint optimisation (DCOP) approaches have also been shown to be successful, but are limited to cases where the traffic flows are known. The distributed coordination of exploration and exploitation (DCEE) framework was recently proposed to introduce learning in the DCOP framework. In this paper, we present a study of DCEE and RL techniques in a complex simulator, illustrating the particular advantages of each and comparing them against standard isolated traffic-actuated signals. We analyse how learning and coordination behave under different traffic conditions, and discuss the multi-objective nature of the problem. Finally, we evaluate several alternative reward signals in the best-performing approach, some of which take advantage of the correlation between the problem-inherent objectives to improve performance.

Acknowledgements

We thank the reviewers for their helpful suggestions and comments. Tim Brys is funded by a PhD grant of the Research Foundation-Flanders (FWO), and performed a research visit to Prof. Matthew E. Taylor at Lafayette College, funded by a Short Stay Abroad grant also from the FWO. This work was supported in part by NSF IIS-1149917.

Notes

1. Agents are ordered in a 2×2 grid, with numbering starting top-left and ending bottom-right.

2. We continue only with RL as it produced better asymptotic performance in the previous experiments (i.e. and ).

3. Local throughput is measured by counting the number of cars that cross the intersection and dividing by the time period measured. This time period is the time between two actions, i.e. 2 s for a ‘stay’ action and 4 s for a ‘change’ action (accounting for the 2 s yellow light). Dividing by the time period accounts for actions of different duration.
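The throughput measure in note 3 can be illustrated with a minimal sketch. The durations (2 s and 4 s) come from the note; the function name, the constant names, and the string labels for the actions are hypothetical and are not taken from the paper or its simulator.

```python
# Action durations in seconds, as stated in note 3.
# The constant names are illustrative assumptions.
STAY_DURATION_S = 2.0
CHANGE_DURATION_S = 4.0  # includes the 2 s yellow light

ACTION_DURATIONS = {
    "stay": STAY_DURATION_S,
    "change": CHANGE_DURATION_S,
}


def local_throughput(cars_crossed: int, action: str) -> float:
    """Return cars per second that crossed the intersection during one action.

    Dividing by the action's duration makes the value comparable
    between 'stay' and 'change' actions, which last different times.
    """
    if action not in ACTION_DURATIONS:
        raise ValueError(f"unknown action: {action!r}")
    if cars_crossed < 0:
        raise ValueError("cars_crossed must be non-negative")
    return cars_crossed / ACTION_DURATIONS[action]


if __name__ == "__main__":
    # The same number of cars yields a lower rate over the longer action.
    print(local_throughput(3, "stay"))    # 1.5
    print(local_throughput(3, "change"))  # 0.75
```

Under these assumptions, three cars crossing during a ‘stay’ action give 1.5 cars per second, while the same three cars during a ‘change’ action give 0.75, which is why the raw count alone would overstate the value of longer actions.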
