Optimize taxi driving strategies based on reinforcement learning

Yong Gao, Institute of Remote Sensing and Geographic Information System, Peking University, Beijing, China. Correspondence: [email protected]

Dan Jiang, Institute of Remote Sensing and Geographic Information System, Peking University, Beijing, China

Yan Xu, Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA

ABSTRACT

The efficiency of taxi services in big cities influences not only the convenience of people's travel but also urban traffic and the profits of taxi drivers. To balance taxi demand and supply, spatio-temporal knowledge mined from historical trajectories is recommended both to passengers looking for an available taxicab and to cabdrivers estimating the location of the next passenger. However, taxi trajectories are long sequences in which single-step optimization cannot guarantee the global optimum. Taking long-term revenue as the goal, a novel method based on reinforcement learning is proposed to optimize taxi driving strategies for global profit maximization. The optimization problem is formulated as a Markov decision process over the whole taxi driving sequence. The state set in this model is defined as the taxi location and operation status. The action set includes the operation choices of empty driving, carrying passengers or waiting, and the subsequent driving behaviors. The reward, which serves as the objective function for evaluating driving policies, is defined as the effective driving ratio that measures the total profit of a cabdriver in a working day. The optimal choice for cabdrivers at any location is learned by the Q-learning algorithm, which maximizes cumulative rewards. Experiments on historical trajectory data from Beijing were conducted to test the accuracy and efficiency of the method. The results show that the method improves profits and efficiency for cabdrivers and also increases the opportunities for passengers to find taxis. By replacing the reward function with other criteria, the method can also be used to discover and investigate novel spatial patterns. The new model requires no prior knowledge and optimizes globally, which gives it advantages over previous methods.
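
The abstract describes the formulation only at a high level. As a rough illustration of how tabular Q-learning can learn a driving policy over such a Markov decision process, the sketch below uses a toy one-dimensional strip of road cells. The cell layout, pickup probabilities, fare and fuel-cost values, and all function names are assumptions made for illustration. In particular, the reward here is a simple stand-in and not the effective driving ratio defined in the article, and the state is reduced to the location of a vacant taxi.

```python
import random

N_CELLS = 5                           # hypothetical 1-D strip of road cells
ACTIONS = ("wait", "left", "right")   # simplified choices for a vacant taxi
# Hypothetical probability of finding a passenger in each cell per time step.
PICKUP_PROB = [0.05, 0.10, 0.20, 0.40, 0.80]


def step(cell, action, rng):
    """Simulate one time step of a vacant taxi; return (next_cell, reward)."""
    if action == "left":
        cell = max(0, cell - 1)
    elif action == "right":
        cell = min(N_CELLS - 1, cell + 1)
    if rng.random() < PICKUP_PROB[cell]:
        # Passenger found: the trip ends in a random cell and pays a base
        # fare plus one unit per cell driven (assumed fare model).
        dest = rng.randrange(N_CELLS)
        return dest, 1.0 + abs(dest - cell)
    # No passenger: waiting is free, empty driving has a small cost.
    return cell, (0.0 if action == "wait" else -0.1)


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_CELLS)]
    for _ in range(episodes):
        cell = rng.randrange(N_CELLS)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[cell][i])
            nxt, reward = step(cell, ACTIONS[a], rng)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            q[cell][a] += alpha * (reward + gamma * max(q[nxt]) - q[cell][a])
            cell = nxt
    return q


def greedy_policy(q):
    """Return the highest-valued action name for each cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
            for row in q]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Because each update bootstraps from the best value of the next state, the learned values reflect discounted long-run return rather than the payoff of a single step, which is the property the abstract contrasts with single-step optimization.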

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [41625003].
