107
Views
0
CrossRef citations to date
0
Altmetric
Original Articles

Towards solving 2-TBSG efficiently

, &
Pages 706-721 | Received 08 Jun 2019, Accepted 16 Nov 2019, Published online: 10 Dec 2019
 

Abstract

Two-player turn-based stochastic game (2-TBSG) is a two-player game model which aims to find Nash equilibriums and is widely utilized in reinforcement learning and AI. Inspired by the fact that the simplex method for solving the deterministic discounted Markov decision processes is strongly polynomial independent of the discount factor, we are trying to answer an open problem whether there is a similar algorithm for 2-TBSG. We develop a simplex strategy iteration where one player updates its strategy with a simplex step while the other player finds an optimal counterstrategy in turn, and a modified simplex strategy iteration. Both of them belong to a class of geometrically converging algorithms. We establish the strongly polynomial property of these algorithms by considering a strategy combined from the current strategy and the equilibrium strategy. Moreover, we present a method to transform general 2-TBSGs into special 2-TBSGs where each state has exactly two actions.

2010 Mathematics Subject Classifications:

Acknowledgments

The authors are grateful to the associate editor and two anonymous referees for their valuable comments and suggestions.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

The second author's research was supported in part by the National Natural Science Foundation of China (NSFC) (grant number 11831002) and by the Beijing Academy of Artificial Intelligence.

Notes on contributors

Zeyu Jia

Zeyu Jia, undergraduate student at Peking University. His research interests lie in reinforcement learning theory, machine learning theory and optimization. He is a member of the Elite Undergraduate Training Program of Applied Math in Peking University.

Zaiwen Wen

Zaiwen Wen, associate professor at Peking University. He holds a M.S. in Computational Mathematics from Academy of Mathematics and Systems Science (AMSS) (2004) and a Ph.D in Operations Research from Columbia University (2009). His research interests include large-scale computational optimization and their applications in sciences and engineering. He was awarded the National Natural Science Foundation of China for Excellent Young Scholars in 2013, the Young top-notch talent in 2015 and the Science and Technology Award for Chinese Youth in 2016. He is an associate editor of Journal of the Operations Research Society of China, Journal of Computational Mathematics and a technical editor of Mathematical Programming Computation.

Yinyu Ye

Yinyu Ye, K.T. Li Chair Professor of Engineering at Department of Management Science and Engineering and Institute of Computational and Mathematical Engineering, Stanford University. He received a B.S. in System Engineering from the Huazhong University of Science and Technology, in China, and an MS and Ph.D in Engineering of Economic Systems and Operations Research from Stanford University. His current research interests include Continuous and Discrete Optimization, Data Science and Applications, Algorithm Design and Analysis, Computational Game/Market Equilibrium, Metric Distance Geometry, Dynamic Resource Allocation, and Stochastic and Robust Decision Making, among others. He has been a Fellow of INFORMS since 2012, and has received several academic awards, including the 2009 John von Neumann Theory Prize for fundamental sustained contributions to theory in Operations Research and the Management Sciences, the 2015 SPS Signal Processing Magazine Best Paper Award, the 2014 SIAG/Optimization Prize awarded every three years to the author(s) of the most outstanding paper, the inaugural 2012 ISMP Tseng Lectureship Prize for outstanding contributions to continuous optimization, and the inaugural 2006 Farkas Prize on Optimization. His doctoral students at Stanford have received the 2015 and 2013 Second Prize of the INFORMS Nicholson Student Paper Competition, the 2013 INFORMS Computing Society Prize, the 2008 First Prize of INFORMS Nicholson, and the 2006 and 2010 INFORMS Optimization Prizes for Young Researchers. He is the Chief Scientist of a major commercial optimization software company.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 1,330.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.