611
Views
0
CrossRef citations to date
0
Altmetric
Operations Engineering & Analytics

Inverse Markov decision processes with unknown transition probabilities

&
Pages 588-601 | Received 25 Jul 2021, Accepted 20 Jun 2022, Published online: 22 Aug 2022
 

Abstract

Inverse optimization involves recovering parameters of a mathematical model using observed values of decision variables. In Markov Decision Processes (MDPs), it has been applied to estimate rewards that render observed policies optimal. A counterpart is not available for transition probabilities. We study two variants of this problem. First, the decision-maker wonders whether there exist a policy and transition probabilities that attain given target values of expected total discounted rewards over an infinite horizon. We derive necessary and sufficient existence conditions, and formulate a feasibility linear program whose solution yields the requisite policy and transition probabilities. We extend these results when the decision-maker wants to render the target values optimal. In the second variant, the decision-maker wishes to find transition probabilities that make a given policy optimal. The resulting problem is nonconvex bilinear, and we propose tailored versions of two heuristics called Convex-Concave Procedure and Sequential Linear Programming (SLP). Their performance is compared via numerical experiments against an exact method. Computational experiments on randomly generated MDPs reveal that SLP outperforms the other two both in runtime and objective values. Further insights into SLP’s performance are derived via numerical experiments on inverse inventory control, equipment replacement, and multi-armed bandit problems.

Additional information

Notes on contributors

Zahra Ghatrani

Zahra Ghatrani received a BS degree in industrial engineering from Sharif University of Technology. She then received a PhD in industrial and systems engineering from the University of Washington in Seattle. This work was conducted at the University of Washington as a part of Zahra's doctoral dissertation. She now works at Amazon after graduation.

Archis Ghate

Archis Ghate is a Professor of Industrial and Systems Engineering at the University of Washington in Seattle. His research focuses on optimization under uncertainty. He received a PhD in industrial and operations engineering from the University of Michigan in Ann Arbor.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 202.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.