Abstract
Inverse optimization involves recovering parameters of a mathematical model using observed values of decision variables. In Markov Decision Processes (MDPs), it has been applied to estimate rewards that render observed policies optimal. A counterpart is not available for transition probabilities. We study two variants of this problem. In the first variant, the decision-maker asks whether there exist a policy and transition probabilities that attain given target values of expected total discounted rewards over an infinite horizon. We derive necessary and sufficient existence conditions, and formulate a feasibility linear program whose solution yields the requisite policy and transition probabilities. We extend these results to the case where the decision-maker wants to render the target values optimal. In the second variant, the decision-maker wishes to find transition probabilities that make a given policy optimal. The resulting problem is nonconvex bilinear, and we propose tailored versions of two heuristics called Convex-Concave Procedure and Sequential Linear Programming (SLP). Their performance is compared via numerical experiments against an exact method. Computational experiments on randomly generated MDPs reveal that SLP outperforms the other two in both runtime and objective values. Further insights into SLP’s performance are derived via numerical experiments on inverse inventory control, equipment replacement, and multi-armed bandit problems.
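To make the second variant concrete, the following minimal sketch checks whether a given deterministic policy is optimal under candidate transition probabilities, using standard policy evaluation and the Bellman optimality condition. It is an illustration under stated assumptions and not the method of the paper: the function names `policy_value` and `is_optimal`, the array layout (`P` of shape actions × states × states, `r` of shape states × actions), and the tolerance are all hypothetical choices. The check multiplies `P` by a value vector that itself depends on `P`, which hints at why searching over transition probabilities leads to a bilinear problem.

```python
import numpy as np

def policy_value(P, r, policy, gamma):
    """Expected total discounted reward of a deterministic policy.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; r[s, a] is the one-step reward; policy[s] is the
    action chosen in state s; 0 <= gamma < 1 is the discount factor.
    """
    n = r.shape[0]
    states = np.arange(n)
    P_pi = P[policy, states, :]   # transition matrix induced by the policy
    r_pi = r[states, policy]      # reward vector induced by the policy
    # Solve (I - gamma * P_pi) v = r_pi for the value vector v.
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def is_optimal(P, r, policy, gamma, tol=1e-9):
    """Return True if no action improves on the policy's value in any state."""
    v = policy_value(P, r, policy, gamma)
    # q[s, a] = r[s, a] + gamma * sum_t P[a, s, t] * v[t]
    q = r + gamma * np.einsum("ast,t->sa", P, v)
    return bool(np.all(q.max(axis=1) <= v + tol))
```

In the inverse setting, the rewards and the policy are held fixed and the entries of `P` become the unknowns, so a check of this kind turns into a set of constraints on `P` and `v` jointly.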
Additional information
Notes on contributors
Zahra Ghatrani
Zahra Ghatrani received a BS degree in industrial engineering from Sharif University of Technology and a PhD in industrial and systems engineering from the University of Washington in Seattle. This work was conducted at the University of Washington as part of her doctoral dissertation. She now works at Amazon.
Archis Ghate
Archis Ghate is a Professor of Industrial and Systems Engineering at the University of Washington in Seattle. His research focuses on optimization under uncertainty. He received a PhD in industrial and operations engineering from the University of Michigan in Ann Arbor.