Abstract
A certain monotonicity property is proved for the optimal expected cumulative discounted reward associated with a finite-horizon dynamic programming model describing the Bernoulli two-armed bandit problem. This property leads to a positive answer to the conjecture formulated in [3, p. 473]. For two special cases, a stay-on-a-winner rule is obtained as a by-product.