ABSTRACT
This paper constitutes the second part of an investigation of the dual control of long-duration stationary ergodic discrete Markov processes. In it the concept of decision space is introduced as a theoretical framework within which the performances of different control strategies may be compared. The evolution of a strategy is described in terms of a decision trajectory which converges by descending a ‘hill of uncertainty’. A study of the trajectory of the ideal strategy leads to a generalization of the definition of optimality. The existence of a whole class of sub-optimal strategies is demonstrated; it is shown that the optimal strategy of Part I is a member of this class, and that any other member may also be optimal under certain conditions. Two examples of sub-optimal strategies are presented, one involving the estimation of confidence intervals, the other using a learning-reinforcement technique.
Notes
† Communicated by the Author.