Abstract
When the learning and test phases are not separated, an agent faces a trade-off between speed and accuracy. This is a universal problem for agents acting under uncertainty. To address this trade-off, we employ a strategy called satisficing, which seeks actions that are satisfactory with respect to a given reference level. In this study, we introduce a satisficing value function, the loosely symmetric model with variable reference (LSVR), which extends the loosely symmetric model inspired by properties of human causal and perceptual cognition. We tested the performance of the LSVR in K-armed bandit problems, which embody the trade-off in its simplest form. Our results show that the LSVR enables effective online optimisation through satisficing.
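To make the idea of satisficing in a bandit setting concrete, the following is a minimal sketch of a *generic* reference-level policy, not the LSVR itself: the agent exploits any arm whose empirical value meets an aspiration (reference) level and otherwise keeps exploring. The function name, the `aspiration` parameter, and the arm probabilities are all illustrative assumptions, not taken from the paper.

```python
import random

def satisficing_bandit(true_means, aspiration, steps=10000, seed=0):
    """Run a simple satisficing policy on a K-armed Bernoulli bandit.

    Illustrative sketch only (not the authors' LSVR): the agent keeps
    an empirical mean for each arm, exploits the best arm whose
    estimate meets the aspiration (reference) level, and explores in
    round-robin order when no arm is currently satisfactory.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # pulls per arm
    values = [0.0] * k    # empirical mean reward per arm
    total = 0.0
    for t in range(steps):
        satisfactory = [a for a in range(k)
                        if counts[a] > 0 and values[a] >= aspiration]
        if satisfactory:
            arm = max(satisfactory, key=lambda a: values[a])
        else:
            arm = t % k  # no satisfactory arm yet: keep exploring
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps

# With the aspiration set between the best and second-best arm,
# the agent should settle on the optimal arm without exhaustive
# exploration of the alternatives.
rate = satisficing_bandit([0.2, 0.5, 0.8], aspiration=0.65)
```

Setting the reference level is the crux: too low and the agent settles on a suboptimal arm, too high and it never stops exploring, which is why the LSVR's variable reference is the paper's focus.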
Notes
No potential conflict of interest was reported by the authors.
This study was presented in part at the 6th International Conference on Soft Computing and Intelligent Systems–13th International Symposium on Advanced Intelligent Systems (SCIS-ISIS 2012), Kobe, Japan, 20–24 November 2012, and at the 12th International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2014), Rhodes, Greece, 22–28 September 2014.