Abstract
The proposed Q-learning with a continuous action set, as well as continuous states, consists of three parts: a fuzzy quantizer, whose output is used to learn the action policy; an action evaluation module, which models and produces the expected evaluation signal; and a stochastic action selection unit that generates an action with the expectation of better performance, using a probability distribution function to estimate an optimal action selection policy. The algorithm is further applied to cooperative tasks in which two robots must consider their partner's action before taking their own. Conventional Q-learning requires a predefined, discrete state space and therefore cannot distinguish the variations among different situations that fall into the same state. The proposed Q-learning with the stochastic real-valued unit can differentiate the actions corresponding to different state inputs that are categorized into the same state; this unit thus acts as both the action evaluation module and the stochastic action selection unit described above. Simulation results demonstrate the better performance and applicability of the proposed learning model.
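The interplay between the action evaluation module and the stochastic action selection unit can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the class name `SRVUnit`, the Gaussian action distribution, the learning rate, and the rule that shrinks exploration as the expected evaluation rises are all assumptions made for the example.

```python
import random

class SRVUnit:
    """Illustrative stochastic real-valued action unit (assumed design).

    The mean `mu` is the learned continuous action for one state; the
    expected evaluation `v_hat` plays the role of the action evaluation
    module and gates how widely the unit explores around `mu`.
    """

    def __init__(self, mu=0.0, sigma_max=1.0, lr=0.1):
        self.mu = mu              # mean of the action distribution
        self.sigma_max = sigma_max
        self.lr = lr              # assumed learning rate
        self.v_hat = 0.0          # expected evaluation signal

    def sigma(self):
        # Explore widely when the expected evaluation is low,
        # and narrow the distribution as performance improves.
        return self.sigma_max * max(0.0, 1.0 - self.v_hat)

    def select_action(self):
        # Sample a continuous action from a Gaussian distribution.
        return random.gauss(self.mu, self.sigma())

    def update(self, action, reward):
        # Shift the mean toward actions that did better than expected,
        # scaled by how far the sampled action deviated from the mean.
        s = self.sigma()
        if s > 0.0:
            self.mu += self.lr * (reward - self.v_hat) * (action - self.mu) / s
        # Track the observed evaluation signal.
        self.v_hat += self.lr * (reward - self.v_hat)
```

In this sketch, one such unit would be maintained per quantized state, so inputs mapped to the same state can still produce and refine distinct continuous actions.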
Acknowledgments
Part of the content of this article was presented at the 2009 IEEE International Conference on Systems, Man, and Cybernetics and published in its proceedings.