A parallel framework for Bayesian reinforcement learning

Pages 7-23 | Received 01 Sep 2013, Accepted 19 Nov 2013, Published online: 13 Mar 2014

Figures & data

Fig. 1. Stochastic GridWorld.

Fig. 2. The effects of model changes on the action selection strategies for multiple agents learning in parallel: (a) ε-greedy selection (ε=0.15), (b) softmax selection, and (c) unbiased sampling.

Fig. 3. A high-level comparison of the Kullback-Leibler divergence (KLD) between the learned and true distributions and of the Q-values for agent learners in parallel: (a) comparison of selection strategies for five agent learners and (b) average Q-values for five agent learners with varying selection strategies.

Fig. 4. The effects of model changes on the action selection strategies for multiple agents learning in parallel: (a) ε-greedy selection (ε=0.05), (b) ε-greedy selection (ε=0.15), (c) ε-greedy selection (ε=0.25), (d) unbiased sampling, and (e) comparison of the performance of the selection strategies for 10 agents in parallel.

Fig. 5. Data traces provided by Cedexis measuring application response time, total requests per country, and total requests per region over a single day: (a) number of requests per country, (b) number of requests satisfied per region, and (c) application response time performance histogram by region (Amazon EC2).

Fig. 6. Performance of the parallel learning architecture, demonstrating improvements as additional learners are added: (a) virtual resource allocation with unbiased sampling and multiple agent learners learning in parallel and (b) a single run of unbiased sampling for 300 trials.
