General Paper

Expectation-maximization for Bayes-adaptive POMDPs

Pages 1605-1623 | Received 06 Sep 2013, Accepted 15 May 2015, Published online: 21 Dec 2017

References

  • Anderson J (1996). A secular equation for the eigenvalues of a diagonal matrix perturbation. Linear Algebra and its Applications 246: 49–70. doi:10.1016/0024-3795(94)00314-9.
  • Barber D and Furmston T (2009). Solving deterministic policy (PO)MDPs using expectation-maximisation and antifreeze. In: European Conference on Machine Learning (LEMIR workshop). Springer-Verlag: Bled, Slovenia, pp 50–64.
  • Batir N (2005). Some new inequalities for gamma and polygamma functions. Journal of Inequalities in Pure and Applied Mathematics 6(4): 19.
  • Cappé O, Moulines E and Rydén T (2005). Inference in Hidden Markov Models. Springer: New York.
  • Cassandra AR (1998). Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Brown University: Providence, RI.
  • Cassandra AR (2009). The POMDP page. http://www.pomdp.org/pomdp/code/index.shtml, accessed October 2012.
  • Dallaire P, Besse C, Ross S and Chaib-draa B (2009). Bayesian reinforcement learning in continuous POMDPs with Gaussian processes. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE: St. Louis, MO, pp 2604–2609.
  • Davis PJ (1965). The Gamma function and related functions. In: Handbook of Mathematical Functions.
  • Doshi F, Pineau J and Roy N (2008). Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In: Proceedings of the 25th International Conference on Machine Learning, ACM: Helsinki, Finland, pp 256–263.
  • English BJ and Rousseau G (1997). Bounds for certain harmonic sums. Journal of Mathematical Analysis and Applications 206(2): 428–441. doi:10.1006/jmaa.1997.5226.
  • Furmston T and Barber D (2010). Variational methods for reinforcement learning. In: AISTATS. Journal of Machine Learning Research: Sardinia, Italy, pp 241–248.
  • Gelfand IM and Fomin SV (1963). Calculus of Variations. Prentice-Hall: Englewood Cliffs, NJ.
  • Guez A, Silver D and Dayan P (2012). Efficient Bayes-adaptive reinforcement learning using sample-based search. In: Advances in Neural Information Processing Systems. Morgan Kaufmann: Lake Tahoe, Nevada, pp 1025–1033.
  • Hansen EA (1998a). An improved policy iteration algorithm for partially observable MDPs. In: Advances in Neural Information Processing Systems, pp 1015–1021.
  • Hansen EA (1998b). Solving POMDPs by searching in policy space. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: Madison, Wisconsin, pp 211–219.
  • Istratescu VI (2002). Fixed-point theory: An introduction. Vol. 7, Springer: Netherlands.
  • Johnson NL, Kotz S and Balakrishnan N (2002). Continuous Multivariate Distributions, Volume 1: Models and Applications. Wiley: New York.
  • Kaelbling LP, Littman ML and Cassandra AR (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1–2): 99–134. doi:10.1016/S0004-3702(98)00023-X.
  • Kschischang FR, Frey BJ and Loeliger HA (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2): 498–519. doi:10.1109/18.910572.
  • Neal RM and Hinton GE (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in Graphical Models, pp 355–368.
  • Papadimitriou CH and Tsitsiklis JN (1987). The complexity of Markov decision processes. Mathematics of Operations Research 12(3): 441–450. doi:10.1287/moor.12.3.441.
  • Pineau J, Gordon G and Thrun S (2003). Point-based value iteration: An anytime algorithm for POMDPs. In: International Joint Conference on Artificial Intelligence, Vol. 18, Lawrence Erlbaum Associates: Acapulco, Mexico, pp. 1025–1032.
  • Pineau J, Gordon G and Thrun S (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research 27: 335–380.
  • Plemmons RJ (1977). M-matrix characterizations. I: Nonsingular M-matrices. Linear Algebra and its Applications 18(2): 175–188. doi:10.1016/0024-3795(77)90073-8.
  • Poupart P, Lang T and Toussaint M (2011). Escaping local optima in POMDP planning as inference. In: The 10th International Conference on Autonomous Agents and Multiagent Systems-Vol. 3, International Foundation for Autonomous Agents and Multiagent Systems: Taipei, Taiwan, pp 1263–1264.
  • Ross S, Chaib-draa B and Pineau J (2008). Bayes-adaptive POMDPs. In: Advances in Neural Information Processing Systems 20 (NIPS). Morgan Kaufmann: Vancouver, B.C., Canada.
  • Ross S, Pineau J, Chaib-draa B and Kreitmann P (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. The Journal of Machine Learning Research 12(May): 1729–1770.
  • Schwarz HR and Waldvogel J (1989). Numerical Analysis: A Comprehensive Introduction. Vol. 10, Wiley: Chichester, New York.
  • Toussaint M, Storkey A and Harmeling S (2010). Expectation-maximization methods for solving (PO)MDPs and optimal control problems. In: Inference and Learning in Dynamic Models.
  • Vargo E and Cogill R (2014). An argument for the Bayesian control of partially observable Markov decision processes. IEEE Transactions on Automatic Control 59(10): 2796–2800. doi:10.1109/TAC.2014.2314527.
  • Wang Y, Won KS, Hsu D and Lee WS (2012). Monte Carlo Bayesian reinforcement learning. In: Proceedings of the 29th International Conference on Machine Learning (ICML), Morgan Kaufmann: Edinburgh, Scotland.
