General Paper

Expectation-maximization for Bayes-adaptive POMDPs

Pages 1605-1623 | Received 06 Sep 2013, Accepted 15 May 2015, Published online: 21 Dec 2017

References

  • Anderson J (1996). A secular equation for the eigenvalues of a diagonal matrix perturbation. Linear Algebra and its Applications 246: 49–70. doi:10.1016/0024-3795(94)00314-9.
  • Barber D and Furmston T (2009). Solving deterministic policy (PO)MDPs using expectation-maximisation and antifreeze. In: European Conference on Machine Learning (LEMIR workshop). Springer-Verlag: Bled, Slovenia, pp 50–64.
  • Batir N (2005). Some new inequalities for gamma and polygamma functions. Journal of Inequalities in Pure and Applied Mathematics 6(4): 19.
  • Cappé O, Moulines E and Rydén T (2005). Inference in Hidden Markov Models. Springer: New York.
  • Cassandra AR (1998). Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Brown University: Providence, RI.
  • Cassandra AR (2009). The POMDP page. http://www.pomdp.org/pomdp/code/index.shtml, accessed October 2012.
  • Dallaire P, Besse C, Ross S and Chaib-draa B (2009). Bayesian reinforcement learning in continuous POMDPs with Gaussian processes. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE: St. Louis, MO, pp 2604–2609.
  • Davis PJ (1965). The Gamma function and related functions. In: Handbook of Mathematical Functions.
  • Doshi F, Pineau J and Roy N (2008). Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In: Proceedings of the 25th International Conference on Machine Learning, ACM: Helsinki, Finland, pp 256–263.
  • English BJ and Rousseau G (1997). Bounds for certain harmonic sums. Journal of Mathematical Analysis and Applications 206(2): 428–441. doi:10.1006/jmaa.1997.5226.
  • Furmston T and Barber D (2010). Variational methods for reinforcement learning. In: AISTATS. Journal of Machine Learning Research: Sardinia, Italy, pp 241–248.
  • Gelfand IM and Fomin SV (1963). Calculus of Variations. Prentice-Hall: Englewood Cliffs, NJ.
  • Guez A, Silver D and Dayan P (2012). Efficient Bayes-adaptive reinforcement learning using sample-based search. In: Advances in Neural Information Processing Systems. Morgan Kaufmann: Lake Tahoe, Nevada, pp 1025–1033.
  • Hansen EA (1998a). An improved policy iteration algorithm for partially observable MDPs. In: Advances in Neural Information Processing Systems, pp 1015–1021.
  • Hansen EA (1998b). Solving POMDPs by searching in policy space. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers: Madison, Wisconsin, pp 211–219.
  • Istratescu VI (2002). Fixed-point theory: An introduction. Vol. 7, Springer: Netherlands.
  • Johnson NL, Kotz S and Balakrishnan N (2002). Continuous Multivariate Distributions, Volume 1: Models and Applications. Wiley: New York.
  • Kaelbling LP, Littman ML and Cassandra AR (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1–2): 99–134. doi:10.1016/S0004-3702(98)00023-X.
  • Kschischang FR, Frey BJ and Loeliger HA (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2): 498–519. doi:10.1109/18.910572.
  • Neal RM and Hinton GE (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Learning in Graphical Models, pp 355–368.
  • Papadimitriou CH and Tsitsiklis JN (1987). The complexity of Markov decision processes. Mathematics of Operations Research 12(3): 441–450. doi:10.1287/moor.12.3.441.
  • Pineau J, Gordon G and Thrun S (2003). Point-based value iteration: An anytime algorithm for POMDPs. In: International Joint Conference on Artificial Intelligence, Vol. 18, Lawrence Erlbaum Associates: Acapulco, Mexico, pp. 1025–1032.
  • Pineau J, Gordon G and Thrun S (2006). Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research 27: 335–380.
  • Plemmons RJ (1977). M-matrix characterizations. I: Nonsingular M-matrices. Linear Algebra and its Applications 18(2): 175–188. doi:10.1016/0024-3795(77)90073-8.
  • Poupart P, Lang T and Toussaint M (2011). Escaping local optima in POMDP planning as inference. In: The 10th International Conference on Autonomous Agents and Multiagent Systems-Vol. 3, International Foundation for Autonomous Agents and Multiagent Systems: Taipei, Taiwan, pp 1263–1264.
  • Ross S, Chaib-draa B and Pineau J (2008). Bayes-adaptive POMDPs. In: Advances in Neural Information Processing Systems 20 (NIPS). Morgan Kaufmann: Vancouver, B.C., Canada.
  • Ross S, Pineau J, Chaib-draa B and Kreitmann P (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes. The Journal of Machine Learning Research 12(May): 1729–1770.
  • Schwarz HR and Waldvogel J (1989). Numerical Analysis: A Comprehensive Introduction. Vol. 10, Wiley: Chichester, New York.
  • Toussaint M, Storkey A and Harmeling S (2010). Expectation-maximization methods for solving (PO)MDPs and optimal control problems. In: Inference and Learning in Dynamic Models.
  • Vargo E and Cogill R (2014). An argument for the Bayesian control of partially observable Markov decision processes. IEEE Transactions on Automatic Control 59(10): 2796–2800. doi:10.1109/TAC.2014.2314527.
  • Wang Y, Won KS, Hsu D and Lee WS (2012). Monte Carlo Bayesian reinforcement learning. In: Proceedings of the 29th International Conference on Machine Learning (ICML), Morgan Kaufmann: Edinburgh, Scotland.
