459
Views
1
CrossRef citations to date
0
Altmetric
Articles

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers

ORCID Icon, , & ORCID Icon
Pages 122-133 | Received 03 Oct 2020, Accepted 10 Mar 2021, Published online: 12 Apr 2021

References

  • Alaya-Feki, A. B. H., Moulines, E., & LeCornec, A. (2008). Dynamic spectrum access with non-stationary multi-armed bandit. In 2008 IEEE 9th Workshop on Signal Processing Advances in Wireless Communications. (pp. 416–420). IEEE.
  • Allesiardo, R., & Féraud, R. (2015). Exp3 with drift detection for the switching bandit problem. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 1–7). IEEE.
  • Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397–422. https://doi.org/10.5555/944919.944941
  • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 235–256. https://doi.org/10.1023/A:1013689704352
  • Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32, 48–77. https://doi.org/10.1137/S0097539701398375
  • Besbes, O., Gur, Y., & Zeevi, A. (2014). Stochastic multi-armed-bandit problem with non-stationary rewards. In Advances in Neural Information Processing Systems (pp. 199–207). MIT Press.
  • Buccapatnam, S., Liu, F., Eryilmaz, A., & Shroff, N. B. (2017). Reward maximization under uncertainty: leveraging side-observations on networks. Journal of Machine Learning Research, 18, 7947–7980. https://doi.org/10.5555/3122009.3242073
  • Cao, Y., Wen, Z., Kveton, B., & Xie, Y. (2019). Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. In Proceedings of Machine Learning Research (pp. 418–427). PMLR.
  • Garivier, A., & Moulines, E. (2011). On upper-confidence bound policies for switching bandit problems. In International Conference on Algorithmic Learning Theory (pp. 174–188).  Springer.
  • Hartland, C., Baskiotis, N., Gelly, S., Sebag, M., & Teytaud, O. (2007). Change point detection and meta-bandits for online learning in dynamic environments. In CAp 2007: 9è Conférence francophone sur l'apprentissage automatique (pp. 237–250). CEPADUES.
  • Hinkley, D. V. (1971). Inference about the change-point from cumulative sum tests. Biometrika, 58, 509–523. https://doi.org/10.1093/biomet/58.3.509
  • Kaufmann, E., Korda, N., & Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In International Conference on Algorithmic Learning Theory (pp. 199–213). Springer.
  • Kocsis, L., & Szepesvári, C. (2006). Discounted UCB. In 2nd PASCAL Challenges Workshop (Vol. 2). http://videolectures.net/pcw06_kocsis_diu/
  • Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4–22. https://doi.org/10.1016/0196-8858(85)90002-8
  • Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.
  • Li, S., Karatzoglou, A., & Gentile, C. (2016). Collaborative filtering bandits. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 539–548). New York, NY: Association for Computing Machinery.
  • Liu, F., Lee, J., & Shroff, N. (2018). A change-detection based framework for piecewise-stationary multi-armed bandit problem. In Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press.
  • Mellor, J., & Shapiro, J. (2013). Thompson sampling in switching environments with Bayesian online change detection. In Artificial Intelligence and Statistics (pp. 442–450). Springer.
  • Srivastava, V., Reverdy, P., & Leonard, N. E. (2014). Surveillance in an abruptly changing world via multiarmed bandits. In 53rd IEEE Conference on Decision and Control (pp. 692–697). IEEE.
  • Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285–294. https://doi.org/10.1093/biomet/25.3-4.285
  • Yu, J. Y., & Mannor, S. (2009). Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1177–1184). New York, NY: Association for Computing Machinery.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.