
A cost-based analysis for risk-averse explore-then-commit finite-time bandits

Pages 1094-1108 | Received 13 Aug 2020, Accepted 21 Jan 2021, Published online: 06 Apr 2021

