
A cost-based analysis for risk-averse explore-then-commit finite-time bandits

Pages 1094-1108 | Received 13 Aug 2020, Accepted 21 Jan 2021, Published online: 06 Apr 2021

