Theory and Methods

Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates

Pages 970-982 | Received 07 Jan 2020, Accepted 13 Nov 2022, Published online: 11 Jan 2023

References

  • Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. (2011), “Improved Algorithms for Linear Stochastic Bandits,” in Advances in Neural Information Processing Systems, pp. 2312–2320.
  • Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., and Schapire, R. (2014), “Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits,” in International Conference on Machine Learning, pp. 1638–1646.
  • Arya, S., and Yang, Y. (2020), “Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards,” Statistics & Probability Letters, 164, 1–9.
  • Audibert, J.-Y., and Tsybakov, A. B. (2007), “Fast Learning Rates for Plug-in Classifiers,” The Annals of Statistics, 35, 608–633. DOI: 10.1214/009053606000001217.
  • Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002), “Finite-Time Analysis of the Multiarmed Bandit Problem,” Machine Learning, 47, 235–256. DOI: 10.1023/A:1013689704352.
  • Auer, P., Ortner, R., and Szepesvári, C. (2007), “Improved Rates for the Stochastic Continuum-Armed Bandit Problem,” in Proceedings of the 20th Annual Conference on Learning Theory.
  • Bastani, H., and Bayati, M. (2020), “Online Decision Making with High-Dimensional Covariates,” Operations Research, 68, 276–294. DOI: 10.1287/opre.2019.1902.
  • Berry, D. A., and Fristedt, B. (1985), Bandit Problems: Sequential Allocation of Experiments, New York: Chapman and Hall.
  • Beygelzimer, A., Orabona, F., and Zhang, C. (2017), “Efficient Online Bandit Multiclass Learning with Õ(√T) Regret,” in Proceedings of the 34th International Conference on Machine Learning, pp. 488–497.
  • Bistritz, I., Zhou, Z., Chen, X., Bambos, N., and Blanchet, J. (2019), “Online EXP3 Learning in Adversarial Bandits with Delayed Feedback,” in Advances in Neural Information Processing Systems, pp. 11349–11358.
  • Bubeck, S., and Cesa-Bianchi, N. (2012), “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems,” Foundations and Trends in Machine Learning, 5, 1–122. DOI: 10.1561/2200000024.
  • Bubeck, S., Munos, R., and Stoltz, G. (2011), “Pure Exploration in Finitely-Armed and Continuous-Armed Bandits,” Theoretical Computer Science, 412, 1832–1852. DOI: 10.1016/j.tcs.2010.12.059.
  • Candes, E. J., and Tao, T. (2005), “Decoding by Linear Programming,” IEEE Transactions on Information Theory, 51, 4203–4215. DOI: 10.1109/TIT.2005.858979.
  • Cesa-Bianchi, N., and Lugosi, G. (2006), Prediction, Learning, and Games, Cambridge, UK: Cambridge University Press.
  • Chan, H. P. (2020), “The Multi-Armed Bandit Problem: An Efficient Nonparametric Solution,” The Annals of Statistics, 48, 346–373. DOI: 10.1214/19-AOS1809.
  • Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
  • Fan, J., and Lv, J. (2010), “A Selective Overview of Variable Selection in High Dimensional Feature Space,” Statistica Sinica, 20, 101–148.
  • Gittins, J. C. (1989), Multi-Armed Bandit Allocation Indices, New York: Wiley.
  • Goldberg, Y., and Kosorok, M. R. (2012), “Q-Learning with Censored Data,” The Annals of Statistics, 40, 529–560. DOI: 10.1214/12-AOS968.
  • Goldenshluger, A., and Zeevi, A. (2009), “Woodroofe’s One-Armed Bandit Problem Revisited,” The Annals of Applied Probability, 19, 1603–1633.
  • Goldenshluger, A., and Zeevi, A. (2013), “A Linear Response Bandit Problem,” Stochastic Systems, 3, 230–261.
  • Guan, M. Y., and Jiang, H. (2018), “Nonparametric Stochastic Contextual Bandits,” in Proceedings of the AAAI Conference on Artificial Intelligence.
  • Ing, C.-K., and Lai, T. L. (2011), “A Stepwise Regression Method and Consistent Model Selection for High-Dimensional Sparse Linear Models,” Statistica Sinica, 21, 1473–1513.
  • International Warfarin Pharmacogenetics Consortium. (2009), “Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data,” New England Journal of Medicine, 360, 753–764.
  • Kakade, S. M., Shalev-Shwartz, S., and Tewari, A. (2008), “Efficient Bandit Algorithms for Online Multiclass Prediction,” in Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 440–447.
  • Laber, E. B., Lizotte, D. J., Qian, M., Pelham, W. E., and Murphy, S. A. (2014), “Dynamic Treatment Regimes: Technical Challenges and Applications,” Electronic Journal of Statistics, 8, 1225–1272.
  • Laber, E. B., Meyer, N. J., Reich, B. J., Pacifici, K., Collazo, J. A., and Drake, J. M. (2018), “Optimal Treatment Allocations in Space and Time for On-line Control of an Emerging Infectious Disease,” Journal of the Royal Statistical Society, Series C, 67, 743–789.
  • Lai, T. L. (1987), “Adaptive Treatment Allocation and the Multi-Armed Bandit Problem,” The Annals of Statistics, 15, 1091–1114.
  • Lai, T. L., and Robbins, H. (1985), “Asymptotically Efficient Adaptive Allocation Rules,” Advances in Applied Mathematics, 6, 4–22.
  • Langford, J., and Zhang, T. (2008), “The Epoch-Greedy Algorithm for Contextual Multi-Armed Bandits,” in Advances in Neural Information Processing Systems.
  • Lattimore, T., and Szepesvári, C. (2020), Bandit Algorithms, Cambridge: Cambridge University Press.
  • Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010), “A Contextual-Bandit Approach to Personalized News Article Recommendation,” in Proceedings of the 19th International World Wide Web Conference.
  • Mammen, E., and Tsybakov, A. B. (1999), “Smooth Discrimination Analysis,” The Annals of Statistics, 27, 1808–1829.
  • May, B. C., Korda, N., Lee, A., and Leslie, D. S. (2012), “Optimistic Bayesian Sampling in Contextual-Bandit Problems,” Journal of Machine Learning Research, 13, 2069–2106.
  • McKeague, I. W., and Qian, M. (2014), “Estimation of Treatment Policies based on Functional Predictors,” Statistica Sinica, 24, 1461–1485. DOI: 10.5705/ss.2012.196.
  • Meinshausen, N., and Yu, B. (2009), “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data,” The Annals of Statistics, 37, 246–270.
  • Murphy, S. A. (2003), “Optimal Dynamic Treatment Regimes,” Journal of the Royal Statistical Society, Series B, 65, 331–355.
  • Perchet, V., and Rigollet, P. (2013), “The Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 41, 693–721.
  • Qian, M., and Murphy, S. A. (2011), “Performance Guarantees for Individualized Treatment Rules,” The Annals of Statistics, 39, 1180–1210.
  • Qian, W., Ding, S., and Cook, R. D. (2019), “Sparse Minimum Discrepancy Approach to Sufficient Dimension Reduction with Simultaneous Variable Selection in Ultrahigh Dimension,” Journal of the American Statistical Association, 114, 1277–1290.
  • Qian, W., Li, W., Sogawa, Y., Fujimaki, R., Yang, X., and Liu, J. (2019), “An Interactive Greedy Approach to Group Sparsity in High Dimensions,” Technometrics, 61, 409–421.
  • Qian, W., and Yang, Y. (2016a), “Kernel Estimation and Model Combination in a Bandit Problem with Covariates,” Journal of Machine Learning Research, 17, 1–37.
  • Qian, W., and Yang, Y. (2016b), “Randomized Allocation with Arm Elimination in a Bandit Problem with Covariates,” Electronic Journal of Statistics, 10, 242–270.
  • Reeve, H., Mellor, J., and Brown, G. (2018), “The K-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates,” in Proceedings of Machine Learning Research (Vol. 83), eds. F. Janoos, M. Mohri, and K. Sridharan, pp. 725–752.
  • Rigollet, P., and Zeevi, A. (2010), “Nonparametric Bandits with Covariates,” in Proceedings of the 23rd International Conference on Learning Theory, Omnipress, pp. 54–66.
  • Robbins, H. (1952), “Some Aspects of the Sequential Design of Experiments,” Bulletin of the American Mathematical Society, 58, 527–535.
  • Shi, C., Fan, A., Song, R., and Lu, W. (2018), “High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes,” The Annals of Statistics, 46, 925–957. DOI: 10.1214/17-AOS1570.
  • Tsybakov, A. B. (2004), “Optimal Aggregation of Classifiers in Statistical Learning,” The Annals of Statistics, 32, 135–166.
  • Woodroofe, M. (1979), “A One-Armed Bandit Problem with a Concomitant Variable,” Journal of the American Statistical Association, 74, 799–806.
  • Yahoo! Academic Relations. (2011), “Yahoo! Front Page Today Module User Click Log Dataset, version 2.0,” Available at http://webscope.sandbox.yahoo.com.
  • Yang, Y. (1999), “Minimax Nonparametric Classification—Part I. Rates of Convergence,” IEEE Transactions on Information Theory, 45, 2271–2284.
  • Yang, Y., and Zhu, D. (2002), “Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 30, 100–121.
  • Zhang, C.-H. (2010), “Nearly Unbiased Variable Selection under Minimax Concave Penalty,” The Annals of Statistics, 38, 894–942.
  • Zhang, T. (2011a), “Adaptive Forward-Backward Greedy Algorithm for Learning Sparse Representations,” IEEE Transactions on Information Theory, 57, 4689–4708.
  • Zhang, T. (2011b), “Sparse Recovery with Orthogonal Matching Pursuit under RIP,” IEEE Transactions on Information Theory, 57, 6215–6221.
  • Zhou, L. (2015), “A Survey on Contextual Multi-Armed Bandits,” arXiv preprint arXiv:1508.03326.
  • Zou, H. (2006), “The Adaptive Lasso and its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429.
