References
- Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. (2011), “Improved Algorithms for Linear Stochastic Bandits,” in Advances in Neural Information Processing Systems, pp. 2312–2320.
- Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., and Schapire, R. (2014), “Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits,” in International Conference on Machine Learning, pp. 1638–1646.
- Arya, S., and Yang, Y. (2020), “Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards,” Statistics & Probability Letters, 164, 1–9.
- Audibert, J.-Y., and Tsybakov, A. B. (2007), “Fast Learning Rates for Plug-in Classifiers,” The Annals of Statistics, 35, 608–633. DOI: 10.1214/009053606000001217.
- Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002), “Finite-Time Analysis of the Multiarmed Bandit Problem,” Machine Learning, 47, 235–256. DOI: 10.1023/A:1013689704352.
- Auer, P., Ortner, R., and Szepesvári, C. (2007), “Improved Rates for the Stochastic Continuum-Armed Bandit Problem,” in Proceedings of the 20th Annual Conference on Learning Theory.
- Bastani, H., and Bayati, M. (2020), “Online Decision Making with High-Dimensional Covariates,” Operations Research, 68, 276–294. DOI: 10.1287/opre.2019.1902.
- Berry, D. A., and Fristedt, B. (1985), Bandit Problems: Sequential Allocation of Experiments, New York: Chapman and Hall.
- Beygelzimer, A., Orabona, F., and Zhang, C. (2017), “Efficient Online Bandit Multiclass Learning with Õ(√T) Regret,” in Proceedings of the 34th International Conference on Machine Learning, pp. 488–497.
- Bistritz, I., Zhou, Z., Chen, X., Bambos, N., and Blanchet, J. (2019), “Online EXP3 Learning in Adversarial Bandits with Delayed Feedback,” in Advances in Neural Information Processing Systems, pp. 11349–11358.
- Bubeck, S., and Cesa-Bianchi, N. (2012), “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems,” Foundations and Trends in Machine Learning, 5, 1–122. DOI: 10.1561/2200000024.
- Bubeck, S., Munos, R., and Stoltz, G. (2011), “Pure Exploration in Finitely-Armed and Continuous-Armed Bandits,” Theoretical Computer Science, 412, 1832–1852. DOI: 10.1016/j.tcs.2010.12.059.
- Candes, E. J., and Tao, T. (2005), “Decoding by Linear Programming,” IEEE Transactions on Information Theory, 51, 4203–4215. DOI: 10.1109/TIT.2005.858979.
- Cesa-Bianchi, N., and Lugosi, G. (2006), Prediction, Learning, and Games, Cambridge, UK: Cambridge University Press.
- Chan, H. P. (2020), “The Multi-Armed Bandit Problem: An Efficient Nonparametric Solution,” The Annals of Statistics, 48, 346–373. DOI: 10.1214/19-AOS1809.
- Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
- Fan, J., and Lv, J. (2010), “A Selective Overview of Variable Selection in High Dimensional Feature Space,” Statistica Sinica, 20, 101–148.
- Gittins, J. C. (1989), Multi-Armed Bandit Allocation Indices, New York: Wiley.
- Goldberg, Y., and Kosorok, M. R. (2012), “Q-Learning with Censored Data,” The Annals of Statistics, 40, 529–560. DOI: 10.1214/12-AOS968.
- Goldenshluger, A., and Zeevi, A. (2009), “Woodroofe’s One-Armed Bandit Problem Revisited,” The Annals of Applied Probability, 19, 1603–1633.
- Goldenshluger, A., and Zeevi, A. (2013), “A Linear Response Bandit Problem,” Stochastic Systems, 3, 230–261.
- Guan, M. Y., and Jiang, H. (2018), “Nonparametric Stochastic Contextual Bandits,” in Proceedings of the AAAI Conference on Artificial Intelligence.
- Ing, C.-K., and Lai, T. L. (2011), “A Stepwise Regression Method and Consistent Model Selection for High-Dimensional Sparse Linear Models,” Statistica Sinica, 21, 1473–1513.
- International Warfarin Pharmacogenetics Consortium. (2009), “Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data,” New England Journal of Medicine, 360, 753–764.
- Kakade, S. M., Shalev-Shwartz, S., and Tewari, A. (2008), “Efficient Bandit Algorithms for Online Multiclass Prediction,” in Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 440–447.
- Laber, E. B., Lizotte, D. J., Qian, M., Pelham, W. E., and Murphy, S. A. (2014), “Dynamic Treatment Regimes: Technical Challenges and Applications,” Electronic Journal of Statistics, 8, 1225–1272.
- Laber, E. B., Meyer, N. J., Reich, B. J., Pacifici, K., Collazo, J. A., and Drake, J. M. (2018), “Optimal Treatment Allocations in Space and Time for On-line Control of an Emerging Infectious Disease,” Journal of the Royal Statistical Society, Series C, 67, 743–789.
- Lai, T. L. (1987), “Adaptive Treatment Allocation and the Multi-Armed Bandit Problem,” The Annals of Statistics, 15, 1091–1114.
- Lai, T. L., and Robbins, H. (1985), “Asymptotically Efficient Adaptive Allocation Rules,” Advances in Applied Mathematics, 6, 4–22.
- Langford, J., and Zhang, T. (2008), “The Epoch-Greedy Algorithm for Contextual Multi-Armed Bandits,” in Advances in Neural Information Processing Systems.
- Lattimore, T., and Szepesvári, C. (2020), Bandit Algorithms, Cambridge: Cambridge University Press.
- Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010), “A Contextual-Bandit Approach to Personalized News Article Recommendation,” in Proceedings of the 19th International World Wide Web Conference.
- Mammen, E., and Tsybakov, A. B. (1999), “Smooth Discrimination Analysis,” The Annals of Statistics, 27, 1808–1829.
- May, B. C., Korda, N., Lee, A., and Leslie, D. S. (2012), “Optimistic Bayesian Sampling in Contextual-Bandit Problems,” Journal of Machine Learning Research, 13, 2069–2106.
- McKeague, I. W., and Qian, M. (2014), “Estimation of Treatment Policies based on Functional Predictors,” Statistica Sinica, 24, 1461–1485. DOI: 10.5705/ss.2012.196.
- Meinshausen, N., and Yu, B. (2009), “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data,” The Annals of Statistics, 37, 246–270.
- Murphy, S. A. (2003), “Optimal Dynamic Treatment Regimes,” Journal of the Royal Statistical Society, Series B, 65, 331–355.
- Perchet, V., and Rigollet, P. (2013), “The Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 41, 693–721.
- Qian, M., and Murphy, S. A. (2011), “Performance Guarantees for Individualized Treatment Rules,” The Annals of Statistics, 39, 1180–1210.
- Qian, W., Ding, S., and Cook, R. D. (2019), “Sparse Minimum Discrepancy Approach to Sufficient Dimension Reduction with Simultaneous Variable Selection in Ultrahigh Dimension,” Journal of the American Statistical Association, 114, 1277–1290.
- Qian, W., Li, W., Sogawa, Y., Fujimaki, R., Yang, X., and Liu, J. (2019), “An Interactive Greedy Approach to Group Sparsity in High Dimensions,” Technometrics, 61, 409–421.
- Qian, W., and Yang, Y. (2016a), “Kernel Estimation and Model Combination in a Bandit Problem with Covariates,” Journal of Machine Learning Research, 17, 1–37.
- Qian, W., and Yang, Y. (2016b), “Randomized Allocation with Arm Elimination in a Bandit Problem with Covariates,” Electronic Journal of Statistics, 10, 242–270.
- Reeve, H., Mellor, J., and Brown, G. (2018), “The K-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates,” in Proceedings of Machine Learning Research (Vol. 83), eds. F. Janoos, M. Mohri, and K. Sridharan, pp. 725–752.
- Rigollet, P., and Zeevi, A. (2010), “Nonparametric Bandits with Covariates,” in Proceedings of the 23rd International Conference on Learning Theory, Omnipress, pp. 54–66.
- Robbins, H. (1952), “Some Aspects of the Sequential Design of Experiments,” Bulletin of the American Mathematical Society, 58, 527–535.
- Shi, C., Fan, A., Song, R., and Lu, W. (2018), “High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes,” The Annals of Statistics, 46, 925–957. DOI: 10.1214/17-AOS1570.
- Tsybakov, A. B. (2004), “Optimal Aggregation of Classifiers in Statistical Learning,” The Annals of Statistics, 32, 135–166.
- Woodroofe, M. (1979), “A One-Armed Bandit Problem with a Concomitant Variable,” Journal of the American Statistical Association, 74, 799–806.
- Yahoo! Academic Relations. (2011), “Yahoo! Front Page Today Module User Click Log Dataset, Version 2.0,” Available at http://webscope.sandbox.yahoo.com.
- Yang, Y. (1999), “Minimax Nonparametric Classification—Part I. Rates of Convergence,” IEEE Transactions on Information Theory, 45, 2271–2284.
- Yang, Y., and Zhu, D. (2002), “Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 30, 100–121.
- Zhang, C.-H. (2010), “Nearly Unbiased Variable Selection under Minimax Concave Penalty,” The Annals of Statistics, 38, 894–942.
- Zhang, T. (2011a), “Adaptive Forward-Backward Greedy Algorithm for Learning Sparse Representations,” IEEE Transactions on Information Theory, 57, 4689–4708.
- Zhang, T. (2011b), “Sparse Recovery with Orthogonal Matching Pursuit under RIP,” IEEE Transactions on Information Theory, 57, 6215–6221.
- Zhou, L. (2015), “A Survey on Contextual Multi-Armed Bandits,” arXiv preprint arXiv:1508.03326.
- Zou, H. (2006), “The Adaptive Lasso and its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429.