References
- Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. (2011), “Improved Algorithms for Linear Stochastic Bandits,” in Advances in Neural Information Processing Systems, pp. 2312–2320.
- Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., and Schapire, R. (2014), “Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits,” in International Conference on Machine Learning, pp. 1638–1646.
- Arya, S., and Yang, Y. (2020), “Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards,” Statistics & Probability Letters, 164, 1–9.
- Audibert, J.-Y., and Tsybakov, A. B. (2007), “Fast Learning Rates for Plug-in Classifiers,” The Annals of Statistics, 35, 608–633. DOI: 10.1214/009053606000001217.
- Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002), “Finite-Time Analysis of the Multiarmed Bandit Problem,” Machine Learning, 47, 235–256. DOI: 10.1023/A:1013689704352.
- Auer, P., Ortner, R., and Szepesvári, C. (2007), “Improved Rates for the Stochastic Continuum-Armed Bandit Problem,” in Proceedings of the 20th Annual Conference on Learning Theory.
- Bastani, H., and Bayati, M. (2020), “Online Decision Making with High-Dimensional Covariates,” Operations Research, 68, 276–294. DOI: 10.1287/opre.2019.1902.
- Berry, D. A., and Fristedt, B. (1985), Bandit Problems: Sequential Allocation of Experiments, New York: Chapman and Hall.
- Beygelzimer, A., Orabona, F., and Zhang, C. (2017), “Efficient Online Bandit Multiclass Learning with Õ(√T) Regret,” in Proceedings of the 34th International Conference on Machine Learning, pp. 488–497.
- Bistritz, I., Zhou, Z., Chen, X., Bambos, N., and Blanchet, J. (2019), “Online EXP3 Learning in Adversarial Bandits with Delayed Feedback,” in Advances in Neural Information Processing Systems, pp. 11349–11358.
- Bubeck, S., and Cesa-Bianchi, N. (2012), “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems,” Foundations and Trends in Machine Learning, 5, 1–122. DOI: 10.1561/2200000024.
- Bubeck, S., Munos, R., and Stoltz, G. (2011), “Pure Exploration in Finitely-Armed and Continuous-Armed Bandits,” Theoretical Computer Science, 412, 1832–1852. DOI: 10.1016/j.tcs.2010.12.059.
- Candes, E. J., and Tao, T. (2005), “Decoding by Linear Programming,” IEEE Transactions on Information Theory, 51, 4203–4215. DOI: 10.1109/TIT.2005.858979.
- Cesa-Bianchi, N., and Lugosi, G. (2006), Prediction, Learning, and Games, Cambridge, UK: Cambridge University Press.
- Chan, H. P. (2020), “The Multi-Armed Bandit Problem: An Efficient Nonparametric Solution,” The Annals of Statistics, 48, 346–373. DOI: 10.1214/19-AOS1809.
- Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
- Fan, J., and Lv, J. (2010), “A Selective Overview of Variable Selection in High Dimensional Feature Space,” Statistica Sinica, 20, 101–148.
- Gittins, J. C. (1989), Multi-Armed Bandit Allocation Indices, New York: Wiley.
- Goldberg, Y., and Kosorok, M. R. (2012), “Q-Learning with Censored Data,” The Annals of Statistics, 40, 529–560. DOI: 10.1214/12-AOS968.
- Goldenshluger, A., and Zeevi, A. (2009), “Woodroofe’s One-Armed Bandit Problem Revisited,” The Annals of Applied Probability, 19, 1603–1633.
- Goldenshluger, A., and Zeevi, A. (2013), “A Linear Response Bandit Problem,” Stochastic Systems, 3, 230–261.
- Guan, M. Y., and Jiang, H. (2018), “Nonparametric Stochastic Contextual Bandits,” in Proceedings of the AAAI Conference on Artificial Intelligence.
- Ing, C.-K., and Lai, T. L. (2011), “A Stepwise Regression Method and Consistent Model Selection for High-Dimensional Sparse Linear Models,” Statistica Sinica, 21, 1473–1513.
- International Warfarin Pharmacogenetics Consortium. (2009), “Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data,” New England Journal of Medicine, 360, 753–764.
- Kakade, S. M., Shalev-Shwartz, S., and Tewari, A. (2008), “Efficient Bandit Algorithms for Online Multiclass Prediction,” in Proceedings of the 25th International Conference on Machine Learning, ACM, pp. 440–447.
- Laber, E. B., Lizotte, D. J., Qian, M., Pelham, W. E., and Murphy, S. A. (2014), “Dynamic Treatment Regimes: Technical Challenges and Applications,” Electronic Journal of Statistics, 8, 1225–1272.
- Laber, E. B., Meyer, N. J., Reich, B. J., Pacifici, K., Collazo, J. A., and Drake, J. M. (2018), “Optimal Treatment Allocations in Space and Time for On-line Control of an Emerging Infectious Disease,” Journal of the Royal Statistical Society, Series C, 67, 743–789.
- Lai, T. L. (1987), “Adaptive Treatment Allocation and the Multi-Armed Bandit Problem,” The Annals of Statistics, 15, 1091–1114.
- Lai, T. L., and Robbins, H. (1985), “Asymptotically Efficient Adaptive Allocation Rules,” Advances in Applied Mathematics, 6, 4–22.
- Langford, J., and Zhang, T. (2008), “The Epoch-Greedy Algorithm for Contextual Multi-Armed Bandits,” in Advances in Neural Information Processing Systems.
- Lattimore, T., and Szepesvári, C. (2020), Bandit Algorithms, Cambridge: Cambridge University Press.
- Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010), “A Contextual-Bandit Approach to Personalized News Article Recommendation,” in Proceedings of the 19th International World Wide Web Conference.
- Mammen, E., and Tsybakov, A. B. (1999), “Smooth Discrimination Analysis,” The Annals of Statistics, 27, 1808–1829.
- May, B. C., Korda, N., Lee, A., and Leslie, D. S. (2012), “Optimistic Bayesian Sampling in Contextual-Bandit Problems,” Journal of Machine Learning Research, 13, 2069–2106.
- McKeague, I. W., and Qian, M. (2014), “Estimation of Treatment Policies based on Functional Predictors,” Statistica Sinica, 24, 1461–1485. DOI: 10.5705/ss.2012.196.
- Meinshausen, N., and Yu, B. (2009), “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data,” The Annals of Statistics, 37, 246–270.
- Murphy, S. A. (2003), “Optimal Dynamic Treatment Regimes,” Journal of the Royal Statistical Society, Series B, 65, 331–355.
- Perchet, V., and Rigollet, P. (2013), “The Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 41, 693–721.
- Qian, M., and Murphy, S. A. (2011), “Performance Guarantees for Individualized Treatment Rules,” The Annals of Statistics, 39, 1180–1210.
- Qian, W., Ding, S., and Cook, R. D. (2019), “Sparse Minimum Discrepancy Approach to Sufficient Dimension Reduction with Simultaneous Variable Selection in Ultrahigh Dimension,” Journal of the American Statistical Association, 114, 1277–1290.
- Qian, W., Li, W., Sogawa, Y., Fujimaki, R., Yang, X., and Liu, J. (2019), “An Interactive Greedy Approach to Group Sparsity in High Dimensions,” Technometrics, 61, 409–421.
- Qian, W., and Yang, Y. (2016a), “Kernel Estimation and Model Combination in a Bandit Problem with Covariates,” Journal of Machine Learning Research, 17, 1–37.
- Qian, W., and Yang, Y. (2016b), “Randomized Allocation with Arm Elimination in a Bandit Problem with Covariates,” Electronic Journal of Statistics, 10, 242–270.
- Reeve, H., Mellor, J., and Brown, G. (2018), “The K-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates,” in Proceedings of Machine Learning Research (Vol. 83), eds. F. Janoos, M. Mohri, and K. Sridharan, pp. 725–752.
- Rigollet, P., and Zeevi, A. (2010), “Nonparametric Bandits with Covariates,” in Proceedings of the 23rd International Conference on Learning Theory, Omnipress, pp. 54–66.
- Robbins, H. (1952), “Some Aspects of the Sequential Design of Experiments,” Bulletin of the American Mathematical Society, 58, 527–535.
- Shi, C., Fan, A., Song, R., and Lu, W. (2018), “High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes,” The Annals of Statistics, 46, 925–957. DOI: 10.1214/17-AOS1570.
- Tsybakov, A. B. (2004), “Optimal Aggregation of Classifiers in Statistical Learning,” The Annals of Statistics, 32, 135–166.
- Woodroofe, M. (1979), “A One-Armed Bandit Problem with a Concomitant Variable,” Journal of the American Statistical Association, 74, 799–806.
- Yahoo! Academic Relations. (2011), “Yahoo! Front Page Today Module User Click Log Dataset, Version 2.0,” Available at http://webscope.sandbox.yahoo.com.
- Yang, Y. (1999), “Minimax Nonparametric Classification—Part I. Rates of Convergence,” IEEE Transactions on Information Theory, 45, 2271–2284.
- Yang, Y., and Zhu, D. (2002), “Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates,” The Annals of Statistics, 30, 100–121.
- Zhang, C.-H. (2010), “Nearly Unbiased Variable Selection under Minimax Concave Penalty,” The Annals of Statistics, 38, 894–942.
- Zhang, T. (2011a), “Adaptive Forward-Backward Greedy Algorithm for Learning Sparse Representations,” IEEE Transactions on Information Theory, 57, 4689–4708.
- Zhang, T. (2011b), “Sparse Recovery with Orthogonal Matching Pursuit under RIP,” IEEE Transactions on Information Theory, 57, 6215–6221.
- Zhou, L. (2015), “A Survey on Contextual Multi-Armed Bandits,” arXiv preprint arXiv:1508.03326.
- Zou, H. (2006), “The Adaptive Lasso and its Oracle Properties,” Journal of the American Statistical Association, 101, 1418–1429.