Theory and Methods Special Issue on Precision Medicine and Individualized Policy Discovery, Part II

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Pages 708-719 | Received 15 Jun 2019, Accepted 10 Aug 2020, Published online: 19 Nov 2020

References

  • Agrawal, S., and Goyal, N. (2013), “Thompson Sampling for Contextual Bandits With Linear Payoffs,” in International Conference on Machine Learning, pp. 127–135.
  • Audibert, J.-Y., and Tsybakov, A. B. (2007), “Fast Learning Rates for Plug-In Classifiers,” The Annals of Statistics, 35, 608–633. DOI: https://doi.org/10.1214/009053606000001217.
  • Auer, P. (2002), “Using Confidence Bounds for Exploitation-Exploration Trade-Offs,” Journal of Machine Learning Research, 3, 397–422.
  • Bastani, H., and Bayati, M. (2015), “Online Decision-Making With High-Dimensional Covariates,” available at SSRN: https://ssrn.com/abstract=2661896 or DOI: https://doi.org/10.2139/ssrn.2661896.
  • Chambaz, A., Zheng, W., and van der Laan, M. J. (2017), “Targeted Sequential Design for Targeted Learning Inference of the Optimal Treatment Rule and Its Mean Reward,” The Annals of Statistics, 45, 2537. DOI: https://doi.org/10.1214/16-AOS1534.
  • Chen, H., Lu, W., and Song, R. (2020), “Statistical Inference for Online Decision-Making in a Contextual Bandit Setting,” Journal of the American Statistical Association (just-accepted).
  • Chen, X., Lee, J. D., Tong, X. T., and Zhang, Y. (2016), “Statistical Inference for Model Parameters in Stochastic Gradient Descent,” arXiv no. 1610.08637.
  • Dani, V., Hayes, T. P., and Kakade, S. M. (2008), “Stochastic Linear Optimization Under Bandit Feedback,” in Proceedings of the Workshop on Computational Learning Theory, pp. 355–366.
  • Fang, Y., Xu, J., and Yang, L. (2018), “Online Bootstrap Confidence Intervals for the Stochastic Gradient Descent Estimator,” The Journal of Machine Learning Research, 19, 3053–3073.
  • Goldenshluger, A., and Zeevi, A. (2013), “A Linear Response Bandit Problem,” Stochastic Systems, 3, 230–261. DOI: https://doi.org/10.1287/11-SSY032.
  • Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
  • Kim, E. S., Herbst, R. S., Wistuba, I. I., Lee, J. J., Blumenschein, G. R., Tsao, A., Stewart, D. J., Hicks, M. E., Erasmus, J., Gupta, S., and Alden, C. M. (2011), “The BATTLE Trial: Personalizing Therapy for Lung Cancer,” Cancer Discovery, 1, 44–53. DOI: https://doi.org/10.1158/2159-8274.CD-10-0010.
  • Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010), “A Contextual-Bandit Approach to Personalized News Article Recommendation,” in Proceedings of the 19th International Conference on World Wide Web, ACM, pp. 661–670. DOI: https://doi.org/10.1145/1772690.1772758.
  • Luedtke, A. R., and van der Laan, M. J. (2016), “Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy,” The Annals of Statistics, 44, 713. DOI: https://doi.org/10.1214/15-AOS1384.
  • Moulines, E., and Bach, F. R. (2011), “Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning,” in Advances in Neural Information Processing Systems, pp. 451–459.
  • Polyak, B. T., and Juditsky, A. B. (1992), “Acceleration of Stochastic Approximation by Averaging,” SIAM Journal on Control and Optimization, 30, 838–855. DOI: https://doi.org/10.1137/0330046.
  • Qian, W., and Yang, Y. (2016), “Kernel Estimation and Model Combination in a Bandit Problem With Covariates,” The Journal of Machine Learning Research, 17, 5181–5217.
  • Qiang, S., and Bayati, M. (2016), “Dynamic Pricing With Demand Covariates,” available at SSRN 2765257.
  • Robbins, H. (1952), “Some Aspects of the Sequential Design of Experiments,” Bulletin of the American Mathematical Society, 58, 527–535. DOI: https://doi.org/10.1090/S0002-9904-1952-09620-8.
  • Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: https://doi.org/10.1214/aoms/1177729586.
  • Ruppert, D. (1988), “Efficient Estimations From a Slowly Convergent Robbins–Monro Process,” Technical Report, Cornell University Operations Research and Industrial Engineering.
  • Sutton, R. S., and Barto, A. G. (2018), Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press.
  • Sutton, R. S., Mahmood, A. R., and White, M. (2016), “An Emphatic Approach to the Problem of Off-Policy Temporal-Difference Learning,” The Journal of Machine Learning Research, 17, 2603–2631.
  • Tewari, A., and Murphy, S. A. (2017), “From Ads to Interventions: Contextual Bandits in Mobile Health,” in Mobile Health, eds. J. Rehg, S. Murphy, and S. Kumar, Cham: Springer, pp. 495–517.
  • Tsiatis, A. A., Davidian, M., Holloway, S. T., and Laber, E. B. (2019), Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine, Boca Raton, FL: Chapman & Hall.
  • Valko, M., Korda, N., Munos, R., Flaounas, I., and Cristianini, N. (2013), “Finite-Time Analysis of Kernelised Contextual Bandits,” arXiv no. 1309.6869.
  • Woodroofe, M. (1979), “A One-Armed Bandit Problem With a Concomitant Variable,” Journal of the American Statistical Association, 74, 799–806. DOI: https://doi.org/10.1080/01621459.1979.10481033.
  • Yang, Y., and Zhu, D. (2002), “Randomized Allocation With Nonparametric Estimation for a Multi-Armed Bandit Problem With Covariates,” The Annals of Statistics, 30, 100–121. DOI: https://doi.org/10.1214/aos/1015362186.
  • Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012), “A Robust Method for Estimating Optimal Treatment Regimes,” Biometrics, 68, 1010–1018. DOI: https://doi.org/10.1111/j.1541-0420.2012.01763.x.
