References
- Agrawal, S., and Goyal, N. (2013), “Thompson Sampling for Contextual Bandits With Linear Payoffs,” in International Conference on Machine Learning, pp. 127–135.
- Audibert, J.-Y., and Tsybakov, A. B. (2007), “Fast Learning Rates for Plug-In Classifiers,” The Annals of Statistics, 35, 608–633. DOI: https://doi.org/10.1214/009053606000001217.
- Auer, P. (2002), “Using Confidence Bounds for Exploitation-Exploration Trade-Offs,” Journal of Machine Learning Research, 3, 397–422.
- Bastani, H., and Bayati, M. (2015), “Online Decision-Making With High-Dimensional Covariates,” available at SSRN: https://ssrn.com/abstract=2661896 or DOI: https://doi.org/10.2139/ssrn.2661896.
- Chambaz, A., Zheng, W., and van der Laan, M. J. (2017), “Targeted Sequential Design for Targeted Learning Inference of the Optimal Treatment Rule and Its Mean Reward,” The Annals of Statistics, 45, 2537. DOI: https://doi.org/10.1214/16-AOS1534.
- Chen, H., Lu, W., and Song, R. (2020), “Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting,” Journal of the American Statistical Association (just-accepted).
- Chen, X., Lee, J. D., Tong, X. T., and Zhang, Y. (2016), “Statistical Inference for Model Parameters in Stochastic Gradient Descent,” arXiv no. 1610.08637.
- Dani, V., Hayes, T. P., and Kakade, S. M. (2008), “Stochastic Linear Optimization Under Bandit Feedback,” in Proceedings of the Workshop on Computational Learning Theory, pp. 355–366.
- Fang, Y., Xu, J., and Yang, L. (2018), “Online Bootstrap Confidence Intervals for the Stochastic Gradient Descent Estimator,” The Journal of Machine Learning Research, 19, 3053–3073.
- Goldenshluger, A., and Zeevi, A. (2013), “A Linear Response Bandit Problem,” Stochastic Systems, 3, 230–261. DOI: https://doi.org/10.1287/11-SSY032.
- Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
- Kim, E. S., Herbst, R. S., Wistuba, I. I., Lee, J. J., Blumenschein, G. R., Tsao, A., Stewart, D. J., Hicks, M. E., Erasmus, J., Gupta, S., and Alden, C. M. (2011), “The BATTLE Trial: Personalizing Therapy for Lung Cancer,” Cancer Discovery, 1, 44–53. DOI: https://doi.org/10.1158/2159-8274.CD-10-0010.
- Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010), “A Contextual-Bandit Approach to Personalized News Article Recommendation,” in Proceedings of the 19th International Conference on World Wide Web, ACM, pp. 661–670. DOI: https://doi.org/10.1145/1772690.1772758.
- Luedtke, A. R., and van der Laan, M. J. (2016), “Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy,” The Annals of Statistics, 44, 713. DOI: https://doi.org/10.1214/15-AOS1384.
- Moulines, E., and Bach, F. R. (2011), “Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning,” in Advances in Neural Information Processing Systems, pp. 451–459.
- Polyak, B. T., and Juditsky, A. B. (1992), “Acceleration of Stochastic Approximation by Averaging,” SIAM Journal on Control and Optimization, 30, 838–855. DOI: https://doi.org/10.1137/0330046.
- Qian, W., and Yang, Y. (2016), “Kernel Estimation and Model Combination in a Bandit Problem With Covariates,” The Journal of Machine Learning Research, 17, 5181–5217.
- Qiang, S., and Bayati, M. (2016), “Dynamic Pricing With Demand Covariates,” available at SSRN: https://ssrn.com/abstract=2765257.
- Robbins, H. (1952), “Some Aspects of the Sequential Design of Experiments,” Bulletin of the American Mathematical Society, 58, 527–535. DOI: https://doi.org/10.1090/S0002-9904-1952-09620-8.
- Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400–407. DOI: https://doi.org/10.1214/aoms/1177729586.
- Ruppert, D. (1988), “Efficient Estimations From a Slowly Convergent Robbins–Monro Process,” Technical Report, Cornell University Operations Research and Industrial Engineering.
- Sutton, R. S., and Barto, A. G. (2018), Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press.
- Sutton, R. S., Mahmood, A. R., and White, M. (2016), “An Emphatic Approach to the Problem of Off-Policy Temporal-Difference Learning,” The Journal of Machine Learning Research, 17, 2603–2631.
- Tewari, A., and Murphy, S. A. (2017), “From Ads to Interventions: Contextual Bandits in Mobile Health,” in Mobile Health, eds. J. Rehg, S. Murphy, and S. Kumar, Cham: Springer, pp. 495–517.
- Tsiatis, A. A., Davidian, M., Holloway, S. T., and Laber, E. B. (2019), Introduction to Dynamic Treatment Regimes: Statistical Methods for Precision Medicine, Boca Raton, FL: Chapman & Hall.
- Valko, M., Korda, N., Munos, R., Flaounas, I., and Cristianini, N. (2013), “Finite-Time Analysis of Kernelised Contextual Bandits,” arXiv no. 1309.6869.
- Woodroofe, M. (1979), “A One-Armed Bandit Problem With a Concomitant Variable,” Journal of the American Statistical Association, 74, 799–806. DOI: https://doi.org/10.1080/01621459.1979.10481033.
- Yang, Y., and Zhu, D. (2002), “Randomized Allocation With Nonparametric Estimation for a Multi-Armed Bandit Problem With Covariates,” The Annals of Statistics, 30, 100–121. DOI: https://doi.org/10.1214/aos/1015362186.
- Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012), “A Robust Method for Estimating Optimal Treatment Regimes,” Biometrics, 68, 1010–1018. DOI: https://doi.org/10.1111/j.1541-0420.2012.01763.x.