Journal of the American Statistical Association Volume 114, 2019 - Issue 527

Theory and Methods

Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes

Wensheng Zhu, Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, China

Donglin Zeng, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC

Rui Song, Department of Statistics, North Carolina State University, Raleigh, NC

Pages 1404-1417 | Received 01 Apr 2017, Accepted 01 Jun 2018, Published online: 29 Oct 2018

https://doi.org/10.1080/01621459.2018.1506341

References

Altman, T., and Leger, C. (1994), “Cross-validation, the Bootstrap, and Related Methods for Tuning Parameter Selection,” Technical Report, The Cornell University Library, 1–23.
Chakraborty, B., Laber, E., and Zhao, Y. (2013), “Inference for Optimal Dynamic Treatment Regimes using an Adaptive m-out-of-n Bootstrap Scheme,” Biometrics, 69, 714–723.
Chakraborty, B., and Moodie, E. E. M. (2013), Statistical Methods for Dynamic Treatment Regimes, New York: Springer.
Chakraborty, B., Murphy, S., and Strecher, V. (2010), “Inference for Non-Regular Parameters in Optimal Dynamic Treatment Regimes,” Statistical Methods in Medical Research, 19, 317–343.
Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., and Lv, J. (2011), “Non-Concave Penalized Likelihood with NP-Dimensionality,” IEEE Transactions on Information Theory, 57, 5467–5484.
Fava, M., Rush, A. J., Trivedi, M. H., Nierenberg, A. A., Thase, M. E., Sackeim, H. A., Quitkin, F. M., Wisniewski, S., Lavori, P. W., Rosenbaum, J. F., and Kupfer, D. J. (2003), “Background and Rationale for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) Study,” Psychiatric Clinics of North America, 26, 457–494.
Laber, E., Lizotte, D., Qian, M., Pelham, W., and Murphy, S. (2014), “Dynamic Treatment Regimes: Technical Challenges and Applications,” Electronic Journal of Statistics, 8, 1225–1272.
Luedtke, A. R., and Van Der Laan, M. J. (2016), “Statistical Inference for the Mean Outcome under a Possibly Non-Unique Optimal Treatment Strategy,” The Annals of Statistics, 44, 713–742.
Lv, J., and Fan, Y. (2009), “A Unified Approach to Model Selection and Sparse Recovery using Regularized Least Squares,” The Annals of Statistics, 37, 3498–3528.
Moodie, E., and Richardson, T. (2010), “Estimating Optimal Dynamic Regimes: Correcting Bias under the Null,” Scandinavian Journal of Statistics, 37, 126–146.
Qian, M., and Murphy, S. A. (2011), “Performance Guarantees for Individualized Treatment Rules,” The Annals of Statistics, 39, 1180–1210.
Robins, J. M. (2004), “Optimal Structural Nested Models for Optimal Sequential Decisions,” in Proceedings of the Second Seattle Symposium in Biostatistics, Springer, pp. 189–326.
Rush, A. J., Fava, M., Wisniewski, S. R., Lavori, P. W., Trivedi, M. H., Sackeim, H. A., Thase, M. E., Nierenberg, A. A., Quitkin, F. M., Kashner, T. M., Kupfer, D. J., Rosenbaum, J. F., Alpert, J., Stewart, J. W., McGrath, P. J., Biggs, M. M., Shores-Wilson, K., Lebowitz, B. D., Ritz, L., and Niederehe, G. (2004), “Sequenced Treatment Alternatives to Relieve Depression (STAR*D): Rationale and Design,” Controlled Clinical Trials, 25, 119–142.
Song, R., Wang, W., Zeng, D., and Kosorok, M. R. (2015), “Penalized Q-Learning for Dynamic Treatment Regimens,” Statistica Sinica, 25, 901–920.
Watkins, C. J. (1989), “Learning from Delayed Rewards,” Ph.D. dissertation, University of Cambridge, England.
Zhang, C. (2010), “Nearly Unbiased Variable Selection under Minimax Concave Penalty,” The Annals of Statistics, 38, 894–942.
