
Deep advantage learning for optimal dynamic treatment regime

Pages 80–88 | Received 02 May 2017, Accepted 14 Apr 2018, Published online: 16 May 2018

References

  • Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Zheng, X. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from https://tensorflow.org.
  • Basu, D. (1980). Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75(371), 575–582. doi: 10.1080/01621459.1980.10477512
  • Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., … Zieba, K. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
  • Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2015). The loss surfaces of multilayer networks. AISTATS, 38, 192–204.
  • Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on machine learning, Helsinki, Finland (pp. 160–167). ACM.
  • Cowell, R. G., Dawid, P., Lauritzen, S. L., & Spiegelhalter, D. J. (2006). Probabilistic networks and expert systems: Exact computational methods for Bayesian networks. New York, NY: Springer Science & Business Media.
  • Ding, X., Zhang, Y., Liu, T., & Duan, J. (2015). Deep learning for event-driven stock prediction. In IJCAI, Buenos Aires, Argentina (pp. 2327–2333).
  • Duchi, J., Shalev-Shwartz, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning, Helsinki, Finland (pp. 272–279). ACM.
  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360. doi: 10.1198/016214501753382273
  • Fava, M., Rush, J., Trivedi, M. H., Nierenberg, A., Thase, M., Sackeim, F., … Kupfer, D. J. (2003). Background and rationale for the sequenced treatment alternatives to relieve depression (STAR*D) study. Psychiatric Clinics of North America, 26(2), 457–494. doi: 10.1016/S0193-953X(02)00107-7
  • Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366. doi: 10.1016/0893-6080(89)90020-8
  • Karpathy, A. (2017). CS231n: Convolutional neural networks for visual recognition (lecture notes, Spring 2017). Stanford University.
  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images (MSc thesis).
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1090–1098). Cambridge: The MIT Press.
  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. doi: 10.1038/nature14539
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. doi: 10.1109/5.726791
  • Lenz, I., Lee, H., & Saxena, A. (2015). Deep learning for detecting robotic grasps. The International Journal of Robotics Research, 34(4–5), 705–724. doi: 10.1177/0278364914549607
  • Lu, W., Zhang, H., & Zeng, D. (2013). Variable selection for optimal treatment decision. Statistical Methods in Medical Research, 22(5), 493–504. doi: 10.1177/0962280211428383
  • Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. doi: 10.1038/nature14236
  • Moodie, E. E., Richardson, T. S., & Stephens, D. A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics, 63(2), 447–455. doi: 10.1111/j.1541-0420.2006.00686.x
  • Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B, 65(2), 331–355. doi: 10.1111/1467-9868.00389
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  • Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta Numerica, 8, 143–195. doi: 10.1017/S0962492900002919
  • Qian, M., & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2), 1180–1210. doi: 10.1214/10-AOS864
  • Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. In Advances in neural information processing systems 20 (pp. 1–8). Vancouver: Curran Associates.
  • Robins, J. (1997). Causal inference from complex longitudinal data. In Latent variable modeling and applications to causality. New York, NY: Springer.
  • Schulte, P. J., Tsiatis, A. A., Laber, E. B., & Davidian, M. (2014). Q- and A-learning methods for estimating optimal dynamic treatment regimes. Statistical Science, 29(4), 640–661. doi: 10.1214/13-STS450
  • Shi, C., Fan, A., Song, R., & Lu, W. (2018). High-dimensional A-learning for optimal dynamic treatment regimes. Annals of Statistics, 46(3), 925–957.
  • Shi, C., Song, R., & Lu, W. (2016). Robust learning for optimal treatment decision with NP-dimensionality. Electronic Journal of Statistics, 10(2), 2894–2921. doi: 10.1214/16-EJS1178
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. doi: 10.1038/nature16961
  • Song, R., Kosorok, M., Zeng, D., Zhao, Y., Laber, E., & Yuan, M. (2015). On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning. Stat, 4(1), 59–68. doi: 10.1002/sta4.78
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
  • Watkins, C. J. C. H. (1989). Learning from delayed rewards (PhD thesis). University of Cambridge, England.
  • Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. doi: 10.1007/BF00992698
  • Zhang, Y., Liang, P., & Wainwright, M. J. (2016). Convexified convolutional neural networks. arXiv preprint arXiv:1609.01000.
  • Zhang, B., Tsiatis, A. A., Davidian, M., Zhang, M., & Laber, E. (2012). Estimating optimal treatment regimes from a classification perspective. Stat, 1(1), 103–114. doi: 10.1002/sta.411
  • Zhao, Y., Zeng, D., Rush, J., & Kosorok, M. (2012). Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499), 1106–1118. doi: 10.1080/01621459.2012.695674
