References
- Ba JL, Kiros JR, Hinton GE. 2016. Layer normalization. arXiv:1607.06450.
- Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar.
- Girdhar R, Carreira J, Doersch C, Zisserman A. 2019. Video action transformer network. 32nd IEEE conference on computer vision and pattern recognition, Long Beach, CA, United States, June 16–20, 2019.
- He K, Zhang X, Ren S, Sun J. 2016. Deep residual learning for image recognition. 29th IEEE conference on computer vision and pattern recognition, Las Vegas, Nevada, United States, June 27–30, 2016.
- Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Comput. 9(8):1735–1780. doi:10.1162/neco.1997.9.8.1735.
- Jin Y, Li H, Dou Q, Chen H, Qin J, Fu CW, Heng PA. 2020. Multi-task recurrent convolutional network with correlation loss for surgical video analysis. Med Image Anal. 59:101572. doi:10.1016/j.media.2019.101572.
- Kitaev N, Kaiser L, Levskaya A. 2020. Reformer: the efficient transformer. International Conference on Learning Representations (ICLR), virtual conference (formerly Addis Ababa, Ethiopia).
- Krizhevsky A, Sutskever I, Hinton GE. 2012. ImageNet classification with deep convolutional neural networks. 26th Conference on Neural Information Processing Systems, NIPS 2012, Lake Tahoe, Nevada, United States, Dec. 3–8, 2012.
- Namazi B, Sankaranarayanan G, Devarajan V. 2019. LapTool-Net: a contextual detector of surgical tools in laparoscopic videos based on recurrent convolutional neural networks. arXiv:1905.08983.
- Primus MJ, Schoeffmann K, Böszörmenyi L. 2015. Instrument classification in laparoscopic videos. 13th International Workshop on Content-Based Multimedia Indexing, CBMI 2015, Prague, Czech Republic, June 10–12, 2015.
- Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al. 2015. ImageNet large scale visual recognition challenge. Int J Comput Vision (IJCV). 115(3):211–252.
- Sokolova M, Lapalme G. 2009. A systematic analysis of performance measures for classification tasks. Inf Process Manag. 45(4):427–437.
- Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. 2015. Going deeper with convolutions. 28th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, United States, June 7–12, 2015.
- Twinanda AP, Shehata S, Mutter D, Marescaux J, De Mathelin M, Padoy N. 2016. EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging. 36(1):86–97. doi:10.1109/TMI.2016.2593957.
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. 2017. Attention is all you need. 31st Conference on Neural Information Processing Systems, NeurIPS 2017, Long Beach, CA, United States, Dec. 5–7, 2017.
- Zhang M, Lucas J, Ba J, Hinton GE. 2019. Lookahead optimizer: k steps forward, 1 step back. 33rd Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver, Canada, Dec. 10–12, 2019.