Research Article

Multi-Script Video Caption Localization Based on Visual Rhythms

Article: 2032926 | Received 17 Jul 2021, Accepted 18 Jan 2022, Published online: 04 Feb 2022
