Applied Artificial Intelligence

An International Journal

Volume 36, 2022 - Issue 1

Open access

Research Article

Multi-Script Video Caption Localization Based on Visual Rhythms

Marcos Roberto e Souza, Institute of Computing, University of Campinas, Campinas, Brazil

https://orcid.org/0000-0003-4342-5220

Helena de Almeida Maia, Institute of Computing, University of Campinas, Campinas, Brazil

https://orcid.org/0000-0002-8253-9004

Anderson Carlos Souza e Santos, Institute of Computing, University of Campinas, Campinas, Brazil

https://orcid.org/0000-0002-7806-3410

Marcelo Bernardes Vieira, Department of Computer Science, Federal University of Juiz de Fora (UFJF), Juiz de Fora, Brazil

https://orcid.org/0000-0003-3356-6679

Helio Pedrini (corresponding author), Institute of Computing, University of Campinas, Campinas, Brazil

https://orcid.org/0000-0003-0125-630X

Article: 2032926 | Received 17 Jul 2021, Accepted 18 Jan 2022, Published online: 04 Feb 2022

https://doi.org/10.1080/08839514.2022.2032926