Research Article

Multi-Script Video Caption Localization Based on Visual Rhythms

Article: 2032926 | Received 17 Jul 2021, Accepted 18 Jan 2022, Published online: 04 Feb 2022

Figures & data

Figure 1. Overview of the proposed video caption localization method based on visual rhythms.

Figure 2. Example of a caption localization mask.

Figure 3. Examples of visual rhythms extracted from mask videos.
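
As a rough illustration of the representation behind Figure 3, the sketch below builds a visual rhythm by sampling one pixel column per frame of a mask video and stacking the slices over time. The sampled line, the OpenCV-based pipeline, and the function name are illustrative assumptions, not the authors' exact procedure:

```python
import cv2
import numpy as np

def extract_visual_rhythm(video_path, col=None):
    """Build a visual rhythm from a (mask) video.

    One pixel column is sampled from every frame and the slices are
    stacked side by side, yielding a 2D image whose horizontal axis
    is time and whose vertical axis is the frame's vertical position.
    A caption that stays on screen appears as a horizontal streak.
    """
    cap = cv2.VideoCapture(video_path)
    slices = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        c = gray.shape[1] // 2 if col is None else col  # central column by default
        slices.append(gray[:, c])
    cap.release()
    return np.stack(slices, axis=1)  # shape: (frame_height, num_frames)
```

Sampling a row per frame in the same way would yield a second rhythm carrying the caption's horizontal extent, which is presumably why the figure shows several rhythms per video.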

Figure 4. Steps of the visual rhythm processing. First, connected components are detected in each visual rhythm; they are then filtered by a subcomponent analysis, and caption positions are retrieved from the resulting visual rhythms.
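
The pipeline described in the Figure 4 caption can be sketched with a simple component filter. Below, a plain size-and-persistence heuristic stands in for the paper's subcomponent analysis; the thresholds and the heuristic itself are assumptions made for illustration:

```python
import cv2
import numpy as np

def filter_rhythm_components(rhythm, min_frames=15, min_height=5):
    """Filter connected components of a binary visual rhythm.

    `rhythm` has rows = vertical position and columns = frames.
    Caption components persist over time (wide) and have some
    thickness (tall); everything else is discarded as noise.
    """
    binary = (rhythm > 0).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    kept = np.zeros_like(binary)
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if w >= min_frames and h >= min_height:
            kept[labels == i] = 255
    return kept
```

Rows with nonzero pixels in the filtered rhythm give the caption's vertical extent, while each kept component's column span gives the frames in which the caption is visible.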

Figure 5. Example of detected and processed components.

Figure 6. Examples of visual rhythms after processing.

Table 1. Information on the videos collected from YouTube to compose the dataset. All videos are licensed under a Creative Commons license.

Figure 7. Frames from each of the videos collected from YouTube.

Figure 8. Representation of text insertion into a video and creation of the ground-truth information.
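
The dataset construction depicted in Figure 8 (rendering a known caption onto frames and recording where it was placed) can be sketched with Pillow. The font, size, and placement below are illustrative choices rather than the paper's protocol, and the chosen font must cover the target script:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def insert_caption(frame, text, font_path, y_frac=0.85):
    """Render a caption onto a frame and return the modified frame
    plus the caption's bounding box, which serves as ground truth."""
    img = Image.fromarray(frame)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, size=28)
    # Measure the text so it can be centered horizontally.
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    x = (img.width - (right - left)) // 2
    y = int(img.height * y_frac) - (bottom - top)
    draw.text((x, y), text, font=font, fill=(255, 255, 255))
    bbox = draw.textbbox((x, y), text, font=font)  # ground-truth box
    return np.asarray(img), bbox
```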

Table 2. Different scripts considered in this work.

Figure 9. The same sentence rendered in each of the scripts considered.

Figure 10. Statistics for the built dataset.

Table 3. Results obtained for detecting frames with captions for the different videos in the dataset.

Table 4. Results obtained for detecting frames with captions for different scripts.

Table 5. Results obtained for detecting frames with captions for different caption characteristics.

Table 6. Average accuracy achieved for video caption localization.

Table 7. Average accuracy obtained for caption localization for each script.

Table 8. Average accuracy obtained for caption localization for different caption characteristics.

Figure 11. Results for frames with scene and caption text.