A New Multimedia Content Skimming Technique at Arbitrary User-Set Rate Based on Automatic Speech Emphasis Extraction: International Journal of Human

Abstract

This article proposes a new technique for skimming multimedia content such as video mail, audio/visual data in blog sites, and other consumer-generated media. The proposed method, which is based on the automatic extraction of emphasized speech, locates emphasized portions of speech with high accuracy by using prosodic parameters such as pitch, power, and speaking rate. As the method does not employ any speech recognition technique, it enables a highly robust estimation in noisy environments. To extract emphasized portions of speech, the method introduces a metric, “degree of emphasis,” which indicates the degree of emphasis of each speech segment. Given an article, the method computes the degree of emphasis for each speech segment in it. When a user requests a skimming of the article's content, the method refers to the user-specified “skimming rate” to collect the emphasized segments. Preference experiments were performed in which participants were asked to select either the skimmed contents created by our method or those created using a fixed interval approach. The preference rate of our method was about 80%, which suggests that the proposed method can generate proper content skimming.
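The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how emphasized segments are collected for a given skimming rate, so the greedy rule below (take the most emphasized segments until the time budget is filled, then play them in their original order) is an assumption for illustration only, not the authors' algorithm. The names `Segment`, `emphasis`, and `skim` are hypothetical, and the emphasis scores are taken as given rather than computed from pitch, power, and speaking rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    emphasis: float  # hypothetical "degree of emphasis" score (higher = more emphasized)

    @property
    def duration(self) -> float:
        return self.end - self.start


def skim(segments: List[Segment], rate: float) -> List[Segment]:
    """Return segments for a skim whose length is at most rate * total length.

    Assumed rule (not from the article): greedily keep the most emphasized
    segments that still fit in the time budget.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")

    budget = rate * sum(s.duration for s in segments)
    chosen: List[Segment] = []
    used = 0.0

    # Visit segments from most to least emphasized.
    for seg in sorted(segments, key=lambda s: s.emphasis, reverse=True):
        # Small tolerance guards against floating-point rounding at the limit.
        if used + seg.duration <= budget + 1e-9:
            chosen.append(seg)
            used += seg.duration

    # Restore chronological order so the skim plays back naturally.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    demo = [
        Segment(0, 10, 0.2),
        Segment(10, 20, 0.9),
        Segment(20, 30, 0.5),
        Segment(30, 40, 0.7),
    ]
    for seg in skim(demo, rate=0.5):
        print(seg.start, seg.end, seg.emphasis)
```

Because the scores are computed once per segment, a different skimming rate only requires rerunning the selection, which is consistent with the abstract's description of an arbitrary user-set rate.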

Acknowledgments

We thank Katsuhiko Ogawa, Director of NTT Cyber Solutions Laboratories, for encouraging us to write this article. We had helpful discussions with Osamu Mizuno, Broadband Business Development Division, Corporate Business Headquarters, NTT EAST; Junji Takeuchi, Visual Communications Division, NTT Bizlink; and the office support staff, Megumi Machiguchi, Kaori Takei, and Harumi Matsuura.

Notes

¹Cepstrum is a standard spectral parameter in the speech processing domain and is obtained by applying the inverse FFT to the logarithmic-domain power spectrum (Bogert, Healy, & Tukey, 1963).
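
Written in symbols, the definition in this note takes the following form. The symbol names are illustrative and not taken from the article: x[n] is assumed to be a windowed speech frame, X[k] its discrete Fourier transform, and c[n] the resulting cepstrum.

```latex
c[n] = \mathcal{F}^{-1}\left\{ \log \lvert X[k] \rvert^{2} \right\},
\qquad X[k] = \mathcal{F}\{ x[n] \}
```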
