Modulation Spectral Features: In Pursuit of Invariant Representations of Music with Application to Unsupervised Source Identification

Pages 58-70 | Received 15 Nov 2013, Accepted 15 Apr 2014, Published online: 19 Jun 2014

References

  • Alluri, V., & Toiviainen, P. (2009). Exploring perceptual and acoustical correlates of polyphonic timbre. Music Perception, 27(3), 223–241.
  • Anemüller, J., Schmidt, D. & Bach, J.-H. (2008). Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features. In Proceedings of INTERSPEECH ’08 (pp. 2582–2585). New York: Curran Associates.
  • Atlas, L. (2003). Modulation spectral transforms: Application to speech separation and modification (Technical report). Kyoto, Japan: The Institute of Electronics, Information and Communication Engineers.
  • Atlas, L., Clark, P. & Schimmel, S. (2010). Modulation Toolbox Version 2.1 for MATLAB. http://isdl.ee.washington.edu/projects/modulationtoolbox/
  • Atlas, L. & Janssen, C. (2005). Coherent modulation spectral filtering for single-channel music source separation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (Vol. 4, pp. iv/461–iv/464). Piscataway, NJ: IEEE.
  • Atlas, L., & Shamma, S. A. (2003). Joint acoustic and modulation frequency. EURASIP Journal on Applied Signal Processing, 2003, 668–675.
  • Bach, J.H., Anemüller, J., & Kollmeier, B. (2011). Robust speech detection in real acoustic backgrounds with perceptually motivated features. Speech Communication, 53(5), 690–706.
  • Bach, J.-H., Kollmeier, B. & Anemüller, J. (2010). Modulation-based detection of speech in real background noise: Generalization to novel background classes. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 41–44). Piscataway, NJ: IEEE.
  • Barry, D., Fitzgerald, D., Coyle, E. & Lawlor, B. (2005). Drum source separation using percussive feature detection and spectral modulation. In IEE Irish Signals and Systems Conference (pp. 13–17). IEE.
  • Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: Bradford Books.
  • Chi, T., Ru, P., & Shamma, S. A. (2005). Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America, 118(2), 887–906.
  • Clark, P., & Atlas, L. (2009). Time-frequency coherent modulation filtering of nonstationary signals. IEEE Transactions on Signal Processing, 57(11), 4323–4332.
  • Cont, A., Dubnov, S. & Wessel, D. (2007). Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 85–92). Bordeaux: DAFx.
  • Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. The Journal of the Acoustical Society of America, 102(5), 2892–2905.
  • Delprat, N. (1997). Global frequency modulation laws extraction from the Gabor transform of a signal: a first study of the interacting components case. IEEE Transactions on Speech and Audio Processing, 5(1), 64–71.
  • Disch, S. & Edler, B. (2009). Multiband perceptual modulation analysis, processing and synthesis of audio signals. In IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP 2009 (pp. 2305–2308). Piscataway, NJ: IEEE.
  • Drullman, R., Festen, J. M., & Plomp, R. (1994). Effect of reducing slow temporal modulations on speech reception. The Journal of the Acoustical Society of America, 95(5), 2670–2680.
  • Eronen, A. (2001). Comparison of features for musical instrument recognition. In IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, 2001 (pp. 19–22). Piscataway, NJ: IEEE.
  • Falk, T. H., Fraga, F. J., Trambaiolli, L., & Anghinah, R. (2012). EEG amplitude modulation analysis for semi-automated diagnosis of Alzheimer’s disease. EURASIP Journal on Advances in Signal Processing, 2012(1), 1–9.
  • Ganapathy, S., Thomas, S. & Hermansky, H. (2009a). Modulation frequency features for phoneme recognition in noisy speech. The Journal of the Acoustical Society of America, 125(1), EL8–EL12.
  • Ganapathy, S., Thomas, S., & Hermansky, H. (2009b). Static and dynamic modulation spectrum for speech recognition. In Proceedings of INTERSPEECH ’09 (pp. 2823–2826). Brighton: INTERSPEECH.
  • Greenberg, S. & Kingsbury, B.E.D. (1997). The modulation spectrogram: In pursuit of an invariant representation of speech. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997. ICASSP-97 (Vol. 3, pp. 1647–1650). Piscataway, NJ: IEEE.
  • Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the Acoustical Society of America, 61(5), 1270–1277.
  • Havlicek, J.P. & Bovik, A.C. (1992). Modulation models for image processing and wavelet-based image demodulation. In Conference Record of The Twenty-Sixth Asilomar Conference on Signals, Systems and Computers (Vol. 2, pp. 805–810). Piscataway, NJ: IEEE.
  • Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738–1752.
  • Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2(4), 578–589.
  • Herrera, P., Peeters, G., & Dubnov, S. (2003). Automatic classification of musical instrument sounds. Journal of New Music Research, 32(1), 3–21.
  • Holzapfel, A. & Stylianou, Y. (2008). Rhythmic similarity of music based on dynamic periodicity warping. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2217–2220). Piscataway, NJ: IEEE.
  • Jensen, K. (1999). Timbre models of musical sounds (PhD thesis). University of Copenhagen, Copenhagen, Denmark.
  • Joder, C., Essid, S., & Richard, G. (2009). Temporal integration for audio classification with application to musical instrument classification. IEEE Transactions on Audio, Speech, and Language Processing, 17(1), 174–186.
  • Kauppinen, J. (2012). Music data mining, edited by Tao Li, Mitsunori Ogihara, and George Tzanetakis [Book review]. International Statistical Review, 80(1), 189–190.
  • Kingsbury, B.E.D., Morgan, N., & Greenberg, S. (1998). Robust speech recognition using the modulation spectrogram. Speech Communication, 25(1–3), 117–132.
  • Kinnunen, T. (2006). Joint acoustic-modulation frequency for speaker recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (DOI: 10.1109/ICASSP.2006.1660108). Piscataway, NJ: IEEE.
  • Kopparapu, S.K., Pandharipande, M.A., & Sita, G. (2010). Music and vocal separation using multiband modulation based features. In IEEE Symposium on Industrial Electronics and Applications (ISIEA) (pp. 733–737). Piscataway, NJ: IEEE.
  • Langner, G. (1997). Temporal processing of pitch in the auditory system. Journal of New Music Research, 26(2), 116–132.
  • Lee, C.-H., Chou, C.-H., Lien, C.-C. & Fang, J.-C. (2011). Music genre classification using modulation spectral features and multiple prototype vectors representation. In 4th International Congress on Image and Signal Processing (CISP) (Vol. 5, pp. 2762–2766). Piscataway, NJ: IEEE.
  • Lee, C.-H., Lin, H.-S., Chou, C.-H. & Shih, J.-L. (2009a). Modulation spectral analysis of static and transitional information of cepstral and spectral features for music genre classification. In Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP-09 (pp. 1030–1033). Piscataway, NJ: IEEE.
  • Lee, C.-H., Shih, J.-L., Yu, K.-M., & Lin, H.-S. (2009b). Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features. IEEE Transactions on Multimedia, 11(4), 670–682.
  • Lee, C.-H., Shih, J.-L., Yu, K.-M., & Su, J.-M. (2007). Automatic music genre classification using modulation spectral contrast feature. In IEEE International Conference on Multimedia and Expo (pp. 204–207). Piscataway, NJ: IEEE.
  • Li, Q. & Atlas, L. (2005). Properties for modulation spectral filtering. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (Vol. 4, pp. iv/521–iv/524). Piscataway, NJ: IEEE.
  • Li, Y., Woodruff, J., & Wang, D. (2009). Monaural musical sound separation based on pitch and common amplitude modulation. IEEE Transactions on Audio, Speech, and Language Processing, 17(7), 1361–1371.
  • Lim, S.-C., Jang, S.-J., Lee, S.-P., & Kim, M. Y. (2011). Music genre/mood classification using a feature-based modulation spectrum. In International Conference on Mobile IT Convergence (ICMIC) (pp. 133–136). Piscataway, NJ: IEEE.
  • Loeffler, B. D. (2006). Instrument timbres and pitch estimation in polyphonic music (Master’s thesis). Georgia Institute of Technology, Atlanta, GA, USA.
  • Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval. Plymouth, MA: ISMIR.
  • Malyska, N., Quatieri, T.F. & Sturim, D. (2005). Automatic dysphonia recognition using biologically-inspired amplitude-modulation features. In IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Vol. 1, pp. 873–876). Piscataway, NJ: IEEE.
  • Markaki, M. & Stylianou, Y. (2009). Evaluation of modulation frequency features for speaker verification and identification. In 17th European Signal Processing Conference (pp. 549–553). Glasgow: EURASIP.
  • Markaki, M., & Stylianou, Y. (2011). Discrimination of speech from nonspeech in broadcast news based on modulation frequency features. Speech Communication, 53(5), 726–735.
  • McDermott, J. H., & Simoncelli, E. P. (2011). Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis. Neuron, 71(5), 926–940.
  • Mesgarani, N., Slaney, M., & Shamma, S. A. (2006). Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations. IEEE Transactions on Audio, Speech, and Language Processing, 14(3), 920–930.
  • Moritz, N., Anemüller, J., & Kollmeier, B. (2011). Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5492–5495). Piscataway, NJ: IEEE.
  • Mubarak, O.M., Ambikairajah, E., Epps, J. & Gunawan, T.S. (2006). Modulation features for speech and music classification. 10th IEEE Singapore International Conference on Communication Systems, ICCS 2006 (pp. 1–5). Piscataway, NJ: IEEE.
  • Muller, M., Ellis, D. P. W., Klapuri, A., & Richard, G. (2011). Signal processing for music analysis. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1088–1110.
  • Nagathil, A., Gerkmann, T. & Martin, R. (2010). Musical genre classification based on a highly-resolved cepstral modulation spectrum. In 18th European Signal Processing Conference, EUSIPCO (pp. 462–466). Aalborg: EURASIP.
  • Nagathil, A., Gottel, P., & Martin, R. (2011). Hierarchical audio classification using cepstral modulation ratio regressions based on Legendre polynomials. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2216–2219). Piscataway, NJ: IEEE.
  • Ono, N., Miyamoto, K., Le Roux, J., Kameoka, H. & Sagayama, S. (2008). Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram. In 16th European Signal Processing Conference. Proceedings EUSIPCO’08. Lausanne: EURASIP.
  • Paliwal, K., Schwerin, B., & Wójcicki, K. (2012). Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator. Speech Communication, 54(2), 282–305.
  • Paliwal, K., Wójcicki, K., & Schwerin, B. (2010). Single-channel speech enhancement using spectral subtraction in the short-time modulation domain. Speech Communication, 52(5), 450–475.
  • Panagakis, Y., Kotropoulos, C. & Arce, G.R. (2009). Music genre classification via sparse representations of auditory temporal modulations. In 17th European Signal Processing Conference Proceedings (pp. 1–5). Glasgow: EURASIP.
  • Panagakis, Y., Kotropoulos, C., & Arce, G. R. (2010). Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 576–588.
  • Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Upper Saddle River, NJ: Prentice-Hall.
  • Rao, A., & Kumaresan, R. (2000). On decomposing speech into modulated components. IEEE Transactions on Speech and Audio Processing, 8(3), 240–254.
  • Rodriguez-Serrano, F.J., Vera-Candeas, P., Cabañas Molero, P., Carabias-Orti, J.J. & Ruiz Reyes, N. (2010). Amplitude modulated sinusoidal modeling for audio onset detection. 18th European Signal Processing Conference, EUSIPCO (pp. 512–516). Aalborg: EURASIP.
  • Deshpande, H., Singh, R., & Nam, U. (2001). Classification of music signals in the visual domain. In Proceedings of the COST G-6 Conference on Digital Audio Effects (pp. DAFX1–DAFX4). Limerick: DAFX.
  • Ru, P., & Shamma, S. A. (1997). Representation of musical timbre in the auditory cortex. Journal of New Music Research, 26(2), 154–169.
  • Sadjadi, S. O., & Hansen, J. H. L. (2011). Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5448–5451). Piscataway, NJ: IEEE.
  • Schimmel, S.M. (2005). Analysis of signal reconstruction after modulation filtering. Proc. SPIE, 5910, 59100H–59100H-10.
  • Schimmel, S.M. (2007). Theory of modulation frequency analysis and modulation filtering, with applications to hearing devices (PhD thesis), University of Washington, Seattle, WA, USA.
  • Sephus, N., Lanterman, A. & Anderson, D. (2013). Exploring frequency modulation features and resolution in the modulation spectrum. 2013 IEEE Digital Signal Processing (DSP) and Signal Processing Education (SPE) Meeting (pp. 169–174). Piscataway, NJ: IEEE.
  • Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal coherence and attention in auditory scene analysis. Trends in Neurosciences, 34(3), 114–123.
  • Shi, Y.-Y., Zhu, X., Kim, H.-G., & Eom, K.-W. (2006). A tempo feature via modulation spectrum analysis and its application to music emotion classification. In IEEE International Conference on Multimedia and Expo (pp. 1085–1088). Piscataway, NJ: IEEE.
  • Smith, L. M., & Honing, H. (2008). Time-frequency representation of musical rhythm by continuous wavelets. Journal of Mathematics and Music, 2(2), 81–97.
  • Suh, J.W., Sadjadi, S.O., Liu, G., Hasan, T., Godin, K.W. & Hansen, J.H.L. (2011). Exploring Hilbert envelope based acoustic features in i-vector speaker verification using HT-PLDA. In Proceedings of NIST 2011 Speaker Recognition Evaluation Workshop.
  • Triki, M. & Slock, D.T.M. (2005). Periodic signal extraction with global amplitude and phase modulation for music signal decomposition. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP-05) (Vol. 3, pp. iii/233–iii/236). Piscataway, NJ: IEEE.
  • Tsunoo, E., Ono, N. & Sagayama, S. (2009). Rhythm map: Extraction of unit rhythmic patterns and analysis of rhythmic structure from music acoustic signals. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 185–188). Piscataway, NJ: IEEE.
  • Tyagi, V., McCowan, I., Misra, H., & Bourlard, H. (2003). Mel-cepstrum modulation spectrum (MCMS) features for robust ASR. In IEEE Workshop on Automatic Speech Recognition and Understanding (pp. 399–404). Piscataway, NJ: IEEE.
  • Uhle, C., Dittmar, C. & Sporer, T. (2003). Extraction of drum tracks from polyphonic music using independent subspace analysis. Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), (Vol. 2003, pp. 843–848). Nara: ICA.
  • Wu, M.J., Chen, Z.S., Jang, J.S.S.R., Ren, J.M., Li, Y.H. & Lu, C.H. (2011). Combining visual and acoustic features for music genre classification. In 10th International Conference on Machine Learning and Applications and Workshops (ICMLA) (Vol. 2, pp. 124–129). Piscataway, NJ: IEEE.
  • Yang, X., Wang, K., & Shamma, S. A. (1992). Auditory representations of acoustic signals. IEEE Transactions on Information Theory, 38(2), 824–839.
  • Zlatintsi, A. & Maragos, P. (2012). AM-FM modulation features for music instrument signal analysis and recognition. In 20th European Signal Processing Conference Proceedings, EUSIPCO’12 (pp. 2035–2039). Bucharest: EURASIP.
