Modulation Spectral Features: In Pursuit of Invariant Representations of Music with Application to Unsupervised Source Identification

Pages 58-70 | Received 15 Nov 2013, Accepted 15 Apr 2014, Published online: 19 Jun 2014

References

  • Alluri, V., & Toiviainen, P. (2009). Exploring perceptual and acoustical correlates of polyphonic timbre. Music Perception, 27(3), 223–241.
  • Anemüller, J., Schmidt, D. & Bach, J.-H. (2008). Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features. In Proceedings of INTERSPEECH ’08 (pp. 2582–2585). New York: Curran Associates.
  • Atlas, L. (2003). Modulation spectral transforms: Application to speech separation and modification (Technical report). Kyoto, Japan: The Institute of Electronics, Information and Communication Engineers.
  • Atlas, L., Clark, P. & Schimmel, S. (2010). Modulation Toolbox Version 2.1 for MATLAB. http://isdl.ee.washington.edu/projects/modulationtoolbox/
  • Atlas, L. & Janssen, C. (2005). Coherent modulation spectral filtering for single-channel music source separation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (Vol. 4, pp. iv/461–iv/464). Piscataway, NJ: IEEE.
  • Atlas, L., & Shamma, S. A. (2003). Joint acoustic and modulation frequency. EURASIP Journal on Applied Signal Processing, 2003, 668–675.
  • Bach, J.H., Anemüller, J., & Kollmeier, B. (2011). Robust speech detection in real acoustic backgrounds with perceptually motivated features. Speech Communication, 53(5), 690–706.
  • Bach, J.-H., Kollmeier, B. & Anemüller, J. (2010). Modulation-based detection of speech in real background noise: Generalization to novel background classes. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 41–44). Piscataway, NJ: IEEE.
  • Barry, D., Fitzgerald, D., Coyle, E. & Lawlor, B. (2005). Drum source separation using percussive feature detection and spectral modulation. In IEE Irish Signals and Systems Conference (pp. 13–17). IEE.
  • Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: Bradford Books.
  • Chi, T., Ru, P., & Shamma, S. A. (2005). Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America, 118(2), 887–906.
  • Clark, P., & Atlas, L. (2009). Time-frequency coherent modulation filtering of nonstationary signals. IEEE Transactions on Signal Processing, 57(11), 4323–4332.
  • Cont, A., Dubnov, S. & Wessel, D. (2007). Realtime multiple-pitch and multiple-instrument recognition for music signals using sparse non-negative constraints. In 10th International Conference on Digital Audio Effects (DAFx-07) (pp. 85–92). Bordeaux: DAFx.
  • Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. The Journal of the Acoustical Society of America, 102(5), 2892–2905.
  • Delprat, N. (1997). Global frequency modulation laws extraction from the Gabor transform of a signal: a first study of the interacting components case. IEEE Transactions on Speech and Audio Processing, 5(1), 64–71.
  • Disch, S. & Edler, B. (2009). Multiband perceptual modulation analysis, processing and synthesis of audio signals. In IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP 2009 (pp. 2305–2308). Piscataway, NJ: IEEE.
  • Drullman, R., Festen, J. M., & Plomp, R. (1994). Effect of reducing slow temporal modulations on speech reception. The Journal of the Acoustical Society of America, 95(5), 2670–2680.
  • Eronen, A. (2001). Comparison of features for musical instrument recognition. In IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, 2001 (pp. 19–22). Piscataway, NJ: IEEE.
  • Falk, T. H., Fraga, F. J., Trambaiolli, L., & Anghinah, R. (2012). EEG amplitude modulation analysis for semi-automated diagnosis of Alzheimer’s disease. EURASIP Journal on Advances in Signal Processing, 2012(1), 1–9.
  • Ganapathy, S., Thomas, S. & Hermansky, H. (2009a). Modulation frequency features for phoneme recognition in noisy speech. The Journal of the Acoustical Society of America, 125(1), EL8–EL12.
  • Ganapathy, S., Thomas, S., & Hermansky, H. (2009b). Static and dynamic modulation spectrum for speech recognition. In Proceedings of INTERSPEECH ’09 (pp. 2823–2826). Brighton: INTERSPEECH.
  • Greenberg, S. & Kingsbury, B.E.D. (1997). The modulation spectrogram: In pursuit of an invariant representation of speech. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997. ICASSP-97 (Vol. 3, pp. 1647–1650). Piscataway, NJ: IEEE.
  • Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the Acoustical Society of America, 61(5), 1270–1277.
  • Havlicek, J.P. & Bovik, A.C. (1992). Modulation models for image processing and wavelet-based image demodulation. In Conference Record of The Twenty-Sixth Asilomar Conference on Signals, Systems and Computers (Vol. 2, pp. 805–810). Piscataway, NJ: IEEE.
  • Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738–1752.
  • Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, 2(4), 578–589.
  • Herrera, P., Peeters, G., & Dubnov, S. (2003). Automatic classification of musical instrument sounds. Journal of New Music Research, 32(1), 3–21.
  • Holzapfel, A. & Stylianou, Y. (2008). Rhythmic similarity of music based on dynamic periodicity warping. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2217–2220). Piscataway, NJ: IEEE.
  • Jensen, K. (1999). Timbre models of musical sounds (PhD thesis). University of Copenhagen, Copenhagen, Denmark.
  • Joder, C., Essid, S., & Richard, G. (2009). Temporal integration for audio classification with application to musical instrument classification. IEEE Transactions on Audio, Speech, and Language Processing, 17(1), 174–186.
  • Kauppinen, J. (2012). Music data mining, edited by Tao Li, Mitsunori Ogihara, and George Tzanetakis [Book review]. International Statistical Review, 80(1), 189–190.
  • Kingsbury, B.E.D., Morgan, N., & Greenberg, S. (1998). Robust speech recognition using the modulation spectrogram. Speech Communication, 25(1–3), 117–132.
  • Kinnunen, T. (2006). Joint acoustic-modulation frequency for speaker recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (DOI: 10.1109/ICASSP.2006.1660108). Piscataway, NJ: IEEE.
  • Kopparapu, S.K., Pandharipande, M.A., & Sita, G. (2010). Music and vocal separation using multiband modulation based features. In IEEE Symposium on Industrial Electronics and Applications (ISIEA) (pp. 733–737). Piscataway, NJ: IEEE.
  • Langner, G. (1997). Temporal processing of pitch in the auditory system. Journal of New Music Research, 26(2), 116–132.
  • Lee, C.-H., Chou, C.-H., Lien, C.-C. & Fang, J.-C. (2011). Music genre classification using modulation spectral features and multiple prototype vectors representation. In 4th International Congress on Image and Signal Processing (CISP) (Vol. 5, pp. 2762–2766). Piscataway, NJ: IEEE.
  • Lee, C.-H., Lin, H.-S., Chou, C.-H. & Shih, J.-L. (2009a). Modulation spectral analysis of static and transitional information of cepstral and spectral features for music genre classification. In Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP-09 (pp. 1030–1033). Piscataway, NJ: IEEE.
  • Lee, C.-H., Shih, J.-L., Yu, K.-M., & Lin, H.-S. (2009b). Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features. IEEE Transactions on Multimedia, 11(4), 670–682.
  • Lee, C.-H., Shih, J.-L., Yu, K.-M., & Su, J.-M. (2007). Automatic music genre classification using modulation spectral contrast feature. In IEEE International Conference on Multimedia and Expo (pp. 204–207). Piscataway, NJ: IEEE.
  • Li, Q. & Atlas, L. (2005). Properties for modulation spectral filtering. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (Vol. 4, pp. iv/521–iv/524). Piscataway, NJ: IEEE.
  • Li, Y., Woodruff, J., & Wang, D. (2009). Monaural musical sound separation based on pitch and common amplitude modulation. IEEE Transactions on Audio, Speech, and Language Processing, 17(7), 1361–1371.
  • Lim, S.-C., Jang, S.-J., Lee, S.-P., & Kim, M. Y. (2011). Music genre/mood classification using a feature-based modulation spectrum. In International Conference on Mobile IT Convergence (ICMIC) (pp. 133–136). Piscataway, NJ: IEEE.
  • Loeffler, B. D. (2006). Instrument timbres and pitch estimation in polyphonic music (Master’s thesis). Georgia Institute of Technology, Atlanta, GA, USA.
  • Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval. Plymouth, MA: ISMIR.
  • Malyska, N., Quatieri, T.F. & Sturim, D. (2005). Automatic dysphonia recognition using biologically-inspired amplitude-modulation features. In IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Vol. 1, pp. 873–876). Piscataway, NJ: IEEE.
  • Markaki, M. & Stylianou, Y. (2009). Evaluation of modulation frequency features for speaker verification and identification. In 17th European Signal Processing Conference (pp. 549–553). Glasgow: EURASIP.
  • Markaki, M., & Stylianou, Y. (2011). Discrimination of speech from nonspeech in broadcast news based on modulation frequency features. Speech Communication, 53(5), 726–735.
  • McDermott, J. H., & Simoncelli, E. P. (2011). Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis. Neuron, 71(5), 926–940.
  • Mesgarani, N., Slaney, M., & Shamma, S. A. (2006). Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations. IEEE Transactions on Audio, Speech, and Language Processing, 14(3), 920–930.
  • Moritz, N., Anemüller, J., & Kollmeier, B. (2011). Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5492–5495). Piscataway, NJ: IEEE.
  • Mubarak, O.M., Ambikairajah, E., Epps, J. & Gunawan, T.S. (2006). Modulation features for speech and music classification. 10th IEEE Singapore International Conference on Communication Systems, ICCS 2006 (pp. 1–5). Piscataway, NJ: IEEE.
  • Muller, M., Ellis, D. P. W., Klapuri, A., & Richard, G. (2011). Signal processing for music analysis. IEEE Journal of Selected Topics in Signal Processing, 5(6), 1088–1110.
  • Nagathil, A., Gerkmann, T. & Martin, R. (2010). Musical genre classification based on a highly-resolved cepstral modulation spectrum. In 18th European Signal Processing Conference, EUSIPCO (pp. 462–466). Aalborg: EURASIP.
  • Nagathil, A., Gottel, P., & Martin, R. (2011). Hierarchical audio classification using cepstral modulation ratio regressions based on Legendre polynomials. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2216–2219). Piscataway, NJ: IEEE.
  • Ono, N., Miyamoto, K., Le Roux, J., Kameoka, H. & Sagayama, S. (2008). Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram. In 16th European Signal Processing Conference. Proceedings EUSIPCO’08. Lausanne: EURASIP.
  • Paliwal, K., Schwerin, B., & Wójcicki, K. (2012). Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator. Speech Communication, 54(2), 282–305.
  • Paliwal, K., Wójcicki, K., & Schwerin, B. (2010). Single-channel speech enhancement using spectral subtraction in the short-time modulation domain. Speech Communication, 52(5), 450–475.
  • Panagakis, Y., Kotropoulos, C. & Arce, G.R. (2009). Music genre classification via sparse representations of auditory temporal modulations. In 17th European Signal Processing Conference Proceedings (pp. 1–5). Glasgow: EURASIP.
  • Panagakis, Y., Kotropoulos, C., & Arce, G. R. (2010). Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification. IEEE Transactions on Audio, Speech, and Language Processing, 18(3), 576–588.
  • Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Upper Saddle River, NJ: Prentice-Hall.
  • Rao, A., & Kumaresan, R. (2000). On decomposing speech into modulated components. IEEE Transactions on Speech and Audio Processing, 8(3), 240–254.
  • Rodriguez-Serrano, F.J., Vera-Candeas, P., Cabañas Molero, P., Carabias-Orti, J.J. & Ruiz Reyes, N. (2010). Amplitude modulated sinusoidal modeling for audio onset detection. 18th European Signal Processing Conference, EUSIPCO (pp. 512–516). Aalborg: EURASIP.
  • Deshpande, H., Singh, R., & Nam, U. (2001). Classification of music signals in the visual domain. In Proceedings of the COST G-6 Conference on Digital Audio Effects (pp. DAFX1–DAFX4). Limerick: DAFX.
  • Ru, P., & Shamma, S. A. (1997). Representation of musical timbre in the auditory cortex. Journal of New Music Research, 26(2), 154–169.
  • Sadjadi, S. O., & Hansen, J. H. L. (2011). Hilbert envelope based features for robust speaker identification under reverberant mismatched conditions. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5448–5451). Piscataway, NJ: IEEE.
  • Schimmel, S.M. (2005). Analysis of signal reconstruction after modulation filtering. Proc. SPIE, 5910, 59100H–59100H-10.
  • Schimmel, S.M. (2007). Theory of modulation frequency analysis and modulation filtering, with applications to hearing devices (PhD thesis), University of Washington, Seattle, WA, USA.
  • Sephus, N., Lanterman, A. & Anderson, D. (2013). Exploring frequency modulation features and resolution in the modulation spectrum. 2013 IEEE Digital Signal Processing (DSP) and Signal Processing Education (SPE) Meeting (pp. 169–174). Piscataway, NJ: IEEE.
  • Shamma, S. A., Elhilali, M., & Micheyl, C. (2011). Temporal coherence and attention in auditory scene analysis. Trends in Neurosciences, 34(3), 114–123.
  • Shi, Y.-Y., Zhu, X., Kim, H.-G., & Eom, K.-W. (2006). A tempo feature via modulation spectrum analysis and its application to music emotion classification. In IEEE International Conference on Multimedia and Expo (pp. 1085–1088). Piscataway, NJ: IEEE.
  • Smith, L. M., & Honing, H. (2008). Time-frequency representation of musical rhythm by continuous wavelets. Journal of Mathematics and Music, 2(2), 81–97.
  • Suh, J.W., Sadjadi, S.O., Liu, G., Hasan, T., Godin, K.W. & Hansen, J.H.L. (2011). Exploring Hilbert envelope based acoustic features in i-vector speaker verification using HT-PLDA. In Proceedings of NIST 2011 Speaker Recognition Evaluation Workshop.
  • Triki, M. & Slock, D.T.M. (2005). Periodic signal extraction with global amplitude and phase modulation for music signal decomposition. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings (ICASSP-05) (Vol. 3, pp. iii/233–iii/236). Piscataway, NJ: IEEE.
  • Tsunoo, E., Ono, N. & Sagayama, S. (2009). Rhythm map: Extraction of unit rhythmic patterns and analysis of rhythmic structure from music acoustic signals. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 185–188). Piscataway, NJ: IEEE.
  • Tyagi, V., McCowan, I., Misra, H., & Bourlard, H. (2003). Mel-cepstrum modulation spectrum (MCMS) features for robust ASR. In IEEE Workshop on Automatic Speech Recognition and Understanding (pp. 399–404). Piscataway, NJ: IEEE.
  • Uhle, C., Dittmar, C. & Sporer, T. (2003). Extraction of drum tracks from polyphonic music using independent subspace analysis. Proceedings of the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), (Vol. 2003, pp. 843–848). Nara: ICA.
  • Wu, M.J., Chen, Z.S., Jang, J.S.S.R., Ren, J.M., Li, Y.H. & Lu, C.H. (2011). Combining visual and acoustic features for music genre classification. In 10th International Conference on Machine Learning and Applications and Workshops (ICMLA) (Vol. 2, pp. 124–129). Piscataway, NJ: IEEE.
  • Yang, X., Wang, K., & Shamma, S. A. (1992). Auditory representations of acoustic signals. IEEE Transactions on Information Theory, 38(2), 824–839.
  • Zlatintsi, A. & Maragos, P. (2012). AM-FM modulation features for music instrument signal analysis and recognition. In 20th European Signal Processing Conference Proceedings, EUSIPCO’12 (pp. 2035–2039). Bucharest: EURASIP.
