Original Articles

An Overview of Audio Event Detection Methods from Feature Extraction to Classification



  • Acır, N., Ö. Özdamar, and C. Güzeliş. 2006. Automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection. Engineering Applications of Artificial Intelligence 19 (2):209–18.
  • Agrawala, A. 1970. Learning with a probabilistic teacher. IEEE Transactions on Information Theory 16 (4):373–79.
  • Alcala-Fdez, J., R. Alcala, and F. Herrera. 2011. A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Transactions on Fuzzy Systems 19 (5):857–72.
  • Andreassen, T., A. Surlykke, and J. Hallam. 2014. Semi-automatic long-term acoustic surveying: A case study with bats. Ecological Informatics 21:13–24.
  • Arnold, M. 2002. Subjective and objective quality evaluation of watermarked audio tracks. Proceedings of the Second International Conference on Web Delivering of Music (WEDELMUSIC 2002), IEEE.
  • Atrey, P. K., M. C. Maddage, and M. S. Kankanhalli. 2006. Audio based event detection for multimedia surveillance. International Conference on Acoustics, Speech and Signal Processing, ICASSP Proceedings, IEEE.
  • Bailey, T., and A. K. Jain. 1978. A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems, Man and Cybernetics 8 (4):311–13.
  • Balochian, S., E. A. Seidabad, and S. Z. Rad. 2013. Neural network optimization by genetic algorithms for the audio classification to speech and music. International Journal of Signal Processing, Image Processing & Pattern Recognition 6 (3).
  • Bardeli, R., D. Wolff, F. Kurth, M. Koch, K. H. Tauchert, and K. H. Frommolt. 2010. Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring. Pattern Recognition Letters 31 (12):1524–34.
  • Bauer, E., and R. Kohavi. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36 (1–2):105–39.
  • Baum, L. E., and T. Petrie. 1966. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 37 (6):1554–63.
  • Besacier, L., J. F. Bonastre, and C. Fredouille. 2000. Localization and selection of speaker-specific information with statistical modeling. Speech Communication 31 (2–3):89–106.
  • Bhatia, N. 2010. Survey of nearest neighbor techniques. International Journal of Computer Science and Information Security (IJCSIS) 8 (2):302–305.
  • Bhavsar, H., and A. Ganatra. 2012. A comparative study of training algorithms for supervised machine learning. International Journal of Soft Computing and Engineering (IJSCE) 2 (4):2231–2307.
  • Bin, M., L. Haizhou, and T. Rong. 2007. Spoken language recognition using ensemble classifiers. IEEE Transactions on Audio, Speech, and Language Processing 15 (7):2053–62.
  • Blum, A., and T. Mitchell. 1998. Combining labeled and unlabeled data with co-training. Proceedings of the eleventh annual conference on Computational learning theory, ACM.
  • Bourlard, H. A., and N. Morgan. 1993. Connectionist speech recognition: A hybrid approach. Kluwer Academic Publishers. https://link.springer.com/book/10.1007%2F978-1-4615-3210-1
  • Breiman, L. 1996. Bagging predictors. Machine Learning 24 (2):123–40.
  • Breiman, L. 2001. Random forests. Machine Learning 45 (1):5–32.
  • Buckley, J. J., and Y. Hayashi. 1994. Fuzzy genetic algorithm and applications. Fuzzy Sets and Systems 61 (2):129–36.
  • Busso, C., S. Lee, and S. Narayanan. 2009. Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing 17 (4):582–96.
  • Cakir, E., T. Heittola, H. Huttunen, and T. Virtanen. 2015. Polyphonic sound event detection using multi label deep neural networks. 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 1–7.
  • Campbell, J. P., Jr. 1997. Speaker recognition: A tutorial. Proceedings of the IEEE 85 (9):1437–62.
  • Carpenter, G. A., S. Grossberg, and J. H. Reynolds. 1991. ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks 4 (5):565–88.
  • Charalampidis, D., M. Georgiopoulos, and T. Kasparis. 2000. Classification of noisy signal using fuzzy ARTMAP neural networks. International Joint Conference on Neural Networks, IJCNN2000, Proceedings of the IEEE-INNS-ENNS.
  • Cheng, J., Y. Sun, and L. Ji. 2010. A call-independent and automatic acoustic system for the individual recognition of animals: A novel model using four passerines. Pattern Recognition 43 (11):3846–52.
  • Choi, J.-H., and J.-H. Chang. 2012. On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication 54 (3):477–90.
  • Chuan, C.-H. 2013. Audio classification and retrieval using wavelets and Gaussian mixture models. International Journal of Multimedia Data Engineering and Management 4 (1):1–20.
  • Chung-Hsien, W., and H. Chia-Hsin. 2006. Multiple change-point audio segmentation and classification using an MDL-based Gaussian model. IEEE Transactions on Audio, Speech, and Language Processing 14 (2):647–57.
  • Cintra, M. E., M. C. Monard, E. A. Cherman, and H. De Arruda Camargo. 2011. On the estimation of the number of fuzzy sets for fuzzy rule-based classification systems. 11th International Conference on Hybrid Intelligent Systems (HIS), 2011.
  • Clavel, C., T. Ehrette, and G. Richard. 2005. Events detection for an audio-based surveillance system. IEEE International Conference on Multimedia and Expo, 2005. ICME 2005.
  • Cohen, I., N. Sebe, F. G. Cozman, M. C. Cirelo, and T. S. Huang. 2003. Learning Bayesian network classifiers for facial expression recognition both labeled and unlabeled data. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.
  • Cordón, O., M. J. Del Jesus, and F. Herrera. 1999. A proposal on reasoning methods in fuzzy rule-based classification systems. International Journal of Approximate Reasoning 20 (1):21–45.
  • Costa, Y. M. G., L. S. Oliveira, A. L. Koerich, F. Gouyon, and J. G. Martins. 2012. Music genre classification using LBP textural features. Signal Processing 92 (11):2723–37.
  • Cover, T., and P. Hart. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13 (1):21–27.
  • Cui, X., H. Jing, and C. Jen-Tzung. 2012. Multi-view and multi-objective semi-supervised learning for HMM-based automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20 (7):1923–35.
  • Dafna, E., A. Tarasiuk, and Y. Zigel. 2013. Automatic detection of whole night snoring events using non-contact microphone. PLoS One 8 (12). doi:10.1371/journal.pone.0084139
  • Damper, R. I., and J. E. Higgins. 2003. Improving speaker identification in noise by subband processing and decision fusion. Pattern Recognition Letters 24 (13):2167–73.
  • Daoudi, K., D. Fohr, and C. Antoine. 2003. Dynamic Bayesian networks for multi-band automatic speech recognition. Computer Speech & Language 17 (2–3):263–85.
  • Davis, S. B., and P. Mermelstein. 1980. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing 28 (4):357–66.
  • Deller, J. R., Jr., J. H. L. Hansen, and J. G. Proakis. 2000. Discrete-time processing of speech signals. Hoboken, New Jersey, USA: Wiley-IEEE Press.
  • Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2009. Classification of audio signals using SVM and RBFNN. Expert Systems with Applications 36 (3):6069–75.
  • Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2011a. Classification of audio signals using AANN and GMM. Applied Soft Computing 11 (1):716–23.
  • Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2011b. Pattern classification models for classifying and indexing audio signals. Engineering Applications of Artificial Intelligence 24 (2):350–57.
  • Dietterich, T. 2000a. Ensemble methods in machine learning. In Multiple classifier systems, Springer Berlin Heidelberg, 1857:1–15.
  • Dietterich, T. 2000b. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40 (2):139–57.
  • Driggers, R. G. 2003. Encyclopedia of Optical Engineering (Vol. 3). Maryland, USA: Marcel Dekker Inc.
  • Drugman, T. 2014. Using mutual information in supervised temporal event detection: Application to cough detection. Biomedical Signal Processing and Control 10:50–57.
  • Espi, M., M. Fujimoto, K. Kinoshita, and T. Nakatani. 2015. Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP Journal on Audio, Speech, and Music Processing 2015 (26):1–12.
  • Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2):179–88.
  • Freund, Y., and R. E. Schapire. 1996. Experiments with a new boosting algorithm. ICML, Bari, Italy. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
  • Freund, Y., and R. E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1):119–39.
  • Friedman, N., D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning 29 (2–3):131–63.
  • Ganapathy, S., P. Rajan, and H. Hermansky. 2011. Multi-layer perceptron based speech activity detection for speaker verification. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
  • Gencoglu, O., T. Virtanen, and H. Huttunen. 2014. Recognition of acoustic events using deep neural networks. Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), 2014, IEEE, 506–10.
  • Gergen, S., A. Nagathil, and R. Martin. 2014. Classification of reverberant audio signals using clustered ad hoc distributed microphones. Signal Processing 107:21–32.
  • Giannakopoulos, T., D. Kosmopoulos, A. Aristidou, and S. Theodoridis. 2006. Violence content classification using audio features. Advances in Artificial Intelligence, Berlin, Heidelberg: Springer, 502–07.
  • Giannakopoulos, T., and A. Pikrakis. 2014. Chapter 4 - audio features. In Introduction to audio analysis, Eds. T. Giannakopoulos, and A. Pikrakis, 59–103. Oxford: Academic Press.
  • Giannakopoulos, T., A. Pikrakis, and S. Theodoridis. 2007. A multi-class audio classification method with respect to violent content in movies using Bayesian networks. IEEE 9th Workshop on Multimedia Signal Processing, 2007. MMSP 2007.
  • Grossberg, S. 1976. Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions. Biological Cybernetics 23 (4):187–202.
  • Guz, U., S. Cuendet, D. Hakkani-Tür, and G. Tur. 2010. Multi-view semi-supervised learning for dialog act segmentation of speech. IEEE Transactions on Audio, Speech, and Language Processing 18 (2):320–29.
  • Hall, M. 2007. A decision tree-based attribute weighting filter for naive Bayes. Knowledge-Based Systems 20 (2):120–26.
  • Hermansky, H. 1990. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 87 (4):1738–52.
  • Hongwei, W., and J. M. Mendel. 2007. Classification of battlefield ground vehicles using acoustic features and fuzzy logic rule-based classifiers. IEEE Transactions on Fuzzy Systems 15 (1):56–72.
  • Huang, C.-J., Y.-J. Yang, D.-X. Yang, and Y.-J. Chen. 2009. Frog classification using machine learning techniques. Expert Systems with Applications 36 (2, Part 2):3737–43.
  • Itoh, H., T. Takiguchi, and Y. Ariki. 2013. Event detection and recognition using HMM with whistle sounds. 2013 International Conference on Signal-Image Technology & Internet-Based Systems (SITIS).
  • Jain, A. K., R. P. W. Duin, and M. Jianchang. 2000. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1):4–37.
  • Janik, P., and T. Lobos. 2006. Automated classification of power-quality disturbances using SVM and RBF networks. IEEE Transactions on Power Delivery 21 (3):1663–69.
  • Joachims, T. 1999. Transductive inference for text classification using support vector machines. ICML. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
  • Kalteh, A. M., P. Hjorth, and R. Berndtsson. 2008. Review of the self-organizing map (SOM) approach in water resources: Analysis, modelling and application. Environmental Modelling & Software 23 (7):835–45.
  • Kaufman, L., and P. J. Rousseeuw. 1990. Finding groups in data: An introduction to cluster analysis. New York: Wiley.
  • Khairnar, D. G., S. N. Merchant, and U. B. Desai. 2005. An optimum RBF network for signal detection in non-Gaussian noise. In Pattern recognition and machine intelligence, Eds. S. Pal, S. Bandyopadhyay, and S. Biswas, Springer Berlin Heidelberg, 3776:306–09.
  • Khunarsal, P., C. Lursinsap, and T. Raicharoen. 2013. Very short time environmental sound classification based on spectrogram pattern matching. Information Sciences 243:57–74.
  • Kinnunen, T., and H. Li. 2010. An overview of text-independent speaker recognition: From features to supervectors. Speech Communication 52 (1):12–40.
  • Kinnunen, T., I. Sidoroff, M. Tuononen, and P. Fränti. 2011. Comparison of clustering methods: A case study of text-independent speaker modeling. Pattern Recognition Letters 32 (13):1604–17.
  • Kinnunen, T., B. Zhang, J. Zhu, and Y. Wang. 2007. Speaker verification with adaptive spectral subband centroids. In Advances in biometrics, Eds. S.-W. Lee, and S. Li, Springer Berlin Heidelberg, 4642:58–66.
  • Kohonen, T. 1982. Analysis of a simple self-organizing process. Biological Cybernetics 44 (2):135–40.
  • Kotti, M., E. Benetos, C. Kotropoulos, and I. Pitas. 2007. A neural network approach to audio-assisted movie dialogue detection. Neurocomputing 71 (1–3):157–66.
  • Kulkarni, V. Y., and P. K. Sinha. 2013. Random forest classifiers: A survey and future research directions. International Journal of Advanced Computing 36 (1):1144–53.
  • Kumar, A., and B. Raj. 2016. Audio event detection using weakly labeled data. Proceedings of the 2016 ACM on Multimedia Conference, ACM, pp. 1038–47.
  • Lamel, L., L. Rabiner, A. E. Rosenberg, and J. G. Wilpon. 1981. An improved endpoint detector for isolated word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 29 (4):777–85.
  • Larsen, B., and C. Aone. 1999. Fast and effective text mining using linear-time document clustering. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM.
  • Lee, C.-H., C.-H. Chou, C.-C. Han, and R.-Z. Huang. 2006. Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis. Pattern Recognition Letters 27 (2):93–101.
  • Lefèvre, S., and N. Vincent. 2011. A two level strategy for audio segmentation. Digital Signal Processing 21 (2):270–77.
  • Li, D., I. K. Sethi, N. Dimitrova, and T. McGee. 2001. Classification of general audio data for content-based retrieval. Pattern Recognition Letters 22 (5):533–44.
  • Li, H., T. Zhang, and L. Ma. 2012. Confirmation based self-learning algorithm in LVCSR’s semi-supervised incremental learning. Procedia Engineering 29:754–59.
  • Li, L., G. Fengpei, Z. Qingwei, and Y. Yonghong. 2010. Detecting cheering events in sports games. 2nd International Conference on Education Technology and Computer (ICETC).
  • Li, X., L. Wang, and E. Sung. 2008. AdaBoost with SVM-based component classifiers. Engineering Applications of Artificial Intelligence 21 (5):785–95.
  • Lie, L., Z. Hong-Jiang, and J. Hao. 2002. Content analysis for audio classification and segmentation. IEEE Transactions on Speech and Audio Processing 10 (7):504–16.
  • Lin, L., Y. Li, and A. Sadek. 2013. A k nearest neighbor based local linear wavelet neural network model for on-line short-term traffic volume prediction. Procedia - Social and Behavioral Sciences 96:2066–77.
  • Liu, H., and S. Zhang. 2012. Noisy data elimination using mutual k-nearest neighbor for classification mining. Journal of Systems and Software 85 (5):1067–74.
  • Liu, Z.-G., Q. Pan, and J. Dezert. 2013. A new belief-based K-nearest neighbor classification method. Pattern Recognition 46 (3):834–44.
  • Lu, G. 2001. Indexing and retrieval of audio: A survey. Multimedia Tools and Applications 15 (3):269–90.
  • Lu, G.-F., and Y. Wang. 2012. Feature extraction using a fast null space based linear discriminant analysis algorithm. Information Sciences 193:72–80.
  • Malhotra, B., I. Nikolaidis, and J. Harms. 2008. Distributed classification of acoustic targets in wireless audio-sensor networks. Computer Networks 52 (13):2582–93.
  • Mayer, R., R. Neumayer, D. Baum, and A. Rauber. 2009. Analytic comparison of self-organising maps. In Advances in self-organizing maps, Eds. J. Príncipe, and R. Miikkulainen, Springer Berlin Heidelberg, 5629:182–90.
  • McConaghy, T., H. Leung, E. Bosse, and V. Varadan. 2003. Classification of audio radar signals using radial basis function neural networks. IEEE Transactions on Instrumentation and Measurement 52 (6):1771–79.
  • McLoughlin, I., H. Zhang, Z. Xie, Y. Song, and W. Xiao. 2015. Robust sound event classification using deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (3):540–52.
  • Meyer, C., and H. Schramm. 2006. Boosting HMM acoustic models in large vocabulary speech recognition. Speech Communication 48 (5):532–48.
  • Milone, D. H., J. R. Galli, C. A. Cangiano, H. L. Rufiner, and E. A. Laca. 2012. Automatic recognition of ingestive sounds of cattle based on hidden Markov models. Computers and Electronics in Agriculture 87:51–55.
  • Mitchell, T. 1999. The role of unlabeled data in supervised learning. Proceedings of the sixth international colloquium on cognitive science, Citeseer.
  • Mitra, V., and C.-J. Wang. 2008. Content based audio classification: A neural network approach. Soft Computing 12 (7):639–46.
  • Moreno, P. J., and S. Agarwal. 2003. An experimental study of EM-based algorithms for semi-supervised learning in audio classification. ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.
  • Muhammad, G., and M. Melhem. 2014. Pathological voice detection and binary classification using MPEG-7 audio features. Biomedical Signal Processing and Control 11:1–9.
  • Muñoz-Expósito, J. E., S. García-Galán, N. Ruiz-Reyes, and P. Vera-Candeas. 2007. Adaptive network-based fuzzy inference system vs. other classification algorithms for warped LPC-based speech/music discrimination. Engineering Applications of Artificial Intelligence 20 (6):783–93.
  • Navarathna, R., D. Dean, S. Sridharan, and P. Lucey. 2013. Multiple cameras for audio-visual speech recognition in an automotive environment. Computer Speech & Language 27 (4):911–27.
  • Neiberg, D., G. Salvi, and J. Gustafson. 2013. Semi-supervised methods for exploring the acoustics of simple productive feedback. Speech Communication 55 (3):451–69.
  • Niessen, M. E., T. L. M. Van Kasteren, and A. Merentitis. 2013. Hierarchical modeling using automated sub-clustering for sound event recognition. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
  • Nilsson, N. J. 1965. Learning machines: Foundations of trainable pattern classifying systems. New York: McGraw-Hill.
  • Nirmal, J., S. Patnaik, M. Zaveri, and P. Kachare. 2013. Multi-scale speaker transformation using radial basis function. Procedia Technology 10:311–19.
  • Nozaki, K., H. Ishibuchi, and H. Tanaka. 1996. Adaptive fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems 4 (3):238–50.
  • Oppenheim, A. V., R. W. Schafer, and J. R. Buck. 1989. Discrete-time signal processing. Englewood Cliffs: Prentice-Hall.
  • Orio, N. 2010. Automatic identification of audio recordings based on statistical modeling. Signal Processing 90 (4):1064–76.
  • Parascandolo, G., H. Huttunen, and T. Virtanen. 2016. Recurrent neural networks for polyphonic sound event detection in real life recordings. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 6440–44.
  • Park, D.-C. 2009. Classification of audio signals using fuzzy c-means with divergence-based kernel. Pattern Recognition Letters 30 (9):794–98.
  • Pellegrini, T., J. Portêlo, I. Trancoso, A. Abad, and M. Bugalho. 2009. Hierarchical clustering experiments for application to audio event detection. Proceedings of the 13th International Conference on Speech and Computer.
  • Pimentel, M. A. F., D. A. Clifton, L. Clifton, and L. Tarassenko. 2014. A review of novelty detection. Signal Processing 99:215–49.
  • Polikar, R., L. Udpa, S. S. Udpa, and V. Honavar. 2001. Learn++: An incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 31 (4):497–508.
  • Pomponi, E., and A. Vinogradov. 2013. A real-time approach to acoustic emission clustering. Mechanical Systems and Signal Processing 40 (2):791–804.
  • Potamitis, I., S. Ntalampiras, O. Jahn, and K. Riede. 2014. Automatic bird sound detection in long real-field recordings: Applications and tools. Applied Acoustics 80:1–9.
  • Prakash, V. J., and D. L. Nithya. 2014. A survey on semi-supervised learning techniques. International Journal of Computer Trends and Technology (IJCTT) 8 (1):25–29.
  • Principe, J. C., N. R. Euliano, and W. C. Lefebvre. 2000. Neural and adaptive systems: Fundamentals through simulations. New York: Wiley.
  • Prodanov, P., and A. Drygajlo. 2005. Bayesian networks based multi-modality fusion for error handling in human–robot dialogues under noisy conditions. Speech Communication 45 (3):231–48.
  • Qiang, H., and S. Cox. 2011. Inferring the structure of a tennis game using audio information. IEEE Transactions on Audio, Speech, and Language Processing 19 (7):1925–37.
  • Radhakrishnan, R., A. Divakaran, and P. Smaragdis. 2005. Audio analysis for surveillance applications. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005.
  • Ravan, M., and S. Beheshti. 2011. Speech recognition from adaptive windowing PSD estimation. 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 524–27.
  • Reaves, B. 1991. Comments on “An improved endpoint detector for isolated word recognition.” IEEE Transactions on Signal Processing 39 (2):526–27.
  • Reynolds, D. A. 1995. Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17 (1–2):91–108.
  • Reynolds, D. A., T. F. Quatieri, and R. B. Dunn. 2000. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10 (1–3):19–41.
  • Rojek, I., M. Jagodziński, et al. 2012. Hybrid artificial intelligence system in constraint based scheduling of integrated manufacturing ERP systems. In Hybrid artificial intelligent systems, Eds. E. Corchado, V. Snášel, and A. Abraham, Springer Berlin Heidelberg, 7209:229–40.
  • Rokach, L. 2009. Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics & Data Analysis 53 (12):4046–72.
  • Rongyan, W., L. Gang, G. Jun, and M. Zhenxin. 2010. Semi-supervised learning for automatic audio events annotation using TSVM. International Conference on Computer Application and System Modeling (ICCASM).
  • Ruiz Reyes, N., P. Vera Candeas, S. García Galán, and J. E. Muñoz. 2010. Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination. Engineering Applications of Artificial Intelligence 23 (2):151–59.
  • Santos, A., and A. Canuto. 2014. Applying semi-supervised learning in hierarchical multi-label classification. Expert Systems with Applications 41 (14):6075–85.
  • Sathya, R., and A. Abraham. 2013. Comparison of supervised and unsupervised learning algorithms for pattern classification. (IJARAI) International Journal of Advanced Research in Artificial Intelligence 2 (2). http://thesai.org/Publications/ViewPaper?Volume=2&Issue=2&Code=IJARAI&SerialNo=6
  • Scheme, E. J., B. Hudgins, and P. A. Parker. 2007. Myoelectric signal classification for phoneme-based speech recognition. IEEE Transactions on Biomedical Engineering 54 (4):694–99.
  • Schlüter, J. 2016. Learning to pinpoint singing voice from weakly labeled examples. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 44–50.
  • Schölkopf, B., A. Smola, and K.-R. Müller. 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 (5):1299–319.
  • Schroeder, J., S. Wabnik, P. J. Hengel, and S. Goetze. 2011. Detection and classification of acoustic events for in-home care. In Ambient assisted living, Eds. R. Wichert, and B. Eberhardt, Springer Berlin Heidelberg, 181–195.
  • Schuller, B., A. Batliner, S. Steidl, and D. Seppi. 2011. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication 53 (9–10):1062–87.
  • Schwarz, P., P. Matejka, and J. Cernocky. 2006. Hierarchical structures of neural networks for phoneme recognition. IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP 2006 Proceedings.
  • Sharma, S., and R. Lal Yadav. 2013. Comparative study of K-means and robust clustering. International Journal of Advanced Computer Research 3 (12):207–210.
  • Shen, J., J. Shepherd, and A. H. H. Ngu. 2006. Towards effective content-based music retrieval with multiple acoustic feature combination. IEEE Transactions on Multimedia 8 (6):1179–89.
  • Shuiping, W., T. Zhenming, and L. Shiqiang. 2011. Design and implementation of an audio classification system based on SVM. Procedia Engineering 15:4031–35.
  • Skurichina, M., and R. P. W. Duin. 2002. Bagging, boosting and the random subspace method for linear classifiers. Pattern Analysis & Applications 5 (2):121–35.
  • Stavrakoudis, D. G., I. Z. Gitas, and J. B. Theocharis. 2011. A hierarchical genetic fuzzy rule-based classifier for high-dimensional classification problems. IEEE International Conference on Fuzzy Systems (FUZZ).
  • Sturim, D. E., P. A. Torres-Carrasquillo, T. F. Quatieri, N. Malyska, and A. McCree. 2011. Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis. Interspeech, Florence, Italy, ISCA.
  • Su, P.-C., C.-H. Lan, C.-S. Wu, Z.-X. Zeng, and W.-Y. Chen. 2013. Transition effect detection for extracting highlights in baseball videos. EURASIP Journal on Image and Video Processing 2013 (1):1–16.
  • Sun, Y., S. Todorovic, and J. Li. 2006. Reducing the overfitting of AdaBoost by controlling its data distribution skewness. International Journal of Pattern Recognition and Artificial Intelligence 20 (7):1093–116.
  • Tao, C. W. 2002. A reduction approach for fuzzy rule bases of fuzzy controllers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 32 (5):668–75.
  • Temko, A., D. Macho, and C. Nadeu. 2008. Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognition 41 (5):1814–23.
  • Temko, A., and C. Nadeu. 2006. Classification of acoustic events using SVM-based clustering schemes. Pattern Recognition 39 (4):682–94.
  • Temko, A., and C. Nadeu. 2009. Acoustic event detection in meeting-room environments. Pattern Recognition Letters 30 (14):1281–88.
  • Tianzhu, Z., X. Changsheng, Z. Guangyu, L. Si, and L. Hanqing. 2012. A generic framework for video annotation via semi-supervised learning. IEEE Transactions on Multimedia 14 (4):1206–19.
  • Tin Kam, H. 1998. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8):832–44.
  • Tong, Z., and C. C. J. Kuo. 2001. Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing 9 (4):441–57.
  • Triguero, I., J. A. Sáez, J. Luengo, S. García, and F. Herrera. 2014. On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing 132:30–41.
  • Truong, T. K., C.-C. Lin, and S.-H. Chen. 2007. Segmentation of specific speech signals from multi-dialog environment using SVM and wavelet. Pattern Recognition Letters 28 (11):1307–13.
  • Tsunoo, E., G. Tzanetakis, N. Ono, and S. Sagayama. 2011. Beyond timbral statistics: Improving music classification using percussive patterns and bass lines. IEEE Transactions on Audio, Speech, and Language Processing 19 (4):1003–14.
  • Turnbull, D., and C. Elkan. 2005. Fast recognition of musical genres using RBF networks. IEEE Transactions on Knowledge and Data Engineering 17 (4):580–84.
  • Tzortzis, G., and A. Likas. 2008. The global kernel k-means clustering algorithm. IEEE International Joint Conference on Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence), IEEE.
  • Uncini, A. 2003. Audio signal processing by neural networks. Neurocomputing 55 (3–4):593–625.
  • Vapnik, V. 1998. Statistical learning theory. New York: Wiley.
  • Wang, X., and X.-P. Zhang. 2012. Ice hockey shooting event modeling with mixture hidden Markov model. Multimedia Tools and Applications 57 (1):131–44.
  • Weimin, H., C. Tuan-Kiang, L. Haizhou, K. Tian Shiang, and J. Biswas. 2010. Scream detection for home applications. 5th IEEE Conference on Industrial Electronics and Applications (ICIEA).
  • Xiaodan, Z., H. Jing, G. Potamianos, and M. Hasegawa-Johnson. 2009. Acoustic fall detection using Gaussian mixture models and GMM supervectors. IEEE International Conference on Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.
  • Xu, Q., L. Zhang, and W. Liang. 2013. Acoustic detection technology for gas pipeline leakage. Process Safety and Environmental Protection 91 (4):253–61.
  • Yanan, L., Y. Yilong, L. Lili, P. Shaohua, and Y. Qiuhong. 2012. Semi-supervised gait recognition based on self-training. IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), 2012.
  • Yang, J.-C., L.-A. Liu, Q.-W. Qin, and M. Zhang. 2013. Audio event change detection and clustering in movies. Journal of Multimedia 8 (2):113–20.
  • Yangqiu, S., and Z. Changshui. 2008. Content-based information fusion for semi-supervised music genre classification. IEEE Transactions on Multimedia 10 (1):145–52.
  • Yaochu, J. 2000. Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement. IEEE Transactions on Fuzzy Systems 8 (2):212–21.
  • Ya-Ti, P., L. Ching-Yung, S. Ming-Ting, and T. Kun-Cheng. 2009. Healthcare audio event classification using hidden Markov models and hierarchical hidden Markov models. IEEE International Conference on Multimedia and Expo 2009. ICME 2009.
  • Ye, J., and S. Ji. 2009. Discriminant analysis for dimensionality reduction: An overview of recent developments. In Biometrics, 1–19. John Wiley & Sons, Inc. Published online 11 November 2009.
  • Ye, T., W. Zuoying, and L. Dajin. 2002. Nonspeech segment rejection based on prosodic information for robust speech recognition. IEEE Signal Processing Letters 9 (11):364–67.
  • Younghyun, L., K. Hanseok, and D. K. Han. 2013. Acoustic signal based abnormal event detection system with multiclass adaboost. IEEE International Conference on Consumer Electronics (ICCE), 2013.
  • Yunyun, W., C. Songcan, and Z. Zhi-Hua. 2012. New semi-supervised classification method based on modified cluster assumption. IEEE Transactions on Neural Networks and Learning Systems 23 (5):689–702.
  • Zadeh, L. A. 1996. Fuzzy sets and their application to pattern classification and clustering analysis. In Fuzzy sets, fuzzy logic, and fuzzy systems, Eds. G. J. Klir, and B. Yuan, 355–93. World Scientific Publishing Co., Inc.
  • Zhao, Y., and G. Karypis. 2001. Criterion functions for document clustering: Experiments and analysis. Technical report, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA.
  • Zhu, L., and Q. Yang. 2012. Speaker recognition system based on weighted feature parameter. Physics Procedia 25:1515–22.
  • Zhu, X., and A. B. Goldberg. 2009. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3 (1):1–130.
  • Zolfaghari, P., and T. Robinson. 1996. Formant analysis using mixtures of Gaussians. Fourth International Conference on Spoken Language, 1996. ICSLP 96. Proceedings.
  • Zubair, S., F. Yan, and W. Wang. 2013. Dictionary learning based sparse coefficients for audio classification with max and average pooling. Digital Signal Processing 23 (3):960–70.
  • Zweig, G. 2003. Bayesian network structures and inference techniques for automatic speech recognition. Computer Speech & Language 17 (2–3):173–93.
