
Applied Artificial Intelligence

An International Journal

Volume 31, 2017 - Issue 9-10



Original Articles

An Overview of Audio Event Detection Methods from Feature Extraction to Classification

Elham Babaee, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Nor Badrul Anuar, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Ainuddin Wahid Abdul Wahab, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Shahaboddin Shamshirband (corresponding author), Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

http://orcid.org/0000-0002-6605-498X

Anthony T. Chronopoulos, Department of Computer Science, University of Texas, San Antonio, USA; Visiting Faculty, Department of Computer Engineering & Informatics, University of Patras, Rio, Greece

Pages 661-714 | Published online: 05 Feb 2018

https://doi.org/10.1080/08839514.2018.1430469


References

Acır, N., Ö. Özdamar, and C. Güzeliş. 2006. Automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection. Engineering Applications of Artificial Intelligence 19 (2):209–18.
Web of Science ®Google Scholar
Agrawala, A. 1970. Learning with a probabilistic teacher. IEEE Transactions on Information Theory 16 (4):373–79.
Web of Science ®Google Scholar
Alcala-Fdez, J., R. Alcala, and F. Herrera. 2011. A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Transactions on Fuzzy Systems 19 (5):857–72.
Web of Science ®Google Scholar
Andreassen, T., A. Surlykke, and J. Hallam. 2014. Semi-automatic long-term acoustic surveying: A case study with bats. Ecological Informatics 21:13–24.
Web of Science ®Google Scholar
Arnold, M. 2002. Subjective and objective quality evaluation of watermarked audio tracks. Proceedings. Second International Conference on Web Delivering of Music. WEDELMUSIC. IEEE.
Google Scholar
Atrey, P. K., M. C. Maddage, and M. S. Kankanhalli. 2006. Audio based event detection for multimedia surveillance. International Conference on Acoustics, Speech and Signal Processing, ICASSP Proceedings, IEEE.
Google Scholar
Bailey, T., and A. K. Jain. 1978. A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems, Man and Cybernetics 8 (4):311–13.
Web of Science ®Google Scholar
Balochian, S., E. A. Seidabad, and S. Z. Rad. 2013. Neural network optimization by genetic algorithms for the audio classification to speech and music. International Journal of Signal Processing, Image Processing & Pattern Recognition 6 (3).
Google Scholar
Bardeli, R., D. Wolff, F. Kurth, M. Koch, K. H. Tauchert, and K. H. Frommolt. 2010. Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring. Pattern Recognition Letters 31 (12):1524–34.
Web of Science ®Google Scholar
Bauer, E., and R. Kohavi. 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 36 (1–2):105–39.
Web of Science ®Google Scholar
Baum, L. E., and T. Petrie. 1966. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 37 (6):1554–63.
Google Scholar
Besacier, L., J. F. Bonastre, and C. Fredouille. 2000. Localization and selection of speaker-specific information with statistical modeling. Speech Communication 31 (2–3):89–106.
Web of Science ®Google Scholar
Bhatia, N. 2010. Survey of nearest neighbor techniques. International Journal of Computer Science and Information Security (IJCSIS) 8 (2):302–305.
Google Scholar
Bhavsar, H., and A. Ganatra. 2012. A comparative study of training algorithms for supervised machine learning. International Journal of Soft Computing and Engineering (IJSCE) 2 (4):2231–2307.
Google Scholar
Bin, M., L. Haizhou, and T. Rong. 2007. Spoken language recognition using ensemble classifiers. IEEE Transactions on Audio, Speech, and Language Processing 15 (7):2053–62.
Google Scholar
Blum, A., and T. Mitchell. 1998. Combining labeled and unlabeled data with co-training. Proceedings of the eleventh annual conference on Computational learning theory, ACM.
Google Scholar
Bourlard, H. A., and N. Morgan. 1993. Connectionist speech recognition: A hybrid approach. Kluwer Academic Publishers. https://link.springer.com/book/10.1007%2F978-1-4615-3210-1
Google Scholar
Breiman, L. 1996. Bagging predictors. Machine Learning 24 (2):123–40.
Web of Science ®Google Scholar
Breiman, L. 2001. Random forests. Machine Learning 45 (1):5–32.
Web of Science ®Google Scholar
Buckley, J. J., and Y. Hayashi. 1994. Fuzzy genetic algorithm and applications. Fuzzy Sets and Systems 61 (2):129–36.
Web of Science ®Google Scholar
Busso, C., S. Lee, and S. Narayanan. 2009. Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing 17 (4):582–96.
Web of Science ®Google Scholar
Cakir, E., T. Heittola, H. Huttunen, and T. Virtanen. 2015. Polyphonic sound event detection using multi label deep neural networks. Neural Networks (IJCNN), 2015 International Joint Conference on, IEEE. pp. 1–7.
Google Scholar
Campbell, J. P., Jr. 1997. Speaker recognition: A tutorial. Proceedings of the IEEE 85 (9):1437–62.
Web of Science ®Google Scholar
Carpenter, G. A., S. Grossberg, and J. H. Reynolds. 1991. ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks 4 (5):565–88.
Web of Science ®Google Scholar
Charalampidis, D., M. Georgiopoulos, and T. Kasparis. 2000. Classification of noisy signal using fuzzy ARTMAP neural networks. International Joint Conference on Neural Networks, IJCNN2000, Proceedings of the IEEE-INNS-ENNS.
Google Scholar
Cheng, J., Y. Sun, and L. Ji. 2010. A call-independent and automatic acoustic system for the individual recognition of animals: A novel model using four passerines. Pattern Recognition 43 (11):3846–52.
Web of Science ®Google Scholar
Choi, J.-H., and J.-H. Chang. 2012. On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication 54 (3):477–90.
Web of Science ®Google Scholar
Chuan, C.-H. 2013. Audio classification and retrieval using wavelets and gaussian mixture models. International Journal Multimed Data Engineering Managed 4 (1):1–20.
Google Scholar
Chung-Hsien, W., and H. Chia-Hsin. 2006. Multiple change-point audio segmentation and classification using an MDL-based Gaussian model. IEEE Transactions on Audio, Speech, and Language Processing 14 (2):647–57.
Google Scholar
Cintra, M. E., M. C. Monard, E. A. Cherman, and H. De Arruda Camargo. 2011. On the estimation of the number of fuzzy sets for fuzzy rule-based classification systems. 11th International Conference on Hybrid Intelligent Systems (HIS), 2011.
Google Scholar
Clavel, C., T. Ehrette, and G. Richard. 2005. Events detection for an audio-based surveillance system. IEEE International Conference on Multimedia and Expo, 2005. ICME 2005.
Google Scholar
Cohen, I., N. Sebe, F. G. Gozman, M. C. Cirelo, and T. S. Huang. 2003. Learning Bayesian network classifiers for facial expression recognition both labeled and unlabeled data. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.
Google Scholar
Cordón, O., M. J. Del Jesus, and F. Herrera. 1999. A proposal on reasoning methods in fuzzy rule-based classification systems. International Journal of Approximate Reasoning 20 (1):21–45.
Web of Science ®Google Scholar
Costa, Y. M. G., L. S. Oliveira, A. L. Koerich, F. Gouyon, and J. G. Martins. 2012. Music genre classification using LBP textural features. Signal Processing 92 (11):2723–37.
Web of Science ®Google Scholar
Cover, T., and P. Hart. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13 (1):21–27.
Web of Science ®Google Scholar
Cui, X., H. Jing, and C. Jen-Tzung. 2012. Multi-view and multi-objective semi-supervised learning for HMM-based automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20 (7):1923,1935.
Google Scholar
Dafna, E., A. Tarasiuk, and Y. Zigel. 2013. Automatic detection of whole night snoring events using non-contact microphone. PLoS One 8 (12). doi:10.1371/journal.pone.0084139
PubMed Web of Science ®Google Scholar
Damper, R. I., and J. E. Higgins. 2003. Improving speaker identification in noise by subband processing and decision fusion. Pattern Recognition Letters 24 (13):2167–73.
Web of Science ®Google Scholar
Daoudi, K., D. Fohr, and C. Antoine. 2003. Dynamic Bayesian networks for multi-band automatic speech recognition. Computer Speech & Language 17 (2–3):263–85.
Web of Science ®Google Scholar
Davis, S. B., and P. Mermelstein. 1980. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing 28 (4):357–66.
Google Scholar
Deller, J. J. R., J. H. L. Hansen, and J. G. Proakis. (2000). Discrete-Time Processing of Speech Signals. Hoboken, New Jersey, USA: Wiley-IEEE Press.
Google Scholar
Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2009. Classification of audio signals using SVM and RBFNN. Expert Systems with Applications 36 (3):6069–75.
Web of Science ®Google Scholar
Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2011a. Classification of audio signals using AANN and GMM. Applied Soft Computing 11 (1):716–23.
Web of Science ®Google Scholar
Dhanalakshmi, P., S. Palanivel, and V. Ramalingam. 2011b. Pattern classification models for classifying and indexing audio signals. Engineering Applications of Artificial Intelligence 24 (2):350–57.
Web of Science ®Google Scholar
Dietterich, T. 2000a. Ensemble methods in machine learning. Multiple Classifier Systems, Springer Berlin Heidelberg 1857:1–15.
Google Scholar
Dietterich, T. 2000b. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40 (2):139–57.
Web of Science ®Google Scholar
Driggers, R. G. 2003. Encyclopedia of Optical Engineering (Vol. 3). Maryland, USA: Marcel Dekker Inc.
Google Scholar
Drugman, T. 2014. Using mutual information in supervised temporal event detection: Application to cough detection. Biomedical Signal Processing and Control 10:50–57.
Web of Science ®Google Scholar
Espi, M., M. Fujimoto, K. Kinoshita, and T. Nakatani. 2015. Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP Journal on Audio, Speech, and Music Processing 2015 (26): 1–12.
Web of Science ®Google Scholar
Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (2):179–88.
Google Scholar
Freund, Y., and R. E. Schapire. 1996. Experiments with a new boosting algorithm. ICML, Bari, Italy Morgan Kaufmann Publishers Inc.San Francisco, CA, USA.
Google Scholar
Freund, Y., and R. E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1):119–39.
Web of Science ®Google Scholar
Friedman, N., D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning 29 (2–3):131–63.
Web of Science ®Google Scholar
Ganapathy, S., P. Rajan, and H. Hermansky. 2011. Multi-layer perceptron based speech activity detection for speaker verification. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
Google Scholar
Gencoglu, O., T. Virtanen, and H. Huttunen. 2014. Recognition of acoustic events using deep neural networks. Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European, IEEE, pp. 506–10.
Google Scholar
Gergen, S., A. Nagathil, and R. Martin. 2014. Classification of reverberant audio signals using clustered ad hoc distributed microphones. Signal Processing 107:21–32.
Web of Science ®Google Scholar
Giannakopoulos, T., D. Kosmopoulos, A. Aristidou, and S. Theodoridis. 2006. Violence content classification using audio features. Advances in Artificial Intelligence, Berlin, Heidelberg: Springer, 502–07.
Google Scholar
Giannakopoulos, T., and A. Pikrakis. 2014. Chapter 4 - audio features. In Introduction to audio analysis, Eds. T. Giannakopoulos, and A. Pikrakis, 59–103. Oxford: Academic Press.
Google Scholar
Giannakopoulos, T., A. Pikrakis, and S. Theodoridis. 2007. A multi-class audio classification method with respect to violent content in movies using bayesian networks. Multimedia Signal Processing, 2007. MMSP 2007. IEEE 9th Workshop on.
Google Scholar
Grossberg, S. 1976. Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions. Biological Cybernetics 23 (4):187–202.
PubMed Web of Science ®Google Scholar
Guz, U., S. Cuendet, D. Hakkani-Tür, and G. Tur. 2010. Multi-view semi-supervised learning for dialog act segmentation of speech. IEEE Transactions on Audio, Speech, and Language Processing 18 (2):320,329.
Google Scholar
Hall, M. 2007. A decision tree-based attribute weighting filter for naive Bayes. Knowledge-Based Systems 20 (2):120–26.
Web of Science ®Google Scholar
Hermansky, H. 1990. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America 87 (4):1738–52.
PubMed Web of Science ®Google Scholar
Hongwei, W., and J. M. Mendel. 2007. Classification of battlefield ground vehicles using acoustic features and fuzzy logic rule-based classifiers. IEEE Transactions on Fuzzy Systems 15 (1):56–72.
Web of Science ®Google Scholar
Huang, C.-J., Y.-J. Yang, D.-X. Yang, and Y.-J. Chen. 2009. Frog classification using machine learning techniques. Expert Systems with Applications 36 (2, Part 2):3737–43.
Web of Science ®Google Scholar
Itoh, H., T. Takiguchi, and Y. Ariki. 2013. Event detection and recognition using HMM with whistle sounds. Signal-Image Technology & Internet-Based Systems (SITIS), 2013 International Conference on.
Google Scholar
Jain, A. K., R. P. W. Duin, and M. Jianchang. 2000. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1):4–37.
Web of Science ®Google Scholar
Janik, P., and T. Lobos. 2006. Automated classification of power-quality disturbances using SVM and RBF networks. IEEE Transactions on Power Delivery 21 (3):1663–69.
Web of Science ®Google Scholar
Joachims, T. 1999. Transductive inference for text classification using support vector machines. ICML, Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.
Google Scholar
Kalteh, A. M., P. Hjorth, and R. Berndtsson. 2008. Review of the self-organizing map (SOM) approach in water resources: Analysis, modelling and application. Environmental Modelling & Software 23 (7):835–45.
Web of Science ®Google Scholar
Kaufman, L., and P. J. Rousseeuw. 1990. Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Google Scholar
Khairnar, D. G., S. N. Merchant, and U. B. Desai. 2005. An optimum RBF network for signal detection in non-gaussian noise. In Pattern recognition and machine intelligence, Eds. S. Pal, S. Bandyopadhyay, and S. Biswas, Springer Berlin Heidelberg, 3776:306–309.
Google Scholar
Khunarsal, P., C. Lursinsap, and T. Raicharoen. 2013. Very short time environmental sound classification based on spectrogram pattern matching. Information Sciences 243:57–74.
Web of Science ®Google Scholar
Kinnunen, T., and H. Li. 2010. An overview of text-independent speaker recognition: From features to supervectors. Speech Communication 52 (1):12–40.
Web of Science ®Google Scholar
Kinnunen, T., I. Sidoroff, M. Tuononen, and P. Fränti. 2011. Comparison of clustering methods: A case study of text-independent speaker modeling. Pattern Recognition Letters 32 (13):1604–17.
Web of Science ®Google Scholar
Kinnunen, T., B. Zhang, J. Zhu, and Y. Wang. 2007. Speaker verification with adaptive spectral subband centroids. In Advances in biometrics, Eds. S.-W. Lee, and S. Li, Springer Berlin Heidelberg, 4642: 58–66.
Google Scholar
Kohonen, T. 1982. Analysis of a simple self-organizing process. Biological Cybernetics 44 (2):135–40.
Web of Science ®Google Scholar
Kotti, M., E. Benetos, C. Kotropoulos, and I. Pitas. 2007. A neural network approach to audio-assisted movie dialogue detection. Neurocomputing 71 (1–3):157–66.
Web of Science ®Google Scholar
Kulkarni, V. Y., and P. K. Sinha. 2013. Random forest classifiers: A survey and future research directions. Int Journal of Advanced Computing 36 (1):1144–53.
Google Scholar
Kumar, A., and B. Raj. 2016. Audio event detection using weakly labeled data. Proceedings of the 2016 ACM on Multimedia Conference, ACM, pp. 1038–47.
Google Scholar
Lamel, L., L. Rabiner, A. E. Rosenberg, and J. G. Wilpon. 1981. An improved endpoint detector for isolated word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 29 (4):777–85.
Google Scholar
Larsen, B., and C. Aone. 1999. Fast and effective text mining using linear-time document clustering. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM.
Google Scholar
Lee, C.-H., C.-H. Chou, -C.-C. Han, and R.-Z. Huang. 2006. Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis. Pattern Recognition Letters 27 (2):93–101.
Web of Science ®Google Scholar
Lefèvre, S., and N. Vincent. 2011. A two level strategy for audio segmentation. Digital Signal Processing 21 (2):270–77.
Web of Science ®Google Scholar
Li, D., I. K. Sethi, N. Dimitrova, and T. McGee. 2001. Classification of general audio data for content-based retrieval. Pattern Recognition Letters 22 (5):533–44.
Web of Science ®Google Scholar
Li, H., T. Zhang, and L. Ma. 2012. Confirmation based self-learning algorithm in LVCSR’s semi-supervised incremental learning. Procedia Engineering 29:754–59.
Google Scholar
Li, L., G. Fengpei, Z. Qingwei, and Y. Yonghong. 2010. Detecting cheering events in sports games. 2nd International Conference on Education Technology and Computer (ICETC).
Google Scholar
Li, X., L. Wang, and E. Sung. 2008. AdaBoost with SVM-based component classifiers. Engineering Applications of Artificial Intelligence 21 (5):785–95.
Web of Science ®Google Scholar
Lie, L., Z. Hong-Jiang, and J. Hao. 2002. Content analysis for audio classification and segmentation. IEEE Transactions on Speech and Audio Processing 10 (7):504–16.
Google Scholar
Lin, L., Y. Li, and A. Sadek. 2013. A k nearest neighbor based local linear wavelet neural network model for on-line short-term traffic volume prediction. Procedia - Social and Behavioral Sciences 96:2066–77.
Google Scholar
Liu, H., and S. Zhang. 2012. Noisy data elimination using mutual k-nearest neighbor for classification mining. Journal of Systems and Software 85 (5):1067–74.
Web of Science ®Google Scholar
Liu, Z.-G., Q. Pan, and J. Dezert. 2013. A new belief-based K-nearest neighbor classification method. Pattern Recognition 46 (3):834–44.
Web of Science ®Google Scholar
Lu, G. 2001. Indexing and retrieval of audio: A survey. Multimedia Tools and Applications 15 (3):269–90.
Web of Science ®Google Scholar
Lu, G.-F., and Y. Wang. 2012. Feature extraction using a fast null space based linear discriminant analysis algorithm. Information Sciences 193:72–80.
Web of Science ®Google Scholar
Malhotra, B., I. Nikolaidis, and J. Harms. 2008. Distributed classification of acoustic targets in wireless audio-sensor networks.”. Computation Network 52 (13):2582–93.
Web of Science ®Google Scholar
Mayer, R., R. Neumayer, D. Baum, and A. Rauber. 2009. Analytic comparison of self-organising maps. In Advances in self-organizing maps, Eds. J. Príncipe, and R. Miikkulainen, Springer Berlin Heidelberg, 5629: 182–190.
Google Scholar
McConaghy, T., H. Leung, E. Bosse, and V. Varadan. 2003. Classification of audio radar signals using radial basis function neural networks. IEEE Transactions on Instrumentation and Measurement 52 (6):1771–79.
Web of Science ®Google Scholar
McLoughlin, I., H. Zhang, Z. Xie, Y. Song, and W. Xiao. 2015. Robust sound event classification using deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (3):540–52.
Web of Science ®Google Scholar
Meyer, C., and H. Schramm. 2006. Boosting HMM acoustic models in large vocabulary speech recognition. Speech Communication 48 (5):532–48.
Web of Science ®Google Scholar
Milone, D. H., J. R. Galli, C. A. Cangiano, H. L. Rufiner, and E. A. Laca. 2012. Automatic recognition of ingestive sounds of cattle based on hidden Markov models. Computers and Electronics in Agriculture 87:51–55.
Web of Science ®Google Scholar
Mitchell, T. 1999. The role of unlabeled data in supervised learning. Proceedings of the sixth international colloquium on cognitive science, Citeseer.
Google Scholar
Mitra, V., and C.-J. Wang. 2008. Content based audio classification: A neural network approach. Soft Computing 12 (7):639–46.
Web of Science ®Google Scholar
Moreno, P. J., and S. Agarwal. 2003. An experimental study of EM-based algorithms for semi-supervised learning in audio classification. ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.
Google Scholar
Muhammad, G., and M. Melhem. 2014. Pathological voice detection and binary classification using MPEG-7 audio features. Biomedical Signal Processing and Control 11:1–9.
Web of Science ®Google Scholar
Muñoz-Expósito, J. E., S. García-Galán, N. Ruiz-Reyes, and P. Vera-Candeas. 2007. Adaptive network-based fuzzy inference system vs. other classification algorithms for warped LPC-based speech/music discrimination. Engineering Applications of Artificial Intelligence 20 (6):783–93.
Web of Science ®Google Scholar
Navarathna, R., D. Dean, S. Sridharan, and P. Lucey. 2013. Multiple cameras for audio-visual speech recognition in an automotive environment. Computer Speech & Language 27 (4):911–27.
Web of Science ®Google Scholar
Neiberg, D., G. Salvi, and J. Gustafson. 2013. Semi-supervised methods for exploring the acoustics of simple productive feedback. Speech Communication 55 (3):451–69.
Web of Science ®Google Scholar
Niessen, M. E., T. L. M. Van Kasteren, and A. Merentitis. 2013. Hierarchical modeling using automated sub-clustering for sound event recognition. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
Google Scholar
Nillson, N. 1965. Learning machines: Foundations of trainable pattern classifying systems. New York: McGraw-Hill.
Google Scholar
Nirmal, J., S. Patnaik, M. Zaveri, and P. Kachare. 2013. Multi-scale speaker transformation using radial basis function. Procedia Technology 10:311–19.
Google Scholar
Nozaki, K., H. Ishibuchi, and H. Tanaka. 1996. Adaptive fuzzy rule-based classification systems. IEEE Transactions on Fuzzy Systems 4 (3):238–50.
Web of Science ®Google Scholar
Oppenheim, A. V., R. W. Schafer, and J. R. Buck. 1989. Discrete-time signal processing. Englewood Cliffs: Prentice-hall.
Google Scholar
Orio, N. 2010. Automatic identification of audio recordings based on statistical modeling. Signal Processing 90 (4):1064–76.
Web of Science ®Google Scholar
Parascandolo, G., H. Huttunen, and T. Virtanen. 2016. Recurrent neural networks for polyphonic sound event detection in real life recordings. Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, IEEE, pp. 6440–44.
Google Scholar
Park, D.-C. 2009. Classification of audio signals using Fuzzy c-means with divergence-based Kernel. Pattern Recognition Letters 30 (9):794–98.
Web of Science ®Google Scholar
Pellegrini, T., J. Portêlo, I. Trancoso, A. Abad, and M. Bugalho. 2009. Hierarchical clustering experiments for application to audio event detection. Proceedings of the 13th International Conference on Speech and Computer.
Google Scholar
Pimentel, M. A. F., D. A. Clifton, L. Clifton, and L. Tarassenko. 2014. A review of novelty detection. Signal Processing 99:215–49.
Web of Science ®Google Scholar
Polikar, R., L. Upda, S. S. Upda, and V. Honavar. 2001. Learn++: An incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 31 (4):497–508.
Web of Science ®Google Scholar
Pomponi, E., and A. Vinogradov. 2013. A real-time approach to acoustic emission clustering. Mechanical Systems and Signal Processing 40 (2):791–804.
Web of Science ®Google Scholar
Potamitis, I., S. Ntalampiras, O. Jahn, and K. Riede. 2014. Automatic bird sound detection in long real-field recordings: Applications and tools. Applied Acoustics 80:1–9.
Web of Science ®Google Scholar
Prakash, V. J., and D. L. Nithya. 2014. A survey on semi-supervised learning techniques. International Journal of Computer Trends and Technology (IJCTT) 8(1):25–29.
Google Scholar
Principe, J. C., N. R. Euliano, and W. C. Lefebvre. 2000. Neural and adaptive systems: Fundamentals through simulations. New York: Wiley.
Google Scholar
Prodanov, P., and A. Drygajlo. 2005. Bayesian networks based multi-modality fusion for error handling in human–Robot dialogues under noisy conditions. Speech Communication 45 (3):231–48.
Web of Science ®Google Scholar
Qiang, H., and S. Cox. 2011. Inferring the structure of a tennis game using audio information. IEEE Transactions on Audio, Speech, and Language Processing 19 (7):1925–37.
Google Scholar
Radhakrishnan, R., A. Divakaran, and P. Smaragdis. 2005. Audio analysis for surveillance applications. Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on.
Google Scholar
Ravan, M., and S. Beheshti. 2011. Speech recognition from adaptive windowing PSD estimation. 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 524–27.
Google Scholar
Reaves, B. 1991. Comments on “An improved endpoint detector for isolated word recognition. Signal Processing, IEEE Transactions On 39 (2):526–27.
Google Scholar
Reynolds, D. A. 1995. Speaker identification and verification using Gaussian mixture speaker models. Speech Communication 17 (1–2):91–108.
Web of Science ®Google Scholar
Reynolds, D. A., T. F. Quatieri, and R. B. Dunn. 2000. Speaker verification using adapted gaussian mixture models. Digital Signal Processing 10 (1–3):19–41.
Web of Science ®Google Scholar
Rojek, I., M. Jagodziński, et al. 2012. Hybrid artificial intelligence system in constraint based scheduling of integrated manufacturing ERP systems. In Hybrid artificial intelligent systems, Eds. E. Corchado, V. Snášel, and A. Abraham, Springer Berlin Heidelberg, 7209: 229–240.
Google Scholar
Rokach, L. 2009. Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics & Data Analysis 53 (12):4046–72.
Web of Science ®Google Scholar
Rongyan, W., L. Gang, G. Jun, and M. Zhenxin. 2010. Semi-supervised learning for automatic audio events annotation using TSVM. International Conference on Computer Application and System Modeling (ICCASM).
Google Scholar
Ruiz Reyes, N., P. Vera Candeas, S. García Galán, and J. E. Muñoz. 2010. Two-stage cascaded classification approach based on genetic fuzzy learning for speech/music discrimination. Engineering Applications of Artificial Intelligence 23 (2):151–59.
Web of Science ®Google Scholar
Santos, A., and A. Canuto. 2014. Applying semi-supervised learning in hierarchical multi-label classification. Expert Systems with Applications 41(14): 6075–6085.
PubMed Web of Science ®Google Scholar
Sathya, R., and A. Abraham. 2013. Comparison of supervised and unsupervised learning algorithms for pattern classification. (IJARAI) International Journal of Advanced Research in Artificial Intelligence 2 (2). http://thesai.org/Publications/ViewPaper?Volume=2&Issue=2&Code=IJARAI&SerialNo=6
Google Scholar
Scheme, E. J., B. Hudgins, and P. A. Parker. 2007. Myoelectric signal classification for phoneme-based speech recognition. IEEE Transactions on Biomedical Engineering 54 (4):694–99.
PubMed Web of Science ®Google Scholar
Schlüter, J. 2016.Learning to pinpoint singing voice from weakly labeled examples. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 44–50.
Google Scholar
Schölkopf, B., A. Smola, and K.-R. Müller. 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 (5):1299–319.
Web of Science ®Google Scholar
Schroeder, J., S. Wabnik, P. J. Hengel, and S. Goetze. 2011. Detection and classification of acoustic events for in-home care. In Ambient assisted living, Eds. R. Wichert, and B. Eberhardt, Springer Berlin Heidelberg, 181–195.
Google Scholar
Schuller, B., A. Batliner, S. Steidl, and D. Seppi. 2011. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication 53 (9–10):1062–87.
Web of Science ®Google Scholar
Schwarz, P., P. Matejka, and J. Cernocky. 2006. Hierarchical structures of neural networks for phoneme recognition. IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP 2006 Proceedings.
Google Scholar
Sharma, S., and R. Lal Yadav. 2013. Comparative study of K-means and robust clustering. International Journal of Advanced Computer Research 3 (12):207–210.
Google Scholar
Shen, J., J. Shepherd, and A. H. H. Ngu. 2006. Towards effective content-based music retrieval with multiple acoustic feature combination. IEEE Transactions on Multimedia 8 (6):1179–89.
Web of Science ®Google Scholar
Shuiping, W., T. Zhenming, and L. Shiqiang. 2011. Design and implementation of an audio classification system based on SVM. Procedia Engineering 15:4031–35.
Google Scholar
Skurichina, M., and R. P. W. Duin. 2002. Bagging, boosting and the random subspace method for linear classifiers. Pattern Analysis & Applications 5 (2):121–35.
Web of Science ®Google Scholar
Stavrakoudis, D. G., I. Z. Gitas, and J. B. Theocharis. 2011. A hierarchical genetic fuzzy rule-based classifier for high-dimensional classification problems. IEEE International Conference on Fuzzy Systems (FUZZ).
Google Scholar
Sturim, D. E., P. A. Torres-Carrasquillo, T. F. Quatieri, N. Malyska, and A. McCree. 2011. Automatic detection of depression in speech using gaussian mixture modeling with factor analysis. Interspeech, Florence, Italy, ISCA.
Google Scholar
Su, P.-C., C.-H. Lan, C.-S. Wu, Z.-X. Zeng, and W.-Y. Chen. 2013. Transition effect detection for extracting highlights in baseball videos. EURASIP Journal on Image and Video Processing 2013 (1):1–16.
Google Scholar
Sun, Y., S. Todorovic, and J. Li. 2006. Reducing the overfitting of AdaBoost by controlling its data distribution skewness. International Journal of Pattern Recognition and Artificial Intelligence 20 (07):1093–116.
Google Scholar
Tao, C. W. 2002. A reduction approach for fuzzy rule bases of fuzzy controllers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 32 (5):668–75.
PubMed Web of Science ®Google Scholar
Temko, A., D. Macho, and C. Nadeu. 2008. Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognition 41 (5):1814–23.
Web of Science ®Google Scholar
Temko, A., and C. Nadeu. 2006. Classification of acoustic events using SVM-based clustering schemes. Pattern Recognition 39 (4):682–94.
Web of Science ®Google Scholar
Temko, A., and C. Nadeu. 2009. Acoustic event detection in meeting-room environments. Pattern Recognition Letters 30 (14):1281–88.
Tianzhu, Z., X. Changsheng, Z. Guangyu, L. Si, and L. Hanqing. 2012. A generic framework for video annotation via semi-supervised learning. IEEE Transactions on Multimedia 14 (4):1206–19.
Tin Kam, H. 1998. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8):832–44.
Tong, Z., and C. C. J. Kuo. 2001. Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing 9 (4):441–57.
Triguero, I., J. A. Sáez, J. Luengo, S. García, and F. Herrera. 2014. On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing 132:30–41.
Truong, T. K., C.-C. Lin, and S.-H. Chen. 2007. Segmentation of specific speech signals from multi-dialog environment using SVM and wavelet. Pattern Recognition Letters 28 (11):1307–13.
Tsunoo, E., G. Tzanetakis, N. Ono, and S. Sagayama. 2011. Beyond timbral statistics: Improving music classification using percussive patterns and bass lines. IEEE Transactions on Audio, Speech, and Language Processing 19 (4):1003–14.
Turnbull, D., and C. Elkan. 2005. Fast recognition of musical genres using RBF networks. IEEE Transactions on Knowledge and Data Engineering 17 (4):580–84.
Tzortzis, G., and A. Likas. 2008. The global kernel k-means clustering algorithm. IEEE International Joint Conference on Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence), IEEE.
Uncini, A. 2003. Audio signal processing by neural networks. Neurocomputing 55 (3–4):593–625.
Vapnik, V. 1998. Statistical learning theory. New York: Wiley.
Wang, X., and X.-P. Zhang. 2012. Ice hockey shooting event modeling with mixture hidden Markov model. Multimedia Tools and Applications 57 (1):131–44.
Weimin, H., C. Tuan-Kiang, L. Haizhou, K. Tian Shiang, and J. Biswas. 2010. Scream detection for home applications. 5th IEEE Conference on Industrial Electronics and Applications (ICIEA).
Xiaodan, Z., H. Jing, G. Potamianos, and M. Hasegawa-Johnson. 2009. Acoustic fall detection using Gaussian mixture models and GMM supervectors. IEEE International Conference on Acoustics, Speech and Signal Processing, 2009. ICASSP 2009.
Xu, Q., L. Zhang, and W. Liang. 2013. Acoustic detection technology for gas pipeline leakage. Process Safety and Environmental Protection 91 (4):253–61.
Yanan, L., Y. Yilong, L. Lili, P. Shaohua, and Y. Qiuhong. 2012. Semi-supervised gait recognition based on self-training. IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), 2012.
Yang, J.-C., L.-A. Liu, Q.-W. Qin, and M. Zhang. 2013. Audio event change detection and clustering in movies. Journal of Multimedia 8 (2):113–20.
Yangqiu, S., and Z. Changshui. 2008. Content-based information fusion for semi-supervised music genre classification. IEEE Transactions on Multimedia 10 (1):145–52.
Yaochu, J. 2000. Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement. IEEE Transactions on Fuzzy Systems 8 (2):212–21.
Ya-Ti, P., L. Ching-Yung, S. Ming-Ting, and T. Kun-Cheng. 2009. Healthcare audio event classification using hidden Markov models and hierarchical hidden Markov models. IEEE International Conference on Multimedia and Expo 2009. ICME 2009.
Ye, J., and S. Ji. 2009. Discriminant analysis for dimensionality reduction: An overview of recent developments. Biometrics. John Wiley & Sons, Inc., published online 11 November 2009: 1–19.
Ye, T., W. Zuoying, and L. Dajin. 2002. Nonspeech segment rejection based on prosodic information for robust speech recognition. IEEE Signal Processing Letters 9 (11):364–67.
Younghyun, L., K. Hanseok, and D. K. Han. 2013. Acoustic signal based abnormal event detection system with multiclass adaboost. IEEE International Conference on Consumer Electronics (ICCE), 2013.
Yunyun, W., C. Songcan, and Z. Zhi-Hua. 2012. New semi-supervised classification method based on modified cluster assumption. IEEE Transactions on Neural Networks and Learning Systems 23 (5):689–702.
Zadeh, L. A. 1996. Fuzzy sets and their application to pattern classification and clustering analysis. In Fuzzy sets, fuzzy logic, and fuzzy systems, Eds. J. K. George, and Y. Bo, 355–93. World Scientific Publishing Co., Inc.
Zhao, Y., and G. Karypis. 2001. Criterion functions for document clustering: Experiments and analysis. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA, Technical report.
Zhu, L., and Q. Yang. 2012. Speaker recognition system based on weighted feature parameter. Physics Procedia 25:1515–22.
Zhu, X., and A. B. Goldberg. 2009. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3 (1):1–130.
Zolfaghari, P., and T. Robinson. 1996. Formant analysis using mixtures of Gaussians. Fourth International Conference on Spoken Language, 1996. ICSLP 96. Proceedings.
Zubair, S., F. Yan, and W. Wang. 2013. Dictionary learning based sparse coefficients for audio classification with max and average pooling. Digital Signal Processing 23 (3):960–70.
Zweig, G. 2003. Bayesian network structures and inference techniques for automatic speech recognition. Computer Speech & Language 17 (2–3):173–93.