Data Fusion for Real-time Multimodal Emotion Recognition through Webcams and Microphones in E-Learning

Kiavash Bahreini, Rob Nadolski & Wim Westera

REFERENCES

  • Bahreini, K., Nadolski, R., Qi, W., & Westera, W. (2012a). FILTWAM - A framework for online game-based communication skills training - Using webcams and microphones for enhancing learner support. In P. Felicia (Ed.), The 6th European Conference on Games Based Learning (ECGBL) (pp. 39–48). Cork, Ireland: Academic Conferences, Ltd.
  • Bahreini, K., Nadolski, R., & Westera, W. (2012b). FILTWAM - A framework for online affective computing in serious games. In A. De Gloria & S. de Freitas (Eds.), The 4th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES ’12), Procedia Computer Science, 15 (pp. 45–52). Genoa, Italy: Elsevier B.V.
  • Bahreini, K., Nadolski, R., & Westera, W. (2014). Towards multimodal emotion recognition in E-learning environments. Interactive Learning Environments. DOI: 10.1080/10494820.2014.908927.
  • Bahreini, K., Nadolski, R., & Westera, W. (2015). Towards real-time speech emotion recognition for affective E-learning. Education and Information Technologies, 1–20. DOI: 10.1007/s10639-015-9388-2.
  • Ben Ammar, M., Neji, M., Alimi, A. M., & Gouarderes, G. (2010). The affective tutoring system. Expert Systems with Applications, 37(4), 3013–3023.
  • Biswas, P., & Langdon, P. (2015). Multimodal intelligent eye-gaze tracking system. International Journal of Human–Computer Interaction, 31(4), 277–294, DOI: 10.1080/10447318.2014.1001301.
  • Bosch, N., Chen, H., D’Mello, S., Baker, R., & Shute, V. (2015). Accuracy vs. availability heuristic in multimodal affect detection in the wild. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI ‘15) (pp. 267–274). Seattle, WA: ACM.
  • Martin, B. (1995). Instance-based learning: Nearest neighbour with generalization. Hamilton, NZ: University of Waikato, Department of Computer Science.
  • Buisine, S., Courgeon, M., Charles, A., Clavel, C., Martin, J. C., Tan, N., & Grynszpan, O. (2014). The role of body postures in the recognition of emotions in contextually rich scenarios. International Journal of Human–Computer Interaction, 30(1), 52–62, DOI: 10.1080/10447318.2013.802200.
  • Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. Proceedings of Interspeech 2005, 1517–1520. Lisbon, Portugal.
  • Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., … Narayanan, S. S. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of ACM 6th International Conference on Multimodal Interfaces (pp. 205–211). New York: ACM.
  • Castellano, G., Kessous, L., & Caridakis, G. (2008). Emotion recognition through multiple modalities: Face, body gesture, speech. In C. Peter & R. Beale (Eds.), Affect and emotion in human–computer interaction, Lecture Notes in Computer Science 4868 (pp. 92–103). Berlin Heidelberg: Springer.
  • Chen, L. S., Huang, T. S., Miyasato, T., & Nakatsu, R. (1998). Multimodal human emotion/expression recognition. Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition (FG’98), 366–371.
  • Chen, L. (2000). Joint processing of audio-visual information for the recognition of emotional expressions in human–computer interaction. PhD thesis. University of Illinois at Urbana–Champaign.
  • Cohen, W. W. (1995). Fast effective rule induction. Twelfth International Conference on Machine Learning, 115–123.
  • Cootes, T. F., Taylor, C. J., Cooper, D. H., & Graham, J. (1995). Active shape models – Their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
  • Cristinacce, D., & Cootes, T. (2004). A comparison of shape constrained facial feature detectors. IEEE International Conference on Automatic Face and Gesture Recognition (FG’04), 375–380.
  • Cristinacce, D., & Cootes, T. (2008). Automatic feature localisation with constrained local models. Pattern Recognition, 41(10), 3054–3067.
  • De Silva, L. C., & Ng, L. C. (2000). Bimodal emotion recognition. IEEE International Conference on Automatic Face and Gesture Recognition, 332–335.
  • D’Mello, S. K., & Graesser, A. C. (2012). AutoTutor and Affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems, 2(4), 1–39.
  • Ekman, P. (1972). Universals and cultural differences in facial expression of emotion. In J. K. Cole (Ed.), Nebraska Symposium on Motivation (pp. 207–283). Lincoln, NE: University of Nebraska Press.
  • Ekman, P., & Friesen, W. V. (1978). Facial action coding system: Investigator’s guide. Palo Alto, CA: Consulting Psychologists Press. https://www.paulekman.com/product/facs-manual/
  • Frank, E., Hall, M., & Pfahringer, B. (2003). Locally weighted Naive Bayes. 19th Conference on Uncertainty in Artificial Intelligence, 249–256.
  • Gaffary, Y., Eyharabide, V., Martin, J. C., & Ammi, M. (2014). The impact of combining kinesthetic and facial expression displays on emotion recognition by users. International Journal of Human–Computer Interaction, 30(11), 904–920, DOI: 10.1080/10447318.2014.941276.
  • Geertzen, J. (2012). Inter-rater agreement with multiple raters and variables. Retrieved from https://mlnl.net/jg/software/ira/
  • Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2008). Multi-PIE. IEEE International Conference on Automatic Face and Gesture Recognition (FG’08), 1–8.
  • Grubb, C. (2013). Multimodal emotion recognition. Technical Report. Retrieved from http://orzo.union.edu/Archives/SeniorProjects/2013/CS.2013/
  • Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23–34. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3402032/.
  • Hühnel, I., Fölster, M., Werheid, K., & Hess, U. (2014). Empathic reactions of younger and older adults: No age related decline in affective responding. Journal of Experimental Social Psychology, 50, 136–143.
  • Jack, R. E., Garrod, O. G. B., Yu, H., Caldara, R., & Schyns, P. G. (2012). Facial expressions of emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109(19), 7241–7244. DOI: 10.1073/pnas.1200155109.
  • Jaimes, A., & Sebe, N. (2007). Multimodal human–computer interaction: A survey. Computer Vision and Image Understanding, Special Issue on Vision for Human–Computer Interaction, 108(1–2), 116–134.
  • Jiang, L., & Zhang, H. (2006). Weightily averaged one-dependence estimators. Proceedings of the 9th Biennial Pacific Rim International Conference on Artificial Intelligence (PRICAI), 970–974.
  • Krahmer, E., & Swerts, M. (2011). Audio-visual expression of emotions in communication. In Philips Research Book Series 12 (pp. 85–106). Dordrecht, The Netherlands: Springer.
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
  • Lang, G., & van der Molen, H. T. (2008). Psychologische gespreksvoering [Psychological communication]. Heerlen: Open University of the Netherlands.
  • Le Cessie, S., & van Houwelingen, J. C. (1992). Ridge estimators in logistic regression. Applied Statistics, 41(1), 191–201.
  • Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn-Kanade dataset (CK+): A complete facial expression dataset for action unit and emotion-specified expression. In Proceedings of the Third IEEE Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010) (pp. 94–101). San Francisco, CA: IEEE.
  • Messer, K., Matas, J., Kittler, J., Luettin, J., & Maitre, G. (1999). XM2VTSDB: The extended M2VTS database. International Conference on Audio and Video-Based Biometric Person Authentication (AVBPA’99), 72–77.
  • Murthy, G. R. S., & Jadon, R. S. (2009). Effectiveness of Eigenspaces for facial expression recognition. International Journal of Computer Theory and Engineering, 1(5), 638–642.
  • Nadolski, R. J., Hummel, H. G. K., Van den Brink, H. J., Hoefakker, R., Slootmaker, A., Kurvers, H., & Storm, J. (2008). EMERGO: Methodology and toolkit for efficient development of serious games in higher education. Simulation & Gaming, 39(3), 338–352. Retrieved from http://sag.sagepub.com/content/39/3/338.full.pdf+html
  • Nwe, T., Foo, S., & De Silva, L. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.
  • Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 559–572. DOI: 10.1080/14786440109462720.
  • Pekrun, R. (1992). The impact of emotions on learning and achievement: Towards a theory of cognitive/motivational mediators. Applied Psychology: An International Review, 41, 359–376.
  • Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In B. Schoelkopf, C. Burges, & A. Smola (Eds.), Advances in Kernel Methods - Support Vector Learning (pp. 185–208). Cambridge, MA: MIT Press.
  • Preeti, K. (2013). Multimodal emotion recognition for enhancing human–computer interaction. PhD dissertation. Narsee Monjee Institute of Management Studies, Department of Computer Engineering, Mumbai, India.
  • Rus, V., D’Mello, S. K., Hu, X., & Graesser, A. C. (2013). Recent advances in intelligent tutoring systems with conversational dialogue. AI Magazine, 34(3), 42–54.
  • Russell, J. A. (1994). Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychological Bulletin, 115, 102–141.
  • Saragih, J., Lucey, S., & Cohn, J. F. (2011). Deformable model fitting by regularized landmark mean-shift. International Journal of Computer Vision (IJCV), 91(2), 200–215.
  • Sarrafzadeh, A., Alexander, S., Dadgostar, F., Fan, C., & Bigdeli, A. (2008). How do you know that I don’t understand? A look at the future of intelligent tutoring systems. Computers in Human Behavior, 24(4), 1342–1363.
  • Schuller, B., Lang, M., & Rigoll, G. (2002). Multimodal emotion recognition in audio-visual communication. IEEE International Conference on Multimedia and Expo, ICME ‘02, 1, 745–748. DOI: 10.1109/ICME.2002.1035889.
  • Sebe, N., Cohen, I., Gevers, T., & Huang, T. S. (2006). Emotion recognition based on joint visual and audio cues. 18th International Conference on Pattern Recognition (ICPR 2006), 1136–1139.
  • Sebe, N. (2009). Multimodal interfaces: Challenges and perspectives. Journal of Ambient Intelligence and Smart Environments, 1(1), 23–30.
  • Van der Molen, H. T., & Gramsbergen-Hoogland, Y. H. (2005). Communication in organizations: Basic skills and conversation models. New York, NY: Psychology Press.
  • Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001), I-511–I-518.
  • Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.
  • Vogt, T. (2011). Real-time automatic emotion recognition from speech: The recognition of emotions from speech in view of real-time applications. Südwestdeutscher Verlag für Hochschulschriften. ISBN-10: 3838125452.
  • Wagner, J., Lingenfelser, F., Baur, T., Damian, I., Kistler, F., & André, E. (2013). The Social Signal Interpretation (SSI) framework: Multimodal signal processing and recognition in real-time. Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, 831–834.
  • Wang, S., Ling, X., Zhang, F., & Tong, J. (2010). Speech emotion recognition based on principal component analysis and back propagation neural network. In Proceedings of the 2010 International Conference on Measuring Technology and Mechatronics Automation (ICMTMA ’10), Vol. 3 (pp. 437–440). Washington, DC: IEEE Computer Society.
  • Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39–58.
  • Zhang, Z. (1999). Feature-based facial expression recognition: Sensitivity analysis and experiment with a multi-layer perceptron. International Journal of Pattern Recognition and Artificial Intelligence, 13(6), 893–911.
  • Zheng, F., & Webb, G. I. (2006). Efficient lazy elimination for averaged one-dependence estimators. Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 2006), 1113–1120.
  • Zheng, Z., & Webb, G. (2000). Lazy learning of Bayesian rules. Machine Learning, 41(1), 53–84.