Abstract
Video-based learning has become an effective alternative to face-to-face instruction. In this context, modeling and predicting learners’ flow experience during video learning is critical for enhancing the learning experience and advancing learning technologies. In this study, we designed an instructional scenario for video learning according to flow theory. Three learning states, i.e., boredom, fit (flow), and anxiety, were successfully induced by varying the difficulty level of the learning task. We collected learners’ electrocardiogram (ECG) signals as well as facial video, upper-body posture, and speech data during the learning process. We then built classification models of the learning state and regression models of flow experience using different combinations of the four data modalities. The results showed that decision-level fusion of multimodal data significantly improved learning state recognition. Using the most important features selected from all data sources, such as the standard deviation of normal-to-normal R-R intervals (SDNN), high-frequency (HF) heart rate variability, and mel-frequency cepstral coefficients (MFCC), the multilayer perceptron (MLP) classifier achieved the best learning state recognition (mean AUC of 0.780), with recognition accuracies of 47.48%, 80.89%, and 47.41% for boredom, fit (flow), and anxiety, respectively. For flow experience prediction, the MLP regressor based on the fusion of two modalities (ECG and posture) achieved the best performance (mean RMSE of 0.717). This study demonstrates the feasibility of modeling and predicting flow experience in video learning by combining multimodal data.
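The decision-level fusion described above can be illustrated with a minimal sketch: each modality's classifier outputs a probability vector over the three learning states, and the fused decision is taken from the averaged probabilities. The probability values and modality names below are hypothetical placeholders, not results from the study.

```python
import numpy as np

# Hypothetical per-modality class probabilities for one learning segment
# over the three induced states: [boredom, fit (flow), anxiety].
# These numbers are illustrative only, not taken from the study.
modality_probs = {
    "ecg":     np.array([0.20, 0.65, 0.15]),
    "face":    np.array([0.30, 0.50, 0.20]),
    "posture": np.array([0.25, 0.55, 0.20]),
    "speech":  np.array([0.10, 0.60, 0.30]),
}

STATES = ["boredom", "fit", "anxiety"]

def fuse_decisions(probs):
    """Average the per-modality probability vectors (soft voting)
    and return the state with the highest fused probability."""
    stacked = np.stack(list(probs.values()))  # shape: (n_modalities, 3)
    fused = stacked.mean(axis=0)              # averaged probabilities
    return STATES[int(np.argmax(fused))], fused

label, fused = fuse_decisions(modality_probs)
# With the placeholder inputs above, the fused vector is
# [0.2125, 0.575, 0.2125], so the fused decision is "fit".
```

In practice each probability vector would come from a trained per-modality classifier (e.g., an MLP), and weighted rather than uniform averaging is a common variant of this scheme.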
Disclosure statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Additional information
Notes on contributors
Yankai Wang
Yankai Wang is a master's student at Zhejiang Sci-Tech University (China). His research interests include online video learning, quality of experience evaluation, and learning state recognition.
Bing Chen
Bing Chen is a lecturer at Hangzhou Normal University (China). He holds a PhD in foundations of artificial intelligence (Xiamen University, China). His current research focuses on artificial intelligence and robotics in education.
Hongyan Liu
Hongyan Liu is a full professor and researcher at Zhejiang Sci-Tech University (China). She holds a PhD in Basic Psychology (Beijing Normal University, China). Her current research interests center on the neuro-cognitive mechanisms underlying emotion and attractiveness, and related areas.
Zhiguo Hu
Zhiguo Hu is a full professor and researcher at Hangzhou Normal University (China). He holds a PhD in Basic Psychology (Beijing Normal University, China). His current research interests center on the neuro-cognitive mechanisms of emotion regulation, and related areas.