Research Article

Leveraging Artificial Intelligence to Predict Young Learner Online Learning Engagement


ABSTRACT

The recent surge of online language learning services in the past decade has benefitted second language learners. However, there is a lack of understanding of whether learners, especially young learners, are engaged in online learning, and of how educators can enhance the online learning experience. This study examines an artificial intelligence (AI)-powered automated system that uses voice and facial recognition to track teacher and learner speech, facial expressions, and interactions in real time in a one-to-one, 25-minute online English class. Each learner completed a learner engagement survey within 72 hours of the online class. Results demonstrated that young learners were highly engaged in this one-to-one online learning setting (mean = 4.5 out of 5). Learners’ frontal face exposure (indicating their attentiveness during class) and English proficiency levels were significant, positive predictors of learner engagement. Teachers’ total length of speech and instructional time tended toward significance in predicting learner engagement. Educational implications are discussed.

The recent surge of online language tutoring services in the past decade has benefitted second language learners (Meticulous Research, 2020). There are two types of online tutoring services: asynchronous and synchronous (Khan, 2006). In asynchronous tutoring, learners study on their own schedule and access learning materials online. In synchronous tutoring, instruction is live, with the tutor and the learner simultaneously present. Synchronous tutoring has two modes: one-to-one and one-to-many.

Artificial intelligence (AI)-based automated measures, such as facial and voice recognition, can be an efficient tool to measure learning and teaching and to provide immediate, actionable information to educational stakeholders, especially in one-to-one settings (Dewan, Lin, Wen, Murshed, & Uddin, 2018). However, little research has been done to understand how AI-extracted learning and teaching indicators predict learner engagement. This study furthers emerging research in the educational technology field by leveraging AI data from both learners and teachers to better understand young learners’ engagement in the one-to-one online tutoring setting.

Theoretical background

Definition and importance of learner engagement

Engagement has been called the “holy grail of learning” (Sinatra, Heddy, & Lombardi, 2015, p. 1) and is defined as “the amount of physical and psychological energy that the learner devotes to the academic experience” (Astin, 1997, p. 297). It is associated with improved outcomes, such as critical thinking (Elyas & Al-Bogami, 2019), academic achievement (Reeve & Tseng, 2011), and persistence in learning (Kuh, Cruce, Shoup, Kinzie, & Gonyea, 2008). Learner engagement is also a key factor in second language learning, framed as a “drive that is capable of both generating and sustaining long-term behavior” (Dörnyei & Ryan, 2015, p. 99).

The type of engagement learners experience varies with age. Elementary school learners are particularly swayed by classroom experiences, whereas older learners report that setting long-term goals increases their engagement (Leona et al., 2021). Determining and enhancing learner engagement is thus important for long-term language acquisition. Student engagement changes in response to lesson activities and classroom interactions, as well as to the learner’s motivational state (Oga-Baldwin & Nakata, 2020).

Measuring young learner engagement

Measuring young learner engagement can be difficult (Henrie, Halverson, & Graham, 2015). Observational evaluation, in which an observer records a subjective judgment of the learner’s engagement, is commonly used (Dewan, Murshed, & Lin, 2019). These evaluations contain checklists of actions meant to indicate engagement, such as “sitting quietly” and “no tardy cards.” However, such indicators have been shown to measure rule-following rather than engagement (Whitehill, Serpell, Lin, Foster, & Movellan, 2014). Therefore, social researchers recommend surveying children directly, for example with self-report surveys (Fredricks & McColskey, 2012).

Oga-Baldwin and Nakata (2017) measured children’s engagement in a 5th-grade English as a Foreign Language (EFL) class in Japan using self-report surveys. They found that learner engagement during in-person class averaged 3.68 out of 5 points. Additionally, the researchers gathered independent rankings by external observers classifying student engagement as low, moderate, or high. The students’ self-reported data broadly matched the external engagement rankings. Further research has shown that the impact of classroom activities on learners decreases as they grow older (Nikolov, 2002). For young learners, it is therefore essential to discover what types of interactions increase engagement.

AI-based methods to measure engagement

Engineers and researchers have used AI to draw insight into teaching and learning indicators. Computer vision enables computers to extract useful information from digital images, videos, and other visual inputs (IBM, n.d.-a). Voice recognition is the capability of a program to identify an individual user’s voice (IBM, n.d.-b). Using these AI-based methods to mirror external observation and classify learner engagement is an emerging area of research (Sharma, Giannakos, & Dillenbourg, 2020). To date, automated methods have mostly been used to study adult online learning (Whitehill et al., 2014) and have rarely been examined for young learners.

Computer vision analyzes cues from the face, body posture, and hand gestures to estimate learner engagement (Sharma et al., 2020). Video stream data is captured using a webcam or similar device. Typically, engagement detection systems are trained on large, human-labeled datasets to track and classify features including facial expressions and eye movement (Friesen & Ekman, 1978). Facial recognition methods have only recently begun to be deployed for engagement detection in learning but show promise (Dewan et al., 2019). Krithika and Lakshmi Priya (2016) found that eye and head movement is an important indicator for detecting adult learners’ emotion and engagement during online instruction. Ramakrishnan, Ottmar, LoCasale-Crouch, and Whitehill (2019) trained a smile recognition model specific to young learners to analyze classroom climate; their results showed that learners’ and teachers’ smiles accurately indicated the dynamics of a positive classroom climate. Another study analyzed video data and found that teachers use smiling to guide, confirm, and correct student responses; smiling is “an available resource for teachers to construct pedagogically relevant actions” (Jakonen & Evnitskaya, 2020, p. 29).

Researchers analyzing teacher and learner audio files in an online context found that teacher talk, an important teaching indicator, is related to student engagement (Hirschel, 2018). The correct amount of teacher talk is difficult to pinpoint and varies widely with variables including learner age and second language proficiency. Paul (2003) recommended raising teacher awareness of teacher talk and focusing on expanding comprehensible input. Walsh and Li (2013) showed that teacher talk directly shapes learning opportunities in language classrooms, suggesting that increased wait-time, extended learner turns, and increased planning time contribute to learner speaking engagement. In contrast, teacher talk interruptions were shown to negatively affect learner participation (Yataganbaba & Yıldırım, 2016).

Research questions

To lay the groundwork for leveraging AI-based measures, this study investigates the following research questions:

1. What is the engagement level reported by young learners studying English in a one-to-one online tutoring platform?

2. Can AI-measured learning and teaching indicators predict self-reported learner engagement?

Methods

Participants

Between November and December 2020, 159 participants were sampled from a student database of an online education technology company, VIPKid (https://www.vipkid.com/en-us/). VIPKid provides elective informal after-school online tutoring programs. Participants in this study enrolled in the Major Course (MC), VIPKid’s flagship English curriculum line. Instruction is provided via a synchronous one-to-one, online class format taught remotely by teachers. A lesson comprises a pre-class video, a one-to-one 25-minute class, and post-class enrichment activities. Learners and teachers access the classes by logging into VIPKid’s proprietary platform. Teachers follow the interactive slide decks for the one-to-one class.

Among the learner participants, there were 79 males (51.6%) and 74 females (48.4%). We followed the recommendation of Bell (2007) that surveys be completed by children aged 7 or above; therefore, our participants’ ages ranged from 7 to 14 years, with an average of 9.1 years. Learners were enrolled in MC Levels 2 through 6, with higher MC levels indicating a higher level of English proficiency as determined by an entrance placement test. Level 3 had the most learners, with 58 (37.9%), and Level 5 had the fewest, with 13 (8.5%).

Materials and measures

Engagement survey

Our study used an updated engagement survey instrument from Oga-Baldwin and Nakata (2017). The questionnaire was translated from English into Chinese by two researchers. The overall engagement score is the average of all eight items, with a Cronbach’s alpha reliability of 0.90.
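As an illustration of the reliability statistic, Cronbach’s alpha can be computed from raw item responses as sketched below; the response data are invented for the example, not the study’s survey data:

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale, given one list of responses per item,
    aligned by respondent (k items x n respondents)."""
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    # Each respondent's total score across the k items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))
```

When all items are perfectly consistent (identical responses), alpha is exactly 1; lower inter-item consistency pulls it toward 0.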

The survey was piloted twice: first with eight Chinese learners aged 7–10 to ensure that it was clear and understandable, and second using two different versions of the five-point smiley face Likert scale (SFL) developed by Hall, Hume, and Tazzyman (2016), one anchored from a sad face to a happy face and the other using variations of only happy faces. The piloting revealed that learners were able to understand the translated language. The happy-face-only variation received the greatest scale usage in learner responses, which is consistent with findings in Whitehill et al. (2014). Based on the pilot results, the survey was finalized with the translated Chinese language and the scale shown in Figure 1.

Figure 1. Five-point smiley face Likert scale used in the learner survey.


AI-measured learning and teaching indicators

AI-measured learning and teaching indicators were generated through voice recognition and computer vision technology for both the learner and the teacher. These variables include commonly used and discussed AI measures of learner image, teacher behaviors, and instructional time. The variables are described in Table 1.

Table 1. Artificial Intelligence (AI)-Measured Learning and Teaching Indicators.

Table 2. Descriptive Statistics on Student Self-Reported Engagement Scores.

To generate the voice recognition variables, the AI system extracted the audio information from the video playback, yielding one audio file each for the teacher and the learner. Next, voice activity detection (VAD), an algorithm that detects an audio signal and segments it into portions containing speech or silence (Ivry, Berdugo, & Cohen, 2019), was run on each speaker’s audio file using the threshold classification method. The threshold classification method classifies data according to a preset threshold: data within the threshold range belong to one category (e.g., “voiced”) while the rest go to the other (e.g., “unvoiced”). This method was used to identify the start and end times of the voiced and unvoiced segments in the audio (Song, Wang, & Feng-Juan Guo, 2009). Two VAD models were trained, one for learners and one for teachers. The models used a large set of transcribed data from which meaningless vocal interferences had been removed, for better accuracy in calculating the speech variables (Bhatia, Davidson, Kalidindi, Mukherjee, & Peters, 2006). The VAD models indicate when teachers and learners begin speaking during class and calculate the total length of speech for each speaker. See Figure 2 for a visual representation of the VAD audio signal segmentation; portions with detected speech are bordered in black rectangles.

Figure 2. Visual representation of VAD audio signal segmentation.

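The threshold classification and segmentation steps can be sketched as follows. This is a minimal illustration, not the study’s trained VAD models; the per-frame scores, frame length, and threshold value are invented for the example:

```python
def detect_speech_segments(frame_scores, threshold, frame_ms=30):
    """Classify each frame as voiced (score >= threshold) or unvoiced,
    then merge consecutive voiced frames into (start_ms, end_ms) segments."""
    segments, start = [], None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i * frame_ms                      # speech segment opens
        elif score < threshold and start is not None:
            segments.append((start, i * frame_ms))    # segment closes
            start = None
    if start is not None:                             # speech ran to the end
        segments.append((start, len(frame_scores) * frame_ms))
    return segments


def total_speech_ms(segments):
    """Total length of speech for one speaker, as the study computes it."""
    return sum(end - start for start, end in segments)
```

With per-frame scores `[0, 0, 5, 6, 0, 7, 7, 7, 0]`, a threshold of 3, and 10 ms frames, this yields segments (20, 40) and (50, 80), i.e., 50 ms of speech.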

To generate the computer vision variables, the AI system first extracted video information from the video recording. For every 1-second video frame, the face detection algorithm RetinaFace (Deng et al., 2019) was used to identify whether there is a face in the frame. Next, using the facial angle model “FSA-Net” (Yang, Chen, Lin, & Chuang, 2019), the yaw angle of the learner’s head posture was calculated for every frame with a recognized face. The learner’s head pose information can be used to gauge attention (Stiefelhagen & Zhu, 2002); the head yaw angle is used to track whether a learner is looking at the screen. For our purposes, if the head yaw angle is more than 30° from the computer screen, the image is categorized as non-frontal facing (see Figure 3).

Figure 3. Calculation of Head Yaw Angle for Off-Screen Viewing.

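The frontal/non-frontal categorization described above can be sketched as a per-frame threshold on the estimated yaw angle. The 30° cutoff comes from the text; the one-sample-per-second framing and the ratio helper are illustrative assumptions:

```python
def frontal_face_stats(yaw_angles_deg, threshold_deg=30.0):
    """yaw_angles_deg: one estimated yaw angle (degrees) per sampled frame;
    None means no face was detected in that frame.
    Returns (frontal_count, frontal_ratio_over_detected_frames)."""
    detected = [a for a in yaw_angles_deg if a is not None]
    # A frame is frontal-facing when |yaw| is within the threshold.
    frontal = sum(1 for a in detected if abs(a) <= threshold_deg)
    ratio = frontal / len(detected) if detected else 0.0
    return frontal, ratio
```

The count corresponds to the study’s frontal face exposure measure; the ratio is a convenience for comparing classes of different lengths.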

To count the number of smiling faces, the set of faces recognized by RetinaFace was categorized as either “smiling” or “non-smiling” using a self-developed image classification model trained with the ResNet module (He, Zhang, Ren, & Sun, 2016) in the Azure Machine Learning designer. This image classification model is a supervised learning method (Xie, Girshick, Dollár, Tu, & He, 2017) and was trained on a set of more than 10,000 human-labeled images.

Procedure

Upon enrollment in the one-to-one tutoring platform, learners and their guardians signed consent forms allowing the company to collect and analyze voice and video data to report learning progress and monitor class interactions. This study invited “active” learners, defined as learners who had completed at least four classes and taken a class within the two weeks preceding the start of the study, to participate. The parents of 295 active learners indicated they would participate and signed a consent form for the study. Participants were incentivized to complete the survey with a platform-specific currency that could be exchanged for classes and peripheral products. Among them, 193 learners completed the survey within two weeks of receiving it (survey response rate = 65%), and 159 of these completed it within 72 hours of finishing a one-to-one class. We excluded the 34 participants who filled in the survey after 72 hours, as they might recall their class inaccurately over time, and some had started a second class, which could confuse their memory of the specific class surveyed.

Data analysis

All analyses were performed using SPSS (IBM Corp., 2020). First, descriptive analyses were performed to understand the distribution of each variable. Second, a Pearson correlation table was created to examine the preliminary interrelationships among the AI learning and teaching indicators. Third, multiple regression models were used to predict learner self-reported engagement from the AI learning and teaching indicators and learner and teacher background characteristics.
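The analyses themselves were run in SPSS; purely as an illustration, the Pearson correlation statistic used in the second step can be sketched as:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two aligned samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # un-normalized covariance
    sx = sqrt(sum((a - mx) ** 2 for a in x))              # sqrt of sum of squares
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A perfect positive linear relationship gives r = 1 and a perfect negative one gives r = −1, bracketing correlations such as the r = .28 reported below.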

Three learners had instructional times under 10 minutes and were removed from the analysis for reasons such as leaving class early or illness. Three other learners were enrolled in MC levels above 6, whose classes are 50 minutes long; we excluded them because 25-minute and 50-minute classes are not comparable on indicators derived from counts of interactions over the class period. Therefore, the final sample size was 153. Because learner participants responded to all survey questions, there is no missing survey data.

Results

Table 2 describes learners’ self-reported overall engagement scores. On a scale of 1 to 5, with 5 being the highest level of engagement, learners reported a high level of engagement: the average score on each item and the total score were all above 4.

Descriptive statistics for the online learning and teaching indicators and their correlations are presented in Table 3. The results indicated that the learner’s smiling face was positively associated with the learner’s frontal face exposure (r = .28) and the teacher’s smiling face (r = .21).

Table 3. Descriptive Statistics of Online Learning and Teaching Variables and Their Correlations.

We used a multiple regression model to predict learner self-reported engagement from all the AI learning and teaching indicators, learner demographic characteristics (English proficiency level, gender, age), and instructional time. These independent variables have been found to be predictive of learning outcomes in the literature (Oga-Baldwin & Nakata, 2017). The multiple regression results show that learner frontal face exposure, learner English proficiency level, teacher speech total length, and instructional time are significant (or marginally significant) predictors of learner self-reported engagement (Table 4, Figures 4–7).

Table 4. Multiple Regression Predicting Learner Engagement.

Figure 4. Learner Frontal Face Exposure and Learner Self-Reported Engagement Scores.


Figure 5. English Proficiency Level and Learner Self-Reported Engagement Scores.


Figure 6. Teacher Speech Total Length and Learner Self-Reported Engagement Scores.


Figure 7. Instructional Time and Learner Self-Reported Engagement Scores.


Learners who had more frontal face exposure reported significantly higher levels of overall engagement than their peers with less frontal face exposure (β = 0.007, p < .05); Figure 4 shows this positive association between learner frontal face exposure and self-reported engagement score. As shown in Figure 5, learners with higher English proficiency reported a higher level of overall engagement than those with lower English proficiency levels (β = 0.16, p < .05). Two marginally significant relationships show that the shorter the total length of teacher speech (β = −0.65, p = .09) and the longer the instructional time (β = 0.03, p = .09), the higher the learner self-reported engagement score (Figures 6 and 7).

Discussion

Self-reported online learner engagement among young learners

This study found that our participants reported a high level of engagement (4.5 out of 5) while learning English in a synchronous one-to-one online tutoring platform. This engagement level is higher than what has been reported in traditional EFL classrooms (Oga-Baldwin & Nakata, 2017). In Oga-Baldwin and Nakata’s study, which used the same survey and scale as this study, students in 16 traditional brick-and-mortar fifth-grade Japanese EFL classes reported average engagement ranging from 3.18 to 4.14 (mean = 3.68, SD = 0.83). Learners in our study likely reported higher levels of engagement due to the one-to-one nature of the courses. Most learners in traditional brick-and-mortar classrooms compete for the teacher’s attention; in a one-to-one environment, the learner has the teacher’s complete attention, which may relate to a high level of engagement.

Additionally, the course content has been specifically designed to engage the learner’s attention. To maintain high learner engagement, the courseware follows a “best practices” framework synthesized by Taylor and Parsons (2011): (1) interaction, (2) exploration, (3) relevancy, (4) multimedia, (5) instruction, and (6) authentic assessment. Learners had more opportunities to interact with their teachers than in traditional classrooms. In terms of “exploration,” the lessons in this program draw on Bloom’s Taxonomy (Krathwohl, 2002): learners are required to analyze, apply, and create something new (e.g., formulating their own ideas and opinions using the language and content learned in that lesson). Regarding “relevancy,” lessons were organized by themes and topics based on learners’ age, interests, and abilities, increasing the relevance of the content. Finally, the curriculum calls for cross-subject exploration and collaboration, often referred to as “21st Century Skills” in recent literature (Taylor & Parsons, 2011). The curriculum itself may be more engaging for learners than what they typically encounter in a brick-and-mortar classroom, which further explains the high engagement scores.

AI learning and teaching indicators

Our correlational analysis shows that AI-measured learner smiling was positively associated with both learner frontal face exposure and teacher smiling. Echoing classroom findings that learners’ facial feedback (i.e., smiling) is positively correlated with attention (Marsh, Rhoads, & Ryan, 2019), our results indicate that in one-to-one online learning settings a learner’s smiling is related to frontal face exposure, a measure of attention. Furthermore, previous research emphasized that teacher enthusiasm can enhance a range of positive learner outcomes, including attention, engagement, and recall in a classroom setting (Marsh et al., 2019). Our results suggest a mutual influence between teachers’ and learners’ smiling in a synchronous one-to-one online language tutoring setting.

Predictors of engagement

Among all the AI learning and teaching indicators, learner frontal face exposure was a significant predictor of learner engagement: engagement rose noticeably as the count of frontal face exposures within a lesson increased. Our frontal face exposure measure indicated whether the learner was directly facing the screen. Learners with low counts of frontal face exposure were most likely not attentive to the lesson content; behaviors like drowsiness and turning the head to talk have been identified as low-engagement indicators in online learning (Hwang & Yang, 2008).

The teacher’s total length of speech tended toward significance in predicting learner engagement, and engagement dipped markedly when teacher talk exceeded half the class time. Consistent with Paul (2003), who recommended raising awareness of teacher talk time, our findings underscore how excessive teacher talk can reduce young learners’ engagement. In language learning, the priority for young learners is to encourage them to speak more during lessons.
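The dip past the halfway mark suggests a simple monitoring rule a platform could apply; the 50% budget mirrors the observation above, while the function names and millisecond units are assumptions for this sketch:

```python
def teacher_talk_ratio(teacher_speech_ms, class_length_ms):
    """Share of class time occupied by teacher speech."""
    return teacher_speech_ms / class_length_ms


def exceeds_talk_budget(teacher_speech_ms, class_length_ms, max_ratio=0.5):
    """Flag classes where teacher talk passes the assumed budget, so a
    reminder to give the learner more speaking turns could be triggered."""
    return teacher_talk_ratio(teacher_speech_ms, class_length_ms) > max_ratio
```

For a 25-minute (1,500,000 ms) class, 800,000 ms of teacher speech (about 53%) would trip the flag, while 600,000 ms (40%) would not.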

Instructional time also tended toward significance in predicting learner engagement. Teachers who stayed 10 minutes or longer beyond the scheduled class had learners who rated their engagement very high (4.8+ out of 5). Although class time is set at 25 minutes, some teachers voluntarily chose to stay longer. The end of the class period is reserved for rapport-building activities, during which teachers and learners converse about their personal lives. These brief extensions allow teachers to give learners additional support and attention. This finding is consistent with previous research on how teachers’ additional support and attention affect learners (Blazar & Kraft, 2017): in both that research and this study, teachers’ additional support was related to a higher level of learner engagement.

Finally, English proficiency level was a significant predictor of learner engagement. Learners with higher English proficiency reported higher overall engagement than learners with lower proficiency at all levels, with the exception of a plateau between Levels 4 and 5. This makes sense, as high-proficiency learners were better able to use the target language to express themselves and interact with their teachers, producing higher engagement. However, the language learning literature supports that learner engagement is related to teacher-learner interactions (Davidson, 1999), not necessarily to learners’ language proficiency. More research is needed to examine the impact of learner proficiency on engagement.

Practical implications

Online learning platforms can use automated AI measures to predict and monitor learner engagement. One important AI indicator is learner frontal face exposure: if a learner is detected as not facing the screen for a certain amount of time, an automatic reminder can be sent to the learner to pull their attention back. The message can also be sent to the teacher, who can then adapt instruction to regain the learner’s attention.
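A minimal sketch of such a reminder trigger, assuming per-second frontal-face flags from the computer vision pipeline; the ten-second gap is an invented threshold, not a value from the study:

```python
def needs_attention_reminder(frontal_flags, max_gap_s=10):
    """frontal_flags: per-second booleans (True = learner facing the screen).
    Trigger a reminder once the learner has been non-frontal for
    max_gap_s consecutive seconds."""
    run = 0
    for facing in frontal_flags:
        run = 0 if facing else run + 1  # length of the current non-frontal run
        if run >= max_gap_s:
            return True
    return False
```

In a live system this check would run incrementally on the frame stream rather than on a completed list, but the run-length logic is the same.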

Similarly, if the system detects that a teacher is speaking too much, an automated reminder can provide tips for giving learners opportunities to speak up. For example, for low-proficiency learners, the system can suggest simple activities like “read aloud” to let learners participate.

Instructional time is more difficult to integrate into practical settings, as teachers cannot always voluntarily extend class. However, teachers may benefit from training on effective strategies to build teacher-learner relationships. If teachers show that they care for their learners and are willing to provide extra support, learners are more likely to be engaged in their remote learning environment.

Limitations and future research

Our study has several limitations that should be addressed in future research. First, our AI indicators of engagement can be improved. Learner frontal face exposure measured by head orientation has limited precision in estimating a person’s attentional direction; future studies should use eye gaze and body posture data for better precision (Shirama, 2012). Future studies should also include recognition of other emotions, such as boredom, confusion, and frustration, to improve engagement prediction (Dewan et al., 2019). Recent studies have noted that racial bias exists in facial recognition programs (Raji et al., 2020); the facial recognition model used here, RetinaFace, should therefore be examined for racial bias. Future studies could also benefit from a measure of the quality of teacher speech.

Second, this study has a sample size of 153 participants. Future studies need to replicate the findings of this study using a larger sample.

Third, although self-reported survey data has the advantage of capturing user experience and perceptions, it can be subjective and biased (Short et al., 2009). Future research may overcome this limitation by using mixed methods (e.g., structured interviews) to provide a more detailed picture of learner engagement.

Finally, the findings of this study should be generalized to other online learning environments (e.g., Massive Open Online Courses) with caution. Additional research is necessary to validate our approach in other online contexts with young learners.

Conclusion

Despite these limitations, this study is the first to report the engagement level among young learners in the one-on-one online language tutoring context, and the first to predict young learners’ self-reported engagement using AI learning and teaching indicators. This study provides a promising contribution to the field of engagement study of young learners using education technology and offers a valid starting point for further research in this area.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • Astin, A. (1997). What matters in college? Four critical years revisited. San Francisco: Jossey-Bass.
  • Bell, A. (2007). Designing and testing questionnaires for children. Journal of Research in Nursing, 12(5), 461–469. doi:10.1177/1744987107079616
  • Bhatia, M., Davidson, J., Kalidindi, S., Mukherjee, S., & Peters, J. (2006). VoIP: An in-depth analysis. Indianapolis, IN: Cisco Press.
  • Blazar, D., & Kraft, M. A. (2017). Teacher and teaching effects on students’ attitudes and behaviors. Educational Evaluation and Policy Analysis, 39(1), 146–170. doi:10.3102/0162373716670260
  • Davidson, A. (1999). Negotiating social differences: Youth’s assessments of educators’ strategies. Urban Education, 34(3), 169–338. doi:10.1177/0042085999343004
  • Deng, J., Guo, J., Zhou, Y., Yu, J., Kotsia, I., & Zafeiriou, S. (2019). RetinaFace: Single-stage dense face localisation in the wild. Retrieved from http://arxiv.org/abs/1905.00641
  • Dewan, M. A. A., Lin, F., Wen, D., Murshed, M., & Uddin, Z. (2018). A deep learning approach to detecting engagement of online learners. 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China. (pp. 1895–1902).
  • Dewan, M. A. A., Murshed, M., & Lin, F. (2019). Engagement detection in online learning: A review. Smart Learning Environments, 6(1). doi:10.1186/s40561-018-0080-z
  • Dörnyei, Z., & Ryan, S. (2015). The psychology of the language learner revisited. New York, NY: Routledge.
  • Elyas, T., & Al-Bogami, B. (2019). The role of the iPad as an instructional tool in optimizing young learners’ achievement in EFL classes in the Saudi context. Arab World English Journal, 1(1), 144–162. doi:10.24093/awej/elt1.11
  • Fredricks, J. A., & McColskey, W. (2012). The measurement of student engagement: A comparative analysis of various methods and student self-report instruments. In S. L. Christenson, A. L. Reschly, & C. Wylie (Eds.), Handbook of research on student engagement (pp. 763–782). Boston, MA: Springer US.
  • Friesen, E., & Ekman, P. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists Press.
  • Hall, L., Hume, C., & Tazzyman, S. (2016). Five degrees of happiness: Effective smiley face Likert scales for evaluating with children. Proceedings of IDC 2016 - The 15th International Conference on Interaction Design and Children (pp. 311–321). doi:10.1145/2930674.2930719.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition. Las Vegas, NV, USA (pp. 770–778).
  • Henrie, C. R., Halverson, L. R., & Graham, C. R. (2015). Measuring student engagement in technology-mediated learning: A review. Computers and Education, 90, 36–53. doi:10.1016/j.compedu.2015.09.005
  • Hirschel, R. (2018). Teacher talk in the elementary school EFL classroom. Bulletin of Sojo University, 43, 31–41.
  • Hwang, K., & Yang, C. (2008). Fuzzy fusion for affective state assessment in distance learning based on image detection. 2008 International Conference on Audio, Language and Image Processing. Shanghai, China (pp. 380–384).
  • IBM. (n.d.a). What is computer vision? Retrieved from https://www.ibm.com/topics/computer-vision
  • IBM. (n.d.b). What is speech recognition? Retrieved from https://www.ibm.com/cloud/learn/speech-recognition
  • IBM Corp. (2020). SPSS Statistics for Windows (Version 27). Author.
  • Ivry, A., Berdugo, B., & Cohen, I. (2019). Voice activity detection for transient noisy environment based on diffusion nets. IEEE Journal of Selected Topics in Signal Processing, 13(2), 254–264. doi:10.1109/JSTSP.2019.2909472
  • Jakonen, T., & Evnitskaya, N. (2020). Teacher smiles as an interactional and pedagogical resource in the classroom. Journal of Pragmatics, 163, 18–31. doi:10.1016/j.pragma.2020.04.005
  • Khan, B. H. (2006). Flexible learning in an information society. Hershey, PA: Information Science Publishing.
  • Krathwohl, D. R. (2002). A revision of Bloom’s taxonomy: An overview. Theory Into Practice, 41(4), 212–218. doi:10.1207/s15430421tip4104_2
  • Krithika, L. B., & Lakshmi Priya, G. G. (2016). Student Emotion Recognition System (SERS) for e-learning improvement based on learner concentration metric. Procedia Computer Science, 85, 767–776. doi:10.1016/j.procs.2016.05.264
  • Kuh, G. D., Cruce, T. M., Shoup, R., Kinzie, J., & Gonyea, R. M. (2008). Unmasking the effects of student engagement on first-year college grades and persistence. The Journal of Higher Education, 79(5), 540–563. doi:10.1080/00221546.2008.11772116
  • Leona, N. L., van Koert, M. J., van der Molen, M. W., Rispens, J. E., Tijms, J., & Snellings, P. (2021). Explaining individual differences in young English language learners’ vocabulary knowledge: The role of Extramural English Exposure and motivation. System, 96, 102402. doi:10.1016/j.system.2020.102402
  • Marsh, A. A., Rhoads, S. A., & Ryan, R. M. (2019). A multi-semester classroom demonstration yields evidence in support of the facial feedback effect. Emotion, 19(8), 1500–1504. doi:10.1037/emo0000532
  • Meticulous Research. (2020). Online language learning market by product (SaaS, Apps, Tutoring), Mode (Consumer, Government, K-12, Corporate), Language (English, German, Japanese, Korean, Mandarin Chinese) and Geography - Global Forecast to 2027. Retrieved from https://www.meticulousresearch.com/product/online-language-learning-market-5025
  • Nikolov, M. (2002). Issues in English language education. Bern, Switzerland: P. Lang.
  • Oga-Baldwin, W. L. Q., & Nakata, Y. (2017). Engagement, gender, and motivation: A predictive model for Japanese young language learners. System, 65, 151–163. doi:10.1016/j.system.2017.01.011
  • Oga-Baldwin, W. Q., & Nakata, Y. (2020). How teachers promote young language learners’ engagement: Lesson form and lesson quality. Language Teaching for Young Learners, 2(1), 101–130. doi:10.1075/ltyl.19009.oga
  • Paul, D. (2003). Teaching English to children in Asia. Hong Kong, China: Longman Asia ELT.
  • Raji, I. D., Gebru, T., Mitchell, M., Buolamwini, J., Lee, J., & Denton, E. (2020, February). Saving face: Investigating the ethical concerns of facial recognition auditing. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. New York, USA (pp. 145–151).
  • Ramakrishnan, A., Ottmar, E., LoCasale-Crouch, J., & Whitehill, J. (2019). Toward automated classroom observation: Predicting positive and negative climate. 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019) (pp. 1–8). doi:10.1109/FG.2019.8756529.
  • Reeve, J., & Tseng, C. M. (2011). Agency as a fourth aspect of students’ engagement during learning activities. Contemporary Educational Psychology, 36(4), 257–267. doi:10.1016/j.cedpsych.2011.05.002
  • Sharma, K., Giannakos, M., & Dillenbourg, P. (2020). Eye-tracking and artificial intelligence to enhance motivation and learning. Smart Learning Environments, 7(1). doi:10.1186/s40561-020-00122-x
  • Shirama, A. (2012). Stare in the crowd: Frontal face guides overt attention independently of its gaze direction. Perception, 41(4), 447–459. doi:10.1068/p7114
  • Short, M. E., Goetzel, R. Z., Pei, X., Tabrizi, M. J., Ozminkowski, R. J., Gibson, T. B., … Wilson, M. G. (2009). How accurate are self-reports? Analysis of self-reported health care utilization and absence when compared with administrative data. Journal of Occupational & Environmental Medicine, 51(7), 786–796. doi:10.1097/JOM.0b013e3181a86671
  • Sinatra, G. M., Heddy, B. C., & Lombardi, D. (2015). The challenges of defining and measuring student engagement in science. Educational Psychologist, 50(1), 1–13. doi:10.1080/00461520.2014.1002924
  • Song, Y., Wang, W., & Guo, F.-J. (2009). Feature extraction and classification for audio information in news video. 2009 International Conference on Wavelet Analysis and Pattern Recognition (pp. 43–46). doi:10.1109/ICWAPR.2009.5207452.
  • Stiefelhagen, R., & Zhu, J. (2002, April). Head orientation and gaze direction in meetings. CHI’02 extended abstracts on human factors in computing systems. Minneapolis, Minnesota, USA (pp. 858–859).
  • Taylor, L., & Parsons, J. (2011). Improving student engagement. Current Issues in Education, 14(1), 1–32.
  • Walsh, S., & Li, L. (2013). Conversations as space for learning. International Journal of Applied Linguistics, 23(2), 247–266. doi:10.1111/ijal.12005
  • Whitehill, J., Serpell, Z., Lin, Y., Foster, A., & Movellan, J. R. (2014). The faces of engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1), 86–98. doi:10.1109/TAFFC.2014.2316163
  • Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu, HI, USA (pp. 1492–1500).
  • Yang, T.-Y., Chen, Y.-T., Lin, Y.-Y., & Chuang, Y.-Y. (2019). FSA-Net: Learning fine-grained structure aggregation for head pose estimation from a single image. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA (pp. 1087–1096).
  • Yataganbaba, E., & Yıldırım, R. (2016). Teacher interruptions and limited wait time in EFL young learner classrooms. Procedia - Social and Behavioral Sciences, 232, 689–695. doi:10.1016/j.sbspro.2016.10.094
