Educational Assessment & Evaluation

Social annotations and second language viewers’ engagement with multimedia learning resources in LMOOCs: a self-determination theory perspective

Article: 2335715 | Received 30 Oct 2023, Accepted 16 Mar 2024, Published online: 02 Apr 2024

Abstract

The benefits of using social annotations to promote collaborative learning experiences have been investigated in the context of digital reading and listening, while their implications in multimedia learning contexts remain unclear. To investigate the influence of social annotations on multimedia language learning, we invited 16 African participants who were learning Chinese as a second language through a MOOC platform in China. They were given full autonomy to study and to switch instructional videos among three modes: no visual aids, linguistic captions, and social annotations. Guided by self-determination theory, qualitative data were collected through one-to-one interviews, while quantitative data, such as the duration of watching and the number of questions raised by participants, were obtained from observations. Statistical analysis suggests that MOOC learners demonstrated a higher level of engagement and motivation in the social annotation mode than in the other two modes. These findings can be explained from three perspectives: content- and emotion-related social annotations by L1 viewers can help L2 learners (1) check their comprehension of the video by comparing their understanding with others'; (2) adjust learning goals at both the linguistic and content levels and therefore use self-regulated learning strategies to a greater extent, especially in actively seeking assistance and other learning resources; and (3) relate the video content to the teaching curriculum, personal experience, and learning goals, thereby creating a sense of belonging to a larger community as language learners, interpreters, and contributors to the video's social annotations.

1. Introduction

Many solutions have been proposed to increase video viewers’ engagement and motivation levels, including the introduction of training sessions in self-directed learning strategies (García Botero et al., Citation2019), the embedding of peer feedback sessions (Wang et al., Citation2023), the provision of extra assistance in instructional videos (Bonafini et al., Citation2017) and the embedding of student-designed multimedia learning resources and assessment tasks (Joksimović et al., Citation2019). One convenient approach is to provide additional scaffolds inside the videos, that is, to present captions, which are especially helpful for second-language viewers who need to develop content knowledge and overcome language barriers simultaneously (Yang & Chang, Citation2014). Multimedia learning with captions has been found to produce mixed L2 learning outcomes. Concerning vocabulary acquisition, videos with captions improve L2 learners’ receptive vocabulary knowledge and retention rate (Mohsen, Citation2016; Teng, Citation2020). However, there are contradictory findings regarding listening comprehension, as captions have been found both to promote and to hinder it (Aldera & Mohsen, Citation2013; Yang & Chang, Citation2014).

As a result, two trends can be observed in the recent development of using instructional videos with captions to engage second language learners. First, an increasing number of studies have shifted their attention away from linguistic-oriented captions toward content- and emotion-oriented annotations, such as barrage and time-sync comments (Yang, Citation2021). Second, instead of inviting language instructors to generate captions and give pre-designed strategy-based instructions, more studies have begun to recommend that learners, both L1 and L2 viewers, generate social annotations themselves (Mohammadhassan et al., Citation2022). This study investigated the impact of using content- and emotion-oriented social annotations on learners’ motivation to engage with multimedia learning resources in a MOOC context.

2. Literature review

2.1. Captions and multimedia learning

Multimedia resources with different kinds of captions have multiple effects on vocabulary acquisition and listening comprehension, such as (1) better vocabulary retention with the provision of captions (Mohsen, Citation2016); (2) more accurate pronunciation when keyword captions and full captions are presented (Mahdi, Citation2017); (3) higher scores in receptive vocabulary knowledge after the combination of the advance-organiser strategy and glossed full captions are used (Teng, Citation2020); (4) the improvement of listening comprehension with the aid of annotated keyword captions (Yang & Chang, Citation2014); (5) poorer listening performance when annotations are provided (Aldera & Mohsen, Citation2013). For example, Mohsen (Citation2016) investigated the effects of annotations + captions + animation (ACA) and annotations + transcripts + animation (ATA) on L2 vocabulary gains, and found that both conditions improved vocabulary acquisition, while ACA was more beneficial to vocabulary retention. Yang and Chang (Citation2014) compared the impacts of full, keyword-only, and annotated keyword captions on L2 learners’ listening performance, and concluded that the annotated keyword captions best facilitated overall listening comprehension.

Regarding learning motivation, captions do not show a significant positive impact on L2 learners’ attitudes, confidence, preferences, and self-efficacy (Chen, Wang, et al., Citation2019; Chen et al., Citation2020; Ozdemir et al., Citation2016). For example, Chen et al. (Citation2020) compared the effects of non-caption, English captions, and Chinese captions on L2 learners’ motivation, and found that non-caption videos enhanced learners’ confidence and preference, while English captions caused a high cognitive load in low-proficiency learners. It is speculated that the provision of linguistic assistance fails to facilitate low-proficiency learners’ selecting, organizing, and integrating processes in multimedia learning because the linguistic captions, as an extra verbal information resource, increased the cognitive burden when the learners were already struggling to process the pictorial and audio information with limited cognitive capacity.

2.2. Social annotations and multimedia learning

According to Sun et al. (Citation2023), social annotation (SA) enables users to collectively highlight crucial text, generate comments, and engage in discussions on the same online learning materials. This not only speeds up and enhances learners’ cognitive comprehension of information but also contributes to fostering a sense of belonging, which is particularly crucial for remote education (Sun et al., Citation2023). The incorporation of content- and emotion-related comments in instructional videos seems to offer a practical solution to the lack of face-to-face peer communication during learning, fostering opportunities for learners to share and express their thoughts and opinions. There are two ways to address the effectiveness of such comments: (1) discourse analysis of the comments in the videos, and (2) observation and measurement of viewers’ learning behaviors in response to the comments and the resulting learning outcomes. Concerning the first, discourse analysis of video comments has been widely used to examine the content and emotion they carry (Messerli & Locher, Citation2021). The results confirmed their benefits for developing contributors’ knowledge of cross-cultural communication (Benson, Citation2015) and their writing proficiency (Pino-Silva, Citation2007).

As for the second type, unlike the mixed results from studies on the effects of linguistic-oriented captions on learners’ comprehension of multimedia learning resources, the existing evidence suggests a positive impact. Content-oriented comments are sometimes called barrage or bullet screen, an interactive commenting system attached and synchronized with online videos (Chen, Zhou, et al., Citation2019). On the one hand, some studies have linked content-oriented comments or barrages to learners’ higher engagement or motivation levels. For example, aiming to support online learners’ collaboration in sharing knowledge on a discussion platform, Yang et al. (Citation2011) found that the availability of peers’ annotations, which explain their understanding of the key concepts mentioned in the conversations, may assist in sharing knowledge among learners, improving their reading comprehension and sustaining a high level of motivation and engagement, as evidenced by the number and quality of the questions and answers they raised and answered collaboratively. Similarly, based on the results of an eye-tracking experiment, Chen, Zhou, et al. (Citation2019) also concluded that barrages reflecting other viewers’ emotions and comments on the content of the videos may hold individual viewers’ attention longer than when they watch videos alone. They observed that participants paid more attention to the barrage area, with longer gaze time and more fixations on it, probably because of their desire to interact emotionally with other viewers.

On the other hand, social annotations, such as peer-generated and content-oriented comments in reading and multimedia learning resources, appear to result in better learning outcomes. For example, in the context of a centralized university learning system, Araújo et al. (Citation2017) observed that allowing students to comment on and rate the instructional videos uploaded to the learning management system was crucial to developing their collaborative interaction skills and their performance. Moreover, in the context of a video-based language teacher education program, Zottmann et al. (Citation2012) found that the inclusion of authentic comments made by teachers and learners in the videos appeared to be more effective in developing teacher trainees’ analytical competency in classroom situations. This suggests that the comments made by teachers and students create an authentic environment for teacher trainees to evaluate and understand the case, leading to better learning outcomes. Yang et al. (Citation2011) developed a web-based collaborative annotation tool applied in group reading activities, and found that social annotations effectively facilitated learners’ knowledge sharing and reading comprehension.

Chen and Chen (Citation2014) further discussed the impact of collaborative reading annotations on digital reading performance, and concluded that the application of social annotations not only improved learners’ reading comprehension at explicit and inferential levels, but also led to better use of reading strategies, more positive reading attitudes, and higher learning satisfaction.

The impact of particular types of annotation in videos has been statistically tested by comparing learning outcomes between control and experimental groups. Learning outcomes here usually refer to two types of evidence: the retention rate of vocabulary knowledge (both receptive and productive) and the comprehension level of the video. The commonly adopted theoretical frameworks include Cognitive Load Theory (Plass et al., Citation2010) and the Cognitive Theory of Multimedia Learning (Mayer, Citation2009). However, what has not been investigated extensively is the link between video annotations and L2 viewers’ engagement and motivation levels, which are particularly important in a self-directed learning environment without instructors’ close supervision and monitoring.

Self-determination theory has been widely used by previous MOOC studies to understand online learners’ engagement patterns (Martin et al., Citation2018) and describe their motivation and satisfaction levels (Joo et al., Citation2018). Guided by self-determination theory, this study explored the extent to which the combination of linguistic captions and content-oriented social annotations can help L2 viewers develop personalized learning goals and learning paths when studying instructional videos. The theory has three components: competence, relatedness, and autonomy (Deci & Ryan, Citation2000). Competence refers to the need to attain valued outcomes (Deci & Ryan, Citation2010). It suggests that learners make efforts to achieve their learning goals and expected learning outcomes, which in turn increases their motivation level (Martin et al., Citation2018). Relatedness refers to the need to connect and interact with others, seeking care and a sense of belonging (Deci & Ryan, Citation2010). This means that learners have the desire to communicate and interact with others during online learning, and be part of the learning community. In other words, LMOOC learners have psychological demands to be connected to other distance learners, and be able to learn, co-create and share knowledge with others in the same learning community (Martin et al., Citation2018). Autonomy refers to the need to act in harmony with one’s integrated sense of self, and is concerned with the experience of integration and freedom (Deci & Ryan, Citation2010). It can be seen that learners need to select and adjust their learning goals and learning modes, adjust their learning pace, take control of their online learning process, and use learning strategies and resources based on their actual needs (Martin et al., Citation2018).

3. Research gaps and research questions

This study contributes to the current understanding of using multimedia learning resources to motivate LMOOC learners from at least two perspectives: (1) although motivation and engagement have been proposed as two significant predictors of academic success in a MOOC context, previous studies were largely experimental in design and often lacked the guidance of motivation theories; (2) most studies have focused on the impact of captions, both L1 and L2, on video viewers’ development of vocabulary knowledge and comprehension. It is still rare to investigate the impact of content- and emotion-oriented social annotations on LMOOC learners. In the case of instructional videos, it is speculated that social annotations may motivate L2 viewers to engage further with the videos by fulfilling their psychological needs for autonomy, relatedness, and competence. This study asked two research questions: (1) to what extent can social annotations from L1 viewers motivate L2 viewers to engage with multimedia learning resources in an LMOOC context? (2) If social annotations can better motivate L2 viewers to engage with multimedia learning resources in an LMOOC context, why?

4. Research design

4.1. Research context

The data in this study were collected from an online Chinese language program, which is based on a MOOC platform in the university and is designed for international students learning Chinese as a second language. This language learning program, called Exploring China Online, offers rich multimedia resources to help L2 learners improve their L2 proficiency and better understand China. The program lasts for 12 weeks and includes topics such as Chinese cuisine, local festivals, music and arts, and traveling. The videos, with voice narration in Chinese, are also provided with Chinese captions and social annotations. Learners have the autonomy to select non-caption mode, caption mode, and social annotations mode in the MOOC platform.

Eighteen African students with a Chinese language proficiency around HSK level 3 (equivalent to CEFR B1) from a polytechnic in Guangzhou, Guangdong, were enrolled in this program. Two participants dropped out, leaving sixteen participants (7 males aged 19–25 years and 9 females aged 18–26 years) who finished the program. Due to the pandemic, they had been learning online on the same MOOC platform for over 6 months, and were informed of the research plan, operation procedures, and privacy issues. Two teachers with more than five years of Chinese teaching experience were also invited to the program.

4.2. Development of materials

A video clip about Chinese cuisine was used in this study, with a vocabulary size of 296 words and a length of around 150 seconds. The video was selected based on three factors: (1) a language difficulty level equivalent to HSK 3 (CEFR B1); (2) the availability of full-text captions; (3) the popularity of the video content, which ensured participants’ familiarity with the topic, as well as the quantity and relevance of social annotations made by native viewers. The selected clip is one episode from an online cooking program posted by a famous cook with over 6 million followers on Bilibili (a popular video website in China). Participants were familiar with the Chinese cuisine topic as it appeared repeatedly in their Chinese courses. The researchers downloaded the social annotations from L1 viewers, removed irrelevant comments, and edited the social annotation version. Finally, three video modes (no visual aids at all, Chinese captions, and social annotations) were uploaded to the MOOC platform (see Figure 1). Participants were free to select among the three modes during learning.

Figure 1. Screenshots of caption mode and social annotation mode.


A pilot study was conducted two weeks before data collection. Two African students were invited to watch the video without time restrictions. They were allowed to select the modes based on their needs, pause the video, rewind it, take notes, check a Chinese dictionary, and ask others for help. It was observed that the participants watched the video about 2 or 3 times under each mode, spending around 15 minutes (900 seconds) in total.

4.3. Data collection

4.3.1. Quantitative data

The quantitative data of this study were collected through observations, and the qualitative data from one-to-one interviews. The participants logged into the MOOC platform in their dormitories, and were instructed in advance on how to select video modes and use the keyboard or mouse to pause, fast-forward, and rewind the video. To present the participants with the features of an authentic MOOC learning environment, during video learning the participants were allowed to take notes and raise questions with their cellphones, laptops, pens and paper, use e-dictionaries, take photos and screenshots, and ask others for help.

There was no time restriction for the video learning. The whole learning process and the interviews were recorded, with participants’ consent, on an extra phone. The quantitative measures of video learning behavior were mainly borrowed from previous studies on social annotations and captions in multimedia learning, and include (1) the duration of video watching with the two types of visual aids, (2) the number of comments the participants made, (3) the number of questions the participants asked the interviewers and friends, (4) the number of pauses the participants made, and (5) the number of notes the participants took (see Table 1).

Table 1. Prompts of quantitative data collection.

4.3.2. Qualitative data: post-video interviews

Each time the participants finished a video, a one-to-one online interview was conducted, meaning that the number of post-video interviews depended on the number of times participants watched the video. The same predetermined questions were asked after each viewing. As shown in Table 2, the interview questions were developed based on self-determination theory (Deci & Ryan, Citation2000), which focuses on the degree to which learning is self-motivated and self-determined. Three main intrinsic needs are involved in self-determination theory, i.e., the needs for competence, relatedness, and autonomy. Researchers raised follow-up questions based on observation and the need for further explanation. The interviews were conducted online through a smartphone or laptop, and were audio-recorded.

Table 2. Prompts of semi-structured interviews.

4.4. Data analysis

To measure the different levels of engagement under three modes – no visual aids, linguistic captions and social annotations – ANOVA tests were conducted to compare the five indicators: (1) the duration of video watching under each mode, (2) the number of comments the participants made, (3) the number of questions the participants asked the interviewers and friends, (4) the number of pauses the participants made, and (5) the number of notes the participants took.
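The one-way ANOVA described above can be sketched as follows for a single indicator. The numbers, variable names, and the use of SciPy are illustrative assumptions for exposition only; they are not the study's actual data or tooling.

```python
# Hypothetical sketch: one-way ANOVA comparing one indicator
# (e.g., watch duration in seconds) across the three viewing modes.
# All values below are invented for demonstration; they are NOT the
# study's observations.
from scipy import stats

no_aids     = [180, 175, 190, 200, 185, 170]   # hypothetical durations
captions    = [210, 220, 205, 230, 215, 225]   # hypothetical durations
annotations = [300, 290, 310, 320, 280, 305]   # hypothetical durations

# Omnibus test: is at least one group mean different?
f_stat, p_value = stats.f_oneway(no_aids, captions, annotations)
print(f"F = {f_stat:.3f}, p = {p_value:.5f}")
```

In a setup like this, the same test would simply be repeated for each of the five indicators, with a post hoc procedure applied whenever the omnibus F test is significant.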

To answer the second research question, the interview data were coded under the framework of self-determination theory (Deci & Ryan, Citation2000) (see Table 3), in which competence refers to the extent to which the two types of visual aids may help L2 viewers to (1) check their understanding, (2) identify mistakes or misunderstandings in their interpretation of the video, and (3) motivate them to study in a sustained way. Relatedness refers to the extent to which the two types of visual aids can help L2 viewers to (1) recognize the relevance of the video to the teaching curriculum, personal experience, and learning goals, and (2) perceive themselves as members of a larger community, such as L2 learners, video viewers, and contributors. Finally, autonomy refers to how the two modes with visual aids can help L2 viewers to (1) set and adjust learning goals at both the linguistic and content levels, (2) use more self-regulated learning strategies, and (3) actively seek assistance and other learning resources.

Table 3. Coding framework of interview data.

The two authors coded the sixteen interview transcripts together. The disagreements could mainly be categorized into two parts: (1) comments that could not be linked to the framework, such as some negative feedback towards a certain mode, and (2) the interpretation of participants’ confusion with the video or social annotations. All discrepancies were resolved through discussions with the participants.

5. Findings

5.1. Observational data

To answer the first research question, descriptive and inferential analyses were conducted based on the observational data. Table 4 reports the results of the descriptive analysis. In total, sixteen second language learners participated in this study by studying a three-minute instructional video, recording their video learning with a smartphone placed behind them, and answering our questions during the post-video interviews. Overall, when social annotations were enabled, viewers watched the 3-minute instructional video for longer (M = 302.5 seconds, SD = 57.329) than under the other two modes. The viewers also left more comments (M = 8.44, SD = 4.351), asked more questions of the researchers and their friends (M = 1.25, SD = 1.238), paused the video more often (M = 7, SD = 4.147), and took more notes in either French or Chinese (M = 3.06, SD = 3.714) while watching.

Table 4. Descriptive statistics.

One-way ANOVA tests revealed a statistically significant difference among the three modes in all five indicators of the learning experience: (1) the duration of watching the video (F(2, 48) = 17.072, p < 0.001), (2) the number of comments the viewers made (F(2, 48) = 29.468, p < 0.001), (3) the number of questions the viewers asked (F(2, 48) = 7.938, p < 0.001), (4) the number of pauses the viewers made (F(2, 48) = 34.141, p < 0.001) and (5) the number of notes taken by the viewers (F(2, 48) = 3.437, p = 0.041). Post hoc analyses were conducted using Bonferroni’s post hoc test. The viewers spent significantly more time on the video with social annotations than with linguistic-oriented captions. They also left more comments and paused the video more frequently with social annotations than with captions. Moreover, they asked significantly more questions and took more notes when they watched the video with other viewers’ social annotations.
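The Bonferroni step reported above can be sketched as follows: after a significant omnibus test, each pair of modes is compared, with the significance threshold divided by the number of comparisons. The data and names below are hypothetical, not taken from the study.

```python
# Illustrative Bonferroni-corrected pairwise comparisons among three
# viewing modes; all numbers are invented for demonstration.
from itertools import combinations
from scipy import stats

modes = {
    "no_aids":     [180, 175, 190, 200, 185, 170],
    "captions":    [210, 220, 205, 230, 215, 225],
    "annotations": [300, 290, 310, 320, 280, 305],
}

pairs = list(combinations(modes, 2))   # 3 pairwise comparisons
alpha = 0.05 / len(pairs)              # Bonferroni-adjusted threshold

results = {}
for a, b in pairs:
    t_stat, p_val = stats.ttest_ind(modes[a], modes[b])
    results[(a, b)] = p_val
    verdict = "significant" if p_val < alpha else "not significant"
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p_val:.5f} ({verdict})")
```

Dividing alpha by the number of comparisons keeps the family-wise error rate at 0.05 across all three tests, at the cost of some statistical power.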

Considering that the participants were encouraged to watch the video as many times as they wanted, a longer viewing period with the social annotations may be interpreted as a higher level of willingness to engage with the video. Moreover, taking more notes, asking more questions, and pausing the video more frequently suggest that L2 viewers engaged with the social annotations more actively than with the captions, as the participants studied the video on the MOOC platform without the presence of instructors. Therefore, one possible interpretation of the descriptive data is that LMOOC learners demonstrated higher levels of motivation and engagement with the social annotations made by other L1 viewers in the video.

5.2. Interview data

To answer the second research question, the reasons were investigated through the interview data. L2 viewers’ higher level of motivation and engagement under the social annotation mode compared with the other two modes can be explained by the post-video interview data from at least three perspectives. First, the instructional video’s social annotations created self-assessment opportunities for L2 viewers to constantly check their comprehension level and language proficiency. For example, in the interview, one participant pointed at one annotation and stated:

(S5) This comment said, “Add vinegar so the eggplant will change its color.” Yes, the purpose of using vinegar is to change the color of the eggplant.

As a result of these self-assessment opportunities, most participants believed that when the social annotations from L1 viewers confirmed their understanding of the video, they felt motivated and excited, as if they and other L1 viewers agreed with each other. For instance, one MOOC learner in the study said:

(S1) This annotation said: “It’s easy,” and I also said: “Yes, I agree.” I am happy to see that others have the same opinions as me; for example, comments like “It’s easy” and “Looks delicious.”

When they disagreed with or were confused by some social annotations, our participants also expressed positive feelings, as these annotations motivated them to watch the video more often and make sense of the different understandings, which arose largely from differences in life and learning experience.

(S4) Others’ comments motivate me to watch the video again and again because I would like to share my feelings. The comments are helpful to me because if someone says that he likes something in the video, it means that “something” must have some advantages, and I have to try it. Let me see why this person said, “I like this.”

Secondly, the social annotations embedded in multimedia learning resources have the potential to create a considerable degree of learning autonomy for L2 viewers, in at least three respects. For example, during the data collection period, participants were found to take photos of and notes on the social annotations they considered important or unfamiliar.

The quote below explains their learning goals:

(S1) Some of the comments are important; they make you focus on something. I took photos of some comments, you see? I didn’t take photos in the caption mode, because I remembered the things that the cook used. But I couldn’t remember all the comments, so I took photos. For example, I saw a comment I didn’t understand, I took a photo.

When failing to make sense of these notes and photos, L2 viewers reported that they sought assistance from others, such as texting their friends, asking questions to the researchers and attempting to write down a question to other viewers.

(S9) I think the tofu was ready after it was steamed; why did the chef need to spread hot oil on it? It’s unnecessary. Look, this comment said, “We can eat tofu now,” which means this guy also thinks that we can eat the tofu right after it’s steamed. I think he’s right; no need for hot oil, and I am happy to see that we have the same opinion.

Moreover, our participants reported using more self-regulated learning strategies: note-taking, making cultural references, and linking previous working/learning experiences. These initiatives are particularly crucial for MOOC learners who study uploaded instructional videos in their own time, as timely feedback and scaffolding from instructors and peers are not usually available.

(S16) In this comment, this guy also asked: “If sugar can improve the flavor, what is MSG for?” I have the same question. Is it necessary for the cook to use sugar here? When we cook, we usually don’t use sugar, and we especially never mix sugar with other seasonings.

Finally, social annotations appeared to help L2 viewers grow a stronger sense of community and make the instructional video more relevant to their learning and working experience. Regarding language learning, L2 viewers considered social annotations a useful source for developing vocabulary knowledge relevant to the video. For instance, one participant felt that the difficulty level of the annotations was lower than that of the transcript.

(S3) Reading new words in comments is a way to learn Chinese, so I will try to learn as much as possible. The words in captions are more difficult than in comments, and it’s easier to learn new words from them than from captions.

Apart from the development of lexical knowledge, the participants also took the opportunity to learn the writing style of annotations and attempted to be one of the authors of social annotations. In other words, they tried to become a community member who contributed their interpretations of the video.

(S4) Sometimes, as an African, I can’t make proper Chinese comments. For example, I don’t know whether my comment is respectful. However, I can learn how to arrange sentences from Chinese viewers’ comments.

Their growing sense of belonging can also be observed through their attempt to expand relevant cultural/contextual references around the video. This attempt motivated them to leave more annotations in the video.

(S9) Some comments are helpful. For example, from the helpful comments, you can see that some people commend the cook, and some recommend other cooking methods. It seems that someone has cooked the same dish before, and in the comment, he tells you the problems you may encounter.

(S2) Interestingly, the Chinese viewers have different opinions, and I read the comments to see what they think about the video. I can also learn how to improve my Chinese expression from the comments. For example, how to use idioms, which you can only find in native speakers’ conversations instead of books. I know some idioms can be used in conversations, but I didn’t know they can also be used in comments.

6. Discussion

In answer to the two research questions, L2 viewers reported a higher level of engagement and motivation to study the instructional video under the social annotation mode than under the other two modes, the linguistic mode and the mode without any visual aid. This is particularly important for L2 distance learners, who are expected to be more autonomous and self-regulated in an online learning environment with a relatively low level of supervision and support from both instructors and peers (García Botero et al., Citation2019; Moore & Wang, Citation2021). The collected evidence in this study includes the duration of viewing, the number of times the video was paused, and the number of questions and comments made while watching. Informed by self-determination theory, this study argues that social annotation creates a unique collaborative learning experience for L2 viewers in which they can (1) constantly self-assess and update their understanding of the video; (2) practice self-regulated learning strategies such as adjusting their learning goals and seeking assistance from other viewers vicariously; and (3) develop a stronger sense of belonging to different communities, as language learners, contributors to the video annotations, and interpreters of the video.

First of all, the social annotation mode creates opportunities for L2 viewers to compare their understanding with that of L1 viewers of the same video. These self-assessment opportunities motivate L2 viewers to make sense of other L1 viewers’ annotations, to express agreement, disagreement, and confusion during their video learning experience, and thereby to grow their interest in interpreting the video through their learning, working, and cultural experience (Dubovi & Tabak, Citation2020). Such self-assessment based on multiple comparisons is particularly important for L2 online learners, who have neither access to feedback from external helpers and teachers (Wei et al., Citation2022) nor the capability to interpret the non-context-specific, summative comments produced by automatic feedback systems. As the study by Yang et al. (Citation2011) suggested, content-relevant annotations in multimedia learning resources not only help viewers confirm what they have learned, but also help them better understand concepts and knowledge by visualizing the connections among them. Lastly, these comparisons also help L2 viewers make sense of the video beyond the linguistic level. This is supported by other studies suggesting that reading others’ content-related annotations on instructional videos helps viewers link the content to their own experience and improves their analytic skills (Zottmann et al., Citation2012).

Secondly, the use of learner autonomy strategies in multimedia learning has been identified as crucial for a higher level of engagement (Martin et al., Citation2018). In the social annotation mode, a greater sense of learner autonomy is evident from L2 viewers’ frequent use of self-directed learning strategies, including seeking assistance from others, adjusting learning goals based on self-assessment results, and using diversified strategies to interpret and understand social annotations. The level of learner autonomy, or the use of self-regulated learning strategies, appears to be higher in the social annotation mode than in the linguistic caption mode. Although linguistic captions have long been thought to hold great promise for encouraging L2 viewers’ incidental vocabulary learning, their value in helping viewers understand or pick up new knowledge from instructional videos has been seriously questioned (Aldera & Mohsen, Citation2013). One possible explanation is that linguistically oriented captions fail to help L2 viewers and listeners interpret and make sense of the video content in light of their previous learning, working, and living experiences (Perez et al., Citation2013). The collaborative video learning experience created by other viewers’ social annotations, by contrast, vicariously builds schema and background knowledge for L2 viewers, motivating and facilitating top-down listening comprehension strategies and ultimately contributing to their understanding of the video content (Benson, Citation2015).

Finally, making the video content relevant to learners and developing a sense of belonging can motivate L2 distance learners to study instructional videos. In this study, in addition to their identity as language learners, two further community roles were identified from the interview data: interpreter of the video and contributor to the video’s social annotations. As interpreters, L2 viewers feel motivated to make sense of the video’s social annotations from different perspectives, including personal experience, cultural background, and prior knowledge; adding unique interpretations to the video from an L2 perspective is itself motivating in the multimedia learning environment. As contributors, L2 viewers can learn and imitate the writing styles found in the annotations and are encouraged to become active contributors or editors of the video, rather than passive viewers or language learners. This finding aligns with other studies in which active participation in making multimedia learning resources is perceived as a useful pedagogy for L2 learners (Nambiar et al., Citation2017).

7. Conclusion

To conclude, statistical analysis of the quantified observational data suggests that MOOC learners demonstrated a higher level of engagement (duration of viewing and number of pauses) and motivation (number of questions asked and comments made) in the social annotation mode than in the other two modes. These findings can be explained from three perspectives: content- and emotion-related social annotations by L1 viewers can assist L2 viewers in (1) checking their comprehension of the video by comparing their understanding with others’; (2) adjusting learning goals at both the linguistic and the content level, and therefore using self-regulated learning strategies to a greater extent, especially in actively seeking assistance and other learning resources; and (3) relating the video content to the teaching curriculum, their personal experience, and their learning goals, thereby creating a sense of belonging to a larger community as language learners, interpreters, and contributors to the video’s social annotations.
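The mode comparison reported above can be illustrated with a minimal one-way ANOVA sketch. The viewing-duration figures below are hypothetical placeholders for illustration only, not data from this study; only the three-mode structure follows the design described in the paper.

```python
# Minimal sketch: one-way ANOVA comparing engagement (viewing duration,
# in minutes) across the three viewing modes. All numbers are invented
# placeholders, not the study's observational data.
from scipy.stats import f_oneway

no_aid = [12.1, 14.3, 11.8, 13.5, 12.9]        # no visual aids
captions = [15.2, 16.8, 14.9, 17.1, 15.6]      # linguistic captions
annotations = [24.7, 26.3, 25.1, 27.8, 24.9]   # social annotations

stat, p = f_oneway(no_aid, captions, annotations)
print(f"F = {stat:.2f}, p = {p:.4f}")
```

With clearly separated group means like these, the test returns a large F statistic and a small p-value, mirroring the kind of between-mode difference the study reports.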

As this is an exploratory study of the influence of social annotations on multimedia language learning from a self-determination theory perspective, its limitations cannot be neglected. Firstly, this study followed the statistical rule of thumb of including more than 15 observations in each group of the ANOVA. Given the small sample size, generalizability to other contexts may be compromised; future studies would benefit from estimating the ideal sample size through a power analysis. Secondly, although it is difficult to conduct a quasi-experimental study in an online environment, further studies could invite a larger number of L2 learners and randomly assign them to the three modes under classroom conditions. Thirdly, Sun et al. (Citation2023) emphasized in their recent systematic review of social annotation that past research in this domain often involved brief, one-shot interventions. More extended intervention periods and more diverse learning environments should be included in future research designs. Lastly, researchers could investigate how different types of social annotation affect L2 viewers’ motivation and comprehension in different ways. For example, emotional social annotations may be more accessible to low-proficiency L2 viewers, as the words used in them are usually simpler and little new knowledge is introduced; L2 viewers tend to find a sense of recognition through emotional annotations rather than explanatory ones.
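The a priori power analysis suggested for future studies can be sketched as follows. The assumed effect size (Cohen’s f = 0.40, conventionally “large”) is an illustrative choice, and `anova_sample_size` is our own helper function, not part of the study’s procedure; it searches for the smallest per-group n whose noncentral-F power reaches the target.

```python
# Sketch of an a priori power analysis for a one-way ANOVA with three
# groups (the three viewing modes). Cohen's f = 0.40 is an assumed,
# illustrative effect size.
from scipy.stats import f as f_dist, ncf

def anova_sample_size(effect_f, k_groups, alpha=0.05, target_power=0.80):
    """Smallest per-group n giving at least target_power for the F test."""
    n = 2
    while True:
        total = n * k_groups                     # total observations
        df1, df2 = k_groups - 1, total - k_groups
        crit = f_dist.ppf(1 - alpha, df1, df2)   # rejection threshold
        lam = effect_f ** 2 * total              # noncentrality parameter
        power = 1 - ncf.cdf(crit, df1, df2, lam)
        if power >= target_power:
            return n
        n += 1

print(anova_sample_size(0.40, k_groups=3))  # per-group n for three modes
```

Under these assumptions the required per-group n comes out around twenty; a smaller assumed effect (e.g. f = 0.25) would demand substantially more participants per mode.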

Disclosure statement

The authors declare that there is no conflict of interest in this study.

Additional information

Funding

This work was supported by Guangdong Provincial Education Science Planning Project (Higher Education Special Project) and Macao Polytechnic University Grants (Grant ID: RP/FCA-08/2023).

Notes on contributors

Shuzhou Lin

Shuzhou Lin is currently a PhD candidate at the University International College, Macau University of Science and Technology. She received her master’s degree in Applied Linguistics from South China Normal University. Her research areas include computer-assisted language learning and cognitive theory in multimedia learning.

Wei Wei

Wei Wei holds a PhD from the School of Education, University of Leeds, UK. He is currently an associate professor at the Faculty of Applied Sciences, Macao Polytechnic University. His research interests lie in the fields of computer-assisted learning and learning-oriented assessment.

References

  • Aldera, A. S., & Mohsen, M. A. (2013). Annotations in captioned animation: Effects on vocabulary learning and listening skills. Computers & Education, 68, 60–75. https://doi.org/10.1016/j.compedu.2013.04.018
  • Araújo, R. D., Brant-Ribeiro, T., Mendonça, I. E. S., Mendes, M. M., Dorça, F. A., & Cattelan, R. G. (2017). Social and collaborative interactions for educational content enrichment. Educational Technology & Society, 20(3), 133–144.
  • Benson, P. (2015). Commenting to learn: Evidence of language and intercultural learning in comments on YouTube videos. Language Learning & Technology, 19(3), 88–105.
  • Bonafini, F., Chae, C., Park, E., & Jablokow, K. (2017). How much does student engagement with videos and forums in a MOOC affect their achievement? Online Learning, 21(4), 223–240. https://doi.org/10.24059/olj.v21i4.1270
  • Chen, C. M., & Chen, F. Y. (2014). Enhancing digital reading performance with a collaborative reading annotation system. Computers & Education, 77, 67–81. https://doi.org/10.1016/j.compedu.2014.04.010
  • Chen, G., Zhou, S., & Zhi, T. (2019). Viewing mechanism of lonely audience: Evidence from an eye movement experiment on barrage video. Computers in Human Behavior, 101, 327–333. https://doi.org/10.1016/j.chb.2019.07.025
  • Chen, M. P., Wang, L. C., Zou, D., Lin, S. Y., & Xie, H. (2019). Effects of caption and gender on junior high students’ EFL learning from iMap-enhanced contextualized learning. Computers & Education, 140, 103602. https://doi.org/10.1016/j.compedu.2019.103602
  • Chen, M.-P., Wang, L.-C., Zou, D., Lin, S.-Y., Xie, H., & Tsai, C.-C. (2020). Effects of captions and English proficiency on learning effectiveness, motivation and attitude in augmented-reality-enhanced theme-based contextualized EFL learning. Computer Assisted Language Learning, 33(1-2), 1–25. https://doi.org/10.1080/09588221.2019.1704787
  • Deci, E. L., & Ryan, R. M. (2000). The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227–268. https://doi.org/10.1207/S15327965PLI1104_01
  • Deci, E. L., & Ryan, R. M. (2010). Intrinsic motivation. In I. Weiner & W. E. Craighead (Eds.), The Corsini encyclopedia of psychology (p. 2). John Wiley & Sons.
  • Dubovi, I., & Tabak, I. (2020). An empirical analysis of knowledge co-construction in YouTube comments. Computers & Education, 156, 103939. https://doi.org/10.1016/j.compedu.2020.103939
  • García Botero, G., Questier, F., & Zhu, C. (2019). Self-directed language learning in a mobile-assisted, out-of-class context: Do students walk the talk? Computer Assisted Language Learning, 32(1-2), 71–97. https://doi.org/10.1080/09588221.2018.1485707
  • Joksimović, S., Dowell, N., Gašević, D., Mirriahi, N., Dawson, S., & Graesser, A. C. (2019). Linguistic characteristics of reflective states in video annotations under different instructional conditions. Computers in Human Behavior, 96, 211–222. https://doi.org/10.1016/j.chb.2018.03.003
  • Joo, Y. J., So, H. J., & Kim, N. H. (2018). Examination of relationships among students’ self- determination, technology acceptance, satisfaction, and continuance intention to use K-MOOCs. Computers & Education, 122, 260–272. https://doi.org/10.1016/j.compedu.2018.01.003
  • Mahdi, H. S. (2017). The use of keyword video captioning on vocabulary learning through mobile-assisted language learning. International Journal of English Linguistics, 7(4), 1–7. https://doi.org/10.5539/ijel.v7n4p1
  • Martin, N., Kelly, N., & Terry, P. (2018). A framework for self-determination in massive open online courses: Design for autonomy, competence, and relatedness. Australasian Journal of Educational Technology, 34(2), 35–55. https://doi.org/10.14742/ajet.3722
  • Mayer, R. E. (2009). Multimedia learning. Cambridge University Press.
  • Messerli, T., & Locher, M. (2021). Humour support and emotive stance in comments on Korean TV drama. Journal of Pragmatics, 178, 408–425. https://doi.org/10.1016/j.pragma.2021.03.001
  • Mohammadhassan, N., Mitrovic, A., & Neshatian, K. (2022). Investigating the effect of nudges for improving comment quality in active video watching. Computers & Education, 176, 104340. https://doi.org/10.1016/j.compedu.2021.104340
  • Mohsen, M. A. (2016). Effects of help options in a multimedia listening environment on L2 vocabulary acquisition. Computer Assisted Language Learning, 29(7), 1220–1237. https://doi.org/10.1080/09588221.2016.1210645
  • Moore, R. L., & Wang, C. (2021). Influence of learner motivational dispositions on MOOC completion. Journal of Computing in Higher Education, 33(1), 121–134. https://doi.org/10.1007/s12528-020-09258-8
  • Murphy, C. A., & Stewart, J. C. (2015). The impact of online or F2F lecture choice on student achievement and engagement in a large lecture-based science course: Closing the gap. Online Learning, 19(3), 91–110. https://doi.org/10.24059/olj.v19i3.536
  • Nambiar, R. M., Nor, N. M., Ismail, K., & Adam, S. (2017). New learning spaces and transformations in teacher pedagogy and student learning behavior in the language learning classroom. 3L the Southeast Asian Journal of English Language Studies, 23(4), 29–40. https://doi.org/10.17576/3L-2017-2304-03
  • Ozdemir, M., Izmirli, S., & Sahin-Izmirli, O. (2016). The effects of captioning videos on academic achievement and motivation: Reconsideration of redundancy principle in instructional videos. Journal of Educational Technology & Society, 19(4), 1–10.
  • Perez, M., Peters, E., & Desmet, P. (2013). Is less more? Effectiveness and perceived usefulness of keyword and full captioned video for L2 listening comprehension. ReCALL, 26(1), 21–43. https://doi.org/10.1017/S0958344013000256
  • Pino-Silva, J. (2007). The video-based short comment writing task. Foreign Language Annals, 40(2), 320–329. https://doi.org/10.1111/j.1944-9720.2007.tb03204.x
  • Plass, J. L., Moreno, R., & Brünken, R. (Eds.). (2010). Cognitive load theory. Cambridge University Press.
  • Sun, C., Hwang, G. J., Yin, Z., Wang, Z., & Wang, Z. (2023). Trends and issues of social annotation in education: A systematic review from 2000 to 2020. Journal of Computer Assisted Learning, 39(2), 329–350. https://doi.org/10.1111/jcal.12764
  • Teng, F. (2020). Vocabulary learning through videos: Captions, advance-organiser strategy, and their combination. Computer Assisted Language Learning, 35(3), 518–550. https://doi.org/10.1080/09588221.2020.1720253
  • Wang, Q., Wen, Y., & Quek, C. L. (2023). Engaging learners in synchronous online learning. Education and Information Technologies, 28(4), 4429–4452. https://doi.org/10.1007/s10639-022-11393-x
  • Wei, W., Cheong, C., Zhu, X., & Lu, Q. (2022). Comparing self-reflection and peer feedback practices in an academic writing task: A student self-efficacy perspective. Teaching in Higher Education. Advance online publication. https://doi.org/10.1080/13562517.2022.2042242
  • Yang, J. C., & Chang, P. (2014). Captions and reduced forms instruction: The impact on EFL students’ listening comprehension. ReCALL, 26(1), 44–61. https://doi.org/10.1017/S0958344013000219
  • Yang, S., Zhang, J., Su, A., & Tsai, J. (2011). A collaborative multimedia annotation tool for enhancing knowledge sharing in CSCL. Interactive Learning Environments, 19(1), 45–62. https://doi.org/10.1080/10494820.2011.528881
  • Yang, Y. (2021). Danmaku subtitling: An exploratory study of a new grassroots translation practice on Chinese video-sharing websites. Translation Studies, 14(1), 1–17.
  • Zhao, Y., & Luo, Y. (2020). Autonomous learning mode based on a four-element teaching design for visual communication course. International Journal of Emerging Technologies in Learning, 15(19), 66–82. https://doi.org/10.3991/ijet.v15i19.17399
  • Zhu, J., Yuan, H., Zhang, Q., Huang, P.-H., Wang, Y., Duan, S., Lei, M., Lim, E. G., & Song, P. (2022). The impact of short videos on student performance in an online-flipped college engineering course. Humanities & Social Sciences Communications, 9(1), 327. https://doi.org/10.1057/s41599-022-01355-6
  • Zottmann, J., Goeze, A., Frank, C., Zentner, U., Fischer, F., & Schrader, J. (2012). Fostering the analytical competency of pre-service teachers in a computer-supported case-based learning environment: A matter of perspective? Interactive Learning Environments, 20(6), 513–532. https://doi.org/10.1080/10494820.2010.539885