ABSTRACT
Although second language (L2) listening assessment has been the subject of much research interest in the past few decades, there remain a multitude of challenges facing the definition and operationalization of the L2 listening construct(s). Notably, the majority of L2 listening assessment studies are based upon the (implicit) assumption that listening is reducible to cognition and metacognition. This approach ignores emotional, neurophysiological, and sociocultural mechanisms underlying L2 listening. In this paper, the role of these mechanisms in L2 listening assessment is discussed and four gaps in understanding are explored: the nature of L2 listening, the interaction between listeners and the stimuli, the role of visuals, and authenticity in L2 listening assessments. Finally, a review of the papers published in the special issue is presented and recommendations for further research on L2 listening assessments are provided.
Introduction
In the past few decades, research has illuminated many issues in second language (L2) listening assessment. The L2 listening constructs have improved in coverage, clarity, and relevance, and operationalization methods have gradually grown beyond the boundaries of traditional testing. Three main approaches to defining L2 listening have emerged from the literature: subskill-based, cognitive-based, and attribute-based (Aryadoust & Luo, Citation2022; Buck, Citation2001). Subskills are hypothetical, measurable abilities that constitute the building blocks of listening (Aryadoust, Citation2020). There are multiple subskill lists consisting of higher-order skills such as the ability to understand the gist of the passage, the ability to draw inferences, etc. (e.g., Aryadoust, Citation2020; Lee & Sawaki, Citation2009; Min & He, Citation2021). While the developers of these lists assert that the subskills have a hierarchy of cognitive difficulty, no consistent results have been found to support the presumed hierarchies. In addition, the internal connections between these subskills are often ignored in modeling L2 listening (Goh & Aryadoust, Citation2015). Due to the application of time-invariant quantitative methods and cross-sectional research designs in subskill-based listening research, the subskills are viewed as static and immutable.
On the other hand, the cognitive-based approach views listening as a dynamic and non-linear process which varies significantly across contexts (Aryadoust & Luo, Citation2022; Imhof, Citation2010). In this approach, there are three mechanisms underlying L2 listening, i.e., pre-comprehension, comprehension, and post-comprehension, although this tripartite division is not intended to imply a linear sequence (Aryadoust, Citation2019). During pre-comprehension, the listener is involved in the perception and recognition of the auditory stimuli (Aryadoust, Citation2019). During comprehension, the listener builds a surface representation of words, parses them into propositional and macro-discourse units called the textbase (mainly through bottom-up or literal processing; Townsend & Bever, Citation2001), and finally links the units to generate a global mental representation called the situation model (through top-down or inferential processing; Kintsch, Citation1998). There is volatility and non-linearity in these processes. If a segment of the stimuli is not perceived, recognized, or comprehended, the listener will fill in the gap through top-down processing (e.g., by drawing on his/her own world knowledge).
Finally, the attribute-based approach to L2 listening recognizes the effect of the stimuli and non-cognitive attributes such as text difficulty and academic background of the listener on their performance (Aryadoust & Luo, Citation2022). Under the tenets of this approach, attributes are understood to have a role in L2 listening performance and test fairness. For example, texts containing technical and multisyllabic words or lengthy clauses are likely to be more difficult to process (Flowerdew, Citation1994), and certain listener attributes such as shared first languages can interact with the test performance of listeners and advantage particular test-taking groups (Harding, Citation2012).
According to Aryadoust and Luo (Citation2022), although the depth and scope of these research streams have expanded over the past decades, these advances have also made L2 listening researchers aware of territories that have never been ventured into. In this paper, I will discuss some of these under-researched areas, present a review of the papers published in the special issue, and propose possible research directions.
The unknown and the under-researched in L2 listening assessment
The field of L2 listening assessment has grown in sophistication and complexity over the course of several decades (Vandergrift, Citation2007). Researchers have made considerable progress in understanding the nature of listening, thought processes of test takers (e.g., Aryadoust et al., Citation2022; Field, Citation2009), functionality of test items (Liao & Yao, Citation2021; Park, Citation2008), and the effect of listeners’ cognitive and non-cognitive attributes on their comprehension performance (e.g., Bril et al., Citation2021; Canaran et al., Citation2020; Ghorbani Nejad & Farvardin, Citation2019; Goh, Citation2000; Janusik & Varner, Citation2020). In addition, the integration of innovative technologies such as eye tracking and neuroimaging to understand gaze behaviors and brain activation during listening has opened an exciting new front in L2 listening research (e.g., Aryadoust et al., Citation2022; Batty, Citation2015; Suvorov, Citation2015). This surge of research has also allowed us to become aware of our knowledge gaps. I believe that, to make further progress in our understanding of L2 listening and its assessment, the unknowns about L2 listening (those that we have become aware of) should first be spelled out and discussed.
There are four major gaps in our understanding of L2 listening: (1) the nature of L2 listening, (2) the interaction between listeners and stimuli, (3) the role of visuals, and (4) authenticity in L2 listening assessment. This list should be viewed as a preliminary step in articulating research gaps in L2 listening and is, therefore, by no means an exhaustive description of the challenges that we encounter in the field.
The nature of L2 listening
As discussed above, researchers have investigated some of the cognitive and non-cognitive aspects of listening such as subskills, cognitive processes, and attributes. While our understanding of L2 listening has improved, there remain a multitude of mechanisms that have not been problematized. First, a distinction should be made between the nature of listening (what it is or ontology) and listening constructs (what can be known or epistemology). It appears to me that L2 listening assessment research is mostly concerned with the listening construct(s) while taking the nature of L2 listening for granted. Through using listening tests (especially “one-shot” tests that aim to measure proficiency), researchers can learn only a little about the nature of L2 listening.
L2 listening construct(s) should be designed based on a multifarious theory of L2 listening that defines listening under non-test conditions or target language use (TLU) situations. The aim of testing is to generalize from test scores to the TLU situation, and unless we have a clear understanding of listening in the TLU domain (its ontology), our assessments will lack generalizability (Alderson, Citation2000). As Alderson (Citation2000, p. 117) reminds us, “[w]hat matters in the end is the extent to which we can generalize from assessment procedures or tests to [listening] performance in the real world,” and if L2 listening theories are deficient or erroneous or not representative of the different aspects of listening in the TLU domain, so will be the test scores.
A reliable and precise theory of L2 listening would consist of four dimensions: behavior, cognition, emotion, and neurophysiology. Listeners’ behavior is the first and most visible dimension of listening and includes nonverbal communication during listening (gesture and body language including haptics, proxemics, facial expressions, eye contact, etc.), backchanneling, and paralinguistic features of voice (such as tone, loudness, inflection, and pitch) (see Giri, Citation2009; Burgoon et al., Citation2016, for reviews).
Listeners’ cognition refers to their thought processes during information take-up and transformation, comprehension (Buck, Citation2001), and storage and retrieval. As discussed above, top-down and bottom-up processing is fairly well-researched in the L2 listening field. A recent review of listening processes and subskills revealed that most researchers who investigated adult L2 learners focused exclusively on coarse-grained comprehension processes and subskills, whereas the dominant approach in first language (L1) and children’s listening research was the examination of pre-comprehension processes such as phoneme and word recognition (Aryadoust, Citation2020). On the other hand, storage and retrieval processes are not typically included in L2 listening constructs, although they do seem to be a component of real-life listening in different TLU domains. In their listening-response model, Bejar et al. (Citation2000) allocated a role to learning as a product of academic listening.
Emotion and neurophysiology are by far the least researched dimensions of L2 listening. Whereas emotion research in L2 listening is focused chiefly on anxiety (e.g., Bekleyen, Citation2009; Chen et al., Citation2021; Elkhafaifi, Citation2005; Kimura, Citation2017; Yaylı, Citation2017), emotions have a wider definition and coverage in (neuro)psychology; they are psychological states that are primarily caused by changes in humans’ neurophysiological states (Panksepp, Citation2005). Although emotions and feelings are used interchangeably in L2 listening research, we should distinguish them from each other. Emotions are physiological responses to external or internal stimuli (Cabanac, Citation2002) which include visceral and motor components (Dantzer, Citation1989), while feelings are the conscious experience and perception of these neurophysiological emotions (LeDoux, Citation2012). From this vantage point, methods to measure emotions and feelings differ significantly; emotions are measured and quantified using technologies that can access and observe the neurophysiological mechanisms of listeners, such as eye tracking, neuroimaging, and galvanic skin response, whereas feelings are measured through “subjective” methods such as surveys and verbal reports. It might be said that anxiety research in L2 listening assessment exclusively addresses one by-product of emotions, that is, the feeling of anxiety.
Like cognition and behavior in L2 listening, emotions have underlying neurological mechanisms, which to my knowledge have (almost) never been problematized and investigated in L2 (listening) research. Specifically, the limbic system which is involved in emotional and behavioral responses to the stimuli (LaBar & Cabeza, Citation2006) has been a permanent absentee in L2 listening (assessment) research. The importance of this potential research front lies in the relationship between logical thinking during comprehension and emotional responses. Research in neuroscience shows that the brain regions regulating logical thinking in the frontal lobe (Donoso et al., Citation2014) – which are essential in literal and inferential listening (Aryadoust et al., Citation2022) – and the limbic system regulating emotions jointly act upon the incoming stimuli to comprehend them and make socioculturally acceptable decisions (Bechara et al., Citation2000). Therefore, it is plausible that L2 listening comprehension is a product of not only language and logical thinking but also the emotional mechanisms that help regulate and balance logical thinking. This invites an inevitable question concerning the nature of L2 listening test scores, i.e., how much of the variance in test scores is caused by the emotional state of the test takers and how much of it is due to their logical and language-specific neurocognitive processes? And if we find reproducible and verifiable answers to this question, what implications would it have for the uses of L2 listening test scores?
Finally, recent research on L2 listening assessments has shown that neurocognitive mechanisms underlying listening can significantly explain the cognitive processes and cognitive load of listeners (Aryadoust & Luo, Citation2022; Aryadoust et al., Citation2020). Specifically, the activation of the prefrontal cortex and parts of the perisylvian region are associated with test-taking behaviors and can differentiate between high-performing and low-performing listeners (Aryadoust et al., Citation2020). However, test scores and neurocognitive processes of listeners do not always resonate with each other (Aryadoust et al., Citation2020). For example, while test scores showed no significant differences between groups of listening test takers in Aryadoust et al. (Citation2020), brain activation was significantly different across the groups. This suggests that relying on test scores as sole representatives of L2 listening proficiency can be restrictive and at odds with the neuroscience of listening. Eye tracking research has also shown that psychometrically valid dimensions underlying test score data are insensitive to construct-irrelevant variations in test takers’ response process that can be captured by examining their gaze behavior (e.g., Holzknecht et al., Citation2021). In all, over-relying on test scores and their psychometric features results in underrepresenting the neurocognitive, emotional, and behavioral dimensions of L2 listening.
Based on the argument advanced above, I propose that the preceding dimensions of L2 listening should be incorporated in assessment research and theory development. Without having an interdisciplinary understanding of the nature of L2 listening, it will be difficult to develop authentic and reliable tests and our understanding of the sources of variation in test performance will be restricted to the cognitive processes of listeners (specifically bottom-up and top-down processing).
The interaction between listeners and stimuli
Previous research has explored features of stimuli in listening tests and their effects on comprehension. The stimuli in this paper refers to the written and auditory stimuli, i.e., test items and the listening passage. The written stimuli are viewed from two perspectives: (1) types of test items, such as multiple choice (MC) and open-ended questions, and (2) the method of presentation. Research shows that using different item formats can have a significant effect on the cognitive processes of test takers (e.g., Rupp et al., Citation2006). Research on MC and open-ended items in L2 listening assessment has further shown that MC items are typically easier than open-ended items (In’nami, Citation2005). For example, in an experimental study, Cheng (Citation2004) found that test takers achieved higher scores on MC items than on open-ended items, which resonates with a meta-analysis by In’nami and Koizumi (Citation2009).
In addition, the method of presentation refers to when and how the test items are presented to test takers. Research shows that when the test items are presented simultaneously with the auditory stimuli and L2 listeners are involved in virtually concurrent listening, reading, and answering, their listening processes will be “shallower” such that they will not be able to store the comprehended information in their long-term memory and/or retrieve it after the listening task (Field, Citation2009). On the other hand, if the test items are presented after the listening passage is played, listeners would be more likely to remember the passage and create a coherent situation model (Field, Citation2009); this mode of presentation would also involve more cognitive load and brain activation especially in the prefrontal cortex and the perisylvian region (Aryadoust et al., Citation2022).
Another dimension of the listening stimuli is their linguistic, discoursal, and generic features. The effects of multiple linguistic features, such as lexical and syntactic sophistication, and of discoursal features on the cognitive challenges of listening have been examined in previous research (Buck & Tatsuoka, Citation1998; Freedle & Fellbaum, Citation1987; Freedle & Kostin, Citation1991; Révész & Brunfaut, Citation2013). In addition, research shows that the conversational genre is typically easier for L2 listeners to comprehend than lengthy pieces of academic discourse (Aryadoust, Citation2013). Nevertheless, the genre effect is under-researched and the results of linguistic and discoursal feature studies are far from conclusive. Studies examining the relationship between listening performance and auditory features of the stimuli such as scriptedness are also few and far between (e.g., Ockey & Wagner, Citation2018).
It may be said that L2 listening performance is easily affected by changes in the features of written and auditory stimuli. However, the aforementioned research findings need to be replicated across different contexts and L2s. Particularly, the field of listening assessment should adopt experimental designs to control for the effect of extraneous and intervening variables. Recent promising research in this direction includes Aryadoust et al. (Citation2022), Kormos et al. (Citation2019), O’Grady (Citation2021), and Wagner (Citation2014), wherein the authors managed to control for different textual and listener-related features to examine the effect of their target variables on listening test performance.
The role of visuals
While most of the traditional L2 listening frameworks and assessments were founded upon a conceptualization that did not recognize the role of visuals in listening, an increasingly larger number of assessments have adopted visuals in their design. In an early study, Bejar et al. (Citation2000) categorized visuals into content visuals (visuals that provide information apropos of the content of the auditory stimuli) and situational or context visuals (visuals that show the context or location in which the interaction or communication happened). In addition, researchers have included videos in the design of L2 listening tests since as early as the 1980s (Parry & Meredith, Citation1984), with later examples including Cubilo (Citation2017), Li (Citation2016), Ockey (Citation2007), Suvorov (Citation2009), and Wagner (Citation2008). Despite this, Aryadoust and Luo’s (Citation2022) study showed that many of the L2 listening assessments that they reviewed did not use any visual components, which points to a wide research gap in the field.
Scholars have viewed the use of visuals favorably and as a way to improve the authenticity of L2 listening tests. For example, Shin (Citation1998) found that video lectures tend to maximize authenticity as they resemble real-life lectures delivered in university lecture halls, which is consistent with another study by Ginther (Citation2002). This is a plausible argument, since language for academic purposes is often heavily intertwined with visuals, including PowerPoint slides, lecturers’ body language, and in some cases videos. Thus, decontextualizing L2 listening by excluding visuals would divest it of its authentic features.
In sum, I urge future researchers to investigate the effect of visuals on the functionality of L2 listening assessments and the interactions between test takers and visuals. With the ever-growing integration of technology in day-to-day life, conceptualizing the TLU domains and, accordingly, the L2 listening construct without giving a role to visuals would jeopardize the dynamic nature of the construct and the authenticity of assessments.
Authenticity in L2 listening assessment/materials
Bachman (Citation1991) defined authenticity in terms of the resemblance of test tasks to TLU tasks (situational authenticity) as well as the correspondence between the thought processes of test takers under test and non-test conditions (interactional authenticity). In the L2 listening field, authenticity has been defined more specifically and in terms of the use of kinesics in test materials (Kellerman, Citation1992), visuals (Ockey, Citation2007), linguistic and discoursal features (Flowerdew, Citation1994; Hansen & Jensen, Citation1994), and features of prosody and pronunciation (Ockey & Wagner, Citation2018; Wagner, Citation2014). For example, Hansen and Jensen (Citation1994, p. 249) suggested that the features of authentic academic lecture discourse consist of “restatement, paraphrasing, pauses, pacing, and a decreased syntactic complexity to control the density of propositions.” Relatedly, Flowerdew (Citation1994) detailed the linguistic and discoursal features of academic lecture listening that differentiate it from other spoken genres.
Wagner (Citation2014) viewed L2 learners’ exposure to a variety of accents as a key feature of authentic L2 listening tests. He stated “[i]t seems unlikely that L2 learners will interact only with speakers of the standard variety, and thus the lack of exposure to other varieties in the classroom certainly contributes to the phenomenon that these learners often experience of not being able to understand real world spoken English encountered outside the classroom” (p. 295). He further argued that features of authentic speech such as hesitations, connected speech, and high speed of delivery should be included in listening assessments and teaching materials, since they familiarize learners with real-world language as opposed to contrived speeches and conversations, which learners almost never encounter in TLU domains.
The reasons for the tendency toward using nonauthentic materials in the L2 field are not well understood and need further research. Wagner (Citation2014) proposed several possible reasons such as material developers’ and publishers’ tendency to include “clearer” and “standard” listening materials, challenges in creating appropriate and meaningful tasks, learners’ positive attitudes toward nonauthentic materials (e.g., Kmiecik & Barkhuizen, Citation2006), and some scholars’ belief that the use of authentic materials is not always possible or plausible (e.g., Richards, Citation2006). In addition, Wagner (Citation2014, p. 298) rightly stressed the role of commercialized test developers in perpetuating the use of nonauthentic materials in language testing. He argued that the problem of inauthenticity of listening test materials “is exacerbated and perpetuated by the types of spoken texts that are used in tests of listening comprehension, especially large-scale, high-stakes standardized tests of L2 listening ability.” Due to the considerable washback of commercialized tests in test preparation classes and language programs, the overwhelming majority of L2 learners are merely exposed to “scripted and polished spoken texts (spoken by British or American standard English speakers) that have few of the phonological and textual characteristics of unplanned spoken discourse” (Wagner, Citation2014, p. 297).
In sum, it is essential to investigate the various aspects of authenticity in L2 listening assessments and their effect on the behavioral, cognitive, emotional, and neurocognitive dimensions of L2 listening. The field of L2 listening assessment is still lacking a theory of the TLU domains and their features. Corpus linguistics has much to offer to L2 listening assessment in this regard. Through the lenses of corpus linguistics, L2 listening scholars will be able to better understand the linguistic, generic, and discoursal features of listening in the TLU domains and represent them in listening assessments.
Papers in the special issue
The special issue of The International Journal of Listening published in March 2022 aims to address (some of) the aforementioned gaps in understanding of the nature and assessment of L2 listening. There are seven papers in the special issue: the present introductory paper, five research papers, and a closing paper by Franz Holzknecht. In what follows, I will review the papers and highlight some of their implications for L2 listening assessment. The order of presentation below is thematic and may not follow the order in which the papers are published.
First, Yo In’nami and Rie Koizumi examined the metacognitive dimension of the L2 listening construct of two listening proficiency tests administered to different samples. According to In’nami and Koizumi, there are inconsistent findings in the extant literature about the contribution of the construct of metacognitive awareness to L2 listening, and they set out to investigate this inconsistency. Applying random forests analysis – a machine learning method used in classification and regression problems – In’nami and Koizumi found that test takers’ awareness of “person knowledge” strategies predicted the variance in the listening ability of different cohorts of test takers, while other dimensions of metacognitive awareness either predicted test takers’ performance in only one of the cohorts or in neither. In addition, they found that “person knowledge,” “mental translation,” and “directed attention” predicted listening comprehension in both of the tests that they employed. The study offers implications for the reproducibility of results and the verifiability of the theory of metacognitive awareness in L2 listening assessments. Reproducibility holds when the results of one study are fully or partially replicated in a different context. In’nami and Koizumi’s findings show that the relationship between metacognitive awareness and L2 listening depends on several factors such as test type and the features of the cohort. The authors also highlighted the limitations of using questionnaires in metacognitive research. They argued that “questionnaires only assess learners’ perceptions and not their actions […] and may not completely reflect their behaviors” (p. 15). They proposed the use of technology such as eye tracking to supplement the results of survey-based research.
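To make the method concrete, the variable-importance logic of random forests can be sketched as follows. The data are synthetic and the subscale names merely echo the constructs mentioned above; this is not In’nami and Koizumi’s instrument, model, or result.

```python
# A minimal sketch of random-forest variable importance with synthetic data.
# The predictor names are hypothetical stand-ins for metacognitive subscales.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 300
# Hypothetical questionnaire subscale scores (standardized).
person_knowledge = rng.normal(size=n)
mental_translation = rng.normal(size=n)
directed_attention = rng.normal(size=n)

# Simulated listening scores: by construction, person knowledge contributes
# the most signal and mental translation contributes none.
listening = (0.6 * person_knowledge
             + 0.3 * directed_attention
             + 0.5 * rng.normal(size=n))

X = np.column_stack([person_knowledge, mental_translation, directed_attention])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, listening)

names = ["person_knowledge", "mental_translation", "directed_attention"]
for name, imp in sorted(zip(names, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```

Importance here reflects how much each predictor reduces prediction error across the ensemble’s trees; a subscale carrying no signal receives near-zero importance, which is the kind of ranking the authors used to compare metacognitive dimensions across cohorts.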
In another study of the special issue, Sarah Sok and Hye Won Shin extended the design of In’nami and Koizumi’s investigation to young English learners and examined the relationship among metacognitive awareness, language aptitude, and L2 listening. Sok and Shin applied multiple instruments to measure the constructs and determine their relationships. Using ordinary least squares (OLS) regression analysis and mediation analysis, the authors found that aptitude was a strong and direct predictor of L2 listening measures, while metacognitive awareness as a whole predicted a smaller amount of variance in the listening test scores. In addition, a small portion of the effect of aptitude on L2 listening was mediated by metacognitive awareness. Sok and Shin called for further research on the reproducibility of the results across other populations of L2 listeners. Like In’nami and Koizumi, they also recommended the application of “objective” methods of observation such as eye tracking and neuroimaging to investigate the predictors of L2 listening. Finally, they underscored the pedagogical implications of their study, stressing that “a period of long-term, high-quality, explicit L2 instruction may help to mitigate the impact of aptitude on L2 listening proficiency, such that even students with low aptitude can achieve successful outcomes” (p. 11). They suggested that, although previous researchers advised against the promotion and teaching of mental translation strategies in L2 listening, “translating from the L2 to the L1 may be a necessary first step that can aid, rather than impede, the development of L2 semantic knowledge and processing abilities” (p. 11).
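The logic of OLS-based mediation analysis can be illustrated with a minimal product-of-coefficients sketch on synthetic data; the variable names and effect sizes are invented for illustration and do not represent Sok and Shin’s estimates.

```python
# A sketch of OLS mediation: aptitude -> awareness (path a),
# awareness -> listening (path b), and aptitude -> listening directly (c').
# All data are synthetic.
import numpy as np

rng = np.random.default_rng(7)
n = 500
aptitude = rng.normal(size=n)
# Mediator: metacognitive awareness partly driven by aptitude (path a).
awareness = 0.4 * aptitude + rng.normal(scale=0.9, size=n)
# Outcome: listening driven directly by aptitude and by awareness (path b).
listening = 0.7 * aptitude + 0.3 * awareness + rng.normal(scale=0.8, size=n)

def ols(y, *xs):
    """Return the OLS slopes (with intercept) of y on the given predictors."""
    X = np.column_stack([np.ones_like(y), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

a = ols(awareness, aptitude)[0]                    # aptitude -> awareness
c_prime, b = ols(listening, aptitude, awareness)   # direct effect; awareness -> listening
c = ols(listening, aptitude)[0]                    # total effect

indirect = a * b
print(f"total={c:.3f} direct={c_prime:.3f} indirect={indirect:.3f}")
```

In linear models the in-sample decomposition is exact: the total effect equals the direct effect plus the indirect (mediated) effect, which mirrors the finding that only a small portion of aptitude’s effect travelled through metacognitive awareness.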
The studies by In’nami and Koizumi and Sok and Shin suggest that the L2 listening construct is related to metacognitive awareness and aptitude, but the amount of dependence among the constructs is mediated by test- and listener-specific factors. This has two implications for research in L2 listening assessment. First, it seems that we are still far from a verified theory of L2 listening which would allow for accurately predicting the listening mechanisms of test takers and language learners under different conditions. This gap is presumably due to the effect of “context” on assessments. Granting this assumption, I believe that context should be clearly defined and operationalized in L2 listening research. To this end, the constituents of context should be identified and their effects investigated in controlled experiments (e.g., participants and their cognitive and non-cognitive characteristics, test features, the TLU situation, etc.). In addition, the interactions between the constituents should be examined via moderation and mediation analysis (see Collier, Citation2020). The second implication of the two studies is that L2 listening ability should be viewed as a construct situated within a broader nomological network or nomothetic span (Cronbach & Meehl, Citation1955; Embretson, Citation1998), within which listening is affected/predicted by various factors such as metacognition and aptitude. As empirical evidence and scientific knowledge about the constructs in this network expand, the precision of predictions about listening performance will improve.
Next, Ryoko Fujita’s study investigated the role of background noise in L2 listening assessment. Fujita adapted the Speech-Perception-in-Noise (SPIN) test (Kalikow et al., Citation1977) and designed an experiment with four conditions with different signal-to-noise ratios (SNRs) (0, 5, 10, 15). The results confirmed that comprehension was significantly affected by the presence of background noise and that listeners switched to top-down prediction and inference-making, using contextual information at different noise levels as a compensatory strategy. The participants’ reliance on contextual clues peaked in the SNR = 15 condition, while the tolerance of noise varied significantly across the participants. Based on the results, Fujita suggested that “in using listening materials with background noise, instructors should be aware that noise affects learners’ listening comprehension processes differently” (p. 17).
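For readers unfamiliar with SNR manipulation, the following sketch shows how noise can be scaled to hit a target SNR in decibels before mixing; the “speech” here is a synthetic stand-in tone, not Fujita’s SPIN materials, and the function name is only illustrative.

```python
# A hedged sketch of imposing a target signal-to-noise ratio when mixing
# noise into a listening passage: SNR(dB) = 10 * log10(P_speech / P_noise).
# Note that a *higher* SNR means proportionally *less* noise energy.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(0)
# Stand-in "speech": a 440 Hz tone sampled at 16 kHz for one second.
speech = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
noise = rng.normal(size=16000)

for snr_db in (0, 5, 10, 15):
    mixed = mix_at_snr(speech, noise, snr_db)
    achieved = 10 * np.log10(np.mean(speech ** 2) /
                             np.mean((mixed - speech) ** 2))
    print(f"SNR target {snr_db:>2} dB -> achieved {achieved:.1f} dB")
```

Controlling the mix in this way is what allows noise conditions such as 0, 5, 10, and 15 to be compared experimentally, since each condition differs only in the noise energy added to the same passage.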
Fujita’s study speaks to the issue of authenticity in L2 listening assessment. As Wagner (Citation2014) argued, authentic listening passages are not as polished as the scripted listening materials that are commonly used in commercialized testing or language learning programs. Spoken language in real-life/TLU situations includes disfluencies, noise, hesitations, etc., and test developers and material designers should aim to expose learners to materials that contain all the nuts and bolts of listening tasks. Fujita recognized this fact and stressed that “[f]or more authentic listening materials, test or material developers should consider adding background noise to the listening texts. This is especially important in EFL settings, where it may be important for learners to be habituated to listening materials with some noise to prepare them for listening comprehension in authentic contexts full of background noise” (p. 17).
The last two studies of the special issue addressed issues surrounding multimodality in L2 listening assessment. In the first study, Ruslan Suvorov and Shanshan He pointed to a growing consensus among scholars on the application of multimodality in L2 listening tests. They set out to review various methodological features of 45 studies that used multimodal stimuli in listening, particularly their “research aims, research designs, data collection and analysis methods, study and participant characteristics, design characteristics of L2 listening assessment instruments, and test administration procedures” (p. 1). They found that most studies compared the effect of different visual stimuli and/or audio materials in “the paper-based delivery format and academic lectures” (p. 16) and applied quantitative methods to examine the stimuli’s effect. By contrast, only five studies examined the interaction between listeners and the visuals or their response processes. The limitation of the studies reviewed, as Suvorov and He argued, is that “[s]uch heavily product-focused research, unfortunately, ignores the importance of response processes, test-taking strategies, and––most importantly––L2 learners’ actual interaction with visuals during the listening assessment” (p. 15).
In the second study addressing multimodality in L2 listening, Anna von Zansen, Raili Hilden, and Emma Laihanen used Rasch measurement to examine differential item functioning (DIF) across gender in a nationwide multimodal listening test. The results showed that the presence of multimodal materials did not compromise the psychometric unidimensionality of the test, suggesting that “the multimodal listening test seems to measure processes related to the same trait” (p. 16). However, several items displayed DIF favoring either male or female test takers. Like the other studies in the special issue, von Zansen et al. suggested that future research should leverage modern technology (e.g., eye tracking) to examine the thought and response processes of test takers and their connection with the psychometric features of listening tests.
The studies by Suvorov and He and by von Zansen et al. suggest that the field lacks a framework and working guidelines for the generation and incorporation of visuals in L2 listening tests. There is also a dearth of experimental research investigating the effect of multimodality in L2 listening tests. Future researchers should consider examining the interaction between listeners’ characteristics and test items that rely on different modes of presentation (e.g., audio-only versus audiovisual modalities). As multimodality is an inevitable dimension of modern L2 listening tests, its inclusion should be informed by clear guidelines to help maximize the validity of the tests.
In the epilogue to the special issue, Franz Holzknecht reviewed the papers in this issue of The International Journal of Listening positively and called for the application of technology in future L2 listening research. He indicated that L2 listening researchers “should also think about the application of advanced methods such as neuroimaging techniques, and first studies in this realm show promising results […]” (p. #). Holzknecht also suggested that cross-disciplinary research offers a more effective paradigm to keep pace with the demands of a rapidly changing world.
Final remarks
The nature and assessment of L2 listening will continue to draw researchers’ attention in the years to come. As societies become more integrated with technology and artificial intelligence, the relevance and coverage of many previous conceptualizations of the L2 listening construct will likely diminish. In this context, the need to advance scientific understanding of the nature of L2 listening and its assessment becomes ever more evident and compelling. As suggested in this paper, we should endeavor to formulate a multidimensional L2 listening theory encompassing the behavioral, cognitive, emotional, and neurocognitive aspects of listening. In addition, it is important to investigate the interaction between listeners’ features and stimuli, the role of multimodality, and authenticity in L2 listening assessment. To address these issues, I have suggested using experimental designs and nomothetic spans, which are underused in L2 listening assessment research. These designs would allow researchers to optimize the reproducibility of research findings across different contexts and move the field toward further theoretical accuracy and empirical replicability. I hope that this special issue of The International Journal of Listening will encourage further robust research on the theory and construct of L2 listening and its assessment.
References
- Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
- Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency. Cambridge Scholars Publishing.
- Aryadoust, V. (2019). An integrated cognitive model of comprehension. International Journal of Listening, 33(2), 71–100. https://doi.org/10.1080/10904018.2017.1397519
- Aryadoust, V. (2020). A review of comprehension subskills: A scientometrics perspective. System, 88, 102180. https://doi.org/10.1016/j.system.2019.102180
- Aryadoust, V., Foo, S., & Ng, L. Y. (2022). What can gaze behaviors, neuroimaging data, and test scores tell us about test method effects and cognitive load in listening assessments? Language Testing, 39(1), 56–89. https://doi.org/10.1177/02655322211026876
- Aryadoust, V., & Luo, L. (2022). The typology of the second language listening construct: A systematic review. [Manuscript submitted for publication]. National Institute of Education, Nanyang Technological University.
- Aryadoust, V., Ng, L. Y., Foo, S., & Esposito, G. (2020). A neurocognitive investigation of test methods and gender effects in a computerized listening comprehension test. Computer Assisted Language Learning. First Online Publication. https://doi.org/10.1080/09588221.2020.1744667
- Bachman, L. F. (1991). What does language testing have to offer? TESOL Quarterly, 25(4), 671–704. https://doi.org/10.2307/3587082
- Batty, A. O. (2015). A comparison of video- and audio-mediated listening tests with many-facet Rasch modeling and differential distractor functioning. Language Testing, 32(1), 3–20. https://doi.org/10.1177/0265532214531254
- Bechara, A., Damasio, H., & Damasio, A. (2000). Emotion, decision making and the orbitofrontal cortex. Cerebral Cortex, 10(3), 295–307. https://doi.org/10.1093/cercor/10.3.295
- Bejar, I., Douglas, D., Jamieson, J., Nissan, S., & Turner, J. (2000). TOEFL 2000 listening framework: A working paper (Report No. RM-00-07). Educational Testing Service. https://www.ets.org/Media/Research/pdf/RM-00-07
- Bekleyen, N. (2009). Helping teachers become better English students: Causes, effects, and coping strategies for foreign language listening anxiety. System, 37(4), 664–675. https://doi.org/10.1016/j.system.2009.09.010
- Bril, M., Gerrits, A., & Visser, M. (2021). The effects of working memory capacity and L2 proficiency on processing morphosyntactic violations in Korean as a second language: A self-paced listening study. International Journal of Listening. First Online Publication. https://doi.org/10.1080/10904018.2021.1992281
- Buck, G. (2001). Assessing listening. Cambridge University Press.
- Buck, G., & Tatsuoka, K. (1998). Application of the rule-space procedure to language testing: Examining attributes of a free response listening test. Language Testing, 15(2), 119–157. https://doi.org/10.1177/026553229801500201
- Burgoon, J. K., Guerrero, L. K., & Floyd, K. (2016). Introduction to nonverbal communication. Routledge.
- Cabanac, M. (2002). What is emotion? Behavioural Processes, 60(2), 69–83. https://doi.org/10.1016/S0376-6357(02)00078-5
- Canaran, Ö., Bayram, I., Doğan, M., & Baturay, M. H. (2020). Causal relationship among the sources of anxiety, self-efficacy, and proficiency in L2 listening. International Journal of Listening. First Online Publication. https://doi.org/10.1080/10904018.2020.1793676
- Chen, X., Lake, J., & Padilla, A. M. (2021). Grit and motivation for learning English among Japanese university students. System, 96, 102411. https://doi.org/10.1016/j.system.2020.102411
- Cheng, H.-F. (2004). A comparison of multiple-choice and open-ended response formats for the assessment of listening proficiency in English. Foreign Language Annals, 37(4), 544–553. https://doi.org/10.1111/j.1944-9720.2004.tb02421.x
- Collier, J. E. (2020). Applied structural equation modeling using AMOS: Basic to advanced techniques. Routledge.
- Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957
- Cubilo, J. (2017). Video-mediated listening passages and typed note-taking: Examining their effects on examinee listening test performance and item characteristics (Publication No. 10757729) [Doctoral dissertation, University of Hawaii at Mānoa]. ProQuest Dissertations Publishing.
- Dantzer, R. (1989). The psychosomatic delusion. The Free Press.
- Donoso, M., Collins, A., & Koechlin, E. (2014). Foundations of human reasoning in the prefrontal cortex. Science, 344(6191), 1481–1486. https://doi.org/10.1126/science.1252254
- Elkhafaifi, H. (2005). Listening comprehension and listening anxiety in the Arabic language classroom. The Modern Language Journal, 89(2), 206–220. https://doi.org/10.1111/j.1540-4781.2005.00275.x
- Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380–396. https://doi.org/10.1037/1082-989X.3.3.380
- Field, J. (2009). The cognitive validity of the lecture-based question in the IELTS listening paper. In P. Thompson (Ed.), International English Language Testing System (IELTS) research reports 2009 (Vol. 9, pp. 17–65). British Council and IELTS Australia. https://www.ielts.org/-/media/research-reports/ielts_rr_volume09_report1.ashx
- Flowerdew, J. (1994). Academic listening: Research perspectives. Cambridge University Press.
- Freedle, R., & Fellbaum, C. (1987). An exploratory study of the relative difficulty of TOEFL’s listening comprehension items. In R. Freedle & R. Duran (Eds.), Cognitive and linguistic analyses of test performance (pp. 162–192). Ablex.
- Freedle, R., & Kostin, I. (1991). The prediction of SAT reading comprehension item difficulty for expository prose passages (ETS Research Report No. RR-91-29). Educational Testing Service.
- Ghorbani Nejad, S., & Farvardin, M. T. (2019). Roles of general language proficiency, aural vocabulary knowledge, and metacognitive awareness in L2 learners’ listening comprehension. International Journal of Listening. First Online Publication. https://doi.org/10.1080/10904018.2019.1572510
- Ginther, A. (2002). Context and content visuals and performance on listening comprehension stimuli. Language Testing, 19(2), 133–167. https://doi.org/10.1191/0265532202lt225oa
- Giri, V. N. (2009). Nonverbal communication theories. In S. W. Littlejohn & K. A. Foss (Eds.), Encyclopedia of communication theory (pp. 690–694). Sage Publications.
- Goh, C. M. (2000). A cognitive perspective on language learners’ listening comprehension problems. System, 28(1), 55–75. https://doi.org/10.1016/S0346-251X(99)00060-3
- Goh, C., & Aryadoust, V. (2015). Examining the notion of listening sub-skill divisibility and its implications for second language listening. International Journal of Listening, 29(3), 109–133. https://doi.org/10.1080/10904018.2014.936119
- Hansen, C., & Jensen, C. (1994). Evaluating lecture comprehension. In J. Flowerdew (Ed.), Academic listening: Research perspective (pp. 241–268). Cambridge University Press.
- Harding, L. (2012). Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective. Language Testing, 29(2), 163–180. https://doi.org/10.1177/0265532211421161
- Holzknecht, F., McCray, G., Eberharter, K., Kremmel, B., Zehentner, M., Spiby, R., & Dunlea, J. (2021). The effect of response order on candidate viewing behaviour and item difficulty in a multiple-choice listening test. Language Testing, 38(1), 41–61. https://doi.org/10.1177/0265532220917316
- Imhof, M. (2010). What is going on in the mind of the listener? The cognitive psychology of listening. In A. D. Wolvin (Ed.), Listening and human communication in the 21st century (pp. 97–126). Blackwell.
- In’nami, Y. (2005). The effects of task types on listening test performance: A retrospective study. JLTA (Japan Language Testing Association) Journal, 8, 1–20. https://doi.org/10.20622/jltaj.8.0_1
- In’nami, Y., & Koizumi, R. (2009). A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats. Language Testing, 26(2), 219–244. https://doi.org/10.1177/0265532208101006
- Janusik, L. A., & Varner, T. (2020). (Re)Discovering metacognitive listening strategies in L1 contexts: What strategies are the same in the L1 and L2 context? International Journal of Listening. First Online Publication. https://doi.org/10.1080/10904018.2020.1833724
- Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. The Journal of the Acoustical Society of America, 61(5), 1337–1351. https://doi.org/10.1121/1.381436
- Kellerman, S. (1992). I see what you mean: The role of kinesic behavior in listening and implications for foreign and second language learning. Applied Linguistics, 13(3), 239–257. https://doi.org/10.1093/applin/13.3.239
- Kimura, H. (2017). Foreign language listening anxiety: A self-presentational view. International Journal of Listening, 31(3), 142–162. https://doi.org/10.1080/10904018.2016.1222909
- Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge University Press.
- Kmiecik, K., & Barkhuizen, G. (2006). Learner attitudes towards authentic and specially prepared listening materials: A mixed message? TESOLANZ Journal, 14, 1–15.
- Kormos, J., Košak Babuder, M., & Pižorn, K. (2019). The role of low-level first language skills in second language reading, reading-while-listening and listening performance: A study of young dyslexic and non-dyslexic language learners. Applied Linguistics, 40(5), 834–858. https://doi.org/10.1093/applin/amy028
- LaBar, K., & Cabeza, R. (2006). Cognitive neuroscience of emotional memory. Nature Reviews Neuroscience, 7(1), 54–64. https://doi.org/10.1038/nrn1825
- LeDoux, J. (2012). Rethinking the emotional brain. Neuron, 73(4), 653–676. https://doi.org/10.1016/j.neuron.2012.02.004
- Lee, Y.-W., & Sawaki, Y. (2009). Application of three cognitive diagnosis models to ESL reading and listening assessments. Language Assessment Quarterly, 6(3), 239–263. https://doi.org/10.1080/15434300903079562
- Li, C.-H. (2016). A comparative study of video presentation modes in relation to L2 listening success. Technology, Pedagogy and Education, 25(3), 301–315. https://doi.org/10.1080/1475939X.2015.1035318
- Liao, L., & Yao, D. (2021). Grade-related differential item functioning in General English Proficiency Test-Kids listening. Frontiers in Psychology, 12, 767244. https://doi.org/10.3389/fpsyg.2021.767244
- Min, S., & He, L. (2021). Developing individualized feedback for listening assessment: Combining standard setting and cognitive diagnostic assessment approaches. Language Testing. First Online Publication. https://doi.org/10.1177/0265532221995475
- O’Grady, S. (2021). Adapting multiple-choice comprehension question formats in a test of second language listening comprehension. Language Teaching Research. First Online Publication. https://doi.org/10.1177/1362168820985367
- Ockey, G. J. (2007). Construct implications of including still image or video in computer-based listening tests. Language Testing, 24(4), 517–537. https://doi.org/10.1177/0265532207080771
- Ockey, G., & Wagner, E. (2018). Assessing L2 listening: Moving towards authenticity. John Benjamins Publishing Company.
- Panksepp, J. (2005). Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.
- Park, G.-P. (2008). Differential item functioning on an English listening test across gender. TESOL Quarterly, 42(1), 115–123. https://doi.org/10.2307/40264430
- Parry, T. S., & Meredith, R. A. (1984). Videotape vs. audiotape for listening comprehension tests: An experiment. OMLTA Journal, 47–53. https://eric.ed.gov/?id=ED254107
- Révész, A., & Brunfaut, T. (2013). Text characteristics of task input and difficulty in second language listening comprehension. Studies in Second Language Acquisition, 35(1), 31–65. https://doi.org/10.1017/S0272263112000678
- Richards, J. (2006). Materials development and research: Making the connection. RELC Journal, 37(1), 5–26. https://doi.org/10.1177/0033688206063470
- Rupp, A. A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23(4), 441–474. https://doi.org/10.1191/0265532206lt337oa
- Shin, D. (1998). Using videotaped lectures for testing academic listening proficiency. International Journal of Listening, 12(1), 57–80. https://doi.org/10.1080/10904018.1998.10499019
- Suvorov, R. (2009). Context visuals in L2 listening tests: The effects of photographs and video vs. audio-only format. In C. A. Chapelle, H. G. Jun, & I. Katz (Eds.), Developing and evaluating language learning materials (pp. 53–68). Iowa State University.
- Suvorov, R. (2015). The use of eye tracking in research on video-based second language (L2) listening assessment: A comparison of context videos and content videos. Language Testing, 32(4), 463–483. https://doi.org/10.1177/0265532214562099
- Townsend, D. J., & Bever, T. G. (2001). Sentence comprehension: The integration of habits and rules. MIT Press.
- Vandergrift, L. (2007). Recent developments in second and foreign language listening comprehension research. Language Teaching, 40(3), 191–210. https://doi.org/10.1017/S0261444807004338
- Wagner, E. (2008). Video listening tests: What are they measuring? Language Assessment Quarterly, 5(3), 218–243. https://doi.org/10.1080/15434300802213015
- Wagner, E. (2014). Using unscripted spoken texts in the teaching of second language listening. TESOL Journal, 5(2), 288–311. https://doi.org/10.1002/tesj.120
- Yaylı, D. (2017). Using group work as a remedy for EFL teacher candidates’ listening anxiety. Eurasian Journal of Educational Research, 71(3), 41–58. https://doi.org/10.14689/ejer.2017.71.3