Special Issue: The Nature and Assessment of L2 Listening. Guest Editor: Vahid Aryadoust

The Known and Unknown About the Nature and Assessment of L2 Listening


ABSTRACT

Although second language (L2) listening assessment has attracted much research interest in the past few decades, many challenges remain in defining and operationalizing the L2 listening construct(s). Notably, the majority of L2 listening assessment studies rest on the (implicit) assumption that listening is reducible to cognition and metacognition. This approach ignores the emotional, neurophysiological, and sociocultural mechanisms underlying L2 listening. In this paper, the role of these mechanisms in L2 listening assessment is discussed and four gaps in understanding are explored: the nature of L2 listening, the interaction between listeners and the stimuli, the role of visuals, and authenticity in L2 listening assessments. Finally, a review of the papers published in the special issue is presented and recommendations for further research on L2 listening assessment are provided.

Introduction

In the past few decades, research has illuminated many issues in second language (L2) listening assessment. L2 listening constructs have improved in coverage, clarity, and relevance, and operationalization methods have gradually moved beyond the boundaries of traditional testing. Three main approaches to defining L2 listening have emerged from the literature: subskill-based, cognitive-based, and attribute-based (Aryadoust & Luo, Citation2022; Buck, Citation2001). Subskills are hypothetical, measurable abilities that constitute the building blocks of listening (Aryadoust, Citation2020). There are multiple subskill lists consisting of higher-order skills such as the ability to understand the gist of a passage, the ability to draw inferences, and so on (e.g., Aryadoust, Citation2020; Lee & Sawaki, Citation2009; Min & He, Citation2021). While the developers of these lists assert that the subskills follow a hierarchy of cognitive difficulty, no consistent results have been found to support the presumed hierarchies. In addition, the internal connections between these subskills are often ignored in modeling L2 listening (Goh & Aryadoust, Citation2015). Because subskills-based listening research relies on time-invariant quantitative methods and cross-sectional designs, the subskills are treated as static and immutable.

On the other hand, the cognitive-based approach views listening as a dynamic and non-linear process that varies significantly across contexts (Aryadoust & Luo, Citation2022; Imhof, Citation2010). In this approach, three mechanisms underlie L2 listening, namely pre-comprehension, comprehension, and post-comprehension, although this tripartite division is not intended to imply a linear sequence (Aryadoust, Citation2019). During pre-comprehension, the listener is engaged in perceiving and recognizing the auditory stimuli (Aryadoust, Citation2019). During comprehension, the listener builds a surface representation of words, parses them into propositional and macro-discourse units called the textbase (mainly through bottom-up or literal processing; Townsend & Bever, Citation2001), and finally links the units to generate a global mental representation called the situation model (through top-down or inferential processing; Kintsch, Citation1998). These processes are volatile and non-linear: if a segment of the stimuli is not perceived, recognized, or comprehended, the listener fills in the gap through top-down processing (e.g., by drawing on his/her world knowledge).

Finally, the attribute-based approach to L2 listening recognizes the effect of the stimuli and of non-cognitive attributes, such as text difficulty and the academic background of the listener, on listening performance (Aryadoust & Luo, Citation2022). Under the tenets of this approach, attributes are understood to play a role in L2 listening performance and test fairness. For example, texts containing technical and multisyllabic words or lengthy clauses are likely to be more difficult to process (Flowerdew, Citation1994), and certain listener attributes such as a shared first language can interact with test performance and advantage particular test-taking groups (Harding, Citation2012).

According to Aryadoust and Luo (Citation2022), although the depth and scope of these research streams have expanded over the past decades, these advances have also made L2 listening researchers aware of territories that remain unexplored. In this paper, I will discuss some of these under-researched areas, present a review of the papers published in the special issue, and propose possible research directions.

The unknown and the under-researched in L2 listening assessment

The field of L2 listening assessment has grown in sophistication and complexity over the course of several decades (Vandergrift, Citation2007). Researchers have made considerable progress in understanding the nature of listening, the thought processes of test takers (e.g., Aryadoust et al., Citation2022; Field, Citation2009), the functionality of test items (Liao & Yao, Citation2021; Park, Citation2008), and the effect of listeners’ cognitive and non-cognitive attributes on their comprehension performance (e.g., Bril et al., Citation2021; Canaran et al., Citation2020; Ghorbani Nejad & Farvardin, Citation2019; Goh, Citation2000; Janusik & Varner, Citation2020). In addition, the integration of innovative technologies such as eye tracking and neuroimaging to understand gaze behaviors and brain activation during listening has opened an exciting new front in L2 listening research (e.g., Aryadoust et al., Citation2022; Batty, Citation2015; Suvorov, Citation2015). This surge of research has also made us aware of our knowledge gaps. I believe that, to make further progress in our understanding of L2 listening and its assessment, the unknowns of L2 listening (those that we have become aware of) should first be spelled out and discussed.

There are four major gaps in our understanding of L2 listening: (1) the nature of L2 listening, (2) the interaction between listeners and stimuli, (3) the role of visuals, and (4) authenticity in L2 listening assessment. This list should be viewed as a preliminary step in articulating research gaps in L2 listening and is therefore by no means an exhaustive description of the challenges that we encounter in the field.

The nature of L2 listening

As discussed above, researchers have investigated some of the cognitive and non-cognitive aspects of listening such as subskills, cognitive processes, and attributes. While our understanding of L2 listening has improved, many of its underlying mechanisms have not yet been problematized. First, a distinction should be made between the nature of listening (what it is, or ontology) and listening constructs (what can be known, or epistemology). It appears to me that L2 listening assessment research is mostly concerned with the listening construct(s) while taking the nature of L2 listening for granted. Through listening tests (especially “one-shot” tests that aim to measure proficiency), researchers can learn only a little about the nature of L2 listening.

L2 listening construct(s) should be designed based on a multifarious theory of L2 listening that defines listening under non-test conditions or target language use (TLU) situations. The aim of testing is to generalize from test scores to the TLU situation, and unless we have a clear understanding of listening in the TLU domain (its ontology), our assessments will lack generalizability (Alderson, Citation2000). As Alderson (Citation2000, p. 117) reminds us, “[w]hat matters in the end is the extent to which we can generalize from assessment procedures or tests to [listening] performance in the real world,” and if L2 listening theories are deficient, erroneous, or unrepresentative of the different aspects of listening in the TLU domain, so too will be the test scores.

A reliable and precise theory of L2 listening would consist of four dimensions: behavior, cognition, emotion, and neurophysiology. Listeners’ behavior is the first and most visible dimension of listening and includes nonverbal communication during listening (gesture and body language, including haptics, proxemics, facial expressions, eye contact, etc.), backchanneling, and paralinguistic features of voice (such as tone, loudness, inflection, and pitch) (see Giri, Citation2009, and Burgoon et al., Citation2016, for reviews).

Listeners’ cognition refers to their thought processes during information take-up and transformation, comprehension (Buck, Citation2001), and storage and retrieval. As discussed above, top-down and bottom-up processing is fairly well researched in the L2 listening field. A recent review of listening processes and subskills revealed that most researchers who investigated adult L2 learners focused exclusively on coarse-grained comprehension processes and subskills, whereas the dominant approach in first language (L1) and children’s listening research was the examination of pre-comprehension processes such as phoneme and word recognition (Aryadoust, Citation2020). On the other hand, storage and retrieval processes are not typically included in L2 listening constructs, although they do seem to be a component of real-life listening across TLU domains. In their listening-response model, Bejar et al. (Citation2000) allocated a role to learning as a product of academic listening.

Emotion and neurophysiology are by far the least researched dimensions of L2 listening. Whereas emotion research in L2 listening focuses chiefly on anxiety (e.g., Bekleyen, Citation2009; Chen et al., Citation2021; Elkhafaifi, Citation2005; Kimura, Citation2017; Yaylı, Citation2017), emotions have a wider definition and coverage in (neuro)psychology: they are psychological states that are essentially caused by changes in humans’ neurophysiological states (Panksepp, Citation2005). Although emotions and feelings are used interchangeably in L2 listening research, we should distinguish them from each other. Emotions are physiological responses to external or internal stimuli (Cabanac, Citation2002) which include visceral and motor components (Dantzer, Citation1989), while feelings are the conscious experience and perception of those neurophysiological emotions (LeDoux, Citation2012). From this vantage point, the methods used to measure emotions and feelings differ significantly: emotions are measured and quantified using technologies that can access and observe listeners’ neurophysiological mechanisms, such as eye tracking, neuroimaging, and galvanic skin response, whereas feelings are measured using “subjective” methods such as surveys and verbal reports. It might be said that anxiety research in L2 listening assessment exclusively addresses one byproduct of emotions, namely the feeling of anxiety.

Like cognition and behavior in L2 listening, emotions have underlying neurological mechanisms, which to my knowledge have (almost) never been problematized and investigated in L2 (listening) research. Specifically, the limbic system which is involved in emotional and behavioral responses to the stimuli (LaBar & Cabeza, Citation2006) has been a permanent absentee in L2 listening (assessment) research. The importance of this potential research front lies in the relationship between logical thinking during comprehension and emotional responses. Research in neuroscience shows that the brain regions regulating logical thinking in the frontal lobe (Donoso et al., Citation2014) – which are essential in literal and inferential listening (Aryadoust et al., Citation2022) – and the limbic system regulating emotions jointly act upon the incoming stimuli to comprehend them and make socioculturally acceptable decisions (Bechara et al., Citation2000). Therefore, it is plausible that L2 listening comprehension is a product of not only language and logical thinking but also the emotional mechanisms that help regulate and balance logical thinking. This invites an inevitable question concerning the nature of L2 listening test scores, i.e., how much of the variance in test scores is caused by the emotional state of the test takers and how much of it is due to their logical and language-specific neurocognitive processes? And if we find reproducible and verifiable answers to this question, what implications would it have for the uses of L2 listening test scores?

Finally, recent research on L2 listening assessments has shown that the neurocognitive mechanisms underlying listening can significantly explain the cognitive processes and cognitive load of listeners (Aryadoust & Luo, Citation2022; Aryadoust et al., Citation2020). Specifically, the activation of the prefrontal cortex and parts of the perisylvian region is associated with test-taking behaviors and can differentiate between high-performing and low-performing listeners (Aryadoust et al., Citation2020). However, test scores and the neurocognitive processes of listeners do not always converge (Aryadoust et al., Citation2020). For example, while test scores showed no significant differences between groups of listening test takers in Aryadoust et al. (Citation2020), brain activation was significantly different across the groups. This suggests that relying on test scores as the sole representatives of L2 listening proficiency can be restrictive and at odds with the neuroscience of listening. Eye tracking research has also shown that psychometrically valid dimensions underlying test score data are insensitive to construct-irrelevant variations in test takers’ response processes that can be captured by examining their gaze behavior (e.g., Holzknecht et al., Citation2021). In all, over-relying on test scores and their psychometric features underrepresents the neurocognitive, emotional, and behavioral dimensions of L2 listening.

Based on the argument advanced above, I propose that the preceding dimensions of L2 listening should be incorporated in assessment research and theory development. Without having an interdisciplinary understanding of the nature of L2 listening, it will be difficult to develop authentic and reliable tests and our understanding of the sources of variation in test performance will be restricted to the cognitive processes of listeners (specifically bottom-up and top-down processing).

The interaction between listeners and stimuli

Previous research has explored features of the stimuli in listening tests and their effects on comprehension. The stimuli in this paper refer to the written and auditory stimuli, i.e., the test items and the listening passage. The written stimuli can be viewed from two perspectives: (1) the type of test item, such as multiple choice (MC) and open-ended questions, and (2) the method of presentation. Research shows that different item formats can have a significant effect on the cognitive processes of test takers (e.g., Rupp et al., Citation2006). Research on MC and open-ended items in L2 listening assessment has further shown that MC items are typically easier than open-ended items (In’nami, Citation2005). For example, in an experimental study, Cheng (Citation2004) found that test takers achieved the highest scores on MC items compared with open-ended items, which resonates with a meta-analysis by In’nami and Koizumi (Citation2009).

In addition, the method of presentation refers to when and how the test items are presented to test takers. Research shows that when the test items are presented simultaneously with the auditory stimuli and L2 listeners are involved in virtually concurrent listening, reading, and answering, their listening processes will be “shallower” such that they will not be able to store the comprehended information in their long-term memory and/or retrieve it after the listening task (Field, Citation2009). On the other hand, if the test items are presented after the listening passage is played, listeners would be more likely to remember the passage and create a coherent situation model (Field, Citation2009); this mode of presentation would also involve more cognitive load and brain activation especially in the prefrontal cortex and the perisylvian region (Aryadoust et al., Citation2022).

Another dimension of the listening stimuli is their linguistic, discoursal, and generic features. The effect of linguistic features such as lexical and syntactic sophistication, as well as discoursal features, on the cognitive challenges of listening has been examined in previous research (Buck & Tatsuoka, Citation1998; Freedle & Fellbaum, Citation1987; Freedle & Kostin, Citation1991; Révész & Brunfaut, Citation2013). In addition, research shows that the conversational genre is typically easier for L2 listeners to comprehend than lengthy pieces of academic discourse (Aryadoust, Citation2013). Nevertheless, the genre effect is under-researched, and the results of linguistic and discoursal feature studies are far from conclusive. Studies examining the relationship between listening performance and auditory features of the stimuli, such as scriptedness, are also few and far between (e.g., Ockey & Wagner, Citation2018).
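To make the kinds of features at issue here concrete, the sketch below computes a few simple, automatable proxies for the lexical and syntactic properties discussed above (proportion of multisyllabic words, mean sentence length, type-token ratio) from a transcript. It is a minimal illustration using heuristics of my own choosing (e.g., a vowel-group syllable counter), not the operationalizations used in the studies cited.

```python
import re

def rough_syllable_count(word: str) -> int:
    """Rough syllable estimate: count contiguous vowel groups (heuristic only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_difficulty_features(transcript: str) -> dict:
    """Compute simple proxies for lexical/syntactic sophistication of a transcript."""
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    words = re.findall(r"[A-Za-z']+", transcript)
    multisyllabic = [w for w in words if rough_syllable_count(w) >= 3]
    return {
        "n_words": len(words),
        "mean_words_per_sentence": len(words) / max(1, len(sentences)),
        "prop_multisyllabic_words": len(multisyllabic) / max(1, len(words)),
        "type_token_ratio": len({w.lower() for w in words}) / max(1, len(words)),
    }

if __name__ == "__main__":
    sample = ("Photosynthesis converts light energy into chemical energy. "
              "Plants absorb carbon dioxide and release oxygen.")
    print(text_difficulty_features(sample))
```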

It may be said that L2 listening performance is easily affected by changes in the features of the written and auditory stimuli. However, the aforementioned research findings need to be replicated across different contexts and L2s. In particular, the field of listening assessment should adopt experimental designs to control for the effect of extraneous and intervening variables. Recent promising research in this direction includes Aryadoust et al. (Citation2022), Kormos et al. (Citation2019), O’Grady (Citation2021), and Wagner (Citation2014), wherein the authors controlled for different textual and listener-related features to examine the effect of their target variables on listening test performance.

The role of visuals

While most traditional L2 listening frameworks and assessments were founded upon a conceptualization that did not recognize the role of visuals in listening, an increasing number of assessments have adopted visuals in their design. In an early study, Bejar et al. (Citation2000) categorized visuals into content visuals (visuals that provide information about the content of the auditory stimuli) and situational or context visuals (visuals that show the context or location in which the interaction or communication takes place). In addition, researchers have included videos in the design of L2 listening tests since as early as the 1980s (Parry & Meredith, Citation1984), with later examples including Cubilo (Citation2017), Li (Citation2016), Ockey (Citation2007), Suvorov (Citation2009), and Wagner (Citation2008). Despite this, Aryadoust and Luo’s (Citation2022) study showed that many of the L2 listening assessments that they reviewed did not use any visual components, which points to a wide research gap in the field.

Scholars have viewed the use of visuals favorably and as a way to improve the authenticity of L2 listening tests. For example, Shin (Citation1998) found that video lectures tend to maximize authenticity as they resemble real-life lectures delivered in university lecture halls, which is consistent with another study by Ginther (Citation2002). This is a plausible argument, since language for academic purposes is often heavily intertwined with visuals, including PowerPoint slides, lecturers’ body language, and in some cases videos. Thus, decontextualizing L2 listening by excluding visuals would divest it of its authentic features.

In sum, I urge future researchers to investigate the effect of visuals on the functionality of L2 listening assessments and the interactions between test takers and visuals. With the ever-growing integration of technology in day-to-day life, conceptualizing the TLU domains and, accordingly, the L2 listening construct without giving a role to visuals would jeopardize the dynamic nature of the construct and the authenticity of assessments.

Authenticity in L2 listening assessment/materials

Bachman (Citation1991) defined authenticity in terms of the resemblance of test tasks to TLU tasks (situational authenticity) as well as the correspondence between the thought processes of test takers under test and non-test conditions (interactional authenticity). In the L2 listening field, authenticity has been defined more specifically in terms of the use of kinesics in test materials (Kellerman, Citation1992), visuals (Ockey, Citation2007), linguistic and discoursal features (Flowerdew, Citation1994; Hansen & Jensen, Citation1994), and features of prosody and pronunciation (Ockey & Wagner, Citation2018; Wagner, Citation2014). For example, Hansen and Jensen (Citation1994, p. 249) suggested that the features of authentic academic lecture discourse consist of “restatement, paraphrasing, pauses, pacing, and a decreased syntactic complexity to control the density of propositions.” Relatedly, Flowerdew (Citation1994) detailed the linguistic and discoursal features of academic lecture listening that differentiate it from other spoken genres.

Wagner (Citation2014) viewed L2 learners’ exposure to a variety of accents as a key feature of authentic L2 listening tests. He stated, “[i]t seems unlikely that L2 learners will interact only with speakers of the standard variety, and thus the lack of exposure to other varieties in the classroom certainly contributes to the phenomenon that these learners often experience of not being able to understand real world spoken English encountered outside the classroom” (p. 295). He further argued that features of authentic speech such as hesitations, connected speech, and a high speed of delivery should be included in listening assessments and teaching materials, since they familiarize learners with real-world language as opposed to contrived speeches and conversations, which learners are almost never exposed to in TLU domains.

The reasons for the tendency toward using nonauthentic materials in the L2 field are not well understood and need further research. Wagner (Citation2014) proposed several possible reasons such as material developers’ and publishers’ tendency to include “clearer” and “standard” listening materials, challenges in creating appropriate and meaningful tasks, learners’ positive attitudes toward nonauthentic materials (e.g., Kmiecik & Barkhuizen, Citation2006), and some scholars’ belief that the use of authentic materials is not always possible or plausible (e.g., Richards, Citation2006). In addition, Wagner (Citation2014, p. 298) rightly stressed the role of commercialized test developers in perpetuating the use of nonauthentic materials in language testing. He argued that the problem of inauthenticity of listening test materials “is exacerbated and perpetuated by the types of spoken texts that are used in tests of listening comprehension, especially large-scale, high-stakes standardized tests of L2 listening ability.” Due to the considerable washback of commercialized tests in test preparation classes and language programs, the overwhelming majority of L2 learners are merely exposed to “scripted and polished spoken texts (spoken by British or American standard English speakers) that have few of the phonological and textual characteristics of unplanned spoken discourse” (Wagner, Citation2014, p. 297).

In sum, it is essential to investigate the various aspects of authenticity in L2 listening assessments and their effect on the behavioral, cognitive, emotional, and neurocognitive dimensions of L2 listening. The field of L2 listening assessment is still lacking a theory of the TLU domains and their features. Corpus linguistics has much to offer to L2 listening assessment in this regard. Through the lenses of corpus linguistics, L2 listening scholars will be able to better understand the linguistic, generic, and discoursal features of listening in the TLU domains and represent them in listening assessments.

Papers in the special issue

The special issue of The International Journal of Listening published in March 2022 aims to address (some of) the aforementioned gaps in understanding of the nature and assessment of L2 listening. There are seven papers in the special issue: the present introductory paper, five research papers, and a closing paper by Franz Holzknecht. In what follows, I will review the papers and highlight some of their implications for L2 listening assessment. The order of presentation below is thematic and may not follow the order in which the papers are published.

First, Yo In’nami and Rie Koizumi examined the metacognitive dimension of the L2 listening construct in two listening proficiency tests administered to different samples. According to In’nami and Koizumi, the extant literature offers inconsistent findings about the contribution of metacognitive awareness to L2 listening, and they set out to investigate this inconsistency. Applying random forests analysis, a machine learning method used in classification and regression problems, In’nami and Koizumi found that test takers’ awareness of “person knowledge” strategies predicted variance in the listening ability of different cohorts of test takers, while other dimensions of metacognitive awareness predicted test takers’ performance in only one of the cohorts or in neither. In addition, they found that “person knowledge,” “mental translation,” and “directed attention” predicted listening comprehension in both of the tests they employed. The study offers implications for the reproducibility of results and the verifiability of the theory of metacognitive awareness in L2 listening assessments. Reproducibility holds when the results of one study are fully or partially replicated in a different context. In’nami and Koizumi’s findings show that the relationship between metacognitive awareness and L2 listening depends on several factors such as test type and the features of the cohort. The authors also highlighted the limitations of using questionnaires in metacognitive research. They argued that “questionnaires only assess learners’ perceptions and not their actions […] and may not completely reflect their behaviors” (p. 15). They proposed the use of technology such as eye tracking to supplement the results of survey-based research.
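For readers unfamiliar with the method, the sketch below shows how a random forest regression can be used to gauge which metacognitive-awareness subscales carry predictive signal for listening scores. It is a minimal, self-contained illustration with simulated data and hypothetical variable names (loosely modeled on commonly reported metacognitive awareness subscales), not a reconstruction of In’nami and Koizumi’s analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data: five metacognitive-awareness subscale scores per test taker
# (column names are illustrative, not the actual variables used in the study)
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame(rng.normal(size=(n, 5)),
                 columns=["planning_evaluation", "directed_attention",
                          "person_knowledge", "mental_translation",
                          "problem_solving"])
# Simulated listening score loosely driven by two subscales plus noise
y = 0.6 * X["person_knowledge"] + 0.2 * X["directed_attention"] \
    + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Permutation importance indicates which subscales carry predictive signal
imp = permutation_importance(rf, X_test, y_test, n_repeats=30, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:20s} {score:.3f}")
```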

In another study in the special issue, Sarah Sok and Hye Won Shin extended the design of In’nami and Koizumi’s investigation to young English learners and examined the relationships among metacognitive awareness, language aptitude, and L2 listening. Sok and Shin applied multiple instruments to measure the constructs and determine their relationships. Using ordinary least squares (OLS) regression and mediation analysis, the authors found that aptitude was a strong and direct predictor of the L2 listening measures, while metacognitive awareness as a whole predicted a smaller amount of variance in the listening test scores. In addition, a small portion of the effect of aptitude on L2 listening was mediated by metacognitive awareness. Sok and Shin called for further research on the reproducibility of the results across other populations of L2 listeners. Like In’nami and Koizumi, they also recommended the application of “objective” methods of observation such as eye tracking and neuroimaging to investigate the predictors of L2 listening. Finally, they underscored the pedagogical implications of their study, stressing that “a period of long-term, high-quality, explicit L2 instruction may help to mitigate the impact of aptitude on L2 listening proficiency, such that even students with low aptitude can achieve successful outcomes” (p. 11). They suggested that, although previous researchers advised against the promotion and teaching of mental translation strategies in L2 listening, “translating from the L2 to the L1 may be a necessary first step that can aid, rather than impede, the development of L2 semantic knowledge and processing abilities” (p. 11).
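To illustrate the logic of the OLS-plus-mediation approach, the sketch below estimates a direct effect and an indirect (mediated) effect with a percentile bootstrap, using simulated data and hypothetical variable names (aptitude, metacog, listening). It is a generic product-of-coefficients sketch, not Sok and Shin’s actual model or data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: aptitude (X), metacognitive awareness (M), listening score (Y)
rng = np.random.default_rng(1)
n = 400
aptitude = rng.normal(size=n)
metacog = 0.3 * aptitude + rng.normal(scale=0.9, size=n)          # a-path
listening = 0.5 * aptitude + 0.2 * metacog + rng.normal(size=n)   # c'- and b-paths
df = pd.DataFrame({"aptitude": aptitude, "metacog": metacog, "listening": listening})

# OLS models for the classic mediation decomposition
m_model = smf.ols("metacog ~ aptitude", data=df).fit()               # a-path
y_model = smf.ols("listening ~ aptitude + metacog", data=df).fit()   # b- and c'-paths
a, b = m_model.params["aptitude"], y_model.params["metacog"]
print("direct effect (c'):", round(y_model.params["aptitude"], 3))
print("indirect effect (a*b):", round(a * b, 3))

# Percentile bootstrap for the indirect effect
boot = []
for _ in range(2000):
    s = df.sample(n=len(df), replace=True)
    a_b = smf.ols("metacog ~ aptitude", data=s).fit().params["aptitude"]
    b_b = smf.ols("listening ~ aptitude + metacog", data=s).fit().params["metacog"]
    boot.append(a_b * b_b)
print("95% CI for a*b:", np.percentile(boot, [2.5, 97.5]).round(3))
```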

The studies by In’nami and Koizumi and by Sok and Shin suggest that the L2 listening construct is related to metacognitive awareness and aptitude, but that the strength of these relationships is mediated by test- and listener-specific factors. This has two implications for research in L2 listening assessment. First, it seems that we are still far from a verified theory of L2 listening that would allow us to accurately predict the listening mechanisms of test takers and language learners under different conditions. This gap is presumably due to the effect of “context” on assessments. If this assumption holds, context should be clearly defined and operationalized in L2 listening research: the constituents of context should be identified and their effects investigated in controlled experiments (e.g., participants and their cognitive and non-cognitive characteristics, test features, the TLU situation, etc.), and the interactions between these constituents should be examined via moderation and mediation analysis, as sketched below (see Collier, Citation2020). The second implication of the two studies is that L2 listening ability should be viewed as a construct situated within a broader nomological network or nomothetic span (Cronbach & Meehl, Citation1955; Embretson, Citation1998), within which listening is affected/predicted by various factors such as metacognition and aptitude. As empirical evidence and scientific knowledge about the constructs in this network expand, the precision of predictions about listening performance improves.
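Whereas the previous sketch dealt with mediation, the following one illustrates moderation: testing whether one contextual constituent changes the effect of another via an interaction term in an OLS model. The variables (text difficulty and a shared-L1 indicator) and the data are hypothetical, chosen only to mirror the kinds of constituents mentioned above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: does a shared L1 background (a contextual constituent)
# moderate the effect of text difficulty on listening scores?
rng = np.random.default_rng(2)
n = 500
difficulty = rng.normal(size=n)
shared_l1 = rng.integers(0, 2, size=n)                 # 1 = shared L1 with speaker
score = (-0.4 * difficulty + 0.3 * shared_l1
         + 0.25 * difficulty * shared_l1 + rng.normal(size=n))
df = pd.DataFrame({"score": score, "difficulty": difficulty, "shared_l1": shared_l1})

# The interaction term (difficulty:shared_l1) carries the moderation effect
model = smf.ols("score ~ difficulty * shared_l1", data=df).fit()
print(model.summary().tables[1])
```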

Next, Ryoko Fujita’s study investigated the role of noise in speech in L2 listening assessment. Fujita adapted the Speech Perception in Noise (SPIN) test (Kalikow et al., Citation1977) and designed an experiment with four conditions with different signal-to-noise ratios (SNRs: 0, 5, 10, and 15). The results showed that comprehension was significantly affected by the presence of background noise and that listeners switched to top-down prediction and inference-making, using contextual information at different noise levels as a compensatory strategy. The participants’ reliance on contextual clues was greatest when they were exposed to the highest amount of noise (i.e., SNR = 15), while tolerance of noise varied significantly across participants. Based on the results, Fujita suggested that “in using listening materials with background noise, instructors should be aware that noise affects learners’ listening comprehension processes differently” (p. 17).
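For concreteness, the sketch below shows one common way such noise conditions can be constructed: scaling a noise signal so that it mixes with a clean recording at a target SNR. It assumes the SNR values are expressed in decibels, which is conventional for SPIN-style materials but is my assumption here; the signals are synthetic stand-ins rather than anything from Fujita’s study.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`
    (SNR assumed to be in decibels), then add it to the speech."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech / (10 ** (snr_db / 10))   # 10*log10(Ps/Pn) = snr_db
    return speech + noise * np.sqrt(target_p_noise / p_noise)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sr = 16_000
    t = np.arange(sr) / sr
    speech = 0.1 * np.sin(2 * np.pi * 220 * t)           # stand-in for a speech signal
    noise = rng.normal(scale=0.05, size=sr)              # stand-in for babble noise
    for snr in (0, 5, 10, 15):
        mixed = mix_at_snr(speech, noise, snr)
        achieved = 10 * np.log10(np.mean(speech**2) / np.mean((mixed - speech)**2))
        print(f"target SNR {snr:2d} dB -> achieved {achieved:.1f} dB")
```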

Fujita’s study speaks to the issue of authenticity in L2 listening assessment. As Wagner (Citation2014) argued, authentic listening passages are not as polished as scripted listening materials that are commonly used in commercialized testing or language learning programs. Spoken language in real-life/TLU situations includes disfluency, noise, hesitation etc. and test developers and material designers should aim to expose learners to the materials that contain all the nuts and bolts of listening tasks. Fujita recognized this fact and stressed that “[f]or more authentic listening materials, test or material developers should consider adding background noise to the listening texts. This is especially important in EFL settings, where it may be important for learners to be habituated to listening materials with some noise to prepare them for listening comprehension in authentic contexts full of background noise” (p. 17).

The last two studies of the special issue addressed issues surrounding multimodality in L2 listening assessment. In the first study, Ruslan Suvorov and Shanshan He pointed to a growing consensus among scholars on the application of multimodality in L2 listening tests. They set out to review various methodological features of 45 studies that used multimodal stimuli in listening, particularly their “research aims, research designs, data collection and analysis methods, study and participant characteristics, design characteristics of L2 listening assessment instruments, and test administration procedures” (p. 1). They found that most studies compared the effect of different visual stimuli and/or audio materials in “the paper-based delivery format and academic lectures” (p. 16) and applied quantitative methods to examine the stimuli’s effect. By contrast, only five studies examined the interaction between listeners and the visuals or their response processes. The limitation of the studies reviewed, as Suvorov and He argued, is that “[s]uch heavily product-focused research, unfortunately, ignores the importance of response processes, test-taking strategies, and––most importantly––L2 learners’ actual interaction with visuals during the listening assessment” (p. 15).

In the second study addressing multimodality in L2 listening, Anna von Zansen, Raili Hilden, and Emma Laihanen used Rasch measurement to examine differential item functioning (DIF) across gender in a nationwide multimodal listening test. The results showed that the presence of multimodal materials did not compromise the psychometric unidimensionality of the test, suggesting that “the multimodal listening test seems to measure processes related to the same trait” (p. 16). On the other hand, DIF in favor of males or females was identified in several test items. Like the other studies in the special issue, von Zansen et al. suggested that future research should leverage modern technology (e.g., eye tracking) to examine the thought and response processes of test takers and their connection with the psychometric features of listening tests.
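As a rough illustration of what a DIF analysis checks, the sketch below runs a logistic-regression DIF screen (testing whether group membership predicts an item response after conditioning on total score) on simulated item responses. This is a simple alternative to the Rasch-based DIF procedure used by von Zansen et al., included only to make the idea concrete; the data, item count, and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-response data: one row per test taker, binary item responses,
# plus a gender indicator (0/1). Not the actual data from the study.
rng = np.random.default_rng(4)
n, n_items = 600, 20
ability = rng.normal(size=n)
gender = rng.integers(0, 2, size=n)
difficulty = rng.normal(size=n_items)
# Simulate responses; give item 0 a uniform DIF effect favouring gender == 1
logits = ability[:, None] - difficulty[None, :]
logits[:, 0] += 0.8 * gender
responses = (rng.random((n, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

total = responses.sum(axis=1)
for item in range(n_items):
    df = pd.DataFrame({"y": responses[:, item], "total": total, "gender": gender})
    # DIF screen: does gender predict the item response beyond overall proficiency?
    fit = smf.logit("y ~ total + gender", data=df).fit(disp=0)
    p = fit.pvalues["gender"]
    if p < 0.01:
        print(f"item {item:2d}: possible uniform DIF "
              f"(p = {p:.4f}, coef = {fit.params['gender']:+.2f})")
```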

The studies by Suvorov and He and by von Zansen et al. suggest that there is a lack of a framework and working guidelines for the generation and incorporation of visuals in L2 listening tests. There is also a dearth of experimental research investigating the effect of multimodality in L2 listening tests. Future researchers should consider examining the interaction between listeners’ characteristics and test items that rely on different modes of presentation (e.g., audio versus audio-visual modalities). As multimodality has become an inevitable dimension of modern L2 listening tests, its inclusion should be informed by clear guidelines that help maximize the validity of the tests.

In the epilogue to the special issue, Franz Holzknecht reviewed the papers in this issue of The International Journal of Listening positively and called for the application of technology in future L2 listening research. He indicated that L2 listening researchers “should also think about the application of advanced methods such as neuroimaging techniques, and first studies in this realm show promising results […]” (p. #). Holzknecht also suggested that cross-disciplinary research offers a more effective paradigm to keep pace with the demands of a rapidly changing world.

Final remarks

The nature and assessment of L2 listening will continue to draw researchers’ attention in the years to come. As societies become more integrated with technology and artificial intelligence, the relevance and coverage of many previous conceptualizations of the L2 listening construct will diminish. In this context, the ever-increasing need to upgrade our scientific understanding of the nature of L2 listening and its assessment becomes evident and compelling. As suggested in this paper, we should endeavor to formulate a multidimensional L2 listening theory comprising the behavioral, cognitive, emotional, and neurocognitive aspects of listening. In addition, it is important to investigate the interaction between listeners’ features and the stimuli, the role of multimodality, and authenticity in L2 listening assessment. To address these issues, I have suggested using experimental designs and nomothetic spans, which are underused in L2 listening assessment research. These designs would allow researchers to optimize the reproducibility of research findings across different contexts and move the field toward greater theoretical accuracy and empirical replicability. I hope that this special issue of The International Journal of Listening will encourage further robust research on the theory and construct of L2 listening and its assessment.

References
