Research Article

English medium instruction learners’ self-efficacy, engagement, and satisfaction: developing a measurement instrument

Received 25 Jul 2023, Accepted 26 Feb 2024, Published online: 16 Apr 2024

ABSTRACT

This paper describes the development of a survey instrument to measure learners’ self-efficacy, engagement, and satisfaction within the context of English Medium Instruction (EMI) classes, with reference to both language issues and the discipline being taught. The paper presents details of the analysis of the instrument using the Rasch model. The survey was administered to 287 tertiary students enrolled in EMI classes in Japan. Results of a Rasch Principal Components Analysis of Residuals showed the three intended constructs to be largely unidimensional. Item-level analysis, as well as reliability and separation measures and an investigation of Wright maps, provided further evidence that the items are working adequately to measure each of the three constructs and to allow differentiation between levels of each construct among the respondents. In addition, the effectiveness of the 6-point Likert scale used was confirmed with an analysis of category structure functioning. Finally, a correlation analysis was conducted with a subset (n = 97) of the participants for whom English proficiency test scores were available. Results showed a small, statistically significant, and positive relationship between reading proficiency and each of the three constructs.

Background

The drive to internationalize tertiary institutions around the world has led many universities outside the Anglosphere to provide instruction in English across a range of subjects, an approach usually referred to as English Medium Instruction (EMI). The students in EMI classes are usually (although not exclusively) second language (L2) English speakers, and often the teachers are too. This approach can be contrasted with “education in the disciplines” (Macaro & Aizawa, Citation2022), that is, classes in universities in primarily English-speaking countries which focus on a given discipline. A class in international relations taught in English at an Australian university is part of “education in the disciplines”, while the same course taught at a university in Japan or Turkey is most probably EMI.

While learners in EMI classes may not speak English as an L1, the sole and original goal of EMI is the teaching of a particular disciplinary area (Airey, Citation2016), rather than providing a chance for learners to improve their English. Yet along the way, EMI has become inextricably linked with the teaching of English as a foreign/second language (EFL/ESL), and in some ways has even been co-opted by the language teaching profession, so that EMI is now viewed by many as a language learning approach. It is not uncommon to see discussion of EMI in relation to the potential for language proficiency gains, reflected in statements such as “English-medium education refers to curricula using English as a medium of instruction for basic and advanced courses to improve students’ academic English proficiency” (Taguchi, Citation2014, p. 89). In a meta-analysis of 154 journal articles on the topic of EMI, Macaro and Aizawa (Citation2022) found that 91.6% had been written by applied linguistics specialists, compared to only 1.9% by discipline specialists (with the remaining 6.5% a collaboration between both). Further, they note that 80% of these papers were published in the applied linguistics literature (their meta-analysis itself appearing in such a journal) rather than in higher education or subject-related journals.

It is not surprising, therefore, that many EMI studies have focused on language learning outcomes, or on the L2 abilities (and inabilities) of students to successfully navigate a course conducted completely in English; in such work one can also sense the strong belief among many EMI proponents that only English should be used in these courses (Sahan et al., Citation2022). Indeed, in many ways it seems that the original goal of EMI, to teach knowledge of a discipline, is sometimes forgotten.

The view of EMI as an approach to language learning has probably helped (or at least not hindered) its adoption in many contexts where the potential success of pure EMI is questionable at best. For example, Japan is a context in which the number of EMI courses has been rapidly increasing in recent years (Harris & Strefford, Citation2022). “Internationalization” has been a key driver, but the very strong EFL industry has arguably helped to promote it. It may be that EMI gives language specialists a chance to move away from pure language teaching (Yuan, Citation2021), a role that some may view as limiting or perhaps somehow “less valuable”.

From what has been outlined above, it can be strongly argued that EMI is presently “owned” by the applied linguists, and in discussing this idea, Macaro and Aizawa (Citation2022) call for greater collaboration between language specialists and “teachers of the disciplines” in order to redress this imbalance. This does not of course mean ignoring language completely; in contexts like Japan, language factors are indeed important and must be taken into account. EMI was originally developed in comparatively high English proficiency contexts, and special considerations need to be made for lower-proficiency contexts like Japan. It is therefore important for researchers to take into consideration both language and discipline-related factors, and this became the key rationale behind the present study: creating a measurement instrument which takes both of these areas into consideration.

This paper outlines the development of a survey instrument to measure Japanese EMI learners’ satisfaction, engagement, and self-efficacy, all important areas which heavily affect success (or otherwise) in EMI classes in contexts such as Japan. For each of the three factors, we inquire after both subject and language concerns. Furthermore, Rasch analysis was used to confirm the validity and reliability of this new instrument, addressing a problem in previous EMI research of inadequately tested measurement instruments (Curle & Derakhshan, Citation2021). The paper also provides the entire instrument, which we hope may be used or further adapted by other researchers in the field, addressing a major issue in the social sciences: that such instruments are not always made freely available (Rammstedt & Blumke, Citation2019). Finally, the instrument was tested in a correlation study with English listening and reading proficiency scores from a subset of the participants. This avoided the “questionnaire curse”, a common situation in which all variables in a study, even dependent variables, derive from self-reported measures (Al-Hoorie et al., Citation2021).

Literature review

In previous EMI research focusing on the learner, outcomes fall overwhelmingly in the areas of L2 ability (e.g., Rose et al., Citation2020), features of language use in the classroom (e.g., Chou, Citation2018), or language improvement (e.g., Coxhead & Boutorwick, Citation2018). However, as Thompson et al. (Citation2019) point out, other factors besides language ability are important and have generally been neglected in EMI research, despite the high potential for such factors to be instrumental in achieving successful learning outcomes.

Nonetheless, previous studies which have attempted to measure various aspects of learner experience, together with our combined professional experience of over 40 years of teaching, helped us to identify three key areas of interest: self-efficacy, engagement, and satisfaction. Below, we outline these studies and provide a rationale for the development of an instrument that can accurately, and comprehensively, measure them. First, however, we provide an outline of these three key concepts.

Self-efficacy (SE) is part of Albert Bandura’s Social Cognitive Theory (Bandura, Citation1997) and concerns the beliefs that a person holds regarding their ability to achieve a certain goal. While it has been argued that in some cases SE can be general, it is more commonly thought to be domain specific (high levels of SE concerning one’s golf swing will not necessarily carry over to one’s tennis swing). It has been shown to be an important self-belief across various domains from sport to health to business (Schunk & DiBenedetto, Citation2016). It has also been shown to be a strong predictor of actual performance, perhaps more than any other self-belief (Graham & Weiner, Citation1996). For teachers, it should be considered an extremely important variable because people with high levels of SE tend to show greater persistence in the face of difficult tasks, are more intrinsically interested in tasks, give more effort, and are likely to be less anxious (Prat-Sala & Redford, Citation2010). All of these are qualities that teachers seek to nurture in their learners and, because teachers can strongly influence the development of SE, it is an area where they can have a large and positive impact.

The term “engagement” describes a learner state that can change with time and context (it is thus not treated as a trait), referring specifically to action (Lawson & Lawson, Citation2013). This distinguishes it from motivation, which, it is held, is an antecedent of engagement (Reschly & Christenson, Citation2022). Therefore, unlike SE, engagement is less a stable psychological variable than a context- (and task-)specific one. While it is not always clear exactly what engagement entails, it is generally thought to consist of three main dimensions: affective (for example, signs of enjoyment or interest in a given class), cognitive (such as verbal and non-verbal markers of interaction, and how learners connect new knowledge to their existing knowledge), and behavioural (aspects such as time spent on a given task or total amount of output). In this study, we tried to include elements of each of these in the items in the scale. However, because we needed a concise measurement instrument covering SE, engagement, and satisfaction concerning both the discipline and language, we treated engagement in a more general sense. Overall, engagement is an important variable in education because, like SE, positive engagement has been shown to be a major contributor to learning success (Hattie, Citation2009; Skinner et al., Citation2008), and, like SE, the teacher can have a great deal of influence on learner engagement. Importantly, measurement of engagement should be tailored to the specific domain because a learner’s engagement can change drastically depending on context. As far as we are aware, there is not yet a domain-specific measurement instrument for engagement in EMI, especially not one designed for contexts in which English proficiency is limited and one that includes both content and language in its scope.

Learner satisfaction is a factor that has been the subject of extensive research in education (Weerasinghe & Fernando, Citation2017), aiming to capture levels of fulfilment or contentment. Learner satisfaction is tied to perseverance and academic achievement. Not only does satisfaction have direct implications for learner SE and engagement, but identifying lack of satisfaction can help teachers and institutions to improve. For EMI, we argue that it is an important variable to be measured. If, for example, expectations are met, then this can create something akin to a positive feedback loop in which SE and engagement rise, leading to positive learning outcomes.

SE, engagement, and satisfaction are three important variables in any area of education. As previously stated, these variables have not been extensively researched within the context of EMI. However, this does not mean there has been a total neglect, and so it is to an overview of studies which have investigated these variables within the context of EMI classes that we now turn.

In response to the early 21st century expansion of EMI around the world, some studies have begun to appear which investigate learner perceptions and their importance to successful learning outcomes in EMI classes. Thompson et al. (Citation2019) investigated 139 second-year students in a business management EMI course in Japan, employing a self-beliefs survey supported by interviews with seven members of this main cohort. Through a multiple linear regression using mid-term and end-of-term test scores as the outcome variable, they found that, along with L2 proficiency, SE was a predictor of “success”, and this was backed up by interview data linking positive self-beliefs and perseverance. From the interviews they also found that language support was very important for these learners, both before and during the course. One drawback of the paper is that the full survey is not included, but rather only example items such as “I usually get good marks in English” (Thompson et al., Citation2019, p. 209). In addition, the items that are included suggest a focus on language outcomes rather than including reference to the discipline in question.

Zhang and Pladevall-Ballester (Citation2021) investigated the attitudes and perceptions of learners in three different kinds of EMI class in China: international trade (n = 96), film production (n = 45), and project management (n = 29), over one semester (September to December). Pre- and post-course surveys were augmented with classroom observations held three times over the course. Overall, they found that learners became “less positive” towards the end of the courses. They suggest that this may be related to language proficiency issues and conclude that translations might be used for scaffolding purposes when needed, or as extra materials given as out-of-class work. Once again, the full survey is not included, so it is difficult to know exactly what was being asked of participants, but the authors refer to “perceptions, expectations, and attitudes” (p. 204). Also, the examples provided suggest a focus on language, such as “Can’t understand the lecture due to poor listening skills and limited vocabulary range” (Zhang & Pladevall-Ballester, Citation2021, p. 209).

Closely related to the present study, Le and Nguyen (Citation2022) investigated the interaction between learner motivation, engagement, and satisfaction with questionnaire data from 437 tertiary students in Vietnam. Satisfaction was measured with a survey instrument developed by Tseng et al. (Citation2020), and this became the dependent variable, with data from an engagement and motivation scale as the independent and mediating variables (once again, however, the items for these latter constructs are not provided in the published paper). They found that while these students were generally positive in assessing their satisfaction with the course, cognitive and emotional engagement had a mediating effect between motivation and satisfaction.

In another study investigating learner satisfaction, Kym and Kym (Citation2014) measured perceptions of both language and discipline knowledge of 364 students in 11 separate business administration EMI courses at a university in Korea. They developed an instrument which, among other items aiming to measure self-perceived comprehension and “general perception” (p. 42), included five items measuring satisfaction. The authors performed a correlation study and an ANOVA with the resulting data and self-reported test scores from about half of the participants. They found that satisfaction did not correlate with proficiency, but it is important to note that these were self-reported scores from a variety of language proficiency tests.

As the review of previous research above highlights, while researchers have begun to investigate these variables (SE, engagement, and satisfaction) and to highlight their salience for EMI courses, there are, as far as we can see, two main gaps that need to be addressed. First, previous inquiry into these areas appears to focus heavily on the linguistic side. For example, a survey investigating SE may inquire after students’ perceived confidence in the language required to complete a task rather than SE related to knowledge of the discipline. Second, and this issue is not limited to the EMI literature but is common to many studies in the wider applied linguistics field, details of the validity and reliability of the scales employed are often neither assessed nor published (Al-Hoorie & Vitta, Citation2019), and sometimes the survey itself is not published at all (Rammstedt & Blumke, Citation2019).

The intention of this paper is to address these two issues by describing the design and psychometric testing of a concise (24-item) instrument that can accurately measure the SE, engagement, and satisfaction of EMI learners. While it is very difficult to parse out students’ attitudes towards language and content, we have tried to redress the balance by inquiring after both. The process of the development of the instrument and initial results are described below. Following this, we outline a correlation study of the three variables with language proficiency scores. First, however, we provide the two research questions that framed this study.

Research questions

There are two main research questions that we seek to address in this study.

  1. How can three key factors for student success in EMI (self-efficacy, engagement, and satisfaction) be measured accurately?

  2. How do these three factors interact with language proficiency?

Methodology

Participants

The participants were first-, second-, and third-year university students in an International Relations faculty at a large private university in Japan. We chose students from across different grade levels in order to test the instrument with a variety of learners who were in differing, though connected, EMI class contexts. The participants were not language majors, although, in accordance with the ongoing internationalization of higher education in Japan, all students took a number of English language classes and in-English (“EMI-like”) classes in their first and second years of study, as well as a compulsory EMI class on international relations in their third year. Therefore, all of the students took the survey in, and in relation to, EMI or “EMI-like” classes. Participation was voluntary. Students were provided with a link to an online survey and were informed that they could opt out if they preferred. The data collection took place over one week towards the end of the academic year. In total, 287 students completed the survey. From this main cohort, 106 students had also taken an institutional version of the Test of English for International Communication Listening and Reading (TOEIC L&R, Educational Testing Service, Citation2008). After removing nine outliers from this group, the test scores for the remaining 97 students were used in bivariate correlations with the three survey variables.

Instruments

We created a battery of 24 questions, eight for each of the three intended constructs (SE, engagement, and satisfaction). All items were original and were created based on the authors’ experiences with EMI classes. This experiential knowledge enabled us to pinpoint factors that may significantly impact learners; as outlined below, these factors are interrelated.

A key issue is that what students in EMI classes are expected to do is very difficult for many of them. The gap between how they learnt in high school and what they are expected to do in EMI classes is large. Specifically, many students have limited language skills, and have limited experience participating in, and contributing to, a classroom environment in which they are expected to produce something. This naturally results in a lack of SE. As SE is a predictor of motivation (McGeown et al., Citation2014), and because motivation is essential for academic success (Schneider & Preckel, Citation2017), it is an extremely important factor to investigate. Regarding engagement, if a student is engaged in a task, then it is logical to conclude that motivation is higher, resulting in positive learning outcomes. SE is also positively linked to satisfaction (Kryshko et al., Citation2022), and because most EMI classes are one component of a wider curriculum, satisfaction in one class is assumed to have a high potential to spill over into other classes.

For each potential construct (SE, engagement, and satisfaction), items related to both language and content were included. The items were created first in English and then translated into Japanese by the authors. The translation was then checked by a Japanese native speaker familiar with the project, and small adaptations were made. In line with Krosnick and Fabrigar’s (Citation1997) recommendation to keep Likert scales between five and seven points, we opted for a six-point scale in order to avoid a situation where learners might rely on a “neutral” middle category. The final set of questions was uploaded to a Google Form, which was then made available to the participants via QR code.

Analysis and results

Rasch analyses

In order to investigate the performance of the survey, Rasch analysis (Rasch, Citation1960) was conducted. Rasch is a mathematical model that is useful for survey design because it allows researchers to ascertain the unidimensionality of constructs in a potential scale. It provides information about the connections between “person” abilities (in this case, the level of a given attitude in survey participants) and “item” difficulty (in this case, the difficulty of agreeing or disagreeing with an item). It also provides measures of reliability for both persons and items. An advantage of using Rasch analysis to refine survey instruments lies in the way it can help to identify differences in Likert scale categories. A fundamental problem with Likert scales is that the data gained from them are ordinal (Embretson & Reise, Citation2000): it is assumed that the gaps between the numbers are identical across the entire scale (for example, that the gap between “1” and “2” on the scale is identical in strength to the gap between “2” and “3”), but this may not necessarily be the case. Rasch converts such ordinal responses into interval-level logit scores, which can then be used in other statistical procedures.
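For reference, the dichotomous Rasch model (not reproduced in the paper itself) expresses the probability that person n endorses item i in terms of the difference between the person’s level of the construct ($\theta_n$, in logits) and the item’s endorsement difficulty ($\delta_i$); for Likert data such as these, the rating scale extension adds category thresholds $\tau_j$:

$$P(X_{ni}=1) = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}}, \qquad P(X_{ni}=k) = \frac{\exp \sum_{j=0}^{k} (\theta_n - \delta_i - \tau_j)}{\sum_{m=0}^{K-1} \exp \sum_{j=0}^{m} (\theta_n - \delta_i - \tau_j)}, \quad \tau_0 \equiv 0,$$

where the K response categories are scored 0 to K − 1 (here K = 6). The logit scores mentioned above are the estimated $\theta_n$ values.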

To investigate the unidimensionality of each of the three constructs in the instrument, a Principal Components Analysis of Residuals (PCAR) was conducted with Winsteps software (Linacre, Citation2020b). In order to confirm that each construct is unidimensional, it is commonly held that the variance explained by measures should be over 50% and the eigenvalue of the first contrast should sit below 2.0 (Linacre, Citation2020a; Wright, Citation1996). Figures outside of these ranges are potential indicators of other dimensions in the scale. As can be seen in Table 1, each of the three constructs in the instrument meets these requirements, with the exception of the engagement measure, which has a slightly higher eigenvalue in the first contrast. Linacre (Citation2020a), however, explains that these values are no more than indicators: just as figures falling inside these recommendations do not instantly mean a construct is definitely unidimensional, figures outside of these parameters do not necessarily mean a construct contains other dimensions. Therefore, in order to gain a better understanding of the instrument’s performance, individual items were also investigated.

Table 1. PCAR results for the three scales.
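For readers who export standardized residuals from Winsteps, the first-contrast criterion can be approximated in a few lines of Python. This is an illustrative sketch of the check, not the authors’ procedure, and the residual matrix below is simulated noise:

```python
import numpy as np

def first_contrast_eigenvalue(std_residuals: np.ndarray) -> float:
    """Largest eigenvalue of the correlation matrix of standardized
    person-by-item Rasch residuals: the 'first contrast' in a PCAR.
    Values below roughly 2.0 are read as supporting unidimensionality."""
    corr = np.corrcoef(std_residuals, rowvar=False)
    return float(np.linalg.eigvalsh(corr).max())

# Illustrative run: 287 persons x 8 items of pure-noise residuals,
# matching the shape of one construct in this survey.
rng = np.random.default_rng(42)
print(round(first_contrast_eigenvalue(rng.standard_normal((287, 8))), 2))
```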

Rasch provides individual item information as both mean-square (MNSQ) and standardized (ZSTD) scores, but the latter are sometimes sensitive to large sample sizes (Bond et al., Citation2021), so for the present study, MNSQ scores are provided. For each item, infit and outfit values are produced. A score of 1.0 reflects “fit” to the expected model, and it has been suggested that values between 0.6 and 1.4 can be considered acceptable (Wolfe & Smith, Citation2007). Items outside of this recommendation suggest potential item “disturbance” (Bond et al., Citation2021; Wright & Linacre, Citation1994). A high MNSQ suggests that a response is not as predictable as would be expected by the model. On the other hand, low MNSQ values suggest the opposite: that an item response is very predictable to the point of overfitting the model, which means an item may not be efficient in yielding any new information. Outfit values provide information about outliers, which are possibly the result of small issues like careless mistakes. High infit values are considered more problematic, indicating, for example, issues with item wording (Apple & Neff, Citation2012). Once again, these are just guidelines, but upon reviewing flagged items, rewording or removal may be deemed appropriate.

As the data in Table 2 indicate, the vast majority of items in each construct show acceptable fit to the model. Only two items had higher than recommended infit values. In the SE construct, item 20, “I was able to complete all homework assignments for this class adequately”, had an infit MNSQ of 1.65. A possible reason for this may be that it is the only item related to out-of-class work. While a student may be consistent in thinking about their ability to complete various kinds of tasks in class, homework may be thought of differently. It is also possible that the amount of homework given varied depending on the teacher, making responses to this item more inconsistent. However, as the results of the PCAR showed the SE construct to be unidimensional, this item was retained. The other item displaying a higher than recommended infit MNSQ value (1.77) was item 5 from the satisfaction construct, “I need to have a high level of English competency for my future career”. A possible reason for this misfit might be that this is the only item in this construct referring to a future career. As many of the participants were only first- or second-year students, their careers may seem far in the future, and connections between what they are learning now and what they will need for their careers may be hard to imagine. This has been an issue with other scales using career-related items with tertiary students, such as motivation for learning English (Leeming & Harris, Citation2024). Once again though, the PCAR showed the satisfaction construct to be unidimensional, and it was therefore decided to retain this item.

Table 2. Item statistics for the three scales.
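As background to the fit statistics reported in Table 2, the standard definitions (assumed here; the paper does not reproduce them) are as follows. With the standardized residual $z_{ni} = (x_{ni} - E_{ni})/\sqrt{W_{ni}}$, where $E_{ni}$ is the model-expected response of person n to item i and $W_{ni}$ its model variance,

$$\text{Outfit MNSQ}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2}, \qquad \text{Infit MNSQ}_i = \frac{\sum_{n=1}^{N} W_{ni}\, z_{ni}^{2}}{\sum_{n=1}^{N} W_{ni}}.$$

The information weighting in infit is what makes it sensitive to on-target disturbances such as problematic wording, while the unweighted outfit is more easily inflated by a few off-target outliers, as described above.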

Rasch also allows for investigation of reliability and separation for both persons and items. Person and item reliability figures are similar to Cronbach’s alpha in that those over 0.8 can be considered reliable (Bond et al., Citation2021). As Table 3 shows, all three constructs meet this benchmark. Person separation shows the degree to which the scale is separating out high and low performers (in this case, different levels of the various constructs among respondents). A minimum of 2.0 means that it is adequately separating the participants into two distinct groups (Linacre, Citation2020a). Again, all three constructs meet this requirement. The item reliability figures for all three constructs support the replicability of the instrument: if the same set of items were administered to a separate group of participants, it is highly likely that the same item hierarchy would result. Item separation is a way of ascertaining item hierarchy, and figures above 3.0, in conjunction with item reliability above 0.9, strongly support both the construct validity and the replicability of the scales (Linacre, Citation2020a).

Table 3. Person and item separation and reliability for the three constructs.
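The separation and reliability figures reported in Table 3 are deterministically linked. Under the standard definitions (an assumption; the paper does not give the formulas), reliability R is the proportion of observed variance in the measures that is “true”, and separation G is the true spread of the sample in units of its average measurement error:

$$R = \frac{\sigma^{2}_{\text{true}}}{\sigma^{2}_{\text{observed}}}, \qquad G = \frac{\sigma_{\text{true}}}{\text{RMSE}} = \sqrt{\frac{R}{1-R}}.$$

The benchmarks quoted above are therefore equivalent: G = 2.0 corresponds to R = 0.8, and G = 3.0 to R = 0.9.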

Another useful tool that Winsteps affords is the Wright map, which provides a visual representation of the level to which items are endorsed by participants and can help to highlight redundant items (Knoch & McNamara, Citation2015). Persons and items are placed on logit scales on the map. Persons are represented by the symbols on the left of the map (each “#” represents two people and each dot one person), and the higher a respondent appears, the more of a particular construct they exhibit in their answers (i.e., they are more “self-efficacious”, “engaged”, or “satisfied”).

Items are displayed on the right, within each of the six Likert categories. The positioning of the items on the map shows endorsement difficulty: the higher an item appears, the more difficult it is for respondents to endorse. For example, in the engagement construct, item 12, “I sometimes forgot time in class because I was so involved in the topic”, was the most difficult for respondents to agree with. Indeed, this item was created to tease out the most highly engaged students by providing a harder item to agree with. While certain tasks may be interesting for students, it would perhaps take extreme levels of engagement for them to “forget time”.

The mean values are represented by the letter “M”, and ideally the means for persons and items should be parallel. The Wright maps (Figures 1–3) show that the person means for all three constructs are higher than those for the items. This, along with a ceiling effect evident from the cluster of “#” symbols at the top left, which was particularly prominent for satisfaction (Figure 3), provides a visual depiction of how the respondents were generally positive in their responses.

Figure 1. Wright map for self-efficacy.

Figure 2. Wright map for engagement.

Figure 3. Wright map for satisfaction.

The items for SE (Figure 1) show a good level of spread with very little redundancy. For engagement (Figure 2) and satisfaction (Figure 3), however, there is some overlap, suggesting a level of redundancy in certain items. Nonetheless, the spread of items shown by these maps suggests that the questionnaire as a whole is differentiating between learners with higher and lower levels of each construct, as supported by the results from the other analyses outlined in this paper. With all of this taken into consideration, it was decided to retain all eight items for each construct.

Finally, an analysis of category structure functioning was conducted in order to test the effectiveness of the six-point Likert scale. According to criteria suggested by Wolfe and Smith (Citation2007), each category should have at least 10 responses, the average measure for each category should be higher than that of the category before it, outfit MNSQ figures should ideally be below 2.0, and the gap between adjacent threshold difficulty levels should be between .59 and 5 logits. As can be seen in Table 4, with the exception of a slightly larger than 2.0 outfit MNSQ (2.12) for the “strongly disagree” category in the satisfaction construct, all of these criteria were met, and this one divergence is slight. As can be seen in the “count” column, it was indeed difficult for learners to “strongly disagree” with the satisfaction items, and along with the heavy weighting towards the “agree” end of the scale for this construct, this suggests one of two possibilities. These learners may actually be very satisfied with the EMI course, or this could be a sign of acquiescence bias, a common problem with student evaluation data (Graeff, Citation2005), which may especially be the case here given that the survey was administered in the faculty that ran the course the respondents were asked to rate for satisfaction.

Table 4. Category structure functioning for the three constructs.
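The Wolfe and Smith (Citation2007) guidelines applied here are mechanical enough to script. The sketch below assumes values read off a Winsteps category structure table such as Table 4; the field names are ours, not Winsteps output:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Category:
    label: str                  # e.g., "strongly disagree"
    count: int                  # observed responses in the category
    avg_measure: float          # mean person measure (logits)
    outfit_mnsq: float
    threshold: Optional[float]  # Andrich threshold; None for the lowest category

def check_category_functioning(cats: list[Category]) -> list[str]:
    """Flag violations of the criteria used in this paper: at least 10
    responses per category, monotonically advancing average measures,
    outfit MNSQ below 2.0, and adjacent threshold gaps of 0.59-5 logits."""
    flags = []
    for i, c in enumerate(cats):
        if c.count < 10:
            flags.append(f"{c.label}: fewer than 10 responses")
        if i > 0 and c.avg_measure <= cats[i - 1].avg_measure:
            flags.append(f"{c.label}: average measure does not advance")
        if c.outfit_mnsq > 2.0:
            flags.append(f"{c.label}: outfit MNSQ {c.outfit_mnsq} above 2.0")
        if i >= 2 and c.threshold is not None and cats[i - 1].threshold is not None:
            gap = c.threshold - cats[i - 1].threshold
            if not 0.59 <= gap <= 5.0:
                flags.append(f"{c.label}: threshold gap {gap:.2f} outside 0.59-5 logits")
    return flags
```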

The results of the analyses provided above suggest that the three constructs (SE, engagement, and satisfaction) are unidimensional, and with a few exceptions, the individual items are working well to separate out varying levels of the respective constructs in the participants. Strong item separation figures further support construct validity. High reliability figures suggest that this survey will provide similar results if administered again to a similar population.

Correlation studies

In addition to investigating the validity and reliability of the instrument, logit scores derived from the Rasch analysis were used in bivariate correlations to investigate the relationship between the three constructs and a separate variable, language proficiency. TOEIC L&R scores were available for a subset of the students (n = 106) who completed the survey. Scores for 97 of these students were used in a correlation study using JASP 0.16.3 (after outliers were removed). Correlation studies are often employed when comparing various attitudinal constructs (Ahmadian & Ghasemi, Citation2017; Han & Wang, Citation2021; Yuan et al., Citation2023), and because the number of students for whom test data were available was relatively small, correlation was chosen over regression. This also addresses a common issue with many studies dealing with measurement, the “questionnaire curse” (Al-Hoorie et al., Citation2021, p. 6), whereby all variables, including the dependent variable in a study, come from self-reported measures.

First, logit scores for the three survey constructs were exported from the Rasch analysis. Outliers were then investigated using boxplots. Across the three variables, a total of nine outliers were identified and removed, leaving 97 participants. Descriptive statistics for the three scales within the survey can be seen in Table 5. As that table shows, skew and kurtosis are both well within the recommended range of −2 to +2 (George & Mallery, Citation2010), which suggests that the data are normally distributed.

Table 5. Descriptive statistics for the three constructs of the survey (n = 97).
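A minimal sketch of this screening step is given below, assuming the conventional 1.5 × IQR boxplot whisker rule (the paper does not state which rule was applied) and hypothetical logit scores:

```python
import numpy as np
from scipy import stats

def drop_boxplot_outliers(x: np.ndarray) -> np.ndarray:
    """Remove values beyond 1.5 * IQR from the quartiles (the standard
    boxplot whisker rule; assumed here, not stated in the paper)."""
    q1, q3 = np.percentile(x, [25, 75])
    fence = 1.5 * (q3 - q1)
    return x[(x >= q1 - fence) & (x <= q3 + fence)]

# Hypothetical logit scores for one construct (n = 106 before screening).
logits = np.random.default_rng(1).normal(loc=0.9, scale=0.8, size=106)
clean = drop_boxplot_outliers(logits)
# scipy reports excess kurtosis, so the George & Mallery (2010) rule of
# thumb translates to values between -2 and +2 for both statistics.
print(len(clean), round(stats.skew(clean), 2), round(stats.kurtosis(clean), 2))
```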

The TOEIC L&R is a well-known test in parts of Asia including Japan, and has been shown to be a reliable and generalizable test of language proficiency (Zhang, Citation2006). In addition, scores from the TOEIC L&R have previously been used in studies showing correlations with variables such as SE (Powers & Powers, Citation2015). Descriptive statistics for the 97 participants are provided in Table 6 and show that these students performed substantially better in listening than in reading, even though both skills are vitally important in EMI classes. As can be seen in Table 6, there were rather large differences in ability between some of the students, with the lowest score being 340 and the highest 830.

Table 6. Descriptive statistics for the TOEIC listening and reading.

After checking assumptions, Pearson correlations were run on the data. The results showed small, statistically significant, positive correlations between TOEIC Reading scores and SE (r = .303, n = 97, p = 0.003), engagement (r = .218, n = 97, p = 0.032), and satisfaction (r = .267, n = 97, p = 0.008), suggesting that reading proficiency in particular may have a relationship with the attitudes investigated with this scale (Table 7). On the other hand, there were no statistically significant correlations with TOEIC Listening scores. A potential reason for the relationship between reading ability and the three variables in question may lie in the fact that reading is an important part of the EMI classes of these learners (and indeed of many similar EMI programmes). EMI classes will, by necessity, include readings that are more difficult than those found in regular EFL classes. This is because EMI classes are primarily focused on building knowledge of the academic content, as opposed to pure language development, and therefore reading skills are particularly important in EMI classes (Kang et al., Citation2023).

Table 7. Correlations between proficiency and self-efficacy, engagement, and satisfaction (n = 97).
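The correlation step itself is straightforward to reproduce in shape. The sketch below uses simulated data; the variable names and the weak positive link are illustrative only, not the study’s data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
reading = rng.normal(loc=250, scale=60, size=97)           # hypothetical TOEIC Reading scores
se_logits = 0.004 * reading + rng.normal(0, 0.7, size=97)  # hypothetical SE logits, weak link

r, p = stats.pearsonr(reading, se_logits)
print(f"r = {r:.3f}, p = {p:.3f}")  # expect a small positive r, as in Table 7
```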

Conclusion

As the number of EMI programmes continues to grow in contexts where English is not an official language, and where English language proficiency and motivation may be low, the need to investigate the effects of EMI programmes on pedagogical outcomes, both positive and negative, becomes extremely important. Such programmes cost a great deal in terms of both money and time (for institutions, teachers, and students alike), and it is imperative to maximize the outcomes in terms of both discipline-related knowledge and language development. Also, because the EMI research that has looked at learner issues has hitherto focused primarily on language learning concerns, there is a gap in studies looking at other areas such as psychological variables. This paper described the creation of a survey instrument to address this gap. It was designed to measure EMI learners’ levels of SE, engagement, and satisfaction concerning aspects of both the subject being taught and the language in which it is taught (English). The survey was administered to 287 Japanese tertiary students in EMI or EMI-like classes.

The results of a Rasch PCAR in conjunction with item-level analysis were used to closely investigate the functioning of the instrument. The three constructs were found to be largely unidimensional, with most of the items working well to discriminate varying levels of SE, engagement, and satisfaction among participants. The engagement construct had a slightly higher than recommended eigenvalue in the first contrast, but investigation of individual items suggests that they are functioning adequately to measure engagement of learners in EMI classes. One item in the SE construct and one item in the satisfaction construct had slightly higher than recommended infit scores, but the results of the PCAR along with data from Wright maps and acceptable person and item reliability and separation, all suggest that items in the instrument are working effectively to measure the three constructs with a range of tertiary learners. The same instrument used with a different population should theoretically bring about similar results.

In addition to these analyses, Wright maps and an investigation of category structure functioning revealed a noticeable ceiling effect for the satisfaction construct (Figure 3). This is possibly due to acquiescence bias, a common problem with student evaluations (Graeff, Citation2005). Nonetheless, for future iterations, the inclusion of one or two more “difficult to endorse” items might be useful. After taking into consideration the results of the PCAR, the separation and reliability data, and the individual item analyses, it was decided that the items in this section are still able to function adequately to gain valid and reliable data on learner satisfaction in EMI courses.

In previous EMI studies, there appears to have been an underreporting of the instruments themselves. In providing the final instrument with all items included (see Appendix), we hope that other researchers might be able to use or further refine it, and that it can be employed in future studies of learners in various EMI contexts as a concise, reliable, and valid measurement tool. Practically, it means that these important areas might become better understood by teachers and programme coordinators.

Finally, a correlation study was conducted with logit scores for the three constructs derived from the Rasch analysis and TOEIC L&R test scores for a subset of the main cohort (n = 97). While there were no statistically significant correlations with the Listening scores, there were small but significant positive correlations between all three constructs and TOEIC Reading scores. This suggests a positive connection between English reading proficiency and the three variables measured here. Students are required to do a considerable amount of challenging reading as preparation for each class, and therefore students’ reading proficiency level possibly impacts these factors.

Future studies should seek to further improve on measurement instruments for EMI learners, addressing both language and the discipline being taught. In doing this, calls for greater collaboration between discipline experts and applied linguists will be met, which could ultimately lead to more positive outcomes in EMI programmes. We hope this paper provides an example of what might be achieved with such collaboration.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Justin Harris

Justin Harris is a professor in the Faculty of Economics at Kindai University in Osaka, Japan. His research interests center around methodologies that encourage language learning motivation, including task-based language teaching (TBLT). Justin is cofounder of the JALT TBLT special interest group, which is focused on bridging the gap between TBLT research and classroom implementation. He has a particular interest in measurement, and has published extensively on the development of instruments for use within applied linguistics and the wider field of education.

Patrick Strefford

Patrick Strefford is Professor of International Relations at Kyoto Sangyo University, Japan, where he teaches courses on International Relations and International Development. His research focuses on Myanmar’s foreign relations, particularly aid donors’ policies and practices towards Myanmar, especially those of Japan. In 2013, he was awarded a Japan Society for the Promotion of Science grant to support research into international aid supporting the transition in Myanmar. He is a co-editor of a 2020 volume on Myanmar’s transition.

References

  • Ahmadian, M., & Ghasemi, A. (2017). Language learning strategies, multiple intelligences and self-efficacy: Exploring the links. Journal of Asia TEFL, 14(4), 587–836. https://doi.org/10.18823/asiatefl.2017.14.4.11.755
  • Airey, J. (2016). EAP, EMI or CLIL? In K. Hyland & P. Shaw (Eds.), The Routledge Handbook of English for Academic Purposes (pp. 71–83). Routledge.
  • Al-Hoorie, A.H., Hiver, P., Kim, T.Y., & De Costa, P. (2021). The identity crisis in language motivation research. Journal of Language and Social Psychology, 40(1), 136–153. https://doi.org/10.1177/0261927X20964507
  • Al-Hoorie, A.H., & Vitta, J.P. (2019). The seven sins of L2 research: A review of 30 journals' statistical quality and their CiteScore, SJR, SNIP, JCR impact factors. Language Teaching Research, 23(6), 727–744. https://doi.org/10.1177/1362168818767191
  • Apple, M.T., & Neff, P. (2012). Using Rasch measurement to validate the big five factor marker questionnaire for a Japanese university population. Journal of Applied Measurement, 13(3), 276–296.
  • Bandura, A. (1997). Self-efficacy: The exercise of control. Freeman.
  • Bond, T.G., Yan, Z., & Heene, M. (2021). Applying the Rasch model: Fundamental measurement in the human sciences (4th ed.). Routledge.
  • Chou, M.-H. (2018). Speaking anxiety and strategy use for learning English as a foreign language in full and partial English-medium instruction contexts. TESOL Quarterly, 52(3), 611–633. https://doi.org/10.1002/tesq.455
  • Coxhead, A., & Boutorwick, T.J. (2018). Longitudinal vocabulary development in an EMI international school context: Lessons and texts in EAL, maths, and science. TESOL Quarterly, 52(3), 588–610. https://doi.org/10.1002/tesq.450
  • Curle, S.M., & Derakhshan, A. (2021). Trends in using questionnaires for EMI research: Suggestions for future improvements. In J.K.H. Pun & S.M. Curle (Eds.), Research methods in English medium instruction (pp. 33–45). Routledge.
  • Educational Testing Service. (2008). TOEIC test data and analysis 2007: Number of examinees and scores in FY2007. Institute for International Business Corporation, TOEIC Steering Committee.
  • Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Psychology Press.
  • George, D., & Mallery, M. (2010). SPSS for Windows Step by Step: A Simple Guide and Reference, 17.0 update (10a ed.). Pearson.
  • Graeff, T.R. (2005). Response Bias. In K. Kempf-Leonard (Ed.), Encyclopedia of Social Measurement (pp. 411–418). Elsevier.
  • Graham, S., & Weiner, B. (1996). Theories and principles of motivation. In D.C. Berliner & R.C. Calfee (Eds.), Handbook of Educational Psychology (pp. 63–84). Simon & Schuster Macmillan.
  • Han, Y., & Wang, Y. (2021). Investigating the correlation among Chinese EFL teachers’ Self-efficacy, work engagement, and reflection. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.763234
  • Harris, J., & Strefford, P. (2022). The many faces of English Medium Instruction in Japanese universities: Introducing ‘EMI-local’. Ikoma Keizai Ronsou, 20(2), 31–52.
  • Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses related to achievement. Routledge.
  • Kang, A., Lim, Y., & Murdoch, Y.D. (2023). The value of reading circles in EMI class: Engagement, usefulness, and outcomes. SAGE Open, 13(2). https://doi.org/10.1177/21582440231179681
  • Knoch, U., & McNamara, T. (2015). Rasch analysis. In L. Plonsky (Ed.), Advancing quantitative methods in second language research (pp. 275–304). Routledge.
  • Krosnick, J.A., & Fabrigar, L.R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, & M. Collins (Eds.), Survey measurement and process quality (pp. 141–164). Wiley-Interscience.
  • Kryshko, O., Fleischer, J., Grunschel, C., & Leutner, D. (2022). Self-efficacy for motivational regulation and satisfaction with academic studies in STEM undergraduates: The mediating role of study motivation. Learning and Individual Differences, 93, 102096. https://doi.org/10.1016/j.lindif.2021.102096
  • Kym, I., & Kym, M.H. (2014). Students’ perceptions of EMI in higher education in Korea. Findings on Stakeholder Engagement, 11(2), 35–61.
  • Lawson, M.A., & Lawson, H.A. (2013). New conceptual frameworks for student engagement research, policy, and practice. Review of Educational Research, 83(3), 432–479. https://doi.org/10.3102/0034654313480891
  • Le, N.T., & Nguyen, D.T. (2022). Student satisfaction with EMI courses: The role of motivation and engagement. Journal of Applied Research in Higher Education, 15(3), 762–775. https://doi.org/10.1108/JARHE-02-2022-0050
  • Leeming, P., & Harris, J. (2024). The language learning orientations scale and language learners’ motivation in Japan: A partial replication study. Research Methods in Applied Linguistics, 3(1). https://doi.org/10.1016/j.rmal.2024.100096
  • Linacre, J.M. (2020a). A user’s guide to WINSTEPS: Rasch-model computer program (4.7.0). MESA Press.
  • Linacre, J.M. (2020b). WINSTEPS: Multiple-choice, rating scale, and partial credit Rasch analysis [Computer software]. MESA.
  • Macaro, E., & Aizawa, I. (2022). Who owns English as a medium of instruction? Journal of Multilingual and Multicultural Development, 1–14. Online First. https://doi.org/10.1080/01434632.2022.2136187
  • McGeown, S.P., Putwain, D., Simpson, E.G., Boffey, E., Markham, J., & Vince, A. (2014). Predictors of adolescents’ academic motivation: Personality, self-efficacy and adolescents’ characteristics. Learning and Individual Differences, 32, 278–286. https://doi.org/10.1016/j.lindif.2014.03.022
  • Powers, D.E., & Powers, A. (2015). The incremental contribution of TOEIC listening, reading, speaking, and writing tests to predicting performance on real-life English language tasks. Language Testing, 32(2), 151–167. https://doi.org/10.1177/0265532214551855
  • Prat-Sala, M., & Redford, P. (2010). The interplay between motivation, self-efficacy, and approaches to studying. British Journal of Educational Psychology, 80(2), 283–305. https://doi.org/10.1348/000709909X480563
  • Rammstedt, B., & Blumke, M. (2019). Measurement instruments for the social sciences. Measurement Instruments for the Social Sciences, 1(4). https://doi.org/10.1186/s42409-018-0003-3
  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danmarks Paedagogiske Institut.
  • Reschly, A.L., & Christenson, S.L. (2022). Jingle-Jangle revisited: History and further evolution of the student engagement construct. In A.L. Reschly & S.L. Christenson (Eds.), Handbook of Research on Student Engagement (pp. 3–24). Springer.
  • Rose, H., Curle, S., Aizawa, I., & Thompson, G. (2020). What drives success in English medium taught courses? The interplay between language proficiency, academic skills, and motivation. Studies in Higher Education, 45(11), 2149–2161. https://doi.org/10.1080/03075079.2019.1590690
  • Sahan, K., Galloway, N., & McKinley, J. (2022). ‘English-only’ English medium instruction: Mixed views in Thai and Vietnamese higher education. Language Teaching Research. Online First. https://doi.org/10.1177/13621688211072632
  • Schneider, M., & Preckel, F. (2017). Variables associated with achievement in higher education: A systematic review of meta-analyses. Psychological Bulletin, 143(6), 565–600. https://doi.org/10.1037/bul0000098
  • Schunk, D.H., & DiBenedetto, M.K. (2016). Self-efficacy theory in education. In K.R. Wentzel & D.B. Miele (Eds.), Handbook of motivation at school (2nd ed. pp. 34–54). Routledge.
  • Skinner, E., Furrer, C., Marchand, G., & Kindermann, T. (2008). Engagement and disaffection in the classroom: Part of a larger motivational dynamic? Journal of Educational Psychology, 100(4), 765–781. https://doi.org/10.1037/a0012840
  • Taguchi, N. (2014). English-medium education in the global society: Introduction to the special issue. International Review of Applied Linguistics for Language Teaching, 52(2), 89–98. https://doi.org/10.1515/iral-2014-0004
  • Thompson, G., Aizawa, I., Curle, S., & Rose, H. (2019). Exploring the role of self-efficacy beliefs and learners’ success in English Medium Instruction. International Journal of Bilingual Education and Bilingualism, 25(1), 196–209. https://doi.org/10.1080/13670050.2019.1651819
  • Tseng, P.-H., Pilcher, N., & Richards, K. (2020). Measuring the effectiveness of English medium instruction shipping courses. Maritime Business Review, 5(4), 351–371. https://doi.org/10.1108/MABR-10-2019-0042
  • Weerasinghe, I.M.S., & Fernando, R.L. (2017). Students’ satisfaction in higher education. American Journal of Educational Research, 5(5), 533–539.
  • Wolfe, E.W., & Smith, E.V. (2007). Instrument development tools and activities for measure validation using Rasch models: Part II–validation activities. Journal of Applied Measurement, 8(2), 204–234.
  • Wright, B.D. (1996). Comparing rasch measurement and factor analysis. Structural Equation Modeling, 3(1), 3–24. https://doi.org/10.1080/10705519609540026
  • Wright, B.D., & Linacre, J.M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
  • Yuan, R. (2021). Promoting English-as-a-medium-of-instruction (EMI) teacher development in higher education: What can language specialists do and become? RELC Journal, 54(1), 267–279. Online First. https://doi.org/10.1177/0033688220980173
  • Yuan, R., Qiu, X., Wang, C., & Zhang, T. (2023). Students’ attitudes toward language learning and use in English-medium instruction (EMI) environments: A mixed methods study. Journal of Multilingual and Multicultural Development, 1–18. Online First. https://doi.org/10.1080/01434632.2023.2176506
  • Zhang, S. (2006). Investigating the relative effects of persons, items, sections, and languages on TOEIC score dependability. Language Testing, 23(3), 351–369. https://doi.org/10.1191/0265532206lt332oa
  • Zhang, M., & Pladevall-Ballester, E. (2021). Students’ attitudes and perceptions towards three EMI courses in mainland China. Language, Culture and Curriculum, 35(2), 1–17. https://doi.org/10.1080/07908318.2021.1979576

Appendix

EMI Satisfaction, Engagement, Self-Efficacy Survey

Satisfaction

  1. I think that this course has taught me what I want to know about the main topic.

  2. Through this course, I was able to develop my general English ability.

  3. Through this course, I was able to develop my English related to the main topic.

  4. I think that I was able to become more of an international person through this course. (L)

  5. I need to have a high level of English competency for my future career.

  6. Choosing this faculty was a good idea.

  7. I think that my classmates were motivated to work hard in this class.

  8. I think that the course materials for this class met my needs.

Engagement

  9. The topics we discussed in class made me want to learn more about the topics.

  10. It was very interesting listening to the teacher talk about the main topics of this class.

  11. The textbook inspired me to learn more about the topic.

  12. I sometimes forgot time in class because I was so involved in the topic.

  13. When we did group discussions, I became very involved in them.

  14. When we did tasks in class, I became involved in them.

  15. Doing extra work out of class for this course was not hard because it was interesting for me.

  16. I enjoyed talking to my classmates about the topics of this class.

Self-efficacy

  17. The tasks that we did in class were easy for me to do.

  18. I could understand everything that the teacher talked about.

  19. I always contributed a lot to group discussions.

  20. I was able to complete all homework assignments for this class adequately.

  21. My English ability was adequate for this class.

  22. I could understand the content of the textbook for this class.

  23. I was at least as good as most of my classmates at understanding the topic.

  24. I had enough knowledge of the main topic of this class that I could discuss it in my own language.