
High school science teachers’ assessment literacy for inquiry-based science instruction

Pages 621-642 | Received 20 Feb 2023, Accepted 21 Aug 2023, Published online: 30 Aug 2023

ABSTRACT

Drawing upon the model for science teacher assessment literacy developed by Abell and Siegel [Abell, S. K., & Siegel, M. A. (2011). Assessment literacy: What science teachers need to know and be able to do. In The professional knowledge base of science teaching (pp. 205–221). Springer. https://doi.org/10.1007/978-90-481-3927-9_12], this study explored science teachers' assessment knowledge and practices for inquiry-based science instruction (IBSI) in Taiwan, and investigated possible interactions among the categories of assessment literacy. Forty high school science teachers with relevant experience in IBSI participated in the study. Data were collected through semi-structured interviews and background questionnaires. We analysed the data and developed a coding scheme through both theory-oriented and data-oriented approaches. Four categories of teacher assessment literacy were explored: assessment purposes, assessed learning outcomes, assessment strategies, and assessment scoring. The results indicated that, through formative assessments, teachers supported students' inquiry-based learning and self-regulated learning in a variety of ways. Furthermore, teachers not only used multiple data sources to assess students' learning performances, but also valued both subcategories of assessed learning outcomes: inquiry as means and inquiry as ends. The analysis also revealed some patterns of interaction among the categories of assessment literacy. Based on the findings, discussions and suggestions are provided.

Introduction

Assessment literacy, as an integral part of teacher professionalism (Abell & Siegel, Citation2011; Xu & Brown, Citation2016), has been receiving growing attention from educational research communities (Coombs & DeLuca, Citation2022), partly due to the critical role of assessment in education. Assessment exerts profound influences on students’ learning (Grangeat et al., Citation2021; Mertler, Citation2004) and classroom instruction (Grangeat et al., Citation2021; OECD, Citation2016), and is also a tool of educational reform (Heitink et al., Citation2016). In addition to the need to evaluate students’ learning outcomes and to communicate with students, parents and stakeholders (Nitko & Brookhart, Citation2011; Pastore & Andrade, Citation2019), assessment results affect many subsequent decisions, such as how teachers adjust instruction, how students are assigned to suitable follow-on courses, and whether students are able to apply to universities. Although teachers have to make assessment-related decisions, they do not necessarily receive sufficient training or engage in professional development activities related to assessment. In fact, research on teachers’ assessment literacy has revealed that some teachers have inadequate assessment knowledge (Mertler, Citation2004; Plake, Citation1993; Xu & Brown, Citation2016), have difficulties comprehending specific technical terms (e.g. scale scores or confidence bands) (Kim et al., Citation2020), and experience challenges enacting some assessment practices (e.g. developing valid grading procedures) (Mertler, Citation2004).

Stiggins (Citation1991), a pioneer in the area, defined assessment literacy as knowledge about educational assessment and the related skills needed to evaluate students’ learning. Based on and expanding upon this definition, some standards or theories have been proposed and have focused on different components or aspects of teachers’ assessment literacy (Abell & Siegel, Citation2011; Pastore & Andrade, Citation2019; Xu & Brown, Citation2016). Among them, the Standards for Teacher Competence in Educational Assessment of Students (AFT, Citation1990) depicted seven competencies in which teachers should be skilled, for example, choosing assessment methods appropriate for instructional decisions. These standards have had broad implications for research, practice and professional development (Coombs & DeLuca, Citation2022; Mertler, Citation2004; Plake, Citation1993). Another significant example is the hierarchical framework proposed by Xu and Brown (Citation2016). Their framework included six components that illustrated teacher assessment literacy in practice, and extended the scope from teachers’ knowledge, skills, and practices to contextual factors, teacher conceptions, teacher learning, and teachers’ identity as assessors.

In recent years, the research field of assessment literacy has gained significant interest and has developed in various directions. First, researchers have examined teachers’ assessment literacy in different subject areas and instructional contexts, including language education (Coombe et al., Citation2020; Kremmel & Harding, Citation2020), physical education (DinanThompson & Penney, Citation2015; Tolgfors, Citation2019), mathematics (Ayalon & Wilkie, Citation2020), writing (Lam, Citation2019), and content-and-language integrated learning (Liu et al., Citation2023). Secondly, scholars have proposed theoretical frameworks and developed survey instruments to meet diverse needs (Kremmel & Harding, Citation2020; Pastore & Andrade, Citation2019; Yan & Pastore, Citation2022; Zhang et al., Citation2022). Thirdly, the influences of contextual factors at different levels on teachers’ assessment literacy have been explored (DeLuca et al., Citation2021; Edwards, Citation2020; Fulmer et al., Citation2015), including teachers’ knowledge and personal factors (Edwards, Citation2020) and macro-cultures in different countries (DeLuca et al., Citation2021). Fourthly, ongoing investigations have examined effective approaches to enhancing teachers’ assessment literacy through assessment education, training, and professional development courses (Bijsterbosch et al., Citation2019; Schelling & Rubenstein, Citation2023). Finally, the role of students in assessment has been reconsidered, leading to the proposal and confirmation of frameworks and components of student assessment literacy (Chan & Luo, Citation2021; Hannigan et al., Citation2022).

While assessment literacy in certain subjects and instructional contexts has been emphasised, the main focus has predominantly been on language and physical education, with limited attention paid to science assessment literacy (Joachim et al., Citation2020). Characteristics of a discipline or the nature of the subject matter are critical factors that could influence the content, strategies, and subsequent decisions of assessments. For example, conducting experiments and developing models are common learning practices in science classrooms, but not in language courses. Science teachers need to have knowledge and related skills to design evaluations or tasks to assess students’ experimental skills and modelling abilities, whereas these assessment competencies are not required for language teachers.

Additionally, understandings of educational assessments at the international, regional, national, and classroom levels (e.g. international assessments, large-scale assessments, and classroom assessments) could involve different sets of knowledge and skills. The levels of assessment have also been overlooked by previous research on teachers’ assessment literacy.

Furthermore, assessment should be aligned with instruction (Martone & Sireci, Citation2009). In science education, the importance and effectiveness of inquiry-based science instruction (IBSI) have been widely recognised over the past 30 years (National Research Council, Citation2000). Designing and implementing assessments for IBSI are not only challenging (Hung & Wu, Citation2022), but have received increasing discussion and attention (Cowie & Harrison, Citation2021; Grangeat et al., Citation2021). Yet, relatively little is understood about science teachers’ assessment literacy for IBSI.

To fill the research gap regarding teachers’ assessment literacy for IBSI and to highlight the characteristics of science learning and classroom assessments, this study explored science teachers’ knowledge and practices related to classroom assessments of IBSI, and drew upon a model for science teacher assessment literacy developed by Abell and Siegel (Citation2011) as the theoretical framework of this study. As will be introduced later, the model proposed four categories of assessment literacy: assessment purposes, what to assess, assessment strategies, and assessment interpretation and action-taking. The research questions guiding this study were as follows. (1) What is science teachers’ assessment literacy for IBSI in Taiwan? (2) Are there interactions among the categories of assessment literacy? If so, what are the interaction patterns?

Theoretical background

The model for science teacher assessment literacy

According to Abell and Siegel (Citation2011), science teacher assessment literacy could be defined as ‘in creating an assessment-centered classroom, what a science teacher knows and is able to do’ (p. 206), including science teachers’ assessment knowledge, skills, and practices. Abell and Siegel proposed a model to depict the various types of assessment knowledge and skills needed to create an assessment-centered learning environment. The model was grounded in the conception of teacher knowledge of assessment in science from Magnusson et al. (Citation1999) and classified science teachers’ assessment literacy into four categories.

The first category is teacher knowledge of assessment purposes, including diagnostic, formative, summative and metacognitive assessments. For example, teachers should know why and when to choose a certain type of assessment, and understand the differences among these four types of assessment. The second category is what to assess, which is ‘related to curricular goals and to values of what is important to learn and how learning occurs’ (p. 214). This category involves identifying important teaching goals and establishing alignment between instruction and assessment. For instance, if the instructional goal is to develop students’ scientific practices, the teacher should design assessment tasks or performance assessments that allow students to demonstrate those practices. The third category, assessment strategies, refers to the ways teachers assess student learning (Magnusson et al., Citation1999), including general strategies (e.g. using a questioning strategy or worksheets in formative assessment) and topic-specific strategies (e.g. using a diagnostic test to analyse students’ alternative ideas about force). Teachers’ knowledge and skills about response strategies, including methods of grading and effective forms of feedback, also fall into this category. Finally, teachers’ assessment literacy of assessment interpretation and action-taking includes how teachers make sense of assessment results and decide on their subsequent actions. That is, science teachers should know how to process, interpret, and act upon assessment data, for example by assigning grades and modifying instructional plans.

In this study, Abell and Siegel’s model for science teacher assessment literacy (Citation2011) was adopted to guide our exploration of science teachers’ assessment literacy about IBSI for several reasons. First, this model was developed based on their empirical research on the assessment knowledge of science teachers as well as the theories of science teacher knowledge by Magnusson et al. (Citation1999) and Pellegrino et al. (Citation2001). The role of disciplinary characteristics in assessment literacy was taken into account in the model. Additionally, the model addressed classroom assessments and learner-centered environments, which were in line with the focus of this study. Finally, the model suggested that the four categories ‘interact with each other in practice’ (p. 211). However, few studies have analysed such interactions to date; the second research question of this study addresses this issue.

Relevant research on science teacher assessment literacy

Based on Abell and Siegel’s (Citation2011) model, relevant literature has explored science teachers’ assessment literacy, but some research issues remain open to further investigation. This study endeavoured to increase this understanding from different directions. Previous studies have compared teacher assessment literacy in theory and in practice (Izci & Siegel, Citation2019; Siegel & Wissehr, Citation2011) and have examined changes in views and practices of equitable assessment for English learners (Siegel, Citation2014). The results showed that, in practice and contrary to their personal theories or views, teachers reverted to traditional forms of assessment, faced difficulties (Izci & Siegel, Citation2019; Siegel & Wissehr, Citation2011) or made less progress (Siegel, Citation2014). However, while exploring teacher assessment literacy, these studies focused on only some of the categories in Abell and Siegel’s (Citation2011) model. For example, Siegel and Wissehr (Citation2011) only emphasised assessment tools and purposes. Additionally, although scoring is the basis for interpreting assessment data and taking action, and teachers encounter difficulties while grading (Mertler, Citation2004), the issue of assessment scoring in IBSI is yet to be addressed. Therefore, our study explored four categories (including assessment scoring) of Abell and Siegel’s (Citation2011) model to systematically present teacher assessment literacy about IBSI.

Regarding assessment purposes, previous studies employed data-driven approaches to present the diversity of assessment purposes (Siegel & Wissehr, Citation2011), or focused on summative assessment to develop a rubric and track the growth of teachers’ summative assessment literacy (Edwards, Citation2017). Following Abell and Siegel’s model, our study divided assessment purposes into three subcategories (diagnostic, formative, and summative assessment). This classification met the needs of classroom assessment and was in line with our goal because it took into account assessment purposes at different learning stages, such as understanding students’ background knowledge and progress, providing feedback to teachers and students, and providing evidence of student learning for grading.

Furthermore, most relevant research has selected pre-service and novice teachers as participants (Siegel, Citation2014; Siegel & Wissehr, Citation2011). Nevertheless, there are significant differences between pre-service and in-service teachers in assessment competency, approaches to assessment, and perceived assessment skill (DeLuca et al., Citation2018; Mertler, Citation2004). To present in-service science teachers’ assessment literacy about IBSI, we recruited science teachers with a range of teaching experience to participate in the study. Moreover, to capture the variety of science teacher assessment literacy about IBSI, we invited 10 teachers from each discipline (physics, chemistry, biology, and earth science) rather than focusing on one specific discipline (Gottheiner & Siegel, Citation2012).

Finally, the current study further explored the interactions among the categories of assessment literacy. Some studies have touched on this issue, but much remains to be explored. Grob et al. (Citation2021) and Wang et al. (Citation2010) aimed to describe the relations between assessment strategies and assessed learning outcomes. For example, Grob et al. (Citation2021) indicated that high school teachers often use written data to assess the ability of ‘communicating about the methodology and the results’ in IBSI; however, they only focused on formative assessment. Moreover, at the high school level, most assessment activities are based on one data source, especially written data (Grob et al., Citation2021). We were therefore curious about the interactions between assessed learning outcomes and assessment purposes, and whether there are different patterns of interaction between assessment strategies and assessed learning outcomes in Taiwan. Such patterns could help to provide evidence-based suggestions for improving teacher assessment literacy, such as using different assessment strategies to assess important but infrequently assessed learning outcomes.

Assessments for IBSI

Educational research has identified many challenges that science teachers face in designing and carrying out assessments when enacting IBSI (Akuma & Callaghan, Citation2019; Cowie & Harrison, Citation2021; Grangeat et al., Citation2021; Talanquer et al., Citation2013; Zlabkova et al., Citation2021). Possible reasons for these challenges are as follows. (1) Scientific inquiry is a multi-faceted activity involving different scientific practices and corresponding learning outcomes (NRC, Citation2000, Citation2012). (2) To assess multi-faceted activities and abilities, teachers often need to use multiple data sources (Grob et al., Citation2021). (3) As students’ abilities improve, the relationship between teachers and students also changes (Crawford, Citation2000; Dobber et al., Citation2017; Wu & Hsieh, Citation2006; Zlabkova et al., Citation2021); for example, in IBSI, a teacher is not only an assessor but also a collaborator. These characteristics open up more possibilities for assessing inquiry learning, but also create difficulties and challenges regarding what to assess and how to assess it.

To support teachers in connecting inquiry instruction with assessment practices, many studies and international projects have been devoted to the assessment of inquiry learning (Grangeat et al., Citation2021; Grob et al., Citation2021; Ruiz-Primo et al., Citation2010; Zuiker & Whitaker, Citation2014). The topics or issues they have explored include developing an inquiry-based teaching model that integrates assessment strategies and resources (Zuiker & Whitaker, Citation2014), analysing the functions, methods and cycles of teacher-student conversations in IBSI (Nieminen et al., Citation2021; Rached & Grangeat, Citation2021), identifying pre-service teachers’ abilities to notice or provide feedback on students’ performances (Ropohl & Rönnebeck, Citation2019; Talanquer et al., Citation2013), analysing assessment practices at different school levels (Grob et al., Citation2021), and establishing partnerships between researchers and teachers to find effective assessment methods (Zlabkova et al., Citation2021). These studies also showed that, regarding assessments of inquiry learning, teachers preferred summative assessment (Grob et al., Citation2021; Zlabkova et al., Citation2021) because of time constraints, their assessment competences, or their belief that they should help students prepare for high-stakes examinations. In contrast, researchers have emphasised formative assessments for IBSI, which can bridge assessment and inquiry learning to improve students’ inquiry abilities (Cowie & Harrison, Citation2021; Grangeat et al., Citation2021). Although science teachers struggle to fulfil the different assessment purposes of IBSI (Zuiker & Whitaker, Citation2014), few studies have examined teachers’ thoughts about assessment purposes and the corresponding reasons.

In addition, to assess multi-faceted inquiry abilities, the NRC (Citation2012) and other researchers have recommended using multiple forms of assessment to collect evidence of student learning (Kuo et al., Citation2015; Zlabkova et al., Citation2021). However, relevant studies have focused on only a certain type of assessment data source (e.g. teacher-student conversations) (Nieminen et al., Citation2021; Rached & Grangeat, Citation2021) or assessment method (e.g. peer assessment) (Zlabkova et al., Citation2021). How and why different forms of assessment (e.g. peer and online assessments) are used in IBSI remains unanswered. To address these questions, this study conducted teacher interviews, investigated science teachers’ assessment literacy about IBSI, and explored the assessment categories identified in Abell and Siegel’s (Citation2011) model.

Methods

Participants

To collect information and perspectives on the assessment of IBSI, this study recruited teachers with experience of IBSI. To balance the influence of subject area, teaching experience, and professional growth on teachers’ assessment literacy, we recruited 10 teachers from each subject (i.e. physics, chemistry, biology, and earth science); of these teachers, three were less experienced (teaching fewer than 10 years), three were experienced (teaching more than 10 years), and four were ‘seed’ teachers who were regularly involved in professional development activities. Through various channels, including personal connections, professional development groups, curriculum development committees, and other relevant networks, we contacted high school science teachers who met the recruitment criteria and were willing to participate in the study.

In total, forty high school science teachers (21 females and 19 males) with an average of 9.1 years of inquiry-based teaching experience participated in this study. These teachers came mainly from northern Taiwan (85%, 34 out of 40) and 98% of them taught science in public schools. Their overall teaching experience ranged from 3 to 33 years, with an average of 15.1 years. Table 1 shows an overview of their backgrounds.

Table 1. Background Information of the Participating Teachers (N = 40).

Data collection

The instruments of the study included a background questionnaire and an interview protocol. The questionnaire was designed to survey teachers’ background information, such as their demographics, teaching experience in science, teaching experience in IBSI, and their professional development for IBSI and assessments. The interview protocol was developed based on the four categories in Abell and Siegel’s (Citation2011) model. Two high school science teachers were invited to pilot the interview questions and help refine the protocol. Additionally, a panel of experts, including science education specialists and in-service teachers, examined the interview questions to ensure they were appropriate for probing teachers’ IBSI assessment conceptions and practices. Three doctoral students, including the first author, conducted the interviews, and before the formal interviews the interviewers engaged in several discussions to reach consensus on interview procedures and prompts. In the final version, the interview questions were: (1) How do you assess the effectiveness of inquiry-based instruction? (2) What assessment activities or questions do you design? (3) How do you score or analyse students’ performances of these assessment activities or their responses to the questions?

Before the interview was conducted, each teacher was handed a letter and an informed consent form. The teachers were informed of the study purpose, and their participation was voluntary and anonymous. They could withdraw from the research at any stage. The video recordings and written data would be used only for research purposes. After completing the form and agreeing to participate in the study, teachers started answering the questionnaire and interview questions. We invited teachers to give examples to illustrate their statements, and asked further questions when we received unclear responses. Teachers were interviewed individually and each interview lasted about 40 min. The interviews were videotaped, transcribed, and analysed to answer the research questions.

Data analysis

Coding scheme

Through an iterative process, we developed a coding scheme that originally included the four categories of assessment literacy from Abell and Siegel (Citation2011) and was then refined based on the data. The iterative process was repeated until the scheme could appropriately cover the information in the data corpus. The codes in the final scheme were each mentioned by at least 10% of the teachers. Tables 2–4 present the coding scheme for the four categories: assessment purpose, assessed learning outcome, assessment strategy, and assessment scoring.

Table 2. Assessment purpose and what to assess: coding scheme and results.

Table 3. Assessment strategy: coding scheme and results.

Table 4. Assessment scoring: coding scheme and results.

According to Abell and Siegel (Citation2011), the category of assessment purpose included diagnostic, formative, summative, and metacognitive assessments. Metacognitive assessments supported students in gaining awareness of, actively monitoring, and further regulating their own learning. However, during the interviews only a few teachers mentioned metacognitive purposes. Considering that the definition of metacognitive purpose was aligned with the definition of formative assessment (Black & Wiliam, Citation1998; Yan et al., Citation2021), we incorporated metacognitive purposes into the formative purpose. Table 2 shows the three components of assessment purpose: diagnostic, formative, and summative.

Table 2 also presents the assessed learning outcomes, which could be classified into two subcategories: inquiry as means and inquiry as ends, according to Abd-El-Khalick et al. (Citation2004), Grob et al. (Citation2021), Talanquer et al. (Citation2013), and Wang et al. (Citation2010). The third category (Table 3) was assessment strategies, with two subcategories: assessment method and data source (Grob et al., Citation2021). Finally, within assessment interpretation and action-taking, we focused on assessment scoring (Table 4), with two subcategories: grading systems (Herridge & Talanquer, Citation2021) and scoring criteria (Grob et al., Citation2021).

Procedures

To answer the research questions, we analysed the qualitative data following the procedures suggested by Erickson (Citation2012) and Strauss and Corbin (Citation1998). The interviews of the 40 teachers were transcribed verbatim, and the NVivo 12 analysis software (QSR International, Citation2020) was used to organise and analyse the data. A unit of analysis (or a ‘response’) was a fragment of approximately one to three sentences representing an idea. Each response could be assigned more than one code depending on its content. The transcripts were then coded through an iterative process. Additionally, another science education researcher was invited to independently code 25% of the data (10 teachers’ transcripts), and the inter-coder agreement was 0.86.
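The study reports the 0.86 figure without specifying the agreement statistic. The following is a minimal sketch, assuming agreement was computed over binary code-presence decisions on the jointly coded responses; the coder data and code names are hypothetical, and the additional Cohen's kappa calculation is an illustrative extra rather than part of the authors' procedure.

```python
from typing import List, Set

def percent_agreement(coder_a: List[Set[str]], coder_b: List[Set[str]], codes: List[str]) -> float:
    """Proportion of (response, code) decisions on which the two coders agree."""
    agree = total = 0
    for a_codes, b_codes in zip(coder_a, coder_b):
        for code in codes:
            total += 1
            agree += (code in a_codes) == (code in b_codes)
    return agree / total

def cohen_kappa(coder_a: List[Set[str]], coder_b: List[Set[str]], codes: List[str]) -> float:
    """Chance-corrected agreement over the same binary code-presence decisions."""
    n = len(coder_a) * len(codes)
    yes_a = sum(code in response for response in coder_a for code in codes)
    yes_b = sum(code in response for response in coder_b for code in codes)
    p_o = percent_agreement(coder_a, coder_b, codes)
    p_e = (yes_a / n) * (yes_b / n) + (1 - yes_a / n) * (1 - yes_b / n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: each set holds the codes one coder assigned to one response.
codes = ["formative purpose", "summative purpose", "diagnostic purpose"]
coder_a = [{"formative purpose"}, {"formative purpose", "summative purpose"}, set(), {"diagnostic purpose"}]
coder_b = [{"formative purpose"}, {"summative purpose"}, set(), {"diagnostic purpose"}]
print(round(percent_agreement(coder_a, coder_b, codes), 2))  # 0.92 (11 of 12 decisions match)
print(round(cohen_kappa(coder_a, coder_b, codes), 2))        # 0.8
```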

After coding, we examined the interactions among the codes within and between categories, recorded common patterns in the data, and formulated assertions to answer the research questions. These assertions about interactions and patterns were required to cover the variation in the data (Hasselgren & Beach, Citation1997). Finally, to generate findings, we validated the assertions using confirming evidence from the data.
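To make the cross-tabulation step concrete, the sketch below shows one way co-occurrence counts of the kind reported in Tables 5 and 6 could be derived, assuming a cell counts how often codes from two categories were assigned to the same response. The response data, category labels, and tallying rule are illustrative assumptions rather than the study's actual NVivo procedure.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Hypothetical coded responses: each response maps a category to the codes assigned to it.
responses: List[Dict[str, List[str]]] = [
    {"purpose": ["formative"], "outcome": ["planning and carrying out investigations"]},
    {"purpose": ["formative"], "outcome": ["self-directed learning and reflection"]},
    {"purpose": ["summative"], "outcome": ["understanding and integrating scientific knowledge"]},
    {"purpose": ["formative", "diagnostic"], "outcome": ["planning and carrying out investigations"]},
]

def cross_table(data: List[Dict[str, List[str]]], rows: str, cols: str) -> Counter:
    """Count how often a code in one category co-occurs with a code in another
    category within the same coded response."""
    counts: Counter = Counter()
    for response in data:
        for row_code, col_code in product(response.get(rows, []), response.get(cols, [])):
            counts[(row_code, col_code)] += 1
    return counts

for (outcome, purpose), n in cross_table(responses, "outcome", "purpose").items():
    print(f"{outcome} x {purpose}: {n}")
```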

Findings

Assessment literacy about IBSI

To answer the first research question and illustrate science teachers’ assessment literacy about IBSI, we summarise the findings in Tables 2–4.

Assessment purpose

Regarding assessment purposes, Table 2 shows that formative purposes were mentioned most frequently, whereas diagnostic purposes were mentioned least. A total of 33 teachers (83%) talked about assessment purposes in the interviews, and most (65%) mentioned formative purposes. The analysis of teachers’ responses found that, through formative assessments, teachers supported students’ inquiry-based learning and self-regulated learning in a variety of ways. The formative assessments mentioned by teachers were diverse. Some teachers asked students to propose research plans, and adjusted their instruction to help students carry out their research. For example, T09 stated that ‘In mid-September, students must submit proposals. If students’ knowledge or abilities are insufficient, I will add two more classes to teach them how to write a summary and how to read reference books’.

Additionally, some teachers used concept maps or mind maps as formative assessments, and provided feedback so that students could improve their ideas. For example, ‘The mind map can show students’ logical thinking abilities. After a student finished their drawing, we discuss it and I give feedback to the student. Then he can improve it’ (T14). Furthermore, some teachers indicated that assessments could serve metacognitive purposes: they helped students self-monitor their learning and develop scientific knowledge, abilities, and attitudes using the information from formative assessments.

Some teachers (38%) also talked about summative purposes, such as scoring and understanding students’ learning achievement. For example, ‘Students care deeply about grades and how the teacher evaluates their works. The grades allow them to know if they have learned something or if the teacher is satisfied with their performance. That is a measure of achievement’ (T20). Fewer teachers (18%) mentioned diagnostic purposes, such as understanding students’ background knowledge or misconceptions at the beginning of learning. A typical example was: ‘In the first class, I conducted a test to get a rough idea of the background knowledge and abilities of students. I have a set of standards in my mind while asking the questions. If students meet that standard, I think it’s OK, and then we can go ahead’ (T13).

Assessed learning outcomes

The second category was what to assess, referring to the various learning outcomes that assessments were designed to address (Table 2). Every teacher mentioned at least one learning outcome from the subcategory of inquiry as ends (100%), while most teachers also indicated learning outcomes in the subcategory of inquiry as means (93%).

Regarding the outcomes of inquiry as means (Table 2), scientific knowledge (58%) and learning attitudes and engagement (43%) were the most common. Many teachers evaluated students’ scientific conceptions and the knowledge needed to investigate scientific phenomena or design products, such as the principles and concepts of standing waves, humidity, and fluids. T40 provided an example: ‘I’ll ask a mechanics problem, say an airplane flying … involving fluid mechanics and force … to see if they understand the scientific concepts during the research process. This is the focus of an assessment’. Furthermore, teachers (43%) assessed students’ learning attitude, motivation, participation, and other affective objectives in IBSI. For example,

His participation in each course … the word ‘participation’ does not refer to attendance or whether he sleeps. It means that when the teacher asks a question, he can follow this question to think or engage deeply or provide different views. (T30)

In the process of IBSI, self-directed learning and reflection were also required (35%) at the individual level, while at the group or class level, students needed to interact and cooperate with peers (33%). Common assessment methods were asking students to report results regularly to promote their self-directed learning, or to report difficulties they encountered in class for peer reflection. Students were also urged to reflect on and adjust their research process and results. In addition, 20% of the teachers hoped that students could improve their thinking and creative abilities.

In terms of assessing outcomes of inquiry as ends (Table 2), a majority of teachers focused on planning and carrying out investigations (75%), obtaining and communicating information (68%), engaging in inquiry and solving problems (60%), and observing and asking questions (58%), whereas fewer teachers addressed analysing and interpreting data (35%) or engaging in argumentation and developing models (23%). Offering an example of planning and carrying out investigations, T21 said: ‘Regarding the terminal velocity, I will pay more attention to how students measure it, and then what operations they perform on the paper. I do not emphasise how much time it takes for terminal velocity to be reached’.

Assessment strategy

Two subcategories were revealed in the third category of assessment strategy (Table 3). The first subcategory was assessment method, which included teacher-assessment, peer-assessment, and self-assessment. Teacher assessment was the most common, followed by peer assessment, whereas self-assessment was mentioned least. Additionally, the three assessment methods were used to achieve different purposes. For example, when acting as assessors, teachers checked whether students’ scientific concepts or inquiry abilities were up to their standards, examined students’ learning performances and difficulties, and made up for deficiencies in teaching. As stated by T05,

In class, not everyone had the opportunity to answer the questions, so I asked students to answer the same questions in words during the exam. Through the written data, I knew that some students didn’t understand at all, and some might understand, but they got nervous or felt shy when expressing their opinions.

On the other hand, 40% of teachers used student peer-assessment in which students rated their peers’ concept maps, products, or the contributions in a group. Teachers believed that peer assessment could improve the fairness of grading, prevent teachers from being subjective, and enhance students’ ability to analyse and judge the quality of peers’ work through mutual evaluation. However, to develop students’ evaluation ability, teachers must not only explain the grading standards but also ask students to practice how to grade and explain the scoring rubrics. Finally, fewer teachers (15%) used student self-assessment. They allowed students to conduct individual or group self-assessments to examine whether the design or activity met the requirements, whether the report was complete, how well the product (e.g. a musical instrument) was designed, or whether a proposal was feasible.

The second subcategory of assessment strategy was data source (Table 3). The participating teachers used a variety of data sources to assess students’ learning performances. Written data (93%), oral data (70%) and observational data (58%) were the most common. Among the written data, reports and essays (53%) were used most often and examination papers (33%) less frequently. For example, ‘Throughout the activity, four or five types of data may be used to collect information on students’ performances including work sheets, essays, academic portfolios, artifacts and photos, and so on’ (T11). Regarding oral data, the most common form was students’ group presentations of their final reports. In addition, in different phases of inquiry (e.g. after searching for information), teachers asked students to have group discussions, answer teachers’ questions, or engage in debates by taking different positions.

More than half of the teachers also used observational data. They observed students’ experimental performances (e.g. titration, use of a microscope), participation in class, and some affective responses (e.g. learning attitude). Doing scientific investigations was an essential part of IBSI, so teachers observed whether students could implement their experimental design, how they used tools and took measurements, and how they solved problems during an investigation. For example, T35 said,

If you want to assess students’ process skills, it’s best … to see how they do it. In fact, I’ve been looking around the classroom to see if students are measuring the angle correctly, or whether they are using trigonometry correctly.

However, teachers also mentioned the difficulties they faced while collecting and evaluating observational data. ‘I could probably know the performance of each group, but I couldn’t know who did it. It’s difficult to assess the work of each student’ (T19). This may be part of the reason why teachers had to use multiple sources of data to assess students’ performances in IBSI.

Assessment scoring

In the fourth category of assessment scoring, the subcategories of grading system and scoring criterion emerged from the data (Table 4). Grading systems included criterion-referenced (45%) and norm-referenced (13%) systems. Teachers who used the former listed the items to be graded, generated the criteria and corresponding points, and rated students’ work accordingly. In contrast, teachers who used the latter first compared and ranked students’ performances, and then gave the corresponding scores.
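As a concrete illustration of the two grading systems teachers described, the following minimal sketch contrasts criterion-referenced scoring (points earned against a fixed rubric) with norm-referenced scoring (ranking students and mapping rank positions to scores). The rubric items, weights, score bands, and student scores are hypothetical and are not drawn from the participating teachers' rubrics.

```python
# Hypothetical rubric: item weights sum to 100 points.
rubric = {"correctness of scientific knowledge": 40,
          "rationality and logic of inferences": 30,
          "completeness of the report": 30}

def criterion_referenced(achieved: dict) -> float:
    """Score a student against the fixed rubric, independently of other students.
    `achieved` maps each rubric item to the fraction of its points earned (0-1)."""
    return sum(rubric[item] * fraction for item, fraction in achieved.items())

def norm_referenced(raw_scores: dict, bands=(100, 90, 80, 70)) -> dict:
    """Rank students by a raw measure, then assign scores by rank position."""
    ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
    return {student: bands[min(i, len(bands) - 1)] for i, student in enumerate(ranked)}

print(criterion_referenced({"correctness of scientific knowledge": 0.9,
                            "rationality and logic of inferences": 0.5,
                            "completeness of the report": 1.0}))   # 81.0
print(norm_referenced({"S1": 78, "S2": 85, "S3": 62}))             # {'S2': 100, 'S1': 90, 'S3': 80}
```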

In the subcategory of scoring criterion, the most commonly used criteria were the correctness of scientific knowledge (60%), the rationality and logic of inferences (43%), the degree of student participation and interaction (43%), and the completeness and richness of reports and data (40%). Teachers’ scoring criteria covered students’ cognitive, affective, and behavioural aspects, as well as the quality of their work (e.g. innovation and aesthetics). For example, the criteria related to cognitive aspects included the correctness of scientific knowledge (60%), the rationality and logic of inferences (43%), identifying the key points at different stages (such as observation, recording, and explanation) (33%), and critical reflection (20%). The criteria related to students’ behaviours included the accuracy, proficiency and completeness of conducting experiments (33%) and the clarity, robustness, and interestingness of expression (13%). Teachers’ responses about scoring criteria reflected the diversity of assessed learning outcomes and data sources in IBSI, and thus covered a wide range of aspects and different levels within the same aspect.

Interactions among the categories of assessment literacy

To answer the second research question and investigate the interactions among the categories, we used cross-tables (Tables 5 and 6) to present the possible interplays between pairs of categories.

Table 5. Interactions between assessed learning outcomes and assessment purpose.

Table 6. Interactions between assessed learning outcomes and assessment strategy.

Interactions between assessed learning outcomes and assessment purposes

As the findings on assessment purposes presented previously indicated, the major purpose of teachers’ assessments in IBSI was formative. Table 5 shows that the assessed learning outcomes for formative purposes were diverse, with an emphasis on learning outcomes of inquiry as ends. Among them, planning and carrying out investigations (18) and obtaining and communicating information (10) were most common, followed by engaging in inquiry and solving problems (7) and observing and asking questions (6). Among the assessed learning outcomes of inquiry as means, self-directed learning and reflection (8) and understanding and integrating scientific knowledge (7) were more common. With a formative purpose, teachers paid attention to students’ performance in self-directed learning and reflection, as well as their progress in scientific knowledge and inquiry abilities.

Regarding the summative and diagnostic purposes, teachers were more likely to assess the learning outcomes of inquiry as ends (Table 5). For summative purposes, teachers focused on understanding and integrating scientific knowledge (6), and obtaining and communicating information (5). For diagnostic purposes, teachers typically assessed planning and carrying out investigations (3).

Interactions between assessed learning outcomes and assessment strategies

Table 6 illustrates the interactions between the assessed learning outcomes and the assessment strategies used by the teachers. Overall, the participating teachers used different data sources and assessment methods to target different learning outcomes in IBSI. Written data (64), oral data (44) and observational data (40) were the most common sources used to assess the learning outcomes of inquiry as ends. For example, worksheets were used to evaluate students’ performance in planning and carrying out investigations (11), while reports and essays could evaluate more comprehensive scientific practices such as planning (7) and engaging in inquiry (6). Oral data were most commonly used to assess obtaining and communicating information (26), and observing and asking questions (14). Teachers often used observational data to assess planning and carrying out investigations (34), and engaging in inquiry (11).

Additionally, written data (40), oral data (20), and 3-D products or models (18) were the most common data sources used to assess the learning outcomes of inquiry as means. For example, 3-D products or models and concept maps were often used to evaluate students’ ability to think and innovate. T31 mentioned the innovations students made using 3D printers while assembling wind power modules.

In terms of assessment methods, all three methods were used to assess the learning outcomes of inquiry as ends, and these learning outcomes were mostly assessed by teachers. Planning and carrying out investigations (21) and understanding scientific knowledge (18) were most commonly assessed by teachers, whereas obtaining and communicating information (5) and self-directed learning and reflection (4) were more often assessed by peers.

Discussion and implications

This study systematically explored high school science teachers’ assessment literacy for inquiry-based science instruction (IBSI) in Taiwan, and further searched for interactions among the categories of assessment literacy. Regarding assessment purposes, the findings echoed the work of Siegel and Wissehr (Citation2011), in that most teachers recognised the importance of formative assessment and supported students’ inquiry learning in a variety of ways. However, fewer teachers mentioned diagnostic assessment. Possible reasons are as follows. First, teachers might examine students’ background abilities through dialogue (Kawalkar & Vijapurkar, Citation2013) or informal formative assessment during the teaching process. Second, most teachers in the study had rich relevant teaching experience and were familiar with students’ background knowledge and abilities; according to our previous study, many teachers arranged basic training courses in advance.

More than 40% of the teachers mentioned six sub-categories of assessed learning outcomes, including four scientific practices, understanding scientific knowledge, and learning attitude and engagement. These learning outcomes were aligned with the teaching goals of the curriculum guidelines in Taiwan. In particular, the findings highlighted that teachers not only valued scientific practices, but also expected students to develop various abilities through IBSI, such as self-directed learning and reflection, and group interaction and cooperative learning. Nevertheless, similar to the literature (Grob et al., Citation2021; Talanquer et al., Citation2013), the study showed that learning outcomes such as analysing and interpreting data, engaging in argumentation, and developing models were neglected, despite their significance in IBSI.

Notably, differing from previous studies (Grob et al., Citation2021; Talanquer et al., Citation2013), the findings showed that many teachers connected phenomena and knowledge (e.g. paper falling and terminal velocity) or paid attention to subject-specific content (e.g. redox, simple harmonic motion) while explaining their assessment plans. Evidently, teachers emphasised process skills as well as understanding, applying, and even discovering scientific knowledge through the assessment process. Teachers’ emphasis on scientific knowledge was also reflected in the assessed learning outcomes of inquiry as means and in the scoring criteria: ‘understanding and integrating scientific knowledge’ and ‘correctness of scientific knowledge’ were the most frequently mentioned codes in these two sub-categories, respectively.

Teachers performed multiple roles not only when conducting IBSI (Dobber et al., Citation2017) but also in the assessment process. We found that, while conducting assessments, teachers acted as trainers of assessment ability, assessors, instructors, and motivators. Teachers guided diagnostic and summative assessments; in fact, they used assessments to confirm teaching effectiveness, conduct supplementary teaching, discover students’ learning difficulties, and give extra points to encourage specific performances (e.g. innovation, taking on challenges). Additionally, teachers trained students to understand, practice, and perform peer- and self-assessment; they even conducted meetings to verify the effectiveness and credibility of these assessments.

Peer-assessment, which can support brainstorming, usually occurred in the stages of asking questions, proposing hypotheses, planning research, and reporting results. Self-assessment was often used in planning, carrying out investigations, and obtaining, evaluating, and communicating information, so that students had ownership of their learning. These two methods were commonly used for formative purposes and for examining students’ self-directed learning and reflection abilities. The results echoed and extended the work of Panadero et al. (Citation2017), who found that self-assessment interventions can promote students’ self-regulated learning. In addition, Zlabkova et al. (Citation2021) indicated that common challenges in peer-assessment include time constraints, low support from colleagues, parental concerns, and students’ limited knowledge. In this study, some teachers conducted peer-assessment in open discussions, which allowed them to supplement what students said or did in a timely manner, co-construct knowledge, overcome these challenges, and enhance the effectiveness of peer assessment.

Teachers used multiple data sources to assess students’ performances, with written data being the most common (93%). Grob et al. (Citation2021) suggested that this phenomenon may result from the mature reading and writing abilities of high school students and the scoring habits of summative assessment. Among written data, most teachers used reports and essays (53%), while fewer used test papers (33%); the former usually involved complete inquiry processes and left time for students to reflect, receive feedback, and revise (NRC, Citation2000). Differing from previous studies (Grob et al., Citation2021), oral data (70%) and observational data (58%) were also common data sources. Through oral data, feedback and communication among teachers, students, and peers could be improved, and students’ thinking processes and difficulties could be revealed (Nieminen et al., Citation2021; Rached & Grangeat, Citation2021; Ruiz-Primo & Furtak, Citation2007); such data collected evidence of student performance and supported learning simultaneously. Moreover, observational data were an important source for teachers to understand students’ experimental abilities, learning attitudes, and group interactions. Multiple data sources met the requirements of the guidelines (National Research Council, Citation2000, Citation2012) and allowed students with different abilities and expertise to express their learning outcomes in different ways.

Regarding the interactions between assessed learning outcomes and data sources, for common learning outcomes (e.g. planning and conducting investigations), teachers used various data sources to judge student performance (Table 6). Other learning outcomes, however, were assessed through specific data sources: students’ thinking and creative abilities were mainly evaluated through models and 3-D products, the ability to analyse and interpret data was assessed through test papers, and the ability to engage in argumentation and modelling was assessed through oral data. For the same learning outcome, changing the data sources used to assess performance might open up new possibilities. In other words, additional methods or data sources could be used more frequently for important learning outcomes, such as using multimedia-based assessment to examine the ability to analyse and interpret data (Kuo et al., Citation2015) or using argumentative writing to assess argumentation (Sampson et al., Citation2013).

Ethics statement

Before the interview was conducted, each teacher was handed a letter and an informed consent form. The teachers were informed of the study purpose, and their participation was voluntary and anonymous. They could withdraw from the research at any stage. The video recordings and written data would be used only for research purposes. IRB review was not considered necessary by the funding institute.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Ministry of Science and Technology in Taiwan [grant number MOST 109-2511-H-003-015-MY3].

References

  • Abd-El-Khalick, F., Boujaoude, S., Duschl, R., Lederman, N. G., Mamlok-Naaman, R., Hofstein, A., ... Tuan, H. L. (2004). Inquiry in science education: International perspectives. Science Education, 88(3), 397–419.
  • Abell, S. K., & Siegel, M. A. (2011). Assessment literacy: What science teachers need to know and be able to do. In The professional knowledge base of science teaching (pp. 205–221). Springer. https://doi.org/10.1007/978-90-481-3927-9_12
  • American Federation of Teachers, National Council on Measurement in Education, & National Education Association. (1990). Standards for teacher competence in educational assessment of students. Educational Measurement: Issues and Practice, 9(4), 30–32. https://doi.org/10.1111/j.1745-3992.1990.tb00391.x
  • Akuma, F. V., & Callaghan, R. (2019). A systematic review characterizing and clarifying intrinsic teaching challenges linked to inquiry-based practical work. Journal of Research in Science Teaching, 56(5), 619–648. https://doi.org/10.1002/tea.21516
  • Ayalon, M., & Wilkie, K. J. (2020). Developing assessment literacy through approximations of practice: Exploring secondary mathematics pre-service teachers developing criteria for a rich quadratics task. Teaching and Teacher Education, 89, 103011, 1–14. https://doi.org/10.1016/j.tate.2019.103011.
  • Bijsterbosch, E., Béneker, T., Kuiper, W., & van der Schee, J. (2019). Teacher professional growth on assessment literacy: A case study of prevocational geography education in The Netherlands. The Teacher Educator, 54(4), 420–445. https://doi.org/10.1080/08878730.2019.1606373
  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
  • Chan, C. K. Y., & Luo, J. (2021). A four-dimensional conceptual framework for student assessment literacy in holistic competency development. Assessment & Evaluation in Higher Education, 46(3), 451–466. https://doi.org/10.1080/02602938.2020.1777388
  • Coombe, C., Vafadar, H., & Mohebbi, H. (2020). Analyzing rater severity in a freshman composition course using many facet Rasch measurement. Language Testing in Asia, 10(1), 1–16. https://doi.org/10.1186/s40468-020-0098-3
  • Coombs, A., & DeLuca, C. (2022). Mapping the constellation of assessment discourses: A scoping review study on assessment competence, literacy, capability, and identity. Educational Assessment, Evaluation and Accountability, 34(3), 279–301. https://doi.org/10.1007/s11092-022-09389-9
  • Cowie, B., & Harrison, C. (2021). The what, when & how factors: Reflections on classroom assessment in the service of inquiry. International Journal of Science Education, 43(3), 449–465. https://doi.org/10.1080/09500693.2020.1824088
  • Crawford, B. A. (2000). Embracing the essence of inquiry: New roles for science teachers. Journal of Research in Science Teaching, 37(9), 916–937. https://doi.org/10.1002/1098-2736(200011)37:9<916::AID-TEA4>3.0.CO;2-2
  • DeLuca, C., Rickey, N., & Coombs, A. (2021). Exploring assessment across cultures: Teachers’ approaches to assessment in the U.S., China, and Canada. Cogent Education, 8(1), 1921903. https://doi.org/10.1080/2331186X.2021.1921903
  • DeLuca, C., Valiquette, A., Coombs, A., LaPointe-McEwan, D., & Luhanga, U. (2018). Teachers’ approaches to classroom assessment: A large-scale survey. Assessment in Education: Principles, Policy & Practice, 25(4), 355–375. https://doi.org/10.1080/0969594X.2016.1244514
  • DinanThompson, M., & Penney, D. (2015). Assessment literacy in primary physical education. European Physical Education Review, 21(4), 485–503. https://doi.org/10.1177/1356336X15584087
  • Dobber, M., Zwart, R., Tanis, M., & van Oers, B. (2017). Literature review: The role of the teacher in inquiry-based education. Educational Research Review, 22, 194–214. https://doi.org/10.1016/j.edurev.2017.09.002
  • Edwards, F. (2017). A rubric to track the development of secondary pre-service and novice teachers’ summative assessment literacy. Assessment in Education: Principles, Policy & Practice, 24(2), 205–227. https://doi.org/10.1080/0969594X.2016.1245651
  • Edwards, F. (2020). The effect of the lens of the teacher on summative assessment decision making: The role of amplifiers and filters. The Curriculum Journal, 31(3), 379–397. https://doi.org/10.1002/curj.4
  • Erickson, F. (2012). Qualitative research methods for science education. In B. J. Fraser, K. G. Tobin, & C. J. McRobbie (Eds.), Second international handbook of science education (pp. 1451–1469). Springer.
  • Fulmer, G. W., Lee, I. C., & Tan, K. H. (2015). Multi-level model of contextual factors and teachers’ assessment practices: An integrative review of research. Assessment in Education: Principles, Policy & Practice, 22(4), 475–494. https://doi.org/10.1080/0969594X.2015.1017445
  • Gottheiner, D. M., & Siegel, M. A. (2012). Experienced middle school science teachers’ assessment literacy: Investigating knowledge of students’ conceptions in genetics and ways to shape instruction. Journal of Science Teacher Education, 23(5), 531–557. https://doi.org/10.1007/s10972-012-9278-z
  • Grangeat, M., Harrison, C., & Dolin, J. (2021). Exploring assessment in STEM inquiry learning classrooms. International Journal of Science Education, 43(3), 345–361. https://doi.org/10.1080/09500693.2021.1903617
  • Grob, R., Holmeier, M., & Labudde, P. (2021). Analysing formal formative assessment activities in the context of inquiry at primary and upper secondary school in Switzerland. International Journal of Science Education, 43(3), 407–427. https://doi.org/10.1080/09500693.2019.1663453
  • Hannigan, C., Alonzo, D., & Oo, C. Z. (2022). Student assessment literacy: Indicators and domains from the literature. Assessment in Education: Principles, Policy & Practice, 29(4), 482–504. https://doi.org/10.1080/0969594X.2022.2121911
  • Hasselgren, B., & Beach, D. (1997). Phenomenography — a “good-for-nothing brother” of phenomenology? Outline of an analysis. Higher Education Research & Development, 16(2), 191–202. https://doi.org/10.1080/0729436970160206
  • Heitink, M. C., Van der Kleij, F. M., Veldkamp, B. P., Schildkamp, K., & Kippers, W. B. (2016). A systematic review of prerequisites for implementing assessment for learning in classroom practice. Educational Research Review, 17, 50–62. https://doi.org/10.1016/j.edurev.2015.12.002
  • Herridge, M., & Talanquer, V. (2021). Dimensions of variation in chemistry instructors’ approaches to the evaluation and grading of student responses. Journal of Chemical Education, 98(2), 270–280. https://doi.org/10.1021/acs.jchemed.0c00944
  • Hung, C. S., & Wu, H. K. (2022). High school science teachers’ conceptions about the curriculum of “inquiry and practice”: Course characteristics, challenges, teaching goals and activities. Chinese Journal of Science Education, 30(1), 1–26. https://doi.org/10.6173/CJSE.202203_30(1).0001
  • Izci, K., & Siegel, M. A. (2019). Investigation of an alternatively certified new high school chemistry teacher’s assessment literacy. International Journal of Education in Mathematics, Science and Technology, 7(1), 1–19. https://doi.org/10.18404/ijemst.473605
  • Joachim, C., Hammann, M., Carstensen, C. H., & Bögeholz, S. (2020). Modeling and measuring Pre-service teachers’ assessment literacy regarding experimentation competences in biology. Education Sciences, 10(5), 140. https://doi.org/10.3390/educsci10050140
  • Kawalkar, A., & Vijapurkar, J. (2013). Scaffolding Science Talk: The role of teachers’ questions in the inquiry classroom. International Journal of Science Education, 35(12), 2004–2027. https://doi.org/10.1080/09500693.2011.604684
  • Kim, A. A., Chapman, M., Kondo, A., & Wilmes, C. (2020). Examining the assessment literacy required for interpreting score reports: A focus on educators of K–12 English learners. Language Testing, 37(1), 54–75. https://doi.org/10.1177/0265532219859881
  • Kremmel, B., & Harding, L. (2020). Towards a comprehensive, empirical model of language assessment literacy across stakeholder groups: Developing the language assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120. https://doi.org/10.1080/15434303.2019.1674855
  • Kuo, C. Y., Wu, H. K., Jen, T. H., & Hsu, Y. S. (2015). Development and validation of a multimedia-based assessment of scientific inquiry abilities. International Journal of Science Education, 37(14), 2326–2357. https://doi.org/10.1080/09500693.2015.1078521
  • Lam, R. (2019). Teacher assessment literacy: Surveying knowledge, conceptions and practices of classroom-based writing assessment in Hong Kong. System, 81, 78–89. https://doi.org/10.1016/j.system.2019.01.006
  • Liu, J. E., Lo, Y. Y., & Xin, J. J. (2023). CLIL teacher assessment literacy: A scoping review. Teaching and Teacher Education, 129, 104150. https://doi.org/10.1016/j.tate.2023.104150
  • Magnusson, S., Krajcik, J., & Borko, H. (1999). Nature, sources, and development of pedagogical content knowledge for science teaching. In J. Gess-Newsome & N. G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 95–132). Springer.
  • Martone, A., & Sireci, S. G. (2009). Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research, 79(4), 1332–1361. https://doi.org/10.3102/0034654309341375
  • Mertler, C. A. (2004). Secondary teachers’ assessment literacy: Does classroom experience make a difference? American Secondary Education, 49–64.
  • National Research Council. (2000). Inquiry and the national science education standards: A guide for teaching and learning. National Academy Press.
  • National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press. https://doi.org/10.17226/13165
  • Nieminen, P., Hähkiöniemi, M., & Viiri, J. (2021). Forms and functions of on-the-fly formative assessment conversations in physics inquiry lessons. International Journal of Science Education, 43(3), 362–384. https://doi.org/10.1080/09500693.2020.1713417
  • Nitko, A. J., & Brookhart, S. M. (2011). Educational assessment of students (6th ed.). Pearson.
  • OECD. (2016). PISA 2015 results (Volume II): Policies and practices for successful schools. OECD Publishing. https://doi.org/10.1787/9789264267510-en
  • Panadero, E., Jonsson, A., & Botella, J. (2017). Effects of self-assessment on self-regulated learning and self-efficacy: Four meta-analyses. Educational Research Review, 22, 74–98. https://doi.org/10.1016/j.edurev.2017.08.004
  • Pastore, S., & Andrade, H. L. (2019). Teacher assessment literacy: A three-dimensional model. Teaching and Teacher Education, 84, 128–138. https://doi.org/10.1016/j.tate.2019.05.003
  • Pellegrino, J. W., Chudowsky, N., & Glaser, R. (2001). The nature of assessment and reasoning from evidence. In R. Glaser, N. Chudowsky, & J. W. Pellegrino (Eds.), Knowing what students know: The science and design of educational assessment (pp. 37–54). National Academies Press.
  • Plake, B. S. (1993). Teacher assessment literacy: Teachers’ competencies in the educational assessment of students. Mid-Western Educational Researcher, 6(1), 21–27.
  • QSR International. (2020). NVivo qualitative data analysis software. QSR International.
  • Rached, E., & Grangeat, M. (2021). French teachers’ informal formative assessment in the context of inquiry-based learning. International Journal of Science Education, 43(3), 385–406. https://doi.org/10.1080/09500693.2020.1740818
  • Ropohl, M., & Rönnebeck, S. (2019). Making learning effective – quantity and quality of pre-service teachers’ feedback. International Journal of Science Education, 41(15), 2156–2176. https://doi.org/10.1080/09500693.2019.1663452
  • Ruiz-Primo, M. A., & Furtak, E. M. (2007). Exploring teachers’ informal formative assessment practices and students’ understanding in the context of scientific inquiry. Journal of Research in Science Teaching, 44(1), 57–84. https://doi.org/10.1002/tea.20163
  • Ruiz-Primo, M. A., Li, M., Tsai, S. P., & Schneider, J. (2010). Testing one premise of scientific inquiry in science classrooms: Examining students’ scientific explanations and student learning. Journal of Research in Science Teaching, 47(5), 583–608. https://doi.org/10.1002/tea.20356
  • Sampson, V., Enderle, P., Grooms, J., & Witte, S. (2013). Writing to learn by learning to write during the school science laboratory: Helping middle and high school students develop argumentative writing skills as they learn core ideas. Science Education, 97(5), 643–670. https://doi.org/10.1002/sce.21069
  • Schelling, N., & Rubenstein, L. D. (2023). Pre-service and in-service assessment training: Impacts on elementary teachers’ self-efficacy, attitudes, and data-driven decision making practice. Assessment in Education: Principles, Policy & Practice, 30(2), 177–202. https://doi.org/10.1080/0969594X.2023.2202836
  • Siegel, M. A. (2014). Developing preservice teachers’ expertise in equitable assessment for English learners. Journal of Science Teacher Education, 25(3), 289–308. https://doi.org/10.1007/s10972-013-9365-9
  • Siegel, M. A., & Wissehr, C. (2011). Preparing for the plunge: Preservice teachers’ assessment literacy. Journal of Science Teacher Education, 22(4), 371–391. https://doi.org/10.1007/s10972-011-9231-6
  • Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
  • Strauss, A., & Corbin, J. (1998). Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage Publications.
  • Talanquer, V., Tomanek, D., & Novodvorsky, I. (2013). Assessing students’ understanding of inquiry: What do prospective science teachers notice? Journal of Research in Science Teaching, 50(2), 189–208. https://doi.org/10.1002/tea.21074
  • Tolgfors, B. (2019). Transformative assessment in physical education. European Physical Education Review, 25(4), 1211–1225. https://doi.org/10.1177/1356336X18814863
  • Wang, J.-R., Kao, H.-L., & Lin, S.-W. (2010). Preservice teachers’ initial conceptions about assessment of science learning: The coherence with their views of learning science. Teaching and Teacher Education, 26(3), 522–529. https://doi.org/10.1016/j.tate.2009.06.014
  • Wu, H. K., & Hsieh, C. E. (2006). Developing sixth graders’ inquiry skills to construct explanations in inquiry-based learning environments. International Journal of Science Education, 28(11), 1289–1313. https://doi.org/10.1080/09500690600621035
  • Xu, Y., & Brown, G. T. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162. https://doi.org/10.1016/j.tate.2016.05.010
  • Yan, Z., Li, Z., Panadero, E., Yang, M., Yang, L., & Lao, H. (2021). A systematic review on factors influencing teachers’ intentions and implementations regarding formative assessment. Assessment in Education: Principles, Policy & Practice, 28(3), 228–260. https://doi.org/10.1080/0969594X.2021.1884042
  • Yan, Z., & Pastore, S. (2022). Assessing teachers’ strategies in formative assessment: The teacher formative assessment practice scale. Journal of Psychoeducational Assessment, 40(5), 592–604. https://doi.org/10.1177/07342829221075121
  • Zhang, R. C., Hung, C. S., & Wu, H. K. (2022). Examining the validity and measurement invariance of an assessment literacy inventory for secondary science teachers. Chinese Journal of Science Education, 30(4), 309–333. https://doi.org/10.6173/CJSE.202212_30(4).0002
  • Zlabkova, I., Petr, J., Stuchlikova, I., Rokos, L., & Hospesova, A. (2021). Development of teachers’ perspective on formative peer assessment. International Journal of Science Education, 43(3), 428–448. https://doi.org/10.1080/09500693.2020.1713418
  • Zuiker, S., & Whitaker, J. R. (2014). Refining inquiry with multi-form assessment: Formative and summative assessment functions for flexible inquiry. International Journal of Science Education, 36(6), 1037–1059. https://doi.org/10.1080/09500693.2013.834489