Review

A systematic review of assessments for young children’s scientific and engineering practices

ABSTRACT

Background and Purpose

As a growing number of instructional units have been developed to promote young children’s scientific and engineering practices (SEPs), understanding how to evaluate and assess children’s SEPs is imperative. However, paper-and-pencil assessments would not be suitable for young children because of their limited reading and writing skills. To explore the assessments for SEPs available for young children aged 3–8 years, this study reviewed assessments of young children’s SEPs reported in empirical studies, and analysed the characteristics of these assessments to delineate how young children’s SEPs have been measured.

Methods

We followed the procedures of a systematic review proposed by Zawacki-Richter et al. (2020). The EBSCOhost database was used to gather empirical studies in education and psychology. A total of 46 articles published from 2003 to 2020 met the inclusion criteria and were reviewed.

Findings

The findings indicated that of the eight SEPs suggested by the National Research Council (2012), Analysing and interpreting data was assessed the most, followed by Using mathematics and computational thinking, Constructing explanations and designing solutions, and Planning and carrying out investigations. A majority of assessments were designed for children of 4, 5, and 6 years old and used paper-based visualizations and real objects to present the tasks and items. Additionally, due to the verbal or performance nature of the SEPs, the assessments collected different types of data as evidence to evaluate children’s SEPs. Performance-based assessments were the most common, followed by multiple-choice, ranking, and oral responses.

Conclusion

The findings of the reviewed assessments revealed a variety of performance expectations of SEPs and suggested that some SEPs are measurable and developmentally appropriate for young children. Also, the availability of assessments is uneven in different types of SEPs, and more assessments for information communication and modelling practices are needed.

Introduction

One principal goal of science education from kindergarten through 12th grade is to engage students in investigations and develop their scientific and engineering practices (SEPs), such as asking questions, using models, planning investigations, and designing solutions (Abd-El-Khalick et al. 2004; Ministry of Education 2018; National Research Council 2012). To achieve this goal, a growing number of science learning modules and instructional units have been designed to promote students’ SEPs (e.g. Hsin and Wu 2022; Wu 2022; Zhang et al. 2015). In addition to the learning and teaching of SEPs, utilizing appropriate assessments to evaluate students’ SEPs is imperative so that researchers and educators can examine the development of SEPs and the effectiveness of interventions. Thus, this review study focuses on assessments designed to measure students’ SEPs in order to provide insights into how SEPs can be examined. The assessments identified in this study could serve as research instruments for researchers to analyse SEPs, as well as pedagogical tools for practitioners to engage students in important SEPs.

Additionally, previous research has suggested that early exposure to science could lead to better performances and achievement in science in later years (Eshach and Fried 2005; Saçkes et al. 2011), and the importance of developing young children’s scientific knowledge and competencies has been stressed internationally (Erden and Sönmez 2011; Garbett 2003; National Research Council 2012; Piasta, Pelatti, and Miller 2014). However, compared to assessing students in later grade levels, measuring young children’s SEPs could be more of a challenge because children in their preschool, kindergarten, and early primary years (aged 3–8 years) have limited oral, reading, and writing skills and may not be able to fluently express their ideas and demonstrate their understanding. Traditional paper-and-pencil assessments would not be suitable for them. How can young children’s SEPs be measured to reflect the essence of practices suggested by the research literature (e.g. Lave 1988; Ford 2015) and policy documents (e.g. National Research Council 2012)? What are the assessments of SEPs available for early science teachers and educators to evaluate the effectiveness of their instruction? To address these questions, this study reviewed assessments of young children’s SEPs reported in empirical studies and analysed the characteristics of these assessments to delineate how young children’s SEPs have been measured and what exactly has been measured.

Background

Assessment and its Characteristics

As suggested by Popham (2017), assessment in an educational context can be defined as ‘a formal attempt to determine students’ status with respect to educational variables of interest’ (p. 10). This study employed this definition to identify the assessments of SEPs used in empirical studies, which included but were not limited to paper-based tests, performance assessments, problem-solving tasks, and interviews. Furthermore, to inform how assessments of SEPs can be designed, this study analysed the characteristics of educational assessments. According to the literature on educational assessments (e.g. Bennett and Bejar 1998; Drasgow and Mattern 2006), the components and characteristics of an assessment include its purpose, the construct being assessed, item and task presentation, response format, scoring procedure, and validation method (Kuo and Wu 2013). To conduct this systematic review, we used these characteristics to develop classification categories and codes, to analyse the assessments, and to identify patterns (Alexander 2020; Zawacki-Richter et al. 2020). For example, the assessment purpose was further classified into formative assessment, summative assessment, diagnostic assessment, and pre- and post-tests of an educational intervention (Table 1). The types of item/task presentation could be categorized into oral descriptions, figures, hand puppets, multimedia, and so on, which could be used to inform the design of assessments. The analyses of the assessment characteristics could shed light on the design of assessments for SEPs.

Table 1. Overview of analytical codes and the number of assessments (n).

Scientific and Engineering Practices

In addition to assessment purposes, another core characteristic of assessments is the construct being assessed, that is, the scientific and engineering practices focused on in this study (Table 2). SEPs can be viewed as ‘ways of reasoning and acting that develop reliable knowledge claims’ (Ford 2015, 1041). Practices emphasize the connection between doing and learning, and can be viewed as knowledge in action (Lave 1988). Thus, practices are more than skills and abilities as they ‘require coordination both of knowledge and skill’ (National Research Council 2012, 41).

Table 2. Scientific and engineering practices, performance codes, and the number of assessments (n).

Eight practices have been identified and suggested by the Next Generation Science Standards (NGSS; National Research Council 2012): (S1) Asking questions (for science) and defining problems (for engineering), (S2) Developing and using models, (S3) Planning and carrying out investigations, (S4) Analysing and interpreting data, (S5) Using mathematics and computational thinking, (S6) Constructing explanations (for science) and designing solutions (for engineering), (S7) Engaging in argument from evidence, and (S8) Obtaining, evaluating, and communicating information. For each practice, NGSS Lead States (2013) outlined performance expectations for different grade bands. Our study included the performance expectations for the K-2 grade band as part of the conceptual framework (Table 2).

However, before the term ‘practice’ became commonly used in science and technological education, a substantial amount of research in education and psychology had explored scientific ways of reasoning, thinking, and acting, such as generating hypotheses, controlling variables, testing and revising theories, evaluating evidence, and drawing conclusions (e.g. Klahr and Dunbar 1988; Kuhn et al. 2000; Zimmerman 2007). For example, Zimmerman (2000, 2007) showed that many of the scientific reasoning skills involved in SEPs have been investigated in children from a developmental perspective. In this study, therefore, scientific thinking, reasoning, and other related constructs were also utilized as search terms (Appendix 1), and empirical studies from educational psychology and developmental psychology were included if they provided instruments to assess young children’s scientific processes.

Purpose and Research Questions

Drawing upon the definitions and characteristics of assessment and SEPs, we developed a conceptual framework (Tables 1 and 2) for the systematic review. The purpose of the study was to review assessments of young children’s SEPs reported in empirical studies and to analyse the interplay among assessment characteristics (i.e. assessed SEPs, assessed children’s age, item presentation, response format, and assessment purpose). The research questions are as follows.

  1. What assessment characteristics do the assessments of SEPs have?

  2. What are the assessments of SEPs available for children of 3–8 years old?

  3. How are the assessment purposes related to the types of SEP?

  4. How are the types of item/task presentation related to the types of SEP?

  5. How are the types of response format related to the types of SEP?

By answering these questions, this review study contributes to the understanding of how young children’s SEPs can be measured and what should be measured, and provides insights into the design of assessments for children’s SEPs.

Methods

This study followed the procedures of a systematic review proposed by Zawacki-Richter et al. (2020). After the research questions and the conceptual framework were developed, we selected articles by constructing selection criteria, developing search strategies, and assessing the quality of studies. We then coded the selected articles and synthesized results of individual studies to answer the research questions. In the following sections, we provide a detailed account of the selection of articles, the coding process, and the data synthesis.

Collecting and Selecting Articles

The EBSCOhost database was used in this study to gather empirical studies in education and psychology. It includes authoritative databases such as Academic Search Complete, the British Education Index (BEI), and the Education Resources Information Center (ERIC), and allowed us to search multiple databases simultaneously without comparing and removing duplicate results across different databases.

Given the purpose of this study, the search terms were chosen to cover the three categories of interest: practice, discipline, and young children (Appendix 1). The primary terms and their synonyms within each category were combined using the operator ‘OR’. The operator ‘AND’ was used to join the three included categories, and ‘NOT’ was used to exclude studies on special education, family education, and reading literacy. The searches were limited to journal articles published in English up until January 2022. We purposefully selected assessments reported in journal articles because, compared to conference papers, journal articles are rigorously peer-reviewed, contain more information about the assessments, and have a stronger methodological basis. After duplicates were removed, 1,246 papers were left for further selection. Titles and keywords were reviewed, and 311 articles focusing on the following topics were filtered out: special education (e.g. autism, atypical development, and gifted children; 20 papers), neuroscience (e.g. neural function and electroencephalography; 33 papers), executive function or inhibitory control (75 papers), and less relevant issues (e.g. moral development, emotion, psychomotor skills, and theory of mind; 183 papers). This phase resulted in the inclusion of 935 papers.
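For illustration, the following minimal Python sketch shows how such an EBSCOhost-style Boolean search string could be assembled from the three term categories; the term lists are abbreviated placeholders rather than the exact terms listed in Appendix 1.

```python
# Illustrative sketch (not the authors' exact query): build a Boolean search string
# by joining synonyms with OR within a category, joining categories with AND,
# and excluding off-topic terms with NOT. Term lists are placeholders.
practice_terms = ["scientific practice*", "engineering practice*",
                  "scientific reasoning", "scientific thinking", "inquiry"]
discipline_terms = ["science education", "early science", "STEM"]
child_terms = ["young child*", "preschool*", "kindergart*", "early childhood"]
excluded_terms = ["special education", "family education", "reading literacy"]

def or_group(terms):
    """Join synonyms within one category with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = (
    " AND ".join(or_group(g) for g in (practice_terms, discipline_terms, child_terms))
    + " NOT " + or_group(excluded_terms)
)
print(query)
```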

In the second phase, titles and abstracts were read to select papers that met the following criteria: (a) empirical studies, (b) involved young children ranging from 3 to 8 years old as participants, (c) included a formal attempt to determine children’s science performances, and (d) provided the content information and scoring details of the assessment used. Additionally, to establish reliability in the selection of papers, three researchers (two science education experts and one early childhood education expert) independently read 100 papers that were randomly selected from the 935. The interrater agreement among the three experts was 86%. The inconsistencies were resolved by discussion. Subsequently, one researcher screened the titles and abstracts of the rest of the articles, and 795 papers were further excluded. After the full texts of the remaining 140 papers were reviewed, a total of 46 articles published from 2003 to 2020 met the criteria and were included in the final review.
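As an illustration of the screening agreement reported above, the sketch below computes mean pairwise percent agreement for three raters on hypothetical include/exclude decisions; the study does not specify which agreement index was used, so this is only one plausible calculation.

```python
# Minimal sketch of one common interrater-agreement index for the screening phase:
# average pairwise percent agreement across the three raters. Decisions are toy data.
from itertools import combinations

def pairwise_agreement(ratings):
    """ratings: list of equally long decision lists (one per rater)."""
    per_pair = []
    for i, j in combinations(range(len(ratings)), 2):
        matches = sum(a == b for a, b in zip(ratings[i], ratings[j]))
        per_pair.append(matches / len(ratings[i]))
    return sum(per_pair) / len(per_pair)

# Toy example with 10 of the 100 double-screened papers (1 = include, 0 = exclude)
rater_a = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
rater_b = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rater_c = [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
print(f"Mean pairwise agreement: {pairwise_agreement([rater_a, rater_b, rater_c]):.2f}")
```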

Coding Assessments in the Selected Articles

Assessments reported in the 46 selected papers were coded and analysed using the framework in Tables 1 and 2. Table 1 contains the eight categories of an assessment: (1) assessed children’s age, (2) subject area involved (e.g. sciences and mathematics), (3) assessment purpose, (4) item/task presentation (e.g. oral descriptions only, or hand puppets and physical objects), (5) response format (e.g. ranking and performance), (6) scoring/coding method (e.g. classification and rating scale), (7) validity method (e.g. content validity and criterion-related validity), and (8) reliability method (e.g. Cronbach’s α and interrater agreement). For example, four types of assessment purposes were identified. Formative and summative assessments referred to those administered respectively during and after a learning unit to evaluate how and whether students learned the SEPs. Diagnostic assessments were used to analyse young children’s SEPs before or without a learning unit. If an assessment was designed to evaluate the effectiveness of an intervention, it was classified as the pre- and post-tests of an intervention. The analysis of assessment purposes may help decide whether an assessment would be more suitable as a research instrument or as a pedagogical tool. For studies that did not provide sufficient information about some of the categories, ‘Not specified’ or ‘Not mentioned’ was coded.

Table 2 focuses on the construct being assessed and includes the children’s performance expectations suggested by NGSS Lead States (2013) to analyse which SEP was assessed. As we coded the assessments from the selected papers, new performances emerged from the analysis, so codes were added or refined accordingly. For example, in Table 2, two newly added codes were S3g. Conduct an experiment by controlling and manipulating variables and S5d. Use rules or codes to generate a series of steps to formulate problems or solutions. They were not listed in NGSS Lead States (2013) but were reported in some of the selected articles and are closely related to children’s computational thinking, so they were included in our coding framework. In addition, S5b1 and S5b2 fell under the same performance expectation in NGSS Lead States; however, our analysis suggested that the reviewed assessments required children to describe, measure, and/or compare quantitative attributes of different objects (S5b1) but did not ask them to display the data using simple graphs (S5b2). Dividing this performance expectation into two codes allowed us to better examine which performances were or were not assessed by the selected studies.

Among the 46 selected articles, one article may include more than one assessment. Also, more than one code within a category could be applied to an assessment. For example, one assessment may contain two types of response formats. Thus, the total number of assessments for each category could be larger than 46.
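The hypothetical records below illustrate this counting convention: because an assessment can carry several codes within a category, tallying codes yields totals larger than the number of assessments (and articles).

```python
# Sketch with hypothetical records: counting codes per category rather than
# assessments explains why category totals can exceed the 46 reviewed articles.
from collections import Counter

assessments = [
    {"article": "A1", "response_format": ["performance", "oral"]},
    {"article": "A1", "response_format": ["multiple-choice"]},
    {"article": "A2", "response_format": ["performance", "drawing"]},
]
code_counts = Counter(code for a in assessments for code in a["response_format"])
print(code_counts)          # counts per response-format code
print(len(assessments))     # number of assessments (smaller than the sum of counts)
```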

Synthesising Coded Assessments

The processes of coding and synthesizing in this study were expedited by the use of a qualitative analysis tool, NVivo 12. The tool provides features such as generating descriptive coding reports, searching coded content, and creating cross-tables of any two categories to facilitate data visualization, analysis, and synthesis. After the 46 articles were coded using the categories and codes in Tables 1 and 2, we first generated a descriptive report of the numbers in each category as an overview of the assessments. Secondly, using the report, we explored the prevalence of the characteristics and codes in the assessments. Thirdly, we used NVivo to create cross-tables of two categories to answer the research questions about the types of SEPs in relation to other assessment characteristics. Finally, we made comparisons among the cross-tables and identified patterns in the categories to generate findings.
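The authors used NVivo 12 for these cross-tables; as a rough, library-agnostic illustration of the same operation, the pandas sketch below cross-tabulates two coding categories for a few hypothetical coded assessments.

```python
# Illustrative equivalent (in pandas) of an NVivo cross-table relating two coding
# categories; the rows below are hypothetical coded assessments, not study data.
import pandas as pd

coded = pd.DataFrame([
    {"sep": "S4", "response_format": "performance"},
    {"sep": "S4", "response_format": "multiple-choice"},
    {"sep": "S5", "response_format": "multiple-choice"},
    {"sep": "S6", "response_format": "oral"},
])
print(pd.crosstab(coded["sep"], coded["response_format"]))
```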

Results

Overview of the Assessment Characteristics

To answer the first research question, we generated a descriptive report of the coding results. Tables 1 and 2 show an overview of the assessment characteristics and the number of assessments for each SEP. First, a majority of assessments were designed for children of 4, 5, and 6 years old (24, 30, and 24 assessments respectively). There were also more than 10 assessments available for children of 3 and 7 years old. Secondly, the most covered subject area was science, while the subject area of 11 assessments was not specified. Of the nine engineering assessments, seven focused on programming. Thirdly, among the assessments, 20 were diagnostic, followed by 11 summative assessments and nine assessments designed to evaluate children’s learning before and after an intervention.

Regarding the fourth category, the assessment tasks and items were presented with paper-based visualizations (n = 22), real objects (n = 16), hand puppets and physical materials (n = 5), and multimedia (n = 5), with the exception of one assessment presented through oral descriptions only. Additionally, in the fifth category of response format, performance-based assessments were the most common (n = 21), followed by multiple-choice or ranking (n = 17) and oral responses (n = 17). Three assessments asked children to respond in written form or with drawings, and one assessment required children to create an artefact (e.g. designing two- and three-dimensional illustrations of spaces) in line with their performances (Gözen 2015). To score the responses collected from children, 29 assessments determined the correctness of the responses and awarded points based on the number of correct responses, whereas 14 assessments used a rating scale or levels to evaluate children’s performances, descriptions, or drawings. In nine assessments, rather than being graded, children’s responses were classified into different types. In addition, the reaction time or total time children spent on a task was used by three assessments as a measure of children’s science performance.

In terms of the validity of the assessments, 19 articles adopted assessments that were developed and validated by previous studies, such as the number-line task (Cohen and Sarnecka 2014), the Solve-Its test for programming concepts (Sullivan and Bers 2018), and the EARLI Numeracy Measures (Wu et al. 2015). Also, 13 articles drew upon procedures, frameworks, or classification methods from the literature to validate their assessments. Examples included utilizing Ohio’s natural sciences standards for pre-schoolers (Inan, Trundle, and Kantor 2010), replicating the categorization procedure of a previous study (Sobel and Buchanan 2009), and applying a structured cognitive interview (Wright and Gotwals 2017). Other validity methods included expert or content validity (n = 8), construct validity (n = 5), and criterion-related validity (n = 3). Regarding the reliability methods, as most of the assessments collected qualitative data from children (e.g. performances, oral descriptions, and drawings), interrater agreement was the most frequently used method to establish reliability (Table 1). Ten assessments that used multiple choice or ranking as the response format, or that quantified the qualitative data collected from children, used statistical methods such as Cronbach’s α, KR-20, or Rasch analysis to measure reliability. A total of 16 assessments adopted from previous studies did not offer reliability data, and two did not provide any information about reliability.
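For reference, Cronbach’s α, one of the statistics named above, can be computed from an item-by-child score matrix as in the following sketch; the data are hypothetical and the function is a generic implementation, not one taken from the reviewed studies.

```python
# Sketch of Cronbach's alpha computed from a (hypothetical) item-by-child score matrix:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array-like, rows = children, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

toy = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1], [0, 1, 0, 0]]
print(round(cronbach_alpha(toy), 2))
```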

Of the eight scientific and engineering practices (Table 2), S4. Analysing and interpreting data was assessed the most (n = 30), followed by S5. Using mathematics and computational thinking (n = 15), S6. Constructing explanations and designing solutions (n = 12), and S3. Planning and carrying out investigations (n = 11). Additionally, 21 assessments were used to evaluate children’s performance of S4c. Using observations to describe patterns and/or relationships. These assessments could be further divided into two types. A total of 11 assessments focused on children’s competencies in characteristic identification, classification, and pattern recognition (e.g. Hong and Diamond 2012; Monteira and Jiménez-Aleixandre 2016), whereas 10 emphasized children’s reasoning about the relationships between two variables, such as causal reasoning (Solis and Grotzer 2016) or Bayesian reasoning (Schulz, Gopnik, and Glymour 2007). Regarding S5. Using mathematics and computational thinking, seven and eight assessments were available to evaluate children’s performances on S5b1. Describing, measuring, or comparing quantitative attributes of different objects and S5d. Using rules or codes to generate a series of steps to formulate problems or solutions, respectively. Table 2 also shows that no assessment was designed to measure children’s practices of S2. Developing and using models. Only one assessment was available for S8. Obtaining, evaluating, and communicating information.

Assessments of SEPs and Assessed Children’s Age

To address the second research question, we identified the assessments of SEPs available for young children from age 3 to 8 years. As shown in Table 3, for children at age 3, the assessments of SEPs were limited to S4c, S5b1, S5d, S6a, and S6b. The SEP of S4c. Using observations to describe patterns and/or relationships was the most frequently assessed. Among the six coded assessments, two (Kirkland et al. 2015; Wu et al. 2015) focused on children’s sorting and recognition performances, while four studies (Peterson and French 2008; Schulz, Gopnik, and Glymour 2007; Sobel and Buchanan 2009; Sobel et al. 2017) assessed children’s reasoning about the relations between variables. The second most frequently assessed SEP was S5b1. Describing, measuring, or comparing quantitative attributes of different objects (n = 3), such as pointing to the object that was bigger or smaller among multiple objects (Wu et al. 2015). There were also assessments for S6b. Using tools and/or materials to generate a solution to a specific problem; in Dejonckheere, Van De Keere, and Mestdagh (2009), 3-year-olds had to use toy cars and blocks to formulate a solution to cross a river. These findings suggested that these SEPs (i.e. S4c, S5b1, S5d, S6a, and S6b) have been considered developmentally appropriate and measurable for 3-year-olds.

Table 3. Assessments of SEPs and assessed children’s age.

For children at age 4–6 years, more assessments were available that covered a variety of SEPs, including S1, S3, S4, S5, S6, and S7. In addition to the assessments of S4. Analysing, S5. Mathematising, and S6. Explaining available for 3-year-olds, the assessments for 4–6 year-olds were also designed to evaluate their practices of S1. Questioning, S3. Investigating, and S7. Argumentation (Table 3). For example, to examine children’s questioning practices, Klemm and Neuhaus (2017) used the procedures developed by Kohlhauf, Rutke, and Neuhaus (2011) and asked children of 4–6 years old to think about a possible research question and generate a hypothesis. Additionally, the assessments of S3. Investigating focused on S3d. Making observations to collect data (e.g. Inan, Trundle, and Kantor 2010) and S3g. Conducting an experiment by controlling and manipulating variables (e.g. Dejonckheere et al. 2016; van der Graaf, Segers, and Verhoeven 2015). Although S3g was not listed as a performance expectation in NGSS Lead States (2013), it was viewed as an important SEP in seven of the reviewed studies and could be enacted by 4–6 year-olds (Jirout and Zimmerman 2015). Furthermore, the most frequently assessed performance of S7. Argumentation was children’s use of evidence in support of their claims. In Klemm and Neuhaus (2017), children were asked to indicate the evidence supporting their claim with prompts such as ‘How do you know?’ and ‘What is your evidence?’.

The distribution of the assessments of SEPs for 7–8 year-olds was similar to that for 4–6 year-olds. However, assessments of S3d. Making observations to collect data and S4a. Recording information (observations, thoughts, and ideas) were not found for children of 7–8 years old. Additionally, in some studies, assessed children’s age was not specified. For example, Inan, Trundle, and Kantor (2010) was conducted in a preschool classroom, and children’s performances of sharing experiences, discussing, and brainstorming were observed, so the study was coded as S8. Communicating practices. Yet, although the study indicated that children’s age ranged from 3 to 5 years, it was unclear in which age group certain observations or assessment attempts were made. Similarly, Senocak et al. (2013) developed and validated an instrument to evaluate Turkish kindergarteners’ inquiry performances, such as stating a problem, making predictions, and measuring with non-standard units, but their study did not indicate the kindergarteners’ age range in the Turkish educational context.

Assessments of SEPs and Assessment Purposes

How are the assessment purposes related to the types of SEP? Table 4 shows that formative assessments were available to evaluate various practices of S3. Investigating, S4. Analysing, S5. Mathematising, S6. Explaining, S7. Argumentation, and S8. Communicating. On the other hand, summative assessments were mainly designed for S5d. Using rules or codes to formulate problems or solutions, such as Solve-Its (Strawhacker, Lee, and Bers 2017; Sullivan and Bers 2016) and TechCheck (Relkin, de Ruiter, and Bers 2020).

Table 4. Assessments of SEPs and assessment purposes.

In addition, more than 10 diagnostic assessments were developed to evaluate children’s performances of S4c. Using observations to describe patterns and/or relationships. For instance, the diagnostic assessment in Livingston and Andrews (2005) asked 5-year-old children to observe and classify a set of stimuli to determine the children’s level of category learning. Also, four assessments were used to examine children’s performances in S5b1. Describing, measuring, or comparing quantitative attributes of different objects. One example was the numeracy measures in Wu et al. (2015), which required children to count aloud, make measurements, count objects, and recognize patterns.

Assessments of SEPs also served as the pre- and post-tests of an educational intervention. Ten performances of SEPs were examined to illustrate the effectiveness of an intervention: S4c. Using observations to describe patterns and/or relationships (n = 4), S6a. Using information from observations to construct an evidence-based account for natural phenomena (n = 3), S4d. Comparing predictions to what occurred (n = 2), S7f. Constructing an argument with evidence (n = 2), S1b. Asking questions that can be answered by an investigation (n = 1), S3d. Planning an investigation to produce data (n = 1), S3g. Conducting an experiment by controlling and manipulating variables (n = 1), S4a. Recording information (n = 1), S5b1. Describing, measuring, or comparing quantitative attributes of different objects (n = 1), and S6b. Using tools and/or materials to solve a problem (n = 1). This finding revealed that these practices were commonly set as instructional objectives of an early science learning module or program.

Assessments of SEPs and Item/Task Presentations

The fourth research question examined how different types of item/task presentation were used in the assessments of SEPs. Paper-based visualizations, including pictures, cards, and written text with oral descriptions, were used the most to present the items and tasks of almost all the coded SEPs (Table 5). The second most used presentation was real objects, animals, or plants. Compared to paper-based visualizations, real objects were particularly helpful for assessments of S6b. Using tools and/or materials to solve a problem. For example, the problem-solving task in Hong and Diamond (2012) was presented with objects and toys, and required children to use them to make a foil container sink in the water.

Table 5. Assessments of SEPs and item/task presentations.

The third type of presentation, hand puppets and physical materials, was employed relatively less often, and the SEPs involved in the assessments using this type of presentation overlapped with those using paper-based visualizations. Moreover, technology-based multimedia were applied in presenting assessment items and tasks. Compared to the other types of presentation, multimedia have been used rather recently, and the five studies that adopted multimedia were published after 2014. For example, Cohen and Sarnecka (2014) used a laptop monitor to present the stimuli of measurement tasks. Video clips created by Grotzer et al. (2016) were used to demonstrate scenarios, and children were then asked to answer questions based on their observations and interpretations of the scenarios. Finally, one study used verbal descriptions only to present the assessment items: to administer a multiple-choice assessment in Peppler et al. (2020), the questions and answer options were read aloud at least twice for first graders to choose an answer.

Assessments of SEPs and Response Formats

How are the types of response format related to the types of SEP? Table 6 overviews the numbers of assessments that employed the five types of response formats across different SEPs. As questions or problems were usually represented in a verbal form, to assess S1. Questioning practices, children were required to select from a group of answer options (e.g. Senocak et al. 2013) or orally describe their answers (e.g. Samarapungavan et al. 2009). On the other hand, performance was the response format dominantly used for the assessments of S3. Investigating. In these assessments, children were asked to demonstrate how they made observations, used tools to measure different objects, and carried out investigations by controlling variables.

Table 6. Assessments of SEPs and response formats.

To evaluate children’s SEPs of S4. Analysing, S5. Mathematising, and S7. Argumentation, three types of response formats (i.e. multiple-choice or ranking, performance, and oral description) were used. Additionally, written expressions and drawings were used in assessments of S4. Analysing practices. For example, in Tarım (2015), children were asked to make observations, identify patterns, and make drawings to complete the empty boxes according to the pattern rule.

The assessments of S6. Explaining had the most variety in response formats. In addition to the aforementioned four types, artefacts were also used to collect data from children. In the Architectural Design Education Program in Gözen (2015), children from 6 to 11 years old created two- and three-dimensional illustrations of spaces (e.g. poster design, garden design, and modelling), which were used to examine their SEPs of S6b. Using tools and/or materials to design a solution. Finally, in the only assessment of S8. Communicating (Inan, Trundle, and Kantor 2010), children’s performances of sharing experiences, discussing, and brainstorming were evaluated. Together, these findings suggested that, due to the verbal or performance nature of the SEPs, the assessments reviewed in this study collected different types of data as evidence to evaluate children’s SEPs.

Discussion and Conclusion

As developing learners’ scientific and engineering practices has been identified as a principal goal of K-12 science education, the issue of how to assess young children’s SEPs should be addressed. This study reviewed assessments of young children’s SEPs reported in empirical studies and delineated how young children’s SEPs have been measured.

This study found that the availability of assessments is uneven across different types of SEPs and performances. Some SEPs, such as Analysing and interpreting data, Using mathematics and computational thinking, and Constructing explanations and designing solutions, have been assessed and investigated more than others. On the other hand, more assessments are needed for Developing and using models and Obtaining, evaluating, and communicating information. Additionally, while various performances for the SEPs have been identified by NGSS Lead States (2013), not all performances have been assessed. For example, the assessments for Engaging in argument from evidence have focused on the performance of S7f. Constructing an argument with evidence, whereas no assessment was available for other argumentation performances. Furthermore, although computational thinking and coding practices are relatively new practices (Grover and Pea 2013), some summative assessments have been designed and utilized. However, to understand how children develop these practices in learning activities, more formative or diagnostic assessments may be needed. Taken together, this study outlined the availability of the assessments and indicated that further research may consider developing more assessments for the SEPs that have not been addressed by previous research.

The findings of the reviewed assessments also revealed additional performance expectations of SEPs and suggested that some SEPs are measurable and developmentally appropriate for young children. This review study identified two performances that were not listed in NGSS Lead States (2013): S3g. Conducting an experiment by controlling and manipulating variables and S5d. Using rules or codes to generate a series of steps to formulate problems or solutions. These findings have instructional implications and indicate that future learning and teaching units for early science could introduce and engage students in these SEPs.

Moreover, in this study, the analyses of assessment characteristics provide insight into the design of assessments for early science education. Although performance-based assessments were a dominant form, more types of task presentations and response formats could be taken into consideration, as children may prefer a variety of ways to express their understanding and competence in science. For example, in addition to paper-based visualizations and physical objects, multimedia presentations of assessment tasks and items have been adopted by recent studies. Children could enact their scientific and engineering practices by demonstrating adequate performances, choosing correct answers, as well as by creating artefacts. In particular, the latter response format may be useful for designing assessments of modelling practices, which were not evaluated in the reviewed studies.

Finally, the analyses of assessment characteristics could also offer suggestions on how the assessments could be used for research and practice. Among the different types of assessments, the formative and summative assessments with clear scoring procedures (e.g. correctness or classification) and simple response formats (e.g. multiple choice or ranking) may be more feasible for teachers and could be adapted to classroom teaching. On the other hand, diagnostic assessments and pre- and post-tests were usually designed for research purposes and involved more open-ended response formats and complicated scoring methods. Future research could adopt these assessments to explore and characterise young children’s SEPs.

Acknowledgments

This study was supported by the Ministry of Science and Technology in Taiwan under MOST 107-2511-H-003-012-MY3, MOST 109-2811-H-003-505, MOST 109-2811-H-003-522, and the “Institute for Research Excellence in Learning Sciences” of National Taiwan Normal University from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the Ministry of Science and Technology, Taiwan [MOST 107-2511-H-003-012-MY3, MOST 109-2811-H-003-505, MOST 109-2811-H-003-522]; Ministry of Education, Taiwan [Higher Education Sprout Project].

References

  • Abd-El-Khalick, Fouad, Saouma Boujaoude, Richard Duschl, Norman G. Lederman, Rachel Mamlok-Naaman, Avi Hofstein, Mansoor Niaz, David Treagust, and H. L. Tuan. 2004. “Inquiry in Science Education: International Perspectives.” Science Education 88: 394–419. doi:10.1002/sce.10118.
  • Alexander, Patricia A. 2020. “Methodological Guidance Paper: The Art and Science of Quality Systematic Reviews.” Review of Educational Research 90 (1): 6–23. doi:10.3102/0034654319854352.
  • Bennett, Randy Elliot, and Isaac I. Bejar. 1998. “Validity and Automated Scoring: It’s Not Only the Scoring.” Educational Measurement: Issues and Practice 17 (4): 9–17. doi:10.1111/j.1745-3992.1998.tb00631.x.
  • Cohen, Dale J., and Barbara W. Sarnecka. 2014. “Children’s number-line Estimation Shows Development of Measurement Skills (Not Number Representations).” Developmental Psychology 50 (6): 1640–1652. doi:10.1037/a0035901.
  • Dejonckheere, Peter J. N., Nele De Wit, Kristof Van de Keere, and Stephanie Vervaet. 2016. “Exploring the Classroom: Teaching Science in Early Childhood.” International Electronic Journal of Elementary Education 8 (4): 537–558. doi:10.12973/eu-jer.5.3.149.
  • Dejonckheere, Peter J. N., Kristof Van De Keere, and Nele Mestdagh. 2009. “Training the Scientific Thinking Circle in Pre- and Primary School Children.” The Journal of Educational Research 103 (1): 1–16. doi:10.1080/00220670903228595.
  • Drasgow, Fritz, and Krista Mattern. 2006. “New Tests and New Items: Opportunities and Issues.” In Computer-based Testing and the Internet: Issues and Advances, edited by K. Crowley, C. Schunn, and T. Okada, 59–75, Chichester, UK: Wiley.
  • Erden, Feyza T., and Sema Sönmez. 2011. “Study of Turkish Preschool Teachers’ Attitudes toward Science Teaching.” International Journal of Science Education 33 (8): 1149–1168. doi:10.1080/09500693.2010.511295.
  • Eshach, Haim, and Michael N. Fried. 2005. “Should Science Be Taught in Early Childhood?” Journal of Science Education and Technology 14 (3): 315–336. doi:10.1007/s10956-005-7198-9.
  • Ford, M J. 2015. “Educational Implications of Choosing “Practice” to Describe Science in the Next Generation Science Standards.” Science Education 99 (6): 1041–1048. doi:10.1002/sce.21188.
  • Garbett, Dawn. 2003. “Science Education in Early Childhood Teacher Education: Putting Forward a Case to Enhance Student Teachers’ Confidence and Competence.” Research in Science Education 33 (4): 467–481. doi:10.1023/b:rise.0000005251.20085.62.
  • Gözen, Göksu. 2015. “Architectural Design Education Program for Children: Adaptation into Turkish Culture and Analysis of Its Effectiveness.” Eurasian Journal of Educational Research 15 (59). doi:10.14689/ejer.2015.59.3.
  • Grotzer, Tina A., S. Lynneth Solis, M. Shane Tutwiler, and Megan Powell Cuzzolino. 2016. “A Study of Students’ Reasoning about Probabilistic Causality: Implications for Understanding Complex Systems and for Instructional Design.” Instructional Science 45 (1): 25–52. doi:10.1007/s11251-016-9389-6.
  • Grover, Shuchi, and Roy Pea. 2013. “Computational Thinking in K–12: A Review of the State of the Field.” Educational Researcher 42 (1): 38–43. doi:10.3102/0013189x12463051.
  • Hong, Soo-Young, and Karen E. Diamond. 2012. “Two Approaches to Teaching Young Children Science Concepts, Vocabulary, and Scientific problem-solving Skills.” Early Childhood Research Quarterly 27 (2): 295–305. doi:10.1016/j.ecresq.2011.09.006.
  • Hsin, Ching-Ting, and Hsin-Kai Wu. 2022. “Implementing a project-based Learning Module in Urban and Indigenous Areas to Promote Young Children’s Scientific Practices.” Research in Science Education. doi:10.1007/s11165-022-10043-z.
  • Inan, Hatice Zeynep, Kathy Cabe Trundle, and Rebecca Kantor. 2010. “Understanding Natural Sciences Education in a Reggio Emilia-inspired Preschool.” Journal of Research in Science Teaching 47 (10): 1186–1208. doi:10.1002/tea.20375.
  • Jirout, Jamie, and Corinne Zimmerman. 2015. “Development of Science Process Skills in the Early Childhood Years.” In Research in Early Childhood Science Education, edited by Kathy Cabe Trundle and Mesut Saçkes, 143–165. Dordrecht: Springer.
  • Kirkland, Lynn D., Maryann Manning, Kyoko Osaki, and Delyne Hicks. 2015. “Increasing logico-mathematical Thinking in Low SES Preschoolers.” Journal of Research in Childhood Education 29 (3): 275–286. doi:10.1080/02568543.2015.1040901.
  • Klahr, David, and Kevin Dunbar. 1988. “Dual Space Search during Scientific Reasoning.” Cognitive Science 12: 1–48. doi:10.1207/s15516709cog1201_1.
  • Klemm, Janina, and Birgit J. Neuhaus. 2017. “The Role of Involvement and Emotional well-being for Preschool Children’s Scientific Observation Competency in Biology.” International Journal of Science Education 39 (7): 863–876. doi:10.1080/09500693.2017.1310408.
  • Kohlhauf, Lucia, Ulrike Rutke, and Birgit Neuhaus. 2011. “Influence of Previous Knowledge, Language Skills and domain-specific Interest on Observation Competency.” Journal of Science Education and Technology 20 (5): 667–678. doi:10.1007/s10956-011-9322-3.
  • Kuhn, Deanna, John Black, Alla Keselman, and Danielle Kaplan. 2000. “The Development of Cognitive Skills to Support Inquiry Learning.” Cognition and Instruction 18 (4): 495–523. doi:10.1207/S1532690XCI1804_3.
  • Kuo, Che-Yu, and Hsin-Kai Wu. 2013. “Toward an Integrated Model for Designing Assessment Systems: An Analysis of the Current Status of computer-based Assessments in Science.” Computers & Education 68: 388–403. doi:10.1016/j.compedu.2013.06.002.
  • Lave, Jean. 1988. Cognition in Practice. Cambridge, UK: Cambridge University Press.
  • Livingston, Kenneth R., and Janet K. Andrews. 2005. “Evidence for an age-independent Process in Category Learning.” Developmental Science 8 (4): 319–325. doi:10.1111/j.1467-7687.2005.00419.x.
  • Ministry of Education. 2018. Curriculum Guidelines of 12-year Basic Education: Natural Sciences. Taipei, Taiwan: Author.
  • Monteira, Sabela F., and María Pilar Jiménez-Aleixandre. 2016. “The Practice of Using Evidence in Kindergarten: The Role of Purposeful Observation.” Journal of Research in Science Teaching 53 (8): 1232–1258. doi:10.1002/tea.21259.
  • National Research Council. 2012. A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: National Academies Press.
  • NGSS Lead States. 2013. Next Generation Science Standards: For States, by States. Washington, DC: National Academies Press.
  • Peppler, Kylie, Naomi Thompson, Joshua Danish, Armin Moczek, and Seth Corrigan. 2020. “Comparing First- and third-person Perspectives in Early Elementary Learning of Honeybee Systems.” Instructional Science 48 (3): 291–312. doi:10.1007/s11251-020-09511-8.
  • Peterson, Shira May, and Lucia French. 2008. “Supporting Young Children’s Explanations through Inquiry Science in Preschool.” Early Childhood Research Quarterly 23 (3): 395–408. doi:10.1016/j.ecresq.2008.01.003.
  • Piasta, Shayne B., Christina Yeager Pelatti, and Heather Lynnine Miller. 2014. “Mathematics and Science Learning Opportunities in Preschool Classrooms.” Early Education and Development 25 (4): 445–468. doi:10.1080/10409289.2013.817753.
  • Popham, W. James. 2017. Classroom Assessment: What Teachers Need to Know. 8th ed. Boston, USA: Pearson.
  • Relkin, Emily, Laura de Ruiter, and Marina Umaschi Bers. 2020. “TechCheck: Development and Validation of an Unplugged Assessment of Computational Thinking in Early Childhood Education.” Journal of Science Education and Technology 29 (4): 482–498. doi:10.1007/s10956-020-09831-x.
  • Saçkes, Mesut, Kathy Cabe Trundle, Randy L. Bell, and Ann A. O’Connell. 2011. “The Influence of Early Science Experience in Kindergarten on Children’s Immediate and Later Science Achievement: Evidence from the Early Childhood Longitudinal Study.” Journal of Research in Science Teaching 48 (2): 217–235. doi:10.1002/tea.20395.
  • Samarapungavan, Ala, Panayota Mantzicopoulos, Helen Patrick, and Brian French. 2009. “The Development and Validation of the Science Learning Assessment (SLA): A Measure of Kindergarten Science Learning.” Journal of Advanced Academics 20 (3): 502–535. doi:10.1177/1932202X0902000306.
  • Schulz, Laura E., Alison Gopnik, and Clark Glymour. 2007. “Preschool Children Learn about Causal Structure from Conditional Interventions.” Developmental Science 10 (3): 322–332. doi:10.1111/j.1467-7687.2007.00587.x.
  • Senocak, Erdal, Ala Samarapungavan, Pınar Aksoy, and Cemal Tosun. 2013. “A Study on Development of an Instrument to Determine Turkish Kindergarten Students’ Understandings of Scientific Concepts and Scientific Inquiry Processes.” Educational Sciences: Theory & Practice 13 (4): 2217–2228. doi:10.12738/estp.2013.4.1721.
  • Sobel, David M., and David W. Buchanan. 2009. “Bridging the Gap: Causality-at-a-distance in Children’s Categorization and Inferences about Internal Properties.” Cognitive Development 24 (3): 274–283. doi:10.1016/j.cogdev.2009.03.003.
  • Sobel, David M., Christopher D. Erb, Tiffany Tassin, and Deena Skolnick Weisberg. 2017. “The Development of Diagnostic Inference about Uncertain Causes.” Journal of Cognition and Development 18 (5): 556–576. doi:10.1080/15248372.2017.1387117.
  • Solis, S. Lynneth, and Tina A. Grotzer. 2016. “They Work Together to Roar: Kindergartners’ Understanding of an Interactive Causal Task.” Journal of Research in Childhood Education 30 (3): 422–439. doi:10.1080/02568543.2016.1178196.
  • Strawhacker, Amanda, Melissa Lee, and Marina Umaschi Bers. 2017. “Teaching Tools, Teachers’ Rules: Exploring the Impact of Teaching Styles on Young Children’s Programming Knowledge in ScratchJr.” International Journal of Technology and Design Education 28 (2): 347–376. doi:10.1007/s10798-017-9400-9.
  • Sullivan, Amanda, and Marina Umaschi Bers. 2016. “Girls, Boys, and Bots: Gender Differences in Young Children’s Performance on Robotics and Programming Tasks.” Journal of Information Technology Education: Innovations in Practice 15 (1): 145–165. doi:10.28945/3547.
  • Sullivan, Amanda, and Marina Umaschi Bers. 2018. “The Impact of Teacher Gender on Girls’ Performance on Programming Tasks in Early Elementary School.” Journal of Information Technology Education: Innovations in Practice 17: 153–162. doi:10.28945/4082.
  • Tarım, Kamuran. 2015. “Effects of Cooperative Group Work Activities on pre-school Children’s Pattern Recognition Skills.” Kuram ve Uygulamada Egitim Bilimleri 15 (6): 1597–1604. doi:10.12738/estp.2016.1.0086.
  • van der Graaf, Joep, Eliane Segers, and Ludo Verhoeven. 2015. “Scientific Reasoning Abilities in Kindergarten: Dynamic Assessment of the Control of Variables Strategy.” Instructional Science 43 (3): 381–400. doi:10.1007/s11251-015-9344-y.
  • Wright, Tanya S., and Amelia Wenk Gotwals. 2017. “Supporting Kindergartners’ Science Talk in the Context of an Integrated Science and Disciplinary Literacy Curriculum.” Elementary School Journal 117 (3): 513–537. doi:10.1086/690273.
  • Wu, Hsin-Kai. 2022. “Modelling a Complex System: Using novice-expert Analysis for Developing an Effective technology-enhanced Learning Environment.” International Journal of Science Education 32 (2): 195–219. doi:10.1080/09500690802478077.
  • Wu, Qiong, Pui-Wa Lei, James C. DiPerna, Paul L. Morgan, and Erin E. Reid. 2015. “Identifying Differences in Early Mathematical Skills among Children in Head Start.” International Journal of Science and Mathematics Education 13 (6): 1403–1423. doi:10.1007/s10763-014-9552-y.
  • Zawacki-Richter, Olaf, Michael Kerres, Svenja Bedenlier, Melissa Bond, and Katja Buntins. 2020. Systematic Reviews in Educational Research: Methodology, Perspectives and Application. Springer Nature.
  • Zhang, Wen-Xin, Ying-Shao Hsu, Chia-Yu Wang, and Yu-Ting Ho. 2015. “Exploring the Impacts of Cognitive and Metacognitive Prompting on Students’ Scientific Inquiry Practices within an E-Learning Environment.” International Journal of Science Education 37 (3): 529–553. doi:10.1080/09500693.2014.996796.
  • Zimmerman, Corinne. 2000. “The Development of Scientific Reasoning Skills.” Developmental Review 20: 99–149. doi:10.1006/drev.1999.0497.
  • Zimmerman, Corinne. 2007. “The Development of Scientific Thinking Skills in Elementary and Middle School.” Developmental Review 27: 172–223. doi:10.1016/j.dr.2006.12.001.

Appendix 1. Included and excluded search terms by category