Intervention, Evaluation, and Policy Studies

Evaluation of a State-Wide Mathematics Support Program for at-Risk Students in Grade 1 and 2 in Germany

Pages 687-716 | Received 14 May 2019, Accepted 04 Mar 2022, Published online: 08 Jun 2022

Abstract

Supporting students with difficulties in learning mathematics is a challenge for teachers and educational administrators. Formative assessment is considered to play a successful role in supporting at-risk students as well as students without difficulties in mathematics. There is a need for intervention programs, including formative assessment techniques, that (a) are easy to implement in the regular classroom without requiring radical changes in teachers’ individual teaching style, and (b) are effective in supporting at-risk students at the earliest stage possible in their school careers. This article analyzes an effectiveness trial of a formative assessment program developed to meet these goals and conducted in the first two years of elementary school. The examination of the longitudinal dataset from Grades 1–3 (N = 2,330) revealed an effect after the implementation, which was maintained at nearly the same effect size one year after completion of the program. The findings imply that formative assessment can foster the arithmetic achievement of students at risk as well as that of the entire class without changing the curriculum or teachers’ individual teaching style.

1. Introduction

It is well known that mathematical skills before school entrance are predictive of mathematical achievement at the end of primary school (e.g., Aunola et al., 2004; MacDonald & Carmichael, 2018). Therefore, early interventions at school entrance aiming at improving the mathematical knowledge and skills of students have the potential to significantly improve mathematical outcomes (e.g., Clements et al., 2013; for an overview see Clements & Sarama, 2011).

Interventions following a formative assessment approach have been described as effective in improving student performance, in particular the achievement of low-performing students (Dunn & Mulvenon, 2009). Formative assessment can be described as a method of eliciting, interpreting, and using evidence about student achievement to adapt instruction to the needs of students (Black & Wiliam, 2009). Despite the potential of formative assessment for improving the mathematical achievement of students, research has shown that both formative assessment practice and the development of formative assessment materials are challenging (e.g., Black & Wiliam, 1998; Lee et al., 2012; Quyen & Khairani, 2016; Yin et al., 2008) and that theoretically effective elements are not always followed by the expected effects (Maier et al., 2016; Förster & Souvignier, 2014). Therefore, studies that test the effectiveness of theoretically developed interventions in practice are needed (e.g., Schütze et al., 2018).

In this context, the mathematics support program “Mathe macht stark” (MMS; Mathematics Makes You Strong) for at-risk students in the first and second grades of elementary school (age 6–8 years) was developed, coordinated, and financed by the German federal state of Schleswig-Holstein. MMS is based on formative assessment and was developed in line with recommendations made on the basis of previous implementation studies. These recommendations concern, for example, the frequent use of assessment tasks and the subsequent adaptation of teaching based on the obtained information, the easy administration of the program in the regular classroom, a clear alignment with the educational standards and the prescribed curriculum being taught, as well as the recommendation that the program not be time-consuming and that it accompany professional development (Frohbieter et al., 2011; Herman & Gribbons, 2001; Karuza, 2004; Lembke & Stecker, 2007; Nicol & Macfarlane-Dick, 2006; Sharkey & Murnane, 2006; Vendlinski et al., 2009). The MMS program provides material for teachers that can be used during and in addition to regular mathematics teaching (i.e., it does not involve either additional curriculum elements or prestructured lessons) to diagnose and support at-risk students. The program implementation includes short in-service teacher training sessions with information about the program and how to use the assessment tasks and the information obtained from these tasks to support students’ learning. The aim of the program is to recognize students’ learning difficulties in arithmetic at the earliest stage possible and to improve students’ understanding of the key concepts of the arithmetic curriculum in Grades 1 and 2.

The effectiveness of the MMS program was examined in a three-year longitudinal study, which covered the two-year implementation phase in Grades 1 and 2 and a follow-up assessment one year after the intervention at the end of Grade 3. Schools that did not participate in the MMS program served as the comparison group. The evaluation also included an investigation into whether the program’s effect can be increased by providing the schools with additional teacher working hours. By presenting the findings of this study, which are based on 135 school classes, this article answers the call for research results to back up formative assessment programs by conducting ecologically valid implementation studies.

1.1. Students with Mathematical Difficulties—The Need for Early Intervention

Students already have different levels of cognitive prerequisites (Magnuson & Duncan, 2006) and prior knowledge (Hoff & Tian, 2005; Verdine et al., 2013) before entering elementary school. The children’s socioeconomic status (SES) has been associated with those differences in early childhood (e.g., Klein et al., 2008).

Even though growth trajectories are highly variable across students (Dumas et al., 2019), studies show that early mathematics skills are related to later mathematics learning (the average effect size of school-entry mathematics skills for later mathematics achievement in school was .34; Duncan et al., 2007). Morgan et al. (2009) showed that children with mathematical difficulties had the lowest growth rates from first grade through fifth grade. Furthermore, there is evidence of an achievement gap, in which under-resourced students systematically underperform their more privileged peers in math (Bohrnstedt et al., 2015; Lee, 2002; Reardon & Galindo, 2009). Those early achievement gaps in mathematics tend to widen throughout schooling (Burchinal et al., 2011; Cameron et al., 2015; Klibanoff et al., 2006).

Therefore, students whose performance in early mathematics is considerably lower than that of their peers are often identified as being at risk or disadvantaged regarding their future development in mathematics (the 25% to 35% lowest performing students;Footnote1 Hanich et al., 2001; Jordan et al., 2003; McLean & Hitch, 1999; Morgan et al., 2009). Those students are often called low-achieving or low-performing students (e.g., Baker et al., 2002; Smith et al., 2013), students with mathematical difficulties (MD; e.g., Jordan et al., 2003), or students at risk for MD (e.g., Gersten et al., 2015; Morgan et al., 2009). In this article, we use the latter term.

The results regarding the association between early and later mathematics achievement as well as the persistence of differences in early mathematical knowledge and skills provide a compelling rationale for schools to support students who are at risk for MD. This is then linked to the necessity of providing schools with effective methods for supporting students early in their school careers (Smith et al., 2013). A wide variety of early interventions aiming at improving the mathematical achievement of students have been developed, some of which have demonstrated the capability to significantly improve mathematical outcomes (e.g., early numeracy preventative Tier 2 intervention, Bryant et al., 2011; Building Blocks, Clements & Sarama, 2007; Clements et al., 2013; cognitively guided instruction, Carpenter et al., 2015; preventive 1st-grade tutoring, Fuchs et al., 2005). Although preschool intervention can significantly steepen learning trajectories (Dumas et al., 2019), some studies indicate that positive intervention effects decrease or fade out over time (Bailey et al., 2018; Leak et al., 2012; Li et al., 2020; Preschool Curriculum Evaluation Research Consortium, 2008; Turner et al., 2006).

1.2. Formative Assessment

Using information obtained by formative assessment techniques has been described as an effective means to improve student performance, in particular that of lower-performing students (Dunn & Mulvenon, 2009). Fuchs et al. (2008) even conclude that ongoing progress monitoring is the most essential element in any type of intervention with low-performing students, and this needs to occur in the form of formative assessments. Definitions of formative assessment differ in the literature. Sometimes formative assessment is defined as testing material, sometimes as a process of assessment, but it is always described with a formative purpose (Dunn & Mulvenon, 2009). In this article, we follow the description of Black and Wiliam (2009), who consider it to be a characteristic element of formative assessment “that evidence about student achievement is elicited, interpreted, and used by teachers, learners, or their peers, to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions they would have taken in the absence of the evidence that was elicited” (Black & Wiliam, 2009, p. 9). There are several criteria for classifying different types of formative assessment. A fairly common distinction regarding the degree of formalization is made by Shavelson (2006). The author distinguishes between (1) on-the-fly formative assessment (lesson sequences in which teachers spontaneously identify misconceptions and gaps in understanding, provide feedback, and adapt the lesson content accordingly), (2) planned-for-interaction assessment (planned lesson sequences, such as question-and-answer sequences that provide information about the students’ learning level), and (3) curriculum-embedded assessment (relatively strongly formalized; it can be part of the conventional teaching activity, e.g., homework or presentations, but it can also be embedded in the curriculum at key points through additional assessment instruments such as individual tasks or whole test procedures and batteries; for a description of possible implementations, see Schütze et al. [2018]).

The basic idea of formative assessment (that is, using assessment to obtain information about student thinking in order to compare it with the intended learning goal and, in the case of inadequate fit, to modify teaching and learning activities to help students achieve the learning goal) has gained increasing attention since the mid-1990s (Souvignier & Hasselhorn, 2018). An increasing number of studies deal with the impact and design of formative assessment in mathematics instruction. Based on their critical analysis of the work of Black and Wiliam (1998) and on their own review of nine studies from 1999 to 2007, Dunn and Mulvenon (2009) argue that there is some support for the impact of formative assessment on student achievement, but there is still a need for studies with more efficient methodologies and designs to facilitate conclusive results about its effectiveness. There is evidence that formative assessment has a positive impact on the whole group of students (e.g., Decristan et al., 2015). However, Phelan et al. (2011) report findings that students with higher pretest scores benefited especially from the formative assessment. Other studies show that formative assessment is an effective means for low-achieving students (Bottge et al., 2021; King, 2016).

Kingston and Nash (2011, 2015a, 2015b) report a small-to-medium effect of formative assessment on performance in their meta-analysis based on 13 studies from 1990 to 2010 using a control-group or comparison-group design. The results of their moderator analysis suggest that the effectiveness depends on the school subject (e.g., they found a higher effect on English language arts, d = .32, than on mathematics, d = .17, or science, d = .09) as well as on features related to the specific implementation of the formative assessment (e.g., professional development, d = .30, and the use of computer-based formative systems, d = .28, were more effective than, for example, curriculum-embedded assessments, d = −.05, and the specific use of student feedback, d = .03). Yet studies on long-term effects (≥ 1 year) are scarce. Those studies that included a follow-up measure found contradictory outcomes regarding long-term effects (Andersson & Palm, 2017).
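
For orientation, the d values quoted above can be read as standardized mean differences. The text does not state which estimator Kingston and Nash used, so the following is only the conventional definition of Cohen's d with a pooled standard deviation, given here as an assumption. The symbols M_T, M_C, SD_T, SD_C, n_T, and n_C (means, standard deviations, and sizes of the treatment and comparison groups) are introduced for illustration and do not appear in the original text.

```latex
% Conventional Cohen's d (assumed form, not confirmed by the source)
d = \frac{M_T - M_C}{SD_{\text{pooled}}},
\qquad
SD_{\text{pooled}} = \sqrt{\frac{(n_T - 1)\,SD_T^{2} + (n_C - 1)\,SD_C^{2}}{n_T + n_C - 2}}
```

Under this reading, d = .17 for mathematics would mean that the average student in a formative assessment condition scored about 0.17 pooled standard deviations above the average student in the comparison condition, and the negative value for curriculum-embedded assessments (d = −.05) would indicate a marginally lower mean in the formative assessment condition.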

Despite the supportive evidence regarding formative assessment, research has identified factors that hinder an effective implementation of formative assessment in everyday school practice. The implementation of formative assessment is still very demanding, time-consuming, and frequently rather unsystematic (e.g., Black & Wiliam, 1998; Quyen & Khairani, 2016; Yin et al., 2008). Other studies describe a lack of teaching materials suitable for systematic and sensitive assessments. Furthermore, schools and teachers often lack the background and capacity to engage in assessment (Herman et al., 2006; Herman & Gribbons, 2001; Quyen & Khairani, 2016; Stiggins, 2002; Yin et al., 2008).

In addition, the precise implementation of effective formative assessment programs is content- and competency-specific and there is some question whether it can be described in general terms. Such content-related principles can be found in the subject-specific didactical literature and often have no explicit reference to formative assessment (Schütze et al., 2018). But there are promising general factors for achieving successful implementation based on previous studies that are discussed in the literature (Phelan et al., 2011). In the following, we outline and discuss some of these promising factors but also barriers to successful curriculum-embedded formative assessment implementation based on previous studies, without any claim to completeness. For example, curriculum-embedded formative assessment is recommended, including frequent assessments to generate information about students’ learning progress, so that teachers can continuously make instructional decisions (Lembke & Stecker, 2007; Nicol & Macfarlane-Dick, 2006). However, research suggests that teachers are often unable to conduct the assessments and use the information from assessments to make instructional decisions (Karuza, 2004; Vendlinski et al., 2009). Formative assessment practice including a dynamic process of evidence elicitation, analysis, and action is a very complex and demanding task for the teacher and makes demands on teachers’ content and pedagogical knowledge. In this context, Phelan et al. note critically that “there is a great deal of rhetoric surrounding formative assessment and doing something with the results, but in reality, teachers do not always have the wherewithal to do anything except repeat what they have already done” (2011, p. 331).

Hence, professional development is required for teachers to learn how to conduct such assessments, and especially how to analyze the data and derive instructional decisions to help close the gaps between a student’s current performance and the stated learning goal (Herman et al., 2006; Karuza, 2004; Sharkey & Murnane, 2006). Other barriers to realizing formative assessment in the classroom are the preparation, application time, and effort needed (e.g., Lee et al., 2012). Despite the advantages of curriculum-embedded formative assessment tools created for teachers, interventions seem to offer benefits when teachers are involved in the practical implementation. Therefore, teachers need time for this extra work, ideally during the school day (Sharkey & Murnane, 2006). In particular, teachers need effective strategies with which they can realize formative assessment within the given time constraints of school lessons (Hondrich et al., 2016). Therefore, the assessment should be easy to administer, clearly aligned with the prescribed educational standards and curriculum being taught, user-friendly, and not time-consuming (Lembke & Stecker, 2007; Sharkey & Murnane, 2006). Finally, a specific formative assessment program has to be implemented over a long period of time in order to obtain a sustained integration of formative assessment into teachers’ pedagogical practices (Vendlinski et al., 2009).

In summary, there is a need to provide schools with effective methods for supporting at-risk students at the earliest stage possible in their school careers. Formative assessment can potentially be an effective method in conjunction with a successful implementation in everyday school practice. But even with a consensus on the effectiveness of formative assessment in facilitating student learning, a need for further research still exists on the implementation of this approach as well as for materials developed specifically for practical use (Dunn & Mulvenon, 2009; Schütze et al., 2018). Furthermore, there is a need for intervention studies in which specific measures to implement the concept of formative assessments are tested in practice and are scientifically evaluated (Souvignier & Hasselhorn, 2018); there is also a need for follow-up studies (Andersson & Palm, 2017). Therefore, the purpose of this study was to evaluate the implementation of a formative assessment program that aims to support students at risk for MD in the first two grades of elementary education.

1.3. The MMS Support Program—Overview of the Study

The MMS program is a state-driven mathematics support program in Germany that aims to support elementary school students at risk for MD. The program is based on formative assessment and takes into account the above-mentioned factors that previous research has identified as being promising for an effective implementation. MMS focuses on students at the beginning of their school career in Grades 1 and 2, where mathematics learning starts in a formal context (Germany does not have a mathematics curriculum for kindergarten and kindergarten teachers are usually not trained in mathematics education). The idea of MMS is to support teachers in their individual everyday mathematics teaching rather than to prescribe how to teach ready-made teaching material. This approach includes a sequence of assessments as well as systematic information to support teachers’ use of those assessments to improve student learning. In this manner, assessment material and guidance are provided to teachers aiming to diagnose and, if necessary, improve students’ understanding of the key arithmetic concepts and principles in Grades 1 and 2 that are central prerequisites for further mathematical development. The focus on arithmetic is motivated by the crucial role of arithmetic in students’ mathematics development (e.g., Jordan et al., 2003). Arithmetic makes up a large part of the German elementary school standards for mathematics, which is reflected in the mathematics curriculum in each federal stateFootnote2 (KMK, Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland [the Standing Conference of the Ministers of Education and Cultural Affairs of the Federal States in the Federal Republic of Germany], 2005).

1.3.1. Content of the MMS Support Program

The two-year intervention targets central arithmetic knowledge and skills. The intervention is structured into seven “milestones” of the arithmetic curriculum. A milestone encompasses 2–5 mathematical topics to be taught within a curriculum unit, and for each topic learning goals for the students are defined. Together, all seven milestones comprise 23 basic arithmetic topics from the Grade 1 and Grade 2 curriculum, for example, ordering and comparing (sets, lengths, numbers), counting (forward, backward, counting on, counting by 2s), number relations (successor, predecessor), part-whole relations, elementary strategies for addition and subtraction (counting on, doubling, halving, commutativity, bridging through 10), the place-value system, etc. These topics were chosen because of research findings on their importance for students’ arithmetic achievement. The milestones are arranged in accordance with the curriculum of Grade 1 and Grade 2. Materials were developed to enable teachers to identify students at risk for MD, to detect the specific difficulties of those students in reaching the learning goals, and to select appropriate teaching material and strategies to support those students in learning the specific arithmetic topics. The materials comprised a teacher booklet and a student booklet, structured in line with the arithmetic topics of the seven milestones. The MMS material was developed by expert teachers from the Institute for Quality Development in Schools in Schleswig-Holstein and reviewed by mathematics education researchers. The material was professionally designed after completion of a pilot phase in which small adjustments were made.

The focus on professional development was another pillar of the MMS intervention. When teachers use MMS for the first time, the implementation is accompanied by short teacher training sessions organized by the Institute for Quality Development in Schools in Schleswig-Holstein. The professional development focus has two goals. First, the teachers should receive information about the design, materials, and implementation of MMS and should receive guidance on how to integrate the program into their daily teaching practice. Second, the training sessions provide opportunities for the teachers to acquire relevant content knowledge, pedagogical content knowledge, and assessment knowledge (Timperley et al., 2007). This specific professional knowledge is relevant for the teachers in interpreting the diagnostic information obtained through the MMS assessment material and for connecting this information with effective teaching material and strategies to support students with specific difficulties. For the second goal, the material is organized in a flexible manner so that it can easily be adapted to the needs of the participating teachers (e.g., if specific gaps in teachers’ knowledge become apparent during a session). The framework of the training sessions was designed according to criteria for effective teacher professional development, as described in meta-studies (e.g., Timperley et al., 2007; Yoon et al., 2007). More specifically, this means that the sessions encompass training on five occasions (an initial session of seven hours and four sessions of four hours in the first school year) and are coherent and content-focused (each session is directly associated with the MMS content topics), interactive (teachers can reflect on their experiences between sessions), based on the teachers’ needs (learning opportunities are provided if gaps in professional knowledge become apparent), and practical (the teachers can directly integrate the presented material into their next lessons). In addition, the teachers are given an easily accessible teacher booklet with guidelines on how to use the MMS material, which provides relevant information for each content topic at a glance.

1.3.2. Intervention Sequence

The MMS materials for the different arithmetic topics were designed to complement the different kinds of existing curricular material in such a way that the MMS materials can fit into the mathematics teachers’ individual teaching frameworks and timelines during regular mathematics lessons. Hence, the idea of MMS is not to present new or innovative content to the teachers. Instead, the main feature of MMS is to combine well-proven assessments, formats, and learning material in such a way that it can easily be understood and used by teachers. Because the topics treated in MMS are the core ideas of the first- and second-grade mathematics curriculum, the assessments can easily be incorporated into the ongoing curriculum at appropriate points. MMS assists teachers in: (1) identifying students’ individual learning difficulties in mathematics at the earliest stage possible, and (2) offering suitable learning opportunities and support for students at risk for MD. The procedure for the MMS intervention is as follows:

  1. For each of the 2–5 topics belonging to a milestone, the teacher administers 3–4 short tasks in the form of a class screening (approx. 10 minutes) after a mathematical topic has been taught. The students work individually on the screening tasks in their student booklet (see Figure 1). The tasks cover the learning goals for the topic.Footnote3 The instructions are included in the teacher booklet and read aloud by the teacher. Furthermore, the teacher booklet contains short didactical information regarding the learning goals, tasks, and the domain. Based on the students’ results on these tasks, the teacher can identify students in the class who potentially did not reach the learning goals related to the assessed topic.

  2. The teacher conducts a short interview with the identified students, comprising 3–4 additional tasks, to gain insights into the mistakes made by the students and their possible misconceptions (see Figure 2). The interview can be done individually or in small groups. Precise instructions are presented in the teacher booklet and in the student booklet. Based on the interview, the teacher can determine where the students are in relation to the learning goal.Footnote4

  3. The teacher can choose suitable learning opportunities based on the information gained through the solutions reached or mistakes made by the students in the class tasks and through the students’ answers in the task-based interview. The teacher booklet contains a brief description of suggested learning activities and corresponding material that is directly linked to each interview task and the observations. These suggested learning activities are briefly explained with the help of didactical recommendations.Footnote5 The MMS program refers only to standard mathematics learning material available in every classroom in Germany.

  4. Steps 1–3 can be implemented in the mathematics classroom when addressing the corresponding mathematical topic so that teachers still have full flexibility. When all topics of a milestone are finished, MMS offers a short summative assessment (up to 10 items) for the milestone content as a test for the entire class. The test covers the learning goals from all mathematical topics in the respective milestone (2–5 topics) to evaluate to what extent the students reached those learning goals.

Figure 1. Example of two tasks from Milestone 2, Part-Whole Relationships. Instructions are read aloud by the teacher. (1. “You see 2 boxes. In the first box there are 5 dots in total. How many of them are hidden? In the second box there are 7 dots in total. How many of them are hidden?” 2. “Write down all ways in which the number 6 can be broken down into smaller numbers in the decomposition house. And now write down all ways in which the number 8 can be broken down into smaller numbers.”)

Figure 2. Example of an interview task from Milestone 2, Part-Whole Relationships. Instructions are given by the teacher (1. “How could you divide the apples among the plates? Is there another possibility?” 2. “Take a good look at the picture. What number is divided here? How many circles are lying left of the pencil?” The child is allowed to reenact the task with two-colour counters, a teaching material available in all German elementary schools).

The program started in the 2013/2014 school year in 100 elementary schools in the German federal state of Schleswig-Holstein. All schools received free material for each student. The program is financed by the Ministry of Education, Science and Cultural Affairs of the German federal state of Schleswig-Holstein.

1.4. The Present Study

The goal of this study was to investigate the effectiveness of the two-year MMS program after completion as well as the sustained effect one year after completion. The study was conducted with the support of IPN—Leibniz Institute for Science and Mathematics Education (Germany). The IPN researchers, however, were not involved in the implementation of the MMS program. We analyzed the effect of MMS on both students at risk for MD and the entire group of students, as the aim of MMS is to support students at risk for MD (see Section 1.3) and because formative assessment is known to also have a positive effect on average-achieving students (see Section 1.2).

The evaluation was conducted in the form of an effectiveness trial to take into account the naturally varying implementation of the real-world situation. Classroom-based learning is strongly contextualized and influential factors, which often cannot be controlled by the teachers and which prevent, restrict, support, or foster student learning, are likely to influence not only student achievement but also interventions designed to improve achievement (e.g., at the individual level: social contexts, community influences, norms, values, and cultural capital; at the school level: beliefs, commitments, education, experience, roles, professionalization, and autonomy of teachers; McDonald et al., 2006). Therefore, the goal of this effectiveness trial was to analyze whether the intervention was effective in the real world of typical school settings. We used a quasi-experimental design with two intervention groups and one comparison group. We implemented two intervention groups because time turned out to be an essential factor for teachers to be able to implement the formative assessment methods (see Section 1.2) and had not been an initial component of the MMS program; hence, one group of schools received the originally planned MMS intervention, and a second group of schools received the MMS intervention with two extra teacher working hours per week to support the implementation of the MMS program.

We analyzed the data of a three-year longitudinal study with 135 elementary school classes comprising information on teachers as well as information on students concerning basic cognitive abilities as well as basic numerical and language skills at the beginning of Grade 1 and mathematics skills at the end of Grades 2 and 3. The sample was selected in one federal state of Germany (Schleswig-Holstein). As a result, the prescribed mathematics curriculum in this sample is the same (i.e., the state-wide curriculum). Therefore, this sample allowed us to examine the effects of the formative assessment intervention (MMS) on students’ mathematics achievement with a longitudinal design and a sound sample size while controlling for relevant covariates. Accordingly, our study addressed the following research question:

Does the MMS intervention have an effect on students’ mathematics achievement at the end of Grade 2 as well as at the end of Grade 3?

We developed three hypotheses based on this research question:

  1. Given that the target group of the MMS program was low-performing students, our first hypothesis was that the MMS intervention would have an effect on the achievement of students at risk for MD.

  2. As research suggests that formative assessment raises not only the achievement of low-performing students, but also that of average-achieving and high-achieving students (e.g., Black & Wiliam, Citation1998; Decristan et al., Citation2015; Phelan et al., Citation2011), our second hypothesis was that the MMS intervention would also have an effect on the achievement of the entire group of students.

  3. Studies show that one barrier for realizing formative assessment strategies is the time and effort needed for preparation and application (Lee et al., Citation2012) and that teachers need time to do this extra work, ideally during the school day (Sharkey & Murnane, Citation2006). This is not addressed by the MMS program. Therefore, our third hypothesis was that the intervention would have a greater effect if the schools received additional teacher working hours per week.

2. Methods

2.1. Design and Participants

The study was based on a quasi-experimental design with two intervention groups and one comparison group. One intervention group (MMS+) consisted of 20 elementary schools that received the MMS material (student material, teacher booklet) for Grades 1 and 2 and the in-service teacher training sessions as well as two extra teacher working hours per week.Footnote6 The second intervention group (MMS) consisted of 10 schools and received only the MMS material and the in-service teacher training sessions. The comparison group consisted of 10 elementary schools. Teachers in the comparison group did not get any specific support and were asked to give their regular mathematics instruction. It should be mentioned that, due to a directive of the Ministry of Education, all teachers are required to provide specific learning opportunities for low-achieving students. Although there is no supervision of whether teachers follow this directive, we assumed that the teachers in the comparison group also used some kind of specific learning material. In total, the longitudinal sample consists of 2,330 students nested in 135 classes from 40 elementary schools (about 10% of the student cohort in this federal state).

The 30 schools of the two intervention groups were selected from the 100 schools in the federal state of Schleswig-Holstein that had volunteered for the first wave of the MMS implementation in 2013.Footnote7 The 30 schools were chosen so as to ensure a broad geographical distribution (e.g., urban and rural regions in different parts of the federal state) as well as an adequate distribution of socio-economic and cultural backgrounds. From the 30 schools, 20 schools were randomly chosen for the MMS+ group, and we checked that the two groups were comparable with respect to geographical distribution and socio-economic and cultural background. Finally, the 10 schools of the comparison group were chosen so that they were comparable to the intervention group schools in terms of geographical distribution and socio-economic and cultural background. This means that the sample of our study was not selected strictly randomly. However, since the material was offered for free, which was appealing for all schools, a positive selection was not necessarily the case. In addition, the intervention and comparison groups were comparable concerning the student composition with respect to basic cognitive abilities (see Section 3.1). As a result of the sampling, all schools were from the same federal state and therefore followed the same state-wide curriculum.

2.2. Measures

2.2.1. Mathematics Achievement

The assessment of students’ learning progress in arithmetic at the end of Grades 2 and 3 was based on federal state curriculum standards. The items for these tests were adapted from approved tests of the longitudinal “personality and learning development of elementary school children” study (Persönlichkeits- und Lernentwicklung von Grundschulkindern, PERLE; Lipowsky et al., Citation2011). The items cover arithmetic knowledge that should be acquired in the Grade 2 and Grade 3 curriculum. The test at the end of Grade 2 (59 items, 40 min) covered, for example, addition, subtraction, multiplication, division in the 0–1,000 number range, doubling, halving, and factors of a given number. The test at the end of Grade 3 (40 items, 40 min) covered, for example, addition, subtraction, multiplication, division in the 0–10,000 number range, and factors of a given number. The items were constructed and formatted so that they resembled typical items that students encounter in textbook examples. However, the format of these test items differed from that of the items in the MMS assessment material. A short constructed-response format was used (response in the form of a number). For the data analysis, some of the items were combined and, in those cases, partial credit coding was used. The Grade 2 and Grade 3 tests had 14 common items so that the same scale could be used for both tests (see Section 2.4). The reliability of the tests was very good (EAP/PV reliability = .90 to .94).

2.2.2. Control Measures

Since this study is a quasi-experiment, the intervention and control groups are not strictly comparable. To analyze the effect of the intervention on the arithmetic achievement, we control for several covariates that might have an impact on students’ arithmetic learning. As mentioned in Section 1.2, teachers’ professional knowledge might influence the quality of formative assessment. Because in the federal state of Schleswig-Holstein about 40% of the mathematics elementary school teachers do not have a formal qualification to teach mathematics (i.e., they are out-of-field teachers), we included the covariate “earned certificate to teach mathematics” (studied mathematics or not) as an indicator of teacher knowledge in our analyses. We also controlled for the covariate “mathematics textbook” as the mathematics textbooks used in this sample had shown an effect on student achievement in a previous study. There were seven textbook categories in our sample (four categories for the frequently used textbooks A–D, the category “other books” for textbooks that were used only sporadically in our sample, “no books” for classes that did not use a textbook, and “missing book” for missing information). Data on earned certificate to teach mathematics and mathematics textbook were collected using a teacher questionnaire.

To control for differences between individual students as well as between classes, students’ basic numerical and language skills were measured at the beginning of Grade 1 using approved standardized tests (basic numerical skills: Hamburger Rechentest, HaReT, [Lorenz, Citation2007], Cronbach’s α = .74; German language skills: Münsteraner Screening, MÜSC, [Mannhaupt, Citation2012], Cronbach’s α = .72). Students’ basic cognitive abilities were measured using the Culture Fair Intelligence Test (CFT 1–R, Weiß & Osterland, Citation2013, Cronbach’s α = .91).

To account for differences between schools, we control for the average of school-level achievement scores from the German national mathematics comparison test at the end of Grade 3 (Vergleichsarbeiten, VERA) from 2010 to 2014 in the multilevel regression model.

2.3. Procedure

Data collection started in the school year 2013/2014. At the beginning of Grade 1, basic numerical and language skills were assessed. This was followed by an assessment of basic cognitive abilities halfway through Grade 1. After completion of the MMS intervention at the end of Grade 2 and one year after completion at the end of Grade 3, the mathematics achievement was assessed. All tests were administered by trained test administrators in accordance with the respective manuals.

2.4. Data Analyses

The tests of Grade 2 and Grade 3 had 14 common items and 45 and 26 unique items, respectively. This anchor-item design of the mathematics tests allowed us to estimate the student mathematics scores in Grades 2 and 3 on the same scale using Item Response Theory (IRT). Since the test contained partial credit items, an extension of the Rasch model, the IRT partial credit model (PCM; Masters, Citation1982), was used. The student scores were estimated in a two-step procedure. First, the Grade 2 test was calibrated using the data of all students who completed the test to monitor the quality of the items and estimate the item parameters. Second, the same procedure was done for the Grade 3 test, but the item parameters of the 14 common items were fixed to the parameters of the Grade 2 scaling (fixed item parameter calibration, FIPC). In this manner the two tests are linked and the item and person parameters of the Grade 3 test can be interpreted on the Grade 2 scale.
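The category probabilities of the partial credit model referred to above can be illustrated with a minimal sketch. It follows the standard formulation in Masters (1982); the function name `pcm_probabilities` and its arguments are illustrative assumptions, and the sketch is not the calibration software used in the study.

```python
import math


def pcm_probabilities(theta, deltas):
    """Category probabilities for one partial credit item (hypothetical helper).

    theta  : person ability in logits
    deltas : step difficulties delta_1..delta_m in logits
    Returns a list of m + 1 probabilities for the scores 0..m.
    """
    # Cumulative sums of (theta - delta_k); the empty sum for score 0 is 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

For an item with a single step the expression reduces to the dichotomous Rasch model, which is why the PCM is described as an extension of it. In a fixed item parameter calibration, the `deltas` of the common items would be held at their Grade 2 values while the remaining parameters are estimated.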

Next, five plausible values (PV) for each of the 2,330 students were generated for Grade 2 and Grade 3 each, by fitting a latent regression model and using item parameters anchored at their estimated values from the previous calibration. A detailed review of the PV methodology is given in Mislevy (Citation1991). This method was chosen because it allows for unbiased population-level analyses of competence distributions and context variables (Davier et al., Citation2009).
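Analyses based on plausible values are usually run once per PV and then combined. The sketch below shows the common combination rules (Rubin's rules) for one regression coefficient. The text does not state how the five sets of results were pooled, so this is an assumption, and the function name and inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_plausible_values(estimates, standard_errors):
    """Combine one coefficient estimated separately on each plausible value.

    estimates       : one point estimate per plausible value
    standard_errors : the matching standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Within-PV variance: average of the squared standard errors.
    within = mean([se ** 2 for se in standard_errors])
    # Between-PV variance: sample variance of the point estimates.
    between = variance(estimates)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)
```

The between-PV term carries the measurement uncertainty of the latent achievement score into the standard error, which a single point estimate per student would understate.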

To analyze the effect on students at risk for MD, we first identified the low-achieving students in the sample. In line with the cited literature, students who scored in the lowest quartile of the basic numerical skill assessment at the beginning of Grade 1 were categorized as being at risk for MD. Second, we estimated multiple regression models to estimate the effects of the covariates and independent variables on the arithmetic achievement. Because of the nested structure of the data (students nested within classes), we used a sandwich estimator to adjust standard errors for non-independence (with “type = complex” and “cluster = group” in Mplus).Footnote8 To analyze the effect of the intervention on mathematics achievement, we tested four regression models. In the first model we included the basic cognitive abilities as well as basic numerical and language skills (measured by the HaReT, MÜSC, and CFT 1–R), because these variables are known to have an effect on mathematics achievement. In the second model, we included the mathematics textbook (the mathematics textbook used) and the earned certificate to teach mathematics (studied mathematics or not) as dummy-coded covariates, because according to previous studies we expect those variables to have an effect on students’ mathematics achievement (Mullis et al., Citation2012). In the third model, we included the mean of the school VERA scores from 2010 to 2014 to account for differences between schools. In the last model, we included the intervention group variables as dummy-coded variables. In this way, we tested whether the coefficients of the intervention groups have any significant predictive effect in addition to the coefficients of the basic cognitive abilities as well as basic numerical and language skills, mean of the school-level VERA scores, textbooks and earned certificate to teach mathematics. 
Or stated differently, we analyzed the effect of coefficients of the intervention groups while holding the covariates constant.Footnote9
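The at-risk classification described in this paragraph can be sketched as follows. The handling of ties and missing scores, the "at or below" rule, and the quantile interpolation method are assumptions made for illustration, and `flag_at_risk` is a hypothetical helper.

```python
from statistics import quantiles


def flag_at_risk(scores):
    """Flag students whose baseline score is at or below the 25th percentile.

    scores : baseline numerical-skill scores, with None for missing values
    Returns a list of booleans in the same order; missing scores are never
    flagged (an assumption, not the study's documented rule).
    """
    observed = [s for s in scores if s is not None]
    # quantiles(..., n=4) returns the three quartile cut points.
    cutoff = quantiles(observed, n=4, method="inclusive")[0]
    return [s is not None and s <= cutoff for s in scores]
```

The resulting flags define the subsample on which the four nested regression models are then estimated.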

To analyze the effects of the intervention on the entire class, the whole sample was analyzed with multilevel random intercept models to control for the nested structure of the data. First, models without predictors (null models) were run to estimate the partition of variance between and within classes. Second, models were estimated including individual characteristics (at individual level) and class composition (at class level). The basic cognitive abilities as well as basic numerical and language skills were included on the individual level to account for individual differences between students. For all three variables, the individual scores were aggregated to class means and included on the class level to control for class composition. In a third step, we added mathematics textbook and earned certificate to teach mathematics and in a fourth step the mean of the school-level VERA scores to the model. In a final model, we included the intervention group variables on the class level.

To test for significant differences between the regression coefficients of the two intervention groups, we used the Wald chi-square test.Footnote10
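A Wald test of the equality of two coefficients can be sketched in a few lines. The formula is the standard one for a single linear constraint. The function names are hypothetical, and the covariance between the two estimates, which the software takes from the estimated parameter covariance matrix, is set to zero by default here purely for illustration.

```python
import math


def chi2_sf_df1(w):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


def wald_equal_coefficients(b1, b2, se1, se2, cov12=0.0):
    """Wald chi-square test of H0: b1 == b2 (1 degree of freedom).

    b1, b2   : the two regression coefficients
    se1, se2 : their standard errors
    cov12    : covariance of the two estimates (assumed 0 if unknown)
    Returns (Wald statistic, p-value).
    """
    diff_var = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    w = (b1 - b2) ** 2 / diff_var
    return w, chi2_sf_df1(w)
```

As a plausibility check, `chi2_sf_df1` applied to the statistics reported in Section 3 (0.41, 1.27, and 2.74) gives p-values that round to the reported .52, .26, and .10.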

Scores for the basic cognitive abilities and basic numerical and language skills as well as the mathematics achievement scores were standardized so that the corresponding β-coefficients can be interpreted as effect sizes similar to Cohen’s d (see Tymms, Citation2004). For missing data on the independent variables for the basic cognitive abilities and basic numerical and language skills, we applied a full-information-maximum-likelihood (FIML) approach in Mplus 7.0 (Muthén and Muthén Citation1998–2012). FIML combines missing data and parameter estimation in a single step and uses all the available information (Enders, Citation2010). The percentage of missing data in the basic cognitive abilities and basic numerical and language skills was 6.1% for the CFT 1–R, 8.1% for the HaReT, and 8.6% for the MÜSC. These tests were a nonobligatory part of the regular mathematics lessons and we did not detect specific patterns in the missing data. Due to sample selection, there were no missing data for the predictor support program and the school-level mathematics achievement scores (VERA).

3. Results

First, we present a short summary of the descriptive information obtained about the earned certificate to teach mathematics, the mathematics textbooks, and the students’ basic cognitive abilities as well as basic numerical and language skills.

3.1. Descriptive Results

About 39% of the teachers in our sample had a formal qualification to teach mathematics, whereas 51% of the teachers were qualified for other subjects and taught mathematics as out-of-field teachers. The remaining 10% of the teachers did not provide information about their qualification.

Of the 2,330 students, 562 were identified as being students at risk for MD. The descriptive results for the cognitive abilities as well as basic numerical and language skills and the dependent variables (mathematics achievement) in the two intervention groups and the comparison group are reported in Table 1 (entire sample) and Table 2 (students at risk for MD). To analyze baseline equivalence, we tested whether the intervention groups differed from the comparison group in the basic cognitive abilities and numerical and language skills. For the entire sample, we found only small differences (Cohen’s d between −0.06 and 0.15). The differences for the sample of students at risk were somewhat larger. For the basic numerical and language skills, Cohen’s d was between −0.19 and 0.19, and for the basic cognitive abilities the comparison group showed higher scores than the two intervention groups (Cohen’s d: 0.28 and 0.33). All analyses presented below are adjusted for these measures. We also tested whether earned certificate to teach mathematics was independent of the group. The number of teachers having a formal qualification to teach mathematics did not differ significantly between the three groups (χ²[2, N = 135] = 6.65, p = .16).
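The baseline comparisons above are expressed as Cohen's d. A minimal sketch of the usual pooled-standard-deviation version follows. Which variant of d was computed for the baseline checks is not stated, so the pooled form and the function name `cohens_d` are assumptions.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

With this sign convention, a positive value means that the first group scored higher on the baseline measure.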

Table 1. Mean values and standard deviations for individual basic cognitive abilities as well as basic numerical and language skills and mathematics achievement in Grades 2 and 3 (dependent variables) in the intervention groups for the whole sample.

Table 2. Mean values and standard deviations for individual basic cognitive abilities as well as basic numerical and language skills and mathematics achievement in Grades 2 and 3 (dependent variables) in the intervention groups for students at risk for MD.

Null models were run in the multilevel analyses for the entire sample to determine the share of between-class variance in students’ mathematics achievement (given by the intraclass correlation coefficient, ICC). At the end of Grade 2, 19.2% of the variance was between classes; at the end of Grade 3, the between-class variance was 17.1%. Hence, mathematics achievement differed depending on the class attended by the student.
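For reference, the null model and the ICC derived from it can be written in the standard two-level notation. The symbols below are conventional choices and are not taken from the article: student i in class j, class-level variance τ₀₀, and student-level variance σ².

```latex
y_{ij} = \gamma_{00} + u_{0j} + e_{ij}, \qquad
u_{0j} \sim N(0, \tau_{00}), \quad e_{ij} \sim N(0, \sigma^{2})

\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
```

Under this definition, the reported value of 19.2% at the end of Grade 2 corresponds to an ICC of .192.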

3.2. Results for Students at Risk for MD

Regarding the individual characteristics at school entrance (see Model 1 in Tables 3 and 4), basic cognitive abilities and basic numerical skills showed the expected significant association with mathematics achievement at the end of Grades 2 and 3. Controlling for those individual characteristics at school entrance, the mean of the school-level VERA scores showed no significant association with mathematics achievement at the end of Grades 2 and 3. Three textbooks showed a significant association with mathematics achievement at the end of Grade 2 (β = .37 to .47) and two textbooks showed a significant association with mathematics achievement at the end of Grade 3 (β = .32 to .48). In order to avoid putting too much information into the tables, these covariates are not displayed in Tables 3 and 4. These outcomes are in line with previous research on textbooks.

Table 3. Regression for individual and classroom covariates and intervention groups on arithmetic achievement for students at risk for MD at the end of Grade 2.

Table 4. Regression for individual and classroom covariates and intervention groups on arithmetic achievement for students at risk for MD at the end of Grade 3.

When including the dummy variables for the intervention groups in the regression models, the explained variance increased slightly for Grades 2 and 3 (Grade 2: ΔR² = 1.6%, Grade 3: ΔR² = 1%; Model 4 in Tables 3 and 4; the comparison group is the reference category). The regression coefficients of the two intervention groups in Grade 2 were of considerable size and indicated a significant average difference in standardized student achievement between the respective intervention group and the comparison group (Model 4 in Table 3: MMS: β = .35 and MMS+: β = .27). Therefore, we can conclude that, compared to the comparison group, the intervention had a positive effect on the mathematics achievement of students at risk for MD after a two-year implementation period. There was no significant difference between the effects of the two intervention groups, which means that the additional teacher working hours per week did not have an effect, χ²(1, N = 2,330) = 0.41, p = 0.52. Regarding the end of Grade 3, only the MMS group showed a significant effect on the achievement of students at risk for MD (Model 4 in Table 4: β = .29) whereas the effect of MMS+ (β = .12) was not significant. Hence, one year after completion of the intervention, after teaching without intervention material, the MMS intervention still affected the achievement of students at risk for MD. The finding that the MMS+ intervention with additional teacher working hours did not have a significant effect one year after the intervention ended is rather interesting and is discussed in Section 4.1.

3.3. Results for the Entire Class

Similar to the results found for students at risk for MD, cognitive abilities and basic numerical skills had a significant positive association with the mathematics achievement of the entire class at the end of Grades 2 and 3, whereas language skills had a significant positive but small association with mathematics achievement at the end of Grade 2 and did not have a significant association at the end of Grade 3 (see Model 1 in Tables 5 and 6).

Table 5. Multilevel regression for individual and classroom covariates and intervention groups on students’ arithmetic achievement at the end of Grade 2.

Table 6. Multilevel regression for individual and classroom covariates and intervention groups on students’ arithmetic achievement at the end of Grade 3.

The association between class composition regarding the basic numerical and language skills and the mathematics achievement at the end of Grades 2 and 3 was not significant, whereas the composition regarding the basic cognitive abilities yielded substantial regression coefficients and had a significant association with the mathematics achievement at the end of Grades 2 and 3. Controlling for those class-level characteristics at school entrance, the mean of the school-level VERA mathematics scores showed no significant association with mathematics achievement at the end of Grades 2 and 3.

When including the dummy variables for the intervention groups in the multilevel models (Model 4 in Tables 5 and 6), the explained variance increased noticeably (Grade 2: ΔR² = 9.9%, Grade 3: ΔR² = 8.4%). At the end of Grade 2, the regression coefficients of the two intervention groups reached the level of significance (MMS+: β = .21 and MMS: β = .30) and showed a considerable difference in average student achievement relative to the comparison group. The intervention groups did not differ significantly, χ²(1, N = 2,330) = 1.27, p = 0.26. We can conclude that, after a two-year intervention period, the MMS program had a positive effect on student achievement for the entire group of students relative to the comparison group. At the end of Grade 3, one year after completion of the intervention, the MMS intervention still had a significant effect on student achievement (MMS+: β = .16 and MMS: β = .29). Again, the intervention groups did not differ significantly, χ²(1, N = 2,330) = 2.74, p = 0.10.

4. Discussion

4.1. Summary and Implications for Educational Research

Supporting students with difficulties in learning mathematics is challenging for teachers. Since previous research has shown that early mathematics skills are related to later mathematics development (e.g., Duncan et al., Citation2007, kindergarten through third grade; Morgan et al., Citation2009, kindergarten through fifth grade), low-achieving students should be supported at the earliest stage possible in their school careers. Although there is evidence that formative assessment approaches can play a successful role in supporting at-risk students, there is a need for ecologically valid intervention studies in which formative assessment techniques are implemented in everyday school practice and are evaluated scientifically (Souvignier & Hasselhorn, Citation2018). The goal is to provide schools with effective methods for supporting low-achieving students that can be used from the time students enter school and that are easy to implement in the regular classroom without radical changes having to be made in teachers’ individual teaching style. Using an effectiveness trial with a quasi-experimental design and a sound sample size, our study addresses this need and contributes to the existing research by analyzing the effect of the MMS program. This program aims to support students at risk for MD in their first two years in elementary school and is based on a formative assessment approach. 
It incorporates characteristics considered effective in the literature, for example, the frequent use of assessment tasks, the use of existing assessments that are easy to administer, a clear alignment with the learning goals of the prescribed educational standards and curriculum being taught, assessments that are user-friendly and not time-consuming, and guidance for teachers to interpret assessment data and to select appropriate teaching material (Frohbieter et al., Citation2011; Herman et al., Citation2006; Karuza, Citation2004; Lembke & Stecker, Citation2007; Nicol & Macfarlane-Dick, Citation2006; Sharkey & Murnane, Citation2006). By analyzing a large longitudinal dataset, we were able to show that students at risk for MD in the intervention groups on average reached higher mathematics achievement than students at risk for MD in the comparison group. Therefore, we can conclude that students at risk for MD benefited from the two-year MMS implementation at the end of Grade 2 and that this effect was still maintained one year after the intervention and without providing specific Grade 3 formative assessment material. It turned out that not only the mathematics achievement of students at risk for MD was fostered but also the achievement of the whole group of students. Hence, the formative assessment elements the teachers used in the mathematics classrooms for students at risk for MD were also beneficial for the other students. This outcome is in line with previous research (Black & Wiliam, Citation1998; Decristan et al., Citation2015). Yet in the light of previous follow-up studies showing that intervention effects often fade out after the completion of the intervention (Bailey et al., Citation2018; Li et al., Citation2020), our results stand out. 
One reasonable explanation could be that the teachers may have learned effective methods with which they can carry out formative assessments in their ongoing classroom practice and they were thus able to maintain some of the practices even after completion of the intervention.

A second version of the MMS program (MMS+) was implemented because previous research has indicated that the time available to teachers during the school day is an important factor for a successful implementation of formative assessment programs (Lee et al., Citation2012; Sharkey & Murnane, Citation2006). MMS+ was enriched by two additional teacher working hours per week and per school. Interestingly, these extra teacher working hours did not have any added positive effect on students at risk for MD in the follow-up test (i.e., at the end of Grade 3). More specifically, the MMS+ group’s positive effect on students’ achievement in the whole class at the end of Grade 2 (i.e., for the intervention period) and Grade 3 was similar to the positive effect of the MMS group (without extra teacher working hours). For students at risk for MD, the MMS group showed a significant effect at the end of both grades (i.e., Grades 2 and 3), whereas the effect of the MMS+ group reached significance at the end of Grade 2 and was not significant at the end of Grade 3. A possible explanation for these unexpected findings on the effectiveness of the MMS and MMS+ groups is that, in the MMS+ group, schools used the additional teacher working hours in different ways, for example to do team-teaching (some mathematics lessons were taught by two teachers) or to separate students after the diagnosis (students who needed extra support were taught by another teacher in a different room or even got extra lessons). In such cases the regular mathematics teacher is teaching as usual and identifies students who experience difficulties. Those students then get extra support with the MMS material from the team-teacher or co-teacher. Hence, the availability of additional teacher working hours led to the involvement of a second teacher for parts of the lessons, which possibly resulted in an outsourcing of the formative assessment elements from the regular mathematics teacher to another teacher. 
Students benefited from this formative assessment treatment similarly to the students in the MMS group without extra teacher working hours. However, with the additional teacher working hours falling away in Grade 3, it is possible that the mathematics teachers of the MMS+ group were not able to continue the formative assessment practices because they had not learned how to implement these methods within their regular teaching hours. In the German education system, frequent teacher changes are avoided in elementary schools. Therefore, it is common for one mathematics teacher to stay with one class for several years (we unfortunately have no data on whether the mathematics teacher changed during our study). This would be in line with research findings that teachers need to implement the formative assessment over a long period of time to learn those methods and how to use them in their classroom practice (e.g., Herman et al., Citation2006; Karuza, Citation2004; Sharkey & Murnane, Citation2006; Vendlinski et al., Citation2009) and that intervention effects often fade out after the intervention is completed (Bailey et al., Citation2018; Li et al., Citation2020). The result that the effect of the MMS+ intervention vanished one year after the intervention was completed highlights the importance of ongoing progress monitoring, especially for supporting students at risk for MD (e.g., Fuchs et al., Citation2008).

In summary, our findings on the positive effect of the MMS implementation are in line with previous formative assessment research, which has shown a positive effect of formative assessment on low-achieving students as well as on average-achieving students (e.g., Black & Wiliam, Citation1998; Decristan et al., Citation2015; Phelan et al., Citation2011). With the results obtained from our effectiveness trial, our study contributes to the existing research by answering the call to test the implementation of formative assessments in everyday school practice and evaluating this implementation scientifically. Our study suggests that the positive effects of formative assessment remain after the intervention if the mathematics teachers themselves learned how to implement the formative assessments during a longer period (here: the 2 years of the intervention program). In order for teachers to achieve this, it is important that the formative assessment program be easy to implement and not require major changes in the individual teaching style, the teaching content, or the methods.

4.2. Limitations

There are several limitations to this study. This study was designed as an effectiveness trial and therefore did not analyze whether the separate components of the program were in fact effective under laboratory conditions. We only analyzed the program as a whole in everyday classroom practice and we do not know whether, and if so how much, the teachers departed from the MMS framework when teaching. Therefore, we cannot draw conclusions about which aspects of the program are effective or which are irrelevant. As noted in Section 2.1, all teachers are required by the Ministry of Education to provide specific learning opportunities to low-achieving students. Because of the study design, we do not know which procedures the teachers in the comparison group used. We can only conclude that the intervention groups using MMS on average reached a higher mathematics achievement than the comparison group, which used a variety of procedures.

Furthermore, we assume that, through the MMS program, the quality of teaching increases (in addition to the formative assessment elements implemented) because the MMS program provides teachers with opportunities to improve their professional knowledge. We therefore would assume that the effect of the program on student achievement—at least to some degree—is mediated by the quality of teaching. Due to time and practical restrictions, it was not possible to assess the quality of teaching in this study. Therefore, we cannot make any statements about whether teachers’ professional knowledge possibly moderates the MMS effects. But it could be worthwhile for future research to analyze how teachers implement the program and the extent to which it enhances the quality of teaching.

Another limitation of this study is that we were not able to model the school level in the multilevel analyses because there were not enough groups per school. We also did not have any additional information about the achievement of previous cohorts, which would have enabled us to control for school effects. Furthermore, the schools in the MMS intervention groups were chosen from a pool of schools that volunteered to implement MMS, and the possibility of selection bias can thus not be precluded (Shadish et al., Citation2002). It is possible that mainly schools with certain characteristics or schools that were more eager to improve applied for the MMS program. To account for differences between schools, we included the mean of the school-level mathematics achievement scores (VERA) from 2010 to 2014 in the regression models. We argue that schools that are more eager to improve or have a more advantaged catchment area are more likely to have higher school averages on the VERA test. We did not find significant effects of the VERA scores. This means that the schools’ average performance on the national VERA test over the five years before the intervention has no added explanatory value for arithmetic achievement. Because the material was offered for free and about one quarter of all elementary schools in the state volunteered for the intervention, we presume that the intervention was appealing to all kinds of schools. Furthermore, the comparison group was chosen in such a way that at least the socioeconomic and cultural background was distributed evenly across the intervention and comparison groups. Our study should therefore be considered as providing first evidence of the effectiveness of the MMS program, which can be further strengthened by conducting replications and varying implementation conditions.
The question of long-term effects, that is, whether the MMS program still shows effects at the end of Grade 4, which is the end of elementary school in Germany, is especially relevant for future research. It would be most interesting to see whether MMS students at risk for MD reach at least the minimum standards at the end of Grade 4 and whether more MMS students transfer to the academic school track than their peers from comparison classes.

One further limitation of our study is the identification of students at risk based on the 25th percentile of the sample regarding the basic mathematical skills. This criterion depends heavily on the sample used and may not accurately identify those students most at risk. The identification of students at risk for MD at one point in time can be the result of natural fluctuations in the growth of mathematical knowledge or can be due to measurement error. However, as studies have shown a high persistence of MD and a strong relation between early mathematics achievement and later mathematics learning (Duncan et al., Citation2007; Jordan et al., Citation2003; Morgan et al., Citation2009), we do not expect there to be great variation in the identification of students at risk for MD. Another vulnerability of this classification stems from the specific situation in Germany. Kindergarten students in Germany do not have any structured exposure to mathematical learning opportunities. Therefore, one could argue that, on the one hand, differences in the basic mathematical skills are based on developmental differences. On the other hand, children in Germany do encounter mathematical learning opportunities in kindergarten (Krajewski & Schneider, Citation2009). Furthermore, Krajewski and Schneider (Citation2006) were able to show that, while controlling for basic cognitive abilities and basic literacy skills, the basic mathematical skills of students in kindergartens in Germany still correlated rather highly with the mathematics achievement at the end of Grade 4. Therefore, we assume that in Germany, too, students with low basic mathematical skills at the beginning of Grade 1 are likely to have disadvantages regarding their mathematical development in comparison to their peers, based on more than just developmental differences.
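The sample dependence of a relative cutoff can be illustrated with a minimal sketch. The scores, the function name `flag_at_risk`, and the "at or below the cutoff" rule are assumptions made only for illustration; they are not the study's data or its exact classification procedure.

```python
from statistics import quantiles


def flag_at_risk(scores, percentile=25):
    """Return (cutoff, flags); flags[i] is True if scores[i] is at or below the cutoff.

    The cutoff is the given percentile of the sample itself, so it moves
    whenever the sample changes (a relative, not an absolute, criterion).
    """
    # quantiles(..., n=100) returns the 99 cut points between the percentiles;
    # index percentile - 1 is therefore the requested percentile.
    cutoff = quantiles(scores, n=100, method="inclusive")[percentile - 1]
    return cutoff, [s <= cutoff for s in scores]


# Hypothetical basic-skills scores (illustrative values only).
sample_a = [4, 6, 7, 9, 10, 12, 13, 15]
sample_b = [s + 3 for s in sample_a]  # same spread, higher-achieving sample

cutoff_a, flags_a = flag_at_risk(sample_a)
cutoff_b, flags_b = flag_at_risk(sample_b)

# A student with the same raw score can be classified differently
# depending on which sample the percentile is computed from.
same_score = 9
at_risk_in_a = same_score <= cutoff_a
at_risk_in_b = same_score <= cutoff_b
```

In this toy example, a student with a raw score of 9 falls above the cutoff in the first sample but below it in the second, although the student's skills are identical in both cases.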

4.3. Educational Implications

The evaluation showed that the implementation of an early formative assessment program had a positive effect on the achievement of students at risk for MD as well as on the entire class. The results of this study emphasize the potential of the formative assessment method. Furthermore, the MMS program answers the call to provide schools with methods, material, and professional teacher development to support students at risk for MD already at an early stage in their school career. The program could be an example of how to successfully incorporate the potential effective factors for implementing formative assessments discussed in previous studies. Moreover, assessment programs such as MMS could be an instrument to enhance educational effectiveness without the need for additional teacher working hours.

The MMS program takes into account the relevant factors required to successfully implement the formative assessment method in everyday school practice. The program is relatively easy to implement because there are no changes in the curriculum, no new content to teach, no new methods to use, and no need for a radical change in teachers’ individual teaching style. The comprehensive program incorporates professional development, assessment tasks, and specific feedback in the form of a targeted intervention for the milestones of arithmetic development. Additional teacher working hours are not necessary for the successful implementation of the program. This result is of major interest from an economic perspective, because the continuous supply of additional teacher working hours would be the most expensive element of the program.

4.4. Conclusion

This study contributes to research on the effectiveness of early formative assessment on mathematics achievement and answers the call to provide schools with methods and material to close the achievement gap as early as possible. The longitudinal data indicate the stability of MMS effects from the end of Grade 2 up to the end of Grade 3. The findings imply that formative assessment, as implemented in the MMS program, can foster the development of arithmetic achievement for students at risk for MD as well as for the entire class, without changing the curriculum, and without requiring teachers to learn new methods or teach new content.

Notes

1 As mentioned before, the developmental trajectory of children at this age varies considerably. Thus, it must be noted that the identification of students at risk for MD at one point in time by a relative performance criterion has its weaknesses. We discuss this problem in the limitations section.

2 In Germany the national educational standards are a framework for the whole country. Each federal state has a state-specific curriculum that is aligned to these standards and all schools in a federal state have to follow the state-specific curriculum. In this manuscript we use the term “curriculum” to refer to this level of obligatory learning content and objectives, which is the most concrete and binding.

3 Example of learning goals from Milestone 2, Part-Whole Relationships: Students have developed a basic idea of the decomposition of numbers. They are able to decompose numbers with concrete material, they are able to decompose numbers mentally by imagining concrete material, they achieve automatization of decomposition.

4 Possible observations for the interview task regarding the topic Part-Whole Relationships: (1) Decompositions based on material can be made; (2) Decompositions can be made (without material); (3) All decompositions of a given number can be found; (4) Decompositions can be identified without counting; (5) Decompositions can be made with high speed.

5 For example, for observation (1) from Footnote 4, learning activities regarding decompositions based on material are suggested, for observation (4), learning tasks regarding the automatization of the decomposition are suggested.

6 In these schools, the teacher participating in the MMS intervention was relieved from other responsibilities for these two hours.

7 The 100 schools account for a quarter of all elementary schools in this state.

8 Because the Level 2 cluster sizes in this subsample were too small, we used linear regression models with the “type = complex” procedure instead of multilevel models to correct for standard errors.

9 As suggested in the review process, we conducted alternative regression analyses by including the predictors in a different order (first the intervention groups, then the individual school entry measures, the textbooks and teacher variable, and finally the school-level achievement scores). The findings are presented in the Appendix.

10 Based on a suggestion in the review process, we also conducted alternative multi-level analyses for the whole sample. Again, the findings are presented in the Appendix.

References

  • Andersson, C., & Palm, T. (2017). The impact of formative assessment on student achievement: A study of the effects of changes to classroom practice after a comprehensive professional development programme. Learning and Instruction, 49, 92–102. https://doi.org/10.1016/j.learninstruc.2016.12.006
  • Aunola, K., Leskinen, E., Lerkkanen, M.-K., & Nurmi, J.-E. (2004). Developmental dynamics of math performance from preschool to Grade 2. Journal of Educational Psychology, 96(4), 699–713. https://doi.org/10.1037/0022-0663.96.4.699
  • Baker, S., Gersten, R., & Lee, D.-S. (2002). A synthesis of empirical research on teaching mathematics to low-achieving students. The Elementary School Journal, 103(1), 51–73. https://doi.org/10.1086/499715
  • Bailey, D. H., Duncan, G. J., Watts, T., Clements, D., & Sarama, J. (2018). Risky business: Correlation and causation in longitudinal studies of skill development. The American Psychologist, 73(1), 81–94. https://doi.org/10.1037/amp0000146
  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
  • Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5
  • Bohrnstedt, G., Kitmitto, S., Ogut, B., Sherman, D., & Chan, D. (2015). School composition and the Black–White achievement gap (NCES 2015-018). National Center for Education Statistics.
  • Bottge, B. A., Ma, X., Gassaway, L. J., Jones, M., & Gravil, M. (2021). Effects of formative assessment strategies on the fractions computation skills of students with disabilities. Remedial and Special Education, 42(5), 279–289. https://doi.org/10.1177/0741932520942954
  • Bryant, D. P., Bryant, B. R., Roberts, G., Vaughn, S., Pfannenstiel, K. H., Porterfield, J., & Gersten, R. (2011). Early numeracy intervention program for first-grade students with mathematics difficulties. Exceptional Children, 78(1), 7–23. https://doi.org/10.1177/001440291107800101
  • Burchinal, M., McCartney, K., Steinberg, L., Crosnoe, R., Friedman, S. L., McLoyd, V., & Pianta, R., NICHD Early Child Care Research Network (2011). Examining the Black-White achievement gap among low-income children using the NICHD study of early child care and youth development. Child Development, 82(5), 1404–1420. https://doi.org/10.1111/j.1467-8624.2011.01620.x
  • Cameron, C. E., Grimm, K. J., Steele, J. S., Castro-Schilo, L., & Grissmer, D. W. (2015). Nonlinear Gompertz curve models of achievement gaps in mathematics and reading. Journal of Educational Psychology, 107(3), 789–804. https://doi.org/10.1037/edu0000009
  • Carpenter, T. P., Fennema, E., Franke, M. L., Levi, L., & Empson, S. B. (2015). Children's mathematics: Cognitively guided instruction. Heinemann.
  • Clements, D. H., & Sarama, J. (2007). Effects of a preschool mathematics curriculum: Summative research on the Building Blocks project. Journal for Research in Mathematics Education, 38, 136–163.
  • Clements, D. H., & Sarama, J. (2011). Early childhood mathematics intervention. Science, 333(6045), 968–970. https://doi.org/10.1126/science.1204537
  • Clements, D. H., Sarama, J., Wolfe, C. B., & Spitler, M. E. (2013). Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies: Persistence of effects in the third year. American Educational Research Journal, 50(4), 812–850. https://doi.org/10.3102/0002831212469270
  • Davier, M. v., Gonzales, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series, 2009(2), 9–36.
  • Decristan, J., Hondrich, A. L., Büttner, G., Hertel, S., Klieme, E., Kunter, M., Lühken, A., Adl-Amini, K., Djakovic, S.-K., Mannel, S., Naumann, A., & Hardy, I. (2015). Impact of additional guidance in science education on primary students’ conceptual understanding. The Journal of Educational Research, 108(5), 358–370. https://doi.org/10.1080/00220671.2014.899957
  • Dumas, D., McNeish, D., Sarama, J., & Clements, D. (2019). Preschool mathematics intervention can significantly improve student learning trajectories through elementary school. AERA Open, 5(4), 233285841987944. https://doi.org/10.1177/2332858419879446
  • Duncan, G. J., Dowsett, C. J., Claessens, A., Magnuson, K., Huston, A. C., Klebanov, P., Pagani, L. S., Feinstein, L., Engel, M., Brooks-Gunn, J., Sexton, H., Duckworth, K., & Japel, C. (2007). School readiness and later achievement. Developmental Psychology, 43(6), 1428–1446. https://doi.org/10.1037/0012-1649.43.6.1428
  • Dunn, K. E., & Mulvenon, S. W. (2009). A critical review of research on formative assessment. The limited scientific evidence of the impact of formative assessment in education. Practical Assessment, Research & Evaluation, 14, 1–11.
  • Enders, C. K. (2010). Applied missing data analysis. Guilford Press (Methodology in the Social Sciences).
  • Frohbieter, G., Greenwald, E., Stecher, B., & Schwartz, H. (2011). Knowing and doing: What teachers learn from formative assessment and how they use the information. (CRESST Report 802). CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
  • Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D., & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty. Journal of Educational Psychology, 97(3), 493–513. https://doi.org/10.1037/0022-0663.97.3.493
  • Fuchs, L. S., Fuchs, D., Powell, S. R., Seethaler, P. M., Cirino, P. T., & Fletcher, J. M. (2008). Intensive intervention for students with mathematics disabilities: Seven principles of effective practice. Learning Disability Quarterly, 31(2), 79–92. https://doi.org/10.2307/20528819
  • Förster, N., & Souvignier, E. (2014). Learning progress assessment and goal setting: Effects on reading achievement, reading motivation and reading self-concept. Learning and Instruction, 32, 91–100. https://doi.org/10.1016/j.learninstruc.2014.02.002
  • Gersten, R., Rolfhus, E., Clarke, B., Decker, L. E., Wilkins, C., & Dimino, J. (2015). Intervention for first graders with limited number knowledge. American Educational Research Journal, 52(3), 516–546. https://doi.org/10.3102/0002831214565787
  • Hanich, L. B., Jordan, N. C., Kaplan, D., & Dick, J. (2001). Performance across different areas of mathematical cognition in children with learning difficulties. Journal of Educational Psychology, 93(3), 615–626. https://doi.org/10.1037/0022-0663.93.3.615
  • Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support school inquiry and continuous improvement: Final report to the Stuart Foundation (CSE Report 535). University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
  • Herman, J., Osmundson, E., Ayala, C., Schneider, S., Timms, M, & Center for Assessment and Evaluation of Student Learning. (2006). The nature and impact of Teachers’ formative assessment practices. (CSE Technical Report 703).
  • Hoff, E., & Tian, C. (2005). Socioeconomic status and cultural influences on language. Journal of Communication Disorders, 38(4), 271–278. https://doi.org/10.1016/j.jcomdis.2005.02.003
  • Hondrich, A. L., Hertel, S., Adl-Amini, K., & Klieme, E. (2016). Implementing curriculum-embedded formative assessment in primary school science classrooms. Assessment in Education, 23(3), 353–376. https://doi.org/10.1080/0969594X.2015.1049113
  • Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003). Arithmetic fact mastery in young children. A longitudinal investigation. Journal of Experimental Child Psychology, 85(2), 103–119.
  • Karuza, H. (2004). A Math Intervention Model for Middle School. How the Combination of Formative Assessment, Feedback, Academic Vocabulary, and Word Problems Affect Student Achievement in Mathematics [Dissertation]. Claremont Graduate University.
  • King, D. (2016). Do formative assessment strategies help learners with academic difficulties? [Dissertation]. 1822. https://rdw.rowan.edu/etd/1822
  • Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37. https://doi.org/10.1111/j.1745-3992.2011.00220.x
  • Kingston, N., & Nash, B. (2015a). Erratum. Educational Measurement: Issues and Practice, 34(2S), 55. https://doi.org/10.1111/emip.12075
  • Kingston, N., & Nash, B. (2015b). Erratum. Educational Measurement: Issues and Practice, 34(3S), 49. https://doi.org/10.1111/emip.12083
  • Klein, A., Starkey, P., Clements, D., Sarama, J., & Iyer, R. (2008). Effects of a Pre-Kindergarten mathematics intervention: A randomized experiment. Journal of Research on Educational Effectiveness, 1(3), 155–178. https://doi.org/10.1080/19345740802114533
  • Klibanoff, R. S., Levine, S. C., Huttenlocher, J., Vasilyeva, M., & Hedges, L. V. (2006). Preschool children’s mathematical knowledge: The effect of teacher “math talk”. Developmental Psychology, 42(1), 59–69. https://doi.org/10.1037/0012-1649.42.1.59
  • KMK – Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland. (2005). Bildungsstandards im Fach Mathematik für den Primarbereich. Beschluss vom 15.10.2004.
  • Krajewski, K., & Schneider, W. (2006). Mathematische Vorläuferfertigkeiten im Vorschulalter und ihre Vorhersagekraft für die Mathematikleistungen bis zum Ende der Grundschulzeit. Psychologie in Erziehung Und Unterricht, 53, 246–262.
  • Krajewski, K., & Schneider, W. (2009). Exploring the impact of phonological awareness, visual-spatial working memory, and preschool quantity-number competencies on mathematics achievement in elementary school: Findings from a 3-year longitudinal study. Journal of Experimental Child Psychology, 103(4), 516–531. https://doi.org/10.1016/j.jecp.2009.03.009
  • Leak, J., Duncan, G. J., Li, W., Magnuson, K., Schindler, H., & Yoshikawa, H. (2012). Is timing everything? How early childhood education program cognitive and achievement impacts vary by starting age, program duration and time since the end of the program. Irvine.
  • Lee, J. (2002). Racial and ethnic achievement gap trends: reversing the progress toward equity? Educational Researcher, 31(1), 3–12. https://doi.org/10.3102/0013189X031001003
  • Li, W., Duncan, G. J., Magnuson, K., Schindler, H. S., Yoshikawa, H., & Leak, J. (2020). Timing in early childhood education: How cognitive and achievement program impacts vary by starting age, program duration, and time since the end of the program (EdWorkingPaper, 20-201). https://doi.org/10.26300/5tvg-nt21
  • Lipowsky, F., Faust, G., & Karst, K. (Eds.). (2011). Dokumentation der Erhebungsinstrumente des Projekts “Persönlichkeits- und Lernentwicklung von Grundschulkindern” (PERLE) – Teil 2. Materialien zur Bildungsforschung [Documentation of the instruments from the project “personality and learning development of elementary school children” (PERLE) – Part 2. materials for educational research]. GFPF.
  • Lee, H., Feldman, A., & Beatty, I. D. (2012). Factors that affect science and mathematics teachers’ initial implementation of technology-enhanced formative assessment using a classroom response system. Journal of Science Education and Technology, 21(5), 523–539. https://doi.org/10.1007/s10956-011-9344-x
  • Lembke, E., & Stecker, P. (2007). Curriculum-based measurement in mathematics: An evidence-based formative assessment procedure. NH.
  • Lorenz, J. H. (2007). HaRet – Hamburger Rechentest für Klasse 1. Freie und Hansestadt Hamburg.
  • MacDonald, A., & Carmichael, C. (2018). Early mathematical competencies and later achievement: Insights from the Longitudinal Study of Australian Children. Mathematics Education Research Journal, 30(4), 429–444. https://doi.org/10.1007/s13394-017-0230-6
  • Magnuson, K. A., & Duncan, G. J. (2006). The role of socioeconomic resources in the Black-White test score gap among young children. Developmental Review, 26(4), 365–399. https://doi.org/10.1016/j.dr.2006.06.004
  • Mannhaupt, G. (2012). Münsteraner Screening zur Früherkennung von Lese-Rechtschreibschwierigkeiten. MÜSC [Münsteraner Screening for the early development of spelling difficulties]. 1st ed. Cornelsen.
  • Maier, U., Wolf, N., & Randler, C. (2016). Effects of a computer-assisted formative assessment intervention based on multiple-tier diagnostic items and different feedback types. Computers & Education, 75, 85–98.
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
  • McDonald, S.-K., Keesler, V. A., Kauffman, N. J., & Schneider, B. (2006). Scaling-up exemplary interventions. Educational Researcher, 35(3), 15–24. https://doi.org/10.3102/0013189X035003015
  • McLean, J. F., & Hitch, G. J. (1999). Working memory impairments in children with specific arithmetic learning difficulties. Journal of Experimental Child Psychology, 74(3), 240–260. https://doi.org/10.1006/jecp.1999.2516
  • Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56(2), 177–196. https://doi.org/10.1007/BF02294457
  • Morgan, P. L., Farkas, G., & Qiong, W. (2009). Five-year growth trajectories of kindergarten children with learning difficulties in mathematics. Journal of Learning Disabilities, 42(4), 306–321. https://doi.org/10.1177/0022219408331037
  • Mullis, I. V. S., Martin, M. O., Foy, P., & Arora, A. (2012). TIMSS 2011 International Results in Mathematics. TIMSS & PIRLS International Study Center, Boston College.
  • Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide: Statistical analysis with latent variables (7th ed.). Muthén & Muthén.
  • Nicol, D. J., & Macfarlane‐Dick, D. (2006). Formative assessment and self‐regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218. https://doi.org/10.1080/03075070600572090
  • Phelan, J., Choi, K., Vendlinski, T., Baker, E., & Herman, J. (2011). Differential improvement in student understanding of mathematical principles following formative assessment intervention. The Journal of Educational Research, 104(5), 330–339. https://doi.org/10.1080/00220671.2010.484030
  • Preschool Curriculum Evaluation Research Consortium. (2008). Effects of preschool curriculum programs on school readiness (NCER 2008-2009) https://ies.ed.gov/ncer/pubs/20082009/
  • Quyen, N. T. D., & Khairani, A. Z. (2016). Reviewing the challenges of implementing formative assessment in Asia: The need for a professional development program. Journal of Social Science Studies, 4(1), 160. https://doi.org/10.5296/jsss.v4i1.9728
  • Reardon, S. F., & Galindo, C. (2009). The Hispanic-White achievement gap in math and reading in the elementary grades. American Educational Research Journal, 46(3), 853–891. https://doi.org/10.3102/0002831209333184
  • Schütze, B., Souvignier, E., & Hasselhorn, M. (2018). Stichwort – Formatives assessment [Keyword – Formative assessment]. Zeitschrift Für Erziehungswissenschaft, 21(4), 697–715. https://doi.org/10.1007/s11618-018-0838-7
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference.
  • Sharkey, N. S., & Murnane, R. J. (2006). Tough choices in designing a formative assessment system. American Journal of Education, 112(4), 572–588. https://doi.org/10.1086/505060
  • Shavelson, R. J. (2006). On the integration of formative assessment in teaching and learning: Implications for new pathways in teacher education. In Oser, F., Achtenhagen, F. and Renold, U. (Eds.), Competence-oriented teacher training: Old research demands and new pathways (pp. 63–78). Sense Publishers.
  • Smith, T. M., Cobb, P., Farran, D. C., Cordray, D. S., & Munter, C. (2013). Evaluating math recovery. American Educational Research Journal, 50(2), 397–428. https://doi.org/10.3102/0002831212469045
  • Souvignier, E., & Hasselhorn, M. (2018). Formatives assessment. Zeitschrift Für Erziehungswissenschaft, 21(4), 693–696. https://doi.org/10.1007/s11618-018-0839-6
  • Stiggins, R. J. (2002). Assessment crisis: The absence of assessment for learning. Phi Delta Kappan, 83(10), 758–765. https://doi.org/10.1177/003172170208301010
  • Turner, R. C., Ritter, G. W., Robertson, A. H., & Featherston, L. (2006, April). Does the impact of preschool child care on cognition and behavior persist throughout the elementary years? [Paper presentation]. American Educational Research Association Conference, San Francisco, CA.
  • Tymms, P. (2004). Effect sizes in multilevel models. In I. Schagen & K. Elliot (Eds.), But what does it mean? The use of effect sizes in educational research (pp. 55–66). National Foundation of Educational Research.
  • Timperley, H., Wilson, A., Barrar, H., & Fung, I. (2007). Teacher professional learning and development: Best evidence synthesis iteration (BES). Ministry of Education.
  • Vendlinski, T. P., Hemberg, B., Mundy, C., & Phelan, J. (2009). Designing professional development around key principles and formative assessments to improve teachers’ knowledge to teach mathematics. Meeting of the Society for Research on Educational Effectiveness.
  • Verdine, B. N., Golinkoff, R. M., Hirsh-Pasek, K., Newcombe, N. S., Filipowicz, A. T., & Chang, A. (2013). Deconstructing building blocks: Preschoolers’ spatial assembly performance relates to early mathematical skills. Child Development.
  • Weiß, R. H., & Osterland, J. (2013). CFT 1-R – Grundintelligenztest Skala 1 Revision. Hogrefe.
  • Yin, Y., Shavelson, R. J., Ayala, C. C., Ruiz-Primo, M. A., Brandon, P. R., Furtak, E. M., Tomita, M. K., & Young, D. B. (2008). On the impact of formative assessment on student motivation, achievement, and conceptual change. Applied Measurement in Education, 21(4), 335–359. https://doi.org/10.1080/08957340802347845
  • Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL No. 033).

Appendix

The tables in this Appendix show an alternative structure of the regression models. In the first model, only the intervention groups are included. The second model adds the basic cognitive abilities, basic numerical skills, and language skills. The third model additionally includes the earned certificate to teach mathematics and the textbooks. Finally, Model 4 adds the VERA school-level mathematics score (average from 2010 to 2014).
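As a reading aid, the stepwise structure described above can be written schematically for the single-level case. This is a sketch in our own shorthand, not the exact specification that was estimated: all symbols are assumptions introduced here, with $y_i$ denoting the arithmetic achievement of student $i$, $\mathbf{g}_i$ the intervention-group indicators, $\mathbf{x}_i$ the school-entry measures (basic cognitive abilities, basic numerical skills, language skills), $\mathbf{t}_i$ the certificate and textbook indicators, and $v_i$ the school's mean VERA mathematics score from 2010 to 2014.

```latex
\begin{align*}
\text{Model 1:}\quad y_i &= \beta_0 + \boldsymbol{\beta}_G^{\top}\mathbf{g}_i + \varepsilon_i \\
\text{Model 2:}\quad y_i &= \beta_0 + \boldsymbol{\beta}_G^{\top}\mathbf{g}_i + \boldsymbol{\beta}_X^{\top}\mathbf{x}_i + \varepsilon_i \\
\text{Model 3:}\quad y_i &= \beta_0 + \boldsymbol{\beta}_G^{\top}\mathbf{g}_i + \boldsymbol{\beta}_X^{\top}\mathbf{x}_i + \boldsymbol{\beta}_T^{\top}\mathbf{t}_i + \varepsilon_i \\
\text{Model 4:}\quad y_i &= \beta_0 + \boldsymbol{\beta}_G^{\top}\mathbf{g}_i + \boldsymbol{\beta}_X^{\top}\mathbf{x}_i + \boldsymbol{\beta}_T^{\top}\mathbf{t}_i + \beta_V v_i + \varepsilon_i
\end{align*}
```

Each model nests the previous one, so a change in $\boldsymbol{\beta}_G$ from one model to the next reflects what happens to the intervention effect once the newly added covariates are held constant. The multilevel analyses for the whole sample follow the same ordering but additionally include covariates at the class level.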

Table 7. Alternative regression modeling for individual and classroom covariates and intervention groups on arithmetic achievement for students at risk for MD at the end of Grade 2.

The first model shows the effect of the intervention on the arithmetic achievement of students at risk at the end of Grade 2 without controlling for covariates. We see no significant effect of the intervention groups. This could be due to differences in covariates. For example, the control group has a higher mean value on the basic cognitive abilities than the intervention groups. Basic cognitive abilities are known to have an effect on arithmetic achievement. Therefore, in the second model we included the basic cognitive abilities, basic numerical skills, and language skills at the beginning of first grade. We see that those covariates have a significant effect on arithmetic achievement and explain 34% of the variance between students. The intervention groups have no significant effect when holding those covariates constant. In the third model we include the earned certificate to teach mathematics (studied mathematics or not) and the textbooks used by the groups. The textbooks have a significant effect on arithmetic achievement. When holding the textbooks constant, we see a significant effect of the intervention group MMS on arithmetic achievement at the end of Grade 2. When the average of the school-level mathematics scores from the years before the intervention is added, both intervention groups, MMS+ and MMS, show a significant effect on arithmetic achievement. We can conclude that students at risk in the intervention groups who have comparable scores on the covariates to those in the comparison groups reach higher scores on the arithmetic achievement test at the end of Grade 2. Similar results can be seen for students at risk at the end of Grade 3. Here, the MMS+ group does not show a significant effect on students’ arithmetic achievement after controlling for the earned certificate to teach mathematics, the textbooks used, and the school-level mathematics score (Models 3 and 4).

Table 8. Alternative regression modeling for individual and classroom covariates and intervention groups on arithmetic achievement for students at risk for MD at the end of Grade 3.

For the whole group at the end of Grade 2, including only the intervention groups shows no effect on arithmetic achievement. This could be due to differences in covariates. For example, the control group has a higher mean value on the basic cognitive abilities than the intervention groups. Basic cognitive abilities are known to have an effect on arithmetic achievement. Therefore, in the second model we included the basic cognitive abilities, basic numerical skills, and language skills at the beginning of first grade. We see that those covariates have a significant effect on arithmetic achievement and explain 33% of the variance between classes. The intervention does have a significant effect when holding those covariates constant. In the third model we additionally include the earned certificate to teach mathematics (studied mathematics or not) and the textbooks used by the groups. The textbooks have a significant effect on arithmetic achievement. When holding the textbooks constant, we see a higher value of the regression coefficient for the MMS intervention group. Furthermore, the explained variance between classes increases (48%). In the fourth model, we included the average school-level mathematics achievement from 2010 to 2014 to account for school achievement characteristics. This covariate hardly influences the effect of the intervention. We can conclude that classes in the intervention groups that have comparable scores on the covariates to those in the comparison groups reach higher scores on the arithmetic achievement test at the end of Grade 2.

Table 9. Alternative multilevel regression modeling for individual and classroom covariates and intervention groups on students’ arithmetic achievement at the end of Grade 2.

At the end of Grade 3, we see a significant effect of the intervention group MMS in the first model. When including the cognitive abilities, basic numerical skills, and language skills at the beginning of first grade at the individual and the class level, we see a significant effect of the first two at the individual level and only of cognitive abilities at the class level. In the third model the covariates earned certificate to teach mathematics and textbook are included. The textbooks have a significant effect on arithmetic achievement at the end of Grade 3. When holding those covariates constant, the regression coefficient of the MMS+ intervention group becomes significant. In the fourth model, we included the average school-level mathematics achievement from 2010 to 2014, which did not influence the effect of the intervention. We can conclude that classes in the intervention groups that have comparable scores on the covariates to those in the comparison groups reach higher scores on the arithmetic achievement test at the end of Grade 3.

Table 10. Alternative multilevel regression modeling for individual and classroom covariates and intervention groups on students’ arithmetic achievement at the end of Grade 3.