Educational Psychology
An International Journal of Experimental Educational Psychology
Volume 32, 2012 - Issue 3

Early mathematics assessment: validation of the short form of a prekindergarten and kindergarten mathematics measure

Pages 311-333 | Received 17 Aug 2011, Accepted 30 Dec 2011, Published online: 31 Jan 2012
 

Abstract

In recent years, there has been increased interest in improving early mathematics curricula and instruction. Consequently, demand has also risen for better early mathematics assessments, as most current measures are limited in their content and/or their sensitivity to differences in early mathematics development among young children. In this article, using data from two large, diverse samples of prekindergarten and kindergarten children, we provide evidence regarding the psychometric validity of a new theory-based early mathematics assessment. The new measure is the short form of a longer, validated measure. Our results suggest that the short form is valid for assessing prekindergarten and kindergarten children’s numeracy and geometry skills and is sensitive to differences in early mathematics development among young children.

Notes

1. The Sample 1 school district was also the site of one of the school districts in Sample 2. Assessment data were collected in different years.

2. The overall study examined the efficacy of the Technology-enhanced, Research-based, Instruction, Assessment, and professional Development (TRIAD) model (for more detail on TRIAD, see Clements, 2007).

3. Subitising that involves the quick recognition of small sets is perceptual subitising. The REMA also assesses conceptual subitising, in which subsets are perceptually subitised and then combined, all quickly and automatically, such as when a ‘12 domino’ is recognised (Sarama & Clements, 2009). Both are fast, automatic and accurate quantification processes (involving neither estimation nor explicit calculation).

4. As noted earlier, the Applied Problems subtest does not measure geometric and spatial capacities, and researchers have raised some concerns regarding the test’s appropriateness and sensitivity in use with young children. Nonetheless, it is very widely used, and we include it in the present study for concurrent validity purposes.

5. We recognise that using the Rasch model instead of a 2- or 3-parameter model assumes that items are equally discriminating and that the guessing parameter is zero (Weitzman, 1996). While we considered using a 2- or a 3-parameter Item Response Theory (IRT) model, we chose the Rasch model, given the centrality of Rasch modelling to the development of the full REMA. Further, from a practical perspective, Rasch modelling provides a simple, sample-independent conversion table from raw scores to ability scores that can be used by teachers and researchers who use the Short Form. By their nature, the more complicated 2-parameter logistic (2-PL) and 3-parameter logistic (3-PL) models do not have this practical advantage. Finally, while a 2- or a 3-parameter IRT model might provide more accurate scoring (and this is debated in the empirical literature; see Weitzman, 2008), the Short Form’s intended purpose does not require such fine tuning.

6. An item with an infit or outfit statistic of >1.3 shows signs of underfit, meaning that it is not adequately distinguishing between children of differing abilities. Because no item showed inadequate infit and outfit statistics in both Samples 1 and 2, we allowed some flexibility in meeting this benchmark.

7. We report but do not interpret the standardised mean square infit and outfit statistics (‘t standardized fit statistic (ZSTD)’ in Table ), as these tend to reject items when the sample size is large (Bond & Fox, 2007).

8. In Sample 1, treatment group status was equivalent to grade level, due to the multiple cohort design of that sample. This was not the case in Sample 2, where children within the same cohort were randomised to treatment/control group status.

9. We used only Sample 1 for this work because Sample 2 children took the full REMA using a stop rule. Therefore, we are unable to use Sample 2 to determine how much information was lost with the stop rule vs. without.
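
To illustrate Notes 5 and 6, the following minimal sketch shows how the Rasch model can be read as a 3-PL model with discrimination fixed at 1 and guessing fixed at 0, and how infit and outfit mean squares are commonly computed for a single item. It is not the authors' analysis code: the function names, the parameter names (`theta` for child ability, `b` for item difficulty, `a` for discrimination, `c` for guessing) and the toy data are assumptions made for illustration, and the fit formulas are the standard textbook definitions rather than anything specified in the article.

```python
import math


def p_3pl(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3-PL model.

    theta: ability, b: difficulty, a: discrimination, c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_rasch(theta, b):
    """Rasch model: the 3-PL model with a = 1 and c = 0 for every item."""
    return p_3pl(theta, b, a=1.0, c=0.0)


def fit_mean_squares(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: 0/1 scores, one per child.
    thetas: ability estimates, one per child (hypothetical values here).
    """
    probs = [p_rasch(t, b) for t in thetas]
    variances = [p * (1.0 - p) for p in probs]
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, probs)]
    # Outfit: unweighted mean of squared standardised residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


# Toy data (hypothetical): three children, one item of difficulty 0.
infit, outfit = fit_mean_squares([0, 1, 1], [-1.0, 0.0, 1.0], b=0.0)
print(f"infit={infit:.3f}, outfit={outfit:.3f}")
```

Under this sketch, an item would be flagged for possible underfit when either value exceeds the 1.3 benchmark mentioned in Note 6.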
