ABSTRACT
There is a significant need for kindergarten entry assessments (KEAs) that meet state education agency (SEA) requirements and are psychometrically sound measures of a broad range of school readiness domains, such as language, literacy, math, science, executive function, and social-emotional skills. Research Findings: In this paper, we describe five phases of development, calibration, and launch of a new KEA for a large state. We developed and tested 14 English subtests. We describe how teacher input and SEA priorities and policies guided development of the test blueprint in phase one. We calibrated the measures across the state in phase two and established initial evidence of reliability and validity in phase three (n = 208). In phase four, we developed our technology platform, scoring, and student grouping tools to improve data utilization. Practice or Policy: In phase five, we describe current delivery and implementation practices across the state, as well as future work to improve and expand the measures, along with a set of linked activities to help teachers use data to guide instruction. We discuss the principles and methods the assessment developers utilized, as these perspectives may inform the development and use of other KEAs.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1. A Spanish version was also simultaneously developed. The description of this version is in preparation.
2. Over 30 different languages were represented.
3. We conducted preliminary sensitivity analyses regarding time-of-year differences and potential age differences (as age relates to time of year), with results indicating that patterns held across time and age. We acknowledge the need for more rigorous future testing of timing effects to ensure that the point in the school year at which a new subtest was introduced in the field did not affect results.
4. We note that the other available criterion, the Akaike Information Criterion, always favored additional factors, even when the additional factors were not meaningful based on statistics (e.g., no unique loadings above .4 and high cross-loadings) or content (no discernible content pattern).
5. Local fit statistics such as the RMSEA and CFI are not available when the MLR estimator is used.
6. Local fit indices such as RMSEA and CFI are not available when the MLR estimator is used.
7. IRTPRO utilizes Wald tests to evaluate whether item discriminations and difficulties differ (Lord, 1977; see Ong, Kim, Cohen, & Cramer, 2015 for details). This approach is ideally suited for items with a larger number of respondents. Once these DIF analyses were completed, we compared results obtained from this approach with those obtained in SAS that focus only on difficulty and found similar item identification.
8. As a reminder, the 36-item executive function subtest includes item-level timing, with items shown rapidly (1.25 seconds per item). The 65-item subtest has a total test time of 1 minute and 20 seconds, during which children respond to as many of the 65 items as they can. Final item counts are in . The only subtests that changed from scaling to validity were vocabulary and science; both changes are noted in the text.
9. Seven different languages were represented.
10. The one exception to this three-group system was the academic motor subtest. Because this rating scale focused on typical versus delayed development, we grouped scores to match the ratings teachers used.