Abstract
In consultation with experts in the fields of education research and whole school improvement, researchers at the Comprehensive School Reform Quality (CSRQ) Center created a framework for evaluating the scientific rigor of research studies that report on the efficacy of whole school improvement models. In this paper, the authors begin with a brief introduction to the challenges of developing standards of acceptable research practices and establishing a method of review that enables cross-program comparisons. The authors discuss CSRQ Center standards in reference to six primary areas of review: design, assessment, implementation, sampling, timing, and data analysis. Each standard is presented and the empirical rationale for evaluation criteria is explored. The authors conclude by presenting guidelines for researchers who report on the outcomes of evaluations of whole school improvement models. This discussion highlights the importance of ongoing conversation about the nature of evidence and components of research practice that illuminate program effectiveness within a larger framework of contemporary education policy and whole school improvement legislation.
Notes
1The Center was funded from 2003 to 2006 by the U.S. Department of Education's Office of Elementary and Secondary Education through a Comprehensive School Reform Quality Initiative Grant, S332B030012, and was operated by the American Institutes for Research.
2The CSRQ Center reports can be found at http://www.csrq.org
3To meet requirements for evidence of program impact and effectiveness specified by the No Child Left Behind Act (NCLB) of 2001, school reform models must demonstrate strong evidence of their effectiveness in increasing student academic achievement. The requirement that studies eligible for review by the CSRQ Center include academic achievement outcomes was based on the determination that this information would be of primary interest to education consumers.
4According to CSRQ standards for review, an example of an instrument with insufficient face validity would be a study that measures achievement by asking students to generate as many alternative uses of common objects as they can. An example of an instrument with moderate face validity would be a teacher-developed assessment. An example of an instrument with strong face validity would be a state or national standardized assessment with known psychometric properties.
5An example of a non-critical threat that could nonetheless have led to the conclusion that the study should not continue for further review was severe or differential attrition. According to CSRQ Center standards, severe attrition is defined as a loss of at least 20% of respondents (schools or individual students). Differential attrition is defined as a difference of at least 7% between the attrition rates of the study group and the comparison group.
6Results from the CSRQ Education Service Provider, Secondary, and updated Elementary School Reports are not included in this figure. During the process of reviewing hundreds of studies in preparation for the CSRQ Center reports, reviewers became increasingly adept at screening studies for initial relevance. Many types of studies that were included for initial review in the first report were screened out before receiving initial review during the second, third, and fourth reports. As a result, the pool of initially relevant studies from the Elementary School Report provides a more illustrative example of study characteristics for those studies that did not ultimately receive full review.