
Serious doubts about school effectiveness

Pages 745-766 | Published online: 14 Aug 2009
 

Abstract

This paper considers the model of school effectiveness (SE) currently dominant in research, policy and practice in England (although the concerns it raises are international). It shows, principally through consideration of initial and propagated error, that SE results cannot be relied upon. By considering the residual difference between the predicted and obtained score for all pupils in any phase of education, SE calculations leave the results to be disproportionately made up of relative error terms. Adding contextual information confuses, but does not help, this situation. Having shown and illustrated the sensitivity of SE to this propagation of initial errors, and therefore why it is unworkable, the paper considers some of the reasons why SE has become dominant, outlines the damage this dominant model causes and begins to shape alternative ways of considering what schools do.

Numbers are like people; torture them enough and they will tell you anything.

Notes

1. I exclude some chapters from this statement of inability to comprehend, most especially the chapter by Brown (Citation1998), which I urge everyone to read.

2. The Department for Children, Schools and Families is responsible for the organisation of schools and children's services in England.

3. Key Stage 2 leads to statutory testing at the end of primary education, usually for pupils aged 11. Key Stage 4 leads to assessment at age 16, currently the legal age at which a pupil can leave school.

4. These variables are used by DCSF for a number of reasons, including the fact that they are available at an individual level with reasonably complete data. Of course, other variables might be useful both at individual level, such as parents' occupation, and at school level, such as qualifications of teachers. Indeed, analyses for other purposes quite properly use different combinations of variables. However, the critique of CVA and school effectiveness presented here does not depend on the precise variables used. Occupation is harder to classify and generally less complete as a field than eligibility for free school meals, for example. The omission of potentially important measures of individuals and schools can lead to another form of bias in the results, by making the variables that are included appear more important.

5. Whereas both the ethnicity coefficient and the FSM/ethnicity interaction are 0 for White pupils, they are 29.190 and 20.460, respectively, for Black African pupils, for example.

6. For further explanation, contact the PLASC/NPD User Group (PLUG) at http://www.bris.ac.uk/Depts/CMPO/PLUG/

7. Author's analysis using 2007 datasets for the purposes of this paper.

8. For example, if nearly 10% of children do not appear in the databases anyway, 10% do not have matching PLASC/NPD records, 10% do not have a matching prior attainment record and 15% of the records present have missing values in just five key variables, it is clear that once all variables are considered there could easily be fewer than 50% complete records overall.
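The arithmetic in the note above can be sketched as follows. This is an illustration only, using the note's example rates and assuming (as the note implicitly does) that the sources of record loss are roughly independent, so that surviving fractions multiply:

```python
# Hypothetical sketch of how independent sources of record loss compound.
# The rates are the note's example figures; independence is an assumption
# made here purely for illustration.

def surviving_fraction(loss_rates):
    """Fraction of records remaining after each successive loss."""
    remaining = 1.0
    for rate in loss_rates:
        remaining *= (1.0 - rate)
    return remaining

# ~10% absent from the databases, ~10% without matching PLASC/NPD records,
# ~10% without a matching prior attainment record, ~15% with missing values
# in five key variables.
losses = [0.10, 0.10, 0.10, 0.15]
print(surviving_fraction(losses))  # just under 0.62 before further variables
```

With only these four sources of loss, under 62% of records survive; adding the remaining variables' missingness easily takes the complete-record fraction below 50%, as the note states.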

9. Strictly, the relative error is 1/3 based on the true value we are trying to measure. In practice, of course, we do not know this value or else there would be no error, so all relative errors are here based on the achieved measure instead.

10. This does not prevent the widespread abuse of random sampling techniques with population data. A couple of recent examples will have to suffice. Hammond and Yeshanew (Citation2007) base their analysis on a national dataset, but say ‘Although no actual samples have been drawn…Statistical checks were carried out and no significant difference between the groups was found’ (p. 102). They then present a table of standard errors for this population data (p. 102). They have learnt to use multi‐level modelling but have clearly forgotten what significance means and what a standard error is. Similarly, Thomas et al. (Citation2007) examined data from one school district in England (and so a population, in statistical terms). Yet they report that ‘the pupil intake and time trend explanatory variables included in the fixed part of the value‐added model (Model A) were statistically significant (at 0.05 level)’ (p. 271).

11. This misuse of sampling theory with population data has sometimes been defended by saying that the population figures are somehow a random sample of a theoretical ‘super‐population’. In the example of PLASC/NPD, then, the school pupils are imagined as a random sample of all the children that could have been born to their parents, and the analyst seeks to generalise the findings to those unborn and never‐born children. But why should the born children be a random subset of the unborn? What does that even mean in real life? Do politicians and parents know that this is what such statisticians mean? And why would anyone want to generalise to a non‐existent and never‐to‐be‐born group anyway? This is an example of the lengths to which some analysts will go in defending their sampling theory techniques. The approach has long been discredited (Camilli, Citation1996; Gorard, Citation2008b).

12. Some commentators and even some purported training resources suggest that a confidence interval is a band within which we can be reasonably confident the true population figure appears. This is a simple error of understanding. Every confidence interval has the manifest score at its centre: it is a band of the likely scores we would achieve if the random sampling that led to the manifest score were repeated, and the manifest score is our best (indeed only) guess so far. We have no idea where the true population figure actually is, other than from that guess, unless we have the population figures. If we do have the population figures (as we do in this paper), we do not need confidence intervals, and they make no sense then anyway.
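The point that a confidence interval straddles the manifest score, not the unknown true value, can be shown in a small simulation. This sketch is not from the paper; the true mean, sample size and normality are assumptions made purely for illustration:

```python
# Sketch: a conventional 95% confidence interval is centred on the
# manifest (sample) mean. Whether it happens to cover the true value
# is unknowable without the population figures themselves.
import random
import statistics

random.seed(1)
true_mean = 100.0  # known only to the simulation, never to the analyst
sample = [random.gauss(true_mean, 15) for _ in range(50)]

manifest = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5
ci = (manifest - 1.96 * se, manifest + 1.96 * se)

print(manifest, ci)  # the interval's midpoint is exactly the manifest score
```

However the sample turns out, the interval's midpoint is the manifest score by construction, which is why a confidence interval cannot be read as a band around the true population figure.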

13. Some purported authorities on school effectiveness still erroneously propose the use of confidence intervals with school effectiveness scores based on population figures (e.g. Goldstein, Citation2008).

14. Of course, the same kind of errors occur in raw scores, but they are not conflated with errors in contextual variables, do not have problems of missing prior attainment records and, above all, do not occur in scores as small as predicted/actual residuals. Raw scores, for all of their faults, have both less absolute error than CVA scores and less relative error.
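The contrast in the note above can be made concrete with a worked example. The figures here are hypothetical (chosen only for illustration), and the quadrature rule assumes the errors in the predicted and obtained scores are independent, which is itself an assumption:

```python
# Hypothetical illustration: the same absolute measurement error produces
# a far larger *relative* error in a small residual than in the raw score
# from which it was derived.

def relative_error(absolute_error, value):
    """Relative error, based on the achieved measure (see note 9)."""
    return abs(absolute_error) / abs(value)

raw_score = 30.0                  # e.g. a points score (illustrative)
predicted = 29.0                  # model prediction (illustrative)
residual = raw_score - predicted  # the CVA-style 'school effect' = 1.0

err = 2.0  # assumed absolute error in each measured score
# Errors in both the predicted and the obtained score feed the residual;
# combining independent errors in quadrature gives sqrt(2^2 + 2^2).
propagated = (err ** 2 + err ** 2) ** 0.5

print(relative_error(err, raw_score))        # ~0.07 for the raw score
print(relative_error(propagated, residual))  # ~2.8 for the residual
```

On these assumed figures the raw score carries a relative error of around 7%, while the residual's relative error is several times its own size, which is the sense in which residual-based SE scores are swamped by propagated error.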

15. One promising avenue is based on regression discontinuity (e.g. Luyten, Citation2006). This has the major advantage over CVA of not being zero‐sum in nature. All schools could improve and be recognised for this (and vice versa) and groups of schools or whole districts can be assessed as co‐operative units in which the success of any unit adds to the success of any other. Perhaps something like this is better for now and for the future?

16. Or perhaps parents are smarter than policy‐makers, realising that current VA scores for any school or phase are historical and tell them only what might have happened if their child had started at that school five years ago.
