RANKING AND RATING PUBLIC SERVICES

Rating the Rankings: Assessing International Rankings of Public Service Performance

Pages 298-328 | Published online: 15 Aug 2008
ABSTRACT

This paper documents the growth of international rankings of governance and public services and seeks to contribute to a second-generation approach to the analysis of this phenomenon. It does so primarily by setting out a method for ranking international ratings, building on and extending earlier work by other scholars, and applies that method to 14 international rankings of governance and public services, to explore the scope and limits of the approach. The final section argues that the development of such a method could form the basis for benchmarking international rankings as they develop in the future.

Acknowledgments

We would like to thank the participants at the IPMN Workshop on Ranking and Rating Public Services (Oxford, 2007), particularly Christiane Arndt and H. George Fredrickson, for their valuable observations. We also thank the anonymous referees for their insightful comments.

Notes

a Our definition of a “source” depended on the survey under consideration. Thus the source was in some cases the sample of the national population assessed directly by the surveying organization (e.g. the International Crime Victim Survey and the educational attainment surveys), and for these, the score depended on how comprehensively the relevant population was sampled. For surveys such as the World Bank Governance Indicators that used indicators obtained from secondary sources, the sources were themselves organizations (e.g., Afrobarometer, Freedom House, Amnesty International). In these cases, we assessed the representativeness of sources by taking into account the extent to which the criteria for including a source were clearly described and whether the sampling and data collection methods of the sources themselves were critically assessed. Some surveys used sources of both types. For instance, the World Economic Forum Global Competitiveness Index and the IMD World Competitiveness Yearbook used both Executive Opinion Surveys and quantitative data obtained from national and international sources. For these, a combination of both approaches was used.

b There are limits to measurement precision even in the so-called exact sciences, and the limits of accurate measurement are ordinarily greater in the social sciences, where strategic gaming has to be added to the problems of sampling error, simple recording error and variations in the assignment of cases to categories. Accordingly, where there are small differences between units of comparison, the meaningfulness of those differences needs to be assessed relative to confidence limits or likely measurement error.

For example, in his proposed regime for managing the use of timber in government dockyards, Jeremy Bentham prescribed the making of a league table of costs and outputs in Timber Masters' departments in the various naval dockyards (Hume Citation1981, 157).

We are grateful to Gary Sturgess for bringing this source to our attention, and also the practices of the East India Company.

Of which an early example is Elinor Ostrom et al. (Citation1977), Policing Metropolitan America.

The idea of putting together the evidence from international rankings and ratings to produce a picture of public service performance in the round, roughly analogous to a country's medal tally in the Olympic Games, is both appealing and beguiling. For example, the Prime Minister's Strategy Unit in the UK produced unpublished work in the early 2000s comparing rankings across public services in an attempt to identify overall patterns of performance, and the Dutch government commissioned a major research enterprise on these lines for publication in 2004 (SCP Citation2004), to coincide with the Dutch Presidency of the European Union in that year, as discussed by Geert Bouckaert in this collection (Bouckaert Citation2008).

Notably Transparency International's Corruption Perception Index, the World Economic Forum's Global Competitiveness rankings and the World Bank's own “governance” rankings.

Hernando de Soto's (Citation2000) method of direct observation (for instance of the time, effort and form-filling required to set up a bakery shop in different countries) is perhaps a third basis for such rankings. It figures in the World Bank's cost-of-regulation rankings (see http://www.doingbusiness.org) but it is much rarer than the other two and is not referred to in Besançon's (Citation2003) discussion.

See Florida and Gates Citation2001, Florida Citation2002, and Florida and Tinagli Citation2004.

Validity broadly means that a metric captures what it is intended to measure, while reliability broadly means that a metric produces consistent results when repeated. Reliability is here stretched a little to include the extent to which a measure allows meaningful comparisons over time, a vital consideration for ratings that can contribute to the monitoring and management of performance improvements.

The numerical scores are given in Appendix 1.

K-means cluster analysis (SPSS 14.0, 2005) based on the six criteria (V1, V2, V3, R1, R2, R3) assigned the three educational surveys (TIMSS, PIRLS and PISA) to the same cluster for values of K (the total number of clusters) from 2 to 11. The other surveys showed no such consistency, appearing in different groupings depending on the number of clusters chosen. While the number of surveys is too small for rigorous analysis by this method, the result suggests that the educational surveys can be distinguished from the rest, but that the remaining surveys cannot be distinguished from one another on these criteria.
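The consistency check described in this note can be sketched as follows. This is a minimal illustration, not the paper's analysis: the scores below are hypothetical stand-ins for the six-criteria ratings, the survey list is abbreviated, and scikit-learn's KMeans is used in place of SPSS. The test is whether a given subset of surveys lands in the same cluster for every value of K tried.

```python
# Illustrative sketch of the cluster-consistency check. Scores are
# hypothetical stand-ins, not the actual data from the paper.
import numpy as np
from sklearn.cluster import KMeans

surveys = ["TIMSS", "PIRLS", "PISA", "WGI", "CPI", "GCI", "ICVS"]
# Hypothetical scores on the six criteria (V1, V2, V3, R1, R2, R3).
scores = np.array([
    [4, 4, 3, 4, 4, 4],   # TIMSS
    [4, 4, 3, 4, 3, 4],   # PIRLS
    [4, 3, 4, 4, 4, 4],   # PISA
    [2, 1, 2, 3, 2, 1],   # WGI
    [2, 2, 1, 3, 2, 2],   # CPI
    [3, 2, 2, 2, 3, 2],   # GCI
    [3, 3, 2, 1, 2, 3],   # ICVS
], dtype=float)

edu = [0, 1, 2]  # indices of the three educational surveys

def always_co_clustered(X, members, k_range):
    """True if `members` share one cluster for every K in k_range."""
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(X)
        if len({labels[i] for i in members}) != 1:
            return False
    return True

# With only 7 hypothetical rows, K can range over 2..4 rather than 2..11.
print(always_co_clustered(scores, edu, range(2, 5)))
```

With the paper's 14 surveys the same loop would run K from 2 to 11; here the toy data limits the usable range.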

Numerically, the vast majority (about 90 percent) of indicators reported by the educational surveys came from the associated background questionnaires, whose questions changed significantly from year to year. But even if only the “achievement” indicators were counted, these also showed relatively high variation, because different age groups of children were included in different years (TIMSS and PIRLS), or because the extent to which particular areas of the curriculum were tested varied (PISA). Another reason for scoring these surveys lower on the reliability criteria was the relatively small number of survey rounds in the last 10 years contributing to both R1 and R2, as described in Appendix 1.

For a detailed criticism of the World Bank Governance Indicators, see Arndt (Citation2008) in this issue.

“In some of the countries however the interviewers, having to work in several local dialects, were provided with the translation in the language of the majority linguistic group while translations into dialects were provided on the spot, that is to say, during the interviewing process by the interviewers themselves” (from http://www.unicri.it/wwd/analysis/icvs/methodology.php).

These cases resulted in lower validity scores, since the validity measures were intended to reflect how well the methodology was explained and justified.

For instance, the authors of the World Bank Governance Indicators admitted that, “Many of the polls and surveys we use suffer from deficiencies such as poorly-worded questions about ill-defined and excessively broad concepts” (Kaufmann, Kraay and Zoido-Lobatón Citation1999), but no mention was made of why certain sources were included and others (presumably) excluded. Similarly, Transparency International stated (in “frequently asked questions” on its website): “TI strives to ensure that the sources used are of the highest quality and that the survey work is performed with complete integrity. To qualify, the data must be well documented and sufficient to permit a judgment on its reliability.” But no further details were given of what this process entailed. The surveys considered most valid on our criteria were those which designed and implemented their own surveys, keeping the whole process under the control of a single organization, though it is true that such an approach could limit ambition in terms of country or indicator coverage.

To do otherwise would have penalized surveys such as the OECD Health Survey that make this information available, since many surveys did not report the country coverage for every indicator.

We also found that the printed version of the WEF Global Competitiveness Report gave a somewhat different range of indicators from the version on CD-ROM. We have used the printed version for our analysis.

Again, the OECD Health Report was among the most transparent in this respect, flagging every instance of repeated data, but many surveys did not make this information available.

For example, systematic coding errors might have occurred as a result of various forms of common-source bias or from faulty assessments of documentary data, such as being taken in by apparently plausible arguments or faulty line-ball judgements.

Clearly, the small numbers of both surveys and coders mean that this exercise of producing confidence intervals is essentially illustrative.

Rank-and-yank involved forced ranking of all corporation employees (following the logic of tournament theory as described earlier) followed by dismissal of the lower-ranked employees (Osborne and McCann Citation2004).

For the term “planning mood,” see Hogwood and Gunn Citation1984, 33. The term denotes a period, epitomized by the 1961 Plowden Report, that involved more systematic ways of ordering priorities and linking public spending to resources.

As in the cases of Global Integrity and the DIAL analysis of household experience of corruption in Africa and Latin America, discussed by Arndt in this issue.

“Gresham's Law” (named after Sir Thomas Gresham, 1519–79) is commonly stated as saying “bad money drives out good” in circumstances where there are two or more forms of money in existence which the law requires to be accorded the same face value.

Indicators were defined as the lowest non-aggregated level of data reported in the survey or methodology papers. Examples of indicators: questions asked in a survey; data provided in response to a request by the researching body or obtained from national or international sources; and performance in educational tests administered by the research body, broken down by subject area and, where applicable, by age.

Because of the large number of indicators in the OECD Health Data, these indicators were sampled at random, stratified by “Chapter” or broad area of healthcare performance (e.g. medical technology, life expectancy). This provided a sample of 59 indicators (one from each of the 59 Chapters) from a total indicator population of about 600.
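The sampling scheme described in this note can be sketched in a few lines. This is an illustrative sketch only: the chapter names and indicator lists below are hypothetical stand-ins for the OECD Health Data structure, not its actual contents. The key point is that drawing one indicator at random from each chapter yields a sample whose size equals the number of chapters.

```python
# Minimal sketch of stratified sampling: one indicator drawn at random
# from each "Chapter". Chapter names and indicators are hypothetical.
import random

indicators_by_chapter = {
    "Medical technology": ["MRI units", "CT scanners", "Radiation therapy"],
    "Life expectancy": ["At birth", "At age 65", "At age 80"],
    "Health expenditure": ["Total % of GDP", "Public share", "Per capita"],
}

def stratified_sample(strata, seed=0):
    """Pick one indicator at random from each stratum (chapter)."""
    rng = random.Random(seed)
    return {chapter: rng.choice(items) for chapter, items in strata.items()}

sample = stratified_sample(indicators_by_chapter)
# One indicator per chapter: sample size equals the number of chapters.
print(len(sample))  # → 3
```

In the study itself the same logic applied over 59 chapters would yield the 59-indicator sample from a population of about 600.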


Notes on contributors

Christopher Hood

Christopher Hood ([email protected]) is Gladstone Professor of Government and Fellow of All Souls College Oxford and Director of the Economic and Social Research Council's Research Programme on Public Services: Quality, Performance, Delivery. He specializes in the study of executive government, regulation and public-sector reform.

Ruth Dixon

Ruth M. Dixon is Project Assistant to the Programme Director of the Economic and Social Research Council's Research Programme on Public Services, Department of Politics and International Relations, University of Oxford.

Craig Beeston

Craig Beeston was educated at the Universities of Manchester and Oxford, and completed his D.Phil. in Modern History at University College, Oxford in 2007. Since 2005 he has been engaged in research projects funded by the ESRC Public Services Programme and the Centre for Analysis of Risk and Regulation at the London School of Economics and Political Science, producing papers with Professor Christopher Hood and others on the methodology of public service ranking exercises, Britain's international ranking in public service performance, and the analysis of blame management strategies adopted by office-holders during political crises. He now works in the public sector.
