RANKING AND RATING PUBLIC SERVICES

Rating the Rankings: Assessing International Rankings of Public Service Performance

Pages 298-328 | Published online: 15 Aug 2008
ABSTRACT

This paper documents the growth of international rankings of governance and public services and seeks to contribute to a second-generation approach to the analysis of this phenomenon. It does so primarily by setting out a method for ranking international ratings, building on and extending earlier work by other scholars, and applies that method to 14 international rankings of governance and public services, to explore the scope and limits of the approach. The final section argues that the development of such a method could form the basis for benchmarking international rankings as they develop in the future.

Acknowledgments

We would like to thank the participants at the IPMN Workshop on Ranking and Rating Public Services (Oxford, 2007), particularly Christiane Arndt and H. George Fredrickson, for their valuable observations. We also thank the anonymous referees for their insightful comments.

Notes

a Our definition of a “source” depended on the survey under consideration. Thus the source was in some cases the sample of the national population assessed directly by the surveying organization (e.g. the International Crime Victim Survey and the educational attainment surveys), and for these, the score depended on how comprehensively the relevant population was sampled. For surveys such as the World Bank Governance Indicators that used indicators obtained from secondary sources, the sources were themselves organizations (e.g., Afrobarometer, Freedom House, Amnesty International). In these cases, we assessed the representativeness of sources by taking into account the extent to which the criteria for including a source were clearly described and whether the sampling and data collection methods of the sources themselves were critically assessed. Some surveys used sources of both types. For instance, the World Economic Forum Global Competitiveness Index and the IMD World Competitiveness Yearbook used both Executive Opinion Surveys and quantitative data obtained from national and international sources. For these, a combination of both approaches was used.

b There are limits to measurement precision even in the so-called exact sciences, and the limits of accurate measurement are ordinarily greater in the social sciences, where strategic gaming has to be added to the problems of sampling error, simple recording error and variations in the assignment of cases to categories. Accordingly, where there are small differences between units of comparison, the meaningfulness of those differences needs to be assessed relative to confidence limits or likely measurement error.

For example, in his proposed regime for managing the use of timber in government dockyards, Jeremy Bentham prescribed the making of a league table of costs and outputs in Timber Masters' departments in the various naval dockyards (Hume Citation1981, 157).

We are grateful to Gary Sturgess for bringing this source to our attention, and also the practices of the East India Company.

Of which an early example is Elinor Ostrom et al. (Citation1977), Policing Metropolitan America.

The idea of putting together the evidence from international rankings and ratings to produce a picture of public service performance in the round, roughly analogous to a country's medal tally in the Olympic Games, is both appealing and beguiling. For example, the Prime Minister's Strategy Unit in the UK produced unpublished work in the early 2000s comparing rankings across public services in an attempt to identify overall patterns of performance, and the Dutch government commissioned a major research enterprise on these lines for publication in 2004 (SCP Citation2004), to coincide with the Dutch Presidency of the European Union in that year, as discussed by Geert Bouckaert in this collection (Bouckaert Citation2008).

Notably Transparency International's Corruption Perception Index, the World Economic Forum's Global Competitiveness rankings and the World Bank's own “governance” rankings.

Hernando de Soto's (Citation2000) method of direct observation (for instance of the time, effort and form-filling required to set up a bakery shop in different countries) is perhaps a third basis for such rankings. It figures in the World Bank's cost-of-regulation rankings (see http://www.doingbusiness.org) but it is much rarer than the other two and is not referred to in Besançon's (Citation2003) discussion.

See Florida and Gates Citation2001, Florida Citation2002, and Florida and Tinagli Citation2004.

Validity broadly means that a metric captures what it is intended to measure, while reliability broadly means that a metric produces consistent results when repeated. Reliability is here stretched a little to include the extent to which a measure allows meaningful comparisons over time, a vital consideration for ratings that can contribute to the monitoring and management of performance improvements.

The numerical scores are given in Appendix 1.

K-means cluster analysis (SPSS 14.0, 2005) based on the six criteria (V1, V2, V3, R1, R2, R3) assigned the three educational surveys (TIMSS, PIRLS and PISA) to the same cluster for values of K (the total number of clusters) from 2 to 11. The other surveys showed no such consistency, appearing in different groupings depending on the number of clusters chosen. While the number of surveys is too small for rigorous analysis by this method, the result suggests that the educational surveys can be distinguished from the rest, but that the remaining surveys cannot be distinguished from one another on these criteria.
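The consistency check described in this note can be sketched as follows. This is a minimal illustration, not the paper's analysis: the scores below are hypothetical stand-ins for the six-criteria ratings, the survey list is abbreviated, and scikit-learn's KMeans is used in place of SPSS. The test is whether a given subset of surveys lands in the same cluster for every value of K tried.

```python
# Illustrative sketch of the cluster-consistency check. Scores are
# hypothetical stand-ins, not the actual data from the paper.
import numpy as np
from sklearn.cluster import KMeans

surveys = ["TIMSS", "PIRLS", "PISA", "WGI", "CPI", "GCI", "ICVS"]
# Hypothetical scores on the six criteria (V1, V2, V3, R1, R2, R3).
scores = np.array([
    [4, 4, 3, 4, 4, 4],   # TIMSS
    [4, 4, 3, 4, 3, 4],   # PIRLS
    [4, 3, 4, 4, 4, 4],   # PISA
    [2, 1, 2, 3, 2, 1],   # WGI
    [2, 2, 1, 3, 2, 2],   # CPI
    [3, 2, 2, 2, 3, 2],   # GCI
    [3, 3, 2, 1, 2, 3],   # ICVS
], dtype=float)

edu = [0, 1, 2]  # indices of the three educational surveys

def always_co_clustered(X, members, k_range):
    """True if `members` share one cluster for every K in k_range."""
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(X)
        if len({labels[i] for i in members}) != 1:
            return False
    return True

# With only 7 hypothetical rows, K can range over 2..4 rather than 2..11.
print(always_co_clustered(scores, edu, range(2, 5)))
```

With the paper's 14 surveys the same loop would run K from 2 to 11; here the toy data limits the usable range.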

Numerically, the vast majority (about 90 percent) of indicators reported by the educational surveys came from the associated background questionnaires, whose questions changed significantly from year to year. But even if only the “achievement” indicators were counted, these also showed relatively high variation, because different age groups of children were included in different years (TIMSS and PIRLS), or because the extent to which particular areas of the curriculum were tested varied (PISA). Another reason for scoring these surveys lower on the reliability criteria was the relatively small number of survey rounds in the last 10 years contributing to both R1 and R2, as described in Appendix 1.

For a detailed criticism of the World Bank Governance Indicators, see Arndt (Citation2008) in this issue.

“In some of the countries however the interviewers, having to work in several local dialects, were provided with the translation in the language of the majority linguistic group while translations into dialects were provided on the spot, that is to say, during the interviewing process by the interviewers themselves” (from http://www.unicri.it/wwd/analysis/icvs/methodology.php).

These cases resulted in lower validity scores, since the validity measures were intended to reflect how well the methodology was explained and justified.

For instance, the authors of the World Bank Governance Indicators admitted that, “Many of the polls and surveys we use suffer from deficiencies such as poorly-worded questions about ill-defined and excessively broad concepts” (Kaufmann, Kraay and Zoido-Lobatón Citation1999), but no mention was made of why certain sources were included and others (presumably) excluded. Similarly, Transparency International stated (in “frequently asked questions” on its website): “TI strives to ensure that the sources used are of the highest quality and that the survey work is performed with complete integrity. To qualify, the data must be well documented and sufficient to permit a judgment on its reliability.” But no further details were given of what this process entailed. The surveys considered most valid on our criteria were those which designed and implemented their own surveys, keeping the whole process under the control of a single organization, though it is true that such an approach could limit ambition in terms of country or indicator coverage.

To do otherwise would have penalized surveys such as the OECD Health Survey that make this information available, since many surveys did not report the country coverage for every indicator.

We also found that the printed version of the WEF Global Competitiveness Report gave a somewhat different range of indicators from the version on CD-ROM. We have used the printed version for our analysis.

Again, the OECD Health Report was among the most transparent in this respect, flagging every instance of repeated data, but many surveys did not make this information available.

For example, systematic coding errors might have occurred as a result of various forms of common-source bias or from faulty assessments of documentary data, such as being taken in by apparently plausible arguments or faulty line-ball judgements.

Clearly, the small numbers of both surveys and coders mean that this exercise of producing confidence intervals is essentially illustrative.

Rank-and-yank involved forced ranking of all corporation employees (following the logic of tournament theory as described earlier) followed by dismissal of the lower-ranked employees (Osborne and McCann Citation2004).

For the term “planning mood,” see Hogwood and Gunn Citation1984, 33. The term denotes a period, epitomized by the 1961 Plowden Report, that involved more systematic ways of ordering priorities and linking public spending to resources.

As in the cases of Global Integrity and the DIAL analysis of household experience of corruption in Africa and Latin America, discussed by Arndt in this issue.

“Gresham's Law” (named after Sir Thomas Gresham, 1519–79) is commonly stated as saying “bad money drives out good” in circumstances where there are two or more forms of money in existence which the law requires to be accorded the same face value.

Indicators were defined as the lowest non-aggregated level of data reported in the survey or methodology papers. Examples of indicators: questions asked in a survey; data provided in response to a request by the researching body or obtained from national or international sources; and performance in educational tests administered by the research body, broken down by subject area and, where applicable, by age.

Because of the large number of indicators in the OECD Health Data, these indicators were sampled at random, stratified by “Chapter” or broad area of healthcare performance (e.g. medical technology, life expectancy). This provided a sample of 59 indicators (one from each of the 59 Chapters) from a total indicator population of about 600.
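The sampling scheme described in this note can be sketched in a few lines. This is an illustrative sketch only: the chapter names and indicator lists below are hypothetical stand-ins for the OECD Health Data structure, not its actual contents. The key point is that drawing one indicator at random from each chapter yields a sample whose size equals the number of chapters.

```python
# Minimal sketch of stratified sampling: one indicator drawn at random
# from each "Chapter". Chapter names and indicators are hypothetical.
import random

indicators_by_chapter = {
    "Medical technology": ["MRI units", "CT scanners", "Radiation therapy"],
    "Life expectancy": ["At birth", "At age 65", "At age 80"],
    "Health expenditure": ["Total % of GDP", "Public share", "Per capita"],
}

def stratified_sample(strata, seed=0):
    """Pick one indicator at random from each stratum (chapter)."""
    rng = random.Random(seed)
    return {chapter: rng.choice(items) for chapter, items in strata.items()}

sample = stratified_sample(indicators_by_chapter)
# One indicator per chapter: sample size equals the number of chapters.
print(len(sample))  # → 3
```

In the study itself the same logic applied over 59 chapters would yield the 59-indicator sample from a population of about 600.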


Notes on contributors

Christopher Hood

Christopher Hood ([email protected]) is Gladstone Professor of Government and Fellow of All Souls College Oxford and Director of the Economic and Social Research Council's Research Programme on Public Services: Quality, Performance, Delivery. He specializes in the study of executive government, regulation and public-sector reform.

Ruth Dixon

Ruth M. Dixon is Project Assistant to the Programme Director of the Economic and Social Research Council's Research Programme on Public Services, Department of Politics and International Relations, University of Oxford.

Craig Beeston

Craig Beeston was educated at the Universities of Manchester and Oxford, and completed his D.Phil. in Modern History at University College, Oxford in 2007. Since 2005 he has been engaged in research projects funded by the ESRC Public Services Programme and the Centre for Analysis of Risk and Regulation at the London School of Economics and Political Science, producing papers with Professor Christopher Hood and others on the methodology of public service ranking exercises, Britain's international ranking in public service performance, and the analysis of blame management strategies adopted by office-holders during political crises. He now works in the public sector.
