Introduction

International Large Scale Assessments: Current Status and Ways Forward

At about the same time that Scandinavian Journal of Educational Research was founded in 1956, the International Association for the Evaluation of Educational Achievement (IEA) was established. This took place in 1958 at the United Nations Educational, Scientific and Cultural Organization (UNESCO) Institute for Education in Hamburg, where an international group of researchers was invited to discuss the possibility of conducting international comparative studies on student achievement and its determinants. One of the advantages to be gained by such an approach was that it would allow study of aspects of educational systems that are fixed within countries but which vary across countries. Following an international pilot which demonstrated the practical challenges to be manageable and the results to be meaningful, a full-scale study on mathematics was conducted. This study was followed by studies in other subject matter areas and in some areas there was a second round of investigations.

The design and conduct of these studies were a pioneering effort that required the identification and resolution of a wide range of difficult methodological challenges, concerning, among other things, theoretical frameworks, the definition of internationally comparable populations, principles of sampling and the treatment of missing data, the development of instruments to measure knowledge and skills in ways that supported valid comparisons, and models for the analysis and reporting of data (Husén, 1979). The funding of the studies was another great challenge, and cooperation between research funding organizations and administrative bodies of the participating countries was typically required to resolve the funding issues.

While progress was made in tackling many of the theoretical, methodological and practical challenges, stumbling blocks were also encountered. The most severe was that it proved difficult to get dependable answers to the question of greatest interest, namely how different factors causally influence educational outcomes. The cross-sectional survey design used in all the studies allowed only for descriptions of associations among variables; it did not support causal inference concerning the determinants of educational outcomes (Husén, 1979).

The IEA studies conducted during the period 1960–1990 may be seen as a first generation of international studies, characterized by very significant researcher involvement in the development of methodology and in the conduct and reporting of the studies (Gustafsson, 2008). A subsequent generation of studies placed a larger focus on describing the outcomes achieved by different educational systems. This was partly due to a shift in focus from input factors to the output of education, and partly to great improvements in the quality and usefulness of the results from the International Large-Scale Assessments (ILSAs). These improvements were largely due to developments in modern test theory, which allowed the use of many more test items and made it possible to equate measurements over time. This supported investigations of changes in levels of achievement within countries, without involving other countries. The first study to take advantage of the newly developed techniques was the Third International Mathematics and Science Study (TIMSS), which was conducted in 1995 and has since been repeated on a four-year cycle. This study was soon followed by others with the same basic design. The most prominent among these is PISA (Programme for International Student Assessment), a study of mathematics, science and reading which started in 2000 and is repeated on a three-year cycle. PISA is conducted by the Organisation for Economic Co-operation and Development (OECD), and in addition to all OECD countries, a large number of non-OECD countries participate.

While the ILSAs initially recruited a relatively limited number of European countries as participants, the number of participating countries has increased and recruitment has increasingly become world-wide. The largest studies (PISA and TIMSS) nowadays involve some 60 to 70 participants, and the IEA studies in particular have covered a wide range of subject matter areas. The main reason for the growing interest in the ILSAs is probably the great importance ascribed to knowledge and skills for the well-functioning of both societies and individuals.

The second-generation ILSAs may be described as advanced machinery for producing data on educational systems. These data have two main uses: for purposes of evaluation and policy discussion within participating countries, and as an infrastructure for research. During the close to 30 years that second-generation ILSAs have been conducted, a large amount of data has been collected, along with experiences both good and bad. The ILSA phenomenon has been harshly criticized by some researchers; the data have been over-used and over-interpreted by others; while yet others seem almost afraid to touch ILSA data at all. In spite of these and other problems, it seems likely that ILSAs will continue for the foreseeable future. However, it is essential that educational researchers take advantage of the experience gained so far and develop the international studies in directions that make them more useful for purposes of policy and research. The four articles included in this special section all relate, in different ways, to the future development of ILSAs. The articles are briefly presented here.

Article 1: Lessons Learned from PISA: A Systematic Review of Peer-reviewed Articles on the Programme for International Student Assessment

The article by Hopfenbeck et al. presents results from a synthesis of published research based on PISA data. A search in bibliographic databases identified more than 600 articles published in English in international journals in different disciplines. This material was categorized and coded along several different dimensions intended to provide information about, among other things, (1) how PISA has contributed to research on educational topics, (2) how it has stimulated investigations and discussions of methodological issues in relation to ILSAs, and (3) how it has generated debate and research on the impact of ILSAs on educational policies.

The largest number of articles reported results from analyses of PISA data. Hopfenbeck et al. classified these into 16 different themes, such as socioeconomic characteristics, teacher characteristics and instructional practices, immigration/language, and bullying, to mention just a few. In their article they selected only the largest category of articles for a more detailed review, namely the theme of inequalities related to socioeconomic background. The studies show, among other things, that early forms of selection such as tracking, grade repetition and school choice affect the level of socioeconomic status (SES)-related inequality across countries.

A considerable number of articles raise critiques of different technical and methodological aspects of the PISA study. These concern, among other things, the definition of the constructs represented in the PISA achievement tests, the design of the student self-report questionnaires, the principles for defining the student populations to be sampled, participation rates, methods for scaling the achievement test items, measurement bias, translation and language effects, and curriculum fairness. Many of these issues are common to all ILSAs, so much of the research reported on them follows long traditions of investigation and debate.

A considerable number of articles focus primarily on the effect of PISA on policy and governance, including the potential mechanisms that drive PISA's influence on educational policies. One suggested mechanism is that the OECD's promotion of the use of PISA data encourages educational systems to rely on external authorities such as the OECD for knowledge production and policy guidance.

Hopfenbeck et al. observe that the different groups of articles differ with respect to the usefulness and value they ascribe to the research done on PISA data. The articles presenting secondary analyses confidently rely on PISA data as a foundation from which to generate new knowledge. At the same time, a number of articles bring forward technical and methodological arguments for researchers to be cautious in making inferences on the basis of PISA data, and articles critical of the policy impact of PISA warn policy-makers against relying on PISA when making educational reforms. The main conclusion thus is that even though PISA definitely contributes to the advancement of educational research, it is necessary to be aware of the limitations of this research and of the need for caution when using it to inform educational policy.

Article 2: Improving the Comparability and Local Usefulness of International Assessments: A Look Back and a Way Forward

In the second article, Rutkowski and Rutkowski first identify several technical and methodological limitations of the ILSAs, and they then suggest several ideas for how these limitations could be addressed. One main problem is that the statistical procedures that form the basis of the measurement of achievement in the ILSAs typically assume that measurement is invariant across educational systems. However, many studies have demonstrated that this assumption of cross-cultural measurement homogeneity is not tenable, and that measurement heterogeneity must be accepted as a fact. As one solution to the problem that test and questionnaire items function differently in different countries, Rutkowski and Rutkowski propose that the strict invariance assumption be relaxed and that a more reasonable assumption of partial invariance be adopted instead. Under this assumption only a limited set of items need to be invariant, while the measurement properties of the other items are allowed to vary across countries. Statistical techniques which support the application of partial invariance to large-scale data have recently been developed, so this proposal can be evaluated with existing data.
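The logic of partial invariance can be illustrated with a small simulation. The sketch below is not taken from Rutkowski and Rutkowski's article; it assumes a simple linear measurement model with unit loadings, two hypothetical countries, and four items, of which one has a country-specific intercept (i.e., is non-invariant). Comparing the countries on all four items overstates the achievement gap, while restricting the comparison to the invariant anchor items recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large samples, so estimates are close to population values

# True latent achievement means in two hypothetical countries
mu_a, mu_b = 0.0, 0.5

# Four items with unit loadings and these intercepts
tau = np.array([0.0, 0.2, -0.1, 0.3])
# Item 4 is non-invariant: in country B its intercept is shifted upwards
shift_b = np.array([0.0, 0.0, 0.0, 0.4])

eta_a = rng.normal(mu_a, 1.0, n)  # latent achievement, country A
eta_b = rng.normal(mu_b, 1.0, n)  # latent achievement, country B

y_a = tau + eta_a[:, None] + rng.normal(0.0, 0.5, (n, 4))
y_b = tau + shift_b + eta_b[:, None] + rng.normal(0.0, 0.5, (n, 4))

# Naive comparison over all items: inflated by the non-invariant item
naive_gap = (y_b.mean(axis=0) - y_a.mean(axis=0)).mean()

# Partial invariance: compare countries only on the invariant anchor items
anchors = [0, 1, 2]
partial_gap = (y_b[:, anchors].mean(axis=0) - y_a[:, anchors].mean(axis=0)).mean()

print(naive_gap)    # roughly 0.6: the true gap plus a share of the item bias
print(partial_gap)  # close to the true gap of 0.5
```

In practice the invariant anchor set is not known in advance and must itself be established empirically, which is part of what makes the proposal methodologically demanding.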

However, Rutkowski and Rutkowski also propose another approach to the issue of cross-cultural measurement invariance which is particularly relevant for the non-cognitive scales. They observe that measurements may work differently across different regions of the world, such as the Nordic, Asian, and Latin American regions, but that differences are smaller among countries within a region. In order to take regional differences into account in a constructive manner, they propose that different sets of items be developed for different regions. This approach would allow for the construction of scales which could satisfy different regional needs for information and which also have sound measurement properties.

Article 3: The Contribution of International Large-scale Assessments to Educational Research: Combining Individual and Institutional Data Sources

In the third article Strietholt and Scherer discuss uses of ILSA data for purposes of educational research. They argue that cross-country analyses of ILSA data have a great potential to generate knowledge about issues related to educational policy at the institutional level, as well as about phenomena at lower levels, such as the school, classroom and home. However, they also make the important point that unlike most other data sets in educational research, ILSA data may be combined across studies, cycles and grade levels in numerous different ways, and they can often also be combined with data from other national and international sources. This allows for powerful approaches to investigate research questions that cannot be addressed with data from a single study.

Strietholt and Scherer take as their starting point a previously developed framework of comparative education research which identifies three dimensions along which studies may vary: geographical location, demographic groups, and aspects of education and society. They then review seven published studies, showing how these fit into the framework and the different ways in which they combine data from different sources. For example, they review studies on the effect of educational outcomes on economic growth, the consequences of technological change for reading performance, and the impact of tracking on educational inequality. They demonstrate that in all these studies it is the possibility to observe and exploit variation in educational and societal features at the level of countries that makes it possible to address important research questions. However, another important advantage they identify is that combining ILSA data across outcome domains, educational stages, and time points allows for analytic strategies that can support causal inference by eliminating bias from unobserved confounding variables.
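One such strategy can be sketched with a simulation. The example below is a generic illustration, not a reproduction of any of the reviewed studies: it assumes an unobserved student-level confounder (say, home support) that affects achievement in two domains, and a domain-specific input (say, instruction quality) that is correlated with that confounder. Because each student is observed in both domains, differencing a student's two scores eliminates the confounder, so the within-student regression recovers the true effect while the naive regression does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # students

# Unobserved student background (the confounder)
u = rng.normal(0.0, 1.0, n)

# Instruction quality in each domain, correlated with background
q_math = 0.5 * u + rng.normal(0.0, 1.0, n)
q_read = 0.5 * u + rng.normal(0.0, 1.0, n)

beta = 0.3  # true effect of instruction quality on achievement
math = beta * q_math + u + rng.normal(0.0, 1.0, n)
read = beta * q_read + u + rng.normal(0.0, 1.0, n)

# Naive cross-sectional regression: the slope is biased upwards by u
naive = np.polyfit(q_math, math, 1)[0]

# Within-student difference across the two domains removes u entirely
within = np.polyfit(q_math - q_read, math - read, 1)[0]

print(naive)   # noticeably larger than beta
print(within)  # close to the true beta of 0.3
```

The within-student estimate is unbiased here only because the confounder affects both domains equally; domain-specific confounding would survive the differencing, which is why such designs still require careful justification.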

Article 4: Development of School Achievement in the Nordic Countries during Half a Century

The aim of the article by Gustafsson and Blömeke is to study changes in levels of achievement in reading literacy and mathematics in the Nordic countries, using data from the oldest studies to the most recent ones, which means that a time span of about 50 years is covered. In addition to data from the IEA studies and PISA, data from the OECD PIAAC (Programme for the International Assessment of Adult Competencies) study formed the basis of the analyses. PIAAC is a cross-sectional study covering ages 16 to 65. It was hypothesized that the observed performance difference between an age cohort for a country and the international mean for the same age cohort would reflect the level of performance at the end of compulsory school. Partial support was obtained for this hypothesis, so the PIAAC data were used as a main source of information about performance differences across 50 years for Denmark, Finland, Norway and Sweden.

The results for reading literacy showed increases in level of performance for those who left compulsory school from the mid-1960s to the mid-1980s and also that there were relatively small performance differences between the countries during this period. However, after this period, Finland's performance level increased even further, while in the other countries levels of performance stagnated or declined. By the mid-1990s Finland outperformed the other Nordic countries by a wide margin, and at this time the Nordic countries also had levels of performance well above the international mean.

These patterns of common development in the Nordic countries, as well as the quite dramatic difference between Finland and the other Nordic countries, are likely to be due both to societal developments and to deliberate changes in the educational systems. Gustafsson and Blömeke discuss which changes may explain these patterns of general and differential performance development.

Ways Forward

It is not likely that the currently well-established ILSAs, such as PISA, will change in fundamental ways in the foreseeable future, even though changes in technologies of data collection will transform the actual conduct of the studies. The trend-lines of within-country development of achievement in areas such as literacy and numeracy are too important a source of information to abandon, and it is difficult to see how they could be replaced by any other system for describing achievement changes within countries. However, the dramatic changes in information technology are likely to have an impact both on the nature of literacy and numeracy skills and on the techniques used to measure these skills, which creates challenges in keeping the meaning of the measures invariant over time. These challenges add to the already known difficulties of achieving measurement invariance across countries and cultures. It seems essential that efforts be made to come to grips with these challenges through the development of improved measurement techniques and better methodology for modelling data. The considerations and proposals made by Rutkowski and Rutkowski are likely to be useful in this work.

As was demonstrated in the review by Hopfenbeck et al., the number of published studies that have taken advantage of the available ILSA data to conduct research in different areas is very large indeed. Their review of studies focusing on the determinants of the relations between SES and achievement also demonstrates that great progress has been made in understanding these relations. The sheer amount of data offered by the ILSAs is an important factor behind this progress, but it is also clear that the ILSA data are of higher quality than can typically be achieved in any single research project.

This does not, of course, mean that we should be satisfied with the current level of quality of the ILSA data, or that the methodological challenges encountered in ILSAs can be disregarded. On the contrary, the point is that these studies offer excellent opportunities for research on general methodological issues, both on data from field trials (e.g. Kuger, Klieme, Jude, & Kaplan, 2016) and on the finally collected data. If sufficient attention is paid to such research, the ILSA data may be an important source of methodological innovation, to the benefit not only of comparative research on educational achievement but also of other areas of social and behavioural research.

As is demonstrated by Strietholt and Scherer, the value of ILSA data for purposes of research may be increased considerably by combining data from different ILSA projects with one another, as well as by combining ILSA data with official statistics and register data. Given the ever-increasing number of data sources, such an approach offers virtually unlimited possibilities for creative combinations of data which allow the investigation of phenomena in several different disciplinary areas.

In conclusion, even though the ILSAs certainly are not free of problems, they do have the potential both to resolve at least some of these problems, and to provide a solid basis of data for investigating phenomena within education and other areas.

References

  • Gustafsson, J.-E. (2008). Effects of international comparative studies on educational quality on the quality of educational research. European Educational Research Journal, 7(1), 1–17. https://doi.org/10.2304/eerj.2008.7.1.1
  • Husén, T. (1979). An international research venture in retrospect: The IEA surveys. Comparative Education Review, 23, 371–385. https://doi.org/10.1086/446067
  • Kuger, S., Klieme, E., Jude, N., & Kaplan, D. (Eds.). (2016). Assessing contexts of learning: An international perspective. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-45357-6
