School Effectiveness and School Improvement
An International Journal of Research, Policy and Practice
Volume 35, 2024 - Issue 1
Research Articles

Quality of teaching at secondary schools in Nicaragua, South Korea, and the Netherlands

Wim J. C. M. van de Grift, Seyeoung Chun, Okhwa Lee & Deukjoon Kim
Pages 73-93 | Received 04 Jan 2023, Accepted 12 Feb 2024, Published online: 20 Feb 2024

ABSTRACT

What are the differences in teaching quality between countries with large cultural and economic differences, such as Nicaragua, South Korea, and the Netherlands? Can these differences be bridged with training programmes? In Nicaragua, South Korea, and the Netherlands, respectively 271, 407, and 144 English and mathematics teachers in secondary education were observed by specially trained observers. The difference in teaching quality between Nicaraguan teachers and teachers in South Korea and the Netherlands is about 1.5 SD. The difference between Dutch and South Korean teachers is about .13 SD. After 160 hr of teacher training, combined with training of observers in reliable observation, the teaching skills of the Nicaraguan teachers grew by slightly more than 1 SD. Further research should check whether growth in the teaching skills of teachers in developing countries is accompanied by growth in the learning performance of their students.

Introduction and research questions

In 2007 we reported on differences in the quality of mathematics lessons in elementary education in England, Flanders (Belgium), Lower Saxony (Germany), and the Netherlands (van de Grift, 2007). In that study we found that only a few percent of the variance in quality between teachers could be attributed to differences between these four European countries. These four countries are among the wealthiest in Europe and do not differ greatly in terms of gross domestic product (GDP). An interesting question is whether large differences in the quality of teaching exist between countries with more divergent cultures but a similar level of prosperity. The differences between elementary school teachers in the Netherlands and teachers in an Asian culture such as South Korea are also small or insignificant, with an average effect size below .20 (van de Grift et al., 2023). The same holds for the differences between secondary school teachers in the Netherlands and South Korea (van de Grift et al., 2017). Another question is whether any differences in the quality of teaching can be made up through teacher training and coaching.

Two grants from two different countries gave us the opportunity to set up this research. One grant was from the Ministry of Education, Culture and Science of the Netherlands and was aimed at studying the quality of teaching in the northern Netherlands. The second grant was from the South Korean Research Fund. This project is an ODA (Official Development Assistance) project, conducted under a government agreement between the Nicaraguan Ministry of Education and KOICA (KOrea International Cooperation Agency) to address structural inequity by providing equal access to quality education.

According to the World Bank database (Note 1), in 2022 the nominal GDP per capita was $2,255 in Nicaragua, $32,254 in South Korea, and $55,985 in the Netherlands. It must be more difficult for Nicaragua to spend a great deal of money on the quality of its teaching than for South Korea and the Netherlands. It is interesting to know whether the quality of teaching in a country with a low GDP differs from that in countries with a high GDP. It is even more important to know whether differences in the quality of teaching can be made up with a relatively simple but intensive intervention. For this study we used the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument (van de Grift, 1994). Two research questions are central to this study:

  1. What are the differences in the quality of teaching in countries with large cultural and economic differences, such as Nicaragua, South Korea, and the Netherlands?

  2. Can these differences be bridged with training programmes?

Theoretical and empirical background

We started this study because we wanted to know how we could help teachers improve their skills. We provide a brief overview, first of some background theories of teachers’ professional development and coaching, and second of some observational tools for analysing teaching skills.

Five approaches can be distinguished in the field of teachers’ professional development. The first is reflective practice. This approach emphasises the importance of reflection in teachers’ professional development and encourages teachers to critically analyse their practices and to keep learning and growing through reflection (Schön, 1983). The second is collaborative professional development, which suggests that professional development is more effective when teachers work together, sharing their knowledge and expertise and learning from each other (Little, 1990). The third is continuous professional development (CPD), which holds that professional development should be an ongoing process rather than a one-time event; it emphasises lifelong learning and continuing growth (Guskey, 2000). The fourth is action research. In this approach, teachers investigate and solve problems in their own practice through a cycle of planning, acting, observing, and reflecting that fosters continuous improvement (Kemmis & McTaggart, 1988). The last is individualised professional development, which tailors professional development to the individual needs and goals of each teacher, taking their unique strengths and weaknesses into account (Timperley et al., 2007). The effectiveness of these approaches should be demonstrated by research based on repeated observations that provides convincing evidence of whether teachers actually improved their teaching. Although there are many publications on these approaches, we could not find such evidence for any of them, with the exception of the individualised professional development approach. In our study we assess the effectiveness of individualised professional development by examining how much teachers can grow under the influence of coaching based on repeated observation with a reliable and valid observation instrument.

Coaching is important for the professional development of teachers. Four major theories or approaches are known so far. The first is cognitive coaching, which aims to improve teachers’ thinking processes by helping them reflect on their practice and make informed decisions based on their analyses; the emphasis is on developing teacher autonomy and self-direction (Costa & Garmston, 2002). The second is solution-oriented coaching, which involves helping teachers identify and work on solutions rather than focusing on problems; it aims to promote a positive outlook and to develop actionable strategies for achieving specific goals (Grant, 2012). The third approach is transformational coaching, which involves working with teachers to transform their beliefs, values, and attitudes; it aims to promote deep, intrinsic change that leads to more effective educational practices (Aguilar, 2013). We have not been able to find any solid empirical material about the effects of these three coaching approaches on the growth of teachers’ skills. The fourth approach is instructional coaching, in which coaches improve teachers’ teaching practices by working with them to implement evidence-based strategies in their classrooms; the goal is to improve student performance through more effective teaching (Knight, 2007). Several reviews and meta-analyses have found positive effects of feedback in general on student learning (Black & Wiliam, 1998; Fuchs & Fuchs, 1986; Graham et al., 2015; Hattie & Timperley, 2007; Kluger & DeNisi, 1996). However, some forms of feedback appear to be much more powerful than others. In the most effective forms, learners receive concrete instructions during skill practice and/or feedback that is clearly related to the goals to be achieved (Hattie & Timperley, 2007). Feedback is more effective when it contains information about what was good and why, and when it builds on prior practice.

In a meta-analysis of 55 American, two Chilean, and three Canadian experimental studies, Kraft et al. (2018) found an average effect of 49% of a standard deviation for different forms of coaching. For the 14 studies in secondary education, the mean effect size was 47% of a standard deviation. Deinum et al. (2018) found effect sizes between 29% and 51% of a standard deviation in teaching growth among 237 Dutch secondary school teachers who were observed four times and coached three times in their zone of proximal development. In another Dutch study, teachers in Dutch elementary schools were also coached in their zone of proximal development; after the coaching, slightly higher effect sizes of 29%–76% of a standard deviation were found (van den Hurk et al., 2016). In a Dutch teacher training programme for primary education, experiments were carried out in which trainee teachers’ coaching in the zone of proximal development was extended with attendance at five lectures on the relationship between the quality of teaching and student performance. With coaching and lectures, effect sizes in student teacher growth ranged from 81% to 109% of a standard deviation (Tas et al., 2018). In another Dutch study, effect sizes of 33%–51% were found among student teachers who did not attend lectures but did receive coaching in their zone of proximal development (Tas et al., 2019).

The final goal of teaching is to increase educational effectiveness and students’ learning gains. There are various models of educational effectiveness that relate the teaching process to the performance of students. These models use a large number of different observation instruments (cf. Dobbelaer, 2019). Not all of these instruments come with sufficient information about the origin of their items and their reliability and validity. It is very important that observation instruments meet the following conditions: The items should be derived from the results of studies on the effectiveness of teacher behaviour on student achievement; the instrument must meet elementary requirements of reliability, validity, and standardisation; and, last but not least, observers observing the same lesson of the same teacher must arrive at roughly the same conclusions. Various models and observation tools for educational effectiveness have become well known: the dynamic model of educational effectiveness (Creemers & Kyriakides, 2006), the framework for teaching (FFT) developed by Charlotte Danielson in 1996 (Danielson, 2007), the focused teacher evaluation model from 2017, developed by Robert Marzano and Beverly Carbaugh (Carbaugh et al., 2017), and the ICALT instrument, developed between 1989 and 1994 by Wim van de Grift (van de Grift, 1994). The latter is used in the research reported in this article.

The ICALT observation instrument

The ICALT observation instrument was developed and tested for reliability and validity in several studies (van de Grift, 1994, 2007; van de Grift & Lam, 1998). The instrument was initially used by the Dutch Inspectorate of Education for evaluating the quality of teaching in elementary education, but it can be and is also used in secondary education. In its current version, the instrument contains six domains: safe learning climate, classroom management, clear and structured instruction, use of activating teaching methods, differentiation of instruction, and teaching learning strategies. The instrument includes 32 high-inferential indicators and more than 100 low-inferential items (see online Appendix 1). The 152 high- and low-inferential items are based on the results of the original studies found in reviews of studies on the effectiveness of teaching on student performance (Aaronson et al., 2007; Cotton, 1995; Creemers, 1991, 1994; Ellis & Worthington, 1994; Hanushek & Rivkin, 2010; Hattie, 2009, 2012; Levine & Lezotte, 1990, 1995; Marzano, 2003; Muijs & Reynolds, 2010; Purkey & Smith, 1983, 1985; Sammons et al., 1995; Scheerens, 1989, 1992, 2008; van de Grift, 1985, 1990; Walberg & Haertel, 1992; Wright et al., 1997). van de Grift (2007) gives a detailed, multi-page overview of the original research publications on which the items of the ICALT instrument are based.

The scales of the ICALT observation instrument have sufficient reliability (Cronbach’s α ≥ .70); the interobserver reliability is adequate (agreement on sufficient/insufficient scores ≥ 76%); and the predictive validity for student engagement is medium to large, with R ≥ .36 (van de Grift, 2007, 2014, 2021a, 2021b; van de Grift et al., 2011). In elementary education, the predictive validity for student achievement after correction for the socioeconomic background of the students was significant (β = .16; van de Grift & Lam, 1998). A follow-up study in elementary education showed that the quality of the teaching-learning process as measured by a forerunner of the current ICALT instrument was significantly related (γ = .39) to the average results on various pupil tests taken in different school years, even after correcting for pupil background characteristics and school environment characteristics (van de Grift & Houtveen, 2006). In several studies we explored whether the ICALT instrument could be used in a reliable and valid way in other European countries (van de Grift, 2007, 2014; van de Grift et al., 2023). Since 2014, there have also been several studies on the reliability and validity of the ICALT observation instrument in secondary education (van de Grift et al., 2014; van der Lans et al., 2015). The instrument has been used in secondary schools in several international comparative studies in Europe, Africa, and Asia (Maulana et al., 2021; van de Grift et al., 2017).

Finding a teacher’s zone of proximal development with the ICALT scales

Some domains of the ICALT observation instrument are relatively easy to master for teachers, for example, creating a safe and stimulating learning climate. Other domains are rather difficult to master for teachers, for example, teaching learning strategies and differentiating teaching.

Table 1 shows the percentage of Dutch secondary school teachers that score >2.5 on each of the six scales. A score of >2.5 means that at least 65% of the items of the scale are scored 3 or 4, which implies that the observed teacher shows a “more than sufficient” mastery of the domain. To get a deeper insight into the difficulty of the domains, we also computed (with WINMIRA) β parameters of the domains. This β parameter expresses the difficulty of a domain in the dichotomous Rasch model.

Table 1. Difficulties of the six domains of the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument.

Guttman (1944) developed a psychometric technique in which items are arranged in a hierarchical order, so that someone who masters one item will also master all lower order, easier items. We use this technique for the six domain scores on the ICALT observation instrument. A few examples: If someone has a score of >2.5 on three domains, then for a good fit of the Guttman model these should be the three easiest scales (i.e., safe learning climate, efficient classroom management, clear and structured instruction). If a teacher has a score of >2.5 on five domains, these must be the first five domains, and not the sixth domain.

Table 2 shows that this Guttman pattern occurs in 83.1% of the 237 Dutch cases. This can help us detect the zone of proximal development of the observed teacher. For these 83.1% of the observed teachers, the zone of proximal development is simply the first domain where the observed teacher scores below 2.5.

Table 2. Fit of the Guttman pattern with the six domains of the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument.

Suppose we have an observed teacher, Peter, with the following scores on the six domains (cf. Table 3). This teacher scores below 2.5 on the domain “Differentiating teaching”, so this seems to be his zone of proximal development. When we want to coach this teacher, we have to look at his scores on the scale “Differentiating teaching” on the original form. As Table 4 shows, Items 24, 25, and 26 are scored below 3 for this teacher. The items themselves are high inferential, that is, too abstract to offer any help for coaching. When we want to coach this teacher, it is better to use the low-inferential examples of good practice that belong to each item.

Table 3. Example of the scores of teacher Peter.

Table 4. Scores on “Differentiating teaching” of teacher Peter.
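The detection logic just described can be summarised in a short sketch. The difficulty order of the six domains below is assumed from the text and Table 1, and Peter’s scores are hypothetical, since Table 3 is not reproduced here; this is an illustration, not the authors’ software.

```python
# A minimal sketch of the Guttman check and the zone-of-proximal-development
# lookup. Domain order (easiest -> hardest) is an assumption based on the text.

DOMAINS = [
    "Safe learning climate",
    "Efficient classroom management",
    "Clear and structured instruction",
    "Activating students",
    "Teaching learning strategies",
    "Differentiating teaching",
]

THRESHOLD = 2.5  # a domain counts as mastered when its mean score is > 2.5


def fits_guttman(scores):
    """True if every mastered domain precedes every non-mastered one."""
    mastered = [s > THRESHOLD for s in scores]
    return all(mastered[i] or not mastered[i + 1] for i in range(len(mastered) - 1))


def zone_of_proximal_development(scores):
    """First domain (in difficulty order) the teacher does not yet master."""
    for name, score in zip(DOMAINS, scores):
        if score <= THRESHOLD:
            return name
    return None  # all six domains mastered


peter = [3.4, 3.2, 3.0, 2.8, 2.6, 2.1]  # hypothetical domain means
if fits_guttman(peter):
    print(zone_of_proximal_development(peter))  # -> Differentiating teaching
else:
    print("Pattern deviates from the Guttman model; a second observation is needed.")
```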

In 10.6% of the cases, the Guttman pattern is nearly perfect (cf. Table 2). In these cases we find one of the patterns shown in Table 5. The zone of proximal development is again the first domain with a score below 2.5. We must of course be careful when the score is close to 2.5. As in the cases with a perfect Guttman pattern, there should always be a discussion with the observed teacher about the following questions:

  • Was the lesson representative?

  • Did the teacher get a chance to show all their skills?

Table 5. Cases with an almost perfect Guttman pattern (cf. Deinum et al., 2018).

On this basis, it can be decided where the zone of proximal development of the teacher concerned lies. In 6.3% of the cases (cf. Table 2), the pattern deviates strongly from the Guttman model. Determining a teacher’s zone of proximal development is not possible in these cases, and a second observation or even a second observer is needed.

Representativeness of the samples

In Nicaragua, 127 English teachers and 144 mathematics teachers were observed by specially trained observers. In South Korea these numbers were 187 and 220, respectively, and in the Netherlands 67 and 77. Table 6 presents the sample sizes and the accuracy with which they can describe the situation in the population of each of the three countries.

Table 6. Samples of English and mathematics teachers in Nicaragua, South Korea, and the Netherlands.

The Nicaraguan sample of teachers was neatly spread across schools throughout the country; we can regard this sample as representative. The South Korean sample was collected mostly in the Daejeon Metropolitan City area, with some observations in Chungnam and Chungbuk provinces. Nevertheless, the sample can be regarded as fairly representative of all South Korean teachers, considering the unique characteristics of South Korean teacher education and policy. The quality of South Korean teachers is governed centrally to maintain an equal level regardless of region, private or public status, school size, and other conditions: the central government controls pre- and in-service training curricula, qualifications, salary levels, job conditions, and many other aspects of teacher policy (Chun, 2021). Teachers are even rotated among schools periodically to sustain the equity of teacher quality across regions and all other school conditions. The Dutch sample is representative of the three provinces in the north of the Netherlands, but not of all 12 Dutch provinces. A screenshot from the Inspectorate of Education site shows that 23 departments of Dutch schools for secondary education are “very weak” (van de Grift, 2013). Seven of these 23 departments are located in the three northern provinces (van de Grift, 2013). A school is labelled “very weak” when the competent authority does not comply with the legal instructions for education. This occurs when the learning outcomes are unsatisfactory. The learning results of a secondary school are insufficient when the average exam results and the transfer rate (i.e., the percentage of students repeating a year and the percentage of students who transfer to lower education types) are below the level of comparable schools for 3 years. So, we should not expect the teachers in this sample to be representative of the Dutch population. In particular, the average teaching skills on activating students, teaching learning strategies, and differentiation in the northern provinces are somewhat lower than the average scores found in a national sample (Deinum et al., 2018).

Sample accuracy

A good sample should not only be representative but must also be large enough to provide an accurate picture of the situation in the population. For an accuracy of 5% (with an expected result of 50% and a 95% confidence interval), we need 377 observations in each of the countries. This means that when 50% of the respondents in the sample have a sufficient score, the actual percentage in the population lies between 45% and 55% (Davidson, 1959).

The South Korean sample is large enough for this criterion. The Nicaraguan sample is a bit smaller and has an accuracy of 6%. The much smaller Dutch sample in our study has an accuracy of 8%. To find significant differences (α = .05 and power 1 − β = .80) with a sample of n = 144, an effect size (Cohen’s δ) of at least .29 is required.
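Both figures can be reproduced with standard formulas; this is a sketch under stated assumptions, since the article does not spell out its inputs. The 377 observations follow from the usual sample-size formula for a proportion if a finite-population correction for a population of roughly 20,000 teachers is assumed, and the minimum detectable effect size of about .29 follows if the Dutch sample (n = 144) is assumed to be compared with the Nicaraguan sample (n = 271).

```python
import math

Z95 = 1.959964  # two-sided z for a 95% confidence interval
Z80 = 0.841621  # one-sided z for power .80


def sample_size(margin, p=0.5, population=None):
    """Observations needed for a given margin of error around a proportion."""
    n0 = Z95**2 * p * (1 - p) / margin**2          # infinite-population formula
    if population is not None:                     # finite-population correction
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def minimum_detectable_d(n1, n2, z_alpha=Z95, z_beta=Z80):
    """Smallest Cohen's d detectable with alpha = .05 and power = .80."""
    return (z_alpha + z_beta) * math.sqrt(1 / n1 + 1 / n2)


print(sample_size(0.05, population=20_000))      # -> 377, as reported
print(sample_size(0.06, population=20_000))      # -> 264; the Nicaraguan n = 271 meets this
print(round(minimum_detectable_d(144, 271), 2))  # -> 0.29 for the Dutch sample
```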

Psychometric quality of the ICALT observation instrument

An important question is whether the ICALT observation instrument has sufficient reliability (internal consistency) in each of the three countries, Nicaragua, South Korea, and the Netherlands. Other important questions are: Do the items have the same meaning in these three countries, and do the scales have sufficient predictive validity? Positive answers to these questions are necessary before the correlations and average scores found in the different countries and cultures may be compared.

The reliability is determined by calculating Cronbach’s α, also called homogeneity, for each of the countries (Cronbach, 1951). Table 7 presents the results. The reliability coefficients meet (almost) the standard of .70 in all three countries. The occasionally relatively low reliability may be due to the small size and the relative homogeneity of the samples (only mathematics and English teachers). We conclude that we can use the ICALT scales reliably in each of the three countries.

Table 7. Homogeneity (Cronbach’s α) of International Comparative Analysis of Learning and Teaching (ICALT) scales.
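As a sketch of the computation behind Table 7, Cronbach’s α for one ICALT scale can be calculated from a matrix of observations (rows = observed lessons, columns = items of the scale). The data below are made up for illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is an (observations x items) array."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items in the scale
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the scale total
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Hypothetical 1-4 item scores for one ICALT domain across five lessons:
demo = [[3, 3, 4, 3], [2, 2, 3, 2], [4, 4, 4, 3], [3, 2, 3, 3], [1, 2, 2, 1]]
print(round(cronbach_alpha(demo), 2))
```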

The factor structure, the factor loadings, and the intercepts have to be the same across countries before the correlations and mean scores of the three countries can be compared. The tests for these assumptions were carried out with the program Mplus, Version 7.4 (Muthén & Muthén, 1998–2015; see Note 2). Table 8 presents the results.

Table 8. Multigroup confirmatory factor analysis for Nicaragua (271), South Korea (407), and the Netherlands (144).

Norms for acceptable fit are: root-mean-square error of approximation (RMSEA) < .08, and comparative fit index (CFI) and Tucker–Lewis index (TLI) > .90 (Chen et al., 2008; Hu & Bentler, 1999; Kline, 2005; Marsh et al., 2004; Tucker & Lewis, 1973). The configural model checks whether the factor structure is the same in the different countries. The RMSEA, CFI, and TLI found for the configural model are in agreement with the norms. The metric model tests whether the factor structure and the factor loadings are the same across countries; this is necessary before the correlations found in the different countries may be compared. The RMSEA, CFI, and TLI found for the metric model are also in agreement with the norms, so we are allowed to compare the correlations found in the three countries. The scalar model tests whether the factor structure, the factor loadings, and the intercepts are the same across countries; this is necessary for comparing the average scores in the three countries. The RMSEA, CFI, and TLI found for the scalar model are also in agreement with the norms, so we are allowed to compare the average scores as well. Despite the large cultural and economic differences between Europe, Asia, and Middle America, there appears to be no problem in comparing the correlations and average scores of the three countries.
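In standard multigroup CFA notation (ours, not the article’s), the three nested models constrain the measurement model for item $i$ in country $g$,

$$x_{ig} = \tau_{ig} + \lambda_{ig}\,\xi_{g} + \varepsilon_{ig},$$

as follows: the configural model leaves all loadings $\lambda_{ig}$ and intercepts $\tau_{ig}$ free per country; the metric model adds the constraint $\lambda_{ig} = \lambda_i$ for every $g$; and the scalar model additionally imposes $\tau_{ig} = \tau_i$, which is what licenses the comparison of average scores across the three countries.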

To determine the predictive validity, we calculated the correlations between the six ICALT scales and the involvement of the students. The results are presented in Table 9. Following Cohen (1988), we call correlations > .30 medium and correlations > .50 large (cf. Table 10). The correlations between the six ICALT scales and student involvement are the highest in South Korea and the lowest in Nicaragua. Nevertheless, all correlations are medium to large. The students in all three countries pay more attention if their teachers show better skills.

Table 9. Predictive validity of the International Comparative Analysis of Learning and Teaching (ICALT) scales for student involvement.

Table 10. Standards for correlations (Cohen, 1988, p. 82).

Inter-rater reliability

The psychometric quality of the items is only one part of the quality of an observation instrument. Other aspects include the quality of the observers, interaction effects between the observed teacher and the observer, and the moment of observation. Smit et al. (2017) discovered that interaction effects between observed teachers and observers explain about 18% of the variance in observations of individual teachers. A study by van der Lans et al. (2016) reveals that at least three different lessons are needed to get a reliable picture of the skill of an individual teacher. In this study we are not interested in the scores of individual teachers but in the differences between the average scores of teachers in different cultures. When we only look at average scores, the effects of the moment of observation and the interaction effects of teachers and observers cancel out. Nevertheless, this study still requires a sufficient level of inter-rater reliability. We can only reliably and validly assess teacher competence with trained observers who are mutually consistent, who have sufficient consensus among themselves about sufficient and insufficient scores, and who have sufficient agreement with an external standard. We studied the reliability of the observations by performing both within- and between-culture analyses with the data of the observers. In the within-culture analysis we answered the questions: Are the observers mutually consistent in their scores, and do observers agree on sufficient/insufficient scores? With the between-culture analyses we answered the question of whether observers from different cultures or regions differ in strictness or leniency.

A high interobserver consistency means that if an observer scores an item highly (or lowly), the other observers should also score that item highly (or lowly). To determine this, we use the intraclass correlation coefficient (ICC; Bartko, 1966; Koch, 1982). In our work we apply the following norms for the ICC: >.70 for sufficient/acceptable and >.80 for good consistency (Cronbach, 1951). The norm of .70 is somewhat more stringent than the standards of Cicchetti (1994) and Koo and Li (2016). We must bear in mind that the ICC is sensitive to the number of observers: the more observers we have, the higher the ICC will be. Table 11 shows the results before and after the training. After the training, the consistency of the 22 Nicaraguan observers, the 14 Dutch observers, and the 18 South Korean observers is good (>.80).

Table 11. Are observers sufficiently consistent in their ratings? Intra-class correlation (ICC).
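A sketch of the consistency ICC from a ratings matrix (rows = rated lesson fragments or items, columns = observers) follows. The article does not state whether the single-rater or the average-rater form is reported, so both are shown; the average-rater form grows with the number of observers, matching the remark above. The data are made up.

```python
import numpy as np


def consistency_icc(ratings):
    """Two-way consistency ICC from an (n targets x k raters) matrix.

    Returns (single-rater ICC(C,1), average-of-k-raters ICC(C,k)).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # targets
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()            # raters
    ss_err = ((x - grand) ** 2).sum() - (n - 1) * ms_rows - ss_cols
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc_single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_average = (ms_rows - ms_err) / ms_rows
    return icc_single, icc_average


demo = [[3, 3, 4], [2, 2, 2], [4, 3, 4], [1, 2, 1], [3, 4, 4]]  # 5 targets, 3 raters
print(consistency_icc(demo))
```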

The average amount of agreement on the score sufficient (3–4) or insufficient (1–2) across all items should be higher than chance (i.e., 50% for dichotomous items). Table 12 presents the percentage of consensus on sufficient/insufficient and the percentage of agreement corrected for chance, Cohen’s κ (Cohen, 1960). Landis and Koch (1977) suggested norms for Cohen’s κ: >.20 is reasonable and >.40 is moderate.

Table 12. Do observers have sufficient consensus about sufficient/insufficient scores? (% consensus about sufficient/insufficient).

After training, the Nicaraguan observers show results (69% consensus) that are better than chance (50%), but they could still be improved somewhat: the percentage of agreement should be >70% and the related Cohen’s κ should be >.40. The Dutch observers show 78% consensus after training, with a moderate Cohen’s κ of .56. The South Korean observers show 76% consensus, also better than chance, with a moderate Cohen’s κ of .52.
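The consensus percentages and Cohen’s κ in Table 12 can be reproduced along the following lines for a pair of observers, with each item dichotomised into sufficient (3–4) versus insufficient (1–2). The ratings below are hypothetical.

```python
def agreement_and_kappa(rater_a, rater_b):
    """Percentage agreement and Cohen's kappa on sufficient/insufficient."""
    a = [score >= 3 for score in rater_a]  # True = sufficient (3-4)
    b = [score >= 3 for score in rater_b]
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal rate of "sufficient":
    pa, pb = sum(a) / n, sum(b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


obs1 = [3, 4, 2, 3, 1, 4, 3, 2, 3, 4]  # hypothetical item scores, observer 1
obs2 = [3, 3, 2, 4, 2, 4, 3, 3, 3, 4]  # hypothetical item scores, observer 2
print(agreement_and_kappa(obs1, obs2))
```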

There should be no large intercultural differences in the observations of the same lesson: when Nicaraguan, South Korean, and Dutch observers observe the same lesson, they should reach more or less the same conclusion. This is the case if the difference in the average scores that the same observed lesson receives from observers from different cultures is small. Table 13 shows that before the training, the differences between the Dutch and the Nicaraguan observers observing the same teacher were negligible according to the norms of Cohen (1988). The differences between the South Korean and the Dutch observers were medium, and the differences between the South Korean and the Nicaraguan observers were small.

Table 13. Are observers just as strict (or lenient) as observers from other cultures (Dutch lesson before training).

Table 14 shows that after the training, the differences between the Dutch and the Nicaraguan observers observing the same videotaped teacher are negligible according to the norms of Cohen (1988). The differences between the South Korean observers and both the Dutch and the Nicaraguan observers are small.

Table 14. Are observers just as strict (or lenient) as observers from other cultures (Dutch lesson after training).

Differences between Nicaraguan, South Korean, and Dutch teachers

Table 15 presents the differences in the average scores of the mathematics and English teachers in the three countries. If the average score on a scale is 3 or higher (good), the average teacher scores ≥75% of the items positively. A score of >2.5 is amply sufficient; in that case, ≥65% of the items of a scale are scored as sufficient. A score of 2 or lower is insufficient.

Table 15. Differences in average scores of English and mathematics teachers in Nicaragua, South Korea, and the Netherlands.

Dutch teachers score slightly but significantly higher on the basic skills (educational climate, classroom management, and clear instruction), while South Korean teachers score slightly higher on advanced teaching skills such as learning strategies and differentiation. The difference of 38% of a standard deviation on activating students in favour of the South Korean teachers is as large as the greater leniency of the South Korean observers. It therefore seems unwise to attach value to this difference, even though it is significant (p = .00).

On the basic skills, the average scores of both the South Korean and the Dutch teachers are above 3 (good). On the advanced skills, the average scores of the South Korean teachers are better than amply sufficient. The Dutch teachers also score amply sufficient on average on activating students, but on learning strategies and differentiation only the South Korean teachers score better than sufficient.

The Nicaraguan teachers score on average amply sufficient on educational climate, and their average scores for classroom management and clear instruction are better than sufficient. They score below sufficient on the three advanced skills: activating students, teaching learning strategies, and differentiation. Student involvement is good in the South Korean and Dutch lessons and amply sufficient in the Nicaraguan lessons. Table 15 shows more detailed results on the average scores and standard deviations of the English and mathematics teachers in Nicaragua, South Korea, and the Netherlands, as well as the effect sizes and significance levels of the differences in average scores for the three countries. The norms for the effect sizes are: negligible <.20; small .20–.49; medium .50–.79; large .80–1.29; very large >1.30 (Cohen, 1988).

Table 15 shows that almost all significance levels of the differences are <.01. This means that the smaller sample of teachers from the Netherlands does not really affect the significance of the differences. Only the difference between the South Korean and the Dutch teachers on clear instruction is not significant (p = .34), with a negligible effect size of .13. The average difference on the six ICALT scales between the Nicaraguan teachers and the teachers in both South Korea and the Netherlands is about 1.5 SD, which is very large. With a medium effect size of .64, the Dutch teachers are better at creating a safe and stimulating climate, and they are a little (ES = .29) better at classroom management than the South Korean teachers. The difference between these two countries in clear and structured instruction is negligible. The South Korean teachers are better than the Dutch teachers in the more advanced teaching skills: learning strategies (ES = .82) and differentiation (ES = .65).
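The effect sizes in this section express mean differences in standard-deviation units; a sketch assuming Cohen’s d with a pooled standard deviation, which the article does not state explicitly, is given below. The summary statistics are hypothetical.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical summary statistics for one ICALT scale in two countries:
print(round(cohens_d(3.2, 0.5, 407, 3.0, 0.6, 144), 2))  # -> 0.38
```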

The differences between the mathematics teachers and the English teachers of the three countries are negligible and not significant for educational climate, classroom organisation, clear instruction, activating students, and learning strategies. On differentiation we found a small (ES = .22) significant difference: the mathematics teachers are a little better at differentiation. We found no differences in the involvement of students between the mathematics and English lessons (see Table 16).

Table 16. Differences in average scores between English and mathematics teachers in Nicaragua, South Korea, and the Netherlands.

Growth of teaching skill of Nicaraguan teachers after coaching

Of the 271 Nicaraguan teachers, 147 were trained and observed a second time after an interval of about 1.5 years (Note 3). The teachers were trained with a programme covering various kinds of knowledge and skills for teaching, which consisted of two parts. The major part, about 2–3 hr per training session, covered ICT-based skills, such as making PowerPoint teaching materials, using different kinds of applications for interactive teaching, and working with the Moodle learning management system (LMS). The second part, about 1–3 hr per session, covered general pedagogical knowledge, such as designing a good lesson plan, creating teaching and learning strategies, and micro-teaching practice. The training curriculum was organised in two tracks (off-line and on-line) at two levels (basic and advanced courses). Each of these four courses consisted of several different subjects, with a total of 40 hr. There were two criteria for successfully completing the curriculum: at least 70% attendance, and at least 80% on quiz-based achievement tests and other appropriate measures, depending on the course descriptions.

Twenty-two experienced Nicaraguan teachers were trained to observe their colleagues reliably and validly with the ICALT observation instrument and to trace the zone of proximal development of the observed colleagues. After the first observation, these observers coached the observed teachers in their zone of proximal development. The teachers were observed again after 1.5 years. Table 17 shows the average scores of the 147 teachers before and 19 months after this training.

Table 17. Differences in average scores of Nicaraguan teachers after 19 months (n = 147).

Looking at Tables 15 and 17, we first conclude that the 147 trained Nicaraguan teachers score about the same as the total group of 271 Nicaraguan teachers of which they are a part. So, there is no reason to think that this subgroup of 147 trained teachers differs from the original group of 271 teachers.

We can conclude that the differences between the first and second observation are significant at the .001 level for all components of the ICALT instrument. This also applies to the difference in the average score on the involvement of the students. Table 17 also shows the differences in effect size. The norms for effect sizes are, according to Cohen (1988): negligible <.20, small .20–.49, medium .50–.79, large .80–1.29, and very large >1.30. The 147 trained Nicaraguan teachers scored on average a full standard deviation higher in the second observation than in the first. Such a large effect of a full standard deviation was also found for Dutch student teachers who received a coaching approach consisting of classroom observation and the assignment and use of appropriate lesson preparation templates, combined with stage-focused feedback at the school and lectures at the teacher training college (Tas et al., 2018, 2019). The Nicaraguan teachers achieved a very large skill growth (ES = 1.35–1.51) in their basic skills. Growth was large on activating students (ES = 1.18), medium (.59) in teaching students how to learn, and small (.42) in differentiation. The involvement of the students of these teachers also grew, with a medium effect size (.64). This growth can be interpreted as the outcome of the Nicaragua-KOICA ODA project, which aimed to improve Nicaraguan secondary teaching skills. It is interesting to see to what extent the average Nicaraguan scores in the second observation still deviate from the South Korean and Dutch average scores. Table 18 gives the overview.
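For the before/after design, the growth can be expressed in the same units. The sketch below assumes the effect size is the mean gain divided by the pooled SD of the two observation rounds (the article does not state which denominator was used) and adds a paired t statistic for significance; the scores are made up.

```python
import math


def paired_growth(before, after):
    """Mean gain in pooled-SD units plus a paired t statistic (df = n - 1)."""
    n = len(before)
    diffs = [a - b for a, b in zip(after, before)]
    mean_gain = sum(diffs) / n

    def sd(xs):
        m = sum(xs) / len(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    pooled_sd = math.sqrt((sd(before) ** 2 + sd(after) ** 2) / 2)
    effect_size = mean_gain / pooled_sd
    t_stat = mean_gain / (sd(diffs) / math.sqrt(n))
    return effect_size, t_stat


pre = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0]   # hypothetical first-observation means
post = [3.0, 3.1, 2.8, 3.4, 2.9, 3.1]  # hypothetical second-observation means
print(paired_growth(pre, post))
```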

Table 18. Differences in effect size with average scores of Nicaraguan teachers after 19 months (n = 147).

On two of the three basic skills, the average scores of the 147 Nicaraguan teachers in the second observation are at the same level as the average scores of the South Korean and Dutch teachers. This concerns the creation of a safe and stimulating learning climate (3.49) and efficient classroom management (3.18); these average scores can definitely be called good. The average scores of the 147 Nicaraguan teachers on clear instruction (2.90) and on activating the students (2.56) are more than sufficient in the second observation, but still lag somewhat behind those of the South Korean and Dutch teachers. On the most advanced skills, teaching students how to learn (1.97) and differentiation (1.91), the average scores of the Nicaraguan teachers remain just below the limit of sufficient, while the South Korean and Dutch teachers score on average (more than) sufficient here. This may be due to the relatively large initial gap in the teaching skills of the Nicaraguan teachers.

For the provision of development aid to low-GDP countries, it is a very important finding that a treatment of 160 hr of teacher training, in combination with proper training of observers in reliable observation and in tracing a teacher’s zone of proximal development, can improve the skills of teachers in countries with a low GDP by more than 1 SD.

In our literature study, we saw that teachers can grow 29%–76% of a standard deviation in teaching skill under the influence of observation and coaching in the zone of proximal development of the observed teacher (Deinum et al., 2018; van den Hurk et al., 2016). Student teachers can grow 81%–109% of a standard deviation when coaching in the zone of proximal development is combined with lectures on the relationship between the quality of teaching and student performance. The growth in teaching skill of the Nicaraguan teachers is thus about the same as the growth of the Dutch student teachers (Tas et al., 2018).

Conclusions

The ICALT observation scales can be used reliably in each of the three countries and present no problems for comparing correlations and mean scores. The correlations of the six ICALT scales with student involvement are medium to large: in each of the three countries, students are more engaged in the lesson if their teachers demonstrate better teaching skills. This is important for the predictive validity of the ICALT observation scales.

The trained observers are sufficiently to highly consistent in their observations in each of the three countries. The observers have reasonable to moderate consensus on whether certain observed behaviour deserves a sufficient score. After training, the Nicaraguan and the Dutch observers show negligible differences when observing the same lesson of the same teacher. The South Korean observers show small differences (ES ≈ .40) from the Nicaraguan and the Dutch observers when observing the same teacher; they seem to be a bit more lenient. Therefore, we must be somewhat careful in interpreting small differences between the observation scores of the South Korean observers, on the one hand, and those of the Nicaraguan and Dutch observers, on the other.

At the start of this study, the average difference on the six ICALT scales between the Nicaraguan teachers and the teachers in both South Korea and the Netherlands was about 1.5 SD, which is very large. The Nicaraguan teachers were coached on the basis of their zone of proximal development, as determined by well-trained observers. The coaching was supplemented with lectures on the relationship between the quality of teaching and student performance, as well as with off-line and on-line training in ICT-based skills, such as making PowerPoint (PPT) materials, and in general teaching skills. In total, the Nicaraguan teachers received a treatment of 160 hr of teacher training, as planned and conducted in the KOICA ODA project for Nicaragua. We found that the skills of the Nicaraguan teachers grew by more than one full standard deviation after this training, which is a large growth.

In summary, we can state that individualised professional development realised by instructional coaching based on the zone of proximal development of the observed teachers yields on average about half a standard deviation of growth in the teaching skills of primary and secondary school teachers. In combination with extra lectures on the relationship between the quality of teaching and student performance, this form of coaching easily yields a gain of a full standard deviation. We know from research by Hanushek (2011) that students of teachers whose teaching skills are a full standard deviation better earn on average $20,000 more in later life. This offers special perspectives for future development work aimed at the quality of teaching. For further research it is important to check whether a (large) growth in the teaching skills of teachers in developing countries is also accompanied by growth in the learning performance and later earnings of their students, and whether these relatively large effects of coaching on growth in teaching skill can also be achieved in other countries with a low GDP.

Supplemental material

Supplemental material for this article (including online Appendix 1) is available on the publisher’s website.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by (1) the Ministry of Education, Culture and Science of the Netherlands, and (2) the South Korean Research Fund through the Study for Improving Teaching Skill by Classroom Observation Analysis (Project number: 2017S1A5A2A03067650) and the KOICA-ODA project for Capacity Building for Teachers in ICT-integrated Secondary Education in Nicaragua 2017–2021.

Notes on contributors

Wim J. C. M. van de Grift

Wim J. C. M. van de Grift (1951), professor emeritus, was full professor in Educational Sciences and director of the Teacher Training Institute at the University of Groningen, and scientific advisor of the Inspectorate of Education in the Netherlands. After his retirement, he started his own company. His research programme focuses on the development of teaching skills, the factors that influence this development, and the relationship with student performance.

Seyeoung Chun

Seyeoung Chun, PhD, professor emeritus, taught at the Department of Education of Chungnam National University, South Korea, from 1997 to 2021. He received his education and PhD from Seoul National University and has remained active in research. He has also worked actively in educational policy, holding several key positions such as Secretary of Education to the President and CEO of the Korea Education and Research Information Service (KERIS).

Okhwa Lee

Okhwa Lee, PhD, professor emeritus, was a full professor at the Department of Education, Chungbuk National University, South Korea, specialising in educational technology, and has been a member of the Presidential council for educational reform. She has rich experience in international research collaborations and in development programmes of the Korean government’s ODA (Official Development Assistance) for developing countries.

Deukjoon Kim

Deukjoon Kim is adjunct professor at the Department of Education, Chungnam National University, Daejeon, South Korea. He was active in the fields of electronics engineering and information security from the late 1980s to the early 2000s. Since then, he has pursued education (majoring in educational technology) and obtained a master’s degree (2006) and a doctorate (2014) in education. He has worked as a professor at private and national universities and has conducted various research projects in the field of educational technology, such as the development and use of educational technology and educational evaluation. Recently, he has been working as an ODA expert in the field of education.

Notes

2 The Mplus program only works if all categories (1–4) of all 35 variables occur in each of the three countries. In the Dutch sample, var1, var2, var3, and var5 did not have a score of 1. In the South Korean sample, var8 and var9 did not have a score of 1. This was solved by recoding the relevant variables from 2 to 1 for the first respondent from these countries. In the Nicaraguan sample, the score 4 was missing for var17, var18, and var32; therefore, we changed the first 3 to 4 for var17, var18, and var32 in the model fit calculations. This recoding was done only in the data file for Mplus, because the program will not run otherwise, and was of course not used for further calculations.

3 As noted at the beginning, this teacher training project was supported by an ODA agreement between KOICA and the Nicaraguan Ministry of Education for 2018–2021. The information given here has been confirmed by the authors who participated in the project.

References

  • Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135. https://doi.org/10.1086/508733
  • Aguilar, E. (2013). The art of coaching: Effective strategies for school transformation. John Wiley & Sons.
  • Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19(1), 3–11. https://doi.org/10.2466/pr0.1966.19.1.3
  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
  • Carbaugh, B., Marzano, R., & Toth, M. (2017). The Marzano focused teacher evaluation model: A focused, scientific-behavioral evaluation model for standards-based classrooms. Marzano Center.
  • Chen, F., Curran, P. J., Bollen, K. A., Kirby, J., & Paxton, P. (2008). An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36(4), 462–494. https://doi.org/10.1177/0049124108314720
  • Chun, S. (2021). 대한민국의 교육기적 [Education miracle in the Republic of Korea]. CNU Press. https://product.kyobobook.co.kr/detail/S000001833217
  • Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. https://doi.org/10.1037/1040-3590.6.4.284
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Costa, A. L., & Garmston, R. J. (2002). Cognitive coaching: A foundation for Renaissance schools. Christopher-Gordon Publishers.
  • Cotton, K. (1995). Effective schooling practices: A research synthesis 1995 update. Northwest Regional Educational Laboratory.
  • Creemers, B. P. M. (1991). Effectieve instructie: Een empirische bijdrage aan de verbetering van het onderwijs in de klas [Effective instruction: An empirical contribution to improvement of education in the class]. Instituut voor Onderzoek van het Onderwijs (SVO).
  • Creemers, B. P. M. (1994). The effective classroom. Cassell.
  • Creemers, B. P. M., & Kyriakides, L. (2006). Critical analysis of the current approaches to modelling educational effectiveness: The importance of establishing a dynamic model. School Effectiveness and School Improvement, 17(3), 347–366. https://doi.org/10.1080/09243450600697242
  • Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
  • Danielson, C. (2007). Enhancing professional practice: A framework for teaching (2nd ed.). Association for Supervision and Curriculum Development.
  • Davidson, H. J. (1959). Accuracy in statistical sampling. The Accounting Review, 34(3), 356–365.
  • Deinum, J. F., Uffen, I., & van de Grift, W. (2018). Het oog van de meester: Lesobservaties om de lespraktijk te verbeteren [The eye of the master: Lesson observations to improve teaching practice]. Rijksuniversiteit Groningen.
  • Dobbelaer, M. J. (2019). The quality and qualities of classroom observation systems [Doctoral dissertation, University of Twente]. Ipskamp printing. https://doi.org/10.3990/1.9789036547161
  • Ellis, E. S., & Worthington, L. A. (1994). Research synthesis on effective teaching principles and the design of quality tools for educators (Technical Report No. 5). National Center to Improve the Tools of Educators.
  • Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208. https://doi.org/10.1177/001440298605300301
  • Graham, S., Hebert, M., & Harris, K. R. (2015). Formative assessment and writing: A meta-analysis. The Elementary School Journal, 115(4), 523–547. https://doi.org/10.1086/681947
  • Grant, A. M. (2012). An integrated model of goal-focused coaching: An evidence-based framework for teaching and practice. International Coaching Psychology Review, 7(2), 146–165. https://doi.org/10.53841/bpsicpr.2012.7.2.146
  • Guskey, T. R. (2000). Evaluating professional development. Corwin Press.
  • Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9(2), 139–150. https://doi.org/10.2307/2086306
  • Hanushek, E. A. (2011). The economic value of higher teacher quality. Economics of Education Review, 30(3), 466–479. https://doi.org/10.1016/j.econedurev.2010.12.006
  • Hanushek, E. A., & Rivkin, S. G. (2010). Generalizations about using value-added measures of teacher quality. American Economic Review, 100(2), 267–271. https://doi.org/10.1257/aer.100.2.267
  • Hattie, J. A. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
  • Hattie, J. A. (2012). Visible learning for teachers: Maximizing impact on learning. Routledge.
  • Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112. https://doi.org/10.3102/003465430298487
  • Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
  • Kemmis, S., & McTaggart, R. (1988). The action research planner (3rd ed.). Deakin University.
  • Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). The Guilford Press.
  • Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://doi.org/10.1037/0033-2909.119.2.254
  • Knight, J. (2007). Instructional coaching: A partnership approach to improving instruction. Corwin Press.
  • Koch, G. G. (1982). Intraclass correlation coefficient. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences: Vol 4. Icing the tails – Limit theorems (pp. 213–217). John Wiley & Sons.
  • Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012
  • Kraft, M. A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research, 88(4), 547–588. https://doi.org/10.3102/0034654318759268
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
  • Levine, D. U., & Lezotte, L. W. (1990). Unusually effective schools: A review and analysis of research and practice. The National Center for Effective Schools Research and Development.
  • Levine, D. U., & Lezotte, L. W. (1995). Effective schools research. In J. A. Banks & C. A. M. Banks (Eds.), Handbook of research on multicultural education (pp. 525–547). Macmillan.
  • Little, J. W. (1990). The persistence of privacy: Autonomy and initiative in teachers’ professional relations. Teachers College Record, 91(4), 509–536. https://doi.org/10.1177/016146819009100403
  • Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Structural Equation Modeling: A Multidisciplinary Journal, 11(3), 320–341. https://doi.org/10.1207/s15328007sem1103_2
  • Marzano, R. J. (2003). What works in schools: Translating research into action. Association for Supervision and Curriculum Development.
  • Maulana, R., André, S., Helms-Lorenz, M., Ko, J., Chun, S., Shahzad, A., Irnidayanti, Y., Lee, O., de Jager, T., Coetzee, T., & Fadhilah, N. (2021). Observed teaching behaviour in secondary education across six countries: Measurement invariance and indication of cross-national variations. School Effectiveness and School Improvement, 32(1), 64–95. https://doi.org/10.1080/09243453.2020.1777170
  • Muijs, D., & Reynolds, D. (2010). Effective teaching: Evidence and practice (3rd ed.). Sage Publications.
  • Muthén, L. K., & Muthén, B. O. (1998–2015). Mplus user’s guide (7th ed.). Muthén & Muthén.
  • Purkey, S. C., & Smith, M. S. (1983). Effective schools: A review. The Elementary School Journal, 83(4), 427–452. https://doi.org/10.1086/461325
  • Purkey, S. C., & Smith, M. S. (1985). School reform: The district policy implications of the effective schools literature. The Elementary School Journal, 85(3), 353–389. https://doi.org/10.1086/461410
  • Sammons, P., Hillman, J., & Mortimore, P. (1995). Key characteristics of effective schools: A review of school effectiveness research. Office for Standards in Education.
  • Scheerens, J. (1989). Wat maakt scholen effectief? Samenvattingen en analyses van onderzoeksresultaten [What makes schools effective? Summaries and analyses of research outcomes]. Instituut voor Onderzoek van het Onderwijs (SVO).
  • Scheerens, J. (1992). Effective schooling: Research, theory and practice. Cassell.
  • Scheerens, J. (2008). Een overzichtsstudie naar school- en instructie-effectiviteit [A review of school and instructional effectiveness]. Universiteit Twente, Vakgroep Onderwijsorganisatie en -management.
  • Schön, D. A. (1983). The reflective practitioner: How professionals think in action. Basic Books.
  • Smit, N., van de Grift, W., de Bot, K., & Jansen, E. (2017). A classroom observation tool for scaffolding reading comprehension. System, 65, 117–129. https://doi.org/10.1016/j.system.2016.12.014
  • Tas, T., Houtveen, T., & van de Grift, W. (2019). Effects of data feedback in the teacher training community. Journal of Professional Capital and Community, 5(1), 27–50. https://doi.org/10.1108/JPCC-09-2019-0023
  • Tas, T., Houtveen, T., van de Grift, W., & Willemsen, M. (2018). Learning to teach: Effects of classroom observation, assignment of appropriate lesson preparation templates and stage focused feedback. Studies in Educational Evaluation, 58, 8–16. https://doi.org/10.1016/j.stueduc.2018.05.005
  • Timperley, H., Wilson, A., Barrar, H., & Fung, I. (2007). Teacher professional learning and development: Best evidence synthesis iteration (BES). Ministry of Education. https://www.educationcounts.govt.nz/publications/series/2515/15341
  • Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38(1), 1–10. https://doi.org/10.1007/BF02291170
  • van de Grift, W. (1985). Onderwijsleerklimaat en leerlingprestaties [Educational climate and student achievement]. Pedagogische Studiën, 62(4), 401–414.
  • van de Grift, W. (1990). Educational leadership and academic achievement in elementary education. School Effectiveness and School Improvement, 1(1), 26–40. https://doi.org/10.1080/0924345900010104
  • van de Grift, W. J. C. M. (1994). Technisch rapport van het onderzoek onder 386 basisscholen ten behoeve van de evaluatie van het basisonderwijs [Technical report of the study among 386 elementary schools for the evaluation of elementary education]. Inspectie van het Onderwijs.
  • van de Grift, W. (2007). Quality of teaching in four European countries: A review of the literature and application of an assessment instrument. Educational Research, 49(2), 127–152. https://doi.org/10.1080/00131880701369651
  • van de Grift, W. J. C. M. (2013). Van zwak naar sterk, de aanpak van zwakke en zeer zwakke scholen voor voortgezet onderwijs in het noorden van het land door observatie van en feedback door leraren [From weak to strong: Tackling weak and very weak secondary schools in the north of the country through observation of and feedback by teachers]. University of Groningen.
  • van de Grift, W. J. C. M. (2014). Measuring teaching quality in several European countries. School Effectiveness and School Improvement, 25(3), 295–311. https://doi.org/10.1080/09243453.2013.794845
  • van de Grift, W. (2021a). Het coachen van leraren (1). Het vaststellen van de zone van naaste ontwikkeling [Coaching teachers (1). Establishing the zone of proximal development]. Basisschoolmanagement, 35(22), 24–27.
  • van de Grift, W. (2021b). Het coachen van leraren (2). Welke aanpak werkt om vaardigheden te verbeteren? [Coaching teachers (2). What approach works to improve skills?]. Basisschoolmanagement, 35(2), 27–29.
  • van de Grift, W., Chun, S., & Lee, O. (2023). Measuring teaching quality and student engagement in elementary education in South Korea and the Netherlands. Journal of Educational & Psychological Research, 5(3), 767–772.
  • van de Grift, W. J. C. M., Chun, S., Maulana, R., Lee, O., & Helms-Lorenz, M. (2017). Measuring teaching quality and student engagement in South Korea and The Netherlands. School Effectiveness and School Improvement, 28(3), 337–349. https://doi.org/10.1080/09243453.2016.1263215
  • van de Grift, W., Helms-Lorenz, M., & Maulana, R. (2014). Teaching skills of student teachers: Calibration of an evaluation instrument and its value in predicting student academic engagement. Studies in Educational Evaluation, 43, 150–159. https://doi.org/10.1016/j.stueduc.2014.09.003
  • van de Grift, W. J. C. M., & Houtveen, A. A. M. (2006). Underperformance in primary schools. School Effectiveness and School Improvement, 17(3), 255–273. https://doi.org/10.1080/09243450600697317
  • van de Grift, W. J. C. M., & Lam, J. F. (1998). Het didactisch handelen in het basisonderwijs [Teaching in elementary education]. Tijdschrift voor Onderwijsresearch, 23(3), 224–241.
  • van de Grift, W., van der Wal, M., & Torenbeek, M. (2011). Ontwikkeling in de pedagogisch didactische vaardigheid van leraren in het basisonderwijs [Development of teaching skills in elementary education]. Pedagogische Studiën, 88(6), 416–432.
  • van den Hurk, H. T. G., Houtveen, A. A. M., & van de Grift, W. J. C. M. (2016). Fostering effective teaching behaviour through the use of data-feedback. Teaching and Teacher Education, 60, 444–451. https://doi.org/10.1016/j.tate.2016.07.003
  • van der Lans, R. M., van de Grift, W. J. C. M., & van Veen, K. (2015). Developing a teacher evaluation instrument to provide formative feedback using student ratings of teaching acts. Educational Measurement: Issues and Practice, 34(3), 18–27. https://doi.org/10.1111/emip.12078
  • van der Lans, R. M., van de Grift, W. J. C. M., van Veen, K., & Fokkens-Bruinsma, M. (2016). Once is not enough: Establishing reliability criteria for feedback and evaluation decisions based on classroom observations. Studies in Educational Evaluation, 50, 88–95. https://doi.org/10.1016/j.stueduc.2016.08.001
  • Walberg, H. J., & Haertel, G. D. (1992). Educational psychology’s first century. Journal of Educational Psychology, 84(1), 6–19. https://doi.org/10.1037/0022-0663.84.1.6
  • Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11(1), 57–67. https://doi.org/10.1023/A:1007999204543