Web Papers

Reliability and validity of a Tutorial Group Effectiveness Instrument

Veena S. Singaram, Cees P. M. Van Der Vleuten, Henk Van Berkel & Diana H. J. M. Dolmans
Pages e133-e137 | Published online: 10 Mar 2010

Abstract

Background: Tutorial group effectiveness is essential for the success of learning in problem-based learning (PBL). Less effective and dysfunctional groups compromise the quality of student learning in PBL.

Aims: This article aims to report on the reliability and validity of an instrument aimed at measuring tutorial group effectiveness in PBL.

Method: The items within the instrument are clustered around motivational and cognitive factors based on Slavin's theoretical framework. A confirmatory factor analysis (CFA) was carried out to estimate the validity of the instrument. Furthermore, generalizability studies were conducted and alpha coefficients were computed to determine the reliability and homogeneity of each factor.

Results: The CFA indicated that a three-factor model comprising 19 items showed a good fit with the data. Alpha coefficients per factor were high. The findings of the generalizability studies indicated that at least 9–10 student responses are needed in order to obtain reliable data at the tutorial group level.

Conclusion: The instrument validated in this study has the potential to provide faculty and students with diagnostic information and feedback about student behaviors that enhance and hinder tutorial group effectiveness.

Introduction

Group work lies at the heart of problem-based learning (PBL). Ensuring the effectiveness of the small-group tutorial is critical for the success of learning in a PBL program. Tutorial group interactions provide students with opportunities to give and receive explanations, to ask questions, and to discuss disagreements, which are assumed to lead to a deep understanding of the subject matter (Visschers-Pleijers et al. Citation2005). Dolmans and Schmidt (Citation2006) reported that studies dealing with the cognitive effects of PBL demonstrated that the activation of prior knowledge, causal reasoning, and cognitive conflicts lead to conceptual changes. In the long term, group work plays a vital role in developing medical professionalism and teamwork skills that are essential for effective multidisciplinary health care teams (Singaram et al. Citation2008).

Group learning environments such as PBL hold the promise of creating effective learning environments, but in reality dysfunctional groups also exist. One of the problems in tutorial group work is referred to as ‘ritual’ behavior, i.e. students pretend to be actively involved in the group work, whereas they are in fact not (Dolmans et al. Citation2005). In some groups, discussions are rather superficial instead of in-depth (Houlden Citation2001; De Grave et al. Citation2001, Citation2002). Quiet and dominant students were also found to hinder student learning (Hendry et al. Citation2003), as they lead to unbalanced discussions in the group (Virtanen et al. Citation1999).

Dolmans et al. (Citation2005), in their review of research and debate on PBL, highlighted the need for more theory-based research to understand the factors and conditions under which tutorial group work in PBL is more and less effective.

Slavin (Citation1996) distinguishes two major theoretical perspectives from which small-group PBL learning can be studied. The first perspective is a motivational one, which emphasizes the importance of cohesiveness or team spirit. The second perspective is a cognitive one: a group provides opportunities to interact, discuss, argue, give explanations to each other, and provide mutual feedback. These cognitive processes are assumed to positively influence student learning. Although groups can be motivating, some groups may have negative effects on students’ motivation, e.g. when some students do not participate actively. The influence of these factors on tutorial group productivity or success was explored by Dolmans et al. (Citation1998). Their study found a linear relationship between the tutorial group's success and several motivational (i.e. motivation and cohesion) and cognitive (i.e. elaboration and interaction) dimensions. Similar findings were reported by Carlo et al. (Citation2003), who, in addition, noted that students’ backgrounds and cultures influence the motivational and cognitive aspects of the small-group tutorial as well. Furthermore, Singaram et al. (Citation2008) highlighted that although diverse students were positive about group learning, attention should be directed toward students who respond negatively to group work, as dysfunctional groups have been noticed in practice. Thus, more attention needs to be directed to understanding the array of factors that influence group effectiveness so that dysfunctional groups can be diagnosed in a timely manner and managed appropriately.

The need for instruments to diagnose problems, measure the quality of tutorial group interactions, and find ways to improve the functioning of small groups has also been noted by Visschers-Pleijers et al. (Citation2005).

The aim of this article is to report on the development, reliability, and validity of an instrument aimed at measuring tutorial group effectiveness in a PBL curriculum.

Method

Setting

In 2001, a 5-year integrated PBL curriculum replaced the traditional undergraduate medical curriculum at the Nelson R. Mandela School of Medicine (NRMSM). PBL modules form part of the first to third years of study. The student population at NRMSM is socially and culturally diverse. The majority of the students have a first language other than English (approximately 13 different languages are represented), whilst the language of instruction is English. Students are grouped taking into account their socio-cultural backgrounds rather than being randomly assigned to PBL groups. The groupings are changed for every new PBL theme/unit. Each group of about 10 students meets twice a week with a facilitator to discuss a case. In the first 2-hour session, learning issues are generated, which are studied during self-study and then reported on in the second 2-hour session. One of the students chairs the meeting, and the role of chairperson is rotated among the students.

Instrument

A Tutorial Group Effectiveness Instrument (TGEI) was developed based on Slavin's theoretical framework (Slavin Citation1996). The items within the instrument were clustered around motivational and cognitive factors: one factor focused on cognitive aspects and two factors focused on motivational aspects. The instrument was pilot-tested with a group of randomly selected students with diverse backgrounds, which led to the rewording of three items and the removal of one item. The final instrument comprised 19 items covering three underlying factors or aspects of group effectiveness, i.e. cognitive, motivational, and demotivational aspects (Table 1). Some items of the instrument were adapted from Dolmans et al. (Citation1998). The students were asked to respond to each item on a 5-point Likert scale ranging from 1 – ‘strongly disagree’ to 5 – ‘strongly agree’. Students were also asked to rate the overall productivity of the tutorial group on a scale from 1 to 5, i.e. 1 – insufficient, 2 – reasonable, 3 – sufficient, 4 – good, and 5 – excellent.

Table 1.  Items and factors within the TGEI

Subjects

A total of 483 students responded to the survey, an average response rate of 80%; in other words, 20% of the students did not return the questionnaire. Furthermore, 34 questionnaires with missing information were omitted from the study. Hence, data from 449 students were included (a response rate of 74%). These comprised first-year (n = 183), second-year (n = 156), and third-year (n = 110) undergraduate medical students. In total, 52 groups participated in the study: 20 first-year, 19 second-year, and 13 third-year groups. The number of students completing the instrument per group varied between 6 and 11.

Statistical analysis

First, mean scores and standard deviations were computed at the item level and at the factor level. It is appropriate to compute mean scores, since research has shown that the Likert-response format produces interval data at the scale level (Carifio & Perla 2008).

A confirmatory factor analysis (CFA) was carried out to assess the adequacy of the three factors underlying the items and thereby address the construct validity of the instrument. The data were analyzed at the student level (n = 449), as individual students were asked to give their personal opinion about the group productivity. In the confirmatory factor model specified in this study, all three factors were correlated. Observed variables 1–7 were affected by the first factor, observed variables 8–14 by the second factor, and observed variables 15–19 by the third factor. All observed variables were assumed to be affected by a unique factor (error in each variable), and no pairs of unique factors were correlated. The skew and kurtosis values of all data used fall within ±1.5, and mostly within ±1.0, which implies that the data are approximately normally distributed. Maximum likelihood estimation (MLE) was used when conducting the CFA. The AMOS program was used to determine whether the data confirmed the three-factor model (Arbuckle 1999).

The coefficient alpha was computed for each factor to determine its internal consistency. A coefficient of 0.70 or higher was considered acceptable. In addition, generalizability studies were conducted to estimate the reliability of each factor and to determine how many student responses are needed per group (Crick & Brennan Citation1983). The analyses were conducted at the individual student level (n = 449). An all-random students-nested-within-groups design was used, with groups as the universe of generalization or object of measurement. In total, 52 groups were involved, each of which had been judged by six or more students. This design allows variance component estimation from two sources: (1) differences between groups (G; object of measurement) and (2) differences between students nested within groups plus general error (S:G, e) (Shavelson & Webb Citation1991). Reliability indices (generalizability (G) coefficient and standard error of measurement (SEM)) are reported as a function of the number of students completing the questionnaire.
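As a concrete illustration of the internal-consistency step, the sketch below computes coefficient alpha for a respondents-by-items score matrix. This is a minimal sketch in Python using synthetic Likert-style data, not the study data; the simulation parameters and variable names are hypothetical.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Synthetic 5-point Likert responses for a 7-item factor (hypothetical data):
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))                  # one shared trait per respondent
noise = rng.normal(scale=0.8, size=(200, 7))        # item-specific error
scores = np.clip(np.rint(3 + latent + noise), 1, 5)

alpha = cronbach_alpha(scores)                      # items sharing a trait give high alpha
```

By the 0.70 rule of thumb applied in this study, a factor whose items behave like this simulation would be judged internally consistent.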

Results

Descriptive statistics

Table 1 contains the items with their mean scores and standard deviations (SDs). The mean scores and SDs for each factor are reported in Table 2, as well as the overall productivity score. As illustrated in this table, the average scores per factor varied between 3.12 and 3.32 (scale 1–5). In the cognitive and demotivational domains, the means were 3.12 and 3.17 with SDs of 0.81 and 0.82, respectively. In the motivational domain, the mean was 3.32 with an SD of 0.82.

Table 2.  Number of items, number of students, minimum and maximum score, mean score (scale 1–5), SD, and coefficient alpha per factor and/or the overall group productivity score (scale 1–5)

Table 3.  Fit indices of the one-factor, two-factor and three-factor models

Construct validity

The correlation coefficient between factors 1 and 2 is 0.64, between factors 2 and 3, 0.14, and between factors 1 and 3, 0.19. All correlation coefficients were significant (p < 0.01). A CFA was conducted, and several statistics were calculated in order to assess whether the empirical data fit the theoretically proposed model. Unfortunately, there is no single best statistic that gives an insight into the fit of the model; the best a researcher can do is to compute several statistics which reflect the fit of the model (Van Berkel & Schmidt Citation2005). The chi-square divided by the degrees of freedom (CMIN/DF) should be less than 3 for well-fitting models, and the p-value should be higher than 0.05. The two most important indices are the RMSEA, which should be lower than 0.08 for a good fit and lower than 0.05 for an excellent fit, and the CFI, which should be at least 0.90 for a good fit and at least 0.95 for an excellent fit.

The results of the three-factor model outlined above, presented in Table 3, were as follows: chi-square [149 df] = 452.08, p = 0.000, an RMSEA of 0.067, and a CFI of 0.888. The one-factor model, also presented in Table 3, yielded chi-square [153 df] = 965.22, p = 0.000, an RMSEA of 0.109, and a CFI of 0.701; its CMIN/DF is 6.3. The results for a two-factor model are presented in Table 3 as well. The statistics for the three-factor model demonstrated a better fit than those for the one-factor and two-factor models. For the three-factor model, the statistics given in Table 3 meet the criteria as follows: the CMIN/DF is equal to 3 but not below it, and the p-value does not differ from zero, but a CFI of 0.89 and an RMSEA of 0.07 indicate a good fit. In general, the results of the CFA indicate that the three-factor model shows a good fit, since the two most important conditions are met. Based on the analysis of the data, it can be concluded that the instrument, within the setting of this study, appeared to yield fairly valid factor scores.
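Two of the fit indices reported above can be recomputed directly from the chi-square statistics; the CFI cannot, since it also requires the baseline-model chi-square, which is not reported. A minimal Python sketch using the standard formulas CMIN/DF = χ²/df and RMSEA = sqrt(max(χ² − df, 0) / (df · (n − 1))):

```python
import math

def fit_indices(chi2: float, df: int, n: int) -> dict:
    """CMIN/DF and RMSEA for a chi-square model test with sample size n."""
    cmin_df = chi2 / df
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    return {"cmin_df": cmin_df, "rmsea": rmsea}

# Values reported above (n = 449 students):
three_factor = fit_indices(chi2=452.08, df=149, n=449)
one_factor = fit_indices(chi2=965.22, df=153, n=449)
```

Reassuringly, these formulas reproduce the reported values: 0.067 and 0.109 for the three- and one-factor models, and a CMIN/DF of 6.3 for the one-factor model.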

Reliability

The coefficient alpha indicated high internal consistency for factors 1 and 2 (both 0.82) and a lower internal consistency of 0.64 for factor 3, the demotivational factor. The results of the generalizability studies demonstrated that the variance associated with groups for the overall score is 22%, as reported in Table 4. This percentage is the true variance or the variance of interest. The variance associated with groups varies per factor between 6% and 15%. The estimated variance components were used to estimate reliability indices. Table 4 provides the G-coefficients per factor as a function of the number of student responses per group and the corresponding standard error of measurement (SEM). The SEM can be used to estimate confidence intervals for individual scores. Taking into account a practical significance level of 0.5 point on a scale from 1 to 5, the SEM should be lower than or equal to 0.25 (0.5/1.96) at the 95% confidence level. Based on this practical significance level, at least nine student responses are required to obtain reliable results for factor 1, and at least 10 student responses for factors 2 and 3. To obtain a G-coefficient of at least 0.70, at least nine students’ responses are needed for the overall score.
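The dependence of the G-coefficient on the number of respondents per group follows the standard formula for a students-nested-within-groups design, Eρ² = σ²_g / (σ²_g + σ²_(s:g)/n). The sketch below applies it to the reported variance decomposition for the overall score (22% group variance, hence 78% student-within-group variance plus error); since the G-coefficient is a ratio, the percentages can be used directly as proportions.

```python
def g_coefficient(var_group: float, var_within: float, n_students: int) -> float:
    """Generalizability coefficient for a group mean based on n_students ratings
    in a students-nested-within-groups (S:G) design."""
    return var_group / (var_group + var_within / n_students)

# Reported variance decomposition for the overall score: 22% groups, 78% residual.
g_nine = g_coefficient(0.22, 0.78, 9)    # just above the 0.70 threshold
g_eight = g_coefficient(0.22, 0.78, 8)   # just below it
```

This reproduces the conclusion above: nine responses is the smallest group size at which the overall score reaches a G-coefficient of 0.70. The SEM, by contrast, requires the variance components in absolute score units, which are not given in the running text, so it is not recomputed here.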

Table 4.  Estimated variance components for the variance associated with groups and students nested within groups are given (in percentages in between brackets) and the generalizability coefficient (G-coefficient) and SEM, as a function of the number of student ratings (N) for the 3 factors (scale 1–5) and the overall score

Conclusion and discussion

This article focuses on the validity and reliability of an instrument to assess tutorial group effectiveness in PBL environments. All the items were based on Slavin's theoretical framework of collaborative learning, which highlights two theoretical perspectives on group learning: a cognitive perspective and a motivational perspective. The motivational domain indicates the extent to which students motivate, show concern for, and help each other learn. The demotivational domain indicates the extent to which nonparticipation of students affects the group dynamics and hence has a negative effect on student learning in these groups. The cognitive domain is based on the interactions and explanations between peers, which enhance learning.

The results of the CFA indicated that the three-factor model revealed a good fit with the data, i.e. the RMSEA and CFI indicated a good fit. Thus, based on the CFA, it can be concluded that the instrument, within the setting of this study, appeared to yield valid factor scores. The findings of the generalizability studies indicated that at least 9–10 student responses are needed in order to obtain reliable data at the factor level. This finding indicates that the instrument is reliable, since the tutorial group size within this study varied between 6 and 11.

The finding that the factor scores of the instrument within the setting of this study are valid and reliable implies that the data collected can be used to measure group effectiveness in PBL tutorials. Having an instrument that discriminates between less and more effective groups equips faculty and students with diagnostic information about group performance and learning. This is important as dysfunctional groups negatively affect tutorial group effectiveness (Virtanen et al. Citation1999; Houlden Citation2001; Hendry et al. Citation2003). Thus, the information obtained at the factor level by using the instrument can create awareness and an increased understanding of problematic and well-functioning PBL groups. This awareness and evaluation can encourage tutors/facilitators and students to implement relevant strategies and training to improve tutorial group functioning within a PBL curriculum.

In this study, the moderate ratings of the cognitive and motivational factors and of the overall tutorial productivity indicate that there is room for improvement in tutorial group effectiveness within the setting of this study. The same holds for the score on the demotivational factor. A lack of social cohesion in some groups may be keeping group morale down.

The validated TGEI can be used to provide students with feedback about the functioning of their group at midterm and at the end of the theme. This information would stimulate students to think about their roles and responsibilities as collaborative learners, and also about their peers’ contributions and attitudes in the small-group setting, in order to optimize feedback and tutorial group effectiveness. Useful and timely feedback in small-group PBL could revitalize the group and individuals and encourage the development of essential skills needed by health professionals (Mennin Citation2007). Thus, the TGEI can be used to provide students and tutors with feedback about the performance of the tutorial group as well as to highlight areas of deficiency in group effectiveness.

Although the CFA and the generalizability study in this article demonstrated that the instrument, within the setting of this study, yielded valid and reliable factor scores, the data also demonstrated that two factors correlated highly, which indicates that these factors do not discriminate much in terms of group performance. However, a two-factor solution in which the two highly correlating factors were merged into one factor resulted in a poorer fit compared to the three-factor solution. Furthermore, the current data were collected in a diverse student population, which might limit the generalizability of the findings. Nevertheless, the groups were mixed and well balanced, which might imply that the findings are generalizable toward other PBL curricula. Additional research is needed to assess the use of this instrument in providing insight into the influence of multicultural and multilingual settings on the motivational and cognitive aspects of group effectiveness in PBL. Further validation of the instrument with larger samples is also recommended.

Acknowledgments

Thanks to Ron Hoogenboom from Maastricht University for statistical support in running and interpreting the generalizability data, and to the National Research Foundation of South Africa for the funding received.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Additional information

Notes on contributors

Veena S. Singaram

VEENA S. SINGARAM is a program director in the Department of Medical Education and a PhD candidate in the Department of Educational Development and Research at the University of Maastricht.

Cees P. M. Van Der Vleuten

CEES P. M. VAN DER VLEUTEN, PhD, is a professor of education and chair of the Department of Educational Development and Research at the University of Maastricht.

Henk Van Berkel

HENK VAN BERKEL, PhD, is an educational psychologist in the Department of Educational Development and Research at the University of Maastricht.

Diana H. J. M. Dolmans

DIANA HJM DOLMANS, PhD, is an educational psychologist and associate professor in the Department of Educational Development and Research at the University of Maastricht.

References

  • Arbuckle JL, Wothke W. Amos user's guide, version 4.0. Chicago, IL: SmallWaters Corporation; 1999
  • Carifio J, Perla R. Resolving the 50-year debate around using and misusing Likert scales. Med Educ 2008; 42: 1150–1152
  • Carlo MD, Swadi H, Mpofu D. Medical student perceptions of factors affecting productivity of PBL tutorial groups: Does culture influence the outcome? Teach Learn Med 2003; 15: 59–64
  • Crick JE, Brennan RL. Manual for GENOVA: A generalized analysis of variance system. Iowa City, IA: American College Testing Program; 1983
  • De Grave WS, Dolmans DHJM, Van Der Vleuten CPM. Student perceptions about the occurrence of critical incidents in tutorial groups. Med Teach 2001; 23: 49–54
  • De Grave WS, Dolmans DHJM, Van Der Vleuten CPM. Student perspectives on critical incidents in tutorial groups. Adv Health Sci Educ 2002; 7: 2001–2009
  • Dolmans DHJM, De Grave W, Wolfhagen HAP, Van Der Vleuten CPM. Problem-based learning: Future challenges for educational practice and research. Med Educ 2005; 39: 732–741
  • Dolmans DHJM, Schmidt HG. What do we know about cognitive and motivational effects of small group tutorials in problem-based learning? Adv Health Sci Educ 2006; 11: 321–336
  • Dolmans DHJM, Wolfhagen HAP, Van Der Vleuten CPM. Motivational and cognitive processes influencing tutorial groups. Acad Med 1998; 73: S22–S24
  • Hendry GH, Ryan G, Harris J. Group problems in problem-based learning. Med Teach 2003; 25: 609–616
  • Houlden RL, Collier CP, Frid PJ, John SL, Pross H. Problems identified by tutors in a hybrid problem-based learning curriculum. Acad Med 2001; 76: 81
  • Mennin S. Small-group problem-based learning as a complex adaptive system. Teach Teach Educ: Int J Res Stud 2007; 23: 303–313
  • Shavelson RJ, Webb NM. Generalizability theory: A primer. London: Sage; 1991
  • Singaram VS, Dolmans DHJM, Lachman N, Van Der Vleuten CPM. Perceptions of problem-based learning (PBL) group effectiveness in a socially-culturally diverse medical student population. Educ Health 2008; 21: 116
  • Slavin RE. Research on cooperative learning and achievement: What we know, what we need to know. Contemp Educ Psychol 1996; 21: 43–69
  • Van Berkel H, Schmidt H. On the additional value of lectures in a problem-based curriculum. Educ Health 2005; 18: 45–61
  • Virtanen PJ, Kosunen EAL, Holmberg-Marttila DMH, Virjo IO. What happens in PBL tutorial sessions? Analysis of medical students’ written accounts. Med Teach 1999; 21: 270–276
  • Visschers-Pleijers AJSF, Dolmans DHJM, Wolfhagen IHAP, Van Der Vleuten CPM. Development and validation of a questionnaire to identify learning-oriented group interactions in PBL. Med Teach 2005; 27: 375–381
