Measuring faculty teaching effectiveness using conditional fixed effects

Pages 324-339 | Received 12 Dec 2016, Accepted 13 Jun 2018, Published online: 15 Oct 2018
 

Abstract

Using a dataset of 48 faculty members and 88 courses over 26 semesters, the authors estimate Student Evaluation of Teaching (SET) ratings that are conditional on a multitude of course, faculty, and student attributes. They find that ratings are lower for required courses and those where students report a lower prior level of interest. Controlling for these variables substantially alters the SET ratings for many instructors. The average absolute value of the difference between the faculty ratings controlling just for time effects and fully conditional ratings is nearly one-half of a standard deviation in the students’ rating of how much they learned. This difference produces a change in quartile rank for over half the sample across two summary course evaluation measures.
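As a rough illustration of the estimation strategy described in the abstract, the sketch below (not the authors' code; the data file and all column names are hypothetical stand-ins) estimates faculty fixed effects first conditional only on semester effects and then fully conditional on course and section attributes:

```python
# A minimal sketch of conditional faculty fixed effects estimation.
# "set_sections.csv" and its columns are hypothetical stand-ins for the
# section-level SET data described in the article.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("set_sections.csv")  # one row per course-section

# Baseline: faculty fixed effects controlling only for semester (time) effects.
baseline = smf.ols("amount_learned ~ C(faculty) + C(semester)", data=df).fit()

# Fully conditional: add course and section controls such as whether the
# course is required and students' reported prior interest.
full = smf.ols(
    "amount_learned ~ C(faculty) + C(semester) + required"
    " + prior_interest + class_size + avg_gpa",
    data=df,
).fit()

# Comparing each instructor's C(faculty) coefficient across the two models
# shows how much conditioning on these attributes shifts that instructor's
# estimated rating.
```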

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 Many institutions use a variety of approaches to measure teaching effectiveness, from SETs to classroom observation to review of teaching and assessment materials. However, Becker and Watts (1999) report that the use of SETs is almost universal: of 302 U.S. institutions responding to their survey, only three did not use SETs. Becker, Bosshardt, and Watts (2012) confirm the prevalence of SETs and note further that, across all types of institutions, the average weight given to SETs in evaluating teaching effectiveness is 48.7%.

2 Examples of the summary question are “Overall, how would you rate the teaching effectiveness of this instructor?”, “Overall, how would you rate this course?”, and “Would you recommend this class to another student?” Most SETs include at least one such question.

3 Semester effects account for any “evaluation inflation” that is present. Although most institutions do not explicitly estimate semester fixed effects, many approximate them by comparing SET scores semester by semester.

4 It is important to note that we do not claim that the conditional faculty fixed effects that control for multiple factors are a comprehensive measure of teaching effectiveness, but rather that they provide important additional information that may be used in assessing teaching performance.

5 See Ongeri (2009) for a discussion of the characteristics and content of economics courses that may result in negative student evaluations.

6 Al-Issa and Sulieman (2007) identify 2,988 papers that focus on student evaluations of teaching and were published in scholarly journals between 1990 and 2005.

7 It is not surprising that instructor experience has a significant and positive impact on SET ratings. Indeed, if experience helps build human capital, then it may be appropriate to consider experience as a component of teaching effectiveness.

8 The mean class size in McPherson’s (2006) data is 82 for principles classes, while the mean for upper-level classes is 33. Class sizes in our dataset are no larger than 45, and most are substantially smaller.

9 For example, if SET ratings are substantially lower for required courses, then an instructor’s ranking within a subset of courses that are all required (e.g., principles courses) may not change when course fixed effects are included. However, when all courses are pooled together, the principles instructors’ rankings may improve relative to the nonprinciples instructors’ when course fixed effects (which include whether a course is required) are factored in.

10 Given our result that required courses (e.g., principles) receive lower ratings on average, it is possible that faculty who teach principles courses try to compensate for this trend and therefore, conditional on course attributes, are actually more effective instructors.

11 We note that Carrell and West (2010) do not argue that their approach should be broadly implemented.

12 While a number of other control variables are subsumed by the faculty and course-semester random effects that Carrell and West (2010) include in their estimation, these three attributes vary within either the course-semester or the faculty identifier; they are therefore not accounted for by the random effects and instead enter into the error term.

13 Content validity is suspect both because no precise definition of all the elements of teaching effectiveness exists at most institutions and because students (who complete the SETs) and faculty or administrators (who design them) seem to disagree about what constitutes effective teaching. Construct validity is called into question because, for example, SETs do not effectively distinguish between teaching effectiveness and how much students like the instructor; see Onwuegbuzie, Daniel, and Collins (2009), Benton and Cashin (2012), and Clayson (2015) for further discussion.

14 Naturally, we do not include any variables that are specific to the instructor but do not vary over time, as these are subsumed by the faculty fixed effects.

15 Their approach is applicable only in those institutions where ratings are calculated by averaging an instructor’s scores across not only multiple students but also multiple questions on the instrument.

16 A course-section is defined by a combination of course, meeting time, and instructor.

17 See Sproule (2002) for theory underlying the inclusion of these variables.

18 The GPA variable gives the average of students’ self-reported GPAs across all courses prior to that semester. The gender mix of the section comprises two variables: the proportion of students who were male, and an interaction between this proportion and a binary variable equal to one if the instructor was male. The grade-level mix of the section comprises three variables: the proportions of students who were sophomores, juniors, and seniors (first-year students were the excluded category). Class length is a binary variable equal to one if the class met three times a week for 50 minutes; otherwise, the class met twice a week for 75 minutes. Class time comprises two variables: a binary variable equal to one if the class met before 9:00 a.m., and a binary variable equal to one if the class met after 2:00 p.m.; the excluded meeting times are those in the middle of the day. A sketch of how these section-level controls might be constructed from student-level responses appears below.
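The following is a hedged sketch of that construction (all file and column names are assumptions, not the authors'):

```python
# Build section-level controls from hypothetical student-level response data.
import pandas as pd

students = pd.read_csv("student_responses.csv")  # one row per student response

controls = students.groupby("section_id").agg(
    avg_gpa=("prior_gpa", "mean"),       # self-reported GPA before the semester
    prop_male=("is_male", "mean"),       # gender mix of the section
    prop_soph=("is_sophomore", "mean"),  # grade-level mix (first-years omitted)
    prop_junior=("is_junior", "mean"),
    prop_senior=("is_senior", "mean"),
).reset_index()

sections = pd.read_csv("sections.csv")
sections = sections.merge(controls, on="section_id")

# Interaction: proportion male x binary male-instructor indicator.
sections["prop_male_x_male_instr"] = (
    sections["prop_male"] * sections["instructor_male"]
)

# Meeting-time dummies; midday sections are the excluded category.
sections["early"] = (sections["start_hour"] < 9).astype(int)
sections["late"] = (sections["start_hour"] >= 14).astype(int)
```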

19 Note that Rothstein (2010) refers to “teacher” and “classroom” effects interchangeably because teachers are assigned to a single classroom in his data on elementary students in North Carolina. For a survey of value-added estimation, see McCaffrey et al. (2004).

20 We also estimate equation (1) using weighted least squares, with weights equal to the number of student responses in each section. Results are qualitatively similar and are available from us upon request.
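For concreteness, a weighted estimation along these lines might look like the following sketch (not the authors' code; the data file and column names, including n_responses, are assumptions):

```python
# A minimal sketch of the WLS robustness check: weight each course-section
# by its number of student responses.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("set_sections.csv")
wls = smf.wls(
    "amount_learned ~ C(faculty) + C(semester) + required + prior_interest",
    data=df,
    weights=df["n_responses"],
).fit()
```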

21 Of course, omitting factors that are significant in explaining variation in the SET ratings results in biased coefficient estimates only if the excluded variables are correlated with the included variables. As Greene (2000, 229) explicitly states, “If the variables in a multiple regression are not correlated (i.e. are orthogonal), then the multiple regression slopes are the same as the slopes in the individual simple regression.” This property implies that a simple comparison of raw SET averages across faculty members may in fact accurately estimate teaching effectiveness, as long as the course and faculty characteristics that explain variation in the raw (by definition unconditional) SET scores are uncorrelated with the unobserved teaching effectiveness of the instructors teaching those courses.
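The orthogonality property Greene describes is easy to verify by simulation; in the sketch below (illustrative values only), the simple-regression slope on one regressor matches its multiple-regression slope when the regressors are uncorrelated:

```python
# Simulate two uncorrelated (orthogonal) regressors and compare the
# simple-regression slope with the multiple-regression slope.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)  # drawn independently of x1
y = 2.0 * x1 - 1.5 * x2 + rng.standard_normal(n)

# Simple-regression slope of y on x1 alone: cov(y, x1) / var(x1).
simple = np.cov(y, x1)[0, 1] / np.var(x1, ddof=1)

# Multiple-regression slopes via least squares on [1, x1, x2].
X = np.column_stack([np.ones(n), x1, x2])
multiple = np.linalg.lstsq(X, y, rcond=None)[0]

print(simple, multiple[1])  # both close to 2.0 because x1 and x2 are uncorrelated
```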

22 McPherson (2006) and Ragan and Walia (2010) both find that increased class size leads to lower ratings; Ragan and Walia also find that the effect diminishes as class size grows.

23 The negative relationship between class size and perceived learning may arise because instructors can focus less attention on each student during class time in larger sections. Negative student-load and course-prep coefficients could easily be explained by faculty having less time to provide individual assistance to students outside of class or to improve the quality of any individual course. A positive coefficient for sections taught may reflect faculty devoting more attention and effort to their teaching during semesters when they have a higher teaching load, and focusing more on research and other activities when they have fewer teaching responsibilities.

24 It is, of course, also possible that instructors try to incentivize students to give high SET ratings by offering easy courses in which students receive high grades but do not learn much. However, because the GPA variable averages students’ GPAs across all prior classes and does not include the class being evaluated, grade inflation would have to be present in all courses across the entire university for a high GPA to indicate easy courses rather than better students.

25 While cognitive dissonance may be a problem here, because students are asked at the end of the semester to recall their level of interest in the subject prior to taking the class, any resulting bias would likely lead to underestimating the relationship between prior interest and amount learned. Students who do not enjoy a course may well perceive both that they have learned little and that the instructor has diminished their interest in the subject. In this case, the prior-interest rating will be biased upward while the amount-learned rating will be biased downward, implying that the coefficient in our results is, if anything, underestimated.

26 Naturally, this requires dropping the binary variable that indicates a required course. Additionally, controlling for course fixed effects eliminates two faculty from the estimated faculty fixed effects due to perfect collinearity, leaving 45 estimated faculty fixed effects, each measured relative to the omitted average faculty member.

27 A significant portion of this is due to including the required course and prior interest variables. When faculty fixed effects are estimated without controlling for these two variables, the correlation with the baseline faculty fixed effects is 0.98.

28 This could be due to a handful of introductory courses, populated primarily with first-year students, receiving higher than average scores in amount learned.

29 The overall results are not driven by these individual faculty members.

30 The differences are somewhat skewed: just under one-quarter (24%) of faculty in the sample receive unconditional ratings that are higher than their fully conditional ratings, indicating that a portion of their high unconditional ratings is actually due to course fixed effects or course-section or faculty-semester characteristics, such as small class sizes or student loads and students with high GPAs. The remaining three-quarters (76%) of faculty receive unconditional ratings that are lower than their fully conditional ratings.
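The quartile-rank comparison summarized in the abstract could be computed along the lines of the following sketch (the file and column names are hypothetical, not the authors'):

```python
# Bin instructors' unconditional and fully conditional ratings into
# quartiles and count how many instructors change quartile rank.
import pandas as pd

fe = pd.read_csv("faculty_effects.csv")  # columns: faculty, unconditional, conditional
fe["q_uncond"] = pd.qcut(fe["unconditional"], 4, labels=False)
fe["q_cond"] = pd.qcut(fe["conditional"], 4, labels=False)

share_changed = (fe["q_uncond"] != fe["q_cond"]).mean()
print(f"{share_changed:.0%} of instructors change quartile rank")
```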
