Guest Editorial

JPAE at 25: Looking back and moving forward on teaching evaluations

ABSTRACT

In many if not most colleges and universities in the United States, raw scores from Student Evaluations of Teaching (SETs) are the primary tool of teaching assessment, and teaching evaluations often have real consequences for promotion and tenure. In 2005, JPAE published an article on teaching evaluations that added to what was then a somewhat thin literature indicating that SETs are systematically biased against female faculty, and probably against older and minority faculty. Since that time, this literature has grown substantially, and the evidence that SETs are invalid and systematically biased is now too strong to ignore. Over its first 25 years, JPAE has been a force for good in public affairs education. As JPAE moves into its next 25 years, it should take a principled and evidence-based stand against the use of raw SETs as an important indicator of teaching quality, and should encourage high-quality articles studying other methods of assessing teaching so that we can learn what approaches are better.

Introduction

Since its inception, the Journal of Public Affairs Education – affectionately known as JPAE – has been a promoter of excellence in public affairs (PA) education.[1] Its very existence validated taking teaching seriously and gave serious teachers a double-blind refereed outlet for research on teaching, thus justifying work on teaching within the structure and incentives of the academy. Through its articles, it has focused attention on a variety of teaching methods, including active learning and service learning, as well as online learning, with sustained attention and even experimentation. It has also, from very early on, turned attention to the teaching of ethics and integrity in PA programs, with the first number of volume 4 including two articles and a special issue on teaching ethics in the Master of Public Administration (JPAE, 1998), and with many articles on this topic since. It has addressed important social issues, calling attention to issues of minorities in the academy, PA students who will return from PA education in the US to oppressive regimes, and cultural competency. In short, JPAE, NASPAA’s flagship journal, has been a voice for good in the PA fields.[2]

However, there is one issue that JPAE has largely ignored but now should address: the systematic bias in Student Evaluations of Teaching (SETs). Both from a legal perspective (Lawrence, 2018; Mitchell, 2018) and from an ethical perspective, mounting evidence (reviewed below) that SETs are systematically biased against females and probably other groups invalidates their primary use.

Some evidence that SETs are systematically biased[3]

The following review is by no means a full evaluation of what is becoming a large literature on this topic. But it illustrates that the evidence that SETs are biased and invalid is longstanding and robust, and crosses a variety of research approaches.

An early article on the issue of biases in teaching evaluations was published by Elaine Martin in 1984. Martin (1984) provides a good overview of related literature preceding her study and notes that, even in “laboratory research,” “sexism biases evaluations of the work of men and women” (p. 484). In her own study, she finds that sex bias was “most prevalent when students evaluate female social science instructors” (Martin, 1984, p. 489). As stated by Lisa Martin in 2016, “More than 30 years ago, [E.] Martin (1984) wrote that the ‘message to women faculty seems clear: if your institution bases personnel decisions on student evaluations, make sure your colleagues are aware of the possibility of sex bias’ (Martin, 1984, p. 492)” (L. Martin, 2016, p. 317).

Ten years later, in work directly related to teaching in PA fields, Laura Langbein, of American University’s School of Public Affairs, published research indicating that teaching evaluations in her school were systematically biased against female instructors, and especially against female instructors who were tough graders (Langbein, 1994). This accords with much earlier work by Kaschak (1978), indicating that “male students were far more likely to give lower ratings to those female faculty perceived to be hard graders” (Martin, 1984, p. 484). Perhaps even more troubling, however, Langbein’s work goes beyond a concern with systematic bias to a broader concern with validity overall. She finds that, “of the variables examined, course characteristics have the smallest impact on student ratings, student characteristics have a mid-range impact, and the faculty characteristics of gender and experience have clearly the largest impact” (1994, p. 551; emphasis added). Pondering all of her results, she muses: “It is, in fact, unclear exactly what the student ratings really measure” (p. 551). This concern is supported in other work, for the amalgam of factors found to affect teaching evaluations indicates that even non-gender-biased SET numbers would often reflect several factors that have nothing directly to do with instructor skill, including the size of the class, the time of day, and whether the class is required or elective (see reviews, including in Baldwin & Blattner, 2003).

What appears to be the first article in JPAE discussing teaching evaluations was provided by Leslie Whittington, who won NASPAA’s Excellence in Teaching Award in 2000.[4] In her article, Whittington points out, as have many others, that “Students’ evaluations of their teachers are frequently the sole method that academic institutions use to determine the quality of the individual faculty members” (2001, p. 5). She also argues that there is evidence that SETs are reliable and that scholarship therefore demands we use them. In her own piece, Langbein (1994) likewise argues that there is evidence of SET reliability, but states that we have much less evidence of validity.

What appears to be the first article in JPAE on issues of discrimination in SETs was published in 2005 (Campbell, Steiner, and Gerdes). Overall, the authors find that

The results provide some useful information about how better to connect with students but also indicate that SETs are systematically biased against female teachers, older teachers, and perhaps minority teachers [as is not uncommon, sample size for minority professors was too small for statistical significance but estimated coefficients indicate bias]. These findings call into question de facto higher education policy making SETs our most important measure of teaching quality. (2005, p. 211)

The quality of this article was judged high in the NASPAA community: In 2006, it won the “NASPAA Outstanding Article Award, 2005.” In this study, on the bright side, how much students judged they had learned was a very important predictor of teaching evaluation, with a 2-point increase in learning worth 1.2 points on a 10-point scale; unfortunately, the second-most important factor was sex, with a female instructor earning 0.8 points less on a 10-point scale than an otherwise statistically identical male.[5] The results indicate that teachers can improve their SETs by giving review sessions and extra credit, but “the decrease in SETs caused by gender swamps the estimated effect of giving review sessions (−0.285) and giving extra credit (−0.148), combined” (p. 227, emphasis added).
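To make these magnitudes concrete, the short Python sketch below simply compares the coefficients quoted above. The coefficient values come from the article, but the additive comparison is only an illustration, not a reproduction of the authors’ actual model specification.

# Coefficient magnitudes quoted from Campbell, Steiner, and Gerdes (2005),
# all on a 10-point SET scale. This comparison is an illustrative sketch,
# not the authors' model.
gender_effect = -0.8      # female instructor vs. otherwise identical male
review_sessions = -0.285  # quoted coefficient for review sessions
extra_credit = -0.148     # quoted coefficient for extra credit

combined = abs(review_sessions) + abs(extra_credit)
print(f"combined review/extra-credit magnitude: {combined:.3f}")            # 0.433
print(f"gender-effect magnitude:                {abs(gender_effect):.3f}")  # 0.800
print(f"gender effect exceeds the combination:  {abs(gender_effect) > combined}")  # True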

Since that time, JPAE has had few articles on teaching evaluations and no published follow-up on whether these biases still exist or may have eroded over time. For example, published in JPAE seven years later, Otani, Kim, and Cho (2012) reference Campbell et al. (2005) but, focusing on “how to use SET more effectively and efficiently” in PA education, ignore the potential effects of race/ethnicity, age, and gender in their analysis. Therefore, though the article is otherwise of high quality, we must assume that its statistical results exhibit omitted variable bias. I was pleased to see that the call for papers for the special symposium issue of JPAE sponsored by Academic Women in Public Administration (AWPA) explicitly mentioned “gender bias in student evaluations” (AWPA April 2018, n.p.), but since abstracts are due September 1 and notification of acceptance will not be until November 15, it is unclear whether this will lead to additional articles in JPAE on this topic.
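Because omitted variable bias carries the weight of that claim, a minimal simulation may help readers see the mechanism. Everything below is hypothetical: the variable names, parameters, and data are invented for illustration, not drawn from Otani, Kim, and Cho. It shows how omitting gender, when gender both affects SET scores and is correlated with an included predictor, distorts the included predictor’s estimated coefficient.

import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process: being female lowers SET scores by
# 0.8 points, and gender is correlated with an included predictor (here,
# whether the instructor holds review sessions, purely for illustration).
female = rng.binomial(1, 0.5, n)
review = rng.binomial(1, 0.4 + 0.3 * female)  # correlated with gender
set_score = 7.0 - 0.8 * female + 0.3 * review + rng.normal(0, 1, n)

def ols(X, y):
    # Least-squares coefficients, with an intercept column prepended.
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

with_gender = ols(np.column_stack([female, review]), set_score)
without_gender = ols(review.reshape(-1, 1), set_score)

print(f"review coefficient, gender included: {with_gender[2]:+.3f}")    # near the true +0.30
print(f"review coefficient, gender omitted:  {without_gender[1]:+.3f}") # substantially understated

Under these made-up parameters, the review-session coefficient absorbs part of the omitted gender effect and is badly understated, which is exactly the worry about analyses that leave instructor demographics out of SET models.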

In the social sciences broadly, however, we have seen an increasing number of articles on this topic, with various types of analysis showing systematic biases, especially against women, and also indicating disturbing trends for other non-white-male groups. For example, writing more than 30 years after E. Martin, L. Martin (2016) finds evidence of sex bias in evaluations of large Political Science classes.

E. Martin (1984), Langbein (1994), Campbell, Steiner, and Gerdes (2005), and L. Martin (2016) are all cross-sectional and statistically inferential, making strong causal arguments less persuasive. But various experimental and quasi-experimental methods since provide greater evidence of causation. As reviewed by L. Martin (2016),

Arbuckle and Williams (2003) undertook a fascinating experiment in which students viewed a stick figure that delivered a short lecture. All participants observed the same stick figure and the same lecture but the figures were given labels of old or young and male or female. Participants significantly rated the figure labeled as a young male as the most expressive, which illustrates that students’ expectations influence their perception of an instructor independent of the material or how it is delivered. (p. 314)

Recently, MacNell, Driscoll, and Hunt (2015) used the reality of distance in online education to advantage:

it is possible to disguise an instructor’s gender identity online. In our experiment, assistant instructors in an online class each operated under two different gender identities. Students rated the male identity significantly higher than the female identity, regardless of the instructor’s actual gender. (p. 291)

Perhaps even more strikingly, this occurred even on factors that would appear to be fairly objective (L. Martin, 2016):

For example, when the actual male and female instructors posted grades after two days as a male, this was considered by students to be a 4.35 out of 5 level of promptness, but when the same two instructors posted grades at the same time as a female, it was considered to be a 3.55 out of 5 level of promptness. (p. 330, emphasis in original)

Mitchell and Martin (2018) also found that “a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific” (p. 648).

L. Martin also reviews work by Miller and Chamberlin (2000) that indicates that students “perceive male instructors as having higher or superior credentials” (2016, p. 314). In keeping with this finding, I once received a student note in my faculty mailbox that said “Dear Mrs. Campbell, Dr. David Pijawka suggested that I contact you because you are an expert on….” Apparently, even the fact that I was recommended as an expert by someone who was himself viewed as an expert did not overcome the idea that I was a wife rather than a professor. El-Alayli, Hansen-Brown, and Ceynar (2018) find that certain students “request more special favors from female professors” and exhibit “negative emotional and behavioral reactions to having those requests denied. This work highlights the extra burdens felt by female professors” (p. 136).

In addition to concern regarding bias, research since Langbein (1994) has supported the idea that teaching evaluations are not valid. For example, recent work by Anne Boring indicates that “Men are perceived by both male and female students as being more knowledgeable and having stronger class leadership skills… despite the fact that students appear to learn as much from women as from men” (2016, p. 27). Lawrence (2018) finds the accumulation of evidence regarding the validity of SETs so compelling that he simply entitles his article “Student Evaluations of Teaching are Not Valid.”

Conclusions

I am not by any means the first person to argue that we should stop relying on raw SETs as our primary indicator of teaching quality – nor is this the first time I have argued this. But we are at a time when the barrage of evidence and the chorus of calls mean that we may be approaching a social tipping point. Having an important champion such as JPAE could reduce the influence of this severely flawed method on academic PA careers. Based on the increase in evidence, some schools are beginning to move away from SETs (Flaherty, 2018). JPAE could help encourage this throughout PA programs. It is not enough for SETs to be reliable; they must also be valid (Langbein, 1994) – but increasing evidence shows that they are not. To reframe one of Whittington’s important points: “to disregard the large supporting body that documents that fact is simply not scholarly” (2001, p. 5).

NASPAA and its flagship journal, JPAE, have been influences for good in PA education. Encouraging PA programs to drop this severely flawed, inequitable, unethical, and possibly illegal primary method of evaluating teaching could help the discipline. And JPAE itself can encourage the production of research that helps us learn what we should be doing instead.

Right now, we know that SETs are flawed, but we don’t necessarily know what is better; schools that are dropping SETs are trying something else (Flaherty, 2018), but will these methods exhibit the same biases? As Laura Langbein ended her article in 1994:

It is probably a good time … to supplement the SETs with other, related tools that share similar strengths but have different weaknesses. It is not reasonable to expect that a single methodology will measure teaching quality with reliability and validity…. Together, however, the use of multiple measures makes it possible to attain a reasonable degree of both reliability and validity. (Langbein, 1994, p. 552)
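The logic of that passage can be made concrete with a toy simulation; the error structure and magnitudes below are invented purely for illustration, not estimated from any real data. Two imperfect measures of teaching quality, each with a different weakness, track true quality better when averaged than either does alone.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical setup: 'quality' is true teaching quality. The SET adds
# both random noise and a bias term unrelated to quality (e.g., instructor
# demographics); the second measure (say, peer review) is noisier but does
# not share that bias.
quality = rng.normal(0, 1, n)
bias = rng.normal(0, 1, n)
set_measure = quality + bias + rng.normal(0, 1, n)
peer_review = quality + rng.normal(0, 1.5, n)
combined = (set_measure + peer_review) / 2

for name, measure in [("SET alone", set_measure),
                      ("peer review alone", peer_review),
                      ("average of both", combined)]:
    r = np.corrcoef(quality, measure)[0, 1]
    print(f"{name:>18}: correlation with true quality = {r:.2f}")

Under these made-up parameters, each measure alone correlates with true quality at roughly 0.55 to 0.58, while their average reaches roughly 0.70 – a small illustration of Langbein’s point that measures with different weaknesses can be combined to improve both reliability and validity.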

It has been almost 25 years since Langbein reached this conclusion. Perhaps it was not the time then, but it is certainly time now. JPAE, champion the charge to find valid and unbiased measures of teaching quality!

Additional information

Notes on contributors

Heather E. Campbell

Heather E. Campbell is the chair of the Department of Politics and Government at Claremont Graduate University. She received her BA from the University of California at San Diego and her MPhil and PhD from Carnegie Mellon University. She served as editor-in-chief of the Journal of Public Affairs Education from 2009 to 2010.

Notes

1. Before 1998 JPAE was called the Journal of Public Administration Education.

2. NASPAA, formerly the National Association of Schools of Public Affairs and Administration, is now the Network of Schools of Public Policy, Affairs, and Administration.

3. This section owes thanks to Mitchell’s 2018 article in Slate for identifying a number of interesting articles on this topic.

4. This award was later renamed after her and is now called the NASPAA Leslie A. Whittington Excellence in Teaching Award. I note that when Whittington won the Excellence in Teaching Award in 2000 she was only the second female out of 8 recipients; in the history of the award under either name, 16 males and 8 females have won this important national award (http://www.naspaa.org/principals/awards/past.asp#Leslie).

5. This result was tied with a 2-point increase in whether the instructor was judged to use student feedback, which resulted in a 0.8-point increase in the SET, cet. par. (p. 226).

