Editorial

Confirmation bias and methodology in social science: an editorial

Abstract

While science is presumably objective, scholars are humans, with subjective biases. Those biases can lead to distortions in how they develop and use scientific theory and how they apply their research methodologies. The numerous ways in which confirmation bias may influence attempts to accept or reject the null hypothesis are discussed, with implications for research, teaching, and public policy development.

Introduction

As I depart as editor of Marriage & Family Review after 11 years, I want to provide a final editorial that summarizes one of my main concerns with the state of social science, perhaps even all science today. All science is in principle objective. However, science is conducted by humans and as humans we are subject to our own biases and prejudices. It is quite possible that a scientist will turn to science to confirm those same biases and prejudices rather than to disconfirm them. Some social science theories may predict that, but my argument is that science will do better if we try as best as we can to resist such tendencies, even as we acknowledge them. The main thesis of this report is that the public and scientists themselves have underestimated the impact of such biases on the production of science, leading to overconfidence in research findings and in some cases inappropriate application to public policy. Sarewitz (2012) has stated that “Alarming cracks are starting to penetrate deep into the scientific edifice. They threaten the status of science and its value to society. And they cannot be blamed on the usual suspects – inadequate funding, misconduct, political interference, an illiterate public. Their cause is bias, and the threat they pose goes to the heart of research” (p. 149).

I think it’s safe to say that most of us like to receive approval for our thoughts, words, and behavior. It makes us feel confirmed, supported, and happier. After all, who wants to be found to have been “inaccurate” or “wrong”? This ordinary human need and tendency poses a risk for science if we, as scientists and/or the public, allow it to get in the way of facts or what some might call “truth”. In this report, I will show systematically how scientific methods can be twisted in ways that distort facts or truth in order to confirm (i.e., confirmation bias) what we want to hear, even if it is false, rather than what we need to hear, even when it is correct or true.

Basic question

At the most basic level of hypothesis testing, one wants to test a null hypothesis. If the null hypothesis is correct, then there should be no difference between two (or more) groups or no correlation between two (or more) variables. In advance of testing one’s hypotheses, a scholar determines the expected result: is it expected that the null hypothesis will be rejected, suggesting that there are differences or correlations, or is it expected that the null hypothesis will not be rejected, suggesting that there are none? Of course, a scholar might be disinterested and only want to find out what the result is, either way. However, in many cases the scientist starts with a bias in favor of or against the null hypothesis. There may be strong value-based reasons for that bias. For example, social scientists may be hired by the military to prove that overseas family separations are not stressful for families of the military service member. If the scientists find results that please the military, they may get further grants. If the scientists find that family separations are stressful, that may end any further grants. If military families are told that the scientific research proves that family separations are not stressful, the families may feel they are being gaslighted. If the military families are told that the research shows that separations are stressful, they may feel validated but also push back and demand a reduction in overseas military adventures by their nation’s politicians. There may be nothing wrong with having such biases, but I think that better scientists will manage their research carefully to minimize the impact of those biases. Some may disagree and feel that the end justifies the means, that cheating on research is acceptable if it is done for the right political cause or even merely to sustain a chain of grant funding. That is not my view, because I see facts as useful as long as they are accurate, even if they are not pleasing for whatever reason.
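The group-comparison form of this null-hypothesis test can be sketched in a few lines. The data below are invented purely for illustration (hypothetical stress scores for families with and without an overseas separation); the point is only the arithmetic behind "is there a difference between two groups?", here via Welch's t statistic.

```python
# Minimal sketch of a two-group null-hypothesis test (illustrative data only).
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for H0: the two group means are equal."""
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    na, nb = len(group_a), len(group_b)
    se = math.sqrt(va / na + vb / nb)  # standard error of the mean difference
    return (ma - mb) / se

# Hypothetical stress scores (higher = more stress)
separated     = [6, 7, 5, 8, 6, 7]
not_separated = [4, 5, 3, 5, 4, 4]
t = welch_t(separated, not_separated)
# An |t| well above ~2 would favor rejecting the null hypothesis of equal means.
```

A biased researcher could push this statistic toward or away from significance through choices made long before the analysis (sample size, measurement, who gets surveyed), which is the theme of the sections that follow.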

How does methodology tie into bias? I will suppose that one researcher seeks to reject the null hypothesis while another seeks to accept/not reject it. What methods would each researcher select if they wanted to bias the outcome in their preferred direction? Those are precisely the methods that each researcher needs to guard against, because those methods will tend to produce the desired result. A deeper approach to guarding against one’s own bias would be to take the role of one’s critics and try to disprove one’s own findings by seeking alternative explanations for the results, but I will say more about that later. For example, suppose you encounter the research reports of a scholar whose value system would support retaining the null hypothesis; their credibility may be much greater if they seem to be taking precautions against falling into the traps listed in Table 1 under the column “favors null hypothesis” and/or if they use methods listed under the column “favors rejecting the null hypothesis”. The reverse would be true if a scholar were believed to have a bias in favor of rejecting the null hypothesis.

Table 1. How to use the research process to yield outcomes for or against the null hypothesis.

Table 1 presents information on what types of methods will tend to produce results in favor of or against the null hypothesis.

What does better science look like? In contrast to the methods discussed in Table 1, I will discuss a better way to conduct research with less bias for or against the null hypothesis, in terms of both theory development and research methodology.

Theory

Scholars should use more than one theory in any explanation or prediction. Ideally they should use theories that fit their expectations as well as theories that might not. The research should be well enough designed that the results lend or do not lend support to specific theories, so theory construction can advance as well as our research knowledge. Scholars should be prepared emotionally to consider/accept results that they did not anticipate and to publish those with as much enthusiasm as results they did anticipate. In my view, it’s not enough to test an idea without theory. For example, suppose you are on a grant from the Pentagon and they want you to prove the null hypothesis, that the experience of overseas combat deployments does not influence military family divorce rates. You could survey service members while they are deployed or check their military records for marital status changes and likely find no differences (because divorces are difficult to obtain while a service member is overseas on a deployment). But what about theory? Common sense would suggest that separating a loving family member from the family would be a stressor by itself, no matter the occupation. Most people probably are in families because they want to be in them; involuntary separations would disrupt that emotional support. It would also be likely that separating an abusive family member might be associated with a decrease in stress. If the separation is accompanied by a pay increase, the change in the financial situation might offset some of the effects of stress from separation. Barriers to divorce might be important to understand because not all unhappy families end up divorced. Unless theoretical issues like those are taken into consideration, research findings by themselves, while perhaps popular with the funding agency, might not tell us much.

Concepts

Sample size may limit the number of concepts that might be testable but aside from such practical considerations, a wide range of concepts should be considered as a component of using theory in the planning and interpretation of research. For example, perhaps a scholar thinks that discrimination might explain lower educational achievement in some minority group; might there be alternative explanations related to alternative concepts that might be useful to investigate? One might well expect perceptions of discrimination to be associated with reported educational achievement but what about building a more complex model, even taking the risk of some of one’s ideas being falsified?

Co-researchers

One can assemble a group of like-minded scholars to discuss a controversial issue but we suspect that better theory and more useful concepts may result from discussion among a diversity of minds, even minds that are associated with diverse values and political goals. Perhaps escaping one’s comfort zone might lead to more productive research and research that might be interesting and helpful to a larger span of the population.

Literature reviews

We have observed that a process can develop in which an early literature review finds in favor of some result and then dozens of subsequent literature reviews more or less regurgitate the same findings and quote mostly the same primary sources, again and again. This may be especially true if publishing such follow-on literature reviews serves to prove one’s academic political correctness and to signal to interviewers and promotion committees that one has the “correct” values and perspectives for the position. While it used to be deemed sufficient to publish two or three literature reviews in an area every ten years, we have now found areas in which upwards of a hundred literature reviews may be published every decade. Surely that is a type of overkill. When preparing literature reviews, a scholar should insist on contributing something new to the literature rather than merely repeating what previous reviews have already discussed. In particular, a scholar should be careful to find research that may not fit conventional scholarly wisdom and include such research along with more accepted findings. Furthermore, we think that better literature reviews will consider previous qualitative and quantitative research rather than just one type of methodology; if the focus is on one type of research, that should be clearly explained as a major limitation.

Methodology

Qualitative research can be very useful for starting an investigation into a new area of study, providing suggestions for concepts to be measured and hypotheses to be tested. However, at some point, if research is to become generalizable to the public and useful for public policy, quantitative research will need to be done. Qualitative research can continue to be useful to seek out overlooked concepts and untested ideas or to explain anomalies found, but not explained, in quantitative research.

Concepts will need to be measured with high reliability and validity, which can be an important part of research by itself. If there are known concepts that might invalidate other concepts, they should be measured (e.g., social desirability response bias) and included in quantitative models being tested. Ideally, concepts should be used that involve different sources, so as to avoid correlations among variables that are merely due to common methods variance.
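One common reliability check for a multi-item scale, Cronbach's alpha, can be computed without any special libraries. The three-item scale and respondent scores below are hypothetical; the point is only the formula (item-score consistency relative to total-score variance).

```python
# Cronbach's alpha from raw item scores (illustrative data only).
import statistics

def cronbach_alpha(items):
    """Alpha for a list of item columns, each a list of respondent scores."""
    k = len(items)
    item_vars = sum(statistics.variance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 3-item scale, five respondents
item1 = [4, 5, 3, 5, 4]
item2 = [4, 4, 3, 5, 4]
item3 = [5, 5, 2, 4, 4]
alpha = cronbach_alpha([item1, item2, item3])
# Alpha near or above ~0.70 is a common (if debated) reliability benchmark.
```

Low alpha is one concrete warning sign that later "null" findings may reflect noisy measurement rather than a true absence of association.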

Samples should be random and relatively free from selection bias whenever possible. If possible, researchers and participants should be blind to the values or objectives of the study, so that their biases don’t bias the study outcomes. Larger samples will allow for testing more concepts within the same statistical models or for testing similar models across different subsamples. Sample characteristics should be relevant to the theories being tested; a sample of elderly women might not be relevant for developing theory about the emotional development of adolescent boys. Likewise, a study of homosexual attractions among young girls might be less relevant if the sample consisted of girls under the age of six where sexual attraction may not be a particularly relevant issue.

Analyses should fit measurement; if variables are not reliable and valid, it is expecting too much for fancy analyses to salvage poor measurement, even though sometimes latent variables are used for that purpose. If variables are not normally distributed, then nonparametric analyses should be used, when possible. If variables are normally distributed, then parametric statistics should be used. Either way, attention should be paid to the assumptions required by the statistics to be used. If nonlinear patterns are anticipated, then variables need to have enough dispersion to permit testing of such patterns. If subgroup analyses are anticipated, then the subgroups will need to be measured properly so that they can be studied separately. If social desirability is an issue, it will need to be measured and controlled either statistically or by design. The temptation for fishing expeditions must be avoided; the best way to do that is to develop your statistical tests based on sound theory rather than from random ideas that seem interesting.

Results

Results should be reported regardless of whether they favor your expected outcomes. Sometimes unexpected results can lead to a great deal of advancement in science, perhaps more than from finding anticipated results. We think that media concerns should be considered but should not dictate whether results will be released. We have seen cases where research findings were delayed in publication by decades because sponsors feared what the media might have done with them. One time, about 1996, we found a 70% divorce rate in two different sample surveys for the same subgroup of deployed military respondents, but the sponsor was worried about what the media would say and rejected any attempt to publish those results. Cases have occurred where sponsors changed passwords on data sets and computers in order to keep researchers from reporting undesired rejections of the null hypothesis. Researchers should take precautions to avoid such controls, if at all possible. Results should be discussed not only in terms of statistical significance but also in terms of effect sizes, so their substantive meaning will be clearer to readers. Results should be published in journals that will give a fair hearing to a diversity of ideas, theories, concepts, and outcomes rather than being mere mouthpieces for restricted points of view.

Implications

Weak research should not be presented to the public as revolutionary or applicable to the entire population. Weak research may be interesting and may indicate productive future directions for further research. If the research is ground-breaking or may be generalized to the whole population, then the general validity of that research should be explained in detail, not merely asserted. Limitations must be discussed and taken seriously in terms of limiting potential policy applications. Limited research is not necessarily “bad”; it may only be useful as an indication of how future research might be done more rigorously or with different populations. I have seen articles based on very small, nonrandom samples become the center of academic firestorms, with different groups trying to make more out of the article than it was worth, either way. Both sides should have resisted attempts to claim more for the article than it deserved in terms of attention. In the end, its usefulness as a stimulator for better, future research was overlooked.

Conclusion

Too often scholars attempt, consciously or unconsciously, to find support for their pet ideas and theories without giving full consideration to opposing ideas or theories. There are numerous ways that research outcomes can be predetermined by using no theory, limited theory, or biased methods. A few scientists have made up or fabricated their results in order to ensure they report desired outcomes. Fortunately, there are ways to be aware of and perhaps reduce personal or political bias throughout the research and publication process, as we have discussed. To the extent that biased research continues to be presented or published, students and scholars may refer to Table 1 for guidelines for detecting bias and its methodological associations. Ideally, scholars should not merely test their own theories but test their data against theories of their critics, doing their best to prove themselves incorrect.

Teaching discussion questions

Does this editorial make you feel more cynical about research or more hopeful? Or both at the same time? Or neither? What other feelings, if any, did you experience when reviewing it?

What did the author leave out that might be important? What was included that you don’t think should have been considered?

Under what conditions might scientists be eager (1) to “prove” the null hypothesis or (2) to see the null hypothesis rejected?

What should a student/scientist do if their preferred outcomes are not “found” in their own or someone else’s research? What if their research findings seem to contradict their desired political goals or might be interpreted that way by others, even if they don’t actually conflict?

Could a scientist be biased but not aware of it, even though they were using one side or the other of Table 1 in the way they were going about their research? Could they be biased and aware of it, but just not care? Could they see bias as legitimate as long as it leads to outcomes favorable to their political ideals?

Is a student/scientist inherently biased if they were to use the information in Table 1 to raise questions about possible bias, one way or the other, of other scientists? Does studying Table 1 mean that the reader is inherently biased in some way?

The issue in question is whether or not variable A is related to variable B. The responses for both variables are between 1 and 5. In one study with eleven cases, the correlation between variable A (mean, 2.55; SD, 0.93) and variable B (mean, 2.64; SD, 1.36) is found to be r = 0.407 (p > .20). The authors claim that this is definitive proof that variables A and B are not associated with each other and that policymakers should base policy for the entire nation on their findings. A second group of authors conducts a large random survey with several thousand cases and finds that the correlation between the same two variables is r = .05 (p < .0001). They claim that this is definitive proof that variables A and B are associated with each other and that policymakers should base policy for the entire nation on their findings. What would you make of each of these claims?
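The two claims can be checked with the standard t transform of a correlation coefficient. The small study's t falls well below conventional critical values (consistent with p > .20) even though r = .407 is a sizable effect, while with thousands of cases even r = .05 clears significance thresholds despite explaining only a quarter of one percent of the variance. The n = 5,000 used for the second study below is an assumption for illustration ("several thousand" in the exercise).

```python
# t statistic for testing H0: rho = 0, given a sample correlation r and size n.
import math

def correlation_t(r, n):
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_small = correlation_t(0.407, 11)   # modest n: large r, yet not significant
t_large = correlation_t(0.05, 5000)  # huge n: tiny r, yet highly significant
# r itself is the effect size: r = .407 implies r^2 of about 16.6% of variance
# explained, while r = .05 implies only 0.25%, whatever its p value.
```

Neither study supports a "definitive proof" claim: the first is far too small to accept the null with confidence, and the second shows an effect too small to matter for national policy.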

Acknowledgments

This editorial was written during the COVID pandemic, without notes or a supply of journal articles. However, many of the ideas have come from my associations with numerous scholars since 1974, including Duane Crawford, Rick Scheidt, Anthony Jurich, Stephan R. Bollman, Charles Figley, Wallace Denton, Loren Marks, William Goodman, D. Bruce Bell, Paul Gade, Pauline Boss, William Doherty, Ralph LaRossa, Suzanne Steinmetz, Farrell Webb, and Reuben Hill, among many, many others, including many of my students, not named here.

Reference
Sarewitz, D. (2012). Beware the creeping cracks of bias. Nature, 485(7397), 149.
