Editorial

Can we trust observational data? Keeping bias in mind

Til Wykes, Angela Sweeney & Martin Guha
Pages 579-582 | Received 22 Oct 2019, Accepted 22 Oct 2019, Published online: 22 Nov 2019

“I have been asked ‘Pray Mr Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not rightly able to apprehend the kind of confusion of ideas that could provoke such a question” (Babbage, Citation1864).

The GIGO principle – that Garbage In results in Garbage Out – has been noted since the very earliest days of computing, but it is not always understood and still needs to be kept in mind today. A timely reminder of the importance of clear reporting and constructive criticism appeared in August 2019 in BMJ Evidence-Based Medicine. That paper reported that clinical trials in top psychiatry and psychology journals hyped their results. This is scarcely surprising, because journals naturally prefer optimistic outcomes: a paper proving a negative has less chance of being published. What was shocking was that even when the results were not statistically significant on the pre-planned analyses, the abstracts still put a positive spin on them (Jellison et al., Citation2019). This got us thinking about observational studies, which similarly emphasise positive results but, when discussing strengths and limitations, often mention only problems of generalisation to groups other than the one studied. We have looked at papers in this journal to understand the issues of external (and internal) validity of observational studies, as well as at problems that seem to be overlooked by many papers submitted to, but not published in, this journal, in the hope that authors will consider them in the future.

Does size matter?

Clearly, it does, and size is often mentioned by authors. In an interesting study of cluster analysis, Windgassen, Moss-Morris, Goldsmith, and Chalder (Citation2018) point out that “Sample size depends on the number of clusters… and the number of items/variables…” They suggest a minimum sample of 200, with some authorities suggesting a minimum of as many as 500. These large numbers of participants are important because the more people who answer the questions, the more precise the results are likely to be, but large numbers do not necessarily mean greater generalisation. Large numbers allow authors to test more than one question, or at least to include more variables, and some questions may be so subtle that many variables are needed to tease out complex relationships. But, and it is a big but, if the data are collected online or from a very selected sample, then being big is not necessarily an asset. Basically, sample size does not address potential biases. One example of a study with clear descriptions of the large sample (N = 2551) and relevant variables was recently published in this journal, and it does describe in detail the nature of the sample (Arnaez, Krendl, McCormick, Chen, & Chomistek, Citation2019).
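A minimal simulation sketch makes this point concrete: enlarging a self-selected sample narrows the uncertainty around an estimate but leaves the selection bias untouched. The response probabilities and prevalence below are invented purely for illustration and are not drawn from any of the studies discussed; with these assumed numbers the observed prevalence settles around 25% whatever the sample size, even though the true value is 10%.

```python
# Illustrative sketch only: invented numbers, not data from any cited study.
import random

random.seed(1)

TRUE_PREVALENCE = 0.10          # assumed true rate of the trait in the population
P_RESPOND_IF_TRAIT = 0.30       # people with the trait are keener to respond
P_RESPOND_IF_NO_TRAIT = 0.10    # people without it mostly ignore the survey

def biased_survey(n_invited):
    """Return the observed prevalence among self-selected respondents."""
    responders_with_trait = 0
    responders = 0
    for _ in range(n_invited):
        has_trait = random.random() < TRUE_PREVALENCE
        p_respond = P_RESPOND_IF_TRAIT if has_trait else P_RESPOND_IF_NO_TRAIT
        if random.random() < p_respond:
            responders += 1
            responders_with_trait += has_trait  # bool counts as 0 or 1
    return responders_with_trait / responders, responders

for n_invited in (1_000, 10_000, 1_000_000):
    estimate, n = biased_survey(n_invited)
    print(f"invited {n_invited:>9,}: {n:>7,} responded, "
          f"observed prevalence {estimate:.2f} (true value {TRUE_PREVALENCE})")
```

The estimate becomes more stable as the invited sample grows, but it converges on the biased value, not the true one.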

Larger numbers are often achieved through online data collection, but this introduces issues particularly pertinent to studies of health, such as who completes the survey and why. For example, Morrison, Stomski, and Meehan (Citation2018) posted a survey invitation in a newsletter sent to c. 3300 nurses and analysed the replies from the 171 who decided to respond. The results are interesting and relevant, but a larger, non-self-selected sample would have been more authoritative. Similarly, Kucharska (Citation2018) recruited female students at university campuses all over Poland: her assistants recruited in the breaks between classes, and adverts with a link to the study were posted on social media. After all this, she only ended up with 277 respondents. These are clearly not “fairly representative of Polish college population”; they are a small self-selected group.

Those completing surveys online will be a group of engaged web users who know about the survey and are motivated to complete it. We know that people with severe mental health problems are often excluded from this group (Ennis, Rose, Denis, Pandit, & Wykes, Citation2012; Robotham, Satkunanathan, Doughty, & Wykes, Citation2016) and that many people experiencing significant mental distress are just trying to get through each day. Online samples, no matter how large, are unlikely to contain people who are struggling to concentrate, have decided to remove themselves from social media to recover, or do not even have access to digital services. Dalum et al. (Citation2018), for example, noted that the level of “missing” data was “high due to the high drop-out rate.” This does not necessarily invalidate their results, but it is unlikely that drop-outs from a sample of people with severe mental distress are purely random. Drop-outs can even provide valuable information. For example, Haddock et al. (Citation2018) started their survey under the assumption that “all people… want CBT but cannot get it” and were therefore taken by surprise when “a significant number of people did not choose to have therapy” – an interesting, though perhaps obvious, finding that could easily have been missed. Surveys may also end up with unbalanced samples due to varying clinical judgments as to who is suitable for inclusion; see Pinfold et al. (Citation2019) for a discussion of this problem.

Who completes a survey?

Some surveys are very long, taking up to 40 minutes to complete, such as the recent BBC All in the Mind loneliness survey. This may mean only a cursory consideration of questions and mid-range or even random responding. It is only those who are interested in the topic and hear about the survey who will even start to complete it, and then only those not irritated by the length who will finally answer all the questions. Data from long surveys are therefore likely to be biased towards people who have the time and capacity, because they are not overly busy, and the motivation to finish what may be a boring set of questions. This is what is called a self-selecting sample.

There are other considerations to be taken into account. For instance, in the BBC loneliness survey (Hammond, Citation2018), there was a comparison of prevalence in younger and older people. In young people it was nearly 40%, but for older people it was 27%. These figures challenge current perceptions, but there is already a mismatch in digital access across the age range. More younger people have access to digital tools and may be motivated to complete surveys if they are lonely. In contrast, older people have less digital access, and those who are lonely are the least likely to be using the internet or social media. Any conclusions about differences in prevalence will, therefore, need to be taken with a pinch of salt. The bias becomes obvious when we consider that other, more complete and less self-selecting, surveys report different prevalence (40% for young people in the BBC sample vs 10% in the Office for National Statistics, Citation2018). Such a measurement bias “is commonly neglected. In a clinical setting a consistent bias leads to over- or under-diagnosis” (Vitoratou & Pickles, Citation2017).
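To see how such a gap can arise, a short worked calculation helps. The numbers below are invented solely to show the mechanism, and are not estimates of the actual BBC or ONS samples: if lonely people in an age group are markedly more likely to respond than everyone else, a true prevalence of 10% can appear as 40% among self-selected respondents.

```python
# Hypothetical numbers chosen only to illustrate the mechanism,
# not to model the real BBC or ONS data.
true_prevalence = 0.10       # assumed population rate of loneliness
p_respond_lonely = 0.60      # lonely, online, motivated people respond often
p_respond_not_lonely = 0.10  # everyone else mostly scrolls past the survey

# Bayes' rule: P(lonely | responded)
p_respond = (true_prevalence * p_respond_lonely
             + (1 - true_prevalence) * p_respond_not_lonely)
observed_prevalence = true_prevalence * p_respond_lonely / p_respond

print(f"true prevalence:     {true_prevalence:.0%}")   # 10%
print(f"observed prevalence: {observed_prevalence:.0%}")  # 40% with these inputs
```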

The Journal of Mental Health has received many surveys of university students and their mental health difficulties. These samples are usually large groups of individuals who either complete the survey as part of their course (a more complete sample) or under circumstances that are not reported, so we do not know the base rate of responding. If only 10% of potential participants take part, then it is hard to say that the results derived from the sample can be generalised even to the group from which they were drawn. Again, this is a problem of self-selected samples.

Context does matter

Sometimes researchers take questions out of a survey and compare them with other surveys which purport to ask the same question. However, the phrasing of questions can dramatically affect the way that a question is interpreted, making such comparisons difficult without any evidence that individuals would have answered the question in the same way in the different samples.

The question might also be very complex, with two sets of negatives making the “right” answer obscure. In addition, the surrounding questions affect the way a question might be answered. For instance, survey designers sometimes change the polarity of the positive answers to deter respondents from just marking one end of a continuum. If a respondent gets two or three questions where agreement is to the right of the page and then gets one where agreement is to the left, then a less than focussed participant might well make mistakes.

The actual questions leading up to the key one can also lead the respondent into different mindsets and so produce different responses to the same question. In the table below we use some leading questions that a fictional (but highly skilled) senior civil servant uses to show how different, or indeed opposite, conclusions might be drawn by the same person, demonstrating how this context works (Leading Questions – https://www.youtube.com/watch?v=G0ZZJXw4MTA).

Context allows people to reflect on the questions and can lead them to different conclusions, so epidemiologists need to be wary of looking across cohorts if the surrounding questions are very different.

Context also takes a further form – the where and when of completing the questionnaires. If we take the where first: for online surveys, we know that people from across the world are likely to complete them as long as they understand them, yet we are often not provided with much information about whether country of origin affects the results. As a journal wanting to attract papers of international interest, this is a real problem, so we would urge authors to design their studies so they can discover whether and how, for instance, individuals from the global south differ from those in the USA. Data from within a single country can be informative if they are placed in context. We have seen (and rejected) papers where context has not been considered. A simple example is in papers where medical school attenders are surveyed for mental health literacy or mental health problems. Responses are likely to be affected not only by the centre where the study takes place but also by the year of study. Early in their education, medical students may not have been placed in difficult situations and may have had little contact with people experiencing mental distress. Their responses are likely to differ completely from students in other years, who may be struggling with their own mental health, or who may be in their final years and have survived these difficult issues. Timing really is important, and we urge authors to at least consider it and draw conclusions that reflect the results, and the limitations, of their survey.

Data from convenience samples

Even the most straightforward-seeming sample may be biased. Thus, for example, Guha (Citation2019), in studying adult colouring books, took the first 100 titles listed on Amazon. We are all increasingly aware that Amazon ratings and listings are regularly manipulated – it is unlikely that the first 100 titles displayed were a purely random selection. Again, this does not invalidate the results, but it raises a point to be wary of.

Observational data can also be extracted rather than purposively sampled; this encompasses data collated from the internet, such as mentions on social media, Twitter, etc. It is very easy to use these scraped data, which were not designed to answer a specific question. Apart from the research ethics issues surrounding data drawn from Facebook groups and the like, there could be a view that the unfettered nature of comments on social media is a better reflection of views than polite questionnaire responses. So, would we really know the level of stigma and discrimination in the community through scraping Twitter accounts (e.g. Bowen & Lovell, Citation2019) or from community surveys such as Gilmore and Hughes (Citation2019)? We want to know, and would encourage a comparison between these data collection methods to provide the context for understanding study results.

We have not discussed small qualitative studies. These can generally provide in-depth knowledge and understanding of processes, relationships and challenges not obvious in simple tables or treatment outcomes, as well as a greater understanding of the phenomenology of mental distress. Some are purposeful and select the sample to achieve representation across demographic and other characteristics thought to be important, increasing the transferability of findings. What we have come across, however, are very small samples, often without the appropriate specificity to answer the question posed. For instance, we received a paper which purported to tell us something about gender in service user responses when the sample was not only very biased towards the inclusion of women, but staff made up more than half the sample. By contrast, we did publish a qualitative paper on older people’s experience of living in a secure unit where the authors did have a representative sample, comprising more than 50% of all those eligible (Visser, MacInnes, Parrott, & Houben, Citation2019).

Authors frequently apply the concept of ‘theme saturation’ to justify their sample size, often with little contextualising information or any reference to the study aims and methods. Malterud et al. (Citation2016) have argued that sample size is related to study aims, the specificity of the sample, whether established theory is used, the quality of the data and the analytic strategy. To this, we might add the nature of the topic and how slippery or obvious it is, along with the specific study design (Morse, Citation2000). Studies with small samples should justify their size with reference to these variables and consider providing contextual information from other sites or other services to put the data into context. Unfortunately, many of the rejected studies fell foul of this important context variable.

Why are these important issues ignored?

We understand that authors want to publish and that they believe that the words “first”, “largest” and “best” make editors more likely to publish their results. That is not true in this journal. Of course, we are interested in these sorts of papers, but we also want sound content that replicates, or does not replicate, important findings. For replication, the fact that the same results are found in a different context and with more certainty (larger samples) is important. Findings may differ across contexts. If the argument in a paper is that findings cannot be replicated, then this conclusion should be based on samples that are of sufficient or appropriate size to give confidence in the findings (this includes qualitative studies) and which provide enough information for us to judge whether context, self-selection and other potential biases have affected the results. We hope that we can encourage researchers to think about these things, not only before they write up their results, but also to consider survey methodology before they begin to try to answer the key research question.

Til Wykes
Institute of Psychiatry, Psychology and Neuroscience, King's College London, London
South London and Maudsley NHS Foundation Trust, London

[email protected]

Angela Sweeney
Population Health Institute, St George's University of London, London, United Kingdom of Great Britain and Northern Ireland

Martin Guha
Institute of Psychiatry, Psychology and Neuroscience, King's College London, London

References

  • Arnaez, J. M., Krendl, A. C., McCormick, B. P., Chen, Z., & Chomistek, A. K. (2019). The association of depression stigma with barriers to seeking mental health care: A cross-sectional analysis. Journal of Mental Health, 1–9. doi:10.1080/09638237.2019.1644494
  • Babbage, C. (1864). Passages from the life of a philosopher. London: Longmans Green & Co.
  • Bowen, M., & Lovell, A. (2019). Stigma: The representation of mental health in UK newspaper Twitter feeds. Journal of Mental Health, 1–7. doi:10.1080/09638237.2019.1608937
  • Dalum, H. S., Waldemar, A. K., Korsbek, L., Hjorthøj, C., Mikkelsen, J. H., Thomsen, K., … Eplov, L. F. (2018). Participants’ and staff’s evaluation of the illness management and recovery program. Journal of Mental Health, 27(1), 30–37. doi:10.1080/09638237.2016.1244716
  • Ennis, L., Rose, D., Denis, M., Pandit, N., & Wykes, T. (2012). Can’t surf, won’t surf: The digital divide in mental health. Journal of Mental Health, 21(4), 395–403. doi:10.3109/09638237.2012.689437
  • Gilmore, L., & Hughes, B. (2019). Perceptions of schizophrenia in the Australian community: 2005–2017. Journal of Mental Health, 1–7. doi:10.1080/09638237.2019.1630720
  • Guha, M. (2019). The environment of mental health. Journal of Mental Health, 28, 109–111. doi:10.1080/09638237.2019.1581359
  • Haddock, G., Berry, K., Davies, G., Dunn, G., Harris, K., Hartley, S., … Barrowclough, C. (2018). Delivery of cognitive-behaviour therapy for psychosis. Journal of Mental Health, 27, 336–344. doi:10.1080/09638237.2017.1417549
  • Hammond, C. (2018). The anatomy of loneliness. Retrieved from https://www.bbc.co.uk/programmes/articles/2yzhfv4DvqVp5nZyxBD8G23/who-feels-lonely-the-results-of-the-world-s-largest-loneliness-study (accessed 2 September 2019).
  • Jellison, S., Roberts, W., Bowers, A., Combs, T., Beaman, J., Wayant, C., & Vassar, M. (2019). Evaluation of spin in abstracts of papers in psychiatry and psychology journals. BMJ Evidence-Based Medicine, pii: bmjebm-2019-111176. doi:10.1136/bmjebm-2019-111176.
  • Kucharska, J. (2018). Cumulative trauma, gender discrimination and mental health in women. Journal of Mental Health, 27, 416–423. doi:10.1080/09638237.2017.1417548
  • Malterud, K., Siersma, V., & Guassora, A. (2016). Sample size in qualitative interview studies: Guided by information power. Qualitative Health Research, 26, 1753–1760. doi:10.1177/1049732315617444
  • Morrison, P., Stomski, N. J., & Meehan, T. (2018). Australian mental health nurses’ perspectives about the identification of antipsychotic medication side effects. Journal of Mental Health, 27(1), 23–29.
  • Morse, J. (2000). Determining sample size. Qualitative Health Research, 10(1), 3–5. doi:10.1177/104973200129118183
  • Office for National Statistics. (2018). Children’s and young people’s experiences of loneliness: 2018. Analysis of children’s and young people’s views, experiences and suggestions to overcome loneliness, using in-depth interviews, the Community Life Survey 2016 to 2017 and Good Childhood Index Survey, 2018. Retrieved from https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/articles/childrensandyoungpeoplesexperiencesofloneliness/2018 (accessed 2 September 2019).
  • Pinfold, V., Cotney, J., Hamilton, S., Weeks, C., Corker, E., Evans-Lacko, S., … Thornicroft, G. (2019). Improving recruitment to healthcare research studies: Clinician judgments explored for opting mental health service users out of the time to change viewpoint survey. Journal of Mental Health, 28(1), 42–48. doi:10.1080/09638237.2017.1340598
  • Robotham, D., Satkunanathan, S., Doughty, L., & Wykes, T. (2016). Do we still have a digital divide in mental health? A five-year survey follow-up. Journal of Medical Internet Research, 18(11), e309. doi:10.2196/jmir.6511
  • Vitoratou, S., & Pickles, A. (2017). A note on contemporary psychometrics. Journal of Mental Health, 26(6), 486–488. doi:10.1080/09638237.2017.1392008
  • Visser, R., MacInnes, D., Parrott, J., & Houben, F. (2019). Growing older in secure mental health care: The user experience. Journal of Mental Health, 1–7. doi:10.1080/09638237.2019.1630722
  • Windgassen, S., Moss-Morris, R., Goldsmith, K., & Chalder, T. (2018). The importance of cluster analysis for enhancing clinical practice. Journal of Mental Health, 27(2), 94–96. doi:10.1080/09638237.2018.1437615
