Letter

Better data ≫ Bigger data


Dear Sir,

We read Ellaway et al.’s (2014) article on Big Data in health professions education with great interest. We share the authors’ excitement and thank them for starting the conversation in our field. Here we stress two key concerns about Big Data: while analytics offer undeniable benefits for hypothesis generation, they cannot excuse us from broader questions of scientific design and analysis.

First, Big Data is not objective data. Just as with small, purposeful datasets, large datasets are defined by the assumptions, questions, tools, and interpretations that underpin them. Our understanding of health professions education may regress if we ignore issues of design, construct selection, and measurement validation. Large or small, purposefully collected datasets wrestle with these issues up front; datasets of convenience rarely do.

Second, not all data analysis – no matter how large the dataset – constitutes science. Exploring the signals (and noise) in large datasets without adequate conceptual frameworks can be misleading, if not dangerous. Secondary data analysis is a useful but inherently limited scientific tool because it cannot robustly support causal inference. Only when data collection and analysis are informed by theory are robust results possible.

The scientific method was developed to navigate the complex challenges of making meaning from data. In this endeavor, better data will always trump bigger data. Without proper design and analytic rigor, Big Data could easily amplify spurious results and lead us astray.

Other fields have navigated these challenges and used theory to guide Big Data. For example, Shwed and Bearman (2010) used Latour’s ‘Black Box’ theory to model scientific consensus formation. They analyzed citation networks drawn from about 30,000 publications and 124,000 citations to shed light on controversies such as the carcinogenicity of tobacco and the purported autism/MMR vaccine connection. In medical education, Asch and colleagues (2009) tracked maternal complication rates for 4,000 obstetricians who collectively performed 4.9 million deliveries over 15 years. The authors showed the effects of training program, experience, and individual ability on clinical performance, thereby testing and confirming theories developed in experimental studies.

These studies suggest that we as a community of scholars can use Big Data to serve research, rather than have Big Data dictate it. Meaningful knowledge comes only from scientifically informed design and analysis. Ultimately, it is not about the size of the dataset.

References

  • Asch DA, Nicholson S, Sindhu S, Herrin J, Epstein AJ. 2009. Evaluating obstetrical residency programs using patient outcomes. JAMA 302(12):1277–1283.
  • Ellaway RH, Pusic MV, Galbraith RM, Cameron T. 2014. Developing the role of big data and analytics in health professional education. Med Teach 36(3):216–222.
  • Shwed U, Bearman PS. 2010. The temporal structure of scientific consensus formation. Am Sociol Rev 75(6):817–840.
