Abstract
The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used to track protesters and suppress speech? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology that provokes extensive utopian and dystopian rhetoric.
Acknowledgements
We wish to thank Heather Casteel for her help in preparing this article. We are also deeply grateful to Eytan Adar, Tarleton Gillespie, Bernie Hogan, Mor Naaman, Jussi Parikka, Christian Sandvig, and all the members of the Microsoft Research Social Media Collective for inspiring conversations, suggestions, and feedback. We are indebted to all who provided feedback at the Oxford Internet Institute’s 10th Anniversary. Finally, we appreciate the anonymous reviewers’ helpful comments.
Notes
We have chosen to capitalize the term ‘Big Data’ throughout this article to make it clear that it is the phenomenon we are discussing.
API stands for application programming interface; this refers to a set of tools that developers can use to access structured data.
Details of what Twitter provides can be found at https://dev.Twitter.com/docs/streaming-api/methods. White-listed accounts were commonly used by researchers, but they are no longer available.
The percentage of protected accounts is unknown, although attempts to identify protected accounts suggest that under 10 percent of accounts are protected (Meeder et al. 2010).