AGILE ETHICS FOR MASSIFIED RESEARCH AND VISUALIZATION

Pages 43-65 | Received 28 Apr 2011, Accepted 16 Aug 2011, Published online: 13 Oct 2011
 

Abstract

In this paper, the authors examine some of the implications of born-digital research environments by discussing the emergence of data mining and the analysis of social media platforms. With the rise of individual online activity in chat rooms, social networking sites and micro-blogging services, new repositories for social science research have become available in large quantities. Given the changes of scale that accompany such research, both in terms of data mining and the communication of results, the authors term this type of research ‘massified research’. This article argues that while the private and commercial processing of these new massive data sets is far from unproblematic, the use by academic practitioners poses particular challenges with respect to established ethical protocols. These involve reconfigurations of the external relations between researchers and participants, as well as the internal relations that compose the identities of the participant, the researcher and that of the data. Consequently, massified research and its outputs operate in a grey area of undefined conduct with respect to these concerns. The authors work through the specific case study of using Twitter's public Application Programming Interface for research and visualization. To conclude, this article proposes some potential best practices to extend current procedures and guidelines for such massified research. Most importantly, the authors develop these under the banner of ‘agile ethics’. The authors conclude by making the counterintuitive suggestion that researchers make themselves as vulnerable to potential data mining as the subjects who comprise their data sets: a parity of practice.

Acknowledgements

An earlier version of this paper was presented at the Visualisation in the Age of Computerisation conference held in March 2011 at the University of Oxford. The authors would like to thank the participants for their helpful commentary, as well as the four anonymous reviewers and the editors for their insightful suggestions. The Twitter data collection for the NCL maps was initiated as a collaboration between Dr Andrew Hudson-Smith, Steven Gray and Fabian Neuhaus. The code to collect the Twitter data was developed by Steven Gray as part of the National e-Infrastructure for Social Simulation project. Support for this study was provided by the ESRC Oxford e-Social Science project.

Notes

For example, see the Code of Ethics of the American Anthropological Association, http://www.aaanet.org/committees/ethics/ethcode.htm, or the American Psychological Association's Code of Conduct, http://www.apa.org/ethics/code/index.aspx.

For instance, the UK's Data Protection Act of 1998 or the 1995 European Union Data Protection Directive.

Due to IP limitations imposed by Twitter and infrastructural limitations, only four parallel search and collect queries may be run at a time. Depending on the search location, the resulting amount of data can be quite large, putting pressure on the infrastructure. In order not to miss any messages, the response times of the system cannot be compromised.
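As a rough illustration of the constraint described in this note, the following minimal sketch caps the number of concurrent collection queries at four. It is not the project's actual collection code: `run_collections`, `fake_collect` and `MAX_PARALLEL_QUERIES` are hypothetical names introduced here, and the collector is a stand-in that makes no network calls.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# The limit of four parallel queries comes from the note above.
MAX_PARALLEL_QUERIES = 4


def fake_collect(location):
    """Hypothetical stand-in for one search-and-collect query."""
    time.sleep(0.01)  # simulate waiting on a remote service
    return f"messages near {location}"


def run_collections(locations, collect):
    """Run collect(location) for each location, at most four at a time."""
    locations = list(locations)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_QUERIES) as pool:
        # pool.map preserves input order, so results line up with locations.
        results = list(pool.map(collect, locations))
    return dict(zip(locations, results))


if __name__ == "__main__":
    collected = run_collections(
        ["London", "Oxford", "Paris", "Berlin", "Rome"], fake_collect
    )
    for place, payload in collected.items():
        print(place, "->", payload)
```

With five locations and a pool of four workers, the fifth query starts only once one of the first four has finished, which is the kind of queuing the note alludes to when it says existing collections have to make way for new ones.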

Of course, the size of the file depends on the format. A zipped comma-separated value format will be much smaller. One week provides a good comparison of data over a number of days and also shows the different activity patterns between weekdays and weekends. Furthermore, because of the IP and infrastructural limitations, we continuously have to make way for new collections.

Twitter's statement on privacy is available at http://twitter.com/privacy.

For more details, see http://pleaserobme.com/.

Online is not equal to public. Even though data acquired through an API are branded public, they cannot simply be taken as information in the public domain. To illustrate this, we might contrast data harvested from Twitter with online resources that make data available in response to freedom of information legislation. Among others, the Guardian Data Store (http://www.guardian.co.uk/data) and data.gov.uk offer examples where governmental information is disclosed (available at http://www.facebook.com/privacy/explanation.php). More importantly, a public authority has actively decided for these data to be made available.
