
Ethnography’s future in the big data era

Pages 1625-1639 | Received 27 Jun 2018, Accepted 21 Mar 2019, Published online: 03 Apr 2019

ABSTRACT

This essay explores knowledge claims about Big Data (BD) from an ethnographic viewpoint. This epistemological exploration was triggered by social scientist/BD analyst Seth Stephens-Davidowitz’ best-selling book Everybody Lies (2017). In my reading, it portrays BD in a way that evokes affinity with ethnography: as a naturalistic research practice that makes visible small subpopulations and discloses people’s hidden motives. This threefold assertion rests on misguided conceptions, however. To the ethnographic researcher, ‘naturalism’ refers to a reflexive practice, but the BD researcher associates it with researcher invisibility. The term ‘population’, which has a statistical meaning in BD, has a theoretical connotation in ethnography. Finally, ‘motives’ in BD are about direct interpretation of revealed preferences as social facts, whereas the ethnographer considers them to be expressions of social behaviour that require a Verstehende interpretation. A BD revolution may be unfolding, but that does not make ethnography obsolete; ideally, both can be combined in a symphonic social science.

1. Introduction

Big Data (BD) is knocking on the door of the social sciences, provoking strong reactions from both proponents and opponents. Those in favour point out how BD is playing an important role in restoring the central place that social research once held in public discourse about society. They argue that this is much needed, as several emancipatory waves, which produced politically empowered citizens, coupled with the relativizing force of postmodernism, have led to the widespread belief that the social sciences are just a matter of opinion (Pepi, 2013). More critically inclined social thinkers point out that the growing popularity of BD in social research raises problems of privacy – secret service monitoring of alleged (but not yet tried) terrorists’ digital profiles presents a recent (and worrying) case in point. They also warn against a concentration of data in the hands of global BD corporations such as Amazon and Google, seeing in it evidence of an unfolding digital-business complex that relegates ordinary citizens’ concerns to a secondary role for the sake of profitmaking (O’Neil, 2016).

The debate may be lively, yet it stands in sharp contrast to the scarce scholarly reflection on the knowledge claims that BD makes: what are its views on human society, and how does it compare with rival social research methodologies (such as surveys and ethnographic fieldwork)? These important epistemological questions have only recently begun to attract serious debate (various pivotal contributions were published in 2014, see Crawford et al., 2014 and Kitchin, 2014), to which this essay seeks to make a modest contribution. Such further deepening of methodological reflection is not merely an academic exercise: it stands all the more to reason in view of the publicly perceived victories that BD has recently achieved. Take the 2016 general elections in the United States. Analysis of Google.Trends data, a massive digital repository of search keys used in Google, early on in the election process revealed a strong, pro-Trump electoral wind. Pre-election polls, however, time and again prognosticated a resounding victory for Hillary Clinton during the same period.[1] The Brexit national referendum in Great Britain also registered a BD success. Study of Facebook profiles coupled with a psychometric model suggested a strong lead for the Leave camp which few political observers had registered during the months preceding the referendum (Graslegger & Krogerus, 2017). These cases suggest that BD has moved beyond the sphere of computer hobbyism and has serious insights to offer about society (Burrows & Savage, 2014).[2]

The ambit of this essay is to discuss BD’s knowledge claims with reference to the best-selling book Everybody Lies. Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, by economist and BD analyst Stephens-Davidowitz (2017). This book represents an important voice in the unfolding academic debate on BD, because the author positions himself emphatically as a social scientist (he received advanced training in economics); he raises serious concerns about surveys, which he views as unreliable constructions; and his work suggests affinity with ethnography. To be sure, that latter point is my reading of Stephens-Davidowitz’ work, but others have made similar observations regarding affinities between ethnography and BD. Business anthropologist Curran (2013, p. 68), for instance, is critical about ethnography’s exclusive claim to represent the emic point of view in telling about society, arguing that ‘Big Data will, in the future, be able to understand the why [of human behaviour] and tell stories … literally.’ Ethnographic researcher Ford (2014, p. 2) likewise argues that ‘Both [BD and ethnography] recognize that what people actually do (rather than only what they say) is invaluable, and both require an immersion in data in order to understand their research subject.’ I make Stephens-Davidowitz’ work central in this essay because it offers a developed framework that facilitates closer inspection of the BD–ethnography nexus.

Suggesting affinity between BD and ethnography at first seems awkward, as in social science discourse they are usually viewed as opposing and/or rival research practices. To briefly summarize a still ongoing (and unresolved) debate, BD is associated with digital life in the internet era in which ‘daily life is instrumented with information technologies [producing] traces of human behaviour [that] can be combined to create rich models of social activity’ (Borgman, 2015, p. 6). From the social research viewpoint, this produces a surplus-understanding: ‘things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value’ (Mayer-Schönberger & Cukier, 2013, p. 17). Academic debates about ethnography are also ongoing, but three points appear to be emerging as important: ethnography belongs to the realm of interpretive science, it is based on first-hand (participant) observation of (online/offline) social practice, and it has a special concern for everyday life: ‘[E]thnographic techniques (…) generate observational data from real life, recorded with goodly inputs from subjects themselves […]’ (Willis, 2000, p. xi). In sociology, it is most closely associated with American symbolic interactionism – focusing on the shared meanings that constitute social reality (Becker, 2013) – and in anthropology with thick description: an empirically detailed description of focused observation combining a Verstehende and an explanatory ambition (Geertz, 2017).

Looked at from these vantage points, the relation between BD and ethnography is often seen as a binary opposition – quantitative-positivist-large scale versus qualitative-interpretive-small scale (Wang, 2013). Yet in spite of this, Stephens-Davidowitz’ work suggests three commonalities: i) a naturalistic ideal of unobtrusiveness, ii) making visible small populations that in survey research typically remain unseen, and iii) disclosing hidden motives for behaviour. Although this threefold assertion is thought-provoking, I argue in this essay that it rests on fundamental epistemological misconceptions, which may be read as an unwelcome colonization of ethnographic research practice. Consequently, although a BD revolution may be unfolding, I argue that it does not make ethnography obsolete; rather, my analysis makes critically manifest the need to delineate epistemological positions carefully and reflexively. I discuss these briefly in the conclusion, thereby adding to the symphonic approach to social science as recently postulated by the British sociologists Susan Halford and Mike Savage. But first, there follows a further exploration of Stephens-Davidowitz’ work, considering in more detail the epistemological grounds for the close relations that he observes in the BD–ethnography nexus.

2. Unobtrusive presence versus researcher invisibility

For social research, BD uses the digital traces users leave behind on the internet, for instance while they google information, update their Facebook or LinkedIn profiles, and send out Tweets. BD analysts collect those traces, code and interpret the information they contain, and build socio-scientific statements from them. It is important that the information thus assembled is not created for the purpose of study. As stated by BD researcher Adar (2015, p. 771): ‘Big data collection is a secondary artifact of the information production and consumption process of people living their ordinary digital lives.’ In other words, the situation in which digital actors find themselves is not a function of some research design, and there is no researcher present who has to be accounted for. This insight led yet another BD researcher, Michael Kosinski, to take matters a step further and argue that ‘[Big data] can be used as a powerful data-recording tool because it stores (…) actual behaviour expressed in a natural environment’ (Kosinski, Matz, Gosling, Popov, & Stillwell, 2015, p. 543).

These are statements that we normally associate with the naturalistic ideals of ethnography: ‘Studying people in everyday circumstances by ordinary means’ (Beuving & de Vries, 2015, p. 15). Adhering to these ideals conjures up important questions about researcher presence, a central theme in critical ethnographic introspection, which is usually framed as a problem of reflexivity; or, a commitment to ‘analyzing how the researcher’s identity – both as presented and claimed by the researcher herself and perceived by others – may affect the research process’ (Schwartz-Shea & Yanow, 2012, p. 98). Thus, ethnographic researchers seek to clarify whether what they observe is society itself, or a version of it dished out specially for them, or the result of personal projection. BD researchers uphold entirely different ideals, however. In their view, internet informants appear to be ignorant of being monitored: they think of themselves as being alone, and the researcher therefore remains invisible. Significantly, Stephens-Davidowitz considers that to be a major advantage: ‘People will admit more if they are alone than if others are in the room with them’ (2017, p. 108). That appears to solve the reflexivity problem, but only if one merely scratches the surface instead of diving deeper.

First of all, it remains to be seen whether we are really so gullible in our everyday, digital lives that we truly think of ourselves as being unobserved by others – inquisitive social researchers included. The influential internet sociologist Sherry Turkle thinks that this is a fallacy. She considers the internet as a social theatre in the sense implied in Erving Goffman’s dramaturgical model of society, seeing online social life as structured by roles that are rehearsed backstage and performed on stage: ‘[O]ur online lives are all about performance. We perform on social networks and direct the performances of our avatars [online identities] in virtual worlds’ (Turkle, 2011, p. 26). With that in mind, it is not difficult to grasp how digital user profiles (Facebook, LinkedIn) constitute an integral part of a presentation of the self; and to be effective and credible one needs an audience – suggesting social awareness. So, what at first may look like neutral ego data that can be harvested from a distance – without the immediate presence of a researcher – and at great speed by specially designed computer algorithms for the purpose of social research, in reality constitutes a carefully orchestrated performance. To be sure, Turkle warns us, this performance is one without the rough edges that inevitably arise in the course of our lives; a strongly biased source of personal information, therefore.

Furthermore, the results of BD research are often shared on the internet; for instance, Wired Magazine, influential in digital circles, serves as an important conduit for circulating popularized versions of BD research findings. This can therefore be expected to produce a sort of researcher awareness that has consequences for the search strings to which internet users resort, or which they avoid altogether: ‘Digital data […] also shape the actions and interactions of individuals […] who use the data to change their behaviour’ (Lupton, 2015, p. 103).

The idea that researchers must strive for digital invisibility is part of this misconception. That may be a celebrated ideal in some BD researcher circles, but it is less readily shared in the ethnographic community. The very term ‘research’ in ethnography emphasizes less the collection of ‘data’ and more a reflexive practice where the researcher critically explores the effects of his or her presence on the understanding that is being developed of a society. Reasoned from this viewpoint, reflexivity is not a problem that can be ‘fixed’ with researcher invisibility, as sometimes claimed in the BD discourse, as it is ultimately about social relations in the field that deserve a serious reflexive exploration; and because an increasing amount of social interaction is mediated via the internet, we may face an even greater challenge than when dealing with offline relations alone. A priori positions that seem prevalent in the BD discourse appear of little help. At times, the researcher effect is minimal and resembles the fly-on-the-wall ideal (Winkler, 1990); yet in other cases the ethnographic researcher tries to make him or herself as invisible as possible but informants will not let them, and they become the centre of attention (Beuving, 2017).

3. Statistical and theoretical populations

Second misunderstanding: the meaning of the term ‘population’. Methodologically, this points to the problem of representativeness, resonating rather with the positivist ideal of the natural sciences.[3] That is, research findings obtained through random/probability sampling are expected to reveal the same regularities as the larger population from which the sample is drawn. BD, conversely, does not have to deal with the sampling problem because the ocean of data in which it is fishing concerns the entire research population. Consider, for instance, all Facebook profiles, or the full stream of Google search information during a particular period of time; ‘N = all’ (Borgman, 2015, p. 4).[4] This reduces the need to devise sampling procedures and offers the important additional advantage of zooming in on those subpopulations that, by virtue of their small size, are overlooked in conventional sampling. Size matters in statistical sampling, as social researchers specializing in the study of small populations can testify. The smaller a subpopulation, the lower the chance of its special properties manifesting in a randomized procedure: ‘[D]ata on the smaller groups are overwhelmed by data on the larger groups’ (Korngiebel, Taualii, Forquera, Harris, & Buchwald, 2015, p. 1745).
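The sampling arithmetic behind this point can be illustrated with a small calculation. The following is a minimal sketch under stated assumptions, not a reconstruction of any cited study: it supposes simple random sampling, a hypothetical survey of 1,000 respondents, and a hypothetical subgroup making up 0.5% of the population. The function name and both numbers are invented for illustration.

```python
from math import comb

def prob_fewer_than(k, n, p):
    """P(X < k) for X ~ Binomial(n, p).

    Interpreted here as the chance that a simple random sample of size n
    contains fewer than k members of a subgroup whose population share is p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Hypothetical figures, chosen only for illustration.
SAMPLE_SIZE = 1000      # respondents in an imagined survey
SUBGROUP_SHARE = 0.005  # subgroup is 0.5% of the population

expected_members = SAMPLE_SIZE * SUBGROUP_SHARE
p_none = prob_fewer_than(1, SAMPLE_SIZE, SUBGROUP_SHARE)
p_under_ten = prob_fewer_than(10, SAMPLE_SIZE, SUBGROUP_SHARE)

print(f"expected subgroup members in sample: {expected_members:.1f}")
print(f"chance of sampling none at all:      {p_none:.4f}")
print(f"chance of sampling fewer than ten:   {p_under_ten:.4f}")
```

Under these assumed numbers the sample is expected to contain about five subgroup members, and it will very probably contain fewer than ten: too few for the subgroup's properties to show up in the aggregate. With ‘N = all’, by contrast, every member of the subgroup is in the data.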

The following example of the American economist Raj Chetty and his co-workers illustrates this principle of small populations (discussed in extenso in Stephens-Davidowitz, 2017). They obtained access to the big data of America’s tax administration – the Internal Revenue Service (IRS) database – storing all income declarations of America’s economically active population.[5] Subsequently, they combined income data with death rates and thus statistically established a relation between income and life expectancy. Earlier socio-economic literature had already established that poor people die much younger than rich people (because of differences in food quality, access to Medicare, and so on, see Wilkinson & Pickett, 2009), but such studies were based on survey statistics. Chetty et al. (2016) instead discovered that location matters, especially for poor people. Wealthy Americans are impervious to the location effect, but poor Americans in some places live much longer than elsewhere. The totality of their data allowed Chetty et al. to zoom in on specific locations, finding that relations between poor and rich are predictive: poor Americans who live in the vicinity of more affluent compatriots stand a much greater chance of making it to old age than those who do not have rich neighbours.

Zooming in on smaller populations is also an edge that ethnography has long used to distinguish itself from other social science disciplines. Ethnography is strongly dependent on the establishment of interpersonal relations with members of some special community in which the ethnographic researcher is interested. Thus, developing rapport takes (considerable) time and effort, putting a natural brake on informant numbers: looked at from the conventions of survey research, ethnography focuses on small populations. Yet there is a major difference between it and BD that often causes confusion. In the qualitative research community (to which ethnography belongs), few subscribe to the idea of population as a statistical reality; the term population is rather seen as a theoretical unit: ‘[it] aims at the social significance or the sociological relevance of the population’ (Gobo, 2014, p. 404). Seen through ethnographic eyes, the statistical sample presents an artificial construction that does not resonate, or resonates minimally, with everyday life experiences. In other words, it remains to be seen whether what statisticians lump together as a population represents an etic category assembled for the purpose of research or an emic community with which its members identify, forming the basis for social theorizing. A generation earlier, Dutch anthropologist André Köbben arrived at a similar viewpoint when he discussed the rapid rise of social statistics after World War II and concluded that ‘[The statistical method] … notes certain phenomena in a society and isolates them from their cultural context, thereby conveying an entirely different impression from the one they make when studied within this context’ (Köbben, 1952, p. 133).

When we apply these considerations to the problem that Chetty et al. (2016) discuss, it can be seen how an ethnographic researcher is less interested in establishing the confines of the poor American population that lives close to more affluent Americans (leaving aside for a minute questions of definition) than in working on an explanation of the transference mechanism underpinning the life-extending proximity of affluence. He or she would then purposefully look for social situations where such a mechanism can be directly observed: in a community of social practices linking poor and rich (for more details on purposeful sampling, see Glaser & Strauss, 2012). The explanation is then developed into abstract propositions that speak to social theories on poverty, and through that we can hope to learn about poverty under similar, or perhaps contrasting, situations. The researcher hence does not claim to have obtained representative findings in the sense implied in a statistical discourse: poor Americans living close to more affluent ones may have surprisingly little in common other than their etically defined poverty status.

4. BD as truth serum: nuancing Verstehen?

A third misconception concerns the status of motives for behaviour: musings, thoughts, inclinations, preferences, desires, and so on. This misconception begins with a powerful idea to which many BD researchers appear to subscribe, namely, that humans have direct access to their own motives: that they are aware of them and can articulate them when pressed. However, in the practice of everyday life, powerful social forces operate on our motives: they are blocked out, or otherwise regulated, by the presence of significant others, internalized as shame, embarrassment, and so on. Motives are thus essentially suppressed by social forces. Reasoned from this viewpoint, the anonymity of the internet relieves us of the oppressive presence of social forces and, facing no social pressure, humans on the internet will reveal their true inner selves. When we are surfing the internet, our motives can find free expression, and these can subsequently be collected for the purpose of social research. In the vocabulary of Stephens-Davidowitz (2017, pp. 53–54):

In the pre-digital age, people hid their embarrassing thoughts from other people. In the digital age, they still hide them from other people, but not from the internet and in particular sites such as Google and PornHub, which protect their anonymity. These sites function as a sort of digital truth serum.

Apart from this rather negative view of social life (significant others are hindrances rather than possible supporters), there are several more fundamental flaws associated with this idea. It regards motives as a stable property of the individual, and that is sharply at odds with received insights in the interpretive social sciences, the broad family of social research practice to which ethnography belongs, viewing motives as part of social relations. Motives are thus not seen as stable properties, but as changing according to the social situation in which the person finds him/herself. The following quote from the American ethnographer Alice Goffman in an interview with the New York Times illustrates this position. When quizzed about her acclaimed book about the black underclass in the United States, On the Run. Fugitive Life in an American City, she stated: ‘I don’t think that [we] have real preferences, just desires that emerge in social interactions’ (Lewis-Kraus, 2016, n.p.). Consequently, because the social environment is often a stable frame of reference, associated motives appear so as well.[6] This insight touches upon an essential divide with BD: as ethnography views motives to be an expression of social interaction, insight into social behaviour is an essential precondition for interpretation. Without it, one risks imputing motives where there may be none. To be sure, social behaviour on the internet may not be quite the same as its offline versions (not to mention the issue of resolving the thorny question of how they are intertwined), and modifications of the research design as well as improvements in researcher skills seem inevitable, yet this changes little about the validity of the underlying social principle.

There is also the direction of the relation between motives and social behaviour to consider. The BD consensus suggests that motives precede behaviour: first we think, and consequently we act. It may also be argued, however, that the reverse relation applies – we act first and then find motivations, meaning that motives are really ex post facto justifications.[7] This seems to be argued by Chicago School sociologist Becker (1963, p. 42) in his acclaimed discussion about deviant behaviour, when he states: ‘Instead of […] motives leading to […] behaviour, it is the other way around; […] behaviour in time produces […] motivations.’ A similar critical-epistemological message can be gleaned from the works of French anthropologist Pierre Bourdieu, when he warns against self-proclaimed motives as ‘normative, value oriented statements about what it is believed ought to happen, rather than a valid description of “what goes on”’ (Jenkins, 2002, p. 49).

These are serious reservations that cast doubt on BD’s optimism in favour of direct interpretation; that is, without direct engagement with those under study, or our informants. BD research tends to jump analytically from a description of behaviour, as manifested in the digital traces that those surfing the internet leave behind, straight to explanation, bypassing the intermediate step of Verstehende interpretation (boyd & Crawford, 2012). For BD analysts, digital traces are directly explainable as social facts, thus nuancing the necessity to Verstehen. That seems a far cry from what ethnographers typically have in mind when conducting their craft: to make central how people themselves think about their behaviour, including how they consider those sharing their lives with them. What meanings do they attribute to their digital behaviour, and in what social practices do they result? An example from Stephens-Davidowitz shows the problems involved in direct interpretation. Using Google.Trends search keys, he studied men with pregnant wives and observed that, in Mexico, the most prevalent term with regard to pregnancy was ‘words of love to my pregnant wife’, whereas in the United States top searches included ‘my wife is pregnant and now what’ (Stephens-Davidowitz, 2017, p. 20). This points at a cultural difference between Mexico and the United States: the romantically inclined, family-minded Latino who welcomes a new-born child as a bonus versus the modern American man who worries about the practical consequences of another mouth to feed. Stephens-Davidowitz posits this difference without ever having exchanged views with Mexican or American men (at least not in a version the reader can verify). Looked at ethnographically, he thus demonstrates a clear case of Hineininterpretieren: the uncritical projection of personal presupposition on empirical data.

5. Discussion: BD’s rival claims and alleged successes

Now that major epistemological misconceptions in the ethnographically tinted knowledge claims of BD have been addressed through a detailed discussion of Stephens-Davidowitz’ best-selling book Everybody Lies, one other serious point remains to be discussed: how to deal with the alleged successes of socio-scientific BD studies. Such studies may have serious flaws when looked at from an ethnographic viewpoint, but is that a real problem when there are serious findings to report? When BD yields insights into society that rival those reported in ethnographic work, does reflecting on their epistemological differences (the ambit of this essay) not become a semantic discussion, devoid of empirical content, and with limited consequences for society?

BD successes are the linchpin in these thought-provoking questions, and here the work of Stephens-Davidowitz again proves to be important. His book opens with Obama’s second electoral success in 2012, which was far less convincing than the sweeping 2008 win. He posits that flaring racism in the pre-Obama decade was to blame, which political watchers tended to underestimate: their rosy look at the world was moulded in the liberal political climate of Washington, which turned a myopic eye to racism among ordinary Americans, or so he claims. Stephens-Davidowitz furthermore argues that this was manifest in his analysis of search terms used on Google, which he scrutinized with the BD tool Google.Trends. He observed a rebound of search keys with strongly racist overtones (such as ‘nigger’), in particular in the months preceding the 2012 re-election. In the journal article that reports these findings (also constituting the substantive core of the book), Stephens-Davidowitz (2014, p. 27) reports how these search keys offer a rare peek into the soul of ordinary Americans: ‘Racially charged search rate is a significant, negative predictor of Obama’s […] vote shares.’[8]

By using that term ‘predictor’, Stephens-Davidowitz makes a major claim because it suggests causality: an important trump card (no pun intended) in favour of direct interpretation, thus potentially nuancing (or even limiting) the importance of Verstehen in social research. If BD facilitates dependable prediction based on a replicable cause-effect mechanism in the sense of experimental research, what then is the added value of the emic perspective, the Verstehende interpretation? The article indeed suggests causality but in a subdued fashion, suggesting that Stephens-Davidowitz (2014) may perhaps be in doubt of his own resolve: ‘this paper’s methodology […] allows for a more robust test of a causal effect of racial animus, relative to other papers in the literature’ (p. 27); and ‘there is strong evidence for a causal interpretation’ (p. 32); and ‘our findings further support a causal interpretation’ (p. 33). That is clever language indeed, but it does not yet prove beyond reasonable doubt a cause–effect relation. I am not well enough versed in the technical details of statistical modelling to assess the causal claim, so I put it to three colleagues, all seasoned researchers with a serious statistical track record: two were unconvinced by Stephens-Davidowitz’ claim of causality; the third had serious doubts.

Perhaps that is not a surprising outcome. As others have already observed, proving causality in the real, social world remains a serious challenge for research. The socio-critical statistician Nassim Taleb, for instance, in his most recent book Antifragile vehemently protests against the unrelenting attempts by social scientific researchers to pass for causality what in reality are clear cases of correlation – a far lighter measure of association commonly accepted in statistics (Taleb, 2012). Something every graduate student learns in Statistics-101 is that, during years in which storks prevail, more babies are born; the opposite effect applies too. Obviously, however, this does not mean that storks actually deliver babies; there may be a correlation but not causality. Taleb further warns against yet another danger prevalent in BD research: the larger the studied population size, and the more variables that researchers identify in them, the bigger the chance that some correlation will be found – even though there is not a real, causal relation (also known as spurious correlation, see Taleb, 2013). In an off-guard moment during an interview, he summed up the problem thus: ‘I am not saying here that there is no information in big data. There is plenty of information. The problem – the central issue – is that the needle comes in an increasingly larger haystack’ (Taleb, 2013).
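The ‘more variables’ half of this warning can be demonstrated with a short simulation. What follows is a minimal sketch under stated assumptions, not Taleb’s own analysis: it generates variables that are pure, mutually independent random noise and counts how many pairs nonetheless show a seemingly strong correlation. The number of observations (20), the numbers of variables (5 and 40), the random seed, and the cutoff of 0.444 (roughly the conventional 5% two-sided significance threshold for 20 observations) are all illustrative choices, and the function names are invented here.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def count_strong_pairs(variables, threshold):
    """Count variable pairs whose absolute correlation exceeds threshold."""
    return sum(
        1
        for x, y in combinations(variables, 2)
        if abs(pearson(x, y)) > threshold
    )

# Illustrative assumptions: every variable is independent Gaussian noise,
# so any 'strong' correlation found below is spurious by construction.
N_OBSERVATIONS = 20
THRESHOLD = 0.444
rng = random.Random(0)
noise = [[rng.gauss(0, 1) for _ in range(N_OBSERVATIONS)] for _ in range(40)]

few = count_strong_pairs(noise[:5], THRESHOLD)   # 10 possible pairs
many = count_strong_pairs(noise, THRESHOLD)      # 780 possible pairs

print(f"5 noise variables:  {few} of 10 pairs look 'significant'")
print(f"40 noise variables: {many} of 780 pairs look 'significant'")
```

Because the number of pairs grows roughly with the square of the number of variables, the count of chance ‘findings’ grows with it, even though no variable has any relation to any other. The sketch says nothing about whether a particular reported correlation is spurious; it only shows why a correlation found in a large haystack needs independent support.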

These comments water down the relation that Stephens-Davidowitz claims between racially tinted search keys on the internet and the political wind prevailing around Obama’s re-election in 2012. The correlations on which he reports are certainly interesting, but in themselves they do not mean much: there may be various explanations beyond a rise in racist behaviour. It suggests, more than anything else, that Americans do not easily allow their souls to be searched via their online search behaviour. BD-based election research in the United States shows that Americans have all sorts of feelings, often contradictory and sultry – including racist ones – but it certainly does not prove that those sentiments that find expression on the internet are truer than others. They co-exist with others, also true and real, and how both realities interact and together constitute our life worlds through which we perceive and understand the world around us is not an open-and-shut case. What is required is a careful understanding from within, and this favours Verstehende interpretation – ethnography’s hallmark.

6. Conclusion: towards a social symphony?

With his pioneering empirical work, Stephens-Davidowitz contributes to a nascent epistemological debate in which fundamental knowledge claims about society are at stake. He distinctly positions himself as a social scientist, and his work presents an essential case to sharpen the position of ethnography (and possibly also other qualitative social sciences that are rooted in naturalism) with regard to BD. My reading suggests that Stephens-Davidowitz adopts a radical position: he claims that the BD approach results in a form of social knowledge that compares to that of ethnography: it makes statements about the motives for behaviour, can zoom in on smaller subpopulations that are overlooked in conventional (read: survey) research, and is unobtrusive as it does not create a research situation. Reasoning along these lines, one may be tempted to argue that BD, once its algorithms are perfected, perhaps in combination with artificial intelligence, can replace ethnography in due course, with an added benefit: reducing operational costs and achieving greater efficiencies in data collection. The only serious impediment to realizing this positivist dream seems to be the global underclass, Collier’s (2007) Bottom Billion, who do not (yet) have a high-speed connection to the internet. It is a matter of time, of course, before they too are connected to global digital networks, and BD analysis can be expanded into the nether parts of the globe’s political economy too.

As this essay has sought to demonstrate, however, this radical viewpoint suffers from major epistemological flaws. BD may be very capable of mapping digital patterns – the ‘data’ in BD – but the claims that it makes with regard to Verstehen are problematic from the reflexive viewpoint that characterizes ethnography. It may be read as a highly ‘totalitarian’ approach to emic meaning or, as German critical social theorist Jürgen Habermas would have it, an unwelcome colonization of ethnographic research practice by ontological positivism (1984). In addition, the quick leap that BD tends to make from observing digital patterns to building theoretical edifices often obscures important questions about signal and noise, especially where the one degenerates into the other. By ignoring this point, one risks producing models that are grounded in ‘broken’ (Pink, Ruckenstein, Willim, & Duque, 2018) or ‘rotten’ data (Eriksson, 2016); in other words, models that lack a meaningful fit with emic viewpoints. Such models, revealed in interpretations that lack careful engagement with society’s members (‘member checking’, see Erlandson, Harris, Skipper, & Allen, 1993), are then likely to result in flawed stories about society; stories that resonate poorly with everyday experience.

To overcome this predicament, a less radical epistemological viewpoint voiced by British sociologists Halford and Savage (2017) looks more promising. They coin the metaphor ‘symphonic social science’, in which ethnography and BD can co-exist – albeit with a clearly specified division of labour: ‘[T]he symphonic approach calls on us to combine the empirical power of multiple and diverse data sets […] calling for “wide data” or “thick data” as much as “big data”’ (Halford & Savage, 2017, p. 1140). In their work, Halford and Savage explore the boundaries of this division by examining closely the received trichotomy data–method–theory – I shall draw from that to conclude this essay.

First, they consider BD’s capacity to shed new, innovative light on data as a major boon. This essay has already mentioned personal profiles on the internet and register data (tax), but other examples include mobile telephone traffic to make sense of commuting patterns (Kung, Greco, Sobolevsky, Ratti, & Ramasco, 2014) and pictures of cars harvested with Google Street View that are broken down into demographic information (Gebru et al., 2017). However, rather than copying the naive realism towards data characterizing the BD discourse, Halford and Savage subscribe to a critical-pragmatic stance towards data. Their central tenet is that information sources are biased and that therefore careful unpacking is required. More highly educated, politically vocal groups are overrepresented on Twitter, for instance, and analysis of Tweets cannot be assumed to represent the entire population, including the ‘silent majority’ that it often contains. A similar reservation may be expressed about exploring medical registers: welfare recipients are known to avoid professional medical care and thus remain underrepresented in the records of hospitals and general practitioners.

With regard to methods, Halford and Savage’s symphonic approach propagates pluralism: mixing various qualitative and quantitative, nomothetic and idiographic, obtrusive and unobtrusive strategies for the collection and interpretation of information. That is not the same as advocating methodical eclecticism: effective pluralism is problem-driven. In the practice of doing social research, this boils down to fostering a continuous iteration between social structure and symbolic meaning: BD excels in the statistical description of social behaviour (Durkheimian ‘social facts’), which in a next cycle of research presents the input for a Verstehende interpretation carried out with ethnographic fieldwork (regarding the internet, this is sometimes termed ‘netnography’, see Kozinets, 2010; or ‘digital ethnography’, see Pink et al., 2016); reversing the process is of course conceivable too. As we argue in Beuving and de Vries (2015, pp. 31–33): social structure cannot be understood sufficiently well without grasping the viewpoint of those encapsulated in it; conversely, an interpretation of meanings is pointless without making a sustained reference to the structure of the (im)possibilities in which they are formed. The Chetty et al. (2016) example described above presents a case-in-point with reference to the analysis of tax register data in connection to questions about poverty in the United States.

The symphonic approach has a message for theory development also. Inspired by what the American sociologist Wright-Mills (1959) aptly described as ‘abstracted empiricism’ – the gathering of facts or factoids with little reference to their wider meaning or application – Halford and Savage warn against confusing the collection of data (at which BD excels) with the building of abstract ideas about society. In other words, not all that can be observed is in itself data, nor do data speak for themselves. Theory ought to be selective as to what counts as data, and data become relevant only when they somehow contribute to laying bare the big story of society. Observations are therefore brought to life in relation to unfolding, abstract ideas about the human community in which we are interested (Swedberg, 2016). From this essay, it follows that this processual view of social theory must include and acknowledge the viewpoints of its members. To distinguish sense from nonsense in the interpretation of digital patterns, the viewpoints of those behind the search keys, the electronic profiles, Tweets, and so on are a necessary point of departure.

In sum, this essay generally subscribes to the ideals of symphonic social science, but some caution is in order: the metaphor may also trade one misconception for another and thus confuse the very thing I seek to clarify. ‘Symphony’ is a catchy term, but it conjures up an impression that ethnography and BD occupy similar or equal positions in the social science pantheon, in the sense that various instrument groups belong equally to the symphony orchestra. That may be true in a broad sense but, as the triangle player knows all too well, his/her unassuming instrument features minimally in most classical music scores, yet it must be played with great precision or else it will ruin everything. Likewise, listening to an orchestra consisting entirely of triangle players is probably something to which few of us aspire. Delicate instrumental arrangements underpin symphonies and, for them to work well (as in being pleasant or inspiring to listen to), musical groups must both be skilled and fulfil their ascribed role in the orchestra. Only then can there be a question of a truly harmonious social symphony.

Furthermore, symphony orchestras do not operate in an organizational void; they perform best when guided, and guidance is represented by the conductor. Looked at sociologically, the conductor stands out as an essential bridge between the musical score/tradition and the performance of it: rehearsing with the orchestra, making artistic decisions about delivery, recruiting the right set of musicians, and so on. In short, the orchestra is a centred social arrangement, with the conductor representing the apex of the musical organization. The social sciences, however, lack a centre from which collaborations are forged. Instead, they resemble more what organization theorists see as shifting organizational coalitions of competing individual or collective interests for status, position, or funding (Cyert & March, 1992). Lacking the central guidance associated with a symphony orchestra, ethnographic researchers and BD analysts must strategize, i.e., get the best out of each other by temporarily uniting over shared interests, but they must at the same time be prepared to go their own way once organizational configurations shift again.

These points should not distract from the important epistemological message that symphonic social science has to offer, though. My essay merely seeks to add to this fruitful metaphor, hoping also that it migrates beyond the world of pure academic reflection and extends to our students. The coming generation must be prepared for its role in an academic world in which BD is claiming its place. This requires a new role as public intellectuals: ethnographically trained, yet at the same time well versed in BD principles and procedures – in sum, as truly symphonic social scientists. Likewise, BD analysts ought to strengthen their public role: they also face the emancipatory task to tell the big story of society. A truly harmonious symphony thus requires in the first place cultivating a sharing atmosphere, wherein likeminded experts in their field depart from similarities, rather than accentuate their differences. This should also become part of their training. Once again returning to the symphony metaphor: musicians receive specialist training – few triangle players are also accomplished clarinetists – but they must also learn to listen to other musicians and musical groups in the orchestra (see note 9). With current financial and public pressures esteeming scientific hyperspecialization, achieving this ambition remains possibly the largest challenge we currently face in building a meaningful future social science.

Acknowledgments

This essay matured over a long and fruitful professional collaboration with two academic friends: Prof. Dr. Jan Kees van Donge and Dr. Geert de Vries. I thank both for their enduring support and suggestions. A shorter, Dutch version of this essay is part of a debate in the Dutch-Flemish journal Kwalon, with Dr. Roy Gigengack and Dr. Reinoud Bosch acting as academic referents adding further depth to the essay, and I thank them for that. Thanks go also to my students in Nijmegen for their comments on parts of this essay, usually discussed in the backstage of academic teaching – the academic sanctuary where often the most interesting ideas flourish. Finally, Catherine O’Dea is thanked for language editing.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes on contributor

Joost Beuving is a lecturer and senior researcher at the Nijmegen Institute for Social and Cultural Science, Radboud University Nijmegen. His research interests are in the sociology of everyday economic life and in qualitative research methodology. With Geert de Vries, he published the well-received textbook Doing Qualitative Research. The Craft of Naturalistic Inquiry (Amsterdam: Amsterdam University Press, 2015). See also: www.ru.nl/caos/vm/beuving/

Notes

1 The work of Dutch economist Prof. Arie Kapteyn, University of Southern California, presents a major exception: using an anonymous, longitudinal internet panel, he predicted Trump’s election victory, see https://cesr.usc.edu/election.

2 Others remain more sceptical about this claim, however. A team of German computational social scientists refuted a widely circulated claim from their compatriot Andranik Tumasjan that analysis of Twitter predicted the 2009 general election in Germany, showing that it lacked ‘well-grounded rules for data collection in general and the choice of parties and the correct period in particular’ (Jungherr, Jurgens, & Schoen, 2012, p. 233).

3 ‘Positivism’ is used here as a shorthand for ontological positivism: the assumption that the natural world and the social world are ordered by similar principles, often in the form of law-like regularities. It is opposed to logical positivism, which does not necessarily attribute an objective status to reality but merely stipulates that statements about reality should be capable of empirical verification (Kaplan, 1964, p. 36).

4 MIT researcher Lawrence Bush remains critical of this position because ‘[big] data are neither populations nor samples in the classic sense […]. They often involve convenience samples’ (2014, p. 1728). Whereas I subscribe to his critique, this essay focuses on implications for the sampling of small populations.

5 Although the internet plays a limited role in register data, BD thinkers usually include it in their definitions of big data. Of course, we could have selected other examples, but the Chetty case aptly illustrates the principle discussed here.

6 Other social thinkers have made similar observations. Malcolm Gladwell, for instance, in his analysis of how behaviours spread in a community, argues it thus: ‘Character is more like a bundle of habits and tendencies and interests, loosely bound together and dependent, at certain times, on circumstances and context’ (Gladwell, 2000, p. 163).

7 A similar reservation may be made regarding a new wave in behavioural BD research, revolving around MRI brain scans. Informants are confronted with external stimuli such as political statements, and the corresponding brain activity is recorded (Kanai, Feilden, Firth, & Rees, 2011). By thus locating opinions in particular parts of the brain, meaning is also explained in linguistic terms, rather than in terms of observable behaviour.

8 To be sure, ‘nigger’ may also acquire non-racist meanings; take, for instance, how it can foster group solidarity that converges around Gangsta rap identities (Kennedy, 2004, p. 71). This points to the problem of indexicality, a term from linguistic anthropology suggesting that words can have different meanings, depending on the social situation they figure in; Stephens-Davidowitz pays regrettably little attention to this (save an oblique reference to Kennedy’s work).

9 For the social sciences, the following insightful remark of communication sociologist Christian Fuchs rightly warns against this pitfall: ‘Turning social scientists into programmers as part of social science methods education would almost inevitably leave no time for engagement with theory and social philosophy […]’ (2017, p. 47).

References

  • Adar, E. (2015). The two cultures and big data research. I/S: A Journal of Law and Policy for the Information Society, 10(3), 765–781.
  • Becker, H. (1963). Outsiders. Studies in the sociology of deviance. New York: The Free Press.
  • Becker, H. (2013). What about Mozart? What about murder? Chicago: University of Chicago Press.
  • Beuving, J. (2017). The anthropologist as jester; anthropology as jest? Social Anthropology, 25(3), 353–363.
  • Beuving, J., & de Vries, G. (2015). Doing qualitative research. The craft of naturalistic inquiry. Amsterdam: Amsterdam University Press.
  • Borgman, C. (2015). Big data, little data, no data. Scholarship in the networked world. Cambridge, MA: The MIT Press.
  • boyd, d., & Crawford, K. (2012). Critical questions for big data. Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679.
  • Burrows, R., & Savage, M. (2014). After the crisis? Big data and the methodological challenges of empirical sociology. Big Data & Society, 1(1), 1–6. https://doi.org/10.1177/2053951714540280
  • Bush, L. (2014). A dozen ways to get lost in translation: Inherent challenges in large-scale data sets. International Journal of Communication, 8, 1727–1744.
  • Chetty, R., Stepner, M., Abrahams, S., Lin, S., Scuderi, B., Turner, N., … Cutler, D. (2016). The association between income and life expectancy in the United States, 2001–2014. Journal of the American Medical Association, 315(16), 1750–66.
  • Collier, P. (2007). The bottom billion. Why poor countries are failing and what can be done about it. Oxford: Oxford University Press.
  • Crawford, K., Miltner, K., & Gray, M. (2014). Critiquing big data: Politics, ethics, epistemology. International Journal of Communication, 8, 1663–1672.
  • Curran, J. (2016). Big data or ‘big ethnographic data’? Positioning big data within the ethnographic space. Ethnographic Praxis in Industry Conference, unnumbered (pp. 62–73).
  • Cyert, R., & March, J. (1992). [1963] A behavioural theory of the firm. New York: Wiley-Blackwell.
  • Eriksson, M. (2016). Close reading big data. The echo nest and the production of (rotten) music metadata. First Monday, 21(7).
  • Erlandson, D., Harris, E., Skipper, B., & Allen, S. (1993). Doing naturalistic inquiry. A guide to methods. London & New York: Sage Publishers.
  • Ford, H. (2014). Big data and small. Collaborations between ethnographers and data scientists. Big Data & Society, 1(2), 1–3.
  • Fuchs, C. (2017). From digital positivism and administrative big data analytics towards critical digital and social media research. European Journal of Communication, 32(1), 37–49.
  • Gebru, T., Krause, J., Wang, Y., Chen, D., Deng, J., Lieberman Aiden, E., & Fei-Fei, L. (2017). Using deep learning and Google Street View to estimate the demographic makeup of neighbourhoods across the United States. Proceedings of the National Academy of Sciences, 114(50), 13108–13113.
  • Geertz, C. (2017). [1973] The interpretation of cultures. New York: Basic Books.
  • Gladwell, M. (2000). The tipping point. How little things can make a big difference. New York/Boston: Back Bay Books.
  • Glaser, B., & Strauss, A. (2012). [1967] The discovery of grounded theory. Strategies for qualitative research. London: Aldine Publishers.
  • Gobo, G. (2014). Sampling, representativeness and generalizability. In C. Seale, G. Gobo, J. Gubrium, & D. Silverman (Eds.), Qualitative research practice (pp. 405–425). London & New York: Sage Publishers.
  • Graslegger, H., & Krogerus, M. (2017). The data that turned the world upside down. Retrieved from https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win
  • Habermas, J. (1984). The theory of communicative action, vol. 1: Reason and rationalisation of society. Boston, MA: Beacon Press.
  • Halford, S., & Savage, M. (2017). Speaking sociologically with big data: Symphonic social science and the future for big data research. Sociology, 51(6), 1132–48.
  • Jenkins, R. (2002). [1992] Pierre Bourdieu. London: Routledge.
  • Jungherr, A., Jurgens, P., & Schoen, H. (2012). Why the pirate party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T. O., Sander, P. G., & Welpe, I. M. Predicting elections with Twitter: What 140 characters reveal about political sentiment. Social Science Computer Review, 30(2), 229–234.
  • Kanai, R., Feilden, T., Firth, C., & Rees, G. (2011). Political orientations are correlated with brain structure in young adults. Current Biology, 21(8), 677–680.
  • Kaplan, A. (1964). The conduct of inquiry. Methodology for behavioural science. San Francisco: Chandler.
  • Kennedy, R. (2004). Nigger. The strange career of a troublesome word. New York: Vintage.
  • Kitchin, R. (2014). Big data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12.
  • Köbben, A. (1952). New ways of presenting an old idea: The statistical method in social anthropology. Man/JRAI, 82(2), 129–146.
  • Korngiebel, D., Taualii, M., Forquera, R., Harris, R., & Buchwald, D. (2015). Addressing the challenges of research with small populations. American Journal of Public Health, 105(9), 1744–1747.
  • Kosinski, M., Matz, S., Gosling, S., Popov, V., & Stillwell, D. (2015). Facebook as a research tool for the social sciences. Opportunities, challenges, ethical considerations, and practical guidelines. American Psychologist, 70(6), 543–556.
  • Kozinets, R. (2010). Netnography. Doing ethnographic research online. New York: Sage Publishers.
  • Kung, K., Greco, K., Sobolevsky, S., Ratti, C., & Ramasco, J. J. (2014). Exploring universal patterns in human home-work commuting from mobile phone data. PLOS ONE, 9(6), e96180.
  • Lewis-Kraus, G. (2016, January 12). The trials of Alice Goffman. New York Times Magazine.
  • Lupton, D. (2015). Digital sociology. New York: Routledge.
  • Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. London: John Murray.
  • O’Neil, C. (2016). Weapons of math destruction. How big data increases inequality and threatens democracy. London: Penguin.
  • Pepi, M. (2013, December 30). The postmodernity of big data, The New Inquiry. Retrieved from https://thenewinquiry.com/the-postmodernity-of-big-data/
  • Pink, S., Horst, H., Postill, J., Hjorth, L., Lewis, T., & Tacchi, J. (2016). Digital ethnography. Principles and practice. New York: Sage.
  • Pink, S., Ruckenstein, M., Willim, R., & Duque, M. (2018). Broken data: Conceptualising data in an emerging world. Big Data & Society, 5(1), 1–13.
  • Schwartz-Shea, P., & Yanow, D. (2012). Interpretive research design. Concepts and processes. New York & London: Routledge.
  • Stephens-Davidowitz, S. (2014). The cost of racial animus on a black candidate: Evidence using Google search data. Journal of Public Economics, 118, 26–40.
  • Stephens-Davidowitz, S. (2017). Everybody lies. Big data, new data, and what the internet can tell us about who we really are. New York: HarperCollins.
  • Swedberg, R. (2016). Before theory comes theorizing or how to make social science more interesting. The British Journal of Sociology, 67(1), 5–22.
  • Taleb, N. (2012). Anti-fragile. Things that gain from disorder. London: Penguin.
  • Taleb, N. (2013). Beware the big errors of ‘big data’. Retrieved from https://www.wired.com/2013/02/big-data-means-big-errors-people
  • Turkle, S. (2011). Alone, together. Why we expect more from technology and less from ourselves. New York: Basic Books.
  • Wang, T. (2013, May 13). Big data needs thick data. Ethnography Matters. Retrieved from http://ethnographymatters.net/2013/05/13/big-data-needs-thick-data/
  • Wilkinson, R., & Pickett, K. (2009). The spirit level: Why more equal societies almost always do better. New York: Allen Publishers.
  • Willis, P. (2000). The ethnographic imagination. London: Polity Press.
  • Winkler, J. (1990). The fly on the wall of the inner sanctum: Observing company directors at work. In G. Moyser & M. Wagstaffe (Eds.), Research methods for elite studies (pp. 129–146). London: Allen & Unwin.
  • Wright-Mills, C. (1959). The sociological imagination. Oxford: Oxford University Press.