Introduction

Journalism History and Digital Archives

Pages 1113-1120 | Received 21 Aug 2018, Accepted 21 Aug 2018, Published online: 05 Dec 2018

Journalism History through Digital Archives and across Fields and Institutions

When putting together this special issue, Journalism History and Digital Archives, I had three related objectives and/or arguments: (a) that the selected articles should work with specific concerns within journalism history through digital archives; (b) that the discussion of digital methodologies, as well as specific applications, should be accessible for journalism scholars with no prior experience of such approaches; and (c) that journalism history and digital archives are connected in other ways than through specific methods, i.e. that the connection raises larger questions of historiography and power. In the following I will briefly discuss these objectives as a way both to raise some general issues and to frame the selected articles; as a starting point for this discussion I will present a few reflections on what digital archives imply in this connection.

Traditionally, the term archive referred to collections of unpublished and unique documents or records (or artefacts) and not to published material, which was stored in libraries. Yet, journalism and other text-focused humanities scholars increasingly talk about archives rather than libraries, a widening that partly is related to the advent of the digital, but also to a “research tradition … inspired by Derrida and Foucault” (Strandgaard Jensen Citation2017a, 70; my translation), which stressed the disciplining aspect of certain categories being deemed worthy as heritage. A more technology-related reason is the rise, since the mid-1990s, of web archiving, i.e. “the act of collecting and preserving the online web and making it available” (Brügger Citation2018, 77). Yet, as most of the stored material is public, the web archive ought to be, Brügger jokingly asserts, called a “webrary” (78; emphasis in the original). A final reason for talking about “digital newspaper archives” (1) rather than libraries, as Steel (Citation2014) does in the introduction to a special issue of Media History on “Digital Newspaper Archival Research,” is that approaching such repositories almost automatically concerns “developments and opportunities in the production, use and development of digital archives themselves” (Steel Citation2014, 1) as much as it concerns the stored content itself.

Given an interest in the broader institutional and political aspects of archives, as well as an interest in journalism that is wider than newspapers, the notion of a digital archive of journalism underlying this special issue can thus be described rather simply as

digital archives that contain journalistic publications, productions or related content (in writing, still images, moving images or sound, or combinations thereof) stored and made available in digital form regardless of whether this is the result of digitization or whether the content was born as digital content.

I have here, as will be more obvious when discussing the different articles and the archives they focus on, notably not listed a feature of “large-scale digitized collections” that Gooding focuses on in his recent Historic Newspapers in the Digital Age (Citation2017, 3), namely that the content should be derived from “national library collections.” Such collections are of course very important, but digital technologies have also made possible archives unrelated to more established institutions, as will be discussed below under the heading “Reading Archives” (the third objective listed above).

The first objective listed above, i.e. the aim of collecting articles on journalism history and digital archives that arose from concerns related to journalism history and digital archives, is linked to the fact that there already is a fair amount of theoretical work dealing with the more generic issues of digital methodologies and archives. The articles here are thus meant to highlight possibilities and pitfalls from the vantage point of journalism history, rather than focusing acutely on developing methods for monitoring and studying the contemporary and fluid landscape of digital journalism, a topic that was covered very well in the special issue “Rethinking Research Methods in an Age of Digital Journalism” of Digital Journalism (2016, 4(1), edited by Michael Karlsson and Helle Sjøvaag). Obviously, there are shared concerns of analytical methods and scale between contemporary studies of digitally “live” journalism and those of digitised historical material; another issue that somehow bridges archival research and contemporary studies is that contemporary journalism has to be captured and stored in order to be studied and this may extend to the making of actual archives, as exemplified by Weber’s article in this issue. Yet, as Nicholson and Van Galen point out in this issue, there are indeed also important differences in terms of the structure and quality of data between digitised and digitally born material.

The second objective is aimed at distancing the work here from more detailed methodological discussions within the digital humanities and thus to highlight both challenges and opportunities in an accessible way for scholars who have not taken on computational approaches and who may otherwise have been put off by a steep learning curve standing between them and the potential outcomes of digital methods. In fact, and this is important, a number of the articles here precisely demonstrate the value of digital archives in ways that do not necessitate any prior engagement with more complex digital methods. But, utilising some of the possibilities opened up by digital archives does necessitate a reliance on computational processes that calls for new types of knowledge that scholars of (journalism) history traditionally have not had. This has—as in other fields—collectively contributed to the rise of digital humanities as scholars with various disciplinary interests collaborate to better understand methodologies at the intersections of “the digital” and “the humanities.” It is, however, arguably important that digital methodologies also are continuously developed and applied within specific fields, e.g. journalism studies and history, in addition to (or instead of) migrating to a separate and almost wholly methodology-focused field. This special issue highlights this need, as it reflects where specific legacies and considerations unique to journalism’s history texture the approaches which are useful, or even possible, when considering digital news archives.

The aim of having a special issue composed of articles situated within journalism studies and history raised a number of issues with regard to how new methodological possibilities can be written about—and for whom—and this draws out a further discussion among those working on digital archives which spans a number of fields: journalism studies, history, digital humanities, library and information studies and—to some extent—computer science. Specifically, this is an ongoing discussion among scholars working with digital archives whose scholarship and methods now cross from discipline to discipline. Given this, the peer reviews regularly split between one reviewer with a specific disciplinary background recommending “publish as it stands,” while another from a different academic field raised serious concerns as the articles were being read from rather different viewpoints. While a journalism studies scholar not particularly well grounded in computational methods would find an article a very informative illustration of how digital methods could be utilised within journalism history, another reviewer would find the application of computational tools wanting in terms of sophistication and nuance. The articles have thus been pushed to aim for a balance between introducing and applying digital methods in ways that are understandable to more conventional journalism scholars, while acknowledging the state of the art within the broader field of digital humanities—not always an easy balance to strike.

A related issue that emerged in this project is the cross-institutional interest in issues of archiving. While people working within archives focus on various (somewhat technical) issues of storing and making available different forms of content, journalism scholars may instead argue that such issues are better discussed in a journal of library studies and not within the field of journalism studies. Yet, knowledge of such processes is arguably of increasing importance for journalism studies: in order for journalism scholars to utilise digital archives for collecting material and, not least, for making appropriate interpretations of this material, understanding the structures and accessibility of material is crucial. As many (most) digital archives are the products of complex agreements between public, individual, research and commercial interests, these necessitate that scholars develop what Strandgaard Jensen (Citation2017b) calls a “digital archival literacy,” i.e. an understanding of the processes underlying the digital material upon which you wish to base your research (a somewhat similar call is made by Birkner et al. in this issue). Linked to this is a need for journalism scholars to work with people involved in the storing, maintenance and dissemination of digitally stored journalistic products, as the public-facing access, search functions, and data formats often prohibit more detailed analyses. Researchers are thus often in need of pulling out data in different formats and larger quantities and this naturally points towards collaboration with those working with libraries and repositories.

Reading Archival Content

The relatively recent move of troves of archived documents as well as stores of published material into digital forms, alongside the increasing amount of digitally-born material, has rendered certain processes easier, yet these have also opened up opportunities that severely complicate what used to be a relatively individual and manual process of “approaching an archive.” With regard to the products of journalism, the move towards the digital and digitisation has generally made both accessing and searching archives much easier but has also produced degrees of decontextualization and issues of scale as it becomes relatively easy to access vast amounts of material that simply cannot be processed manually. Obviously, even when this material was there in physical form the mechanics of access and analysis were of a nature that prohibited approaching a corpus as such.

Notions of scale and possible modes of analysis have caused an ongoing discussion of a shift from close reading of a discrete portion of an archive to distant reading of large volumes of material, and a possible subsequent shift from sampling to analysing a complete corpus. Such completeness, however, is often deceptive: firstly, in the sense that not everything may have been stored and/or digitised and, secondly, since the quality of the OCR (optical character recognition) scanning may differ widely depending, on the one hand, on digitization at different periods of technological development and, on the other, on the type and quality of the scanned original, the corpus may become somewhat uneven. Yet, the possibilities of working with large amounts of data are real and alluring insomuch as they may reveal patterns not likely to emerge from analysis of smaller samples. This, however, is related to a (possible) move away from close reading and the subsequent decontextualization and lack of nuanced readings that can come with such a move.
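For readers new to such approaches, the unevenness of OCR quality mentioned above can be made concrete with a very rough sketch. The example below is only an illustration under stated assumptions: it estimates quality as the share of word tokens found in a reference word list, and the function names, the tiny lexicon, the two sample texts and the threshold of 0.8 are all hypothetical and not drawn from any of the articles in this issue.

```python
import re

def ocr_quality(text, lexicon):
    """Share of word tokens that appear in a reference lexicon (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in lexicon)
    return known / len(tokens)

def flag_uneven(corpus, lexicon, threshold=0.8):
    """Return the ids of documents whose estimated quality is below the threshold."""
    return sorted(
        doc_id for doc_id, text in corpus.items()
        if ocr_quality(text, lexicon) < threshold
    )

# Toy data (invented for illustration): the second text imitates typical OCR errors.
LEXICON = {"the", "news", "from", "america", "arrived", "today"}
CORPUS = {
    "1885-03-02": "The news from America arrived today",
    "1885-03-09": "Tlie ncws frorn Arnerica arrived today",
}

low_quality_ids = flag_uneven(CORPUS, LEXICON)
```

A measure of this kind says nothing about what is missing from an archive altogether, but it may help indicate which parts of a seemingly complete corpus should be treated with caution before any distant reading is attempted.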

Related to this is the question of how specific research questions relate to empirical inquiries now possible with digital archives. While traditional archival work through more focused sampling requires a relatively precise research question to narrow one’s focus, this is not necessarily the case when applying digital methods to larger amounts of data. As a number of the articles in this issue demonstrate, the ability to access and analyse vast amounts of content works in both directions, as this approach raises new specific questions for research as often as it answers them. Thus, digital approaches to journalism history not only bring about questions of distant reading supplanting close readings, but also suggest what distant readings can reveal about new and interesting ways to approach specific texts and time periods. In a recent study of “fictional space” through computational methods, Tenen (Citation2018) describes such an approach rather well when he writes that “the formal, computational methods … occasion opportunities for close reading, and not just reading at scale. My methods are diagnostic, in that they identify areas of interest and unusual trends that require closer critical attention” (Tenen Citation2018, 120). The value of complex distant readings is thus arguably reliant on being applied against the background of deep contextual knowledge about the specific areas of (journalism) history in focus.
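The diagnostic use of distant reading described here can be illustrated with a minimal sketch, which is not the method of Tenen or of any article in this issue. It assumes a corpus given as (year, text) pairs, counts how often a term occurs per 1,000 tokens in each year, and flags years that stand out from the mean as candidates for closer reading. The function names, the sample texts and the factor of 1.5 are hypothetical choices made for the example.

```python
import re
from collections import defaultdict

def yearly_rate(docs, term):
    """Occurrences of `term` per 1,000 tokens, grouped by year."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    return {year: 1000 * hits[year] / totals[year] for year in totals if totals[year]}

def years_for_close_reading(rates, factor=1.5):
    """Years whose rate is at least `factor` times the mean rate across all years."""
    if not rates:
        return []
    mean = sum(rates.values()) / len(rates)
    return sorted(y for y, r in rates.items() if mean > 0 and r >= factor * mean)

# Toy corpus (invented for illustration only).
DOCS = [
    (1919, "farmers practise broadcasting of seed by hand"),
    (1920, "seed drills replace hand sowing on many farms"),
    (1922, "broadcasting stations multiply as radio broadcasting reaches homes"),
]

rates = yearly_rate(DOCS, "broadcasting")
flagged = years_for_close_reading(rates)
```

The output of such a count is a pointer rather than a finding: it only suggests where in the archive a scholar with the relevant contextual knowledge might begin to read closely.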

Reading Archives

Another but somewhat related issue linked to the institutional settings of archives concerns the types and amount of material, and how this relates to interests and power. Here it is important to underline how the resources available in different settings vary both with regard to processes of the digitisation of stored material and the amount and diversity of what was stored in the first place. While at some level this may cause a “re-entrenchment of the traditional canon” and a “re-disappearance” of marginalized content (Henderson Citation2017, 2) in the sense that the most popular material is digitised and made available, it is also important to remember that the dissemination of digital technologies has made new grass-roots and experimental archives possible. Thus, while the digital allows for new and illuminating historical trajectories of journalism’s “core,” i.e. developments of conventional, mainstream, male-dominated, national (political) news journalism based on archives at established research institutions, digital technologies have also allowed for the making of a broader range of emerging archives at the periphery of research institutions. When looking at the possibilities (and pitfalls) of digital archives of journalism it is thus important to include work focusing on smaller less “passive” repositories of specific material in relation to that deemed hegemonically relevant for established (national) archives. Such work can reflect on how such smaller archives make possible the crafting of voices at the periphery of an otherwise largely male, national and political journalism. Thus, while the scale of established archives requires specific distant-reading methods—e.g. topic modelling and machine learning—emergent and smaller archives rather call for broader approaches and discussions related to the politics of archiving in relation to gender and the subaltern.

The Special Issue

Following the discussions briefly introduced above, the articles for this issue have been selected according to two intersecting dimensions: one running from archives at established institutions to new and experimental ones, and one running from relatively simple and traditional approaches to more sophisticated computational methods of journalism research. The first of these dimensions functions as the organising axis as the issue starts with work utilising established archives, going on to articles dealing with archiving and analysing specific journalistic content and then on to articles discussing alternative and more “active” archives. The issue starts with Thomas Birkner, Erik Koenen and Christian Schwarzenegger’s “A Century of Journalism History as Challenge—Digital Archives, Sources, and Methods” in which they exemplify and discuss issues related to establishing an appropriate corpus for studying the development of the inverted pyramid structure in mainstream newspapers in Germany from 1914 to 2014. While pointing towards the potential benefits of a specifically defined longitudinal study, one of the important lessons of the article is its underlining of the problems of assembling a corpus from scattered and incomplete collections, how such data are “cleaned” and normalised, the importance of detailed historical knowledge and, not least, consequences for subsequent analyses.

The next article by James F. Hamilton is entitled “Excavating Concepts of Broadcasting; Developing a Method of Cultural Research Using Digitized Historical Periodicals” and takes its cue from Raymond Williams’s book Keywords to utilise digital archives of newspapers and magazines in the US in order to trace the development of the notion of “broadcasting” from its original usage in agriculture to its updated reference to electronic dissemination of messages in the 1920s. As this study follows the term “broadcasting” across a range of media and thus collections, Hamilton’s article illustrates the value of combining a simple keyword search with a conceptually strong and historically grounded framework and as such complements more exploratory approaches employing elaborate digital methods. The two articles which follow each experiment with and discuss the potential and pitfalls of two specific computational methods related to digital archives, and specifically address the question of scale. In “Exploring the Long-term Transformation of News. Machine Learning, Newspaper Archives and Journalism History,” Marcel Broersma and Frank Harbers develop and discuss how machine learning may yield interesting results in a longitudinal study of the development of genres in Dutch journalism. While this article rightly argues for the importance of more longitudinal studies of the forms of journalism (as does the article by Birkner et al., which opens the issue), Broersma and Harbers’ article provides great insight into the challenges of developing and applying the method of machine learning. The central issue here is how you can train an algorithm to recognise and code specific genres based on various latent characteristics of journalistic texts.

The next article discusses and applies a different method, namely topic modelling. The central question posed by Bob Nicholson and Quintus Van Galen in “In Search of America: Topic Modelling Nineteenth-Century Newspaper Archives” is how, given the enormous amount of British newspaper texts containing the word “America,” we may gain an overview of how America was journalistically embedded at the time, i.e. what themes (topics) were related to the country across the Atlantic? Their article does this by conducting four specific experiments, each of which applies topic modelling. This article and the one by Broersma and Harbers are both insightful and accessible discussions of the methods of topic modelling and machine learning. Further, while both articles argue for the potential of these methods, they also stress some common problems, not least the quality of OCR and issues related to the digital segmentation of articles.

Following these, the next two articles are focused on archiving specific types of contemporary journalistic content and discuss various aspects of web archiving. The first, “Journalism History, Web Archives, and New Methods for Understanding the Evolution of Digital Journalism” by Matthew S. Weber and Philip M. Napoli, discusses a project in which they archived the websites of a range of local news outlets in the US in order to learn more about the development of digital journalism. Thus, while there indeed are interesting examples of analytical approaches, the bulk of the article is focused on the potentials and problems related to designing and archiving a corpus of websites for a specific study that addresses developments over time. As such, the article is a valuable contribution to what almost certainly will be an important approach within journalism studies. The next article, “Archiving, Data journalism, Web archiving, News applications, Born-digital news, Software preservation” by Meredith Broussard and Katherine Boss, is related in that it focuses on how to archive data journalism productions, i.e. interactive projects that allow users to explore a range of data. But, rather than doing experiments, this article focuses on understanding the digital infrastructure within which such productions are made in order to suggest possible paths for their archiving. As such, this article shines an important light on the complexity of such journalistic productions and, not least, the ways in which future researchers might, or might not, access them.

The final batch of articles shifts focus away from what most often is understood as digital archives of journalism to raise important issues related to the politics of archiving. The four articles focus on, respectively, archives specifically tailored to women journalists and their work, what may be termed subaltern or grassroots archives in India and Brazil and, lastly, a homemade archive made for illustrating and experimenting with issues of historiography. In “The Politics of Women’s Digital Archives and its Significance for the History of Journalism,” Pernilla Severson analyses two archives (one American and one Swedish) that give privileged access to women journalists. By looking at the digital affordances of these archives, Severson raises important questions about the institutional contours of female voices and power within the landscape of journalism history. By highlighting the gendered nature of digital archives this article is an important reminder about how various vectors of power produce the materials through which history is made.

This is also in focus in the next two articles, both of which look at archives deliberately made as correctives to the “history” recorded by mainstream journalism. In “Archiving as Social Protest: Dalit Camera and the Mobilization of India’s ‘Untouchables’,” Subin Paul and David Dowling very productively analyse important linkages between social movements and news archiving in what they call the “censorious media climate” of India. The next article by Stuart Davis, “Digital Archives as Subaltern Counter-Histories: Situating ‘Favela Tem Memoria’ in the Rio de Janeiro Media and Political Landscape,” very succinctly addresses similar issues, exploring linkages between a specific disadvantaged community and digital archiving in the favelas of Rio de Janeiro. Taken together, these two articles (joining Severson’s) highlight how increasingly available digital technologies allow for an accumulation of local news, as well as other material, that can act as a powerful corrective to the ways disadvantaged communities are journalistically portrayed (and subsequently archived) by the mainstream. These articles thus cast a light both back in time towards important lacunae in what has been stored and forward towards how digital technologies can allow for the recording of corrective views, which may or may not be incorporated into more established archival institutions. The final article closing this issue also focuses on a somewhat peripheral archive. In “@franklinfordbot: Remediating Franklin Ford,” Juliette De Maeyer and Dominique Trudel use a homemade collection of material by and linked to Franklin Ford (1849–1918), an American journalist, entrepreneur and thinker who conceptualized circulations of media content that remain highly relevant today. De Maeyer and Trudel approach this archive in both orthodox and novel ways, including by designing a “bot” that tweets random excerpts from the archive. They consequently use the making of and the different approaches to the archive to raise important questions related to media history, remediation and digital archives. Taken together, the last four articles forcefully remind us that the linkages between journalism history and digital archives are not only made up of methodological concerns related to the (distant) reading of journalistic content or form but also of broader political and theoretical questions about establishing and using archives.

This short introduction and the brief run-through of the 10 articles in this issue hopefully do justice to the breadth and complexity of the articles assembled here, not only in terms of how digital archives of journalism can be approached but also in relation to what constitutes a digital archive and, not least, the power relations involved in constructing and maintaining archives. Digital archives, in their various forms, will necessarily grow and become even more important objects and locations of study for understanding the history of journalism, its contemporary setting as well as its future trajectories. It is thus important that students and scholars of journalism are not intimidated by the complex relations undergirding digital archives, or the ever-evolving and malleable complexities of access to and usage of such archives. It is my sincere hope that this collection not only gives an interesting snapshot of important work being done with digital archives but also, and even more importantly, that it helps initiate more journalism scholars into the analysis of digital archives and, as a possible consequence, encourages them to introduce central aspects of digital methodologies in their teaching.

A final note should be addressed to those involved in making this special issue possible. While Bob Franklin encouraged my idea from the beginning (when he was still the editor of Digital Journalism) the reviewers of the proposal also saw its potential. The authors of the selected articles should certainly be praised for their patience with my ideas and concerns to which they reacted very productively. This could also be said for the reviewers including those who offered feedback on various versions of articles as they developed. And, finally, my thanks to Editor-in-Chief of Digital Journalism, Oscar Westlund, who engaged with the issues raised in a detailed manner, endured my more lenient approach to deadlines and, most importantly and consistently—when discussions threatened to veer off into adjacent fields—pulled us back into digital journalism studies.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author.

Additional information

Notes on contributors

Henrik Bødker

Henrik Bødker, Department of Media and Journalism Studies, School of Communication and Culture, Aarhus University, Denmark. E-mail: [email protected]. Web: http://person.au.dk/da/imvhb@hu

References

  • Brügger, Niels. 2018. The Archived Web: Doing History in the Digital Age. Cambridge, MA: The MIT Press.
  • Gooding, Paul. 2017. Historic Newspapers in the Digital Age. Milton Park: Routledge.
  • Henderson, Desirée. 2017. “Recovery and Modern Periodical Studies.” American Periodicals: A Journal of History & Criticism 27 (1): 2–5.
  • Steel, John. 2014. “Introduction.” Media History 20 (1): 1–3.
  • Strandgaard Jensen, Helle. 2017a. “Digitale Arkiver som medskabere i ny historieskrivning” [Digital Archives as co-creators in the writing of new history]. In Digitale Metoder [Digital Methods], edited by Kirsten Drotner and Sara Mosberg Iversen, 69–86. Copenhagen: Samfundslitteratur.
  • Strandgaard Jensen, Helle. 2017b. “Storing Stuff, Structuring Stories. The power of digital archives in contemporary historiography.” Keynote paper presented at the Danish DIGHUMLAB Conference, Copenhagen, November 7. A revised version of this is currently under review as an article for American Historical Review.
  • Tenen, Dennis Yi. 2018. “Toward a Computational Archaeology of Fictional Space.” New Literary History 49 (1): 119–147.
