ABSTRACT
This article analyses the socio-technical epistemic processes behind the construction of historical facts by the Internet Archive Wayback Machine (IAWM). Grounded in theoretical debates in Science and Technology Studies about digital and algorithmic platforms as “black boxes”, it uses provenance information and other data traces provided by the IAWM to uncover specific epistemic processes embedded in its back-end, through a case study on the archiving of the North Korean web. In 2016, an error in the configuration of one of North Korea's name servers revealed that the .kp domain contains 28 websites. However, the IAWM has snapshots of the majority of the .kp websites, which have been archived from as early as 2010. How did the IAWM accumulate knowledge about .kp websites that are generally hidden from the world? Based on our findings, we argue that historical knowledge on the IAWM is generated by an entangled and iterative system comprising proactive human contributions, routinely operated crawls and a reification of external, crowd-sourced knowledge devices. These turn the IAWM into a repository whose knowing of the past is potentially surplus: it harbours information that was unknown to each of the contributing actors at the time and place of archiving.
Disclosure statement
No potential conflict of interest was reported by the authors.
Additional information
Notes on contributors
Anat Ben-David
Anat Ben-David is a senior lecturer in the Department of Sociology, Political Science and Communication, and head of the Open Media and Information Lab at the Open University of Israel. Her research focuses on national web studies and digital sovereignty, web history and web archive research, and the politics of online platforms. Methodologically, her work specialises in developing and applying digital and computational methods for web research.
Adam Amram
Adam Amram is a scientific programmer. He holds an MSc from the Department of Information and Knowledge Management at Haifa University. His research focuses on developing computational tools for web research.