Archives and Records
The Journal of the Archives and Records Association
Volume 42, 2021 - Issue 1: Interdisciplinarity and Archives
Research Article

Safeguarding the nation’s digital memory: towards a Bayesian model of digital preservation risk

Martine Barons, Sidhant Bhatia, Jodie Double, Thais Fonseca, Alex Green, Stephen Krol, Hannah Merwood, Alec Mulinder, Sonia Ranade, Jim Q Smith, Tamara Thornhill & David H Underdown
Pages 58-78 | Received 22 Jul 2020, Accepted 13 Nov 2020, Published online: 28 May 2021
 

ABSTRACT

Preservation of digital material is a challenge for which many archives feel underprepared and ill-equipped. The National Archives (UK) has been working in collaboration with statisticians from the University of Warwick and partners from across the UK archives sector to develop a decision-support system which quantifies the risks involved in digital preservation. Through interdisciplinary collaboration, this partnership has developed an interactive tool for managing risks to digital material, based on a Bayesian statistical network. The tool provides archivists with a different way of thinking about digital preservation, supported by an evidence base they can use to advocate for action. The project illustrates the potential benefit of a collaborative approach, combining insight from different disciplines.

Additional information

Funding

This work was supported by the National Lottery Heritage Fund under project reference number OM-19-01060; the Engineering and Physical Sciences Research Council under grant EP/R511808/1; and The National Archives (UK).

Notes on contributors

Martine Barons

Dr Martine J. Barons is the director of the Applied Statistics & Risk Unit at the University of Warwick. Martine has research interests in all aspects of decision support and was part of the team, led by Jim Q. Smith, which developed the IDSS paradigm. Martine has applied decision support paradigms, most notably for pollinator abundance and household food security. She also has extensive experience in structured expert judgment, both within the above applications and as a consultant for government applications.

Sidhant Bhatia

Sidhant Bhatia is a Software Engineer Graduate from Monash University. During the course of the project, he was undertaking a short-term research placement at the University of Warwick under the supervision of Dr Martine J. Barons. He joined the development team to build the Graphical User Interface dashboard for the statistical model and was primarily involved in the design process to ensure an intuitive user experience. Prior to the placement, he had industrial experience in developing web applications, both front-end and back-end. He also explored and developed basic statistical models as a personal interest.

Jodie Double

Jodie Double is the Digital Content and Copyright Manager at Leeds University Library. She has over two decades of experience working in digital collections and archives. She joined the University of Leeds in 2009 from the University of Minnesota, where she was Director of Digital Collections and Archives in the College of Design. Her role at Leeds for the past 11 years has focused on developing access and services around the growing corpus of digital content created from the extensive collections at Leeds, in addition to the growing body of born-digital content being deposited.

Thais Fonseca

Dr Thaís C. O. Fonseca is an Associate Professor at UFRJ, Brazil and a Research Fellow at the Applied Statistics & Risk Unit at the University of Warwick. Thaís’ main research interests include Bayesian inference for time series and stochastic processes, Bayesian network models, and Bayesian econometrics. She has experience in consultancy for industry and government with applications to insurance and economics.

Alex Green

Alex Green is the Service Owner for Digital Preservation at The National Archives and is the project lead. She is an experienced Digital Archivist, having worked on creating user-centric digital tools and services for the past twenty years.

Stephen Krol

Stephen Krol is a Computer Scientist/Data Scientist graduate from Monash University, Melbourne, Australia. He currently works as a data science consultant and is experienced in Python and R. Through a partnership between Warwick University and Monash University, he was able to build the model for this project.

Hannah Merwood

Hannah Merwood is a Research Assistant in Applied Statistics at The National Archives (UK) on secondment from the Department for Digital, Culture, Media and Sport. She holds a bachelor’s degree in mathematics and statistics from the University of Oxford and is a member of the Government Operational Research Service analytical profession. Hannah has experience of developing complex models and data tools to support decision makers within government, including for prisoner escort contracts and broadband delivery programmes.

Alec Mulinder

Dr Alec Mulinder is Head of Service Assurance at The National Archives (UK). He is responsible for making sure that new digital services, and enhancements to existing digital services, meet the required quality standards set by UK government. Alec has been involved in the DiAGRAM project from the beginning, researching the applicability of Bayesian modelling techniques, and being a main contributor to our successful National Lottery Heritage Fund bid. His interest in digital preservation risk began in 2012 working on the UK Government-funded Digital Continuity project. Alec is also working on modelling of digital preservation storage, is a co-supervisor of two PhD students researching born-digital access, and has a PhD in Medieval History.

Sonia Ranade

Dr Sonia Ranade is Head of Digital Archiving at The National Archives (UK), with responsibility for digital services to depositors (for selection and transfer), preservation of the digital public record and access to digital records. Her research interests include digital preservation risk, probabilistic approaches to archival description and developing new access routes for digital archives. Sonia holds a PhD in Information Science.

Jim Q Smith

Prof Jim Q. Smith is a Professor of Statistics at Warwick University and a Fellow of the Alan Turing Institute. He is a decision analyst and Bayesian dynamic systems modeller with interests that span statistical inference, data science, machine learning and operations research. He specialises in the methodology and application of various types of graphs for describing uncertain processes and also in expert elicitation, especially structural elicitation. He has recently worked with domain experts to design Bayesian decision support systems for managing risks associated with nuclear accidents, public health, policing, food poverty and COVID-19. He has published over 200 refereed papers and written three books.

Tamara Thornhill

Tamara Thornhill is the Corporate Archivist at Transport for London (TfL) and has 16 years’ experience working in archives and records management roles. She has been at TfL since 2010, responsible for the management of a collection consisting of over 165,000 files of physical material dating from the 16th century, and 20TB of digital files. She firmly believes that archives are a real asset to their parent body and wider communities and that they should be accessible and used. Since joining TfL, Tamara has undertaken various outreach activities to promote the Archives service. This programme has seen overall enquiries rise by 125% and visits increase by 359%. Exhibitions have expanded, and exhibition attendance has increased by 802%.

David H Underdown

David Underdown joined The National Archives in 2005 as a database administrator and soon gained his introduction to digital preservation from supporting our PRONOM registry of file formats and involvement in projects to refine and update our digital repository system. Since David’s background (a degree in mathematics from Imperial College London and several years working in systems development for a life and pensions company) had not really prepared him for working in archives, he used a general interest in First World War history to develop his experience of archival research and archival theory. David is also involved in defining image and metadata specifications for digitisation projects such as First World War Unit Diaries and the 1921 Census. His current research project sees a return to his mathematical roots, applying Dynamic Bayesian Networks to modelling digital preservation risk through the National Lottery Heritage Fund-supported project ‘Safeguarding the Nation’s Digital Memory’.
