Program Sessions

Vermont Digital Newspaper Project: From Reel to Real

Pages 151-157 | Published online: 08 Apr 2013

Abstract

In the summer of 2010, Vermont began formal participation in the National Digital Newspaper Program with the launch of the Vermont Digital Newspaper Project (VDNP). Libraries, repositories, and other organizations from across the state partnered to select and digitize historical state newspapers to be made freely available on the Library of Congress website. Birdie MacLennan described previous projects that facilitated the development of the VDNP, and highlighted the preliminary labor involved in planning a large-scale digitization project and preparing the grant application. Tom McMurdo described the logistics of selecting a digitization vendor, preparing the materials for digitization, and then reviewing the finished product.

Birdie MacLennan, Project Director for the Vermont Digital Newspaper Project (VDNP), began the session with background on how the project got started and on the earlier projects and larger national programs related to it. The infrastructure established by these previous projects helped pave the way for the VDNP.

The United States Newspaper Program (USNP), which ran from 1982 to 2007, served as a precursor to the National Digital Newspaper Program (NDNP) and its subsidiary state projects. Funded by the National Endowment for the Humanities (NEH), with technical assistance provided by the Library of Congress (LC), the USNP aimed to survey, locate, catalog, and preserve on microfilm “at risk” newspapers that were published from the eighteenth century to the present. Archivists recognized the widespread need for preserving historical newspaper content beginning in the 1950s, when it became apparent that newspapers printed on acidic paper were literally falling apart, becoming unusable and nearly impossible to preserve. They recognized microfilm as a solid preservation method that would capture and preserve this disappearing content. Beginning in 1997, the University of Vermont Libraries and the Vermont Department of Libraries collaborated to participate in this national program, coming together to form the Vermont Newspaper Project.

Throughout its tenure, from 1997 to 2001, the Vermont Newspaper Project, as part of the USNP, inventoried approximately 1,000 newspapers from seventy-two repository libraries, historical societies, museums, and courthouses across the state and cataloged them for the Cooperative Online Serials program (CONSER) database in OCLC. In addition, the project microfilmed over 170,000 pages of newspapers and produced an Internet database of state newspaper titles with institutional holdings. This database was a major boon for the state historical societies, many of which did not have access to OCLC. The catalog's end users, however, were soon hungry for full-text access. After discovering that the newspaper content was available, users, including many researchers, genealogists, and historians, were disappointed at the lack of instantaneous full-text access to the indexed content online.

Enter the NDNP, a partnership between the NEH, the LC, and participating state partners. From the program's inception in 2004, these partners shared a goal of developing a long-term national effort that would provide a freely accessible, Internet-based searchable database of newspapers, eventually from all U.S. states and territories. The LC's Chronicling America: Historic American Newspapers (http://chroniclingamerica.loc.gov/) database emerged from the NDNP and aggregates descriptive information about historic newspapers along with full-text content that is selected and digitized by NEH-funded state partners participating in the NDNP. In 2009, libraries and institutions from across Vermont formed a coalition to begin applying for a grant to participate in the National Digital Newspaper Program. The University of Vermont had already developed the database of state newspapers from its participation in the Vermont Newspaper Project. Chris Kirby, representing the Ilsley Public Library, based in Middlebury, Vermont, initiated the preliminary proposal with digital specifications for the grant application, incorporating guidelines found on the NEH and NDNP websites. As the central newspaper repository in the state, the Department of Libraries, based in Montpelier, Vermont, also partnered with the project, since its staff could efficiently track down the newspaper microfilm and negatives needed for digitization. The Vermont Historical Society, based in Barre and Montpelier, also provided input on the grant application. Six individuals from these institutions formed the core Project Planning Group.

Additionally, the planning group formed a twelve-member Advisory Committee, composed of archivists, librarians, journalists, historians, researchers, and a publisher from The Rutland Herald. This committee helped the Project Planning Group apply the selection criteria to identify newspapers for inclusion in the digitization project. The NDNP encourages state partners to choose newspapers that are considered to be the papers of record for their regions, papers that have research value and are representative of state and local history, and papers that provide extensive geographical coverage. The selected papers should also have an extended run, since gaps in holdings and coverage are frustrating to users and difficult from a processing perspective. Planning teams should also consider the quality and availability of the microfilm negatives that become the basis of the digitized files.

During this time, the Project Planning Group also negotiated how they would divide the workload among the participating institutions. Who would be responsible for providing the technical infrastructure and support? What types of cost-sharing commitments would make the best use of the grant money? When undertaking a large-scale digitization project, MacLennan emphasized the importance of building in ample time for preparation, researching NEH and NDNP requirements, analyzing historic titles and availability of microfilm negatives, and negotiating cooperative agreements before officially applying for a grant.

In November 2009, the University of Vermont Libraries formally applied for an NEH grant, in collaboration with the Vermont Department of Libraries, the Ilsley Public Library of Middlebury, and the Vermont Historical Society, to select, digitize, and make freely available up to 100,000 pages of historic Vermont newspapers published between 1836 and 1922. In June 2010, the NEH awarded the Vermont partners a $391,000 grant for two years (July 2010–August 2012), to cover staff, equipment, and digitization and microfilm duplication outsourcing costs, making Vermont the twenty-fifth state—and the first New England state—to receive grant funding for participation in the NDNP.

The LC provides the technical guidelines and support, along with hosting the states’ digital content on the Chronicling America website. Specifically, the NDNP requires state participants to identify the master negatives for each newspaper selected, write a title essay of about 500 words for each newspaper, conduct a technical analysis of all the microfilm negatives, digitize the microfilm to generate separate TIFF, JPEG2000, PDF, XML, and optical character recognition (OCR) files for each page, and finally, to update the catalog records in the OCLC CONSER database with a link to full-text content for each title digitized. Project staff members also update catalog records in the original VTNP state database and distribute cataloging notifications to state partner institutions.

The project team spent the first couple of months setting up the organizational infrastructure to track the project's progress. They first established the accounting infrastructure and immediately launched a search for a project librarian, knowing it would take several months to complete the search. They decided to use Basecamp software as their main project management tool. Its messaging, whiteboard, and file-sharing functions were appealing, along with the ability to add new sub-projects as the need arose.

Shortly thereafter, in fall 2010, the project team composed their Advisory Committee Briefing Book, which provided a project overview, detailed explanations of the selection criteria for newspapers, and instructional guidelines for using Basecamp to manage the multiple workflows. Simultaneously, the project team analyzed over 500 Vermont newspapers listed in the VTNP database. They continued to narrow down this universe of content by looking at the newspaper selection by county, to make sure that no one region of the state was over- or under-represented. They also paid close attention to the completeness of the holdings. In the midst of the newspaper analysis and selection, the project team composed a request for proposals (RFP) for digitization vendors.

In early 2011, Tom McMurdo joined the project as the Project Librarian. Shortly thereafter, the team acquired a list of master negatives available from the Vermont State Archives and Records Administration (VSARA), which McMurdo cross-referenced with the VTNP catalog. From there, he could approximate reel and page count estimates for fifty-nine newspaper title families identified from across the state. These title families represented over a million pages of content, so the team developed a ranking form that would help them objectively select newspapers for the project. The form categorizes the titles, first by county, then by city, along with each title's start and end dates, the range available on microfilm negatives, and page counts. In addition to the standard selection criteria outlined by the NDNP guidelines, the team also considered the context of the various papers in Vermont state history. Team members examined whether various political viewpoints were sufficiently represented and also covered distinct or historically significant titles, such as the Windham County Democrat, a 19th-century newspaper that devoted space to women's rights issues.
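The fields of the ranking form described above lend themselves to a simple record structure. The following is only an illustrative sketch, not the team's actual form: the `TitleFamily` class, its field names, and every entry in `candidates` are hypothetical placeholders. It shows one way to order candidates by county and then city, and to tally pages per county as a check on regional balance.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleFamily:
    """One row of a hypothetical ranking form (field names are assumptions)."""
    county: str
    city: str
    title: str
    start_year: int   # first year of publication
    end_year: int     # last year of publication
    film_start: int   # first year available on microfilm negatives
    film_end: int     # last year available on microfilm negatives
    page_count: int   # estimated pages on film


# Placeholder entries only; these are not real Vermont titles or counts.
candidates = [
    TitleFamily("County B", "Town Y", "Placeholder Herald", 1840, 1900, 1845, 1900, 30000),
    TitleFamily("County A", "Town Z", "Placeholder Journal", 1850, 1922, 1850, 1922, 45000),
    TitleFamily("County A", "Town X", "Placeholder Gazette", 1836, 1880, 1836, 1870, 20000),
]

# Categorize first by county, then by city, as the form is described as doing.
ranked = sorted(candidates, key=lambda t: (t.county, t.city, t.start_year))

# Tally estimated pages per county to spot over- or under-represented regions.
pages_by_county = Counter()
for t in ranked:
    pages_by_county[t.county] += t.page_count

for t in ranked:
    print(f"{t.county} / {t.city}: {t.title} ({t.start_year}-{t.end_year}), "
          f"film {t.film_start}-{t.film_end}, {t.page_count} pages")
```

A per-county tally of this kind is one plausible way to support the team's stated concern that no region be over- or under-represented; the session did not describe how the form was scored.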

At this point, the team sent sample microfilm reels to the digitization vendors for an RFP response, and production officially began. The Advisory Committee ultimately selected twelve newspaper titles from ten of Vermont's fourteen counties. In accordance with the NDNP's selection requirements, the project team focused on content produced between 1836 and 1922. Pre-1836 content is increasingly available through commercial sources, so NDNP efforts are focused on reducing overlap with commercial vendors as much as possible. Content produced after 1922 falls outside the public domain and the scope of the NDNP. The project team selected iArchives as its digitization vendor, hired a part-time digital production specialist, and recruited a few catalogers to help with the metadata preparation. The team planned to process 10,000 pages of microfilm a month to meet its final production goal.

At this point in the session, McMurdo presented the nuts and bolts of how the digitization process works, along with some valuable lessons about newspaper digitization. Since they use microfilm as the basis for digitization, the project team members have to cope with microfilm in a variety of conditions. The condition of the microfilm negatives was a significant factor in the Advisory Committee's selection decisions. Luckily, the VSARA stores microfilm in a preservation-friendly vault, so much of the microfilm they have digitized has been in good condition. Nearly all of these microfilm reels, however, were filmed before standards dictated the filming process, so the formatting and display are inconsistent from one paper to the next and even from one page to the next. These inconsistencies required a significant amount of cleanup and processing. Additionally, the project team had to determine the condition of the microfilm negatives, noting whether the films were printed on acetate suffering from advanced “vinegar syndrome” and were therefore impossible to duplicate or read.

As with most of the other state NDNP projects, the Vermont project team decided to outsource scanning, since the team did not have the infrastructure to scan the content, attach OCR files to the individual scans, and generate metadata for each individual page. McMurdo acknowledged that there are several reputable companies that provide scanning services. The project team sent out duplicates of the same sample reel of film to each of the vendors that responded to their RFPs to be able to compare the results of the vendors’ work. Four of the vendors who expressed interest in the digitization RFP were able to complete duplication of the sample reel on schedule. The volume of work required under an NDNP production schedule naturally weeded out companies that could not produce high-volume work. Bids for services came in anywhere between fifty-one and seventy-eight cents per frame, where metadata would be supplied at the page level. Ultimately, the team decided that the best option for the Vermont project was to choose iArchives, a vendor who produced high-quality scans at a competitive price, and had a history of successfully working with other NDNP state projects.
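The figures quoted in the session allow a rough back-of-the-envelope estimate of schedule and scanning cost. The sketch below is an illustration only: it assumes one newspaper page per microfilm frame (the session did not state the ratio), uses the application's ceiling of 100,000 pages, and ignores microfilm duplication, staff, and equipment costs. The constant and function names are hypothetical.

```python
import math

TARGET_PAGES = 100_000     # "up to 100,000 pages" in the grant application
PAGES_PER_MONTH = 10_000   # planned monthly processing rate
LOW_BID_CENTS = 51         # lowest per-frame bid reported
HIGH_BID_CENTS = 78        # highest per-frame bid reported
PAGES_PER_FRAME = 1        # ASSUMPTION: not stated in the session


def scanning_cost_dollars(pages, cents_per_frame, pages_per_frame=PAGES_PER_FRAME):
    """Estimate scanning cost in dollars for a page count at a per-frame rate."""
    frames = math.ceil(pages / pages_per_frame)
    return frames * cents_per_frame / 100


months_needed = math.ceil(TARGET_PAGES / PAGES_PER_MONTH)
low = scanning_cost_dollars(TARGET_PAGES, LOW_BID_CENTS)
high = scanning_cost_dollars(TARGET_PAGES, HIGH_BID_CENTS)

print(f"Months of production at the planned rate: {months_needed}")
print(f"Scanning cost range: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the production run takes ten months and the bids span roughly $51,000 to $78,000. If a frame held two pages, the frame count, and with it the cost estimate, would be halved.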

Before they send microfilm negatives to iArchives, members of the project team, in particular the Digital Production Specialist and the catalogers, generate metadata about the newspaper content. Staff members examine every frame of content, recording the title, date, and pagination of each newspaper issue in an Excel file. During this process, they often encounter duplicate or missing pages or other problems on the film, the result of errors in the original filming process. Such errors and irregularities are noted to ensure that the metadata matches the digital content generated from each reel. Project staff send each reel's companion Excel file to iArchives, along with the microfilm negatives. They convert the information from Excel into XML files about ten thousand pages at a time, with hierarchical batch-level and reel-level data, which contain issue and page data. In addition to TIFF 6.0 archival files, iArchives also provides derived JPEG2000 and PDF image files of each page of scanned content. After iArchives completes the scanning process, McMurdo examines each of the TIFF and JPEG2000 files and checks a sample of the PDF files, reviewing them for quality control. The XML and OCR data files are also checked for quality and adherence to NDNP standards. If the scanned content passes review, McMurdo ships the data to the LC on 1-terabyte hard drives. The LC loads the digitized content and its accompanying metadata into its database and makes the newspapers freely available and keyword-searchable on the Chronicling America website.
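The conversion from flat spreadsheet rows to hierarchical XML described above can be sketched in a few lines. This is a minimal illustration, not the NDNP schema or the vendor's actual tooling: the column names, the `batch`, `reel`, `issue`, and `page` element names, and the sample rows are all hypothetical, and the spreadsheet is assumed to have been exported to CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might look after exporting a reel's spreadsheet.
SAMPLE_CSV = """reel,title,issue_date,page,note
R001,Example Gazette,1850-01-05,1,
R001,Example Gazette,1850-01-05,2,
R001,Example Gazette,1850-01-12,1,duplicate frame on film
R002,Example Gazette,1850-01-19,1,
"""


def build_batch_xml(csv_text, batch_name):
    """Group flat frame rows into batch > reel > issue > page elements."""
    batch = ET.Element("batch", name=batch_name)
    reels = {}   # reel id -> reel element
    issues = {}  # (reel id, title, issue date) -> issue element
    for row in csv.DictReader(io.StringIO(csv_text)):
        reel = reels.get(row["reel"])
        if reel is None:
            reel = ET.SubElement(batch, "reel", id=row["reel"])
            reels[row["reel"]] = reel
        key = (row["reel"], row["title"], row["issue_date"])
        issue = issues.get(key)
        if issue is None:
            issue = ET.SubElement(reel, "issue",
                                  title=row["title"], date=row["issue_date"])
            issues[key] = issue
        page = ET.SubElement(issue, "page", number=row["page"])
        if row["note"]:
            # Carry filming irregularities along with the page record.
            page.set("note", row["note"])
    return batch


batch = build_batch_xml(SAMPLE_CSV, "hypothetical_batch_01")
xml_text = ET.tostring(batch, encoding="unicode")
print(xml_text)
```

Keeping the irregularity notes attached to individual pages mirrors the point made in the session: the metadata has to match what is actually on each reel, including duplicate or missing frames.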

Session attendees asked a number of questions, related to the project's selection criteria, its protocol for dealing with damaged microfilm, and its plans for the future. MacLennan and McMurdo emphasized the planning group's desire to make unique content available as a result of the project. They carefully surveyed Vermont newspaper content available through commercial sources and discovered that content dated before 1836 was largely available through commercial products. They chose to focus on content produced between 1836 and 1922. With an end date of 1922, content made available from the project will be in the public domain, and project staff will not need to worry about securing copyright releases or tracking orphaned content. Another important factor the planning group considered was the size of the available run for a newspaper title. MacLennan commented that it can be very frustrating for a remote researcher to encounter gaps in digital holdings as he or she tries to follow the thread of a story. The planning group also focused on digitizing newspapers from a variety of groups. MacLennan pointed out that unlike today, even small towns often had multiple, competing newspapers to represent both the Whig and Democrat groups in the area. Additionally, they made efforts to include Free Soil newspapers, since Vermont was historically the first state to outlaw slavery, and they also included papers that followed the careers of notable women's suffrage leaders in the state.

Attendees also asked how project staff handled damage to the microfilm. McMurdo observed that, overall, the project was very fortunate in that the microfilm they digitized was in good condition. Since readability is a criterion mandated by the LC for the project, the planning group was careful to select titles largely free of widespread damage to the microfilm. McMurdo remarked, however, that readability does not necessarily lead to accuracy with the OCR software, so searching the digitized content is not entirely reliable. In many cases, researchers are better served by browsing the content.

Some session attendees were particularly interested in the project's plan for the future after the grant expires. MacLennan and McMurdo noted that LC and the NEH used this national project to set the ideal standards for a large-scale newspaper digitization project, and there might be some overhead costs that could be reduced without sacrificing service to the end-user. With each state's newspaper project representing about six terabytes of data, the costs for serving and housing all the data associated with the digitized content, particularly the TIFF files, on a local system become expensive. Securing funding from local institutions to continue storing this content and making it publicly available can be difficult. Project participants in various states are exploring whether other, more compressed file types, like JPEG 2000 and PDF, are viable alternatives that will save on storage costs but still deliver the same experience to the end user. The outcome of these explorations regarding file types might allow NDNP participants to continue making this valuable digitized content available after their grants expire at a lower cost to their local state-based institutions. As the Vermont project concludes, participating staff and planning group members are pleased with the high volume of unique content they have made available for researchers and are looking forward to the national project's next steps.
