Collaborating to Reduce Content Gaps in Discovery: What Publishers, Discovery Service Providers, and Libraries Can Do to Close the Gaps

Abstract

IEEE has partnered with discovery service providers and libraries to improve content-related processes and workflows to aid better discoverability and ultimately usage. IEEE has been working with four major discovery service providers to identify content gaps between the IEEE Xplore platform, discovery service indexes, and select library discovery service interfaces. This study will illustrate content gaps, causes, and how IEEE, discovery service providers, and libraries are collaborating to reduce these gaps and adopt best practices for feeding, ingesting, and exposing publisher content in discovery services.

INTRODUCTION

The emergence of Google and Google Scholar, which index most of the world’s scholarly content and have a simple one-box interface, has radically changed how scholarly information can be discovered and accessed. Users find it easy to enter a search term into one box and quickly get a list of results.

Because users find Google and Google Scholar easy to use, the library community has been implementing Google-like, single-search-box discovery services. Since 2009, the library market has been dominated by four major discovery products: EBSCO Discovery Service (EDS), Ex Libris Primo, ProQuest Summon, and OCLC WorldCat Local. More and more academic libraries in North America and Europe are adopting discovery service tools, and the trend is spreading to Asia, Latin America, and Africa, and to some corporate and government libraries as well.

However, discovery services pose both advantages and challenges for publishers whose content is indexed in these discovery service products. Not every publisher is providing its content to discovery service providers in a timely, complete, and accurate way; the content provided by publishers is not always fully indexed by the discovery service providers; and some libraries implement their discovery services incorrectly, preventing the full exposure of the publisher content.

IEEE is the world’s largest technical professional association dedicated to advancing technological innovation and excellence for the benefit of humanity. It publishes more than 170 transactions, journals, and magazines; over 1,400 annual conference proceedings; and more than 6,000 standards. In addition, more than 2,000 e-books are hosted on IEEE Xplore, a digital library with more than 3.7 million full-text documents.

As a publisher whose content has been indexed by all four major discovery service vendors, IEEE started to evaluate the quantity and quality of its content feeds to discovery service vendors; identify gaps in the discovery service indexes and select library discovery service interfaces; understand the causes of the content gaps; and find ways to mitigate and correct them.

Between 2009 and 2012, IEEE signed agreements with all four discovery service vendors and provided them with FTP feeds for IEEE’s major collections. In 2013, IEEE organized a three-person team, which began actively working with vendors to identify areas for improvement, provide all four major discovery service providers with the complete backfile for IEEE journals and conferences (1884–1999), and create Quick Reference Guides for Discovery Services.

A dedicated IEEE Discovery Service Relations Manager was hired in 2014 and was appointed to the NISO Open Discovery Initiative (ODI) Standing Committee, Knowledge Base and Related Tools (KBART) Standing Committee, and the Discovery to Delivery Topic Committee (D2D). Internally, an eight-person IEEE Discovery Service Working Group was formed, drawing members from business units like Content Operations, Content Engineering, Software Development, Platform Design, Product Design, Marketing, Client Services, and Web Metrics.

The IEEE Discovery Service Working Group meets regularly to identify and troubleshoot major issues in discovery services. One of the initial issues identified is how to fill and avoid content gaps. The team hopes to:

  • Improve quality control of the content feeds and delivery processes;

  • Ensure that IEEE content in discovery service indexes and library discovery implementations is as complete as possible;

  • Become the first publisher to conform to the NISO Open Discovery Initiative (ODI) Content Provider Checklist; and

  • Make IEEE the industry leader in continuous development of industry standards.

LITERATURE REVIEW

Most papers on library discovery services have been written by librarians, focusing on individual libraries’ experiences with specific discovery service tools or on comparing several tools from user perspectives. Few studies have come from a publisher’s perspective or analyzed a publisher’s experiences with discovery services. “Discovery or Replacement? A Large-Scale Longitudinal Study of the Effect of Discovery Systems on Online Journal Usage” reveals that usage of some publishers’ journal content may be negatively impacted by discovery services, but it does not venture to surmise what could have caused the negative impacts (Levine-Clark, Price, and John 2014). A white paper by Sage calls on publishers and discovery service providers to collaborate to improve library discovery but does not identify reducing content gaps among the areas of collaboration (Somerville and Conrad 2014). The Breeding white paper, “The Future of Library Resource Discovery,” acknowledges content gaps in discovery services as a persistent issue and calls for content analysis tools to help deal with the problem (Breeding 2015). This case study describes a publisher’s experience with discovery tools and highlights the importance of discovery content analysis tools. The study records the content gaps in the data transfer processes; analyzes the causes of these gaps; documents the collaboration process among the publisher, the vendors, and libraries to reduce these gaps; and offers best practices for publishers, vendors, and libraries.

RESEARCH DESIGN AND METHODOLOGY

This study compares the number of IEEE documents in the IEEE Xplore platform, in the content feeds IEEE delivered to the discovery service providers, in the selected discovery service base indexes, and in some libraries’ discovery service interfaces, at a given time period, in order to identify the content gaps among these four stages and analyze the causes of the content gaps (Figure 1). The study rests on several assumptions.

FIGURE 1 Publisher Content Transfer Process.

First, content gaps can occur at any stage. They can occur from content published to the publisher platform to content delivered to discovery service vendors through the FTP process; from the content received by the discovery service providers to the content actually indexed in the discovery tools; or from the discovery base index to the library discovery interfaces, configured by each of the libraries.
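The stage-by-stage view described here can be made concrete with a small calculation. The following is a minimal sketch, not IEEE's actual tooling: the stage names follow Figure 1, the function name `stage_gaps` is hypothetical, and the counts are made-up placeholders chosen only to show the arithmetic.

```python
from typing import List, Tuple


def stage_gaps(counts: List[Tuple[str, int]]) -> List[Tuple[str, str, int, float]]:
    """Return (from_stage, to_stage, records_lost, percent_lost) for each hop."""
    gaps = []
    # Pair each stage with the stage that follows it in the transfer process.
    for (src, n_src), (dst, n_dst) in zip(counts, counts[1:]):
        lost = n_src - n_dst
        pct = 100.0 * lost / n_src if n_src else 0.0
        gaps.append((src, dst, lost, pct))
    return gaps


# Placeholder counts, NOT figures from this study.
pipeline = [
    ("publisher platform", 1000),
    ("syndication feed", 950),
    ("discovery base index", 800),
    ("library discovery interface", 600),
]

gaps = stage_gaps(pipeline)
for src, dst, lost, pct in gaps:
    print(f"{src} -> {dst}: {lost} records missing ({pct:.1f}%)")
```

A negative value for `records_lost` would mean the later stage reports more records than the earlier one, which in practice may point to duplicate records rather than to extra content.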

Second, the lack of discovery service content coverage analysis tools makes it difficult to quantify the content gaps. Currently, there are no coverage analysis tools to retrieve the most accurate number of records. Between October 2014 and February 2015, IEEE relied on ballpark numbers provided by discovery service providers. From April 2015, IEEE started developing search algorithms for each of the discovery indexes with input from the discovery service providers.

Third, since content is being added on a daily basis, and there are time gaps when content is transferred from one data set to another, the data collected only reflect the records on these specific days.

Fourth, IEEE began collecting data through library discovery interfaces in August 2014 and started informing all four discovery service vendors of the missing content problems in October 2014. In early 2015, IEEE offered the complete set of IEEE content to the major discovery service providers to help identify and fill in the missing content. As of August 2015, all four of the major discovery service providers are processing and filling in the missing content. As a result, IEEE content gaps in several discovery tools are narrowing, and the data used in this study are constantly changing.

This study acknowledges that the content transferring through the library discovery service ecosystem is a very complex process. The purpose of this study is not to calculate the most accurate numbers or to fault any single party but to find patterns of content gaps, analyze causes of these content gaps, and design ways to refill and mitigate these gaps as a collaborative effort.

This study samples data from four major sources: articles published to the IEEE Xplore platform, articles delivered to discovery service providers, records in the base discovery service indexes as reported by discovery service providers, and records from twenty-four library discovery interfaces. The four discovery service providers are anonymized as A, B, C, and D. The twenty-four libraries, all subscribing to the IEEE/IET Digital Library (IEL), are divided into four groups of six, each group using one of the discovery service products. Within each group, two libraries each are from the United States and Europe and one each from Asia and Latin America/Africa. The number of full-text records is further broken down by three content types: conference papers, journal papers, and standards (Figure 2).

FIGURE 2 Survey Data Sampling.

FINDINGS

Gaps in Syndication Feeds to Discovery Service Providers

Figures 3, 4, and 5 compare the number of records in the IEEE Xplore platform with the number of records in the inventory of content delivered to the discovery service providers. This comparison reveals content gaps in the journal papers, conference papers, and standards sent to the discovery service providers.

FIGURE 3 IEEE Journal Papers in IEEE Xplore vs. Syndication.

FIGURE 4 IEEE Conference Papers in IEEE Xplore vs. Syndication.

FIGURE 5 IEEE Standards in IEEE Xplore vs. Syndication.

The 10,000+ journal articles not available in the syndication feeds are mostly early access articles. The IEEE early access articles take advantage of the rapid publishing capabilities offered by digital media and represent new content that is made available online in advance of the final electronic or print versions. The IEEE Xplore platform contains over 10,000 early access articles. IEEE offered to deliver this content to discovery service providers, but currently all the discovery service providers are only doing issue-based indexing. Two of the discovery service vendors are aware of this technical limitation and are working toward indexing early access articles.

The 100,000+ conference papers not provided to discovery service vendors are mostly from conference proceedings whose copyright is owned by IEEE partners. IEEE needs to receive permission from these copyright owners before it can deliver the content to the discovery service providers. Since this gap was revealed, IEEE has been proactively reaching out to conference partners to secure permission to include these papers in the syndication feeds.

The comparison also revealed that IEEE was not sending draft standards, which account for about 50 percent of the IEEE standards in the IEEE Xplore platform, to the discovery service providers. IEEE has since changed this and is now working on delivering all standards, including draft versions, to the major discovery service providers.

Content Gaps in Discovery Service Indexes

IEEE tried to find out how much of the IEEE content delivered to discovery service providers was actually indexed in the discovery tools. It is difficult to determine an accurate count of IEEE records in a discovery service index. In October 2014, IEEE started searching for IEEE content in the discovery interfaces of dozens of libraries using different discovery tools. The numbers of IEEE records retrieved showed significant content gaps in the four discovery tools (Figure 6).

FIGURE 6 IEEE Content Indexed Reported by Discovery Service Providers.

IEEE asked all discovery service providers to report the IEEE content indexed in their discovery service products. Three vendors provided ballpark estimates but did not provide specific numbers broken down by year or content type, making it difficult to identify and verify which content was actually missing. One discovery service provider reported that over half of the IEEE content was missing, including all of the content before 2000 and after 2010.

In January 2015, IEEE offered to redeliver the complete IEEE content to all discovery service providers to help identify content gaps and fill in the missing content. All of them accepted the offer and started the process of identifying, processing, and indexing missing IEEE content. IEEE is working closely with the discovery service providers during this process. Three providers also created test accounts so that IEEE could search for the IEEE records in the discovery base indexes and monitor the process of filling content gaps.

IEEE Content in Selected Library Discovery Interfaces

To further identify content gaps in library discovery service implementations and assess how an individual library’s configuration may contribute to content gaps, IEEE conducted searches of IEEE content in the discovery interfaces of twenty-four libraries.

IEEE decided to restrict the search to IEEE-provided full-text content only, for several reasons. First, IEEE is the only party that delivers full-text IEEE content to all the discovery service providers for indexing, but the discovery service providers are also indexing IEEE metadata from dozens of third-party Abstracting & Indexing (A&I) providers. IEEE wants to focus on assessing how IEEE-provided content has been indexed. Second, all discovery tools offer the option to search beyond the libraries’ holdings, so the numbers can show large variations. IEEE wants to focus on the number of full-text records accessible by library users, which, theoretically, should be consistent, because all of these libraries subscribe to IEL. Third, to show the IEEE content available in full-text format, libraries not only have to select all the IEEE collections when configuring their discovery tool but also have to select the appropriate IEEE targets in their link resolver knowledge base. Some libraries use discovery tools from one vendor and link resolvers from another vendor, complicating matters even more. Restricting the searches of IEEE content to full-text format only can help expose the various configuration irregularities.

Search results still contain some duplicate records, so the final numbers are not 100 percent reliable. Some discovery interfaces contain more duplicate records than others. The discovery service providers acknowledge the existence of duplicate records and are working on solutions. Duplicate records should be taken into consideration when IEEE records in library discovery implementations are compared.
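One way to estimate how much duplicates inflate a count is to collapse records on a shared identifier before counting. The sketch below is illustrative only and is not the method used in this study: it assumes each exported record carries a `doi` field, the function names are hypothetical, and the DOIs shown are fake placeholders.

```python
from typing import Dict, Iterable

# Common resolver and scheme prefixes that may precede a DOI in exported records.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI (DOIs are case-insensitive) and strip a leading prefix."""
    doi = doi.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def count_unique(records: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count total records, unique DOIs, and the difference between the two."""
    seen = set()
    total = 0
    for rec in records:
        total += 1
        seen.add(normalize_doi(rec["doi"]))
    return {"total": total, "unique": len(seen), "duplicates": total - len(seen)}


# Fake placeholder records: the first two differ only in prefix and case.
sample = [
    {"doi": "10.9999/EXAMPLE.1"},
    {"doi": "https://doi.org/10.9999/example.1"},
    {"doi": "10.9999/example.2"},
]
summary = count_unique(sample)
print(summary)
```

Records without a DOI would need a fallback key, such as a normalized title plus publication year, which this sketch does not attempt.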

Figure 7 compares the total number of IEEE records in IEEE Xplore with the number of full-text records in the discovery interfaces of twenty-four libraries. While all of these libraries subscribe to the same package and should, in an ideal environment, have full-text access to over 3.4 million IEEE records in the discovery interfaces, the chart shows a wide range in the number of full-text records. For libraries using discovery service A, four come close to or exceed the number of IEEE records in IEEE Xplore, while two have much lower numbers. For libraries using B, four have records close to 3 million, while two are missing a significant amount of content. Three of the six libraries using discovery service C are missing almost half of the full-text records. All six libraries using D show very small numbers of IEEE full-text records in their discovery interfaces.

FIGURE 7 Total Full-Text Records in IEEE Collections in IEEE Xplore vs. 24 Library Discovery Interfaces.

Figures 8 to 10 compare the amount of IEEE full-text content in IEEE Xplore and in twenty-four library discovery interfaces by content type, i.e., journal papers, conference papers, and standards.

FIGURE 8 IEEE Full-Text Journal Papers in 24 Library Discovery Interfaces.

FIGURE 9 IEEE Full-Text Conference Papers in 24 Library Discovery Interfaces.

FIGURE 10 IEEE Full-Text Standards in 24 Library Discovery Interfaces.

Libraries using A and C appear to include some duplicate records for IEEE journals and magazines in their discovery interfaces. Five libraries using B have journal records close to the 860,000 in IEEE Xplore. There may be some configuration problems for the sixth library, which is missing close to 20 percent of the journal articles. All the libraries using D have very few full-text IEEE journal articles. Here the problem goes beyond individual library configurations: a large number of full-text journal articles are missing from the base index of D.

Duplicate records are less of a problem for conference papers in general, but library configurations may play a larger role in the availability of full-text proceedings articles in their respective discovery interfaces. Two libraries with A and two with B show few full-text IEEE conference papers. Three out of the six libraries with C are missing more than 60 percent of the IEEE conference papers. All of the libraries using discovery service D show almost no full-text IEEE conference papers.

Figure 10 shows that half of the twenty-four libraries in this study have not selected the IEEE Standards collection in their discovery service or link resolver knowledge base configurations, while the other half show significant full-text content gaps for IEEE standards. The problems are more significant for libraries using B. It is possible that libraries subscribing to IEL are not aware that IEEE standards are included in IEL, or that they choose not to activate the IEEE Standards collection because they believe few researchers at their institution use standards. Most of the IEEE full-text standards appear to be missing from D’s base index.

DISCUSSION

The content gaps between publisher platform and library discovery implementations can be attributed to several causes (Figure 11). Publishers may not be providing all of their content to discovery service providers. They may withhold some databases because of business concerns, or they may need special permission from content partners for copyright and business relations reasons. They may not deliver some content for technical reasons. For instance, discovery service providers currently do not have the technical capabilities to index early access articles. Publishers may fail to provide some content simply because of an oversight. The NISO Open Discovery Initiative (ODI) Content Provider Conformance Checklist, published in early 2015, can help publishers examine what they have delivered to discovery service providers before, during, or after declaring conformance.

FIGURE 11 Causes for Content Gaps in Discovery Tools.

Discovery service providers may fail to index all of the full-text content provided by publishers for several reasons. First, they may not receive all of the content due to technical issues. For instance, the FTP process, which most publishers use to deliver content, may not always be reliable. System failures, firewall settings, or password changes may lead to discovery service providers missing some content. Second, discovery service providers may not index publisher-provided full-text content in a timely way. Due to the sheer volume of content that needs to be indexed and the complexity of the indexing processes, discovery service providers need to prioritize which publishers’ content will be indexed first. This prioritizing may significantly delay the indexing of some publisher content. Third, there are time gaps between when some articles are published on the publisher platforms and when they are indexed in discovery services and made available to library users through library discovery interfaces. Discovery service providers may struggle more with some content types, such as conference proceedings and e-books. Also, discovery service providers are indexing publisher content from metadata provided by dozens of A&I database providers in addition to the publisher-provided full text. The process of mapping and merging these records can be so complicated that some records may be missing or duplicated.
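A delivery problem of this kind is easier to catch when both sides can compare lists of record identifiers. The following is a minimal sketch under stated assumptions: it supposes the publisher keeps a manifest of delivered identifiers and the provider can export the identifiers it has indexed, neither of which is described in this study, and the function name `reconcile` and the identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def reconcile(delivered: Iterable[str], indexed: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare delivered identifiers with indexed ones using set differences."""
    delivered_set = set(delivered)
    indexed_set = set(indexed)
    return {
        # Sent by the publisher but absent from the index: candidates for redelivery.
        "missing_from_index": delivered_set - indexed_set,
        # Present in the index but not in the manifest, e.g. third-party A&I records.
        "unexpected_in_index": indexed_set - delivered_set,
    }


# Placeholder identifiers, not real article numbers.
result = reconcile(
    delivered=["rec-001", "rec-002", "rec-003"],
    indexed=["rec-001", "rec-003", "rec-900"],
)
print(sorted(result["missing_from_index"]))
print(sorted(result["unexpected_in_index"]))
```

Running such a comparison after each delivery, rather than only during a periodic audit, would shorten the time a failed transfer goes unnoticed.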

Libraries’ configurations of discovery tools may also impact the content gaps. Although most libraries receive some initial assistance from discovery service providers when setting up a discovery service tool, the help often is not publisher specific. A few publishers provide publisher-specific quick reference guides for configuring discovery service tools, but many libraries are unaware of the existence of these guides. Consequently, a significant percentage of libraries do not select all of the subscribed collections in their discovery tool configurations and do not update configurations when new collections are added. Furthermore, they may fail to select all the related packages from their link resolver knowledge base, so that full-text content does not become available to library users.
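The two configuration steps described here, selecting collections in the discovery tool and selecting targets in the link resolver knowledge base, lend themselves to a simple checklist-style audit. The sketch below is hypothetical: the function name `audit_configuration` and the collection labels are placeholders, and real discovery tools and link resolvers expose their settings in vendor-specific ways that are not modeled here.

```python
from typing import Dict, Iterable, List


def audit_configuration(
    subscribed: Iterable[str],
    discovery_selected: Iterable[str],
    resolver_selected: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each subscribed collection to the places where it is not activated."""
    discovery = set(discovery_selected)
    resolver = set(resolver_selected)
    problems: Dict[str, List[str]] = {}
    for collection in subscribed:
        missing = []
        if collection not in discovery:
            missing.append("discovery index")
        if collection not in resolver:
            missing.append("link resolver knowledge base")
        if missing:
            problems[collection] = missing
    return problems


# Placeholder collection labels, not actual product or target names.
subscribed = ["journals", "conference proceedings", "standards"]
report = audit_configuration(
    subscribed,
    discovery_selected=["journals", "conference proceedings"],
    resolver_selected=["journals"],
)
for collection, places in report.items():
    print(f"{collection}: not selected in {', '.join(places)}")
```

A collection must pass both checks for its full text to surface, which is why the audit reports the discovery index and the link resolver separately.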

CONCLUSIONS

Reducing or closing content gaps in library discovery services is a complex and continuous initiative that requires collaboration among publishers, discovery service providers, and libraries. The NISO ODI Conformance Checklist provides a starting point for these parties to work together (Figure 12). Since significant content gaps in library discovery services may seriously impact the overall usage of publishers’ content, publishers should take the initiative to develop measures to help reduce the gaps. Before declaring conformance to ODI, publishers can self-audit their content syndication to discovery service providers, revisiting business and technical reasons that may impact content gaps and avoiding inadvertent errors. They should work with discovery service providers to identify content gaps. They can continuously improve methodologies for identifying content gaps, redeliver missing content, and provide as much metadata as possible to make it easier for discovery service providers to conduct the complex indexing process. Publishers should also reach out to the library communities and create publisher-specific guides to help librarians configure discovery tools to maximize discoverability of the publisher’s content. In addition, publishers can audit content coverage in libraries’ discovery interfaces, promote best practices among libraries, remind librarians to configure for new collections, and help troubleshoot content-related problems.

FIGURE 12 Collaborate to Achieve Content Completeness in Discovery Services.

Content gaps in library discovery services may undermine the credibility of discovery tools, so discovery service providers should also take steps to reduce the gaps. They can work on completing the ODI Discovery Service Provider Checklist and updating their conformance statuses. They can work with publishers and libraries to identify gaps, create coverage analysis tools to make the process easier and more transparent, and fill in the gaps whenever possible. They can review their agreements with publishers to ensure that no large chunks of publisher content are left unindexed due to misunderstanding or oversight. Discovery service providers should upgrade their technological capabilities to include early access articles and examine their content-reception mechanisms to ensure that no content is missing or corrupted during the data transfer process. They can also promote configuration best practices among libraries, help ensure no subscribed content is left out of libraries’ configurations, and remind libraries to update configurations when new collections are indexed.

Significant content gaps in library discovery services also degrade the user experience and prevent libraries from fully realizing the value of their expensive investments, so libraries should also play an active role in helping to reduce the gaps. Librarians can audit their own discovery service configurations to make sure all subscribed publisher collections are selected in both the discovery index and the link resolver knowledge bases. They can follow instructions from publisher-specific configuration guides to make sure no content is inadvertently configured out of discovery services. They should also follow discovery service providers’ notifications and continuously update the configurations when new collections are added to the base index. Moreover, librarians can regularly conduct content gap analysis and report missing content to both publishers and discovery service providers (see Note 1).

Standards organizations like NISO and NFAIS should continue to play a constructive role in reducing content gaps in library discovery services. Encouraging and helping both publishers and discovery service providers to conform to the ODI checklists is a significant step in the right direction. The ODI committee can study the impact of content gaps in discovery services on publisher content usage, advocate transparency of content gaps, and develop coverage analysis tools to help all parties, including publishers, discovery service providers, and libraries, to actively participate in identifying, reporting, and troubleshooting missing content. As library technologies continue to mature, it is likely that more publishers, discovery service providers, and libraries will turn to application programming interfaces (APIs) for various purposes. NISO can encourage all parties to use APIs as a potentially powerful method for transferring content and auditing for content completeness.

Notes

1. IEEE is conducting a related project to reach out to libraries and inform them how to audit and correct their discovery service and link resolver configurations. Initial results show that corrections in configurations significantly increase the numbers of IEEE records in their discovery service implementations and thus significantly improve the discoverability, visibility, and accessibility of IEEE content. A systematic comparison will be the subject of a separate article.

REFERENCES