Preconference Programs

Metadata in a Digital Age: New Models of Creation, Discovery, and Use

Pages 7-24 | Published online: 13 Mar 2009

Abstract

Metadata is critical to finding content. However, the expansion of digital content, rapid changes in access and use, and transitions in the digital supply chain pose tremendous challenges to the effective organization, description, and management of this information. Seven speakers from different communities in the digital supply chain addressed various aspects of this important topic. Participants in this preconference session gained a valuable overview of and an insight into the many issues associated with creating and distributing metadata in the digital age.

INTRODUCTION

The “Metadata in a Digital Age: New Models of Creation, Discovery, and Use” preconference was a full-day program developed by the National Information Standards Organization (NISO). Todd Carpenter, Managing Director at NISO, welcomed program attendees and gave introductory remarks. Carpenter pointed out that in recent years, the expansion of digital content, rapid changes in access and use, and transitions in the digital supply chain have all had tremendous impacts on the creation and exchange of metadata. Metadata often begins with the author of a work and is continually enriched as it passes from publisher to vendor to library and then to the end user. What is the role of metadata in the life cycle of content? How is metadata created, stored, communicated, and used to provide access in the increasingly digital age? What are some of the emerging community standards and best practice initiatives? Carpenter announced that seven experienced speakers from different communities had been invited to address different aspects of these topics. They were Renee Register, Global Product Manager for Cataloging Partnering at OCLC; Kevin Cohn, Director of Client Services at Atypon Systems, Inc.; Steven Shadle, Serials Access Librarian at the University of Washington Libraries; Regina Reynolds, Head of U.S. ISSN Center, National Serials Data Program at the Library of Congress; Les Hawkins, CONSER Coordinator at the Library of Congress; Helen Henderson, Managing Director of Ringgold Ltd; and William Hoffman, Process Analyst at Swets.

In commenting that the preconference was NISO's first programming partnership with the North American Serials Interest Group (NASIG), Carpenter presented a brief history of NISO and emphasized that NISO was truly a volunteer organization seeking engagement from communities.

NEW DIRECTIONS IN CATALOGING AND METADATA

Register was the first of the seven speakers and delivered a keynote presentation on new directions in cataloging and metadata creation. She explained that her talk would cover three topics: challenges facing libraries, publishers, and vendors; strategies for meeting the challenges; and next generation cataloging and metadata.

Register emphatically stated that the current cataloging and metadata models in the library community were unsustainable. She referenced the first recommendation in the final report of the Library of Congress Working Group on the Future of Bibliographic Control by stating that the library community must increase the efficiency of bibliographic production and maintenance, eliminate redundancies, and make use of more bibliographic data available earlier in the supply chain.Footnote 1 But libraries are not alone. The publisher supply chain experiences challenges in metadata creation and management as well.

The challenges for both the library community and the publisher supply chain are multifaceted. Register categorized them into four main areas. First, there is simply too much stuff to deal with: the growth of materials and formats, faster publishing cycles, multiple sources for metadata, and multiple metadata formats and standards. Second, users expect fast Web exposure of new materials, easy information retrieval, and immediate access to materials. Third, metadata creation is expensive and labor intensive, yet the cost of leaving materials hidden is even greater. For libraries, a lack of metadata or delayed, incomplete, or incorrect metadata can mean a lack of access to library materials. Hidden materials also have implications for collection analysis, selection, and reporting. For publishers, a lack of metadata can result in no sales, while incomplete or incorrect metadata can result in missed sales and poor business intelligence. Finally, there is still enormous duplication of effort in metadata creation and maintenance for the same set of titles. Libraries commonly utilize complex local practices, local editing, and other manual manipulation of existing metadata and do not share all metadata creation and enhancement. Multiple libraries (and vendors) may be doing work on the same titles before a record appears in cooperative databases such as OCLC WorldCat. Publishers have different data feeds or data streams for retail vendors and libraries. Therefore, in the publisher supply chain, staff and systems are deployed for creation and enhancement of metadata for publishers, for extensive review and manipulation of metadata for retail, wholesale, and metadata aggregation vendors, and again for adding library-specific metadata in library vendor programs and ordering tools such as Web-based ordering, selection lists, and approval plans. In addition, many library vendors allocate staff or outsource MARC record creation, and many of these records end up in proprietary systems and are not shared.

Following her explanations of the challenges, Register went on to outline strategies for meeting the challenges. As the first strategy, Register pointed out that we must understand that we can no longer separate library metadata and metadata practices from the bigger supply chain of metadata. We need to increase collaboration and cooperation between library and publisher supply chain communities and allow remixing and reusing existing metadata. A great example of remixing and reusing existing metadata is to break down barriers between metadata used for acquisitions and metadata used for discovery, business intelligence, and collection management. We need to become more involved in upstream metadata creation processes, integrate available metadata into workflows upstream, and allow the metadata to evolve over time. The second strategy was that metadata management workflows and practices must change in order to allow for the easy ingest and use of existing metadata. Specific strategic actions include reducing practices that require manual manipulation of existing metadata, allowing different levels of metadata based on material type and user needs, and allowing metadata for new titles to evolve over time. As the third and last strategy, Register emphasized that solutions developed to address the challenges must be interoperable and easily shared, both inside and outside the library community. The library community must extend expertise to include publishers and publisher supply chain partners. We must find ways to create and share multiple types of metadata. Libraries must become more open to the use of non-MARC data and non-library vocabularies while publishers must find ways to leverage library data including classification, terminologies, and authorities.

After delineating the strategies, Register went on to introduce an OCLC pilot project called “Next Generation Cataloging and Metadata Services.”Footnote 2 Launched in January 2008, its aim is to explore capturing ONIX (ONline Information eXchange) metadata from publishers and vendors upstream, enhancing that metadata in WorldCat, and returning the enhanced metadata in ONIX back to publishers and vendors. Four publishers and vendors and four libraries participated in the project. In the pilot, publishers and vendors provide title information to OCLC in ONIX format. OCLC crosswalks the data to MARC and, where possible, enriches the data in automated ways through data mining and data mapping. The resulting MARC record is then added to WorldCat for library review and use, and the enhanced metadata is converted back to ONIX and returned to publishers and vendors for review and use. Register reported on the pilot's progress since January and indicated that the pilot would be wrapped up in June. There would be a pilot program update session at the upcoming American Library Association annual conference on June 29, 2008, in Anaheim, California.
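To make the crosswalk step concrete, the following is a minimal sketch of the kind of ONIX-to-MARC mapping the pilot describes. The sample record and the field mapping are simplified illustrations, not OCLC's actual crosswalk; the element names loosely follow ONIX for Books 2.1.

```python
# Minimal sketch of an ONIX-to-MARC crosswalk in the spirit of the OCLC pilot.
# The ONIX sample and field mapping are simplified illustrations only.
import xml.etree.ElementTree as ET

ONIX_SAMPLE = """
<Product>
  <RecordReference>example-0001</RecordReference>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000000</IDValue>
  </ProductIdentifier>
  <Title>
    <TitleText>Metadata in a Digital Age</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonName>Jane Example</PersonName>
  </Contributor>
</Product>
"""

def onix_to_marc_fields(onix_xml: str) -> list[tuple[str, str]]:
    """Map a simplified ONIX product record to a few MARC-like (tag, value) pairs."""
    product = ET.fromstring(onix_xml)
    fields = []
    isbn = product.findtext("ProductIdentifier/IDValue")
    if isbn:
        fields.append(("020", f"$a {isbn}"))       # ISBN
    author = product.findtext("Contributor/PersonName")
    if author:
        fields.append(("100", f"1  $a {author}"))  # main entry, personal name
    title = product.findtext("Title/TitleText")
    if title:
        fields.append(("245", f"10 $a {title}"))   # title statement
    return fields

if __name__ == "__main__":
    for tag, value in onix_to_marc_fields(ONIX_SAMPLE):
        print(tag, value)
```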

In response to questions from the audience regarding the pilot program, Register clarified that although it may not seem to save work for libraries, it was important for publishers and vendors to have better and enriched metadata early in the game. She explained that the pilot project was a more realistic approach to enriching metadata early in the data supply chain rather than trying to train publishers to do classification and Library of Congress subject headings. She added that next generation cataloging would be offered as a service to publishers and that she hoped several levels of service would be offered from which publishers could choose. Register confirmed that the pilot included materials in all formats, including electronic resources.

PUBLISHING PLATFORMS AS METADATA HUBS

Cohn was the second speaker and offered a perspective from a publishing platform vendor. His talk covered three topics: providers of metadata and how they provide it; consumers of metadata and how they consume it; and how publishing platforms deliver metadata from the providers to the consumers. Cohn asked the audience to keep in mind two things that would be mentioned often during the presentation: incoming/outgoing formats and exchange mechanisms. Examples of incoming/outgoing formats include National Library of Medicine (NLM) metadata schema, Dublin Core, and RSS while exchange mechanisms include FTP, Z39.50, and e-mail.

To set the stage for his talk, Cohn reiterated the program summary: “There is an increasing number of providers and consumers of content metadata. This, coupled with the use of different DTDs (Document Type Definitions) and exchange protocols, demands that publishers' platforms serve as advanced metadata hubs.”Footnote 3

According to Cohn, there are generally five groups of metadata providers: authors, publishers, librarians, secondary publishers, and end users. Among them, publishers produce the majority of metadata that is consumed within the industry. Speaking of his experience with Atypon's publishing platform, Cohn explained that publishers generally provide metadata as XML or SGML and upload the metadata in submission packages to Atypon's platform via WebDAV (Web-based Distributed Authoring and Versioning). He further commented that NLM was Atypon's house DTD, which, however, was only one of the many DTDs that publishers use to submit metadata. This undoubtedly presents a challenge for a publishing platform vendor like Atypon.

On the other hand, publishers nowadays are putting in a lot of effort to create metadata because they want their products to be discoverable. Cohn showed a portion of an article marked up in NLM DTD on seven PowerPoint slides to illustrate how a publisher used ninety-five lines of the document to mark up the metadata for just the front matter of the article. Similarly, eighty-seven lines of the document were used to mark up the metadata for the references section of the same article. This example demonstrated that rich metadata was marked up in human-friendly tags so that it was both machine and human readable. Another example Cohn presented was from a CSA Illumina database, where charts, tables, and objects embedded in an article were extracted and appended with metadata so they could become discoverable. In commenting on the value of the added metadata, Cohn shared a quote from a CSA whitepaper, The Value of CSA Deep Indexing for Researchers: “Figures and tables represent the distilled essence of research communicated in academic articles. Although the analysis contained in the surrounding text is important, it is clear that researchers are eager to view the actual data collected, observed, or modeled to determine the article's relevance to their own work.”Footnote 4
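As a rough illustration of what consuming such markup looks like in practice, the sketch below pulls a few front-matter elements out of an NLM/JATS-style article instance. The sample document is heavily abbreviated and the values are invented; it is meant only to show why machine-readable front matter is useful downstream.

```python
# Sketch: extracting a few front-matter elements from an NLM/JATS-style
# article instance. The sample document is abbreviated and illustrative.
import xml.etree.ElementTree as ET

ARTICLE = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.0001</article-id>
      <title-group>
        <article-title>An Illustrative Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Example</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""

def front_matter(xml_text: str) -> dict:
    """Return a small dictionary of front-matter metadata from the article XML."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    authors = [
        f'{n.findtext("given-names")} {n.findtext("surname")}'
        for n in meta.findall("contrib-group/contrib/name")
    ]
    return {
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "title": meta.findtext("title-group/article-title"),
        "authors": authors,
    }

print(front_matter(ARTICLE))
```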

Speaking of the consumers of metadata, Cohn noted that they were much more varied than the providers but by and large had the same basic needs. Metadata consumers include abstracting and indexing database vendors, aggregators, booksellers, CrossRef, Google, libraries, subscription agents, end users, and many others. However, it is important to note that these consumers use a variety of metadata formats ranging from NLM, ONIX, MARC, and Dublin Core, to XML and more. As varied as they are, metadata consumers have many applications or workflows that rely on aging formats and protocols. Unfortunately some of them are not sophisticated enough to handle rich metadata. Cohn observed that there are plenty of consumers wanting plenty of metadata delivered in plenty of ways.

The problem, Cohn emphasized, is that there are too many formats and too many exchange protocols, which causes too much metadata to be lost in transmission. The key to addressing the problem is to promote the adoption of appropriate standards. Cohn suggested CrossRef as a possible solution to this problem because CrossRef already held authoritative metadata for the majority of journals, stored in a single, unified XML schema for maximum interoperability. Cohn explained that the CrossRef model would allow easy exposure of metadata to the various consumers in the formats required and would eliminate a lot of complexity for publishers if they agreed to this model.
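The hub idea, one authoritative store queried by many consumers, can be illustrated with CrossRef's present-day public REST API (api.crossref.org), which postdates this 2008 session but follows the same model Cohn described. The DOI below is a placeholder, not a real registration.

```python
# Sketch: retrieving journal-article metadata for a DOI from CrossRef's
# public REST API. This interface postdates the 2008 session; it is shown
# only to illustrate the "metadata hub" model. Replace the placeholder DOI
# with a real one before running.
import json
import urllib.request

def crossref_metadata(doi: str) -> dict:
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:   # raises HTTPError if the DOI is unknown
        return json.load(resp)["message"]

if __name__ == "__main__":
    record = crossref_metadata("10.1000/xyz123")  # placeholder DOI
    print(record.get("title"), record.get("container-title"), record.get("ISSN"))
```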

Cohn concluded his presentation with some emphatic suggestions for publishing platform vendors. To turn publishing platforms into metadata hubs, vendors must embrace and support appropriate format and protocol standards. Cohn's specific suggestions included:

Leverage standards (XML and XSL) to ease ingestion, transformation, and syndication

Encourage interoperability by using the NLM DTD whenever possible

Support a multitude of exchange protocols because consolidation is not happening soon

Participate in industry forums to understand metadata consumer needs

Metadata is critical to finding content, and standards are critical in the provision and consumption of metadata. Cohn suggested that NISO continue to facilitate the standardization of metadata formats, exchange mechanisms and unique identifiers, as well as bring providers and consumers together to communicate on these important issues.

LIBRARY CATALOG METADATA BASICS FOR PUBLISHERS

Shadle was the third and last speaker of the morning session. He opened his talk by stating the agenda of his presentation, which was to educate publishers about the need for and use of e-serial metadata in the library environment.

To approach this topic, Shadle first gave an overview of how libraries provide access to e-serial content, typically through three methods: A–Z lists, OpenURL link resolvers, and library catalogs. An A–Z list, which could be a home-grown library application or an outsourced service, is a Web-based, alphabetical list of e-journals or e-serials. The basic data elements in an A–Z list include title, ISSN, coverage dates, journal URL, and names of access providers. OpenURL link resolvers are well loved by libraries because the OpenURL technology addresses the appropriate-copy problem and allows users to connect to only the copy (or copies) that they are entitled to access. An OpenURL link resolver provides a seamless link between the citations users retrieve in database searches and the full text of the cited articles. As for library catalogs, Shadle emphasized that they were not dead, contrary to a popular perception that some might like to embrace. He noted that we still need to provide a local context for e-journal collections to facilitate user access to those online resources. Shadle went on to show a few screenshots of a serial record presented in a library catalog. An initial screen of a serial record usually displays only brief information about the title, which serves as a stepping-off point to a full-record view backed by a full MARC record. Shadle pointed out that publishers would not need to create and maintain MARC records because crosswalks are designed to address the need to convert publisher metadata into MARC format.
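For readers unfamiliar with how link resolvers receive citations, the sketch below builds an OpenURL 1.0 (Z39.88-2004) link in the key/encoded-value form commonly used with the journal citation format. The resolver base URL and the citation values are hypothetical placeholders.

```python
# Sketch: building an OpenURL 1.0 (KEV) link for an article citation.
# The resolver base URL and citation values are hypothetical placeholders;
# the key names follow the Z39.88-2004 journal format used by link resolvers.
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical link resolver

def article_openurl(issn, atitle, jtitle, volume, issue, spage, date):
    """Return an OpenURL that a link resolver can use to locate the appropriate copy."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
        "rft.date": date,
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"

# Illustrative values only.
print(article_openurl("1234-5679", "An Illustrative Article", "Example Journal",
                      "21", "1", "7", "2008"))
```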

Additionally, some libraries utilize hybrid A–Z lists or hybrid library catalogs, where the key functionalities of a link resolver are integrated seamlessly to expand e-journal access options for users. Libraries also provide e-content through syndication so that users may discover library resources while visiting other websites or search engines such as Google Scholar or WorldCat.

Why is this important for publishers? Shadle explained that it is in the interest of both publishers and the library community to maximize the use of e-serial publications. If publishers provide accurate metadata to libraries, their e-products will get more use. If publishers do not provide accurate metadata, their e-content will not surface in the library environment. Shadle then showed typical journal metadata (title, print ISSN, and e-ISSN) found in bibliographic databases for the title Journal of the American Society of Nephrology. It quickly became clear that there were several errors in the seven instances of publisher-provided journal metadata in Shadle's example, among which ISSN problems were the most prevalent. Shadle stated that the quality of publisher metadata was in such a sad state that it prompted Serials Solutions, a publications access management service provider, to launch the KnowledgeWorks Certified program to certify content providers on behalf of their library customers.

Shadle reiterated that library metadata would help publishers maximize the use of their products and urged publishers to increase their awareness of library metadata issues, particularly those regarding title and ISSN. Shadle proceeded to present a few basic serials cataloging principles that publishers must be familiar with:

Cataloging rules stipulate that the official form of title is the presentation of the title that appears on the title page or cover of the first issue. All other presentations of the title are considered variants.

If a title appears on the title page or cover in both full form and acronym, cataloging rules specify that the full form will be the official title.

For e-serials, the official form of title will generally be the same as the print serial title.

Serial metadata is transcribed from the first issue. Changes in the serial are noted over time.

When the title appearing on title page or cover changes significantly, it is considered a new title and a successive record is created.

When there is a major title change (per ISO 3297), a new ISSN is assigned.

Each format is separately cataloged and assigned a different ISSN.

Publishers must understand that ISSNs are critical to connecting users to content and should strive to provide accurate ISSN information in their metadata. Common ISSN problems include lack of an ISSN, old or cancelled ISSNs, ISSNs for other titles or formats, retrospective ISSN assignments, and mistyped ISSNs. Shadle encouraged publishers to establish a working relationship with their national ISSN center and to make sure to contact the center when considering a serial title change, publishing a serial in a new format, starting a new title, or acquiring an existing serial title or backfile. It is important to remember that ISSN assignments are free of charge. Shadle asserted that, in return for their work on quality ISSN information, publishers would enjoy benefits offered by the ISSN network, particularly a well-managed distribution system in which all ISSN assignments are distributed to most national libraries and made available through shared databases such as the CONSER database or OCLC WorldCat.
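A small sketch of ISSN validation follows; it checks the form and the ISO 3297 check digit, which catches simple problems such as mistyped digits. It cannot, of course, detect an ISSN that is well formed but belongs to another title or format.

```python
# Sketch: validating the form and check digit of an ISSN (ISO 3297).
# Weights 8 down to 2 are applied to the first seven digits; the sum mod 11
# determines the check character, where 10 is written as "X".
import re

def issn_check_digit(first_seven: str) -> str:
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = total % 11
    if remainder == 0:
        return "0"
    check = 11 - remainder
    return "X" if check == 10 else str(check)

def is_valid_issn(issn: str) -> bool:
    issn = issn.strip().upper()
    if not re.fullmatch(r"\d{4}-?\d{3}[\dX]", issn):
        return False
    digits = issn.replace("-", "")
    return issn_check_digit(digits[:7]) == digits[7]

print(is_valid_issn("0028-0836"))  # True: well-formed, check digit matches
print(is_valid_issn("0028-0837"))  # False: check digit does not match
```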

ISSN: LINKING DATA AND METADATA

The afternoon session started with Reynolds' presentation on the role of the ISSN in the electronic environment as an essential link for connecting data and metadata. Reynolds explained that the expansion of content (or data) required an expansion of metadata to facilitate search, identification, matching, and other data management tasks. In order for metadata to function well, standardization and quality control are key. Reynolds' presentation provided examples of how ISSNs, like other standard identifiers, can help make connections between data and metadata.

The ISSN, a critical metadata element and standard identifier, can help connect users to content through metadata in OPACs, ERM systems, OpenURL, knowledgebases, abstracting and indexing services, and the ISSN Portal. To set the stage for further discussion, Reynolds dispelled common misconceptions about ISSNs. She clarified that ISSNs in fact could be assigned retrospectively to ceased serials in addition to current serials. ISSNs are assigned by ISSN centers throughout the world including the ISSN International Centre located in Paris, not self-assigned by publishers as ISBNs are. The ISSN is both an international standard and a U.S. standard. ISSNs have been assigned since 1972, and it is free of charge to get an ISSN. The ISSN Portal, a subscription product available from the ISSN International Centre, is the most complete and authoritative source for ISSNs, and is more accurate and comprehensive than other sources such as CONSER records in OCLC or directories such as Ulrich's Periodicals Directory. Reynolds commented that in the electronic environment, the ISSN has grown beyond its traditional functions and taken up new and exciting roles.

Authoritative ISSN data can help clean up dirty metadata. Reynolds stated that everybody in the serials supply chain had dirty metadata, such as mistyped ISSNs, missing ISSNs, and incorrect ISSNs associated with former titles or titles in other formats. The good news is that the ISSN Register, an international database developed and maintained by the ISSN International Centre, can help to keep metadata clean. The ISSN Register contains metadata records for the more than one million ISSNs that have been assigned so far. It includes data received from the eighty-five ISSN national centers around the world and is updated on a continuous basis. The ISSN Register is accessible through the ISSN Portal, which is a subscription product and is available for free trial. Reynolds emphasized that ongoing access to authoritative ISSN data by all partners was important because it would help keep multiple data sources synchronized so metadata could work effectively.

A new mechanism called the “linking ISSN” (ISSN-L) was introduced in the revised ISSN standard (ISO 3297) that was published in 2007. The linking ISSN was designed to help solve the “multiple ISSNs” problem caused by separate ISSNs being assigned to the print and online formats of a serial, which can often result in failures in OpenURL linking, record merging, and deduplication when ISSNs from different medium versions are used to represent the same content. The linking ISSN is a medium-neutral collocating mechanism that enables linking and grouping of different media versions of a continuing resource. It will facilitate OpenURL resolution regardless of multiple ISSNs for different manifestations, collocation in ERM systems, and OPAC searching to retrieve all manifestations, as well as all situations where identification without regard to medium is desired. The revised standard also affirms the policy of separate medium-specific ISSNs. Separate ISSNs will continue to be assigned for print, online, CD-ROM, and other versions and will continue to be used for specific product identification in library ordering, claiming, and check-in processes, in the European Article Number (EAN), and in the workflows of subscription agencies, all situations where identification of the specific medium version is needed.

Reynolds noted that the implementation of the ISSN-L is underway at the ISSN International Centre. The entire ISSN Register will be populated with ISSN-Ls. All resources, current or ceased, will have an ISSN-L, whether they are published in one medium or more than one. Retrospective designations will use the lowest ISSN from the cluster linked via the MARC 776 field, while future and ongoing designations will use the first-assigned ISSN, which may not always be the lowest in numerical order because of the way blocks of ISSNs are allocated to the national centers. Reynolds emphasized that ISSN-Ls could not be predicted and that some situations could be complex, such as when titles of different formats changed at different times. In MARC 21, two new subfields in field 022 have been approved: subfield “l” for the ISSN-L and subfield “m” for a cancelled ISSN-L. Reynolds happily announced that provisions were being made by the ISSN International Centre to provide free ISSN-L information via two tables of correspondences (ISSN to ISSN-L and ISSN-L to ISSN). These tables will be available free of charge on the ISSN International Centre website soon. She further noted that ideally the ISSN-L would be displayed on all serial publications, but she also admitted that she was not sure if all publishers would welcome the idea of printing yet another ISSN on their publications. The implementation of ISSN-Ls at the ISSN International Centre is expected to be complete before the end of 2008.
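To show how such a table of correspondences could be used to collocate medium-specific records, here is a minimal sketch. The tab-delimited layout and the ISSNs are assumptions for illustration; consult the ISSN International Centre's documentation for the actual file format.

```python
# Sketch: using an ISSN -> ISSN-L table of correspondences to collocate
# medium-specific records. The tab-delimited layout and the ISSNs below are
# illustrative assumptions, not the Centre's actual file format or real data.
from collections import defaultdict

TABLE = """\
1234-5679\t1234-5679
2345-6787\t1234-5679
"""  # made-up print and online ISSNs of one serial sharing an ISSN-L

def load_issn_l(table_text: str) -> dict:
    """Build a mapping from each medium-specific ISSN to its ISSN-L."""
    mapping = {}
    for line in table_text.splitlines():
        if not line.strip():
            continue
        issn, issn_l = line.split("\t")
        mapping[issn] = issn_l
    return mapping

def collocate(records: list[dict], issn_to_l: dict) -> dict:
    """Group catalog or knowledge-base records by ISSN-L, regardless of medium."""
    groups = defaultdict(list)
    for rec in records:
        key = issn_to_l.get(rec["issn"], rec["issn"])
        groups[key].append(rec)
    return dict(groups)

issn_to_l = load_issn_l(TABLE)
records = [
    {"title": "Example Journal (print)", "issn": "1234-5679"},
    {"title": "Example Journal (online)", "issn": "2345-6787"},
]
print(collocate(records, issn_to_l))
```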

Good ISSN data can help facilitate interoperability between systems. Reynolds noted that a lot of data management tasks in the serials supply chain relied on data exchange and communication between systems and that ISSNs could greatly help facilitate the process. For example, ISSNs can help with importing and exporting data to and from various sources (Publications Access Management Services, publishers, libraries, A&I services), data migration to new systems, and data exchange processes such as ONIX transactions. ISSNs in ERMs can be used for title identification, file matching, and deduplication. An ISSN or an ISSN-L can be embedded in DOIs, OpenURL links, or URNs (Uniform Resource Names) to facilitate linking and interoperability. In the OPAC, ISSNs and ISSN-Ls can also help provide “FRBR-ized” displays, displays that group earlier, later, and related titles, and displays that group together different physical formats of the same title. Additionally, ISSNs in the OPAC can be used to link outward to other sources of metadata that include ISSNs, such as entries in Ulrich's Periodicals Directory, because Ulrich's entries provide more complete and current publisher contact information and subscription price information than most library catalog records for serials.

Speaking of the future of ISSNs, Reynolds indicated that she could envision the development of machine-to-machine ISSN Web services at some point, which would certainly be a step in the right direction. She commented on another future vision: that through ISSN online registration, the National Serials Data Program (NSDP) would be better able to help link publisher metadata and library catalog records. In this model, publishers would provide metadata to NSDP via interactive Web forms, and NSDP would use that publisher metadata as basic descriptive metadata to create baseline OPAC records for further enhancement with controlled headings by catalogers. This model presents several benefits. It promotes the exchange of clean metadata: publishers get authoritative ISSNs, while NSDP's ISSN record is based on metadata from the source without rekeying by staff. It can streamline the cataloging process by providing baseline OPAC records for catalogers so that they can focus on knowledge-intensive tasks such as subject analysis, authority headings, and classification assignment. It can create closer ties between publishers and libraries. Reynolds noted that potential metadata exchange partners also included abstracting and indexing database vendors, OpenURL knowledgebase vendors, and PAMS (Publications Access Management Services).

CHANGES IN COOPERATIVE CATALOGING STANDARDS: IMPLEMENTATION OF THE CONSER STANDARD RECORD

Hawkins' presentation focused on issues surrounding the implementation of the Cooperative Online Serials (CONSER) standard record, which is a new policy in cooperative cataloging standards developed in response to the rapidly changing cataloging environment in the digital age.

Hawkins began by giving an overview of the CONSER program. He joyfully announced that CONSER was celebrating its thirty-fifth birthday this year and that it currently had over fifty institutional members. CONSER became a program under the Library of Congress Program for Cooperative Cataloging (PCC) in 1995, along with Monographic Bibliographic Record Program (BIBCO), Name Authority Cooperative Program (NACO), and Subject Authority Cooperative Program (SACO). Hawkins went on to show a diagram illustrating the relationship of the four programs under PCC. CONSER is well known for its Serials Cataloging Cooperative Training Program (SCCTP). SCCTP provides standardized training materials and trained trainers in the field of serials cataloging, through workshops sponsored by library associations, networks, and institutions. SCCTP has benefitted over 5,000 individuals in the past ten years.

Commenting on the process of introducing policy or standard changes in PCC, Hawkins presented a diagram showing the PCC structure. Under the Steering Committee, which makes final decisions, there are two levels: the policy and operations committees. The PCC Policy Committee is made up of technical services directors at member institutions and implements change through development of the long-term strategic and tactical goals of the PCC. The BIBCO and CONSER operations committees are made up of staff who oversee day-to-day cataloging at their institutions, implement changes in practice and policy based on practical needs, and carry out tactical objectives of the PCC.

The need for change in serials cataloging is driven by several forces. There is a strong push to do more, better, and faster, at lower cost. There is a mandate to seek more efficient creation of records that serve user needs. There is a desire to share records and their creation widely. There is an anticipation of a new cataloging code on the horizon. It was in such an environment that the CONSER standard record was developed. The purpose of the CONSER standard record is to provide essential elements for meeting user needs, allow catalogers freedom to make decisions, and streamline training and cataloging practices for serials.

This initiative started in August 2005 when a working group was formed to work on the “Access Level Record for Serials,” the former name of the CONSER standard record. Developed in a cooperative cataloging environment, the new record incorporated a focus on user needs defined in Functional Requirements for Bibliographic Records (FRBR). Fourteen libraries or institutions participated in pilot testing the new record and provided feedback from the perspectives of many different types of users, including public services staff, circulation staff, and acquisitions staff. A final report was published in July 2006, which was subsequently approved by the PCC Policy Committee, and the implementation began on June 1, 2007.

However, the implementation of the CONSER standard record was not without challenges. First of all, it was launched at a time when there was a feeling of discontent in the cataloging community due to the Library of Congress' decision, announced in April 2006, to cease creating series authority records as part of its cataloging operations. Secondly, Resource Description and Access (RDA) was being developed at the same time. It was not clear how RDA would relate to the CONSER standard record and whether there needed to be a one-to-one correspondence between the two.

To meet those challenges, Hawkins reported that CONSER operations committee members took ownership to promote a better understanding of the new standard among member libraries. The membership agreed to monitor the implementation for a year to identify anything that needed to be changed. CONSER formalized communication channels with RDA developers through the PCC representative on the ALCTS Committee on Cataloging: Description and Access (CC:DA). Other cataloging communities were invited to learn about the new standard through materials posted on the CONSER website and through experiments with live online learning.

Because of these efforts, CONSER members have largely embraced the standard, and other communities have expressed interest in using the standard as well. Few changes to the standard were identified over the past year, and member libraries have freely offered additional ideas. Hawkins concluded that encouraging participation and buy-in from those implementing the standard is key to the success of the implementation of the CONSER standard record.

INSTITUTIONAL IDENTIFIERS

Henderson's presentation was centered on the new NISO Working Group on Institutional Identifiers (I2), which is charged with proposing a standard for an institutional identifier that can be implemented in all library and publishing environments.

As an introduction, Henderson provided an overview of the issues surrounding institutional identifiers and the efforts that have been made to address those issues. The supply chain between libraries and their content providers is a complicated process with many transactions. Any mistake in these transactions may lead to customers not receiving their content. On the other hand, content providers often distribute materials or content to a variety of entities within an institution (such as libraries, departments, and offices) or across institutions (such as local, state or regional consortia). Without a common way of uniquely identifying the institution and its relationship with its subsidiaries or its partners in a consortium, customer service is bound to suffer at many levels for the stakeholders engaged in information exchange. Unfortunately, this is exactly the situation in which we find ourselves.

In response to this challenge, three main initiatives have emerged in the past few years, and two of them are still active. The Journal Supply Chain Efficiency Improvement Pilot (JSCEIP), established in 2006, was an industry-wide pilot project aiming to discover whether the creation of a standard, commonly used identifier for institutions would be beneficial to all parties involved in the journal supply chain. This group concluded its work in January 2008 with the publication of a final report announcing its finding that unique institutional identifiers are indeed needed. The work of this group became the basis for the NISO I2 Working Group. The other two initiatives are led by libraries and publishers respectively. The library initiative is the WorldCat Registry, which seeks to identify the world's libraries and collect information about them and their services. The publisher initiative is Ringgold's Identify, which identifies institutions that subscribe to academic journals. Henderson reported that Ringgold's Identify database currently contained over 100,000 institutions and was constantly being developed.

Following the overview, Henderson proceeded to discuss four major aspects of the NISO I2 Working Group: stakeholders, scenarios, a work plan, and a timescale. The list of stakeholders she presented was impressively long, showing twenty groups of stakeholders, which include libraries, agents, publishers, hosting services, institutional repositories, consortia, and many others. Henderson noted that the list was by no means a comprehensive one because when she talked to other groups about this new NISO initiative many of them expressed an interest in being involved.

The Working Group has identified nine topical scenarios that need institutional identifiers: electronic resources supply chain, e-learning/courseware, research evaluation/funding, author registries, institutional repositories, licensing/micro-licensing, usage metrics, collaboration, and authentication. Each of these scenarios has its own set of stakeholders and issues. For example, the stakeholders in the scenario for the electronic resources supply chain are libraries, agents, publishers, aggregators, distributors, hosting services, and fulfillment services; and the key issues for this particular scenario include granularity of metadata, population of systems, and system capability. The Working Group has also identified four additional scenarios in terms of communities: higher education, research evaluation, public library, and medical communities. Henderson noted that different communities had different needs regarding institutional identifiers. Like the topical scenarios, each of these community scenarios also has its own set of stakeholders and issues. It is important to note that these scenarios represent possible case studies and that more scenarios may emerge as the Working Group continues its work.

The goal, as Henderson stated, is to establish a core set of data that is required for unique identification and that will work with as many scenarios as possible. Building on that, other data may also be used to support the business models of respective organizations.

Speaking of a work plan, Henderson pointed out that the first step would be to develop sample scenarios. After that, there will be eight more steps to go through before a final proposal can be released for ballot. As for a timeline, Henderson presented the following:

Final appointment of Work Group (June 10, 2008)

Development of sample scenario (June 2008)

Data gathering (June–December 2008)

Test and finalize direction of identifier (January–February 2009)

Working draft (March–October 2009)

Submit for public review (August 2009)

Draft standard for trial (October 2009–March 2010)

Start trial use (November 2009)

Ballot release (March–May 2010)

Henderson admitted that the timeline was ambitious, given the fact that this project involved working with a lot of people and a lot of groups. However, she was confident that it was an achievable goal.

One of the major steps in the work plan is to review current identifier standards and usage and explore their relationship to the work of the I2 Working Group. Henderson elaborated on this topic and mentioned four existing standards or guidelines in this space that themselves work with an institutional identifier: ICEDIS (International Committee on EDI for Serials), ONIX (ONline Information eXchange), the linking ISSN, and COUNTER (Counting Online Usage of NeTworked Electronic Resources). She further commented that four other new standards and guidelines deserve attention: the NISO/UKSG Working Group on Knowledge Bases and Related Tools (KBART), UKSG's Project TRANSFER, the NISO Working Group on Cost of Resource Exchange (CORE), and author registry initiatives such as CrossRef's CrossReg and Thomson Reuters' ResearcherID.

Henderson emphasized that the use of an institutional identifier in the journal supply chain would improve efficiencies. However, implementation will require a commitment and work by all parties to use such an identifier.

In response to questions from the audience regarding Ringgold's Identify database, Henderson clarified that the identifiers used in the database were dumb numbers, rather than numbers with hierarchies, for ease of data management. She confirmed that the institutions and their metadata represented a highly volatile data set. She reported that at least 30 percent of the data in the Identify database changed in some way every year. Maintaining such a database is therefore no small effort.

METADATA: THE IMPORTANCE OF INTEROPERABILITY, AND FACTORS TO CONSIDER IN DEVELOPING IMPLEMENTATION STRATEGIES

Because the original speaker, Hoffman, could not attend the preconference, Sri Rajan, Electronic Products Manager at Swets, presented instead. The presentation covered three topics: challenges surrounding expanding metadata types and use, the importance of interoperability for the exchange and reuse of metadata in a standard way, and factors to consider when developing metadata implementation strategies, especially in view of the ONIX protocol.

Rajan began by clarifying that his talk would be mainly about metadata for licensed resources. He stated that there were various groups of people generating various types of metadata and storing them in various types of systems. This reality naturally produces varied results, where metadata can be structured, unstructured, or somewhere in between. But why do we need to focus on metadata? Rajan explained that metadata is helpful in identifying, assessing, and managing information. Simply put, it helps us find the information we need. The more metadata we have, the more we (and others) can accomplish. In addition to the various types of metadata already mentioned, Rajan added that user-generated metadata from the social Web had emerged as a new breed of metadata, commonly manifested in the form of tag-based browsing, recommendations, ratings, and reviews. On the positive side, user-generated metadata is simple and flexible in format. It is distributed to a large population: everybody gets to see everybody's contribution. It is also relevant because it is community-based and always current. On the negative side, users' personal tags do not always make sense, and there is an issue of scale: the fact that something is very popular does not necessarily mean it is a good resource.

There is a great need for exchanging and reusing metadata among the players in the e-resource supply chain, and metadata interoperability is essential to that exchange. Rajan went on to provide some guidelines for building metadata interoperability. First, one would inventory the sources and targets for the metadata and compare the levels of granularity and content rules between systems, such as a vendor's system and a library's electronic resource management system. Then the involved partners would need to agree on a metadata standard and a message exchange mechanism (e.g., Web services). Finally, one would review metadata schemas and data dictionaries to determine which creation tools would be used to assist in the conversion to the standard. One should also apply templates and crosswalks wherever possible.
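The following is a minimal sketch of the crosswalk step at the end of that process: a field-level mapping from a hypothetical vendor export to a hypothetical ERM loader schema. All field names are invented for illustration; a real mapping would come out of the inventory-and-compare work described above.

```python
# Sketch: a simple field-level crosswalk between a (hypothetical) vendor
# export and a (hypothetical) ERM loader schema. All field names are
# illustrative, not any actual vendor's or ERM's schema.
CROSSWALK = {
    "JournalTitle": "title",
    "PrintISSN": "issn_print",
    "OnlineISSN": "issn_online",
    "SubsStart": "coverage_begin",
    "SubsEnd": "coverage_end",
    "PlatformURL": "url",
}

def apply_crosswalk(vendor_record: dict, crosswalk: dict = CROSSWALK) -> dict:
    """Translate vendor field names to ERM field names, dropping unmapped fields."""
    return {erm: vendor_record[vendor]
            for vendor, erm in crosswalk.items()
            if vendor in vendor_record}

vendor_record = {
    "JournalTitle": "Example Journal",
    "PrintISSN": "1234-5679",
    "OnlineISSN": "2345-6787",
    "PlatformURL": "https://journals.example.org/ej",
}
print(apply_crosswalk(vendor_record))
```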

Rajan noted three types of challenges to building metadata interoperability: system gaps, process gaps, and standards gaps. Vendor systems may lack sufficient ability to extract, load, or properly store required metadata. To address this challenge, Rajan suggested that libraries submit development requests to their vendors to fill system gaps. A process gap occurs when departmental procedures, policies, and quality control cannot adequately track or properly store required metadata. Rajan suggested addressing this challenge by reviewing departmental procedures and policies as well as standardizing local thesauri as much as possible. Finally, existing metadata standards may have limited ability to properly represent the metadata. Rajan's suggestion for this was to explore possibilities for modifying existing standards or developing new ones with NISO.

As an example, there is significant interest among some libraries in automating the process of moving acquisitions data from their Integrated Library Systems (ILS) into their Electronic Resource Management Systems (ERMs). In response to this need, the Digital Library Federation's Electronic Resource Management Initiative, Phase II (DLF ERMI II) formed a subcommittee, which investigated the subject and subsequently published a White Paper on Interoperability between Acquisitions Modules of Integrated Library Systems and Electronic Resource Management Systems in January 2008. The subcommittee conducted case studies with four libraries and reviewed the institutional environment, consortium considerations, systems architecture, and electronic resource workflows of each library. The review revealed a complex systems environment, staff and workflows in transition toward more effective management of electronic resources, and license negotiation and documentation as relatively new and time-consuming tasks. The case studies also yielded a set of seven acquisitions elements deemed critical for exchange between the ILS and ERM: purchase order number, price, subscription start and end dates, vendor name, vendor ID, fund code, and invoice number.Footnote 5 Rajan happily announced that a new NISO initiative on Cost of Resource Exchange (CORE) had been established. This new standards project will work on facilitating the exchange of cost, fund, vendor, and invoice information between ILS, ERM, and business systems, as well as other interested parties such as subscription agents.
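A small sketch of those seven acquisitions elements carried as a structured record follows. The record type, field names, and JSON serialization are illustrative only; they are not the CORE exchange format itself, and all values are invented.

```python
# Sketch: the seven acquisitions elements the DLF ERMI II case studies
# identified as critical for ILS-to-ERM exchange, carried in a small record
# type. The field names, JSON serialization, and values are illustrative;
# this is not the CORE wire format.
from dataclasses import dataclass, asdict
import json

@dataclass
class AcquisitionsRecord:
    purchase_order_number: str
    price: str
    subscription_start: str   # "subscription start and end dates" split into two fields
    subscription_end: str
    vendor_name: str
    vendor_id: str
    fund_code: str
    invoice_number: str

record = AcquisitionsRecord(
    purchase_order_number="PO-2008-0042",
    price="1250.00 USD",
    subscription_start="2008-01-01",
    subscription_end="2008-12-31",
    vendor_name="Example Subscription Agent",
    vendor_id="V-001",
    fund_code="SERIALS-EL",
    invoice_number="INV-98765",
)
print(json.dumps(asdict(record), indent=2))
```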

Rajan went on to introduce another standards initiative, EDItEUR's ONIX for Licensing Terms, with a focus on ONIX for Publications Licenses (ONIX-PL). The ONIX-PL format is intended to support the communication of license terms for electronic resources between licensors, licensees, and any intermediaries involved in the licensing process, such as subscription agents and library consortia. The ONIX-PL Dictionary is designed to provide a rich but well-structured vocabulary for expressing many of the key elements in the format. Rajan presented an implementation strategy adopted at Swets, where a license bank was created to store key license information from publishers so the information was available for use by Swets customers. He explained that the SwetsWise License Bank contained forty-five separate license data fields, tracked following the guidelines of the DLF ERMI. The license data in the License Bank is proprietary but is available through Web services for extraction by external parties without knowledge of Swets' proprietary infrastructure. The data in the License Bank is stored in a proprietary format but can be mapped into ONIX-PL to transfer the license data to other systems.

Rajan wrapped up his presentation by emphasizing that metadata was very powerful and very necessary. He believed that automating the exchange of metadata was the key and that much of the work could be done up front when developing metadata standards. He encouraged organizations to develop tools to aid in future metadata initiatives and urged participation from all parties in the metadata supply chain because we are all in this together!

CONCLUSION

Carpenter provided brief closing remarks. He echoed Rajan's final comment that “we are all in this together” as a great way to end the preconference. He thanked all the speakers for presenting an informative and engaging preconference, which addressed multiple aspects of the topic, “Metadata in a Digital Age,” from the perspectives of different communities. He also expressed appreciation for the organizing groups of the preconference, NASIG and NISO, as well as the sponsoring groups, Thomson Reuters and Swets.

Carpenter announced that the presentation slides would be available on the NISO website during the following week. Once again, he emphasized that NISO relied on volunteers from different communities to further their work, and he encouraged and invited interested audience members to participate in NISO's many working groups.

CONTRIBUTOR NOTES

Renee Register is the Global Product Manager for Cataloging Partnering at OCLC.

Kevin Cohn is the Director of Client Services at Atypon Systems, Inc.

Les Hawkins is the CONSER Coordinator at the Library of Congress.

Helen Henderson is the Managing Director of Ringgold Ltd.

Regina Reynolds is the Head of the National Serials Data Program, the U.S. ISSN Center of the Library of Congress.

Steven C. Shadle is the Serials Access Librarian at the University of Washington Libraries.

William Hoffman is a Process Analyst at Swets. He was unable to attend the session, so Sri Rajan, Electronic Products Manager at Swets, presented in his place.

Paoshan W. Yue is the Director of Technical Services at the University of Nevada, Reno Libraries.

Notes

1. Library of Congress, “On the Record: Report of the Library of Congress Working Group on the Future of Bibliographic Control.” 2008. http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed July 17, 2008).

2. OCLC, “Next Generation Cataloging.” 2008. http://www.oclc.org/partnerships/material/nexgen/nextgencataloging.htm (accessed July 17, 2008).

3. NISO, “NISO Metadata Forum Agenda.” 2008. http://www.niso.org/news/events/2008/metadata08/agenda (accessed July 17, 2008).

4. ProQuest, “The Value of CSA Deep Indexing for Researchers.” 2006. http://info.csa.com/csaillustrata/whitepaper/CSAIllustrataWhitePaper.pdf (accessed July 17, 2008).

5. Digital Library Federation, “White Paper on Interoperability between Acquisitions Modules of Integrated Library Systems and Electronic Resource Management Systems.” 2008. http://www.diglib.org/standards/ERMI_Interop_Report_20080108.pdf (accessed July 17, 2008).
