TRENDS IN... Emme Lopez, Michelle Bass, and LaVentra Danquah, Column Editors

Trends in Research Data Management and Academic Health Sciences Libraries


Abstract

Spurred by the National Institutes of Health mandating a data management and sharing plan as a requirement of grant funding, research data management has exploded in importance for librarians supporting researchers and research institutions. This editorial examines the role and direction of libraries in this process from several viewpoints. Key markers of success include collaboration, establishing new relationships, leveraging existing relationships, accessing multiple avenues of communication, and building niche expertise and cachet as a valued and trustworthy partner.

An introduction to research data management

About the author

Our editors typically write the background and support for column content. However, in this edition, Jeff Lacy provided this piece. C. Jeff Uribe-Lacy is the Liaison Librarian to the Schools of Dentistry, Public Health, and Graduate Studies at the University of Texas Health Science Center San Antonio. Jeff earned a master’s in library and information studies from the University of Alabama. In a previous position as a science librarian at a small liberal arts university, Jeff conducted a survey of RDMS in the Oberlin Group of Libraries.

Why talk about research data management?

In 2008, the National Institutes of Health (NIH) Public Access Policy required peer-reviewed manuscripts based on NIH-funded research to be openly and freely available to the public.Citation1 Recently, the NIH has done the same for data. The Policy for Data Management and Sharing (DMSP) requires data gathered by NIH-funded research to be openly and freely available to the public as of January 25, 2023.Citation2 Researchers are largely in favor of the DMSP, but also predict that meeting the new data management requirements will require significant additional work and may strain early-career researchers.Citation3

This additional effort is no small consideration. DMSP represents a cultural shift in medical research toward more data sharing.Citation4 Other funders of biomedical research, especially other government agencies, are likely to adopt similar data management and sharing policies. Ready or not, researchers must adapt to these new research data management mandates, as will the libraries and librarians that support them.

A brief history of research data management services

Libraries have invested in research data management for almost two decades. Defining the exact start is difficult, but several noteworthy developments happened as early as 2006. In that year, Purdue University Libraries created the Distributed Data Curation CenterCitation5 and Cornell University Library formed the Data Working Group to consider engagement in data curation.Citation6 Also in 2006, the Association of Research Libraries (ARL) explored data management with two major initiatives. First, the ARL formed an E-Science Task Force and charged it to make recommendations relevant to “the curation of long-lived digital data” and “storage of massive data sets.”Citation7 Second, ARL held a workshop on “the role of academic libraries in [the] digital data universe” supported by the National Science Foundation (NSF).Citation8 The following year, NSF posited that academic librarians have potential roles in the application of library standards to the analysis, archiving, and curation of digital data.Citation9

In those early days, libraries’ interest in research data management was driven by a growing concern in academia over long-term, large-scale digital data curation.Citation7,Citation8,Citation10–12 Relevant buzzwords included big data, born digital, cyberinfrastructure, data deluge, and e-science. Shortly after the groundswell of investigation into research data management (RDM) in 2006, surveys suggested that libraries’ interest did not quickly translate into services. A 2007 survey found that fewer than 13% of U.S. and Canadian academic health science libraries (n = 134) provided research data management services (RDMS).Citation13 Similarly, in 2008, Cornell University’s Data Working Group discovered few academic libraries involved in research data management.Citation6 In 2010, only 10% of libraries in the U.S. and Canada with ARL member profiles (n = 86) indicated e-science support as an important service.Citation14

The NSF spurred the development of RDMS in libraries in 2011 by enacting a policy requiring the inclusion of research data management plans (DMPs) in grant proposals.Citation15 This new requirement and other, similar federal policies created a need among science researchers for support.Citation16 The immediate needs of researchers proved more persuasive to libraries developing relevant services than the looming shadow of big data. By 2013, according to a survey of ARL libraries (n = 73), 72% offered RDMS and about 50% of those began offering RDMS after the NSF’s policy announcement.Citation17 Apropos of the new requirements, several library RDMS were related to DMPs, including DMP consultations, DMP resources, and DMP training.Citation17

Growth in library RDMS since the kickstart provided by NSF DMP requirements has been uneven. Whereas libraries serving large, research-oriented institutions tend to offer a menu of services, other libraries struggle even though they see the need. Many libraries “continue to ‘plan’, rather than actually offer” robust RDMS.Citation18 Different RDMS models, often involving different relationships or coordination between the library, campus information technology (IT), and research offices, have emerged from different institutions, influenced by local needs and personnel.Citation16 This variety, and the lack of a one-size-fits-all solution, has elicited much professional analysis and advice among librarians.

The potential roles, opportunities, and challenges for librarians in respect to RDM and the provision of RDMS have been perennial topics in the literature.Citation12,Citation19–22 Several primers on RDM/RDMS have appeared.Citation11,Citation20,Citation23,Citation24 Libraries with successful RDM programs have encouraged best practices and communities of practice with a piloted model for establishing RDMSCitation25 and a training academy for data management skills.Citation26 Even as late as 2021, RDMS was called a key global trend among health science libraries.Citation27

Where are we now?

Several recent publications provide a broadly scoped perspective on the state of RDMS, with evidence of overall progress. Sixty percent of the libraries in Tenopir’s 2012 surveyCitation28 did not provide any kind of RDMS, but that number dropped to 44.1% in 2019.Citation18 There is also widespread acknowledgement that libraries should not be the sole providers of RDMS to their institutions, but rather should collaborate or coordinate RDMS with other departments.Citation18,Citation24,Citation29,Citation30

Despite these improvements, there is evidence of inequity. Among the libraries that do not provide RDMS, the reasons tend to be insufficient funds, insufficient time, insufficient staff, and/or lack of interest from researchers.Citation18 Libraries serving institutions with larger student populations, doctoral programs, research programs, or NSF funding are more likely to provide technical support, have dedicated RDMS librarians or specialists, or hire new staff for RDMS.Citation18 In Cox et al.’s examination of service maturity in 2017, no library examined had RDMS extensive enough to be considered mature.Citation29

Finally

Libraries are still developing RDMS. There is still room for growth and innovation. If history repeats itself, with the NIH’s DMSP requirement providing a kickstart to RDMS similar to the NSF’s DMP requirement, we should expect a new wave of progress. Informed by studies like those outlined above, and in the 3 case studies below, librarians in institutions without the funds or personnel to develop robust RDMS can identify services most likely to succeed in their circumstances and focus their resources.

Case study: Harvard University

About the author

Julie Goldman is the Countway Research Data Services Librarian with the Harvard Library. Julie earned her Master’s in Library and Information Science from Simmons University and a Bachelor’s of Science in Marine Biology from the University of New Hampshire. Prior to her position at Harvard, Julie was eScience Coordinator with the National Network of Libraries of Medicine (NNLM), New England Region at the University of Massachusetts Chan Medical School. With the NNLM, her work focused on building research data education and resources for librarians. Julie is the Managing Editor for the Journal of e-Science Librarianship (JeSLIB).

Background: Harvard Library

Harvard LibraryCitation31 is a multi-library system with several locations across the University’s campus and worldwide. The system includes over 25 libraries that serve professional schools and disciplines, responding to the needs of their unique communities. Harvard Library houses the department for Open Scholarship and Research Data Services, with a goal to strengthen the Library’s response to growing expectations for open science, encompassing publications, data, code, and other research outputs. Harvard Library Research Data ServicesCitation32 brings together members of Harvard’s network of data practitioners, subject experts, services, and resources to help ensure that Harvard’s multi-disciplinary research data is findable, accessible, interoperable, and reusable, also known as the FAIR Data Principles.Citation33 The Research Data Services team includes two full-time embedded data librarians at the Countway Library of Medicine and the Harvard Dataverse Repository.

Working together: Harvard University-level data services

Harvard University established a collaborative approach to data services, bringing together the Library, Research Computing, and the research community. Since Harvard schools and units are relatively distributed and siloed, many networks and groups across the campus either directly support data services or indirectly support various aspects of data management. As one of these network partners, Harvard Library works with other campus groups to deliver data-related services and resources. Local units communicate and coordinate through interactions on school and institutional committees.

Changes to the data policy landscape significantly affected the institution’s data services. As a top research institution, the university receives a large amount of external funding from agencies like NIH. In 2022, the Harvard T.H. Chan School of Public Health (HSPH) ranked second among U.S. Schools of Public Health receiving NIH funding, while Harvard Medical School (HMS) ranked 39th among Schools of Medicine.Citation34 Given the established working relationships and expertise with biomedical research, the Research Data Services team spearheaded the coordination of additional services and institutional response to the 2023 NIH DMSP.

Harvard data librarians led working groups to develop policy guidance and education, and created services for “Data Management Plan Review” through both Countway LibraryCitation35 and Harvard Library.Citation36 Service statistics show a growth in data management services from 2022 to 2023. More researchers signed up for DMPTool accounts, created Data Management Plans, and requested review services (Table 1) than ever before.

Table 1. Data management services statistics October 2022–March 2023.

The Research Data Services team works together to connect services, tools, and units to foster a community of research computing and data at Harvard University. Due to Harvard’s distributed network of support, data services thrive when they are offered at the local level. The embedded services the library provides through Countway Library and Harvard Dataverse are true success stories. The next section details how data services have grown over the last six years at Countway Library with the addition of a dedicated Data Services Librarian.

Applying data services locally: Countway Library

The Countway LibraryCitation37 is located in the Longwood Medical Area (LMA) and serves HMS, HSPH, and the Harvard School of Dental Medicine. LMA is a diverse population of over 12,000 faculty, 10,000 fellows and postdoctoral students, and a variety of degree programs.

Countway’s Research Data Services (RDS) Librarian position was created in 2017 to determine and prioritize the RDS most applicable to the local LMA community. As a member of the Publishing and Data Services team, the RDS Librarian serves as a subject expert, providing direct support to biomedical and public health researchers in navigating the data landscape, including writing data management and sharing plans, finding data, organizing data, publishing and sharing data, and guidance on the reuse of data.Citation38

A key activity of the RDS Librarian is teaching Research Data Management SeminarsCitation39 and organizing special events to celebrate special topics such as International Open Access WeekCitation40 and Love Data Week.Citation41 The RDS Librarian is embedded in curricula and required training, such as the NIH’s Responsible Conduct of Research (RCR) course. The RDS Librarian is an expert in free and subscription platforms and tools for the research community, such as OpenRefine,Citation42 DMPTool,Citation43 protocols.io,Citation44 and Open Science Framework.Citation45

This model of a local embedded data librarian provides tailored services to a specific community, ensuring the subject expertise researchers often require. The model has benefited the broader research community at Harvard University as well. Due to the COVID-19 pandemic, the RDS Librarian began offering monthly classes as webinars, allowing members outside LMA to learn about RDM. The use of Zoom has expanded the RDS Librarian’s reach in offering more classes to more participants (Table 2).Citation46

Table 2. Data management seminar series statistics 2018–2022.

Many community members have provided feedback on the RDM Seminar Series:

Thank you so much for organizing these webinars. I have always wanted to come to attend [these] seminars by the Data Management group. Because the location at Countway Library is far from our lab, it requires me to time it with my experiments, which was usually not possible. Then, this pandemic happened and I finally managed to attend your seminars online! Thank you very much for [everything]. This is a great resource. (Postdoc, Harvard University, 2020)

These sessions are SO helpful, I really wish that other areas of the school would provide seminars like this that go through everything you need to know about the data lifecycle and what it means for people in all stages. Thank you! (Staff, Harvard T.H. Chan School of Public Health, 2022)

In 2022 the RDS Librarian worked with a library school student intern to conduct an environmental scan on the state of data services across LMA to get additional community input and inform the ongoing development of data services. The intern developed a survey that was distributed as an anonymous link via email to 1,527 recipients and was advertised in the Countway Library newsletter. At the end of the survey period, there was a 6% response rate, with 143 respondents. Additionally, the intern conducted ten virtual interviews with library stakeholders.

Seventy-five percent of respondents indicated an interest in training opportunities, including 100% of faculty.Citation47 This response revealed a disconnect in the awareness of data services and resources available based on learning communities. Maintaining university-wide communication as the data landscape develops at a rapid pace is crucial for successfully coordinated data services. From this feedback, the Countway Library will ensure its data services, particularly the educational seminars, are catalogued in a university-wide research data management service directory to facilitate knowledge dissemination and to provide an outreach mechanism for promoting RDM services, trainings, and education programs. Additionally, the Countway Library will leverage local communities of practice to encourage internal knowledge sharing and to build an effective referral process.

Fostering local expertise for data services: Longwood Medical Area Research Data Management Working Group

The RDS Librarian also co-chairs the Longwood Medical Area Research Data Management Working Group (LMA RDMWG),Citation48 which has been vital in building the research data management services and resources for the LMA research community. Members draw strength from their diversity of experience and expertise, which informs process-implementation and decision-making. In order to create useful resources, the LMA RDMWG has applied a “strawman” step approach:

  1. Broaden and combine the issues.

  2. Develop a draft document.

  3. Solicit feedback from various stakeholders.

  4. Refine and finalize the document.

This ensures the developed solution addresses the research community’s concerns and needs.Citation49

For example, the LMA RDMWG observed Principal Investigators needed help with streamlining the intake of new employees and ensuring they could access and locate data following an employee’s departure.Citation50 LMA periodically experiences large influxes of graduate and postdoc students associated with both Harvard University schools (particularly HMS and HSPH) and Harvard Medical School’s affiliate teaching hospitals. In response, LMA RDMWG created onboarding and offboarding checklists to outline data management processes.

Through outreach and education, the LMA RDMWG found the checklists to be most effective when tailored to a lab’s specific needs. The George Church Lab in the Harvard Medical School Genetics DepartmentCitation50 successfully adapted the checklists to help streamline and improve their existing lab protocols and procedures. Their lab consists of over 100 members, so they experience considerable turnover from entering and exiting graduate students and postdocs. By reviewing the checklists with the graduate students as they enter the lab and then prior to their exit, lab procedures become more consistent and keep projects moving forward.Citation50

Overall, the local data management working group can break down the silos and barriers between institutional units. With guidance from the RDS Librarian, the group documents and tracks services offered by the various groups across the campus, working to create greater transparency on data services for LMA researchers, and ultimately researchers across Harvard University. Defining these data services and project workflows connects disparate units and brings service providers together, strengthening Harvard’s commitment to research data management.

Case study: University of Washington

About the author

Jennifer Muilenburg is the Research Data Services Librarian at the University of Washington. Her focus is on coordinating campus support for research data and supporting the research data needs of researchers of all levels, including providing education, consultation, guidance, and reference support.

Background

The University of Washington (UW) is a Research 1 institution with a large number of students and researchers across three campuses in Washington State. The main campus is in Seattle and there are additional campuses in Tacoma and Bothell. For the 2022–2023 academic year, enrollment is close to 33,000 undergrads and 16,000 graduate students in Seattle; the total across all three campuses is 42,000 undergrads and 17,000 graduate students. For 2019, the most recent year for which a public annual report is available, UW received $1.6 billion in grant funding. The top five federal funders include the Department of Health and Human Services (which includes NIH), NSF, and the Departments of Defense, Energy, and Commerce. Over $800 million of that funding was awarded to the UW Schools of Medicine and Public Health.Citation51

The UW Libraries is a collective organization serving all three campuses, as well as the island campus on Friday Harbor in Puget Sound. UW is ranked one of the top ten public university research libraries by the Association of Research Libraries and employs about 350 librarians, staff, and students to support all campus students and researchers.

RDM at UW

Some libraries are lucky enough to have multiple data-focused personnel on staff, but many others provide support for research data with limited staff who may not be data focused. A similar situation exists for library budgets for research data services: a handful of libraries have a robust budget for RDM efforts, while others make do with small amounts of money for data support staff and services. In these cases, collaboration and coordination across the institution are key to being able to provide the types of data support services researchers need.

UW Libraries has had one or two librarians working in RDM for over ten years. Originally aiding with data acquisition and data management plans, over time services have extended to include access to tools such as Open Science Framework, digital object identifiers (DOIs), ORCID, our institutional repository, DMPTool, and others. Education about data stewardship and management remains a core service component.

UW currently maintains one part-time Research Data Services (RDS) Librarian. Several subject librarians have data-related activities included in their job descriptions, including subject librarians in health sciences, geographic information systems, global studies, biology, and scholarly communications. These librarians comprise the Data Services Team (DST), providing support to UW researchers using data. This team makeup promotes broad representation from disciplines commonly performing data-intensive research. The team allows for two-way communication, so that the library can push messaging about new services out to the departments via liaisons, and the liaisons can bring data-related topics back to the DST for discussion. The team structure allows for collaboration around educational efforts as well as cooperative data purchases. With just one dedicated RDM staff member, the DST is essential to being able to provide support for research data from the library.

The DST works with colleagues around the institution to provide a wide variety of support across the data lifecycle, from data acquisition to assistance with methodologies and coding, active and cloud data storage, and data sharing and publishing. Example departments and units include IT for technical support such as cloud storage or computation, UW’s eScience Institute which focuses on data science across the disciplines, and the UW Center for Studies in Demography and Ecology, which developed the UW Data Collaborative for researchers who need access to highly sensitive data.

These productive collaborations developed over a series of years as a result of relationships built between library staff members and various academic and administrative departments. In some cases, a faculty or staff member has reached out to a library liaison to request assistance with something like acquiring a dataset. In other cases, a library staff member has reached out to a department to request assistance with, for example, data storage for a particular project, or support for a data purchase. Those successful, but small, partnerships are what can eventually lead to larger, more complicated, and effective collaborations across multiple stakeholders and units.

The most recent example of a successful collaboration from around the university concerned the new NIH DMSP mandates that went into effect in January 2023. In the months leading up to the mandate, the RDS Librarian rallied administrative support within the library for the purchase of a membership to the data repository Dryad, one of NIH’s approved generalist repositories.Citation52 This was the UW’s first true data repository. Options had been discussed for years, including whether to create one in-house using open access software or purchase an off-the-shelf solution. In the interim, small datasets were being deposited into “Research Works,” our institutional repository (https://digital.lib.washington.edu/researchworks/).Citation53 This proved problematic since our repository is not optimized for data and had significant size limitations. The Dryad membership allowed UW to meet the growing needs of the majority of researchers who required the use of a data repository.

The timing of this data repository membership was key, and allowed our administrative offices of research, sponsored programs, and others to include the announcement with their messaging about the new NIH DMSP. Librarians promoted awareness by presenting at top-level research meetings about the new standards and implications for researchers. The library collaborated with administrative groups on LibGuide content that focused on the new mandate and shared information about other data-related tools and services that researchers can use to adhere to the NIH standards. Within weeks of going live, dozens of deposits were received, and many researchers contacted the library to express their gratitude at having Dryad as an option for their data sharing requirements. Use of Dryad is steadily growing, and the RDS librarian continues to speak to research groups and departments about Dryad and other tools available to researchers.

So many deposits were made so early because of a collaborative marketing, communication, and education strategy across multiple departments and units. When early announcements were going out about the change to the NIH DMSP, the library made multiple announcements about Dryad and other UW data support, created a LibGuide, and directly communicated with liaisons about the upcoming changes. The liaisons then communicated with their departments, and other announcements were shared with data communities on campus as well as health sciences researchers and others who were likely to be funded by the NIH. Administratively, the library worked with the Office of Sponsored Programs as well as the Vice Provost’s Office to make sure the monthly research administrators list, the School of Medicine, and others knew about the upcoming changes.

This most recent example from UW is just one of many ways we have been able to leverage relationships to engage campus stakeholders on research data management and stewardship issues. It was an example of what can happen when a library collaborates with other areas of an institution, such as IT, research offices, and academic departments, to promote RDM tools, services, and education. Working together in these types of collaborations builds awareness of data services around campus, and provides opportunities for the library and those departments to learn from each other about priorities and needs.

There are multiple other areas for collaboration. The UW library has provided RDM-based curriculum and education modules for over a decade, equipping researchers with the basic skills necessary to effectively manage their research data. UW has provided recurring one-off workshops on persistent identifiers, data management plans and metadata, and local tools and services. For the past six years, we have also offered a four-day, online workshop about basic data management skills and local resources twice a year. This course always generates a wait list and has been very effective at increasing learners’ knowledge and awareness of RDM concepts and skills. This class educates researchers about data management skills, and increases awareness about data tools, services, and other supports around campus.

The course was developed in consultation with multiple library staff and was informed by several campus researcher surveys and interviews about research data needs done by the library. The course itself is available to be downloaded and modified by other institutions. The fact that it recurs on a regular basis helps with awareness, and subject librarians are key to pushing out information to departments when registration is open. We also keep a waitlist/notification list open at all times for researchers who discover the course via online search during a time when it is not offered.

Data publishing and sharing is an area that aligns well with library functions, but the essential needs researchers have for data storage are beyond our technical and staff capabilities, as they are in many academic libraries. This is an area where campuswide collaboration is key. We have departments that offer cloud expertise (UW-IT and eScience Institute), and a department that offers assistance for those who need to access or analyze highly sensitive data (UW’s Data Collaborative). Awareness of campus-based resources and expertise, and establishing and deepening relationships with those units, demystify data-related research for all.

Marketing is another key area for encouraging awareness and RDM-based conversations around an institution. Newsletters, blog posts, and social media, both from the library and from other departments, go a long way toward getting the word out about what is being offered and open a channel for researchers to ask for additional assistance if necessary.

For those who are looking to engage with other data librarians and curators about how to engage with researchers (as well as many other issues!), there are several options:

  • IASSIST (https://iassistdata.org/) and RDAP (https://rdapassociation.org/) are two groups with annual conferences that are highly focused on research data issues.

  • There is a Slack channel for a data curation/librarian group called Datacure, which requires an active login to Slack.

  • Arizona State University held an inaugural Data Science in the Libraries conference in 2022, and the group continues to meet several times a year to discuss data-science related issues as they pertain to libraries.Citation54

Case study: University of Texas Health Science Center San Antonio

About the author

Andrea N. Schorr is the Associate Director of Resource Management at the University of Texas Health Science Center Library in San Antonio, Texas. Andrea has a Master of Science in Information Science, with a concentration in Information Organization from the University of North Texas. In her role, Andrea is responsible for providing resources and services that support campus educational, clinical, and research initiatives. She is also an active member of her campus Data Management & Sharing Working Group, which is focused on developing local guiding principles and solutions for complying with the NIH Data Management & Sharing policy.

Background

The University of Texas Health Science Center at San Antonio, also called UT Health San Antonio, is a rapidly expanding health science center with five professional schools (medicine, nursing, dentistry, health professions, and graduate school of biomedical sciences) and a growing research enterprise, located in a city that is among the top three in Texas for NIH funding.

Research at UT Health San Antonio

Prior to the fall of 2022, the library’s involvement in research services primarily focused on targeted resource and research support. The research community at UT Health San Antonio is a focused and discreet group. Researchers are frequently hard at work with little time to waste and are less candid than students and faculty about their needs. The library has made great strides in reaching out to the research community over the years regarding services and support, but gaining the attention of the masses has been a challenge. When the NIH Public Access Policy was first announced, the library sought opportunities to work with the research community to build awareness and provide guidance. At that time, feedback was less enthusiastic but, overall, the effort was well received.

UT Health San Antonio response to NIH DMSP

In the fall of 2022, the library was invited to join a taskforce called the Data Management & Sharing Planning Group. The group comprised members of several campus units, including the Office of the VP for Research, Compliance, the Office of Sponsored Programs, Clinical Informatics, Core Labs, and IT Services. The group was led by the Associate VP for Research, and its objective was to develop a plan to ensure that our research community was aware of, and working in compliance with, the NIH DMSP that would go into effect on January 25, 2023.

Early conversations revealed mixed awareness of the DMSP and its upcoming requirements. The first priority for the planning group was to create awareness through a unified message that could be disseminated through a variety of channels. The library’s role at this early stage was to develop a web page that would communicate the upcoming implementation date and point readers to further guidance. The website, along with targeted messaging, was distributed in late 2022.

The second priority was to assess the current research submission process to understand where new DMSP protocols would need to be integrated. Once points of integration were identified, the planning group broke out into smaller working groups to develop new procedures and workflows for research submissions. The library was grouped with the VP for Research, Clinical Informatics, and IT Services. At this point, the library was not entirely sure what its role was going to be. It was clear early in this process that the Office of the VP for Research would lead this endeavor. A look at DMS models at other institutions showed that some libraries have taken the lead in DMS initiatives, while others are not involved at all. In our case, the library is integrated into the process as a partner.

As the new working groups began to pave the way for a revised submission process, plans for a new information web portal and an informatics service began to take shape. UT Health San Antonio is fortunate to have a very active clinical informatics team that is well-versed in the research grant submission process. Librarians have been asked to work closely with the informatics group on training opportunities for faculty. The library will also continue to participate in the development of a multifaceted web portal.

When conversations about data repositories began, roadblocks surfaced. There are existing repositories for many different areas of research, but not every subject area is covered by them. In such cases, researchers must turn to generalist repositories for data storage and sharing. While generalist repositories are available, many of them charge considerable fees for storage. An early assessment of the fees for generalist repository solutions revealed a cost burden that would be discouraging for researchers. This was a turning point. The library knew of a repository solution that was not only cost-effective but also included a fully functional framework that was approved by NIH. Through the Texas Digital Library (TDL), member libraries can participate in various services fully supported by TDL staff. One of these services is the hosting of a Data Management Repository (DMR) on a Dataverse platform, where an institution can manage its research data and make it sharable.

After a thorough review of the Texas Digital Library DMR and the cost of membership, the library, in collaboration with the Office of the VP for Research, decided to pursue a membership with TDL and use the hosted DMR service as a generalist repository solution. The library is currently in the early stages of implementation. The library has committed to implementing and managing the Data Management Repository and will also develop standards for submission and training for authors.

Lessons learned

Throughout this process, the library learned the importance of listening to and truly understanding the world that our researchers live in. Being invited to the planning group (and later the working group) has allowed incredible inroads into the UT Health San Antonio research community. This has given the research community (as well as other campus partners) a better understanding of what librarians can do and how they are valuable assets to the research enterprise. It is the library’s hope that this undertaking will lead to more ongoing partnerships with researchers and open doors for more collaboration. The DMSP is a major change for the research community and will no doubt lead to more Open Science initiatives, an area where many librarians are already actively involved.

An expert’s viewpoint

About the author

Peace Ossom-Williamson is Associate Director of the National Center for Data Services (NCDS) at NYU Langone Health, part of the Network of the National Library of Medicine. Prior to this, she was Director of Research Data Services at The University of Texas at Arlington Libraries, where she built a department providing services around data management, data literacy, and data sharing. She was also principal investigator for the free asynchronous Data Analytics Research Training (DART) Course, funded by the NNLM South Central Region (now Region 3). At NCDS, she and the rest of the team of data librarians use their expertise to develop capacity in the health information community to conduct data science and deliver data services by providing learning opportunities, events, programs, and resources in the U.S. She provides her perspective below, based on her experience and viewpoints.

Rationale for research data services in libraries

The growth in research data management (RDM) services has come out of the evolution of technology and information access and the resulting need to better organize, describe, store, and share data. As datasets have increasingly been recognized as a research and academic output and open science efforts continue to encourage transparency, reuse, and reproducibility, there is greater focus on making use of these outputs and avoiding spending institutional and public resources on building new datasets and collecting new data where reuse is possible.

Datasets involve more considerations and greater complexity than publications as they consist of drastically different data types and structures, with examples including MRI scans, GIS files, tabular data, and genomic data. Many files are proprietary, and most data can only be used by very specialized software. In addition, file sizes continue to grow exponentially. Moreover, datasets that undergo consistent change or growth, due to constant update and addition from sensors, transactions, and other forms of input, are also becoming more prevalent and have their own unique preservation challenges. So, the support needed across institutions and entities in terms of preservation, discovery, and reuse of data is correspondingly complex.

It is natural for libraries to play a central role in RDM services as libraries have always served to provide access to and the collection and preservation of academic outputs. This only becomes a more fundamental contribution to the library’s role in the research lifecycle as data become an increasingly central source leading to information and knowledge. In addition, in the “State of Open Data 2022”Citation55 report, the authors found that 72% of surveyed researchers’ responses indicated they would rely on internal institutional offices, such as the library, to make their data more open. Although the need has begun to shrink slightly as awareness and skills increase, researchers are still highly in need of internal support networks when tasked with data management; so, having library services to support them can be very beneficial.

Scope and variety of services

Research data management and curation is one of two tracks within the data services field. The other track of data librarianship falls under the scope of data science and data literacy, which is more focused on data preparation, analysis, and visualization. For both tracks, there is some level of focus on scientific reproducibility. For example, good data management will result in well-documented data, which makes it easier for others to reuse the data.

Libraries support RDM in part by teaching workshops and classes and giving presentations that build institutional knowledge of how to collect, document, and share data in ways that follow the FAIRCitation33 principles: findable, accessible, interoperable, and reusable. These sessions cover topics such as file naming conventions, file structure and organization, and creating metadata and data dictionaries. Libraries also have employees providing informational and consultative services around selecting and depositing data in repositories and advising on, or engaging in, steps for data curation. However, the scope of services provided and the depth of support vary by library, depending on how they complement the services already provided by other offices at the home institution. Libraries must take staff resources and scalability into account. For example, when I was the only data librarian at my library, we made sure to advertise what I offered in a way that kept incoming requests manageable. As I was able to hire additional data librarians and graduate assistants, the team was able to vastly increase the depth and array of services provided. This expansion included more time working on grant-funded research teams, adding infrastructure such as a staffed computing lab and a data repository, and influencing institutional policy on data management and related topics.
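To make two of the workshop topics named above more concrete (file naming conventions and data dictionaries), the following is a minimal Python sketch. It is an illustration only: the naming pattern, the function names, and the sample columns are hypothetical and are not taken from any particular library's training materials.

```python
import re
from datetime import date

# Hypothetical convention (an assumption for illustration):
#   project_description_YYYYMMDD_vNN.ext
FILENAME_PATTERN = re.compile(
    r"^[a-z0-9]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+$"
)


def build_filename(project, description, when, version, ext):
    """Assemble a file name that follows the hypothetical convention above."""
    # Lowercase the description and replace runs of other characters with "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.lower()}_{slug}_{when:%Y%m%d}_v{version:02d}.{ext}"


def follows_convention(filename):
    """Return True if the file name matches the hypothetical pattern."""
    return FILENAME_PATTERN.match(filename) is not None


def missing_from_dictionary(columns, data_dictionary):
    """Return dataset columns that have no entry in the data dictionary."""
    return [c for c in columns if c not in data_dictionary]


# A tiny, made-up data dictionary: one documented entry per variable.
data_dictionary = {
    "participant_id": {
        "type": "string",
        "description": "De-identified participant code",
    },
    "visit_date": {
        "type": "date",
        "description": "Date of clinic visit (YYYY-MM-DD)",
    },
}

name = build_filename("TRIAL1", "Baseline Survey", date(2023, 1, 25), 2, "csv")
undocumented = missing_from_dictionary(
    ["participant_id", "visit_date", "bmi"], data_dictionary
)
```

A consultation might use checks like these to flag files that drift from an agreed naming scheme, or variables that still need a documented definition before a dataset is deposited.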

For libraries in health sciences institutions, there is a greater need for expertise in the ethics and practice of accessing protected data, privacy and deidentification, secure and HIPAA-compliant storage, and data sharing protections in consideration of human subject data. From my experience, the more mature your RDM services become, the more it will be necessary to collaborate with institutional partners, such as information technology, information security, legal, and grants offices as well as scientific cores and the institutional review board(s).

The major hurdle for libraries trying to solidify RDM services is making those connections across campus and getting buy-in from faculty and researchers. Training graduate students is also important, as they tend to be the most receptive to RDM services, and some become professors and researchers themselves. Such training therefore informs future faculty, who often become library champions wherever they go. In my experience, it also takes an ongoing effort to connect with researchers in ways that are important to them, including providing workshops of interest and connections with data resources. Examples include coordinating presentations from a Federal Reserve Statistical Research Data Center or publicizing guidance for working with high-performance computing systems. Sometimes this can be as simple as meeting with IT employees, learning the relevant directions, and publishing or otherwise sharing them.

The future of research data management services

There continues to be a push for open data practices globally, and more funders and publishers are starting to require that data be shared openly. For example, in the United States, the National Institutes of Health established a policy requiring data management and sharing plans in all funding proposals for research that would generate scientific data; the policy took effect in January 2023. The policy encourages sharing as openly as possible no later than publication or the end of the award. In addition, the White House Office of Science and Technology Policy has been developing policies toward data availability and data reuse. Clearly, open science and open data are only going to become more prominent with time. Therefore, researchers’ need for support from librarians skilled at addressing these areas will likely continue to increase in the coming years.

Additional information

Funding

This project has been funded with Federal funds from the National Library of Medicine (NLM), National Institutes of Health (NIH), under cooperative agreement number UG4LM01234 with the University of Massachusetts Chan Medical School, Lamar Soutter Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Notes on contributors

Julie Goldman

Julie Goldman, MLIS, is the Research Data Services Librarian at the Countway Library of Medicine, Harvard Medical School, Boston, Massachusetts, USA.

Jennifer Muilenburg

Jennifer Muilenburg, MLIS, is the Research Data Services, Curriculum, and Communications Librarian at the University Libraries, University of Washington, Seattle, Washington, USA.

Andrea N. Schorr

Andrea N. Schorr, MSIS, is the Associate Director of Resource Management at the Dolph Briscoe, Jr. Library, University of Texas Health Science Center San Antonio, San Antonio, Texas, USA.

Peace Ossom-Williamson

Peace Ossom-Williamson, MLS, AHIP, is the Associate Director at the NNLM National Center for Data Services, Grossman School of Medicine, New York University, New York, New York, USA.

C. Jeff Uribe-Lacy

C. Jeff Uribe-Lacy, MLIS, MA, is the Liaison Librarian to the Schools of Dentistry, Public Health, and Graduate Studies at the Dolph Briscoe, Jr. Library, University of Texas Health Science Center San Antonio, San Antonio, Texas, USA.

References
