ABSTRACT
Climate change is intensifying natural hazards, putting critical infrastructure systems at risk. The effects of climate change on critical infrastructure can be significant, and communities need to consider these risks when planning and designing infrastructure systems for the future. To that end, natural language processing (NLP) is a promising approach for analyzing large volumes of scientific literature on climate change and infrastructure. Training a supervised NLP model, however, requires labeling a significant subset of the corpus into categories based on user-defined criteria, which is a time-consuming process. To expedite this process, we developed a weak supervision-based approach that leverages semantic similarity between categories and documents to generate category labels for the domain-specific corpus. Whereas labeling by subject-matter experts is a months-long process, our combination of weak supervision and supervised learning assigns category labels to the whole corpus in 13 hours.
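The idea of generating weak labels from category-document similarity can be illustrated with a minimal sketch. It is not the implementation used in this work: it substitutes a simple bag-of-words cosine similarity for whatever semantic representation the authors used, and the category names, category descriptions, threshold value, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weak_label(document, categories, threshold=0.1):
    """Return (label, score) for the most similar category.

    The label is None when the best score falls below the threshold,
    so low-confidence documents are left unlabeled.
    """
    doc_vec = Counter(tokenize(document))
    scores = {
        name: cosine(doc_vec, Counter(tokenize(description)))
        for name, description in categories.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Hypothetical categories, each described by a few keywords (assumption).
CATEGORIES = {
    "flooding": "flood flooding storm surge inundation rainfall",
    "extreme_heat": "heat heatwave temperature thermal stress",
}

label, score = weak_label(
    "Storm surge and coastal flooding damaged the substation.", CATEGORIES
)
print(label, round(score, 3))
```

In a pipeline of this kind, the weak labels would then serve as training data for a supervised classifier that labels the remainder of the corpus; the threshold controls how many uncertain documents are excluded from that training set.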
Acknowledgments
This material is based in part upon work supported by the Laboratory Directed Research and Development (LDRD), Argonne National Laboratory. This research used resources from the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility under contract DE-AC02-06CH11357.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and code availability
The data and code used to generate these results are available in a public GitHub repository: https://github.com/allenai/s2orc
Government license
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.
Additional information
Funding
Notes on contributors
Tanwi Mallick
Tanwi Mallick is an assistant computer scientist in the Mathematics and Computer Science Division at Argonne National Laboratory, where she previously held a postdoctoral appointment. Her research is primarily focused on spatiotemporal graph neural networks, uncertainty quantification, trustworthy scientific machine learning (SciML), foundation models, natural language processing, and high-performance computing. She also has experience working across various scientific domains, such as transportation systems, climate science, and HPC network analysis. Before her tenure at Argonne, she was a senior data scientist at General Electric. Tanwi obtained her Ph.D. in computer science from the Indian Institute of Technology, Kharagpur, India.
Joshua David Bergerson
Dr. Joshua David Bergerson is a Principal Infrastructure Analyst in the Decision and Infrastructure Sciences (DIS) Division at Argonne National Laboratory. His research focuses on the resilience of critical infrastructure systems supporting communities throughout the Nation. His research is primarily funded by the Cybersecurity and Infrastructure Security Agency’s Regional Resiliency Assessment Program, where Dr. Bergerson leads data collection and analysis to identify resilience gaps and options for closing them in collaboration with infrastructure owners and operators. Dr. Bergerson also provides emergency response and recovery support to the Federal Emergency Management Agency.
Duane R. Verner
Duane Rudolph Verner is the Director of Global Energy and Climate Security at Argonne National Laboratory. He oversees staffing and technical assignments related to global energy security, climate change adaptation, critical infrastructure resilience, system modeling, and artificial intelligence. Mr. Verner leads Argonne’s support to the United States Department of Energy Office of International Affairs and the Organization for Security and Cooperation in Europe (OSCE), developing and implementing projects focused on protecting critical energy networks and energy-related aspects of disaster risk reduction in Eastern Europe and Central Asia. In 2021, Duane served as a U.S. Embassy Science Fellow with the goal of promoting collaboration with the European Union to develop a climate-resilient economy in the Western Balkans. He regularly contributes to the international resilience and security research community through presentations, publications, and trans-Atlantic collaboration, including serving as a Civil Expert within NATO’s Energy Planning Group, participating in the OECD High Level Risk Forum, and contributing to the European Centre of Excellence for Countering Hybrid Threats (Hybrid COE). He has provided project management and methodology development support to the United States Department of Homeland Security Regional Resiliency Assessment Program since its inception in 2009.
John K Hutchison
John Hutchison works in operations research in ANL’s Decision and Infrastructure Sciences (DIS) Division. He supports DIS sponsors such as the Department of Homeland Security’s Federal Protective Service (FPS) and Cybersecurity and Infrastructure Security Agency (CISA) with his mathematical and statistical expertise. His primary research interests lie in natural language processing and decision science.
Leslie-Anne Levy
Leslie-Anne Levy currently serves as the Deputy Director of the Decision and Infrastructure Sciences Division in Argonne National Laboratory’s Nuclear Technologies and National Security Directorate. Ms. Levy has been involved in research, policy development, and program implementation for global, homeland, and national security portfolios in the public and private sectors for more than 25 years. At Argonne, she manages an applied research portfolio largely focused on assessing the security and resilience of critical infrastructure, as well as analyzing and managing risk to infrastructure systems. Prior to joining Argonne, Ms. Levy served as a Managing Director for the safety and security practice at CNA, where she led several emergency management initiatives, including national preparedness assessments and homeland security risk management training. She also served in the U.S. Department of Homeland Security (DHS), managing the program development team for state and local preparedness grant programs and leading the development of technical assistance services for state and local homeland security personnel. Before DHS, Ms. Levy held analyst positions in the private sector and a security policy research organization.
Prasanna Balaprakash
Prasanna Balaprakash is currently the Director of the AI Initiative and a Distinguished Research and Development Scientist with Oak Ridge National Laboratory. His research interests include artificial intelligence, machine learning, optimization, and high-performance computing, with a current focus on the development of scalable and data-efficient machine learning methods for scientific applications. He was a recipient of the U.S. Department of Energy 2018 Early Career Award.