Abstract
This article introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a public-use patent data exploration platform that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled—key characteristics that allow us to paint the first representative picture of PatentsView’s disambiguation performance. The results are used to inform PatentsView’s users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
Authors’ Contributions
Olivier Binette led the evaluation project and wrote the majority of the manuscript. Sokhna A. York and Emma Hickerson carried out the data collection by manually reviewing inventor clusters. Youngsoo Baek developed the bias adjustments and uncertainty quantification for the ratio estimators. Sarvo Madhavan served as a technical advisor and contributed code. Christina Jones served as an advisor and project manager. All authors provided input on the manuscript.
Data Availability Statement
All data and code used for this article are available as part of the PatentsView-Evaluation Python package (version 1.0.1) at https://github.com/PatentsView/PatentsView-Evaluation/releases/tag/1.0.1.
Disclosure Statement
The authors report there are no competing interests to declare.