ABSTRACT
Keyword search is the most common tool offered by patent offices; however, their websites do not provide free software for more advanced research. This paper therefore proposes a data-mining framework for patent documents that combines natural language processing techniques with data analysis algorithms. The system has two main goals: technological prospecting analysis and the evaluation of similarity among patents based on their titles and abstracts. For the numerical experiments, we used the US Patent and Trademark Office database, with over a million documents. An S-curve analysis of patents on TFT-LCD, Flash Memory and PDA from 2010 to 2018 indicated that the last two technologies are in decline. A word cloud made it possible to see the evolution of the phone from 2010 to 2015. To evaluate the degree of similarity among patents, we investigated Latent Semantic Analysis (LSA), Word2vec and Word Mover’s Distance (WMD) in three case studies. These methods were also compared with the classical Jaccard index. Numerical results show that LSA and WMD produced similar patent indications, whereas the Jaccard index produced indications that differed from those of the other three methods.
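As an illustration of the classical baseline mentioned above, the Jaccard index between two patent texts can be sketched as follows. This is a minimal sketch under stated assumptions: the tokenizer, the function names and the two sample titles are hypothetical and are not taken from the paper's implementation.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens.

    Assumption: a simple regex tokenizer with no stop-word removal or
    stemming; a real pipeline would likely preprocess more carefully.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_index(text_a, text_b):
    """Return |A ∩ B| / |A ∪ B| for the token sets A and B of the two texts."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty texts
    return len(a & b) / len(a | b)


# Hypothetical patent titles, used only for illustration.
title_1 = "Flash memory device with error correction"
title_2 = "Error correction method for a flash memory controller"

# The titles share 4 of 10 distinct tokens.
print(jaccard_index(title_1, title_2))  # 0.4
```

Because the Jaccard index only counts exact token overlap, it ignores semantic relatedness between different words, which may be one reason its indications differ from those of embedding-based or latent-space methods.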
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Google and USPTO signed an agreement in 2010 to make USPTO bulk data available for free until 2015.
Additional information
Notes on contributors
João Marcos de Rezende
João Marcos de Rezende is a doctoral student in Computer Science (UFES). He holds a master's degree in Control and Automation Engineering, with an emphasis on Intelligent Systems (IFES, 2019), specializations in Project Management (UNINTER, 2015) and Web Development (FAESA, 2012), and an undergraduate degree in Computer Science (FAESA, 2010). He is currently an IT Specialist at the SENAI Institute of Technology (IST) in Operational Efficiency.
Izabella Martins da Costa Rodrigues
Izabella Martins da Costa Rodrigues holds an undergraduate degree in Biological Sciences, a master's degree in Plant Science (UFV, 2008) and a Ph.D. in Plant Biology (UFMG, 2013), part of which she completed at the Natural History Museum in London. She is a postdoctoral researcher in Applied Computing at the Federal Institute of Espírito Santo, Serra campus.
Leandro Colombi Resendo
Leandro Colombi Resendo is a Professor at the Federal Institute of Espírito Santo (IFES), Serra. He holds an undergraduate degree in Mathematics (UFES, 2002), a master's degree in Computer Science (UFES, 2004) and a Ph.D. in Electrical Engineering (UFES, 2008). He has experience in the field of Mathematics Education and has published work on the linear allocation problem and the quadratic allocation problem. His current research concerns WDM optical networks, the traffic grooming problem and integer linear programming.
Karin Satie Komati
Karin S. Komati is a Professor at the Federal Institute of Espírito Santo (IFES), Serra. She holds undergraduate degrees in Computer Science (UFES, 1995) and Electrical Engineering (UFES, 1997), a master's degree in Informatics (UFES, 2002) and a Ph.D. in Electrical Engineering (UFES, 2011). She has taught in higher education since 1998 at several private and public institutions. She is a senior researcher at the Nu[Tec]2 laboratory. Her research interests include digital image processing, pattern recognition and data science.