
Combining natural language processing techniques and algorithms LSA, word2vec and WMD for technological forecasting and similarity analysis in patent documents

João Marcos de Rezende, Izabella Martins da Costa Rodrigues, Leandro Colombi Resendo & Karin Satie Komati
Pages 1695-1716 | Received 16 Sep 2021, Accepted 29 Jul 2022, Published online: 09 Aug 2022
 

ABSTRACT

Keyword search is the most common tool offered by patent offices; however, their websites do not provide free software for more advanced analyses. This paper therefore proposes a data-mining framework for patent documents that links natural language processing techniques with data analysis algorithms. The system has two main goals: technological prospection analysis and the evaluation of similarity between patents based on their titles and abstracts. For the numerical experiments, we used the US Patent and Trademark Office (USPTO) database, which contains over a million documents. An S-curve analysis of patents on TFT-LCD, Flash Memory and PDA from 2010 to 2018 showed that the last two technologies are in decline. A word cloud made it possible to follow the evolution of phones from 2010 to 2015. To evaluate the degree of similarity between patents, we investigated Latent Semantic Analysis (LSA), Word2vec and Word Mover’s Distance (WMD) in three different case studies, and compared these methods with the classical Jaccard index. The numerical results show that LSA and WMD pointed to similar patents, whereas the Jaccard index pointed to patents different from those indicated by the other three methods.
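As a rough illustration of why a set-overlap measure and a latent-semantic measure can rank patents differently, the sketch below compares the Jaccard index with an LSA-style cosine similarity on a few toy patent titles. It is not the authors' pipeline: the titles are invented, the term weighting is raw counts (real LSA setups often use TF-IDF), the tokenizer is deliberately naive, and the number of latent dimensions `k` is an arbitrary assumption. Word2vec and WMD are left out because they require pretrained word embeddings.

```python
import re
import numpy as np

def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| over the token sets of two texts."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def lsa_similarity(docs, k=2):
    """Cosine similarity between documents in a rank-k LSA space.

    Builds a term-document count matrix X, takes its SVD X = U S V^T and
    represents each document by its row of (S_k V_k^T)^T."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokens):
        for t in doc:
            X[index[t], j] += 1
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    D = (np.diag(s[:k]) @ Vt[:k]).T          # one row per document
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # avoid dividing by zero
    D = D / norms
    return D @ D.T                           # pairwise cosine similarities

# Hypothetical toy titles (not USPTO data).
titles = [
    "thin film transistor liquid crystal display panel",
    "liquid crystal display with thin film transistor array",
    "flash memory device with charge trap cells",
    "nonvolatile flash memory programming method",
]
sim = lsa_similarity(titles, k=2)
for i in range(1, len(titles)):
    print(i, round(jaccard(titles[0], titles[i]), 2), round(sim[0, i], 2))
```

The Jaccard index is exactly zero for any pair of titles with no word in common, whereas the LSA score depends on co-occurrence patterns across the whole collection, so two titles can score as related through shared context even without sharing a word. This is one plausible reason the two kinds of measure can disagree; the toy data says nothing about the paper's USPTO results.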

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Google and USPTO signed an agreement in 2010 to make USPTO bulk data available for free until 2015.

Additional information

Funding

This work has been supported by the following Brazilian research agencies: FAPES and CAPES, through the PDPG (Graduate Development Program – Strategic Partnerships in the States) (PROCESS: 2021-2S6CD, n° FAPES 132/2021). The second author is funded by grant 88887.568442/2020-00 (CAPES/FAPES – Postdoctoral), and the last author is funded by grant 308432/2020-7 (Bolsa de Produtividade DT) from the CNPq research agency and by research fee 293/2021 from the FAPES research agency. All authors are grateful to IFES for its support.

Notes on contributors

João Marcos de Rezende

João Marcos de Rezende is a doctoral student in Computer Science at UFES. He holds a Master's degree in Control and Automation Engineering with an emphasis on Intelligent Systems (IFES, 2019), specializations in Project Management (UNINTER, 2015) and Web Development (FAESA, 2012), and an undergraduate degree in Computer Science (FAESA, 2010). He is currently an IT Specialist at the SENAI Institute of Technology (IST) in Operational Efficiency.

Izabella Martins da Costa Rodrigues

Izabella Martins da Costa Rodrigues holds an undergraduate degree in Biological Sciences, a Master's degree in Plant Science (UFV, 2008) and a Ph.D. in Plant Biology (UFMG, 2013), part of which she completed at the Natural History Museum in London. She is a postdoctoral researcher in Applied Computing at the Federal Institute of Espírito Santo, Campus Serra.

Leandro Colombi Resendo

Leandro Colombi Resendo is a Professor at the Federal Institute of Espírito Santo (IFES), Serra. He holds an undergraduate degree in Mathematics (UFES, 2002), a Master's degree in Computer Science (UFES, 2004) and a Ph.D. in Electrical Engineering (UFES, 2008). He has experience in Mathematics Education and has published work on the linear allocation problem and the quadratic allocation problem. His current research focuses on WDM optical networks, the traffic grooming problem and integer linear programming.

Karin Satie Komati

Karin S. Komati is a Professor at the Federal Institute of Espírito Santo (IFES), Serra. She holds undergraduate degrees in Computer Science (UFES, 1995) and Electrical Engineering (UFES, 1997), a Master's degree in Informatics (UFES, 2002) and a Ph.D. in Electrical Engineering (UFES, 2011). She has taught in higher education since 1998 at several private and public institutions. She is a senior researcher at the Nu[Tec]2 laboratory. Her research interests include digital image processing, pattern recognition and data science.
