Learning-based Focused Web Crawler

Naresh Kumar & Dhruv Aggarwal
Pages 2037-2045 | Published online: 22 Feb 2021
 

Abstract

As the number of pages published every day grows enormously, there is a consistent need for an efficient crawling mechanism that yields appropriate search results for everyday queries. Every day, people encounter inappropriate or incorrect answers among search results, so there is a strong need for enhanced methods that give the user precise search results within an acceptable time frame. In this project, we present an effective approach to building a crawler that considers factors that have not been considered before. The main focus of the project is the design of an intelligent focused crawler that learns by itself to improve the ranking of URLs. Moreover, many existing crawlers first visit the seed URL, read the pages, and download them for subsequent indexing by search engines. The problem with this approach is that a website or page that is not updated regularly is still crawled, even though it was already downloaded on a previous visit. This wastes bandwidth, network resources, time, and storage. We therefore aim to minimize these problems by building an effective system with a revisit policy for web crawlers. During the first crawl, websites are divided into three categories by update frequency (frequently, frequent, and static), and the crawler then decides when it should crawl each website again.
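The revisit policy outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category labels, the thresholds, the revisit intervals, and the idea of categorizing a site by the age of its last modification at the first crawl are all assumptions made for illustration, and every name in the code is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class UpdateCategory(Enum):
    # Hypothetical labels for the three categories mentioned in the abstract.
    VERY_FREQUENT = "very_frequent"
    FREQUENT = "frequent"
    STATIC = "static"


# Assumed revisit intervals (illustrative values, not taken from the article).
REVISIT_INTERVAL = {
    UpdateCategory.VERY_FREQUENT: timedelta(hours=6),
    UpdateCategory.FREQUENT: timedelta(days=3),
    UpdateCategory.STATIC: timedelta(days=30),
}


def categorize(age_of_last_change: timedelta) -> UpdateCategory:
    """Pick a category from how long ago the site last changed (assumed thresholds)."""
    if age_of_last_change <= timedelta(days=1):
        return UpdateCategory.VERY_FREQUENT
    if age_of_last_change <= timedelta(days=14):
        return UpdateCategory.FREQUENT
    return UpdateCategory.STATIC


@dataclass
class SiteRecord:
    url: str
    category: UpdateCategory
    next_crawl: datetime


def schedule_after_first_crawl(
    url: str, last_modified: datetime, crawled_at: datetime
) -> SiteRecord:
    """Categorize a site at its first crawl and set the time of its next visit."""
    category = categorize(crawled_at - last_modified)
    return SiteRecord(url, category, crawled_at + REVISIT_INTERVAL[category])


def due_for_crawl(records: list[SiteRecord], now: datetime) -> list[str]:
    """Return only the URLs whose scheduled revisit time has arrived."""
    return [record.url for record in records if record.next_crawl <= now]


if __name__ == "__main__":
    first_crawl = datetime(2021, 2, 22, 12, 0)
    records = [
        schedule_after_first_crawl(
            "https://news.example", first_crawl - timedelta(hours=2), first_crawl
        ),
        schedule_after_first_crawl(
            "https://archive.example", first_crawl - timedelta(days=400), first_crawl
        ),
    ]
    # One day later, only the frequently changing site is revisited.
    print(due_for_crawl(records, first_crawl + timedelta(days=1)))
```

Under these assumptions, a site placed in the static category is skipped until its longer interval elapses, which is where the savings in bandwidth, time, and storage would come from.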

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Notes on contributors

Naresh Kumar

Naresh Kumar holds a PhD from Kurukshetra University, Kurukshetra, and an MTech degree in computer science and engineering from YMCA University of Science and Technology, Faridabad. He is currently an associate professor at Maharaja Surajmal Institute of Technology, New Delhi. His research interests include web crawlers, search engines, and meta search engines. He has published over 41 research papers.

Dhruv Aggarwal

Dhruv Aggarwal is currently pursuing a PGDM at the Institute of Management Technology, Ghaziabad. He holds a BTech degree in computer science and engineering from Maharaja Surajmal Institute of Technology, New Delhi.
