
Learning-Based Focused Web Crawler


Abstract

As the number of pages published every day grows enormously, there is a constant need for an efficient crawler design that yields appropriate search results for everyday queries. Users regularly encounter inappropriate or incorrect answers among search results, so enhanced methods are needed to deliver precise results within an acceptable time frame. In this project, we present an approach to building a crawler that takes into account factors that have not been considered before. The main focus is an intelligent focused crawler that learns by itself to improve the ranking of URLs. Many existing crawlers first visit the seed URL, read its pages, and download them for indexing by the search engine. The problem is that a website or page that is not updated regularly is still crawled again even though it was already downloaded on a previous visit, which wastes bandwidth, network capacity, time, and storage. We aim to reduce this waste with a revisit policy for web crawlers: during the first crawl, websites are divided into three categories (highly frequent, frequent, and static) according to how often they are updated, and the crawler then decides when to crawl each website again.
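The abstract does not state how websites are assigned to the three categories or how long the crawler waits before revisiting each one. The sketch below is one possible reading, not the authors' method: it assumes a site's category is derived from an observed change rate (the fraction of consecutive fetched snapshots whose SHA-256 content hashes differ), and that each category maps to a fixed revisit interval. The thresholds (`HIGH_CHANGE_RATE`, `LOW_CHANGE_RATE`), the intervals in `REVISIT_HOURS`, and the names `change_rate`, `categorize`, and `RevisitScheduler` are all hypothetical.

```python
import hashlib
import heapq
from dataclasses import dataclass, field

# Hypothetical thresholds on the observed change rate (0.0 to 1.0).
HIGH_CHANGE_RATE = 0.5
LOW_CHANGE_RATE = 0.05

# Hypothetical revisit interval, in hours, for each update category.
REVISIT_HOURS = {"highly_frequent": 6, "frequent": 48, "static": 24 * 30}


def change_rate(snapshots: list[str]) -> float:
    """Fraction of consecutive snapshots whose SHA-256 digests differ."""
    digests = [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in snapshots]
    if len(digests) < 2:
        return 0.0  # assumption: no observed change yet, so treat the site as static
    changes = sum(a != b for a, b in zip(digests, digests[1:]))
    return changes / (len(digests) - 1)


def categorize(rate: float) -> str:
    """Map an observed change rate to one of the three update categories."""
    if rate >= HIGH_CHANGE_RATE:
        return "highly_frequent"
    if rate > LOW_CHANGE_RATE:
        return "frequent"
    return "static"


@dataclass(order=True)
class ScheduledVisit:
    due_hour: float                   # heap ordering uses only the due time
    url: str = field(compare=False)


class RevisitScheduler:
    """Min-heap of URLs ordered by the time their next crawl is due."""

    def __init__(self) -> None:
        self._queue: list[ScheduledVisit] = []

    def record_crawl(self, url: str, now_hour: float, rate: float) -> str:
        """Categorize a just-crawled site and schedule its next visit."""
        category = categorize(rate)
        due = now_hour + REVISIT_HOURS[category]
        heapq.heappush(self._queue, ScheduledVisit(due, url))
        return category

    def due(self, now_hour: float) -> list[str]:
        """Pop and return every URL whose revisit time has arrived."""
        ready = []
        while self._queue and self._queue[0].due_hour <= now_hour:
            ready.append(heapq.heappop(self._queue).url)
        return ready
```

For example, after `s.record_crawl("https://news.example.com", 0, 1.0)` and `s.record_crawl("https://docs.example.com", 0, 0.0)`, calling `s.due(10)` returns only the news site, while the static docs site is not returned until hour 720. A priority queue keyed on due time lets the crawler skip rarely changing sites entirely until their interval expires, which is the bandwidth and storage saving the abstract describes; calling `record_crawl` again after each visit re-categorizes the site as its behavior changes.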

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Notes on contributors

Naresh Kumar

Naresh Kumar holds a PhD from Kurukshetra University, Kurukshetra, and an MTech in Computer Science and Engineering from YMCA University of Science and Technology, Faridabad. He is currently an associate professor at Maharaja Surajmal Institute of Technology, New Delhi. His research interests include web crawlers, search engines, and meta search engines. He has published over 41 research papers.

Dhruv Aggarwal

Dhruv Aggarwal is currently pursuing a PGDM at the Institute of Management Technology, Ghaziabad. He holds a BTech in Computer Science and Engineering from Maharaja Surajmal Institute of Technology, New Delhi.
