Abstract
As the number of pages published every day grows enormously, there is a persistent need for an efficient crawler mechanism that yields appropriate search results for everyday queries. Users regularly encounter inappropriate or incorrect answers among search results, so there is a strong need for enhanced methods that deliver precise results within an acceptable time frame. In this project, we present an effective approach to building a crawler that considers factors that have not been considered before. The main focus of the project is the design of an intelligent, self-learning focused crawler that improves the ranking of URLs. Moreover, many existing crawlers start at the seed URL, read the pages, and download them for indexing by the search engine. The problem with this approach is that a website or page that is not updated regularly is still crawled again, even though it was already downloaded on a previous visit. This wastes bandwidth, network resources, time, and storage. We aim to minimize these problems by building an effective system with a revisit policy for web crawlers. In the first crawl, websites are divided into three categories (frequently, frequent, and static), and the crawler then decides when to crawl each website again.
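To make the category-based revisit idea concrete, the following is a minimal sketch and not the authors' implementation. The category names mirror the three named in the abstract; the change-ratio thresholds, the revisit intervals, and the function names `categorize` and `next_visit` are all assumptions introduced for illustration, since the abstract gives no values.

```python
from enum import Enum


class Category(Enum):
    # Names follow the three categories listed in the abstract.
    FREQUENTLY = "frequently"
    FREQUENT = "frequent"
    STATIC = "static"


# Hypothetical revisit intervals in hours (assumed, not from the paper).
REVISIT_HOURS = {
    Category.FREQUENTLY: 1,
    Category.FREQUENT: 24,
    Category.STATIC: 24 * 30,
}


def categorize(changes_seen: int, checks_made: int) -> Category:
    """Assign a category from the fraction of checks in which the page changed.

    The 0.5 threshold is an assumption chosen only for this sketch.
    """
    if checks_made <= 0:
        raise ValueError("checks_made must be positive")
    ratio = changes_seen / checks_made
    if ratio >= 0.5:
        return Category.FREQUENTLY
    if ratio > 0.0:
        return Category.FREQUENT
    return Category.STATIC


def next_visit(last_visit_hour: float, category: Category) -> float:
    """Return the hour at which the crawler should revisit the site."""
    return last_visit_hour + REVISIT_HOURS[category]
```

Under these assumptions, a page that never changed during the first crawl is labelled static and is not fetched again until its long interval expires, which is where the savings in bandwidth and storage would come from.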
Correction Statement
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Additional information
Notes on contributors
Naresh Kumar
Naresh Kumar holds a PhD from Kurukshetra University, Kurukshetra, and an MTech degree in computer science and engineering from YMCA University of Science and Technology, Faridabad. He is currently an associate professor at Maharaja Surajmal Institute of Technology, New Delhi. His research interests include web crawlers, search engines, and meta search engines. He has published over 41 research papers.
Dhruv Aggarwal
Dhruv Aggarwal is currently pursuing a PGDM at the Institute of Management Technology, Ghaziabad. He holds a BTech degree in computer science and engineering from Maharaja Surajmal Institute of Technology, New Delhi.