Learning-based Focused Web Crawler

References

  • H. Bullot, S. K. Gupta, and M. K. Mohania, “A data-mining approach for optimizing performance of an incremental crawler,” in Proceedings IEEE/WIC International Conference on Web Intelligence, Halifax, NS, Canada, 2003, pp. 610–615.
  • M. Kc, M. Hagenbuchner, and A. C. Tsoi, “A scalable lightweight distributed crawler for crawling with limited resources,” in 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Sydney, NSW, 2008, pp. 663–666.
  • D. Mukhopadhyay, A. Biswas, and S. Sinha, “A new approach to design domain specific ontology based Web crawler,” in 10th International Conference on Information Technology (ICIT), Orissa, 2007, pp. 289–291.
  • M. Wu, and J. Lai, “The research and implementation of parallel Web crawler in cluster,” in 2010 International Conference on Computational and Information Sciences, Chengdu, 2010, pp. 704–708.
  • S. Anbukodi, and K. M. Manickam, “Reducing web crawler overhead using mobile crawler,” in 2011 International Conference on Emerging Trends in Electrical and Computer Technology, Nagercoil, 2011, pp. 926–932.
  • A. Gupta, and P. Anand, “Focused web crawlers and its approaches,” in International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), Noida, 2015, pp. 619–622.
  • G. H. Agre, and N. V. Mahajan, “Keyword focused web crawler,” in 2nd International Conference on Electronics and Communication Systems, Coimbatore, 2015, pp. 1089–1092.
  • T. Suebchua, A. Rungsawang, and H. Yamana, “Adaptive focused website segment crawler,” in 19th International Conference on Network-Based Information Systems (NBiS), Ostrava, 2016, pp. 181–187.
  • R. Navinkumar, and S. Sureshkumar, “Two-stage smart crawler for efficiently harvesting deep-web interfaces,” International Research Journal of Engineering and Technology (IRJET), Vol. 3, pp. 111–114, 2016.
  • Y. Patil, and S. Patil, “Implementation of enhanced web crawler for deep-web interfaces,” International Research Journal of Engineering and Technology (IRJET), Vol. 3, pp. 2088–2092, 2016.
  • G. V. Jaybhaye, and A. V. Deorankar, “Machine learning approach for self-adaptive semantic focused crawler based data mining services,” International Journal of Innovative Research in Computer and Communication Engineering, Vol. 4, pp. 2507–2512, 2016.
  • P. S. Sekhon, and S. Aggarwal, “Focused web crawling using neural network, decision tree induction and naïve Bayes classifier,” IJCST, Vol. 5, pp. 155–159, 2014.
  • S. Gurav, J. Gilani, V. Gore, and S. Jadhao, “Web content extraction using machine learning,” International Research Journal of Engineering and Technology (IRJET), Vol. 5, pp. 4517–4518, 2018.
  • D. Taylan, M. Poyraz, S. Akyokus, and M. C. Ganiz, “Intelligent focused crawler: learning which links to crawl,” in International Symposium on Innovations in Intelligent Systems and Applications (INISTA), Istanbul, 2011, pp. 504–508.
  • N. Mtetwa, M. Yousefi, and V. Reddy, “Feature selection for an SVM based webpage classifier,” in International Conference on Soft Computing & Machine Intelligence (ISCMI), Port Louis, 2017, pp. 85–88.
  • A. Darshakar, “Crawler intelligence with machine learning and data mining integration,” in International Conference on Pervasive Computing (ICPC), Pune, 2015, pp. 1–6.
  • L. Jiang, Z. Wu, Q. Zheng, and J. Liu, “Learning deep Web crawling with diverse features,” in IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Milan, Italy, 2009, pp. 572–575.
  • M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, “Focused crawling using context graphs,” in VLDB ’00 Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, 2000, pp. 527–534.
  • M. Chau, and H. Chen, “A machine learning approach to web page filtering using content and structure analysis,” Decis. Support. Syst., Vol. 44, pp. 482–494, 2008.
  • H. T. Y. Achsan, and W. C. Wibowo, “A fast distributed focused-Web crawling,” in DAAAM International Symposium on Intelligent Manufacturing and Automation, Zadar, Croatia, 2013, pp. 492–499.
  • M. Kumar, A. Bindal, R. Gautam, and R. Bhatia, “Keyword query based focused web crawler,” in International Conference on Smart Computing and Communications, ICSCC, Kurukshetra, India, 2018, pp. 584–590.
  • M. Kumar, and R. Vig, “Learnable focused meta crawling through Web,” in International Conference on Communication, Computing & Security, Odisha, India, 2012, pp. 606–611.
  • S. R. Mani Sekhar, G. M. Siddesh, S. S. Manvi, and K. G. Srinivasa, “Optimized focused Web crawler with natural language processing based relevance measure in bioinformatics Web sources,” Cybern. Inf. Technol., Vol. 19, pp. 146–158, 2019.
  • S. Han, M. Bendersky, P. Gajda, S. Novikov, M. Najork, B. Brodowsky, and A. Popescul, “Adversarial bandits policy for crawling commercial Web content,” in International World Wide Web Conference Committee (IW3C2), ACM, Taipei, Taiwan, 2020, pp. 407–417.
  • Y. Azar, E. Horvitz, E. Lubetzky, Y. Peres, and D. Shahaf, “Tractable near-optimal policies for crawling,” Proc. Natl. Acad. Sci. U. S. A., Vol. 115, pp. 8099–8103, 2018.
  • K. Avrachenkov, K. Patil, and G. Thoppe, “Change rate estimation and optimal freshness in Web page crawling,” in 13th EAI International Conference on Performance Evaluation Methodologies and Tools, VALUETOOLS'20, ACM, New York, USA, 2020, pp. 3–10.
  • R. Todeschini, “k-nearest neighbor method: The influence of data transformations and metrics,” Chemom. Intell. Lab. Syst., Vol. 6, pp. 213–220, 1989.
  • J. Di, and X. Gou, “Bisecting K-means algorithm based on K-valued self-determining and clustering center optimization,” School of Control and Computer Engineering, Vol. 13, pp. 588–595, 2018.