International Journal of Computers and Applications Volume 44, 2022 - Issue 12: Artificial Intelligence for Sustainable Internet Research. Guest Editors: Dr. H. Anandakumar, Dr. Muhammad Sharif and Dr. Sri Devi Ravana


An ontology learning based approach for focused web crawling using combined normalized pointwise mutual information and Resnik algorithm

P. R. Joe Dhanith (corresponding author), Department of Computer Science and Engineering, National Institute of Technology, Karaikal, Puducherry, India

https://orcid.org/0000-0002-9022-9145

B. Surendiran, Department of Computer Science and Engineering, National Institute of Technology, Karaikal, Puducherry, India

https://orcid.org/0000-0001-5435-0880

Pages 1123-1129 | Received 25 Jun 2019, Accepted 18 Oct 2019, Published online: 30 Oct 2019

https://doi.org/10.1080/1206212X.2019.1684023

Abstract

Many existing works calculate the priority of unexplored Uniform Resource Locators (URLs) as a linear combination of the similarities between different texts of the web page and the specified topic, each with an associated weight. These weights, however, are chosen by methods such as Term Frequency-Inverse Document Frequency (TF-IDF), so they can cause severe deviations in the priorities of unvisited web pages. Such methods also calculate similarity only when the word itself occurs in the web page and do not consider the semantic similarity of the words in the page. To overcome these problems, this article presents a new focused web crawler, called the P-crawler, based on a combined Normalized Pointwise Mutual Information (NPMI) and Resnik-based semantic similarity algorithm. In the P-crawler, the record of an unexplored web page is made up of the web page text, anchor text, title text, bold text and heading text of the page. The experimental findings show that the suggested algorithm increases focused crawler efficiency. In conclusion, the technique is efficient and promising for focused web crawlers.

KEYWORDS:

Web crawler
NPMI
P-crawler
focused crawler
ontology
semantic similarity
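
The two measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of NPMI (pointwise mutual information divided by the negative log of the joint probability) and Resnik similarity (the information content of the lowest common subsumer of two concepts). The function names, the toy taxonomy `PARENT` and the concept probabilities `PROB` are hypothetical and chosen only for illustration. The abstract does not state how the P-crawler combines the two scores, so the sketch leaves that step out.

```python
import math


def npmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Normalized PMI in [-1, 1]: pmi(x, y) / -log p(x, y)."""
    if p_xy <= 0.0:
        return -1.0  # the two terms never co-occur
    if p_xy >= 1.0:
        return 1.0   # always co-occur; avoids dividing by log(1) = 0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def doc_npmi(docs: list[set[str]], a: str, b: str) -> float:
    """Estimate NPMI of terms a and b from document-level co-occurrence."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    return npmi(p_ab, p_a, p_b)


# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
# A real system would derive these from an ontology and a corpus.
PARENT = {
    "crawler": "software",
    "browser": "software",
    "software": "entity",
    "ontology": "entity",
}
PROB = {"entity": 1.0, "software": 0.5, "crawler": 0.2,
        "browser": 0.3, "ontology": 0.5}


def ancestors(concept: str) -> list[str]:
    """Return the concept followed by its ancestors up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def resnik(c1: str, c2: str) -> float:
    """Resnik similarity: information content of the lowest common subsumer."""
    anc2 = set(ancestors(c2))
    # In a tree, the first shared node walking up from c1 is the subsumer.
    lcs = next(c for c in ancestors(c1) if c in anc2)
    return math.log(1.0 / PROB[lcs])


docs = [{"web", "crawler"}, {"web", "crawler"}, {"ontology"}, {"page"}]
score_npmi = doc_npmi(docs, "web", "crawler")
score_resnik = resnik("crawler", "browser")
```

Unlike a TF-IDF weight, both scores can be non-zero for a topic word that never appears verbatim on the page, which is the limitation the abstract sets out to address.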

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

P. R. Joe Dhanith

P. R. Joe Dhanith received his B.Tech degree in Information Technology from Anna University in 2010 and his M.E degree in Computer Science and Engineering from Anna University in 2012. He is currently pursuing his Ph.D degree in Computer Science and Engineering at National Institute of Technology Puducherry. His main research interests include web mining, web crawling and information retrieval.

B. Surendiran

B. Surendiran is currently working as an Assistant Professor in the Department of Computer Science and Engineering at National Institute of Technology Puducherry, Karaikal, India. He completed his Ph.D in Computer Science and Engineering at National Institute of Technology Tiruchirapalli. His research interests include recommender systems and data mining. He received the “Best Paper Award” for his paper at the artcom2009 international conference.
