Abstract
Because of the vast amount of information on the Internet, efficient information retrieval has become a topic of interest for both academia and industry. To improve the accuracy of search engines' ranking algorithms, we propose an algorithm for determining the theoretical degree of user attention to a webpage (TDUAW). News webpages from Chinese online forums were selected as the object of our research. We designed a content-oriented algorithm for TDUAW that extracts the web content, identifies the feature vector of a webpage, and calculates the degree of user attention based on the Baidu Index. Experiments and regression analyses were conducted on 150 webpages. The results indicate that the optimal number of feature words is three and that the correlation coefficient between the theoretical degree of user attention and the actual degree of user attention (net page views) exceeds 0.8, which supports the validity of our method. The theoretical degree of user attention can therefore serve as an alternative to page views when page views are unavailable.
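The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not state how feature words are weighted or how Baidu Index values are combined, so the choices here are assumptions made only for illustration: feature words are taken as the most frequent tokens, weights are relative term frequencies, and `index_lookup` is a hypothetical dictionary standing in for Baidu Index values. The function names are likewise hypothetical.

```python
from collections import Counter
from math import sqrt


def top_feature_words(tokens, k=3):
    """Return the k most frequent tokens with their counts (assumed feature selection)."""
    return Counter(tokens).most_common(k)


def theoretical_attention(tokens, index_lookup, k=3):
    """Frequency-weighted average of index values over the top-k feature words.

    index_lookup is a hypothetical mapping from word to a search-index value;
    words missing from it contribute 0. This weighting is an assumption,
    not the method described in the paper.
    """
    features = top_feature_words(tokens, k)
    total = sum(count for _, count in features)
    if total == 0:
        return 0.0
    return sum((count / total) * index_lookup.get(word, 0.0)
               for word, count in features)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Under these assumptions, one would compute `theoretical_attention` for each webpage and pass the resulting scores, together with the observed page views, to `pearson` to obtain a correlation of the kind reported above.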
Acknowledgements
This work was partially supported by the NSFC (Grant 70971099), the Fundamental Research Funds for the Central Universities (1200219198), the Shanghai Philosophy and Social Science Planning Projects (2013BGL004), and the Doctoral Thesis Funding for Soft Science Research from the Shanghai Science and Technology Development Funds (12692193000).