Abstract
The ranking of search results largely determines the quality of service (QoS) of a meta-search engine (MSE). To meet the demands of big data applications, this paper proposes a new method that takes into account factors such as network bandwidth, client resources, and limited server resources. In this method, Web pages with the same content but different URLs are identified by calculating the similarity between the contents of pages the user has already traversed and those of pages not yet traversed. This eliminates the deviation in statistics about the user’s traversal intent that is caused by factors such as differences in traversal order and repeated Web page content. While a search service is being provided, the weight of each component search engine (CSE) is assigned dynamically, and the returned results then receive a second rotary ranking that also draws on the initial ranking information. Experimental results and statistics show that (1) the numbers of traversals and downloads can be decreased; (2) the ratio of the number of pages clicked by the user to the number of pages navigated can also be decreased; (3) the matching degree between searches/traversals and returned results can be increased; and (4) the stability of a search engine can be improved by taking repeated Web page content into account.
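As a rough illustration of the two steps summarized above (detecting pages with repeated content, then re-ranking with per-CSE weights), the following minimal Python sketch may help. It is not the paper's algorithm: the Jaccard similarity over word shingles, the 0.9 threshold, the weight-divided-by-rank score, and all function names are assumptions chosen only for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a page's text (assumed representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(visited, unvisited, threshold=0.9):
    """Map each unvisited URL to a visited URL with near-identical content.

    visited, unvisited: dicts of url -> page text.
    threshold: hypothetical cut-off, not a value taken from the paper.
    """
    visited_sets = {url: shingles(text) for url, text in visited.items()}
    duplicates = {}
    for url, text in unvisited.items():
        candidate = shingles(text)
        for seen_url, seen_set in visited_sets.items():
            if jaccard(candidate, seen_set) >= threshold:
                duplicates[url] = seen_url
                break
    return duplicates


def rerank(results, cse_weights, duplicates):
    """Second-pass ranking: drop duplicates, score the rest by CSE weight / initial rank.

    results: list of (url, cse_name, initial_rank) with initial_rank starting at 1.
    The scoring formula is an assumption used only to show how weights and
    initial ranks could be combined.
    """
    scores = {}
    for url, cse, rank in results:
        if url in duplicates:
            continue  # content already seen under a different URL
        scores[url] = scores.get(url, 0.0) + cse_weights.get(cse, 0.0) / rank
    # Highest score first; ties broken by URL for a deterministic order.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

In such a sketch, the CSE weights would be updated between queries from the user's observed clicks; how the paper actually computes them is not stated in this abstract.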