Abstract
Efficiency of full text retrieval using signatures depends on the number of filtering and the reduction of the original text, but there has been no discussion how a signature is constructed keeping the worst-case filtering ratio. In order to consider this problem, we present a technique of constructing signatures by using an appearance probability of strings in a textual data. It enables us to retrieve any keywords in expected worst-case searching time.
A partial appearance probability is proposed because the overall probability for the whole text takes a lot of time building signatures. From the simulation result, it turns can't that the worst-case filtering ratio of the presented method can keep the expected ratio while that of the traditional method degrades zero.