20
Views
16
CrossRef citations to date
0
Altmetric
Original Articles

A common basis for similarity measures involving two stringsFootnote

&
Pages 17-40 | Received 01 Dec 1981, Published online: 19 Mar 2007
 

Abstract

Many numerical indices which quantify the similarity and dissimilarity between a pair of stringsX and Y, have been defined in the literature. Some of these include the Length of their Longest Common Subsequence (LLCS(X, Y)), the Length of their Shortest Common Supersequence (LSCS(X, Y)), and their Generalized Levenshtein Distance (GLD(X, Y)). Some non-numerical indices relating the strings are the set of their common subsequences, the set of their common supersequences and the set of their shuffles. In this paper, we consider an abstract measure between X and Y, written as D(X, Y), defined in terms of two abstract operators ⊕ and ⊛, and a binary function d( [sdot], [sdot] ) whose arguments are symbols of an alphabet à Depending on the various concrete operators used for ⊕ and ⊛ and the specific function used for d( [sdot], [sdot] ), all the quantities discussed above can be seen to be particular cases of D(X, Y). We have presented an algorithm to recursively compute D(X, Y), which can serve to be a common scheme to compute all these quantities. Many new results are obtained using this abstract formulation, such as an explicit linear relationship between the LLCS and the LSCS between two strings. It can also be seen that the algorithm to compute the LLCS between X and Y is a special case of the algorithm to compute D(X, Y). In addition, by using different concrete values for ⊕, ⊛ and d([sdot],[sdot]), new one-pass algorithms can be developed to compute various quantities such as the set of Longest Common Subsequences (LCS), the set of all the Shortest Common Supersequences (SCS) without backtracking, and the set of all the shuffles of two strings.

C.R. Categories:

†Partially supported by the National Science Foundation Grant No. ECS-80-0904L.

†Partially supported by the National Science Foundation Grant No. ECS-80-0904L.

Notes

†Partially supported by the National Science Foundation Grant No. ECS-80-0904L.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.