A feature-based intelligent deduplication compression system with extreme resemblance detection

Pages 576-604 | Received 28 Jul 2020, Accepted 03 Dec 2020, Published online: 21 Dec 2020

Figures & data

Figure 1. An overview of FIDCS-ERD. fp means the fingerprint representing an identifier of a chunk, which is used to identify the duplicate chunk. Sketch is a Bloom filter-based array, which is compared to find the similar chunk.
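
The two lookups named in this caption can be illustrated with a minimal sketch. It is not the FIDCS-ERD implementation: the feature extraction (8-byte sliding windows), the hash functions, the array size m, the number of hashes k, and the 0.5 similarity threshold are all assumptions chosen for illustration, and the function names are hypothetical. The labels D, S and N follow the Figure 3 caption.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Identifier of a chunk; equal fingerprints are treated as duplicates."""
    return hashlib.sha1(chunk).hexdigest()


def sketch(chunk: bytes, m: int = 1024, k: int = 3, window: int = 8) -> int:
    """Bloom-filter-style bit array (stored as an int) over sliding-window features.

    Assumption: features are raw byte windows; the paper may use something else.
    """
    bits = 0
    for i in range(max(1, len(chunk) - window + 1)):
        feature = chunk[i:i + window]
        for seed in range(k):
            # k hash functions are simulated by salting BLAKE2b with the seed.
            digest = hashlib.blake2b(feature, digest_size=8, salt=bytes([seed])).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % m)
    return bits


def resemblance(a: int, b: int) -> float:
    """Jaccard similarity of the set bits in two sketches."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def classify(chunk: bytes, store: dict, threshold: float = 0.5) -> str:
    """Return 'D' (duplicate), 'S' (similar) or 'N' (unique) and record new chunks."""
    fp = fingerprint(chunk)
    if fp in store:
        return "D"
    sk = sketch(chunk)
    best = max((resemblance(sk, other) for other in store.values()), default=0.0)
    store[fp] = sk
    return "S" if best >= threshold else "N"
```

A duplicate is caught by the exact fingerprint match before any sketch is computed; only chunks that miss that lookup pay for the sketch comparison. The linear scan over stored sketches is for brevity and would not scale to a real chunk index.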

Figure 2. The processing flow in FIDCS-ERD.

Table 1. Description of notations.

Figure 3. The influence of the average size of a chunk on duplicate and resemblance detection. D, S and N represent the duplicate, the similar, and the unique chunk, respectively.

Figure 4. Basic operations of Bloom filter.

Figure 5. The process of compression and decompression of a similar chunk.

Table 2. An overview of datasets.

Figure 6. Results of dataset Glibc. (a) CR with α. (b) DCRpC with α. (c) TP with α. (d) CR with β and γ. (e) DCRpC with β and γ. (f) TP with β and γ. (g) CR with k for DD. (h) CR with k for RD. (i) TP with k for RD.

Figure 7. Results of dataset email. (a) CR with α. (b) DCRpC with α. (c) TP with α.

Figure 8. Results of dataset image. (a) CR with α. (b) DCRpC with α. (c) TP with α.
