Computers & Computing

File Semantic Aware Primary Storage Deduplication System
