Abstract
Large-scale distributed databases contain a large amount of redundant data, which weakens the correlation between data items and reduces network efficiency. This paper proposes a method for optimizing the classification of redundant data in distributed databases. First, the attributes of the redundant data are extracted to provide an accurate basis for classification. Then, the optimal classification threshold is determined using weighted probability theory, and the redundant data are classified accordingly. Experimental results show that the proposed algorithm improves classification accuracy and achieves satisfactory performance.