Abstract
The Internet has grown far faster than anticipated, and the analysis and mining of massive amounts of web data now face bottlenecks in computing power and storage space. Cloud computing gives users convenient network access to powerful computing resources, large storage capacity and supporting infrastructure. By providing highly reliable and scalable data processing and storage centers, it can effectively address these bottlenecks, improving the capacity to process web data while reducing the demands placed on terminal devices. This paper studies web mining algorithms in a cloud computing environment by combining web data mining algorithms with the MapReduce programming model. We examine web mining techniques, in particular the K-centers clustering algorithm, explore how web mining algorithms can be combined with cloud computing technology, and adapt the data mining algorithms to the analysis and processing of massive web data on cloud computing platforms. We build a distributed cloud environment using the Hadoop framework and, in this experimental environment, analyze how different block size settings affect computational performance. The block size determines how many blocks the input data file is split into, and therefore the number and scale of the parallel computations.
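The relationship between block size and parallelism described above can be sketched as a small, simplified model. It assumes one map task per input split and borrows the split rule of Hadoop's `FileInputFormat` (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to be up to 10% larger than the split size). The helper names `split_size` and `split_lengths` are hypothetical, the model ignores record boundaries, replication and task scheduling, and the 1 GB file size is illustrative rather than taken from the experiments.

```python
MB = 1024 * 1024

# Mirrors the 10% "slop" used by Hadoop's FileInputFormat: a final chunk up to
# 1.1x the split size is kept as a single split instead of being split again.
SPLIT_SLOP = 1.1


def split_size(block_size: int, min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Split size as max(minSize, min(maxSize, blockSize)); with default bounds it equals the block size."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size: int, block_size: int) -> list[int]:
    """Return the length in bytes of each input split (hypothetical model: one map task per split)."""
    size = split_size(block_size)
    remaining = file_size
    lengths = []
    # Emit full-size splits while the remainder is clearly larger than one split.
    while remaining / size > SPLIT_SLOP:
        lengths.append(size)
        remaining -= size
    # Whatever is left (up to 1.1x the split size) becomes the final split.
    if remaining != 0:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    file_size = 1024 * MB  # illustrative 1 GB input file
    for block_mb in (32, 64, 128, 256):
        n = len(split_lengths(file_size, block_mb * MB))
        print(f"block size {block_mb:>3} MB -> {n:>2} splits / map tasks")
    # Prints 32, 16, 8 and 4 splits for 32, 64, 128 and 256 MB blocks respectively.
```

Halving the block size doubles the number of map tasks and so the degree of parallelism, but each task also carries scheduling and startup overhead, so in general performance need not improve monotonically as the block size shrinks.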