Abstract
This paper proposes a hot topic detection method based on Chinese semantic clustering, aimed at Chinese WeChat data, which is high-dimensional and whose information is fragmented. To handle the sparsity and content fragmentation of Chinese WeChat and blog data, we combine three strategies, namely repeated string computation, context adjacency analysis, and linguistic rule filtering, to extract meaningful strings that express independent and complete semantics. We then model the Chinese WeChat data in this relatively small, meaningful string space, generate candidate topics via feature clustering, and select the hot topics by ranking the candidates by heat. Experimental results on WeChat and blog data show that the method reduces, to some extent, the dimensionality of the high-dimensional sparse feature space and is effective and feasible for WeChat hot topic detection.
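The first two strategies named above, repeated string computation and context adjacency analysis, can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names, the length and frequency thresholds, and the use of branching entropy as the adjacency measure are all assumptions made for illustration, and the linguistic rule filtering, clustering, and heat ranking steps are omitted.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(neighbors):
    """Shannon entropy (in bits) of a neighbor-count distribution."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def extract_candidate_strings(texts, min_len=2, max_len=4,
                              min_freq=2, min_entropy=0.5):
    """Hypothetical helper: keep repeated substrings with varied contexts.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    freq = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each string
    right = defaultdict(Counter)  # characters seen immediately after each string

    for idx, text in enumerate(texts):
        n = len(text)
        for i in range(n):
            for k in range(min_len, max_len + 1):
                j = i + k
                if j > n:
                    break
                s = text[i:j]
                freq[s] += 1
                # A text boundary counts as a neighbor unique to that text,
                # so strings that start or end messages are not penalized.
                left[s][text[i - 1] if i > 0 else ("BOS", idx)] += 1
                right[s][text[j] if j < n else ("EOS", idx)] += 1

    kept = {}
    for s, f in freq.items():
        if f < min_freq:
            continue  # not a repeated string
        # A fragment of a longer unit always has the same neighbor on one
        # side, so its entropy on that side is zero and it is dropped.
        if min(branching_entropy(left[s]),
               branching_entropy(right[s])) >= min_entropy:
            kept[s] = f
    return kept


# Latin letters stand in for Chinese characters in this toy input.
candidates = extract_candidate_strings(["abcd", "abce", "abcf"])
```

In the toy input, "ab" and "bc" repeat three times but are always followed by "c" or preceded by "a", so only "abc" survives as a candidate string. Keeping only such strings is what shrinks the feature space before clustering.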