Short Text Clustering Algorithm Based on Frequent Closed Word Sets

EasyChair Preprint no. 1334
4 pages • Date: July 28, 2019

Abstract

Text mining of microblog topic information can effectively measure how much attention internet users pay to news events, which is of great significance for public opinion monitoring and analysis. Because traditional frequent word set algorithms are suited to clustering long texts, this paper proposes mining the top-K frequent word sets in a short text database and then assigning microblog topic texts that cover the same frequent word set to the same cluster. Documents that fall into more than one cluster are then reassigned using a similarity calculation based on the largest frequent word sets, which yields the final clustering of the microblog short texts. Experimental results on a microblog topic dataset, together with a comparison against the K-means clustering algorithm, show that the proposed algorithm effectively addresses the sparseness and high dimensionality of microblog topic short texts and greatly improves the clustering effect.

Keyphrases: frequent closed word sets, Microblog Topic, text clustering
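
The pipeline outlined in the abstract (mine frequent closed word sets, keep the top K, group texts by the word set they cover, then resolve overlaps by similarity) could look roughly like the following minimal sketch. It is not the authors' implementation: the function names, the brute-force enumeration capped at `max_size` words, the ranking of word sets by support, and the use of Jaccard similarity to resolve overlaps are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size):
    """Map each word set (up to max_size words) to the documents containing it.

    Brute-force enumeration; an assumption for illustration, only practical
    for short texts and small vocabularies.
    """
    cover = defaultdict(set)
    for idx, words in enumerate(docs):
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                cover[frozenset(combo)].add(idx)
    return {ws: ids for ws, ids in cover.items() if len(ids) >= min_support}


def closed_word_sets(frequent):
    """Keep word sets that have no enumerated proper superset with the same cover."""
    return {
        ws: ids
        for ws, ids in frequent.items()
        if not any(ws < other and ids == other_ids
                   for other, other_ids in frequent.items())
    }


def cluster_short_texts(docs, min_support=2, k=2, max_size=3):
    """Assign each document to one of the top-k frequent closed word sets."""
    docs = [set(d) for d in docs]
    closed = closed_word_sets(frequent_word_sets(docs, min_support, max_size))
    # Rank by support, then by word-set size (assumed ranking, not from the paper).
    top = sorted(closed, key=lambda ws: (-len(closed[ws]), -len(ws), sorted(ws)))[:k]
    clusters = defaultdict(list)
    for idx, words in enumerate(docs):
        covering = [ws for ws in top if ws <= words]
        if not covering:
            clusters[None].append(idx)  # not covered by any selected word set
            continue
        # Overlapping documents go to the most similar word set (Jaccard, assumed).
        best = max(covering, key=lambda ws: (len(ws & words) / len(ws | words), len(ws)))
        clusters[best].append(idx)
    return dict(clusters)


docs = [
    ["world", "cup", "final", "goal"],
    ["world", "cup", "goal", "team"],
    ["stock", "market", "crash"],
    ["stock", "market", "rally"],
    ["world", "cup", "stock", "market", "goal"],
]
clusters = cluster_short_texts(docs, min_support=2, k=2, max_size=3)
for word_set, members in clusters.items():
    print(sorted(word_set) if word_set else None, members)
```

In this toy input the last text covers both selected word sets, so it is the overlapped document that the similarity step has to place. Note that closedness here is only checked against word sets of at most `max_size` words; a dedicated closed itemset miner would be needed for real microblog data.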