
Short Text Clustering Algorithm Based on Frequent Closed Word Sets

EasyChair Preprint no. 1334

4 pages. Date: July 28, 2019

Abstract

Mining the text of microblog topics is an effective way to gauge how much attention internet users pay to news events, and it is of great significance for public opinion monitoring and analysis. Because traditional frequent word set algorithms are suited to clustering long texts, this paper proposes mining the top-K frequent word sets in a short-text database and then placing microblog topic texts that cover the same frequent word set in the same cluster. Documents that overlap several clusters are then re-divided by a similarity calculation based on the largest frequent word sets, which yields the final microblog short-text clustering. Experimental results on a microblog topic dataset, together with a comparison against the K-means clustering algorithm, show that the proposed algorithm can effectively address the sparseness and high dimensionality of microblog topic short texts and greatly improve the clustering quality.
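The pipeline outlined in the abstract (mine frequent closed word sets, group texts that cover the same word set, then resolve texts that fall into several groups) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the brute-force miner, the `min_support`, `max_seed_size` and `top_k` parameters, the toy documents, and the use of Jaccard similarity for the overlap step are all assumptions made for illustration.

```python
from itertools import combinations


def closed_frequent_word_sets(docs, min_support=2, max_seed_size=2, top_k=5):
    """Brute-force sketch: return the top_k closed frequent word sets.

    Each small seed word set is expanded to its closure, i.e. the
    intersection of all documents that contain the seed. A closure has
    the same support as its seed and is closed by construction.
    """
    doc_sets = [frozenset(d) for d in docs]
    seeds = set()
    for words in doc_sets:
        for size in range(1, max_seed_size + 1):
            seeds.update(frozenset(c) for c in combinations(sorted(words), size))

    closed = {}
    for seed in seeds:
        covering = [d for d in doc_sets if seed <= d]
        if len(covering) >= min_support:
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)

    # Rank by support, then by size, then alphabetically for determinism.
    ranked = sorted(closed.items(),
                    key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])))
    return ranked[:top_k]


def assign_clusters(docs, word_sets):
    """Label each document with one word set that it fully covers.

    A document covering several word sets (an overlap) goes to the one
    with the highest Jaccard similarity. Jaccard is an assumed stand-in
    for the similarity measure, which the abstract does not specify.
    Documents covering no word set are labeled None.
    """
    labels = {}
    for i, doc in enumerate(docs):
        words = frozenset(doc)
        covered = [ws for ws, _ in word_sets if ws <= words]
        if not covered:
            labels[i] = None
            continue
        labels[i] = max(
            covered,
            key=lambda ws: (len(ws & words) / len(ws | words), sorted(ws)),
        )
    return labels


# Hypothetical, already tokenized microblog posts.
docs = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "donation"],
    ["earthquake", "rescue"],
    ["football", "final", "goal"],
    ["football", "final", "ticket"],
]
top = closed_frequent_word_sets(docs)
labels = assign_clusters(docs, top)
for index, word_set in labels.items():
    print(index, sorted(word_set) if word_set else None)
```

Because every candidate word set is a subset of the document it labels, Jaccard similarity here reduces to preferring the largest covered word set. The exhaustive seed enumeration is only practical for very short texts; a real miner would use a dedicated closed itemset algorithm.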

Keyphrases: frequent closed word sets, Microblog Topic, text clustering

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:1334,
  author = {Chunxia Jin and Qiuchan Bai},
  title = {Short Text Clustering Algorithm Based on Frequent Closed Word Sets},
  howpublished = {EasyChair Preprint no. 1334},
  year = {EasyChair, 2019}}