Short Text Clustering Algorithm Based on Frequent Closed Word Sets

EasyChair Preprint 1334

4 pages • Date: July 28, 2019

Abstract

Text mining of microblog topic information can effectively measure how much attention internet users pay to news events, which makes it highly significant for public opinion monitoring and analysis. Because traditional frequent-word-set algorithms are suited to clustering long texts, this paper proposes mining the top-K frequent word sets in a short-text database and then assigning microblog topic texts that cover the same frequent word set to the same cluster. Documents that fall into more than one cluster are then reassigned using a similarity calculation based on the largest frequent word sets, which yields the final microblog short-text clustering. Experimental results on a microblog topic dataset, compared against the K-means clustering algorithm, show that the proposed algorithm effectively addresses the sparsity and high dimensionality of microblog short-text clustering and greatly improves clustering quality.
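The clustering procedure summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes pre-tokenized texts, mines frequent word sets with a brute-force Apriori-style search capped at `max_size` words, keeps only closed sets, ranks them by support and then size, and resolves overlapping assignments by Jaccard similarity between a text and each word set it covers (which favours the largest covering set). The function names `frequent_itemsets`, `closed_sets`, `top_k_closed` and `cluster`, and the parameters `k`, `min_support` and `max_size`, are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=3):
    """Return {word set: support} for every word set of at most max_size
    words that occurs in at least min_support documents.

    Brute-force Apriori-style level-wise search (an assumption; the paper's
    mining procedure may differ). Adequate for short texts.
    """
    transactions = [frozenset(d) for d in docs]
    support = {}
    candidates = {frozenset([w]) for t in transactions for w in t}
    size = 1
    while candidates and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        support.update(frequent)
        # Join frequent sets of this size into candidates one word larger.
        keys = list(frequent)
        candidates = {a | b for a, b in combinations(keys, 2)
                      if len(a | b) == size + 1}
        size += 1
    return support


def closed_sets(support):
    """Keep sets with no proper superset of equal support (closed sets)."""
    return {s: n for s, n in support.items()
            if not any(s < t and support[t] == n for t in support)}


def top_k_closed(docs, k, min_support, max_size=3):
    """Rank closed sets by support, then size, then words; keep the top k."""
    closed = closed_sets(frequent_itemsets(docs, min_support, max_size))
    ranked = sorted(closed.items(),
                    key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])))
    return [s for s, _ in ranked[:k]]


def cluster(docs, k, min_support, max_size=3):
    """Return (word_sets, labels); label -1 means no top-k set is covered."""
    word_sets = top_k_closed(docs, k, min_support, max_size)
    labels = []
    for d in map(frozenset, docs):
        # Step 1: a text belongs to every cluster whose word set it covers,
        # so clusters may overlap.
        covering = [c for c, ws in enumerate(word_sets) if ws <= d]
        if not covering:
            labels.append(-1)
            continue
        # Step 2 (assumed similarity): keep the cluster with the highest
        # Jaccard similarity |ws & d| / |ws | d|; ties go to the larger set,
        # then to the higher-ranked set.
        best = max(covering, key=lambda c: (
            len(word_sets[c] & d) / len(word_sets[c] | d),
            len(word_sets[c]), -c))
        labels.append(best)
    return word_sets, labels


if __name__ == "__main__":
    docs = [["apple", "phone", "launch"], ["apple", "phone", "price"],
            ["apple", "phone", "launch", "event"], ["rain", "storm", "city"],
            ["rain", "storm", "flood"], ["rain", "flood", "city"]]
    word_sets, labels = cluster(docs, k=3, min_support=2)
    print([sorted(ws) for ws in word_sets])
    print(labels)  # [2, 0, 2, 1, 1, 1]
```

In this toy run, texts 0 and 2 cover both {apple, phone} and {apple, phone, launch}, so the overlap step moves them to the larger set, while text 1 stays with {apple, phone}. Texts that cover none of the top-K sets receive the label -1 and would need separate handling.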

Keyphrases: microblog topic, frequent closed word sets, text clustering

Links:

https://easychair.org/publications/preprint/qvKK

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:1334,
  author    = {Chunxia Jin and Qiuchan Bai},
  title     = {Short Text Clustering Algorithm Based on Frequent Closed Word Sets},
  howpublished = {EasyChair Preprint 1334},
  year      = {EasyChair, 2019}}
