Create List of Stopwords and Typing Error by TF-IDF Weight Value

With the growth of social networking services (SNS), huge volumes of text data are being generated. To analyse such data, it is essential to remove meaningless words, namely stopwords and typing errors. For English, stopword dictionaries have developed rapidly; for Korean, however, there has been little such research. In this paper, we propose a way to filter out stopwords and typing errors according to word importance, as measured by the TF-IDF algorithm. First, TF-IDF values are calculated from the collected data. Second, a criterion is chosen to separate the words into two groups by TF-IDF value, and the result is transformed into an n×2 matrix. Third, the cumulative frequency of the TF-IDF weights is calculated. In this way, a new cumulative frequency is obtained that excludes stopwords and typing errors. Furthermore, this method can be applied to both Korean and English without creating a stopword dictionary.
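The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the TF-IDF variant, the threshold, or the layout of the n×2 matrix, so the code assumes a corpus-level weight (total term count times log(N/df)), a hand-picked threshold, and a two-column row per word (filtered weight, kept weight). All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Step 1 (assumed variant): weight = total term count * log(N / df)."""
    docs = [doc for doc in docs if doc]  # ignore empty documents
    n_docs = len(docs)
    term_count = Counter()  # occurrences of each term over the whole corpus
    doc_freq = Counter()    # number of documents containing each term
    for doc in docs:
        term_count.update(doc)
        doc_freq.update(set(doc))
    return {t: term_count[t] * math.log(n_docs / doc_freq[t]) for t in term_count}


def split_by_threshold(weights, threshold):
    """Step 2 (assumed layout): one row per word, (filtered weight, kept weight)."""
    return {
        term: (w, 0.0) if w < threshold else (0.0, w)
        for term, w in weights.items()
    }


def cumulative_kept(rows):
    """Step 3: running sum of the kept weights, largest weight first."""
    kept = sorted(
        ((term, row[1]) for term, row in rows.items() if row[1] > 0),
        key=lambda item: -item[1],
    )
    total, result = 0.0, []
    for term, w in kept:
        total += w
        result.append((term, total))
    return result


# Toy corpus (hypothetical): "the" acts as a stopword, "awya" as a typing error.
docs = [
    ["the", "cat", "cat", "sat"],
    ["the", "cat", "dog", "dog"],
    ["the", "dog", "dog", "awya"],
    ["the", "bird", "sang"],
]
weights = tfidf_weights(docs)
rows = split_by_threshold(weights, threshold=2.0)  # threshold chosen by hand
cumulative = cumulative_kept(rows)
kept_terms = [term for term, _ in cumulative]
print(kept_terms)
```

Under these assumptions, a word that occurs in every document receives a weight of zero, and a word that occurs only once receives a small weight, so both fall below the threshold. In a corpus this small, legitimate rare words are filtered as well; the choice of threshold is the part the sketch leaves open.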

Keyphrases: Preprocessing, Stopwords, Text mining, TF-IDF

