SOICT 2018: THE NINTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY
PROGRAM FOR THURSDAY, DECEMBER 6TH


08:50-09:35 Session 1: Keynote Talk: Internet of Things Revolution: Can Predictive Analytics Create Sustainable and Fair Smart Cities
Location: Lotus
08:50
Internet of Things Revolution: Can Predictive Analytics Create Sustainable and Fair Smart Cities

ABSTRACT. Currently there are about 30 billion IoT devices connected to the Internet; by 2020, an estimated 75 billion devices will be connected. Already, 55% of the world's 7.4 billion people have an internet connection. By 2050, 70% of the world's population, over 6 billion people, are expected to live in cities and surrounding regions. Increasing population density in urban centres demands adequate provision of services and infrastructure to meet the needs of city inhabitants: residents, workers, and visitors. Managing cities, people and resources (water, electricity, air, land, transport, public health) is set to become challenging. The utilization of information and communications technologies to achieve these objectives presents an opportunity for the development of smart cities, where city management and citizens are given access to a wealth of real-time information about the urban environment upon which to base decisions, actions, and future planning. A smart city is one that uses information and communications technologies to make city services more interactive, efficient and citizen-centric. Cities need to be smart and sustainable to survive while allocating resources and developing platforms that enable economic, social and environmental wellbeing. This talk presents how IoT can seamlessly integrate physical infrastructure and digital information across diverse platforms and applications to develop a common operating picture (COP) of the city.

09:35-10:20 Session 2: Keynote Talk: Three Major Approaches to Tackling Many Objectives
Location: Lotus
09:35
Three Major Approaches to Tackling Many Objectives

ABSTRACT. Many optimisation problems in the real world need to consider multiple conflicting objectives simultaneously. Evolutionary algorithms are excellent candidates for finding a good approximation to the Pareto optimal front in a single run. However, many multi-objective optimisation algorithms are effective for only two or three objectives. It is an ongoing challenge to deal with a larger number of objectives. In this talk, I will explain several methods for dealing with many objectives. First, we will describe a method for reducing a large number of objectives to a smaller number, especially when there is redundancy among different objectives. Second, alternative dominance relationships, other than Pareto dominance, will be introduced to make previously non-comparable solutions comparable. Lastly, new algorithms will be introduced to cope with many objectives through the use of two separate archives, for convergence and diversity, respectively. Our studies show that these methods are very effective and outperform other popular methods in the literature.

10:25-10:35 Coffee Break
10:35-11:55 Session 3A: Data Analytics I
Location: Lotus 2
10:35
Episode-Rule Mining with Minimal Occurrences via First Local Maximization in Confidence

ABSTRACT. An episode rule associating two episodes represents a temporal implication from the antecedent episode to the consequent episode. Episode-rule mining is the task of extracting useful patterns/episodes from large event databases. We present an episode-rule mining algorithm for finding frequent and confident serial-episode rules via the first local maximum in confidence, which yields ideal window widths, if they exist, in event sequences, based on minimal occurrences constrained by a constant maximum gap. Results from our preliminary empirical study confirm the applicability of the episode-rule mining algorithm for Web-site traversal-pattern discovery, and show that the first local maximization yielding ideal window widths exists in real data but rarely in synthetic random data sets.
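
The window-width selection at the first local maximum of confidence can be illustrated with a small sketch. This is not the authors' implementation; the confidence function (minimal-occurrence counts of rule vs. antecedent) is an assumed helper.

```python
# Illustrative sketch (not the paper's code): pick the window width at the
# first local maximum of rule confidence as the width grows.

def first_local_max_width(widths, confidence):
    """widths: increasing candidate window widths;
    confidence: callable mapping a width to the rule's confidence."""
    prev, best_w = None, None
    for w in widths:
        c = confidence(w)
        if prev is not None and c < prev:   # confidence starts to drop:
            return best_w                   # the previous width was a local maximum
        prev, best_w = c, w
    return best_w                           # monotone case: keep the largest width

# Hypothetical confidence(w): (# minimal occurrences of the full rule within w)
# divided by (# minimal occurrences of the antecedent within w).
```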

10:55
Automatic Embedding of Social Network Profile Links into Knowledge Graphs

ABSTRACT. Recent Knowledge Graphs (KGs) like Wikidata and YAGO are often constructed by incorporating knowledge from semi-structured heterogeneous data resources such as Wikipedia. However, despite their large amount of knowledge, these graphs are still incomplete. In this paper, we posit that Online Social Networks (OSNs) can become prominent data resources comprising abundant knowledge about real-world entities. An entity on an OSN is represented by a profile; the link to this profile is called a social link. We propose a KG refinement method for adding missing knowledge, namely social links, to a KG. We target specific entity types in the scientific community, such as researchers. Our approach uses both scholarly data resources and an existing KG to build knowledge bases. It then matches this knowledge against OSNs to detect the corresponding social link(s) for a specific entity, using a novel matching algorithm in combination with supervised and unsupervised learning methods. We empirically validate that our system is able to detect a large number of social links with high confidence.

11:15
An Entailment-based Scoring Method for Content Selection in Document Summarization

ABSTRACT. This paper introduces a scoring method for content selection in an extractive summarization system. Our method judges the importance of a sentence based on its own information and its relation to other sentences. For the relation between sentences, we utilize textual entailment, a relationship indicating that the meaning of one sentence can be inferred from another. Unlike previous work on using textual entailment for summarization, we go a step further by looking at aligned words in an entailment sentence pair. Assuming that important words in a salient sentence can be aligned with several words in other sentences, word alignment scores are exploited to compute the entailment score of a sentence. To take advantage of local and neighbour information for estimating sentence salience, we combine entailment scores with sentence position scores. We validate the proposed scoring method with greedy or integer linear programming approaches for extracting summaries. Experiments on three datasets (including DUC 2001 and 2002) in two different domains show that our model obtains ROUGE scores competitive with state-of-the-art methods for single-document summarization.
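
A minimal sketch of the combined scoring idea: a linear mixture of an entailment-based score and a position score. The mixing weight and the position heuristic are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch: combine an entailment-derived score with a position score.

def sentence_score(entail_score, position, n_sentences, lam=0.7):
    """entail_score: aggregated word-alignment entailment score of the sentence;
    position: 0-based index of the sentence in the document."""
    pos_score = 1.0 - position / max(n_sentences - 1, 1)   # earlier = higher
    return lam * entail_score + (1.0 - lam) * pos_score

# The resulting scores can be used greedily, or fed into an ILP that selects
# sentences under a summary-length budget.
```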

11:35
Reducing class overlapping in supervised dimension reduction

ABSTRACT. Dimension reduction aims to find a low-dimensional subspace onto which to project high-dimensional data, such that the discriminative property of the original higher-dimensional data is preserved. In supervised dimension reduction, class labels are integrated into the lower-dimensional representation to produce better results on classification tasks. The supervised dimension reduction (SDR) framework of [15] is one of the state-of-the-art methods: it takes into account not only the class labels but also the neighborhood graphs of the data, and has advantages in preserving the within-class local structure and widening the between-class margin. However, the reduced-dimensional representation produced by the SDR framework suffers from the class overlapping problem, in which data points lie closer to a different class than to the class they belong to. Class overlapping can hurt classification quality. In this paper, we propose a new method to reduce the overlap for the SDR framework in [15]. The experimental results show that our method reduces the size of the overlapping set by an order of magnitude. As a result, our method significantly outperforms the pre-existing framework on the classification task. Moreover, visualization plots show that the reduced-dimensional representation learned by our method is more scattered for within-class data and more separated for between-class data, as compared to the pre-existing SDR framework.
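
One simple way to quantify an "overlapping set" in the reduced space is to count points whose nearest class centroid differs from their own label. This is only an illustrative proxy, not the paper's definition.

```python
import numpy as np

def overlap_set(Z, y):
    """Z: (n, d) reduced-dimensional points; y: (n,) class labels.
    Returns indices of points lying closer to another class's centroid."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
    nearest = classes[dists.argmin(axis=1)]
    return np.where(nearest != y)[0]
```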

10:35-11:55 Session 3B: Image Processing and Computer Vision
Location: Lotus 1
10:35
Calf Robust Weight Estimation Using 3D Contiguous Cylindrical Model and Directional Orientation from Stereo Images
SPEAKER: Ryo Nishide

ABSTRACT. Calving interval is often used as an indicator of the fertility of beef cattle; however, maternal ability also matters, because the value of breeding cows depends on how efficiently healthy, growing calves are produced. The calf's weight has been used as an indicator of maternal ability for the past few decades. We propose a method to estimate body weight by modeling the shape of a calf using 3D information extracted from stereo images. The method captures the swelling of the animal's body by creating a 3D model, which cannot be obtained from a 2D image alone. In addition, weight can be estimated robustly regardless of shooting conditions such as the animal's posture and orientation. An image suitable for estimation is selected from motion images taken by the camera installed in the barn, and 3D coordinates are calculated from the images. Then, only the body is modeled in 3D, as it has the highest correlation with body weight. Considering that the side of the animal's body may not be exactly perpendicular to the camera's shooting direction, a symmetric axis is extracted to find the inclination of the body with respect to the camera, and the 3D model is generated along this axis. A contiguous 3D cylindrical model is used for the body, which has a rounded shape. To shape the cylindrical surface, circle and ellipse fittings are applied and compared. A linear regression between the volume of the cylindrical model and the actually measured body weight is used to estimate the cattle weight. As a result of modeling with the proposed method on actual camera images, the correlation coefficient between body weight and model volume reached 0.9107 at best. Even when experimentally examined with different 3D coordinates obtained from other types of camera, the MAPE (Mean Absolute Percentage Error) was as low as 6.39%.
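
The weight-estimation step can be sketched as summing elliptic-cylinder slice volumes along the body axis and regressing measured weight on that volume. The slice radii and the fitted numbers below are hypothetical placeholders, not the paper's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def body_volume(major_radii, minor_radii, slice_height):
    """Sum of elliptic-cylinder slices along the symmetric axis of the body."""
    return float(np.sum(np.pi * np.asarray(major_radii) *
                        np.asarray(minor_radii) * slice_height))

# Hypothetical fit: body-model volumes (m^3) vs. measured calf weights (kg).
volumes = np.array([[0.11], [0.14], [0.18], [0.22]])
weights = np.array([78.0, 95.0, 121.0, 149.0])
model = LinearRegression().fit(volumes, weights)
print(model.predict([[0.16]]))   # weight estimate for a new calf
```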

10:55
Quantifying the Approaching Behaviors for Interactions in Detecting Estrus of Breeding Cattle

ABSTRACT. In the management of beef cattle breeding, estrus detection is especially important for improving productivity. Generally, pedometers have been used for the past few decades to measure the increase in momentum that is likely to occur during estrus. Cattle are known to be social animals that take unique actions, such as approaching or mounting other cattle, during estrus. However, the sociality of cattle has received little attention for accurate estrus detection. In this paper, we propose a method to detect estrus from the approaching behavior between cattle. We extract approaching behavior based on position information from GPS devices attached to the cattle. Our method focuses on moving direction to detect even small approaching actions for which momentum may not increase clearly. Furthermore, we conducted an experiment based on artificial insemination records to examine the effectiveness of our method. As a result, we verified several cases with a distinctive approaching-behavior pattern occurring on days when pregnancy was later confirmed from the artificial insemination records.
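
Detecting an approach from GPS fixes can be sketched by comparing a cow's moving direction with the bearing toward another cow. The angle threshold and the flat-plane (metre) coordinates are illustrative assumptions.

```python
import numpy as np

def is_approaching(p_prev, p_now, q_now, max_angle_deg=30.0):
    """p_prev, p_now: consecutive positions of cow A (x, y in metres);
    q_now: position of cow B at the same time. True if A moves roughly toward B."""
    move = np.asarray(p_now, float) - np.asarray(p_prev, float)
    to_other = np.asarray(q_now, float) - np.asarray(p_now, float)
    if np.linalg.norm(move) == 0 or np.linalg.norm(to_other) == 0:
        return False
    cos = np.dot(move, to_other) / (np.linalg.norm(move) * np.linalg.norm(to_other))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) <= max_angle_deg
```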

11:15
Cattle Community Extraction Using the Interactions Based on Synchronous Behavior

ABSTRACT. In the management of beef cattle, it is important to detect unusual conditions such as estrus and disease as soon as possible. Methods to detect estrus based on information such as activity counts from pedometers have been proposed so far. In this paper, we propose a method to grasp cattle's status by focusing on changes of cattle communities over time. Our method can potentially discover and handle new cases that are not found from activity counts by previous methods. To extract communities, we exploit the fact that cattle in the same community tend to synchronize their behaviors. Walking speed is calculated from position information obtained from GPS collars, and behaviors are classified. We quantify how long behaviors remain synchronized and create a graph that captures the relationships between cattle. Then we extract communities from the graph and analyze how they change over time. Focusing on community size, we discovered cases in which changes of condition, especially estrus, coincided with dynamic changes of the communities.
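
A sketch of the graph construction and community step, assuming pairwise synchrony durations have already been computed; the networkx modularity-based detector and the threshold are stand-ins, not necessarily the authors' choices.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def extract_communities(sync_minutes, threshold=30):
    """sync_minutes: dict {(cow_i, cow_j): minutes of synchronized behavior}."""
    g = nx.Graph()
    for (i, j), minutes in sync_minutes.items():
        if minutes >= threshold:                 # keep only strong synchrony
            g.add_edge(i, j, weight=minutes)
    return [set(c) for c in greedy_modularity_communities(g, weight="weight")]

# Running this per day and tracking community sizes over time exposes the
# dynamic changes the abstract relates to estrus.
```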

11:35
Cow estrus detection via Discrete Wavelet Transformation and Unsupervised Clustering

ABSTRACT. Estrus is a special period in the life cycle of female cows, during which they have a much higher chance of becoming pregnant. Successfully detecting this period increases the milk and meat productivity of the whole farm. Recently, a promising approach is unsupervised learning on motion data of the cows, similar to motion-based human activity recognition. In particular, an accelerometer is attached to the neck of each cow to measure its acceleration, and an unsupervised algorithm groups the measured acceleration time series. A recent study adopted bag-of-features and the Discrete Fourier Transform for feature extraction, yet these may not reflect the nature of motion data. Thus, we propose a method based on the Discrete Wavelet Transform to obtain multi-resolution features, Dynamic Time Warping as the clustering distance and Iterative K-Means as the clustering algorithm, to better match the characteristics of cows' movement. The proposed method demonstrates higher scores on a human activity recognition dataset with ground truth and more reliable predictions on the cow motion dataset.
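
The multi-resolution feature extraction can be sketched with PyWavelets: each acceleration window is decomposed and simple statistics of the coefficient bands form the feature vector. The wavelet choice and the statistics are illustrative assumptions.

```python
import numpy as np
import pywt

def dwt_features(window, wavelet="db4", level=3):
    """window: 1-D acceleration segment -> multi-resolution feature vector."""
    coeffs = pywt.wavedec(window, wavelet, level=level)
    feats = []
    for c in coeffs:                       # approximation + detail bands
        feats += [np.mean(c), np.std(c), np.sum(c ** 2)]
    return np.array(feats)

# The resulting features (or the band signals themselves) can then be grouped
# with an iterative K-Means that uses Dynamic Time Warping as its distance.
```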

11:55-13:15 Lunch at Epice Restaurant
13:15-14:00 Session 4: Keynote Talk: Teamwork in Multi-Robot Systems
Location: Lotus
13:15
Teamwork in Multi-Robot Systems

ABSTRACT. The increasing number of robots around us will soon create a demand for connecting these robots in order to achieve goal-driven teamwork in heterogeneous multi-robot systems. In this presentation we focus on the engineering viewpoint of robot teamwork. While the conceptual modelling of multi-agent teamwork has been studied extensively during the last two decades, related engineering concerns have not received the same degree of attention. Now is the time to change this because real robots are available and increasingly used in real applications. Our presentation has two parts: The analysis part discusses general design challenges that apply to robot teamwork in dynamic application domains. The constructive part presents existing engineering approaches in response to these challenges. Thus, we aim at creating awareness for the manifold challenges and dimensions of the design space, and we highlight characteristics of viable technical solutions. Finally, we present some open research questions that need to be tackled in future work.

14:00-14:40 Session 5: Industrial Talks
Location: Lotus
14:00
Speech Processing in Viettel Cyberspace Center

ABSTRACT. We first present our effort to collect a 500-hour corpus of Vietnamese speech. After that, various techniques such as data augmentation, recurrent neural network language model rescoring, language model adaptation, bottleneck features and system combination are applied to build the speech recognition system. Our final system achieves a low word error rate of 6.9% on the noisy test set.

14:20
BigData insights, Machine learning and AI in VCCORP

ABSTRACT. "We think the future of coding is no coding at all" - CEO Gitub Chris Wanstrath has predicted recently, opening many debate questions about future of Artificial Intelligence (AI). Will artificial intelligence replace humans?. It is highly possible. Nowaday, computer vision algorithms - automated translation, image recognition - have surpassed others in the industry even humans. AI technology improves human life, facilitating their working performance, thanks to the breakthroughs in computational technology with the rapid development of hardware (CPUs/GPUs). In this presentation, we will be disgusting AI platforms in VCCORP, the challenges and possibilities.

14:40-14:55 Coffee Break
14:55-16:15 Session 6A: Deep Learning for Computer Vision I
Location: Lotus 1
14:55
Pavement crack detection using convolutional neural network

ABSTRACT. Pavement crack detection is an important problem in road maintenance. Many methods, both traditional and modern, address this issue. Traditional methods use edge detection or other digital image processing techniques for crack detection, but these approaches are sensitive to many kinds of noise and unwanted objects on the road. To increase accuracy, many of these techniques require image pre-processing. Recently, some techniques that utilize deep learning to detect cracks in images have achieved high accuracy without pre-processing. However, some of them are very complicated, some make use of manually collected data and some still need some form of pre-processing. In this paper, we propose a method that applies a convolutional neural network to detect cracks in pavement images. Our research uses two data sets, one public and the other collected by ourselves. We also experimentally compare our method with some existing methods, and the experiments show that the proposed approach achieves high accuracy and generates stable models.
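
A minimal patch-level crack/no-crack CNN of the kind described, sketched in Keras. The layer sizes and input resolution are assumptions, not the authors' architecture.

```python
import tensorflow as tf

def build_crack_cnn(input_shape=(96, 96, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # crack vs. no crack
    ])

model = build_crack_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_patches, train_labels, epochs=10, validation_split=0.1)
```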

15:15
Efficient vehicle detector in traffic surveillance systems

ABSTRACT. Object detection is a major problem in computer vision. Recently, deep neural architectures have shown a dramatic boost in performance, but they are often too slow and burdensome for embedded and real-time applications such as video surveillance. In this paper, we describe a new object detection architecture that is faster than state-of-the-art detectors while improving the performance of small mobile models. Moreover, we apply this new architecture to the problem of vehicle detection, which is central to traffic surveillance systems. In more detail, our architecture uses an efficient backbone network, MobileNetV2, whose building blocks consist of depthwise convolutional layers. On top of this network, we build a feature pyramid using separable layers so that the model can detect objects at many scales. We train this network with a smooth localization loss and a weighted softmax loss in tandem with hard negative mining. Both training and test sets are built from recorded videos of Ho Chi Minh City and Da Nang traffic or selected from the DETRAC dataset. The experimental results show that our proposed solution achieves an mAP of 75% on the test set while using only around 3.4 million parameters and running at 100 ms per image on a cheap machine.
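
A sketch of the backbone-plus-pyramid idea in Keras: MobileNetV2 features topped with separable-convolution heads at several scales. Head depth, anchor count and class count are illustrative assumptions, not the authors' detector.

```python
import tensorflow as tf

def build_light_detector(num_anchors=6, num_classes=4, input_shape=(320, 320, 3)):
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)
    x = backbone.output
    outputs = []
    for _ in range(3):                                   # three pyramid scales
        x = tf.keras.layers.SeparableConv2D(128, 3, strides=2,
                                            padding="same", activation="relu")(x)
        cls = tf.keras.layers.SeparableConv2D(num_anchors * num_classes, 3,
                                              padding="same")(x)
        box = tf.keras.layers.SeparableConv2D(num_anchors * 4, 3,
                                              padding="same")(x)
        outputs += [cls, box]
    return tf.keras.Model(backbone.input, outputs)

# Training would pair a smooth-L1 localization loss with a weighted softmax
# classification loss and hard negative mining, as described in the abstract.
```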

15:35
Fully Residual Convolutional Neural Networks for Aerial Image Segmentation

ABSTRACT. Semantic segmentation of aerial imagery is one of the most essential tasks in the field of remote sensing, with potential applications ranging from map creation to intelligence services. One of the most challenging factors of this task is the very heterogeneous appearance of artificial objects like buildings and cars and of natural entities such as trees and low vegetation in very high-resolution digital images. In this paper, we propose an efficient deep learning approach for aerial image segmentation. Our approach utilizes a fully convolutional network (FCN) architecture based on a ResNet101 backbone with additional upsampling skip connections. Besides the typical color channels, we also use the DSM and normalized DSM (nDSM) as input data for our models. We achieve an overall accuracy of 91%, which is in the top 4 among 140 submissions from all over the world on the well-known Vaihingen dataset from the ISPRS 2D Semantic Labeling Contest. In particular, our approach yields better results than all state-of-the-art methods in the segmentation of car objects.
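
A hedged sketch of this kind of setup using torchvision's FCN-ResNet101: 6 classes for the ISPRS labeling task and a first convolution widened to accept colour + DSM + nDSM channels. The 5-channel layout and the untrained weights are illustrative choices, not the authors' exact model.

```python
import torch
import torchvision

# 6 classes in the ISPRS Vaihingen labeling task.
model = torchvision.models.segmentation.fcn_resnet101(num_classes=6)

# Replace the first convolution so it accepts 5 input channels
# (e.g. IR-R-G + DSM + nDSM) -- an assumption for illustration.
model.backbone.conv1 = torch.nn.Conv2d(5, 64, kernel_size=7, stride=2,
                                       padding=3, bias=False)

model.eval()
x = torch.randn(1, 5, 256, 256)          # dummy 5-channel tile
out = model(x)["out"]                    # (1, 6, 256, 256) per-class scores
```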

15:55
Recover Water Bodies in Multi-spectral Satellite Images with Deep Neural Nets

ABSTRACT. Due to limitations of optical satellites such as Landsat and MODIS, on days when the surface is covered by thick clouds the acquired images usually suffer from missing information and cannot be used, because nothing can be seen under the cloud cover. Many methods have been proposed to recover the missing data, but they reconstruct the image from one or more reference images, mostly by selecting similar parts or corresponding pixels to repair the damaged regions. Because weather is periodic, this research proposes a new approach for recovering damaged images that exploits this property. The main idea is to combine prediction and reconstruction techniques. For prediction, a time series of consecutive images is used to predict the next image, which then serves as the reference image for the reconstruction process.
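
At its simplest, the combine-prediction-and-reconstruction idea amounts to filling cloud-masked pixels with values from a predicted image. The mask convention and the source of the prediction below are assumptions for illustration.

```python
import numpy as np

def fill_cloudy_pixels(damaged, predicted, cloud_mask):
    """damaged, predicted: (H, W, bands) reflectance arrays;
    cloud_mask: boolean (H, W), True where clouds hide the surface."""
    restored = damaged.copy()
    restored[cloud_mask] = predicted[cloud_mask]   # take predicted values under clouds
    return restored

# 'predicted' would come from a model trained on a time series of previous
# acquisitions, exploiting the periodicity of weather mentioned in the abstract.
```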

14:55-16:15 Session 6B: Data Analytics II
Location: Lotus 2
14:55
High Accuracy Forecasting with Limited Input Data: Using FFNNs to Predict Offshore Wind Power Generation

ABSTRACT. This study proposes a Feed Forward Neural Network (FFNN) to forecast the renewable energy generation of offshore wind parks in Denmark. The neural network uses historical weather and power generation data for training and applies the learned pattern to forecast wind energy production. Furthermore, the study shows how to improve prediction quality by leveraging specific parameters, such as the location of the weather station. In addition, we examined various parameters of the network to improve accuracy. The proposed model distinguishes itself by the fact that an accuracy of more than 90 percent can be reached with training data of only a limited size. In the presented work we apply two months of data with hourly resolution.
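
A compact stand-in for the described feed-forward network using scikit-learn's MLPRegressor; the features, layer sizes, random data and split are placeholders, not the study's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: hourly weather features (e.g. wind speed, direction, temperature),
# y: measured power output of the wind park. Random placeholders here.
X = np.random.rand(1440, 4)          # roughly two months of hourly data
y = np.random.rand(1440)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
model.fit(X[:1200], y[:1200])            # train on the earlier hours
print(model.score(X[1200:], y[1200:]))   # R^2 on the held-out hours
```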

15:15
Combined Objective Function in Deep Learning Model for Abstractive Summarization

ABSTRACT. Abstractive summarization is a text generation task whose popular approaches build on the strength of Recurrent Neural Networks. To take advantage of Convolutional Neural Networks for text representation, we combine the two networks in our encoder to capture both global and local features of the input documents. Our model also integrates a reinforcement mechanism with a novel reward function to bring the learning and evaluation processes closer together. In experiments on CNN/Daily Mail, our model achieves significant results. In particular, on ROUGE-1 and ROUGE-L it outperforms previous work on this task with a notable improvement (39.09% ROUGE-L F1-score).
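
Combined objectives of this kind usually amount to mixing the maximum-likelihood loss with a reward-based reinforcement loss; the sketch below shows only that mixing step, with gamma as an assumed hyperparameter rather than the paper's value.

```python
# Illustrative mixing of the two training signals (tensor-valued losses).
def combined_loss(loss_ml, loss_rl, gamma=0.95):
    """loss_ml: token-level cross-entropy of the reference summary;
    loss_rl: -(reward) * log-probability of the sampled summary."""
    return gamma * loss_rl + (1.0 - gamma) * loss_ml
```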

15:35
From Helpfulness Prediction to Helpful Review Retrieval for Online Product Reviews

ABSTRACT. Nowadays, online product reviews are a valuable data source for customers in e-commerce. They provide customers with helpful details about a given product before the customers decide to purchase it. Nevertheless, if the e-commerce system returns too many reviews and they are not presented in a relevant manner, those reviews become cumbersome and time-consuming. In this paper, we define a helpful review retrieval task to support customers by returning a ranked list of reviews according to their helpfulness for the product of interest. For an effective solution to the task, we also propose a method with an enhanced list of features for review representation and a multiple linear regression model using the elastic net regularization method. Our method is comprehensive, examining the task in its entirety from helpfulness prediction to helpful review retrieval for online product reviews. Evaluated on a real-world Amazon data set of reviews about electronic devices, our method outperforms the others with the best values: 0.8 for the Normalized Discounted Cumulative Gain measure and 0.83 for the Accuracy measure. Such promising experimental results confirm the effectiveness of our method for the task.
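
A sketch of the prediction-then-ranking pipeline with scikit-learn's elastic-net regression; the feature matrix, targets and hyperparameters are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# X: review feature vectors (length, readability, sentiment, ...);
# y: observed helpfulness ratios. Random placeholders here.
X_train, y_train = np.random.rand(500, 20), np.random.rand(500)
X_product = np.random.rand(40, 20)            # reviews of one product

model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X_train, y_train)
scores = model.predict(X_product)
ranking = np.argsort(-scores)                 # most helpful reviews first
print(ranking[:10])
```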

15:55
Discord Discovery in Streaming Time Series based on an Improved HOT SAX Algorithm

ABSTRACT. In this paper, we propose an improved variant of the HOT SAX algorithm, called HS-Squeezer, for efficient discord detection in static time series. HS-Squeezer employs clustering rather than an augmented trie to arrange the two ordering heuristics in HOT SAX. Furthermore, we introduce HS-Squeezer-Stream, an application of HS-Squeezer in a framework for detecting local discords in streaming time series. The experimental results reveal that HS-Squeezer detects discords of the same quality as those detected by HOT SAX but with a much shorter run time. Furthermore, HS-Squeezer-Stream demonstrates fast response in handling time series streams, with quality local discords detected.
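
For context, the discord that HOT SAX (and hence HS-Squeezer) accelerates is the subsequence with the largest distance to its nearest non-overlapping neighbour. A brute-force version is sketched below; HOT SAX's heuristics essentially reorder these two loops to allow early abandoning, and this sketch is not the paper's algorithm.

```python
import numpy as np

def brute_force_discord(ts, m):
    """Return (index, distance) of the length-m discord in a 1-D series ts."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):                              # outer loop: candidates
        nn = np.inf
        for j in range(n):                          # inner loop: neighbours
            if abs(i - j) >= m:                     # skip overlapping matches
                d = np.linalg.norm(ts[i:i + m] - ts[j:j + m])
                nn = min(nn, d)
        if nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist
```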

16:15-16:30 Coffee Break
16:30-17:30 Session 7A: Deep Learning for Computer Vision II
Location: Lotus 1
16:30
Large Scale Fashion Search System with Deep Learning and Quantization Indexing

ABSTRACT. Recently, the problems of clothes recognition and clothing item retrieval have attracted a number of researchers due to their practical and potential value for real-world applications. The task is to automatically find relevant clothing items given a single user-provided image, without any extra metadata. Most existing systems mainly focus on clothes classification, attribute prediction, and matching the exact in-shop items with the query image. However, these systems do not address the latency period, i.e. the amount of time users have to wait from querying an image until the results are retrieved. In this paper, we propose a fashion search system that automatically recognizes clothes and suggests multiple similar clothing items with impressively low latency. Through extensive experiments, it is verified that our system outperforms all existing systems in terms of clothing item retrieval time.
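
The quantization-indexing part can be illustrated with FAISS's IVF-PQ index over deep image embeddings; the dimensions, cell counts and random data below are arbitrary, and the paper's own index may differ.

```python
import numpy as np
import faiss

d = 128                                                 # embedding size from the CNN
xb = np.random.rand(100_000, d).astype("float32")       # catalogue embeddings
xq = np.random.rand(1, d).astype("float32")             # query-image embedding

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 1024, 16, 8)     # 1024 cells, 16x8-bit codes
index.train(xb)
index.add(xb)
index.nprobe = 8                          # cells to visit: speed/recall trade-off
distances, ids = index.search(xq, 10)     # ten most similar clothing items
```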

16:50
Two-stream Deep Residual Learning with Fisher Criterion for Human Action Recognition

ABSTRACT. Action recognition is one of the most important areas in the computer vision community. Many previous works use a two-stream CNN model to obtain both spatial and temporal cues for the prediction task. However, the two streams are trained separately and combined later by late fusion. This strategy overlooks the interaction between spatial and temporal features. In this paper, we propose new two-stream CNN architectures that are able to learn the relation between the two kinds of features. Furthermore, they can be trained end-to-end with the standard backpropagation algorithm. We also introduce a Fisher loss that makes features more discriminative. The experiments show that the Fisher loss yields higher accuracy than using only the softmax loss.
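
One possible form of a Fisher-style regularizer: penalize within-class scatter relative to between-class scatter of the learned features and add it to the softmax loss. This is an illustrative formulation under that assumption, not necessarily the paper's exact loss.

```python
import torch

def fisher_loss(features, labels, eps=1e-6):
    """features: (N, D) batch embeddings; labels: (N,) integer class ids.
    Smaller value = tighter classes and wider class separation."""
    global_mean = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in labels.unique():
        fc = features[labels == c]
        mc = fc.mean(dim=0)
        within = within + ((fc - mc) ** 2).sum()
        between = between + fc.size(0) * ((mc - global_mean) ** 2).sum()
    return within / (between + eps)

# total_loss = softmax_loss + lambda_fisher * fisher_loss(features, labels)
```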

17:10
A New Framework For Crowded Scene Counting Based On Weighted Sum Of Regressors and Human Classifier

ABSTRACT. Crowd density estimation is an important task in surveillance camera systems, serving security, traffic, business, etc. At present, the trend of monitoring is moving from individuals to crowds, but traditional counting techniques are inefficient in this setting because of issues such as scale, cluttered backgrounds and occlusion. Most previous methods have focused on modeling work to accurately estimate the density map and thus infer the count. However, on non-human scenes containing clouds, trees, houses, seas, etc., these models are often confused, resulting in inaccurate count estimates. To overcome this problem, we propose the "Weighted Sum of Regressors and Human Classifier" (WSRHC) method. Our model consists of two main parts: human vs. non-human classification and count estimation. First of all, we build a Human Classifier, which filters out negative sample images (non-human images) before they enter the regressors. Then, the count estimation is based on the regressors, which differ in the size of their filters. The essence of this method is that the count depends on a weighted average of the density maps obtained from these regressors. This overcomes the defects of previous models: the Switching Convolutional Neural Network (Switch-CNN) selects the count as the output of only one of the regressors, and the Multi-Column Convolutional Neural Network (MCNN) combines the regressors with fixed weights, while our weights are adapted to each image. Our experiments show that our method outperforms Switch-CNN and MCNN on the ShanghaiTech and UCF_CC_50 datasets.
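
The inference logic reduces to a gate plus a weighted combination of density maps. The sketch assumes the classifier probability and per-image regressor weights are already produced by the networks; it is not the authors' implementation.

```python
import numpy as np

def wsrhc_count(is_human_prob, density_maps, weights, threshold=0.5):
    """is_human_prob: Human Classifier output for the image;
    density_maps: list of (H, W) maps from the regressors;
    weights: per-image weights (summing to 1) for those regressors."""
    if is_human_prob < threshold:          # non-human scene: report zero people
        return 0.0
    counts = np.array([m.sum() for m in density_maps])
    return float(np.dot(weights, counts))  # weighted average of regressor counts
```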

16:30-17:30 Session 7B: Data Analytics III
Location: Lotus 2
16:30
CitationLDA++: an extension of LDA for discovering topics in document network

ABSTRACT. Along with the rapid development of electronic scientific publication repositories, automatically identifying the topics of papers greatly helps researchers. The Latent Dirichlet Allocation (LDA) model is the most popular method for discovering hidden topics in texts based on the co-occurrence of words in a corpus. LDA achieves good results for long documents. However, article repositories usually store only the title and abstract, which are too short for LDA to work effectively. In this paper, we propose the CitationLDA++ model, which improves the performance of LDA in inferring the topics of papers based on the title and/or abstract together with citation information. The proposed model is based on the assumption that the topics of the cited papers also reflect the topics of the original paper. In this study, we divide the dataset into two sets: the first is used to build a prior knowledge source with the LDA algorithm, and the second is the training dataset used by CitationLDA++. During inference with Gibbs sampling, CitationLDA++ uses the topic distribution of the prior knowledge source and the citation information to guide the assignment of topics to words in the text. Using the topics of cited papers helps tackle the limited word co-occurrence of linked short texts. In experiments on the Aminer dataset, which includes the titles and/or abstracts of papers and citation information, CitationLDA++ achieves better perplexity than LDA without additional knowledge. The results suggest that additional knowledge from citation information can improve the ability of LDA to discover the topics of papers when their full content is not available.
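
As a rough, hedged approximation of the underlying intuition (not the CitationLDA++ inference itself), one can augment each short document with the tokens of its cited papers before running plain LDA with gensim:

```python
from gensim import corpora, models

def citation_augmented_lda(docs, citations, num_topics=20):
    """docs: {paper_id: list of tokens from title/abstract};
    citations: {paper_id: list of cited paper_ids}. A simplified baseline."""
    texts = [docs[p] + [t for q in citations.get(p, []) for t in docs.get(q, [])]
             for p in docs]                       # append cited papers' tokens
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    return models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
```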

16:50
Random ensemble oblique decision stumps for classifying gene expression data

ABSTRACT. Cancer classification using microarray gene expression data is known to contain keys for addressing the fundamental problems relating to cancer diagnosis and drug discovery. However, classifying gene expression data is a difficult task because these data are characterized by a high-dimensional space and small sample size. We investigate random ensembles of oblique decision stumps (RODS) based on linear support vector machines (SVMs), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms (called Bag-RODS and Boost-RODS) learn multiple oblique decision stumps by bagging and boosting to form an ensemble of classifiers that is more accurate than a single model. Numerical test results on 50 very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories show that our proposed algorithms are more accurate than state-of-the-art classification models, including k nearest neighbors (kNN), SVM, decision trees and ensembles of decision trees such as random forests, bagging and AdaBoost.
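
A small sketch of the bagging variant under stated assumptions: each "oblique decision stump" is a linear SVM trained on a bootstrap sample and a random subset of genes, and predictions are combined by majority vote. The subspace size and voting rule are illustrative, not the paper's exact settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

def bag_rods_fit(X, y, n_stumps=50, subspace=500, seed=0):
    """X: (n_samples, n_genes); y: non-negative integer class labels."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_stumps):
        rows = rng.integers(0, len(X), len(X))                    # bootstrap sample
        cols = rng.choice(X.shape[1], min(subspace, X.shape[1]), replace=False)
        clf = LinearSVC(max_iter=5000).fit(X[rows][:, cols], y[rows])
        stumps.append((cols, clf))
    return stumps

def bag_rods_predict(stumps, X):
    votes = np.stack([clf.predict(X[:, cols]) for cols, clf in stumps])
    # majority vote across stumps for each sample
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```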

17:10
Hot Topic Detection on Newspaper

ABSTRACT. Online newspapers are gradually replacing traditional ones, and the variety of articles motivates the need to capture hot topics and give Internet users a shortcut to hot news. A hot topic reflects people's real-life concerns and has a big impact not only on the community but also on business. Topic detection and tracking (TDT) algorithms have long been developed to discover surging patterns in specific data. In this paper, we propose a novel topic detection approach that applies Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on a Vector Space Model (VSM) to cope with noisy data, and the Pearson product-moment correlation coefficient (PMCC) on high-ranking keywords to identify the topics behind them. The proposed approach is evaluated on a dataset of ten thousand articles, and the experimental results are competitive in terms of precision with other state-of-the-art methods.
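
A compact sketch of the clustering step: TF-IDF vectors reduced to a dense space and clustered with the hdbscan library. The dimensionality-reduction step and all parameters are illustrative additions, not necessarily the authors' pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import hdbscan

def cluster_articles(articles, min_cluster_size=10):
    """articles: list of article texts -> cluster label per article (-1 = noise)."""
    tfidf = TfidfVectorizer(max_features=20000, stop_words="english")
    X = tfidf.fit_transform(articles)                 # sparse vector space model
    X_dense = TruncatedSVD(n_components=100).fit_transform(X)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    return clusterer.fit_predict(X_dense)

# High-ranking keywords within each cluster can then be correlated (PMCC)
# to identify the topic behind the cluster, as the abstract describes.
```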