SOICT 2018: THE NINTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY
PROGRAM FOR FRIDAY, DECEMBER 7TH

08:30-09:15 Session 8: Keynote Talk: Networking and Big Data: Challenges and Opportunities
Location: Lotus
08:30
Networking and Big Data: Challenges and Opportunities

ABSTRACT. Big Data is one of the hottest topics in the data science community. However, the fancy learning and mining applications would be impossible without support from the underlying layers, an area generally named Networking for Big Data or Big Data Networking. It is an emerging and attractive topic in the networking and communication communities. In this talk, we will first overview the current landscape of this energetic area, then present the unprecedented challenges in this new domain, and finally discuss the current research directions in the field. We humbly hope this talk will shed light for forthcoming researchers to further explore the uncharted parts of this promising land.

09:15-09:25 Coffee Break
09:25-10:25 Session 9A: IoT and Blockchain
Location: Lotus 1
09:25
Blockchain and Internet of Things Opportunities and Challenges

ABSTRACT. We are now living in a significant era in which technologies are advancing very quickly. Among leading technologies such as Artificial Intelligence, Machine Learning, Deep Learning, Blockchain, and Virtual Assistants, we can see that the Internet of Things, or IoT for short, is also standing at the peak of inflated expectations in the Gartner Hype Cycle. The application of IoT has evolved from home appliances in the smart home environment, such as smart TVs, smart refrigerators, and smart doors, and has grown into smart buildings, smart cities, smart healthcare, and smart industries. We can imagine that everything could go online to cut costs, add value to businesses, and increase user convenience. It has been predicted that there will be 50 billion IoT devices by 2020.

Along with this new revolution there have also been threats warning us that, if we do not take security into account, bad things can happen: for example, the Mirai botnet attacked the Dyn DNS service by using vulnerable IoT devices, turning smart devices into exploitable ones and making them the weakest link of the network. At the same time, another hype among technology communities, using Blockchain technology to secure the Internet of Things, is becoming mainstream.

Blockchain is not a silver bullet for securing the Internet of Things. In this paper we propose an architecture model that uses a blockchain, Hyperledger Fabric, to secure an Internet of Things network in a smart classroom scenario, integrated with The Things Network (TTN) cloud server, and we demonstrate the opportunities and challenges of this concept by evaluating it against security properties such as confidentiality, integrity, availability, and non-repudiation.

09:45
A survey on opportunities and challenges of Blockchain technology adoption for revolutionary innovation

ABSTRACT. In the “Industry 4.0” era, blockchain and related distributed ledger technologies have recently been an unmissable trend for both academia and industry. Blockchain technology became famous as the innovative technology underlying cryptocurrencies such as Bitcoin and the Ethereum platform. It has also been spreading across multiple industries, with organizations exploring its capabilities and new blockchain use cases springing up daily. Its emergence has had a great deal of impact on how information will be stored and processed securely. Furthermore, most advocates say that blockchain will disrupt and change everything from education to financial payments, insurance, intellectual property, and healthcare in the years to come. However, a comprehensive survey on the potential and issues of blockchain adoption in academia and industry has not yet been carried out. This paper conducts such a survey on blockchain technology adoption by discussing its influence as well as the opportunities and challenges of utilizing it in real-world scenarios.

10:05
UiTiOt v3: A Hybrid Testbed for Evaluation of Large-Scale IoT Networks

ABSTRACT. The 21st century promises to be the era of technology, standing out with the thriving of the Internet of Things (IoT). The rapid growth of the IoT leads to an increasingly urgent need for testbed systems. Taking a new step in our long-term research, in this paper we introduce the third version of our testbed, named UiTiOt v3. It is a hybrid system that puts virtual wireless nodes together with real physical devices to carry out a variety of IoT experiments. Utilizing the power of the emulation tool QOMET along with two-layer virtualization based on two state-of-the-art virtualization techniques (containers and OpenStack), our research works towards a large-scale testbed with great advantages in terms of reliability, availability, and cost-and-effort effectiveness. After giving details of the overall architectural design, we discuss two typical experiments, out of the vast number of test cases we have already run, to demonstrate the feasibility and show the potential of our system.

09:25-10:25 Session 9B: Software Engineering I
Location: Lotus 2
09:25
Personal Diary Generation from Wearable Cameras with Concept-Augmented Image Captioning and Wide Trail Strategy

ABSTRACT. Writing a diary is not only a hobby but also provides a personal lifelog for better analysis and understanding of a user's daily activities and events. However, in a busy society, people may not have enough time to record all their social interactions in a diary. This motivates our proposal to develop a ubiquitous system that automatically generates a daily text diary, using our novel method for image captioning on photos taken periodically by wearable cameras. We propose to incorporate common visual concepts extracted from a photo to enhance the details of the image description. We also propose a wide trail beam search strategy to enhance the naturalness of the text caption. Our captioning method improves the results on the MSCOCO dataset on four metrics: BLEU, METEOR, ROUGE-L, and CIDEr. Compared to the method proposed by Xu et al. and Karpathy's NeuralTalk, our model performs better on all four metrics. We also develop smart glasses and a prototype smart workplace in which people can have their personal diary generated from photos taken by the smart glasses. Furthermore, we apply a transformer machine translation model to translate the captions into Vietnamese. The results are promising and can serve Vietnamese users.

09:45
Walk Rally Application for Revitalizing Shopping Areas

ABSTRACT. We investigated whether shoutengai, old shopping districts in Japan, can be revitalized by solving the disparity in the number of customers among shoutengai according to their location, and the inequality in popularity among the stores within a shoutengai. We developed the ``store appeal value'', a unique index of a store's attractiveness, and built a walk rally application whose incentives are based on the store appeal value. To verify the effectiveness of the proposed application, we conducted a demonstration experiment involving three shoutengai in Shinjuku Ward, Tokyo. By analyzing the behavior logs obtained from the experiment, we confirmed that our store appeal value is appropriate. We were also able to show that it is possible to revitalize shoutengai by using our walk rally application, that is, to distribute customers among shoutengai and among the stores within a shoutengai.

10:05
DECOM: A framework to support evolution of IoT services

ABSTRACT. In the heterogeneous and dynamic Internet of Things (IoT), applications and services are frequently subject to change for various reasons, such as maintaining their functionality, reliability, availability, and performance. Detecting and communicating these changes is still performed manually by the responsible developers and administrators. Such a mechanism will no longer be adequate in the large-scale IoT environments of the future. Therefore, we present a comprehensive framework for the automatic detection and communication of changes (the DECOM framework). Here, we assume that the capabilities and interfaces of IoT devices are described and provided through REST services. To detect syntactic as well as semantic changes, we transform an extended version of the interface description into a logic program and apply a sequence of analysis steps to detect changes. The feasibility and applicability of the framework are demonstrated through a motivating scenario.

10:25-10:35 Coffee Break
10:35-11:35 Session 10A: Algorithms
Location: Lotus 1
10:35
Traveling Salesman Problem with Multiple Drones

ABSTRACT. Combining trucks and drones in delivering parcels has been an emerging research topic in recent years. In this paper, we investigate an extension of the TSP-D problem in which a truck travels with $m$ ($m>1$) drones (called TSP-$m$D) instead of the single drone in TSP-D. We adapt the greedy randomized adaptive search procedure (GRASP) proposed by Ha et al. and propose an adaptive large neighbourhood search (ALNS) heuristic for the resolution of this problem. Experimental results on different instances show that by combining a truck with more than one drone, the GRASP can produce more efficient solutions. Moreover, the ALNS is more efficient than the GRASP in this context.
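
As background for readers unfamiliar with GRASP, its construction phase can be sketched as a greedy randomized tour builder. This is a minimal illustration of the construction idea only (truck tour, no drones); the `alpha` parameter, distance matrix, and function names are illustrative assumptions, not the authors' implementation:

```python
import random

def grasp_tour(dist, alpha=0.3, seed=42):
    """GRASP construction phase, in sketch form: at each step pick
    the next city at random from a restricted candidate list (RCL)
    of the alpha-fraction closest unvisited cities."""
    rng = random.Random(seed)
    n = len(dist)
    tour = [0]                       # start at depot, city 0
    unvisited = set(range(1, n))
    while unvisited:
        cur = tour[-1]
        cand = sorted(unvisited, key=lambda c: dist[cur][c])
        rcl = cand[:max(1, int(alpha * len(cand)))]
        nxt = rng.choice(rcl)        # randomized greedy choice
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```

In full GRASP this construction is repeated with different random seeds and each tour is improved by local search; ALNS instead repeatedly destroys and repairs the incumbent solution.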

10:55
A dynamic programming algorithm for the maximum induced matching problem in permutation graphs

ABSTRACT. For a finite undirected graph G = (V, E) and a positive integer k ≥ 1, an edge set M ⊆ E is a distance-k matching if the pairwise distance of edges in M is at least k in G. The special case k = 2 has been studied under the name maximum induced matching (MIM for short), i.e., a maximum matching which forms an induced subgraph in G. MIM arises in many applications, such as artificial intelligence, game theory, computer networks, VLSI design, and marriage problems. In this paper, we design an O(n^2) algorithm for finding an MIM in permutation graphs, based on a dynamic programming method over edges with the aid of the sweep line technique. Our result improves on the best previously known algorithm.
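
For intuition, the distance-2 (induced) matching condition can be checked and optimized by brute force on tiny graphs. The sketch below is illustrative only; it does not reproduce the paper's O(n^2) sweep-line dynamic program, and the function names are our own:

```python
from itertools import combinations

def is_induced_matching(edges, matching):
    """M is an induced matching if its edges are pairwise disjoint
    and no graph edge joins endpoints of two distinct edges of M."""
    verts = [set(e) for e in matching]
    for a, b in combinations(verts, 2):
        if a & b:                       # edges share a vertex
            return False
    for u, v in edges:
        touched = [m for m in verts if u in m or v in m]
        if len(touched) >= 2 and {u, v} not in verts:
            return False                # edge bridges two M-edges
    return True

def max_induced_matching(edges):
    """Exponential brute force over edge subsets, largest first."""
    for k in range(len(edges), 0, -1):
        for sub in combinations(edges, k):
            if is_induced_matching(edges, list(sub)):
                return list(sub)
    return []
```

On the path 0-1-2-3 the maximum induced matching has size 1 (any two edges are joined by another edge), while on the path 0-1-2-3-4 the pair {0,1}, {3,4} is induced, giving size 2.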

11:15
Implementation of OpenMP Data-Sharing on CAPE

ABSTRACT. CAPE (Checkpointing-Aided Parallel Execution) is a framework that automatically translates and executes OpenMP programs on distributed-memory architectures based on checkpointing techniques. In some experiments, this approach has shown high performance on distributed-memory systems. However, it has not been fully developed yet. This paper presents an implementation of OpenMP data-sharing on CAPE that improves its capability, reduces checkpoint size, and makes CAPE perform even better.

11:35
CVSS: A Blockchainized Certificate Verifying Support System

ABSTRACT. Blockchain has shown its great potential with the success of Bitcoin. By using a decentralized peer-to-peer network together with a public and distributed ledger to replace the central authority, blockchain technology can go beyond financial transactions. In this paper, we propose an approach that utilizes blockchain technology to issue immutable digital certificates and overcome the limitations of existing certificate verifying systems by being faster, more trusted, and independent of a central authority. Our prototype has been successfully deployed for several short-term courses at the Center of Computer Engineering, HCMC University of Technology, Vietnam. This indicates that our proposed system is an appropriate solution adopting ICT for e-government, especially in certificate and diploma management.

11:45
Evaluating the security levels of the Web-Portals based on the standard ISO/IEC 15408

ABSTRACT. Evaluating the security level of a Web-Portal is an urgent need, but it has not yet received enough attention. The formal model of the standard ISO/IEC 15408 cannot be directly applied to Web-Portals due to the generality and abstraction of the model. This paper proposes a model for evaluating the security levels of Web-Portals based on the standard ISO/IEC 15408 that is highly feasible in practice.

10:35-11:55 Session 10B: Software Engineering II
Location: Lotus 2
10:35
Automated Large Program Repair based on Big Code

ABSTRACT. The task of automatic program repair is to automatically localize bugs and generate correct patches for them. A prominent approach is to produce a space of candidate patches, then find and validate candidates on test case sets. However, searching for the correct candidates is really challenging, since the search space is dominated by incorrect patches and its size is huge. This paper presents several methods to improve the automated program repair system Prophet, resulting in a system called Prophet+. Our approach contains three steps: 1) extract twelve relations of statements and blocks for a bi-gram model using Big Code, 2) prune the search space, and 3) develop an algorithm to re-rank candidate patches in the search space. The experimental results show that our proposed system significantly enhances the performance of Prophet, a state-of-the-art system. Specifically, for top-1 results, our system generates correct patches for 17 out of 69 bugs, while Prophet repairs 15.

10:55
Formal Verification of ALICA Multi-agent Plans using Model Checking

ABSTRACT. In multi-agent systems (MAS), plans consisting of sequences of actions are used to accomplish the team task. A critical issue for this approach is avoiding problems such as deadlocks and safety violations. Our recent work addresses that matter by verifying plans composed in a language called ALICA~(A Language for Interactive Cooperative Agents) that controls the agents' behavior. The investigation is conducted by creating a translation tool that implements an algorithm for translating ALICA plans into the format used by the real-time model checker UPPAAL. We tested our concept on several cases, and the results are promising for gaining further insight into multi-agent model checking.

11:15
A method for Automated User Interface Testing of Windows-based Applications

ABSTRACT. This paper proposes a method for automated user interface testing of Windows-based applications to increase the accuracy in identifying the target widgets and executing several interactions. The key idea of this method is to generate new test scenarios from widgets and a test specification, where widgets are extracted during the execution of the application and the test specification is generated by combining the interactions of widgets. Furthermore, the paper contributes some techniques to detect hidden widgets, which is considered one of the most challenging problems in user interface testing. A supporting tool has been implemented and tested with several industrial projects. The details of the experimental results are presented and discussed.

11:35
A Bayesian Critical Path Method for Managing Common Risks in Software Project Scheduling

ABSTRACT. Although project managers nowadays can use a range of tools and techniques to develop, monitor, and control project schedules, the task of creating project schedules is often very difficult since it has to deal with planning against uncertainty. Popular techniques for project scheduling are based on the assumption that projects are carried out as planned or scheduled, which hardly ever happens. This paper takes advantage of Bayesian Networks in modeling uncertainty and incorporates them in the Critical Path Method, one of the most popular means of monitoring project schedules. The paper also examines common risk factors in project scheduling and proposes a model of 19 common risk factors. A tool was built, and experiments were carried out to validate the model.
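
The deterministic Critical Path Method that the paper augments can be sketched as a forward and backward pass over the task graph. The Bayesian risk modeling itself is not reproduced here, and the task encoding below is an illustrative assumption:

```python
def critical_path(tasks):
    """Classic CPM over a task DAG.
    tasks: {name: (duration, [predecessor names])}.
    Returns (project duration, set of zero-slack critical tasks)."""
    ef, order = {}, []

    def earliest_finish(t):
        # Forward pass: a task starts when its last predecessor ends.
        if t not in ef:
            dur, preds = tasks[t]
            start = max((earliest_finish(p) for p in preds), default=0)
            ef[t] = start + dur
            order.append(t)          # records a topological order
        return ef[t]

    for t in tasks:
        earliest_finish(t)
    total = max(ef.values())

    # Backward pass: latest finish without delaying the project.
    lf = {t: total for t in tasks}
    succs = {t: [] for t in tasks}
    for t, (_, preds) in tasks.items():
        for p in preds:
            succs[p].append(t)
    for t in reversed(order):
        if succs[t]:
            lf[t] = min(lf[s] - tasks[s][0] for s in succs[t])

    critical = {t for t in tasks if lf[t] == ef[t]}  # zero slack
    return total, critical
```

In the paper's setting, fixed durations would be replaced by distributions conditioned on the 19 risk factors, so slack and criticality become probabilistic rather than exact.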

11:55-13:15 Lunch at Epice Restaurant
13:15-14:00 Session 11: Keynote Talk: Massive fungal biodiversity data re-annotation and visualization with multi-level clustering
Location: Lotus
13:15
Massive fungal biodiversity data re-annotation and visualization with multi-level clustering

ABSTRACT. With the availability of newer and cheaper sequencing technologies, genomic data are being generated at an increasingly fast pace. In spite of the high degree of complexity of currently available search routines, the massive number of sequences available virtually prohibits quick and correct identification of large groups of sequences sharing common traits. Hence, there is a need for clustering tools for automatic knowledge extraction to enable the curation of large-scale databases. Current sophisticated approaches to sequence clustering are based on pairwise similarity matrices. This is impractical for databases of hundreds of thousands of sequences, since such a similarity matrix alone would exceed the available computer memory. In this talk, I will present a new approach called MultiLevel Clustering (MLC) that avoids the majority of sequence comparisons and therefore significantly reduces the total runtime for clustering. An implementation of the algorithm allowed clustering of all 344,239 ITS (Internal Transcribed Spacer) fungal sequences from GenBank on a normal desktop computer within 22 CPU-hours, whereas the greedy clustering method took up to 242 CPU-hours. MLC has been applied to predict optimal thresholds to identify fungal species and higher taxa using the DNA barcode datasets generated at the Westerdijk Institute, and to reveal the most frequently sampled environmental sequence types that have been difficult to assign to meaningful taxonomic levels.

14:00-14:10 Coffee Break
14:10-15:50 Session 12A: Network Communication and Security
Location: Lotus 1
14:10
Data Redundancy Dynamic Control Method for High Availability Distributed Clusters

ABSTRACT. For the session control servers of carrier networks, scale-out session control server architectures that can flexibly control system performance have been studied. Network anomaly detection technology using autoencoders has also attracted attention; an autoencoder is a dimensionality reduction algorithm based on neural networks. We propose methods to recover from faults in a short time in highly available distributed clusters that use consistent hashing, in environments with network anomaly detection technology. The methods control data redundancy before serious server or network faults occur, using the anomaly detection technology. We evaluated three anomalous server selection methods by calculation and computer simulation. We also confirmed the feasibility of the dynamic data redundancy control methods by implementation and an operational experiment.
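
As background, a consistent-hash ring of the kind such clusters rely on can be sketched in a few lines; the virtual-node count, hash function, and class name below are illustrative assumptions, not the authors' design:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each server owns the arcs ending
    at its hash points, so removing a server remaps only the keys that
    server owned, which is what makes fast recovery possible."""

    def __init__(self, servers, vnodes=50):
        # Each server gets `vnodes` points on the ring for balance.
        self.ring = sorted(
            (self._h(f"{s}#{i}"), s)
            for s in servers for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _h(x):
        return int(hashlib.md5(x.encode()).hexdigest(), 16)

    def lookup(self, key):
        # First ring point clockwise from the key's hash (with wrap).
        i = bisect.bisect(self.points, self._h(key)) % len(self.points)
        return self.ring[i][1]
```

The defining property: if server "c" is removed, every key that was not mapped to "c" keeps its old server, since the surviving servers' ring points are unchanged.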

14:30
Design and Experimental Validation of SFC Monitoring Approach with Minimal Agent Deployment

ABSTRACT. Service Function Chaining (SFC) is a new flexible network service deployment model to efficiently address the overwhelming increase in demand for new services. An SFC consists of dynamically provisioned softwarized service functions (SFs), which are logically chained together to deliver a particular service. Software Defined Networking (SDN) simplifies the control and management of SFCs by centralizing the control plane, as it manages the SF links and controls the service flow traffic. Various critical functions in SFC, such as load balancing, fault management, and congestion avoidance, depend on an effective monitoring system. However, conventional monitoring approaches have high signaling cost due to the deployment of Monitoring Agents (MAs) in all SFs. In this paper, we present an SFC monitoring approach that reduces the signaling cost by deploying MAs in a minimum number of SFs. We propose an SF selection algorithm that identifies the faulty SF using an optimized set of SFs in which to deploy the MAs. We conduct testbed experiments to evaluate the effectiveness of our approach. The results show that our approach reduces the signaling cost by 59.2% compared with the conventional one. We further present the effect of various thresholds and data rates on the proposed SFC monitoring approach.

14:50
A Performance Study of Color-Based Caching in Telco-CDNs by Using Real Datasets
SPEAKER: Anh-Tu Tran

ABSTRACT. Telco-CDN can be considered a promising trend for significantly reducing future traffic by deploying content servers deeper inside the network of the Internet Service Provider (ISP), which helps network operators meet the rapid growth of Video-on-Demand (VoD) services. The recent color-based strategy has proven its impressive performance in utilizing simple color tags to effectively distribute contents across such a network. In this work, we verify the feasibility of this approach by reproducing the experiments on traces of real accesses at our server. Since all previous evaluations were performed on simulated data generated from a gamma distribution, which produces duplicated results when the experiments are re-run, utilizing the server access log lets us exploit real traces of daily data accesses to assess the color-based approach in various situations. The empirical results on our dataset show that color-based caching using 4 colors, coupled with its routing strategy, gives the best performance in most of our cases, among some other variants and the LFU caching approach. When 455 new contents are inserted, the hybrid feature helps the color-based approach improve the hit rate by 8.1% compared with a single cache area.
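
The LFU baseline that the color-based strategy is compared against can be sketched as follows. This is a minimal illustration only; the color tags and routing strategy of the paper are not reproduced:

```python
from collections import Counter

class LFUCache:
    """Minimal LFU cache: on overflow, evict the least-frequently
    requested resident content."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}          # resident contents
        self.freq = Counter()    # request counts, residents or not

    def request(self, key, value=None):
        self.freq[key] += 1
        if key in self.store:
            return True                          # cache hit
        if len(self.store) >= self.capacity:
            # Evict the coldest resident content.
            victim = min(self.store, key=lambda k: self.freq[k])
            del self.store[victim]
        self.store[key] = value
        return False                             # cache miss
```

With capacity 2, requesting a, a, b, c evicts b (frequency 1) rather than a (frequency 2), so a subsequent request for a still hits.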

15:10
Online estimation for Packet Loss Probability of MMPP/D/1 queuing by Importance Sampling

ABSTRACT. In this paper, we propose a new method to estimate the packet loss probability of an MMPP/D/1 queuing system by Importance Sampling (IS). In order to estimate the rare event, we do not increase the arrival rate of the traffic; instead, we decrease the service rate of the queue. We then implement our algorithm and compare the accuracy and simulation time of our experiments with the Monte Carlo (MC) method and a conventional IS method.
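
The importance-sampling idea, estimating a rare-event probability by sampling from a modified distribution and reweighting each sample by the likelihood ratio, can be illustrated on a toy problem. The sketch below estimates P(X > t) for an exponential random variable rather than the MMPP/D/1 loss probability; all names and parameters are illustrative:

```python
import math
import random

def is_estimate(n=100_000, t=10.0, rate=1.0, prop_rate=0.2, seed=1):
    """Estimate P(X > t) for X ~ Exp(rate) by sampling from a
    heavier-tailed proposal Exp(prop_rate) and reweighting by the
    likelihood ratio p(x)/q(x). Making the rare region common is
    analogous to the paper's trick of lowering the service rate."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.expovariate(prop_rate)       # draw from the proposal
        if x > t:                            # rare event indicator
            lr = (rate * math.exp(-rate * x)) / \
                 (prop_rate * math.exp(-prop_rate * x))
            acc += lr
    return acc / n
```

The exact value here is exp(-t) ≈ 4.5e-5; plain Monte Carlo with the same sample budget would see only a handful of hits, while the IS estimate concentrates tightly around the true value.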

15:30
Techniques for improving performance of the CPR-based approach

ABSTRACT. TCP-targeted low-rate distributed denial-of-service (LDDoS) attacks have created an opportunity for attackers to reduce their total attacking rate (and hence the detection probability of the attacks) while inflicting the same damage on TCP flows as traditional flooding-based DDoS attacks. The CPR-based approach was proposed by Zhang et al. to detect and filter this kind of DDoS attack, but its performance in terms of TCP throughput under attack is shown to be limited by the way it calculates the CPR for each flow. In this paper, we propose some modifications to the CPR-based approach in order to increase its performance. Simulation results show that the modifications can increase performance significantly.

14:30-15:50 Session 13: ICT Solution for E-Governance I
Location: Lotus 2
14:30
NADM: Neural Network for Android Detection Malware

ABSTRACT. Over recent years, Android has consistently captured roughly 80\% of the worldwide smartphone volume. Due to this popularity and its open character, Google's OS has become the system platform most targeted by mobile malware, which can cause a lot of damage on Android devices, such as data loss or sabotage of hardware. To deal with the danger and complexity of this malware, machine learning is one of the best methods to choose. In this paper, we propose an effective Android malware classification method called NADM, which applies a neural network model when building the detection model. First, we perform a static analysis to gather several features of an Android application. Then, these data are embedded into a joint vector space, which serves as the input for the training part of the deep learning process. The classifier model can be used to build a high-accuracy system.

14:50
A Website Defacement Detection Method Based on Machine Learning Techniques

ABSTRACT. Website defacement attacks have been one of the major threats to websites and web portals of private and public organizations. The attacks can cause serious consequences for website owners, including interrupting website operations and damaging the owner’s reputation, which may lead to big financial losses. A number of techniques have been proposed for website defacement monitoring and detection, such as checksum comparison, diff comparison, DOM tree analysis, and complex algorithms. However, some of them only work on static web pages and the others require extensive computational resources. In this paper, we propose a machine learning-based method for website defacement detection. In our method, machine learning techniques are used to build classifiers (a detection profile) for classifying pages into either the Normal or the Attacked class. As the detection profile can be learned from training data, our method works well for both static and dynamic web pages. Experimental results show that our approach achieves a high detection accuracy of over 93% and a low false positive rate of less than 1%. In addition, our method does not require extensive computational resources, so it is practical for online deployment.

15:10
Spatial Decision Tree Analysis to Identify Location Pattern

ABSTRACT. Jakarta has become a megacity with elaborate service network activities. Fast food restaurants, as a type of food service provider, have a role in supporting urban lifestyles. Despite the growth in value and transaction volume, some fast food categories in Indonesia have a negative percentage of outlet growth. In general, the locations of fast food restaurants fall into two categories: stand-alone restaurants, and restaurants located in other public facilities such as malls, supermarkets, and market areas. According to Tobler's first law, closer public facilities will have activity relatedness. This study aims to examine whether proximity between fast food restaurant locations and other public facilities affects the category of fast food restaurants, using a spatial decision tree analysis approach. The public facilities examined for proximity to fast food restaurants comprise 11 criteria, which are considered to have a co-location pattern according to previous research results. The results will be spatial characteristics of public facilities that are expected to be indicators of consumer movement behavior, especially from and to fast food restaurants.

15:30
Detecting Attacks on Web Applications using Autoencoder

ABSTRACT. Web attacks have become a real threat to the Internet. This paper proposes the use of an autoencoder to detect malicious patterns in HTTP/HTTPS requests. The autoencoder is able to operate on the raw data and thus does not require hand-crafted features to be extracted. We evaluate the original autoencoder and its variants and end up with the Regularized Deep Autoencoder, which achieves an F1-score of 0.9463 on the CSIC 2010 dataset. It also performs better than the OWASP Core Rule Set and other one-class methods reported in the literature. The Regularized Deep Autoencoder is then combined with ModSecurity in order to protect a website in real time. This combination proves to be comparable to the original ModSecurity in terms of computation time and is ready to be deployed in practice.

14:55-15:50 Session 14: Short presentations
Location: Jasmine
14:55
Aspect Based Sentiment Analysis Using NeuroNER and Bidirectional Recurrent Neural Network

ABSTRACT. Nowadays, understanding the sentiment of what customers say, think, and review plays an important part in the success of every business. Consequently, sentiment analysis (SA) has become a vital area from both academic and commercial standpoints in recent years. However, most current sentiment analysis approaches only focus on detecting the overall polarity of a whole sentence or paragraph. That is why this work presents another approach to this task: Aspect Based Sentiment Analysis (ABSA). My proposed ABSA has two main phases: aspect term extraction and aspect sentiment prediction. The first phase, a named-entity recognition (NER) task, is performed by reusing the NeuroNER program without any modifications, because it is currently one of the best NER tools available. For the sentiment prediction task, a bidirectional gated recurrent unit (Bi-GRU) Recurrent Neural Network (RNN) model is implemented that processes four features as input: word embeddings, SenticNet, part of speech, and distance. However, because this network architecture's performance on the SemEval 2016 dataset showed some drawbacks and limitations that influenced the polarity prediction results, this work proposes some adjustments to the model to solve the current problems and improve the accuracy of the second task.

15:00
An Effective Ensemble Deep Learning Framework for Malware Detection

ABSTRACT. Malware (malicious software) is any program or file that brings harm to a computer system. Malware includes computer viruses, worms, trojan horses, rootkits, adware, ransomware, and spyware. Due to the explosive growth in the number and variety of malware, the demand for improving automatic malware detection has increased. Machine learning approaches are a natural choice for this problem, since they can automatically discover hidden patterns in large-scale datasets to distinguish malware from benign software. In this paper, we propose different deep neural network architectures, from simple to advanced ones. We then fuse hand-crafted and deep features, and combine all models together into an overall effective ensemble framework for malware detection. The experimental results demonstrate the efficiency of our proposed method, which is capable of detecting malware with an accuracy of 96.24% on our large real-life dataset.

15:05
Vietnamese Speaker Authentication Using Deep Models

ABSTRACT. Speaker authentication is the identification of a user from voice biometrics and has a wide range of applications such as banking security, human-computer interaction, and ambient authentication. In this work, we investigate the effectiveness of acoustic features such as Mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), and Linear Predictive Codes (LPC) extracted from audio streams for constructing feature spectral images. In addition, we propose to use deep Residual Network models for user verification from feature spectrum images. We evaluate our proposed method under two settings on a dataset collected from 20 Vietnamese speakers. The results, with an Equal Error Rate of around 4\%, demonstrate the feasibility of Vietnamese speaker authentication using deep Residual Network models trained with GFCC spectral feature images.

15:10
Development of a Vietnamese Large Vocabulary Continuous Speech Recognition System under Noisy Conditions

ABSTRACT. In this paper, we first present our effort to collect a 500-hour corpus of Vietnamese read speech. After that, various techniques such as data augmentation, recurrent neural network language model rescoring, language model adaptation, bottleneck features, and system combination are applied to build the speech recognition system. Our final system achieves a low word error rate of 6.9% on the noisy test set.

15:15
Fingerprint Recognition using Gabor wavelet in MapReduce and Spark

ABSTRACT. Fingerprint recognition is one of the most popular biometric recognition methods nowadays. It is applicable in many areas, including time recorder systems, criminal tracking, authentication, and system security. However, one of the challenges for current traditional methods is their dependence on minutiae extraction and recognition time; consequently, these methods are not effective for recognition in large-data environments. In addition, the processing of the input images is very important for improving the accuracy of the recognition process. The MapReduce technique is used for exploring and analyzing large data that cannot be processed with classical techniques due to constraints on computer resources such as processing capability and memory. We performed parallel feature extraction and recognition with the MapReduce model in a Spark environment. We have also compared the accuracy and runtime of our method before and after using MapReduce in Spark. The experimental results show that the proposed method achieves automatic and effective fingerprint recognition.
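
The MapReduce pattern used here can be illustrated with a tiny pure-Python map/reduce over feature counts. Gabor feature extraction and the Spark deployment are not reproduced; the records below are stand-in strings, and the function names are our own:

```python
from collections import Counter
from functools import reduce

def mapper(record):
    """Map step: emit (feature, count) pairs for one record.
    Here each character stands in for an extracted feature."""
    return Counter(record)

def reducer(a, b):
    """Reduce step: merge two partial count tables."""
    return a + b

def map_reduce(records):
    # In Spark, map() would run across partitions in parallel and
    # reduce() would merge the partial results; here both run locally.
    return reduce(reducer, map(mapper, records), Counter())
```

For example, `map_reduce(["ab", "ba", "a"])` merges three partial counts into a single table with three occurrences of "a" and two of "b".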

15:20
Refined Cattle Detection Using Composite Background Subtraction and Brightness Intensity from Bird’s Eye Images
SPEAKER: Mami Aotani

ABSTRACT. Breeding cattle are known to be social animals that form groups, much as humans do. Focusing on this sociality, this paper aims to grasp and predict the condition of breeding cattle by detecting the interactions between them. To detect such interactions, it is necessary to follow the cattle's behavior and examine how they approach each other. In this study, the positions and movements of breeding cattle are detected from bird's-eye images. In a preceding study, cattle were detected by a background subtraction method using multiple background images, because cattle have few distinctive visual features. However, to cope with changing brightness, that method used background images from which the cattle may not have been completely removed, which can cause detection errors. Moreover, selecting the optimal background image for an input image can consume a large amount of time. We therefore propose a method that extends the preceding study with composite background images and a brightness-based reduction of the search images. A composite background image is obtained by overwriting the cattle regions with patches from other images, thereby removing the cattle from the background. We expect that a composite background free of cattle regions improves detection accuracy, and that reducing the candidate background images by brightness shortens the time needed to select an optimal one. In the experiment, precision and processing time are compared with and without the composite background image and the brightness-based reduction of search images. The results confirm that the proposed method improves detection accuracy and shortens processing time.
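The core idea of a composite background (a background image with the moving animals removed) can be sketched with a pixel-wise median over frames: a moving animal occupies each pixel in only a minority of frames, so the median recovers the background. The synthetic frames and threshold below are illustrative, not the paper's data or method details:

```python
import numpy as np

def composite_background(frames):
    # Pixel-wise median across frames removes regions that are only
    # transiently occupied (the moving cattle).
    return np.median(np.stack(frames), axis=0)

def detect_foreground(frame, background, threshold):
    # Classic background subtraction: large absolute difference = foreground.
    return np.abs(frame - background) > threshold

rng = np.random.default_rng(1)
base = rng.random((32, 32)) * 0.1            # static pasture texture
frames = []
for k in range(7):
    f = base.copy()
    f[5:10, 4 * k:4 * k + 5] += 0.8          # a bright "cow" moving left to right
    frames.append(f)

bg = composite_background(frames)            # cow regions are median-ed away
mask = detect_foreground(frames[3], bg, threshold=0.4)
```

In frame 3 the synthetic cow covers a 5x5 patch, and only that patch survives thresholding against the composite background.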

15:25
Prediction and Portfolio Optimization in Quantitative Trading Using Machine Learning Techniques

ABSTRACT. Quantitative trading is an automated trading system in which trading strategies and decisions are conducted by a set of mathematical models. Quantitative trading applies a wide range of computational approaches, such as statistics, physics, and machine learning, to analyze, predict, and take advantage of big financial data for investment. This work presents the basic idea and core components of a quantitative trading system. Machine learning offers a number of important advantages over traditional algorithmic trading: multiple trading strategies can be implemented consistently and can adapt to the real-time market. To demonstrate how machine learning techniques can serve quantitative trading, linear regression and support vector regression models are used to predict stock price movement. In addition, multiple optimization techniques are employed to optimize return and control risk in trading. Both prediction models perform effectively in short-term prediction with high accuracy and return; however, the linear regression model outperforms support vector regression in short-term prediction. The prediction accuracy is considerably improved by adding technical indicators to the dataset rather than using only adjusted price and volume. Despite the gap between prediction modeling and actual trading, the proposed trading strategy achieved a higher return than the S&P 500 ETF (SPY).
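A minimal version of one of the two predictors mentioned (linear regression on lagged prices plus a technical indicator) can be sketched with numpy least squares. The synthetic random-walk prices, lag count, and moving-average indicator are illustrative assumptions, not the paper's dataset or feature set:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily close prices: a drifting random walk as a stand-in for real data.
returns = 0.0005 + 0.01 * rng.standard_normal(500)
price = 100 * np.cumprod(1 + returns)

def make_dataset(price, lags=5):
    # Features: the last `lags` prices plus a simple moving-average indicator;
    # target: the next day's price.
    X, y = [], []
    for t in range(lags, len(price) - 1):
        window = price[t - lags:t]
        X.append(np.concatenate([window, [window.mean()]]))
        y.append(price[t + 1])
    return np.array(X), np.array(y)

X, y = make_dataset(price)
split = int(0.8 * len(X))                       # chronological train/test split

# Ordinary least squares with an intercept column.
A = np.hstack([X[:split], np.ones((split, 1))])
w, *_ = np.linalg.lstsq(A, y[:split], rcond=None)

A_test = np.hstack([X[split:], np.ones((len(X) - split, 1))])
pred = A_test @ w
mae = np.abs(pred - y[split:]).mean()           # one-step-ahead error
```

A chronological (not shuffled) split matters here: shuffling would leak future prices into training.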

15:30
Intelligent Assistants in Higher-Education Environments: The FIT-EBot, a Chatbot for Administrative and Learning Support

ABSTRACT. The purpose of this paper is to discuss smart learning environments and present the FIT-EBot, a chatbot that automatically replies to students' questions about the services provided by the education system on behalf of the academic staff. The chatbot can play the role of an intelligent assistant, offering higher-education institutions solutions to improve their current services, reduce labor costs, and create new innovative services. It takes into account the context of each learner, personalizes its responses accordingly, and focuses on individual support and interactive conversation. Moreover, the FIT-EBot encourages learners to engage more, promoting active learning. Various artificial intelligence techniques, such as text classification and named entity recognition, are used in this work to enhance system performance.
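The text-classification component of such a chatbot typically maps a student utterance to an intent. Here is a tiny nearest-centroid bag-of-words sketch of that step; the intents and training sentences are made up for illustration and are not FIT-EBot's data or model:

```python
import numpy as np
from collections import Counter

# Hypothetical intent-labeled training utterances.
train = [
    ("when is the exam for database systems", "ask_schedule"),
    ("what time does the algorithms class start", "ask_schedule"),
    ("how do I register for a course", "ask_registration"),
    ("where can I submit my enrollment form", "ask_registration"),
    ("what is my current GPA", "ask_grades"),
    ("show me my grade for machine learning", "ask_grades"),
]

vocab = sorted({w for text, _ in train for w in text.split()})
labels = sorted({lab for _, lab in train})

def vectorize(text):
    # Bag-of-words count vector over the training vocabulary (OOV words ignored).
    counts = Counter(text.split())
    return np.array([counts[w] for w in vocab], dtype=float)

# Nearest-centroid classifier: one mean vector per intent.
centroids = {}
for lab in labels:
    vecs = [vectorize(t) for t, l in train if l == lab]
    centroids[lab] = np.mean(vecs, axis=0)

def classify(text):
    v = vectorize(text)
    return min(centroids, key=lambda lab: np.linalg.norm(v - centroids[lab]))

intent = classify("what time is the database exam")
```

A production system would add named entity recognition on top (e.g., extracting the course name "database") to fill the slots of the detected intent.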

15:35
Overall Structural System Solution for Supporting Services and Tourists Management Oriented on Smart City in Viet Nam

ABSTRACT. Recently, "smart tourism" has emerged as a term describing the application in tourism of technological advances that rely on sensors, big data processing techniques, and new methods for connecting and exchanging information (such as IoT, RFID, and NFC). When these technologies are utilized, digital data become practical and valuable products. At the same time, new management tools for the government, new business opportunities for travel agencies, and new experiences for tourists are created. In this paper, a model for developing sustainable and intelligent tourism is studied, and an overall architecture model of an information system providing supporting services and tools for visitor management (Smart Tourism Service Centre, STSC) is proposed to foster smart cities in Vietnam in general and Danang in particular.

15:40
Migrating Vietnam Offshore into Agile

ABSTRACT. Agile and offshoring are emerging as two prominent trends in the software industry. Unfortunately, offshoring can impact the results of Agile projects, and Agile practices often cause difficulties for offshore development centers. This paper outlines the problems of migrating offshore development centers to the Agile process, analyzed from the perspective of the largest IT outsourcing company in Vietnam. The paper also details an improved model based on the Scrum framework, the most popular Agile implementation according to VersionOne's 2017 State of Agile report. The purpose of this model is to minimize the negative impacts of Agile on offshore development centers and vice versa. Finally, the paper presents the result of applying the model to a "focal" project under the author's management.

15:45
The Miniaturized IoT Electronic Nose Device and Sensor Data Collection System for Health Screening by Volatile Organic Compounds Detection from Exhaled Breath

ABSTRACT. The recent convergence of ICT and biotechnology has increased the number of areas in which machines take over tasks previously done by people. Small medical electronic devices can check a person's health condition with a simple test and determine whether bio-signals are abnormal enough to advise medical treatment in a hospital. The role of such health screening devices is not to diagnose disease precisely but to check bio-signals roughly. Conventional health screening devices draw blood samples to measure specific blood components, but invasive blood sampling is painful and burdensome for the patient. Breath analysis, being non-invasive, provides a comfortable and easy health screening method, but it is difficult to use because of its complex breath sampling procedures, large system volume, and the sensitive characteristics of gas sensors. We designed a smartphone-sized miniaturized electronic nose system and constructed a database system to derive novel rules from multi-sensor data. An experiment applying the electronic nose system to actual diabetic patients confirmed the possibility of distinguishing patients with the disease. As big data are collected, various artificial intelligence algorithms can be applied to find more accurate health screening methods.

15:50-16:05 Coffee Break
16:05-17:25 Session 15A: ICT Solution for E-Governance II
Location: Lotus 2
16:05
1-1 Fingerprint Enrollment and Verification Using a Two-Step Minutia-Based Matching Algorithm

ABSTRACT. In our previous work, we introduced a hybrid fingerprint matcher consisting of two stages: a local minutiae matching stage and a consolidation stage. To improve the accuracy of the former stage, in this paper we suggest characterizing each minutia by an additional feature representing how distinguishable it is from the other minutiae in the fingerprint. By utilizing the discriminability of each minutia in the calculation of the local similarity score between two minutiae, the performance of the local matching stage is improved significantly. Thereby, an increase in the accuracy of the whole matching algorithm of 0.33% in EER and 0.51% in FMR1000 over the previous work now makes our matcher rank 2nd on the FVC2002-DB2A leaderboard.

16:25
An Efficient Parallel Algorithm for Computing the Closeness Centrality in Social Networks

ABSTRACT. Closeness centrality is a substantial metric used in large-scale network analysis, in particular of social networks. Computing the closeness centrality from a vertex to all other vertices in a graph is a problem of high complexity. Prior work has focused strongly on the algorithmic aspect of the problem, and little attention has been paid to the data structures supporting the implementation. In this paper we present an efficient algorithm to compute the closeness centrality of all nodes in a social network. Our algorithm is based on (i) an appropriate data structure that increases the cache hit rate, reducing the time spent accessing main memory for graph data, and (ii) an efficient, parallel, complete BFS search that reduces the execution time. We tested the performance of our algorithm, named BigGraph, on five real-world social networks and compared it to current approaches including TeexGraph and NetworKit. Experimental results show that BigGraph is 1.27-2.12 times faster than TeexGraph and 14.78-68.21 times faster than NetworKit.
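For reference, the quantity being computed is classic closeness centrality: one BFS per source vertex, with (n-1) divided by the sum of shortest-path distances. The sketch below is a plain sequential version on a toy graph; since each source BFS is independent, a parallel implementation simply partitions the sources across workers (the paper's cache-friendly data layout is not modeled here):

```python
from collections import deque

def bfs_distances(adj, source):
    # Standard unweighted BFS from `source`.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    # Classic closeness: (n - 1) / sum of shortest-path distances.
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0

# Toy star graph: node 0 is adjacent to every other node.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# Each entry of this dict is an independent BFS, so in a parallel setting
# every worker can take its own subset of source vertices.
scores = {v: closeness(adj, v) for v in adj}
```

On the star graph the center reaches every node in one hop, so it gets the maximal closeness of 1.0.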

16:05-17:25 Session 15B: Computer Vision and Pattern Recognition
Location: Lotus 1
16:05
Inspecting rice seed species purity on a large dataset using geometrical and morphological features

ABSTRACT. Although there is great interest in developing automatic machines for classifying rice seed varieties, it is still unclear whether differences in the performance of existing techniques come from better feature descriptors or from varying inter-class and intra-class similarity among the examined species. In this paper, we present a novel method for inspecting the purity of rice seed species on the largest rice species dataset to date. The proposed method utilizes both morphological and geometrical features extracted from high-resolution RGB images. In particular, we apply relevant pre-processing techniques so that the collected seeds are normalized by their biological structure. As a consequence, the geometrical features of local parts of a seed can be measured precisely. In addition, whereas existing methods examine only a limited number of species, we construct a dataset with a much larger number of species. With a sufficient number of species, we can analyze how classification performance depends on the similarity (or distinguishability) of the species and on the types of extracted features. Our evaluations confirm that both morphological and geometrical features are informative, and combinations of them achieve the highest performance. Extensive evaluations over several classifier schemes, as well as several sub-datasets with varying species similarity, confirm the stability and feasibility of the proposed method.

16:25
A Binarization Method for Extracting High Entropy String in Gait Biometric Cryptosystem

ABSTRACT. Inertial-sensor-based gait has been considered a promising approach for user authentication on mobile devices. However, securing the gait template of the enrolled user in such a system remains a challenging task. Biometric Cryptosystems (BCS) provide elegant approaches to this matter. The primary task in adopting a BCS is to extract from raw biometric data a discriminative, high-entropy, and stable binary string, which is used as the input of the BCS. Unfortunately, the state-of-the-art gait-based BCS does not consider the distribution of gait features when extracting this string. Thus, the extracted binary string has low entropy, which degrades the overall system security.

In this study, we address the aforementioned drawback by improving the entropy of the extracted string used as input to the BCS. Specifically, we design a binarization scheme in which the population distribution of gait features is analyzed and utilized so that the extracted binary string achieves maximal entropy, thereby improving the security of the gait cryptosystem. In addition, the binarization is designed to tolerate variation, producing a highly stable binary string that enhances system friendliness. We evaluated the proposed method on a gait dataset of 38 volunteers collected under near-realistic conditions. The experimental results show that our binarization method improved the entropy of the extracted binary string by 30%, and the system achieved competitive performance (i.e., 0.01% FAR and 9.5% FRR with a 139-bit key).
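One standard way to use a feature distribution to maximize per-bit entropy, sketched below on synthetic data (the paper's actual scheme may differ), is to threshold each feature at its population median: each output bit is then 0 or 1 with probability ~0.5, i.e. close to one full bit of entropy, even when the raw features are heavily skewed:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical gait feature matrix: rows = samples, columns = features,
# deliberately skewed so a naive fixed threshold would yield biased bits.
population = rng.lognormal(mean=1.0, sigma=0.5, size=(1000, 16))

# Median binarization: one threshold per feature, learned from the population.
thresholds = np.median(population, axis=0)

def binarize(sample, thresholds):
    # One bit per feature: 1 if the feature exceeds its population median.
    return (sample > thresholds).astype(int)

bits = np.array([binarize(s, thresholds) for s in population])
p1 = bits.mean(axis=0)                                  # per-bit P(bit = 1)
entropy = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))  # bits per position
```

With the median threshold every bit position is near-equiprobable, so its Shannon entropy is close to the maximum of 1 bit.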

16:45
Frozen Shoulder Rehabilitation: Exercise Simulation and Usability Study

ABSTRACT. Frozen shoulder treatment is normally a time-consuming process: continual physical therapy is required for a patient to recover gradually over time. With the advent of mobile technology, an increasing number of smartphone applications are being developed to facilitate tele-rehabilitation. In this study, we incorporate animation that simulates arm movement in various exercise types via a mobile app, augmenting the use of biofeedback data in the treatment process. The main contribution of this paper is simulating the frozen shoulder exercises using a Unity 3D model. Patients can exercise at home by attaching the smartphone to the shoulder with an armband and sending the data to the physiotherapist, without waiting in a long queue at the clinic to see the practitioner. The results indicate that our mobile app and web dashboard help physiotherapists easily monitor and manage a patient's rehabilitation remotely.

17:05
A new assessment of cluster tendency ensemble approach for data clustering

ABSTRACT. Assessment of cluster tendency is a method for determining whether a given dataset contains meaningful clusters. Recently, such methods have attracted interest as support for determining the number of clusters in data. Their advantages are accuracy and few parameters, but they are limited in data size and processing speed. In this paper, we propose a new method that integrates the results of assessing the cluster tendency of data subsets, which we call the ensemble assessment of cluster tendency method (eSACT). Experiments were conducted on synthetic datasets and color images. The proposed algorithm exhibited high performance, reliability, and accuracy compared to previous cluster tendency assessment algorithms.
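For readers unfamiliar with cluster tendency assessment, a classic instance of the idea is the Hopkins statistic (sketched below on synthetic data; this is a standard textbook method, not the eSACT algorithm of the paper): it compares nearest-neighbour distances of uniform probe points against those of sampled data points, yielding values near 1 for clustered data and near 0.5 for structureless data.

```python
import numpy as np

def hopkins(X, m=None, rng=None):
    # Hopkins statistic: u = probe-to-data NN distances, w = data-to-data
    # NN distances; H = sum(u) / (sum(u) + sum(w)).
    rng = np.random.default_rng(rng)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)
    probes = rng.uniform(lo, hi, size=(m, d))
    idx = rng.choice(n, size=m, replace=False)

    def nn_dist(points, exclude_self):
        dists = []
        for i, p in enumerate(points):
            delta = np.linalg.norm(X - p, axis=1)
            if exclude_self:
                delta[idx[i]] = np.inf   # a data point is not its own neighbour
            dists.append(delta.min())
        return np.array(dists)

    u = nn_dist(probes, exclude_self=False)
    w = nn_dist(X[idx], exclude_self=True)
    return u.sum() / (u.sum() + w.sum())

rng = np.random.default_rng(5)
clustered = np.vstack([rng.normal(0, 0.05, (100, 2)),
                       rng.normal(5, 0.05, (100, 2))])   # two tight blobs
uniform = rng.uniform(0, 5, (200, 2))                    # no cluster structure
h_clustered = hopkins(clustered, rng=5)
h_uniform = hopkins(uniform, rng=5)
```

An ensemble method in the spirit of the abstract would run such an assessment on multiple data subsets and aggregate the results.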

16:05-17:25 Session 15C: Computational Biology and Applied Soft Computing
Location: Jasmine
16:05
Semantics Based Substituting Technique for Reducing Code Bloat in Genetic Programming

ABSTRACT. Genetic Programming (GP) is a technique in which computer programs, encoded as tree structures, are evolved using an evolutionary algorithm. In GP, code bloat is a common phenomenon in which the size of individuals gradually increases during evolution, with a negative impact on GP performance. To address this problem, we recently proposed two semantics-based code bloat control methods: Statistics Tournament Selection with Size (TS-S) and Substituting a subtree with an Approximate Terminal (SAT-GP). In this paper, we propose a new method extending SAT-GP, namely Substituting a subtree with an Approximate Subprogram (SAS-GP). We then investigate and compare these methods under different GP parameter settings on a real-world time series forecasting problem. The experimental results demonstrate the benefit of these methods in prediction, reducing code bloat, lowering the complexity of GP solutions, and speeding up the evolutionary process. In particular, the newly proposed SAS-GP method mostly achieves the best performance among the tested GP systems on the four performance metrics popular in GP research.

16:25
An Evolutionary Algorithm for Solving Task Scheduling Problem in Cloud-Fog Computing Environment

ABSTRACT. Recently, the IoT (Internet of Things) has grown steadily, generating a tremendous amount of data and putting pressure on cloud computing infrastructures. The fog computing architecture has been proposed as the next generation of cloud computing to meet the requirements of IoT networks. One of the big challenges of fog computing is resource management and operational functions, such as task scheduling, which must guarantee high-performance and cost-effective service. We propose TCaS, an evolutionary algorithm for scheduling Bag-of-Tasks applications in a cloud-fog computing environment. By assigning the tasks in this distributed system, our approach aims at the optimal trade-off between execution time and operating cost. We verify our proposal through extensive simulation with various dataset sizes, and the experimental results show that our scheduling algorithm outperforms the Bee Life Algorithm (BLA) by 38.6% in the time-cost trade-off; in particular, it performs much better than BLA in execution time while satisfying users' requirements.
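The general shape of such an evolutionary scheduler can be sketched as a tiny genetic algorithm: a chromosome assigns each task to a node, and the fitness blends makespan and cost. The node parameters, weights, and GA settings below are made-up illustrations, not the TCaS algorithm or its experimental setup:

```python
import random

random.seed(0)

# Hypothetical nodes: fog is slow but cheap, cloud is fast but expensive.
NODES = [
    {"name": "fog-1", "time": 3.0, "cost": 1.0},
    {"name": "fog-2", "time": 4.0, "cost": 1.0},
    {"name": "cloud", "time": 1.0, "cost": 5.0},
]
N_TASKS = 12
ALPHA = 0.5          # trade-off weight between makespan and total cost

def fitness(assign):
    # Makespan = finish time of the busiest node; cost summed per task.
    load = [0.0] * len(NODES)
    cost = 0.0
    for node in assign:
        load[node] += NODES[node]["time"]
        cost += NODES[node]["cost"]
    return ALPHA * max(load) + (1 - ALPHA) * cost   # lower is better

def evolve(pop_size=30, generations=60):
    pop = [[random.randrange(len(NODES)) for _ in range(N_TASKS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]             # elitist truncation
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_TASKS)      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:               # point mutation
                child[random.randrange(N_TASKS)] = random.randrange(len(NODES))
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
```

The evolved assignment mixes cheap fog nodes and the fast cloud node, beating the naive all-fog schedule on the blended objective.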

16:45
A matrix completion method for drug response prediction in personalized medicine

ABSTRACT. One of the significant goals of personalized medicine is to provide the right treatment to patients based on their molecular features. Several big projects have been launched, generating a large amount of -omics and drug response data for human cell lines. These projects are very useful for testing drug responses on cell lines before conducting clinical trials on humans. However, many drug and cell line combinations have not been tested yet. Thus, many computational methods attempt to predict such responses in order to maximize treatment efficiency and minimize side effects. These methods use not only known drug-cell line responses but also the similarity between drugs and between cell lines. Nevertheless, the -omics data used to calculate cell-line similarities usually vary among platforms, leading to heterogeneous results. Therefore, in this study, we propose a drug response prediction method (MCDRP) based on a matrix completion technique that uses only known drug-cell line response information to predict drug responses for untested cell lines. The method can impute responses not only for one drug at a time but for all drugs simultaneously. In comparison with other methods, ours achieved better performance for the IC50 response measurement.
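The abstract does not specify the completion algorithm; as an illustrative stand-in, here is a minimal iterative SVD-truncation ("hard impute") matrix completion on a synthetic low-rank drug x cell line matrix. The data, rank, and iteration count are assumptions for the sketch, not the MCDRP method:

```python
import numpy as np

def svd_impute(M, mask, rank=2, n_iter=200):
    # Iterative low-rank completion: fill the missing entries with the current
    # rank-`rank` SVD approximation while keeping observed entries fixed.
    X = np.where(mask, M, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M, low_rank)
    return X

rng = np.random.default_rng(3)
# Synthetic rank-2 "drug x cell line" response matrix, ~30% of entries hidden.
A = rng.standard_normal((20, 2))
B = rng.standard_normal((2, 15))
truth = A @ B
mask = rng.random(truth.shape) > 0.3          # True = observed entry

completed = svd_impute(truth, mask, rank=2)
err = np.abs(completed[~mask] - truth[~mask]).mean()   # error on hidden entries
```

Because the whole matrix is completed at once, every drug's responses for an untested cell line are imputed simultaneously, matching the "all drugs at once" property described in the abstract.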