08:00-08:30 Session 1B: Registration
Location: Yersin Ballroom
09:00-09:45 Session 3: Keynote Talk: Deep Learning for Food Recognition
Location: Yersin Ballroom
Deep Learning for Food Recognition

ABSTRACT. In multimedia, dish recognition is regarded as a difficult problem because food varies widely in shape and color depending on cooking and cutting methods. As a result, while a large number of cooking recipes are posted on the Internet, finding the right recipe for a food picture remains a challenge. The problem is also shared among health-related applications. For example, food-log management, which records daily food intake, often requires manual input of food/ingredients for nutrition estimation. This talk will share with you the challenge of recognizing ingredients in dishes for recipe retrieval. Finding a recipe that exactly describes a dish is challenging because ingredient compositions vary across geographical regions, cultures, seasons and occasions. I will introduce deep neural architectures that explore the relationship among food, ingredients and recipes for recognition. The learnt deep features are used for cross-modal retrieval of food and recipes.

10:30-10:35 Session 5: Group photos
Location: Yersin Ballroom
10:35-10:50 CSBio Poster session and Coffee Break
10:50-12:10 Session 6B: Image and Video Processing
Location: Yersin Ballroom A
Image Restoration With Total Variation and Iterative Regularization Parameter Estimation

ABSTRACT. Regularization techniques are widely used for solving ill-posed image processing problems, in particular for image noise removal. Total variation (TV) regularization is one of the foremost edge-preserving methods for removing noise from images, and it can overcome the over-smoothing effects of classical Tikhonov regularization. One important aspect of this approach is the regularization parameter, which needs to be set appropriately to obtain optimal restoration results. In this work, we utilize a fast split Bregman based implementation of TV regularization for denoising, along with an iterative parameter estimation from local image information. Experimental results on a variety of noisy images indicate the promise of our TV regularization with iterative, local-variance-based parameter estimation, and comparisons with related schemes show better edge preservation and robust noise removal.

Compressive Online Robust Principal Component Analysis with Optical Flow for Video Foreground-Background Separation

ABSTRACT. In the context of online Robust Principal Component Analysis (RPCA) for video foreground-background separation, we propose a compressive online RPCA with optical flow that recursively separates a sequence of frames into sparse (foreground) and low-rank (background) components. Our method considers a small set of measurements taken per data vector (frame), which differs from conventional batch RPCA, which processes all the data directly. The proposed method also incorporates multiple prior information, namely previous foreground and background frames, to improve the separation, and then updates the prior information for the next frame. Moreover, the foreground prior frames are improved by estimating motions between the previous foreground frames using optical flow and compensating for these motions to achieve a higher quality foreground prior. The proposed method is applied to online video foreground and background separation from compressive measurements. The visual and quantitative results show that our method outperforms the existing methods.

3D Graphical Representation of DNA Sequences and its application for long sequence searching over whole genomes

ABSTRACT. With the development of Next Generation Sequencing techniques, research on whole genomes has become more active. It is now common to analyze biological sequences in units of megabyte-sized whole genomes rather than DNA segments of a few kilobytes. In general, genome sequence comparison is conducted with a dynamic-programming-based alignment algorithm. This method is accurate, but it assumes that the target sequence is short (less than a few kilobytes), since it requires quadratic time and space complexity, $O(n^2)$, where $n$ is the length of the target and query sequences. To overcome these drawbacks in whole-genome-scale comparison, we suggest a new method for finding locally similar subsequences among whole genomes. First, we propose a new visualization algorithm which transforms long DNA sequences into a random walk plot in 3D space. Next, we try to find the similar part between two geometric random walks from the reference genome and a query sequence. The sequence searching problem in DNA strings can therefore be reduced to finding parts of a random walk within a relatively small-scale geometric space. This means our random walk is a good approximate representation of a very long genomic sequence. Our experiments showed that our algorithm can successfully and efficiently locate a long query sequence over a whole genome whose size is more than 100 megabytes.
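
The idea of turning a DNA string into a 3D random walk can be illustrated with a minimal Python sketch. The step vectors in `STEPS` and the displacement-based filter are assumptions made for illustration only; the abstract does not state the actual mapping or matching procedure used by the authors.

```python
# Hypothetical step vectors: the abstract does not give the actual mapping.
STEPS = {
    "A": (1, 0, 0),
    "C": (0, 1, 0),
    "G": (0, 0, 1),
    "T": (-1, -1, -1),
}


def random_walk_3d(sequence):
    """Return the list of 3D points visited while reading the sequence."""
    x = y = z = 0
    points = [(x, y, z)]
    for base in sequence.upper():
        # Unknown symbols (for example N) leave the position unchanged.
        dx, dy, dz = STEPS.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def displacement(points, start, length):
    """Net displacement of the walk over `length` steps beginning at `start`."""
    (x0, y0, z0), (x1, y1, z1) = points[start], points[start + length]
    return (x1 - x0, y1 - y0, z1 - z0)


def candidate_positions(genome, query):
    """Genome positions whose walk segment has the same net displacement as the query."""
    g = random_walk_3d(genome)
    q = random_walk_3d(query)
    m = len(query)
    target = displacement(q, 0, m)
    return [i for i in range(len(genome) - m + 1) if displacement(g, i, m) == target]
```

Equal net displacement is only a necessary condition for a match, so each candidate position would still need to be verified against the query string. The point of the sketch is that this geometric filter costs constant time per position once the walk has been computed.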

Pedestrian Localization and Trajectory Reconstruction in a Surveillance Camera Network

ABSTRACT. In this paper, we propose a high-accuracy solution for locating pedestrians from video streams in a surveillance camera network. For each camera, we formulate the vision-based localization service as detecting foot-points of pedestrians in the ground plane. We address two critical issues that strongly affect the foot-point detection results: cast shadows and the pruning of wrong detection results caused by occlusion. For the first issue, we adopt a shadow removal technique based on a learning-based approach. For the second issue, a regression model is proposed to prune the wrong foot-point detection results. The regression model estimates the position using human factors such as height, width and their ratio. The correlation between the detected foot-points and the results estimated from the regression model is examined. Once a foot-point is missed because the two are uncorrelated, a Kalman filter is deployed to predict the current location. To link the trajectory of a person across the camera network, we rely on the observation that the cameras view the same ground plane/floor, so the transformation between a pair of cameras can be computed offline. In the experiments, high accuracy in locating pedestrians and real-time computation are achieved. The proposed method is therefore particularly feasible for deploying the vision-based localization service in scalable indoor environments such as hallways, squares in public buildings, and offices, where surveillance cameras are commonly used.

10:50-12:10 Session 6C: Time Series and Predictive Models
Location: Yersin Ballroom B
DTA Hunter System: A new statistic-based framework of predicting future demand for taxi drivers

ABSTRACT. The ever-growing popularity of taxi services in modern cities creates the demand for making taxi activities more efficient. Specifically, the main aims are reducing the cruising time of taxis when drivers hunt for new passengers and maximizing the potential profit of the next trip, which has attracted the interest of many researchers. However, most research uses historical GPS tracks without considering 1) the data of the current day, especially the last few hours before the current time, and 2) road passengers (traditional passengers who hail a taxi on the road), who are completely ignored although they account for a large portion of taxi demand in reality. To overcome such drawbacks, we propose the DTA hunter system, which incorporates such information into a statistical model by vectorizing historical data and by using probability equations, respectively. The final aim of the model is that, given taxi information (current location & time), it will suggest $k$ parking places, and optimal paths to get there, that maximize the probability of picking up new passengers and the expected distance of the next trip. We evaluate the model with a taxi services dataset from Vinasun Taxi in Vietnam covering one month (from 18/10/2015 to 14/11/2015), and the result of our model (the probability of picking up new passengers in the future) is better than the daily behavior of taxi drivers in reality.

A Robust Approach for Multivariate Time Series Forecasting

ABSTRACT. Time series forecasting is often confronted with multivariate data, but few models are available for this situation. Besides, data distortion makes multivariate time series harder to predict. To tackle such problems, we propose an approach based on a convolutional neural network with a feature extraction layer added before the convolution layer. This layer extracts multivariate features, handles multivariate time series data, and decreases the effect of distortion by transforming each sample into a denser representation containing both its own information and the information of its temporal neighbours. A fully connected layer then fuses these extracted features and produces the final result. Given that events in the world are always related, using both the target time series and other related time series to forecast the future changes of the target dimension would achieve a better prediction. The proposed approach can process multivariate time series data and is robust to the number of samples, numeric ranges of data, etc. Extensive experiments validate the effectiveness of the approach in accomplishing multivariate time series forecasting.

An Application of Similarity Search in Streaming Time Series under DTW: Online Forecasting

ABSTRACT. Time-series forecasting has long attracted many researchers in time-series data mining. In this paper, we introduce an efficient online forecasting method based on similarity search in streaming time series under Dynamic Time Warping (DTW). The proposed method takes the newly incoming time-series subsequence, finds its k nearest neighbor subsequences, and makes predictions based on how these best matches evolved in the past. Prior to the similarity search, these subsequences have been extracted from the original time series by a novel segmentation technique using major extrema in the time series. Experimental results show that for trend and seasonal streaming time series, the proposed method can produce short-term forecasts with high prediction accuracy and remarkable time efficiency. Furthermore, if the streaming time series has some linear feature and no trend, another version of our proposed online forecasting method, which hybridizes the aforementioned method with simple exponential smoothing, can improve the prediction accuracy.
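
The forecasting idea in this abstract (find the k nearest past subsequences under DTW, then predict from what followed them) can be sketched as below. This is a simplified illustration, not the authors' method: it uses plain sliding windows instead of their extrema-based segmentation, a textbook quadratic DTW, and hypothetical function names `dtw_distance` and `knn_forecast`.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_forecast(history, window, k=2, horizon=1):
    """Forecast `horizon` values following the last `window` points of `history`."""
    query = history[-window:]
    candidates = []
    # Only past windows with `horizon` known successors can be used.
    last_start = len(history) - window - horizon
    for start in range(last_start + 1):
        segment = history[start:start + window]
        future = history[start + window:start + window + horizon]
        candidates.append((dtw_distance(query, segment), future))
    if not candidates:
        raise ValueError("history is too short for the given window and horizon")
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Average what followed each of the k best matches.
    return [sum(f[h] for _, f in nearest) / len(nearest) for h in range(horizon)]
```

For example, on the periodic series `[1, 2, 3, 4] * 3` with `window=3`, the query `[2, 3, 4]` matches two earlier occurrences exactly, and both were followed by `1`.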

A New Model for Stock Price Movements Prediction in Vietnam Stock Market

ABSTRACT. In this paper, we introduce a new prediction model based on the Bidirectional Gated Recurrent Unit (BGRU). Our predictive model relies on both online financial news and historical stock price data to predict future stock movements. Experimental results show that our model achieves an accuracy of nearly 60% in S&P 500 index prediction, whereas the accuracy of individual stock prediction is over 65%.

10:50-12:10 Session 6D: Network and Applications
Location: Hon tre Room
Entrusting a Responsibility of Personal Information Management to Users from a Service Provider

ABSTRACT. Nowadays, with the spread of the Internet, various network services such as online shops and facility reservation are in use. Some of these services request users to offer personal information. However, incidents such as information leakage occur frequently. Thus, the responsibility of personal information management is a burden for service providers. On the other hand, users cannot know how service providers use the personal information they have offered. Thus, users feel uneasy about offering personal information to service providers. Therefore, we propose a framework in which a user can designate the usage procedure of his/her personal information. A service provider can entrust the responsibility of personal information management to a user because the user, not the service provider, determines the usage procedures. Additionally, since the user knows the usage procedure of his/her personal information, the user feels at ease offering personal information. In this paper, we discuss the policies (a protection policy and a use policy) defined in this framework.

Limiting the Spread of Epidemics within Time Constraint on Online Social Networks

ABSTRACT. In this paper, we investigate the problem of limiting the spread of epidemics on online social networks (OSNs). Given that the set of infected nodes on the network is already known, the aim is to find a set of nodes of size at most $k$ to remove from the network such that the number of saved nodes is maximal. The problem is proved to be NP-hard, and it is NP-hard to approximate the problem with ratio $n^{1-\epsilon}$, for $0 < \epsilon < 1$. Besides, we also suggest two algorithms to solve the problem. Experimental results show that our proposed algorithms outperform baseline algorithms.

Implementing Genetic Algorithm Accelerated By Intel Xeon Phi

ABSTRACT. In this paper, a genetic algorithm (GA) accelerated by the Intel Xeon Phi coprocessor, which is based on the Intel Many Integrated Core (MIC) Architecture, is proposed and called the GAPhi framework. The GAPhi framework solves power-aware task scheduling (PATS) problems in shorter execution time than a sequential genetic algorithm. We evaluate GAPhi, sequential GA (SGA) and GAGPU from [8] on PATS problems of the same size. Due to limited hardware resources (i.e. memory) for executing the simulation, we created a workload with a maximum problem size of 1000 jobs and 1000 physical machines. The experimental results show that the GAPhi program executed on a single Intel Xeon Phi coprocessor (61 cores) obtains significant speedup in comparison to the SGA program executed on an Intel® Xeon CPU and the GAGPU program executed on an NVIDIA Tesla with the same input problem size. They share the same GA parameters (e.g. number of generations, crossover and mutation probability, etc.).

A Study of Uber-based Applications

ABSTRACT. The Uber-based applications have recently created a new business model: a taxi company without any car, a tutor company without any tutor, or a hotel without any room. These applications coordinate mobile computing and peer-to-peer technology to facilitate the peer-to-peer provision of services. This paper presents a study of Uber-based applications. The paper first explains the driving force of mobile computing and peer-to-peer technology that exploit direct communications between mobile applications for services. It then describes a common application framework with the system architecture and prevailing components. We use virtual healthcare and software outsourcing case studies to demonstrate the prototyping systems with functions and evaluate service availability and performance.

12:10-13:30 Lunch at Feast Restaurant - 1st Floor
13:30-14:15 Session 7B: Keynote Talk: Trustworthy Software and Automatic Program Repair
Location: Yersin Ballroom
Trustworthy Software and Automatic Program Repair

ABSTRACT. Software controls many critical infrastructures, and a variety of software analysis methods have been proposed to enhance the quality, reliability and security of software components. In this talk, we will first study the gamut of methods developed so far in software validation research, ranging from systematic testing, to analysis of program source code and binaries, to formal reasoning about software components. We will also discuss the research on trustworthy software at NUS which makes software vulnerability detection, localization and patching much more systematic. We will specifically explore research on futuristic programming environments which enable auto-patching of software vulnerabilities, with a focus on automatic program repair, where software errors get detected and fixed continuously. This research aims to realize the vision of self-healing software for autonomous cyber-physical systems, where autonomous devices may need to modify the code controlling the device on-the-fly to maintain strict guarantees about trust.

14:15-15:15 Session 8B: Industrial Talks
Location: Yersin Ballroom
Leveraging advanced technologies and operation excellence to build the world's first Knowledge as a Service (KaaS) platform

ABSTRACT. People have dozens of questions every day, and some are so difficult that they leave users stuck in their work or study. There are two popular ways for everyone to find answers online: Google and community places like Reddit, StackOverflow, or forums. While Google can give instant search results for free, it can only find generic information that cannot address a user's personalized needs. Community places can provide users with personalized answers to very specific questions, but they are often slow and there is no guarantee about the services. Got It is the world's first Knowledge as a Service (KaaS) platform to address the above issues with Google and community places. A user with a question is connected instantly with an expert who can help via 10-minute chat sessions anytime and anywhere. In this presentation we will present the approaches Got It has employed to leverage advanced technologies like AI and operations excellence to successfully deliver millions of sessions to millions of users around the world. We will also present a few directions that our R&D team is heading toward to stay ahead in the marketplace.

BigData insights, Machine learning and AI in VCCORP

ABSTRACT. "We think the future of coding is no coding at all," GitHub CEO Chris Wanstrath predicted recently, opening many debates about the future of Artificial Intelligence (AI). Will artificial intelligence replace humans? It is highly possible. Nowadays, computer vision algorithms (automated translation, image recognition) have surpassed others in the industry, even humans. AI technology improves human life and work performance, thanks to breakthroughs in computational technology and the rapid development of hardware (CPUs/GPUs). In this presentation, we will discuss AI platforms in VCCORP, the challenges and the possibilities.

Cooperation between research institutions and enterprises: challenges and solutions

ABSTRACT. Industry 4.0 is taking place at unprecedented speed. It has both positive and negative impacts on every economic and social aspect. One of the most important components of Industry 4.0 is AI. Big corporations and research institutions in developed countries are investing a lot of resources in this field, and amazing results have been achieved.
In Vietnam, universities and research institutes are showing great interest in the field of AI. One of the problems in the country, however, is that it takes a long time to get from research to applications.
This talk discusses challenges in cooperation between research institutions and enterprises; it then proposes some solutions that can advance the efficiency of cooperation, benefiting all parties.

15:15-15:35 CSBio Poster session and Coffee Break
15:35-17:35 Session 9B: Natural Language Processing I
Location: Yersin Ballroom A
Enhancing extractive summarization using non-negative matrix factorization with semantic aspects and sentence features

ABSTRACT. The main task in extractive text summarization is to evaluate the importance of sentences in a document. This paper aims at improving the quality of an unsupervised summarization method, i.e. non-negative matrix factorization (NMF), by using sentence features and considering semantically related words using word embeddings (i.e. word2vec) in sentence scoring. The experiments were carried out with different scenarios using the DUC 2007 dataset. Experimental results showed that when NMF was combined with three types of sentence features (i.e., surface, content, and relevant features) and word2vec, the system achieved its best performance with 42.34% for Rouge-1 and 10.77% for Rouge-2, an increase of 0.57% in Rouge-1 and 0.78% in Rouge-2 compared with NMF alone.

Utilizing User Posts to Enrich Web Document Summarization with Matrix Co-factorization

ABSTRACT. In the context of social media, users usually post relevant information corresponding to the content of an event mentioned in a Web document. This information (called user posts) has two important characteristics: (i) reflecting the content of an event and (ii) sharing hidden topics with the sentences in the main document. In this paper, we present a model to capture the nature of the relationship between sentences and user posts, such as comments, in sharing hidden topics for summarization. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks sentences and comments based on their importance in affecting the topics. The sentence-comment relation is formulated in a shared topic matrix, which represents their mutual reinforcement support. Our newly proposed matrix co-factorization algorithm computes the score of each sentence and comment and extracts the top m ranked sentences and m comments as the summarization. Experimental results on two datasets in two languages of the social context summarization task (English and Vietnamese) and DUC 2004 confirm the efficiency of our model in summarizing Web documents.

Parallel Multi-feature Attention on Neural Sentiment Classification

ABSTRACT. The analysis of the review's sentiment polarity is a fundamental task in NLP. However, most of the existing sentiment classification models only focus on extracting features but ignore features' own differences. Additionally, these models only pay attention to content information but ignore the user's ranking preference. To address these issues, we propose a novel Parallel Multi-feature Attention (PMA) neural network which concentrates on fine-grained information between user and product level content features. Moreover, we use multi-feature, user's ranking preference included, to improve the performance of sentiment classification. Experimental results on IMDB and Yelp datasets show that PMA model achieves state-of-the-art performance.

Combining Convolution and Recursive Neural Networks for Sentiment Analysis

ABSTRACT. This paper addresses the problem of sentence-level sentiment analysis. In recent years, Convolution and Recursive Neural Networks have been proven to be effective network architectures for sentence-level sentiment analysis. Nevertheless, each of them has its own potential drawbacks. To alleviate their weaknesses, we combined Convolution and Recursive Neural Networks into a new network architecture. In addition, we employed transfer learning from a large document-level labeled sentiment dataset to improve the word embedding in our models. The resulting models outperform all recent Convolution and Recursive Neural Networks. Beyond that, our models are able to achieve performance comparable with the state-of-the-art systems on the Stanford Sentiment Treebank.

News Classification from Social Media Using Twitter-based Doc2Vec Model and Automatic Query Expansion

ABSTRACT. With the development of the Internet and mobile devices, people are surrounded by an abundance of information from various online sources. News classification is among the essential needs for people to organize, better understand, and utilize information from the Internet. This motivates the authors to propose a novel method to classify news from social media. First, we propose to vectorize an article with TD2V, our pre-trained Twitter-based universal document representation following the Doc2Vec approach. We then define Modified Distance to better measure the semantic distance between two document vectors. Finally, we apply retrieval and automatic query expansion to get the most relevant labeled documents in a corpus to determine the category for a new article. As our TD2V is created from 297 million words in 420,351 news articles from more than one million tweets on Twitter from 2010 to 2017, it can be used as one of the efficient pre-trained models for English document representation in various applications. Experiments on datasets from different online sources show that our method achieves better classification accuracy than existing methods, specifically 98.4 +/- 0.3% (BBC dataset), 98.9 +/- 0.7% (BBC Sport dataset), 94.1 +/- 0.2% (Amazon4 dataset), and 78.6% (20NewsGroup dataset). Furthermore, in the classification training process, we just encode all articles in the training set with TD2V and do not train a dedicated classification model for each of these datasets.

Towards State-of-the-art English-Vietnamese Neural Machine Translation

ABSTRACT. Machine translation is one of the most challenging topics in natural language processing. The common approaches to machine translation are based on either statistical or rule-based methods. Rule-based translation analyzes sentence structures and requires extensive lexicons with morphological, syntactic, and semantic information, as well as large sets of manually created rules. Statistics-based translation faces the challenge of collecting bilingual text corpora, which is particularly difficult for low-resource language pairs such as English-Vietnamese. This research aims at building state-of-the-art English-Vietnamese machine translation. Our contributions include: (1) an enormous effort in collecting a training dataset, (2) an application of advanced methods in neural machine translation to optimize the translation model, and (3) an experimental result suggesting that Vietnamese tokenization, a common pre-processing step, is unnecessary. Our model achieves the highest BLEU score in comparison with other work.

15:35-17:35 Session 9C: Security
Location: Yersin Ballroom B
Protecting consensus seeking NIDS modules against multiple attacks

ABSTRACT. This work concerns distributed consensus algorithms and their application to a network intrusion detection system (NIDS) [20]. We consider the problem of defending the system against multiple data falsification attacks (Byzantine attacks), a vulnerability of distributed peer-to-peer consensus algorithms that has not been widely addressed in practice. We consider both naive (independent) and colluding attacks. We test three defense strategy implementations, two classified as outlier detection methods and one as a reputation-based method. We have narrowed our attention to outlier and reputation-based methods because they are computationally relatively light. We have left out control-theoretic methods, which are likely the most effective methods, but whose computational cost increases rapidly with the number of attackers. We compare these three implementations in terms of their computational cost, detection performance, convergence behavior and possible impacts on the intrusion detection accuracy of the NIDS. Tests are performed based on simulations of distributed denial of service attacks using the NSL-KDD data set.

FDDA: A Framework For Fast Detecting Source Attack In Web Application DDoS Attack

ABSTRACT. Anomaly detection techniques are used in Intrusion Detection System/Intrusion Prevention System (IDS/IPS) products to find zero-day attacks. However, an anomaly detection technique needs a training phase in order to learn or set up the parameters of the system while the system is in an attack-free state. Moreover, the efficiency of detecting abnormal signals mainly depends on the data learned in the training phase, as well as on the data updated during the detection phase. In this research, we propose a framework named FDDA which can improve the speed and efficiency of defending against DDoS attacks on web applications. FDDA makes it possible to detect and quickly remove the (IP) sources of requesting packets in DDoS attacks on web applications, i.e. it greatly reduces the slow process of the training phase. Additionally, FDDA introduces a procedure for automatically updating dynamic feature data (for detecting and blocking attacking requests). It hence provides the flexibility and strength to deal with hackers who can change their methods and forms of attack.

GINTATE: Scalable and Extensible Deep Packet Inspection System for Encrypted Network Traffic

ABSTRACT. Deep packet inspection (DPI) is a basic monitoring technology, which realizes network traffic control based on application payload. The technology is used to prevent threats (e.g., intrusion detection systems, firewalls) and extract information (e.g., content filtering systems). Additionally, transport layer security (TLS) monitoring is required as the use of the TLS protocol, including hypertext transfer protocol secure (HTTPS), is increasing. TLS monitoring is different from TCP monitoring in two aspects. First, monitoring systems cannot inspect the contents in TLS communication, which is encrypted. Second, TLS communication is a session unit composed of one or more TCP connections.

In enterprise networks, dedicated TLS proxies are deployed to perform TLS monitoring. However, the proxies cannot be used when monitored devices are unable to use a custom certificate. Additionally, the networks contain problems of scale and complexity which affect the monitoring. Therefore, the DPI processing using another method requires high-speed processing and various protocol analyses across TCP connections in TLS monitoring. However, it is difficult to realize both simultaneously.

We propose GINTATE, which decrypts TLS communication using shared keys and monitors the results. GINTATE is a scalable architecture that uses distributed computing and considers a relational session across multiple TCP connections in TLS communication. Additionally, GINTATE performs DPI processing that is achieved by adding an extensible analysis module. We show that GINTATE performs DPI processing by treating the relational session in distributed computing and, by comparing the system with other systems, that it is scalable.

DGA Botnet Detection Using Supervised Learning Methods

ABSTRACT. Modern botnets are based on Domain Generation Algorithms (DGAs) to build resilient communication between bots and the Command and Control (C&C) server. The basic aim is to avoid blacklisting and evade Intrusion Protection Systems (IPS). Given the prevalence of this mechanism, numerous solutions have been developed in the literature. In particular, supervised learning has received increased interest as it is able to operate on the raw domains and is amenable to real-time applications. Hidden Markov Model, C4.5 decision tree, Extreme Learning Machine, and Long Short-Term Memory networks have become the state of the art in DGA botnet detection. There also exist several advanced supervised learning methods, namely Support Vector Machine (SVM), Recurrent SVM, CNN+LSTM and Bidirectional LSTM, which have not yet been adequately applied in this domain. This paper presents a first attempt to thoroughly investigate all the above methods and evaluate them on a real-world DGA dataset involving 38 classes with 168,900 samples, and it should provide a valuable reference point for future research in this field.

Using CPR metric to Detect and Filter Low-rate DDoS Flows

ABSTRACT. The low-rate distributed TCP-targeted denial-of-service (LDDoS) attack has become a big challenge for existing defense mechanisms. It throttles TCP throughput by exploiting TCP's timeout mechanism, which emphasizes the use of a common minimum retransmission timeout (minRTO) of 1 second. The congestion participation rate (CPR) metric and a CPR-based approach have been proposed by Zhang et al. to detect and filter LDDoS flows. The approach uses a threshold τ to judge whether a flow is an attack flow or not. If a flow has a CPR greater than τ, it is considered an attack flow; otherwise it is not. A problem arises when the CPR-based approach is used with a fixed τ: the approach cannot simultaneously achieve high TCP throughput under attack and fairness to new TCP flows in normal time. We therefore propose a method of adapting τ to solve this problem. Simulation results show that the adaptive CPR-based approach can preserve TCP throughput under attack fairly well, while maintaining fairness between new TCP flows in normal time.
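
The threshold rule described in this abstract can be sketched as follows. The sketch assumes that a flow's CPR is the fraction of its packets that arrive during congested sampling intervals, which is how we read the metric of Zhang et al.; the function names and the per-interval data layout are hypothetical, and the adaptive choice of τ proposed in the paper is not reproduced here.

```python
def congestion_participation_rate(packets_per_interval, congested):
    """Fraction of a flow's packets that arrived in congested sampling intervals."""
    total = sum(packets_per_interval)
    if total == 0:
        return 0.0
    in_congestion = sum(p for p, c in zip(packets_per_interval, congested) if c)
    return in_congestion / total


def classify_flows(flows, congested, tau):
    """Label each flow 'attack' if its CPR exceeds the threshold tau, else 'normal'."""
    labels = {}
    for flow_id, packets in flows.items():
        cpr = congestion_participation_rate(packets, congested)
        labels[flow_id] = "attack" if cpr > tau else "normal"
    return labels


# Illustrative data: intervals 1 and 3 are congested.
congested = [False, True, False, True]
flows = {
    "tcp_flow": [10, 2, 10, 2],     # backs off during congestion
    "burst_flow": [0, 20, 0, 20],   # sends only during congestion
}
labels = classify_flows(flows, congested, tau=0.5)
```

With a fixed `tau`, the trade-off mentioned in the abstract shows up directly: a low value filters more aggressively, while a high value lets more traffic through.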

Efficient Secure Text Retrieval on Multi-Keyword Search

ABSTRACT. Data must be protected while the data owner still lets users retrieve information from it. In this paper, we present a secure text retrieval scheme for multi-keyword search, in which data owners and users can guarantee the privacy of their documents and search keywords against semi-trusted document servers while maintaining the functionality of ranked text retrieval. Our scheme also supports access control, whereby data owners can specify the users that can search and access their files. We build our scheme on the term frequency ranking function that is widely used in many real text retrieval systems. We verify the efficiency of our secure scheme empirically on a real text corpus.
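For readers unfamiliar with term frequency ranking, the following plaintext sketch shows the kind of ranking function the abstract refers to. It contains no encryption or access control and is not the scheme from the paper: the length-normalised score, the function names, and the sample documents are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_score(doc_tokens: List[str], query_terms: List[str]) -> float:
    """Sum of query-term counts, normalised by document length (assumed form)."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(counts[term] for term in query_terms) / len(doc_tokens)


def rank(docs: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Return matching documents sorted by descending score, then by name."""
    terms = tokenize(query)
    scored = [(name, tf_score(tokenize(text), terms)) for name, text in docs.items()]
    matches = [(name, score) for name, score in scored if score > 0]
    return sorted(matches, key=lambda item: (-item[1], item[0]))


# Hypothetical corpus and multi-keyword query.
docs = {
    "a.txt": "secure search over encrypted text",
    "b.txt": "ranked text retrieval with keyword search and text ranking",
    "c.txt": "type systems for transactional programs",
}
results = rank(docs, "text search")
for name, score in results:
    print(name, round(score, 3))
```

A secure variant would have to compute or compare such scores without revealing the documents or the query keywords to the server, which is the problem the paper addresses.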

15:35-17:15 Session 9D: Software Engineering
Location: Hon Tre Room
A Compositional Type Systems for Finding Log Memory Bounds of Transactional Programs

ABSTRACT. In our previous work, we proposed several type systems that can guarantee log memory bounds of transactional programs. One drawback of these type systems is their restricted compositionality. In this work, we develop a type system that is completely compositional. It allows us to type any sub-term of the program, instead of requiring the bottom-up style of our previous work. In addition, we extend the language with basic elements that are close to real-world languages, rather than the abstract languages used in our previous work. This makes our type systems easier to implement for real-world languages.

A Test Data Generation Method for C/C++ Projects

ABSTRACT. This research proposes an automated test data generation method for C/C++ projects that generates fewer test data while achieving higher code coverage than KLEE, CAUT, PathCrawler, and CREST. To do so, the proposed method contributes an algorithm named loop depth-first search, which combines static testing and concolic testing. The paper also provides an improved symbolic execution that avoids the initial test data problem in concolic testing. A tool supporting the proposed method has been developed and applied to test different C/C++ projects in several software companies. The experimental results show higher coverage with fewer test data than the existing methods, and they demonstrate the effectiveness and practical usefulness of the proposed method for automated test data generation.

Mutants Generation For Testing Lustre Programs

ABSTRACT. Lustre is a synchronous language widely used for the development of reactive systems, control systems, and monitoring systems, such as those in nuclear reactors, civil aircraft, and automobiles. In particular, Lustre is suitable for developing real-time systems. In such applications, testing activities for fault detection play a very important role. Mutation testing is one of the most commonly used techniques for evaluating the fault detection ability of test data. Typically, the number of mutants generated by the set of mutation operators of a programming language is very large, so manual mutant generation is often very costly. In this paper, we present a mutant generator that uses the set of mutation operators defined for the Lustre language. An automatic mutant generation strategy is implemented in the generator in order to reduce test cost. Mutant generation and random test data generation are also evaluated experimentally on different Lustre programs.
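The idea of generating mutants with mutation operators and scoring test data against them can be sketched as follows. The sketch uses a Python boolean expression instead of Lustre, and the operator table, function names, and test inputs are hypothetical; it is not the generator presented in the paper.

```python
from typing import Dict, List

# Hypothetical operator-replacement rules: each token maps to its replacements.
MUTATION_OPERATORS: Dict[str, List[str]] = {
    "and": ["or"],
    "or": ["and"],
    ">": [">=", "<"],
}


def generate_mutants(expr: str) -> List[str]:
    """Create one mutant per (token position, replacement) pair."""
    tokens = expr.split()
    mutants = []
    for i, token in enumerate(tokens):
        for replacement in MUTATION_OPERATORS.get(token, []):
            mutants.append(" ".join(tokens[:i] + [replacement] + tokens[i + 1:]))
    return mutants


def evaluate(expr: str, inputs: Dict[str, object]) -> bool:
    """Evaluate a trusted expression string with the given variable bindings."""
    return bool(eval(expr, {"__builtins__": {}}, dict(inputs)))


def mutation_score(original: str, mutants: List[str], tests: List[Dict[str, object]]) -> float:
    """Fraction of mutants killed, i.e. distinguished from the original by some test."""
    if not mutants:
        return 0.0
    killed = sum(
        any(evaluate(m, t) != evaluate(original, t) for t in tests) for m in mutants
    )
    return killed / len(mutants)


original = "temp > limit and armed"
mutants = generate_mutants(original)
tests = [
    {"temp": 5, "limit": 3, "armed": True},
    {"temp": 3, "limit": 3, "armed": False},
]
score = mutation_score(original, mutants, tests)
print(mutants)
print(score)
```

In this example only one of the three mutants is killed, which signals that the test data is too weak; an input with `temp == limit` and `armed` true, for instance, would also kill the `>=` mutant.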

Design and implementation of a new execution model for CAPE

ABSTRACT. CAPE, which stands for Checkpointing-Aided Parallel Execution, is an approach based on checkpoints to automatically translate and execute OpenMP programs on distributed-memory architectures. This approach demonstrates high performance and complete compatibility with OpenMP on distributed-memory systems. This paper presents a new design and implementation model for CAPE that improves performance and makes CAPE even more flexible.

USL: Towards Precise Specification of Use Cases for Model-Driven Development

ABSTRACT. Use cases have been widely employed as an efficient means to capture and structure software requirements. A use case model is often represented by a loose combination of a UML use case diagram and a textual description in natural language. A use case model expressed in such a form often contains ambiguous and imprecise parts. This prevents integrating it into model-driven approaches, where use case models are often taken as the source of transformations. This paper introduces a domain-specific language named the Use case Specification Language (USL) to precisely specify use cases. USL has two main features: (1) a concrete syntax in graphical form, which allows us to achieve the usability goal; and (2) a precise semantics, defined by mapping USL to a Labelled Transition System (LTS), which opens the possibility of transformations from USL models to other artifacts such as test cases and analysis class models.

19:00-21:30 Gala Dinner at Champa Island, 304, 2/4 road, Nha Trang