SOICT 2023: 12TH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY
PROGRAM FOR THURSDAY, DECEMBER 7TH

08:40-10:00 Session 1: Keynote Speaker
Location: Conference Hall
08:40
Designing Mobile Interaction for Wellbeing

ABSTRACT. The use of computerized devices is now ubiquitous in many societies. Our daily lives are shaped by perpetually changing interactive devices and applications, from touch-based smartwatches and tablets to large digital displays. Yet while smart devices, such as laptops, pads, and smartphones, are mobile, their users are typically either stationary or risk their physical safety by using such devices while in motion. The health effects of today’s mobile devices are well documented; they include the impact of a predominantly sedentary lifestyle on both the health and well-being of many individuals. This talk will present high-risk thinking, design, and engineering research into the domain of mobile user interaction.

09:20
Immersive Mobile Media: From Extended Reality to the Metaverse

ABSTRACT. Immersive mobile media systems and services have become increasingly popular. Extended reality (XR) systems capture all real-and-virtual combined environments generated by computers, ranging from 360-degree videos to virtual reality (VR), augmented reality (AR), augmented virtuality (AV), and mixed reality (MR). The different levels of computer-generated virtual worlds offered by XR systems rely on novel human-machine interfaces and advanced head-mounted displays. In this context, the metaverse is one of the latest innovations aiming to offer seamless transitions between the physical world and digital content. As such, the metaverse is foreseen to accommodate XR, the Internet of Things, and communication technologies. As far as connectivity is concerned, current 5G and future 6G mobile networks are considered essential enablers that can cope with the high bandwidth demands of the related ultra-reliable low-latency communications, massive machine-type communications, and the emerging Internet of Everything. This keynote discusses major components, opportunities, and challenges of immersive mobile media applications and systems. It includes an overview of the ecosystem of interconnected immersive mobile media with a focus on the roadmap from XR to the metaverse, a review of key performance indicators of such systems, and standardization activities.

10:10-10:40 Session 2: AI Foundation and Big Data (Poster Session)
Leveraging FFT and Hybrid EfficientNet for Enhanced Action Recognition in Video Sequences

ABSTRACT. Human Action Recognition (HAR) has emerged as a pivotal challenge in the rapidly evolving realm of artificial intelligence and computer vision. This research delves into enhancing action recognition in video sequences by combining the capabilities of the Fast Fourier Transform (FFT) with the EfficientNet model. We employed the UCF101 dataset, encompassing a diverse range of action categories, to validate our approach empirically. The results underscore that incorporating FFT in the preprocessing stage amplifies the recognition efficacy of the EfficientNet model. Comparative analyses further substantiate the superiority of this combined approach over conventional methods, marking a significant contribution to the domain of action video recognition.
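
As a rough, self-contained illustration of the frequency-domain preprocessing idea (not the authors' actual pipeline), the sketch below replaces each pixel row of a tiny grayscale frame with its magnitude spectrum; the naive O(n²) DFT and the `preprocess_frame` helper are hypothetical stand-ins for a real FFT library call:

```python
import cmath

def dft_magnitude(signal):
    """Naive DFT magnitude spectrum of a 1-D signal (O(n^2); illustration only)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def preprocess_frame(frame):
    """Hypothetical preprocessing step: map each pixel row of a grayscale
    frame to its magnitude spectrum before feeding the frame to a CNN."""
    return [dft_magnitude(row) for row in frame]

frame = [[0, 1, 0, 1],    # alternating row: energy at DC and the Nyquist bin
         [1, 1, 1, 1]]    # constant row: energy only in the DC bin
spec = preprocess_frame(frame)
```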

Maximizing a $k$-Submodular Function under an Individual Knapsack Constraint

ABSTRACT. In this work, we consider a novel problem of maximizing monotone $k$-submodular functions under the individual knapsack constraint ($\kSMIK$) over a ground set of size $n$, a problem with numerous applications in machine learning, including data summarization and information propagation. We propose an approximation algorithm with approximation ratio $\frac{1-\epsilon}{2(k+1)}$ and $O(nk\log (n)/\epsilon)$ query complexity, where $\epsilon$ is an input parameter. Alongside the theoretical analysis, we conduct extensive experiments on our proposed algorithm via applications such as Influence Maximization and Sensor Placement. The experimental results demonstrate that our algorithm matches the solution quality of state-of-the-art techniques while significantly reducing the number of required queries.
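
The abstract does not spell out the algorithm itself; as a generic, hypothetical illustration of submodular maximization under a knapsack budget (ordinary set coverage rather than the paper's $k$-submodular setting), the following cost-benefit greedy repeatedly picks the set with the best marginal-gain-per-cost ratio:

```python
def greedy_knapsack_cover(sets, costs, budget):
    """Cost-benefit greedy for submodular coverage under a knapsack budget:
    repeatedly add the affordable set with the best coverage-per-cost ratio."""
    chosen, covered, spent = [], set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for i, s in enumerate(sets):
            if i in chosen or spent + costs[i] > budget:
                continue
            gain = len(s - covered)          # marginal coverage of set i
            if gain and gain / costs[i] > best_ratio:
                best, best_ratio = i, gain / costs[i]
        if best is None:
            return chosen, covered
        chosen.append(best)
        covered |= sets[best]
        spent += costs[best]

# Hypothetical toy instance: three candidate sets, budget 3.
sets = [{1, 2, 3}, {3, 4}, {5}]
costs = [2.0, 1.0, 1.0]
chosen, covered = greedy_knapsack_cover(sets, costs, budget=3.0)
```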

IQAGA: Image Quality Assessment-Driven Learning with GAN-Based Dataset Augmentation for Cross-Domain Person Re-Identification

ABSTRACT. Person re-identification (reID) is the task of matching images of the same person across different cameras or domains. It has many applications in security, surveillance, and biometrics. However, supervised learning-based person reID faces the challenge of domain shift: the performance of a model trained on a specific domain (source domain) may degrade when tested on another domain (target domain) with different distributions, backgrounds, and lighting conditions. To enhance the generalization of person reID models, we propose a new approach consisting of three components: GAN-based data augmentation, cross-domain learning, and evaluation modules. In particular, generative adversarial network (GAN) approaches are first used to generate synthetic data from real source data by diversifying the environmental conditions of the dataset. We then propose a cross-domain learning approach powered by image quality assessment (IQA) to reduce the impact of low-quality images in the combined source data, which includes both synthetic and real source data. Extensive experiments demonstrate the superiority of our proposed method over state-of-the-art methods on two well-known person reID benchmarks, DukeMTMC-reID and Market-1501.

Improving Multilingual Neural Machine Translation with Artificial Labels

ABSTRACT. Inspired by work that uses artificial translation units (ATUs) to generate synthetic data in low-resource Neural Machine Translation systems [12], we propose using ATUs to enhance the sharing of information between translation units in Multilingual Neural Machine Translation (multi-NMT) systems. In particular, we concentrate on improving the translation of rare words. Our method also suggests a new way of leveraging bilingual dictionaries in multi-NMT, which has received limited attention in prior work. Our experiments show improvements of up to +1.4 BLEU in translation tasks among Chinese, Japanese, and Vietnamese on the TED Talks domain.

Weapon Detection Using Deep Learning

ABSTRACT. Nowadays, security and safety issues are complex and increasingly involve criminals using weapons to commit crimes, posing many potential risks to society. Recently, Deep Learning models have been researched and applied to Computer Vision problems. This article focuses on training a weapon detection system using the YOLO models (versions 5, 7, and 8) and the Swin Transformer backbone combined with Mask R-CNN, Cascade Mask R-CNN, and Mask RepPoints V2 to detect weapons in images and provide early warning. The research focuses on three main weapon classes: pistols, rifles, and knives. Due to limited available data, the WeaponData_VN dataset was built and is described.

Clustering Mixed Data Comprising Time Series

ABSTRACT. The health and medicine sector is currently experiencing significant transformations, such as the integration of artificial intelligence into the decision-making process. In this complex system, there is a continuous data flow consisting of quantitative, qualitative, and ordinal types as well as time series. Hierarchical clustering is a useful tool to handle this complexity. However, clustering mixed data containing time series without distorting the inherent nature of the data poses a challenge. Although clustering techniques exist for mixed data or for time series separately, the literature does not address the clustering of data combining both. This paper presents several methodologies for clustering such data, including a novel algorithm based on pretopology. This hierarchical algorithm allows for customizable logical clustering, enabling health experts to better interpret and utilize the results for classification and recommendation by analyzing the hierarchy of clusters.

A flexible framework for customer behavior prediction based on ensemble learning

ABSTRACT. Predicting customer behavior holds immense significance for businesses across diverse industries, encompassing the anticipation of actions such as churn and purchasing behavior. In this paper, we introduce a tailored model for predicting customer behavior. Notably, our model can be flexibly and effectively applied to datasets from two different types of problems: customer churn prediction and customer purchasing behavior prediction. The study incorporates ensemble learning strategies, including stacking and voting methodologies. The outcomes reveal that combining Random Forest, CNN, and Boosting algorithms through hard voting, alongside the optimization of classifier weights via an evolutionary algorithm, results in notably higher predictive performance in terms of AUC and F1 score. We evaluated the performance of our model on four different datasets. For the Campaign dataset, the AUC was 94.82% and the F1 score was 68.29%. For the Cell2Cell dataset, the AUC was 86.82% and the F1 score was 79.99%. On the Bank dataset, we achieved an AUC of 87.81% and an F1 score of 89.11%. Lastly, on the Online Shoppers dataset, we obtained an AUC of 94.55% and an F1 score of 70.79%.
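
A minimal sketch of the weighted hard-voting step, with made-up predictions and weights standing in for the Random Forest, CNN, and Boosting base models and for the evolutionary weight optimization:

```python
from collections import Counter

def weighted_hard_vote(predictions, weights):
    """Weighted hard (majority) voting: each classifier's predicted label
    receives that classifier's weight; the label with the largest total wins."""
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return max(tally, key=tally.get)

# Hypothetical example: three base models predicting churn (1) vs. no churn (0).
preds = [1, 0, 1]
weights = [0.5, 0.9, 0.7]   # weights an evolutionary algorithm might assign
label = weighted_hard_vote(preds, weights)
```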

Impact of the ground truth quality for handwriting recognition

ABSTRACT. Handwriting recognition is a key technology for accessing the content of old manuscripts, helping to preserve cultural heritage. Deep learning shows impressive performance on this task. However, to achieve its full potential, it requires a large amount of labeled data, which is difficult to obtain for ancient languages and scripts. Often, a trade-off has to be made between ground truth quantity and quality, as is the case for the recently introduced Bullinger database. It contains an impressive amount of over one hundred thousand labeled text line images of mostly premodern German and Latin texts, obtained by automatically aligning existing page-level transcriptions with text line images. However, the alignment process introduces systematic errors, such as wrongly hyphenated words. In this paper, we investigate the impact of such errors on training and evaluation and suggest means to detect and correct typical alignment errors.

Adaptive Nonlinear Dimensionality Reduction with a Local Metric

ABSTRACT. Understanding the structure of multidimensional patterns, especially in unsupervised cases, is of fundamental importance in data mining, pattern recognition, and machine learning. Several algorithms have been proposed to analyze the structure of high-dimensional data based on the notion of manifold learning. These algorithms have been used to extract the intrinsic characteristics of different types of high-dimensional data by performing nonlinear dimensionality reduction. Most of them rely on the Euclidean metric and a manually chosen neighborhood size, and so cannot recover the intrinsic geometry of the data manifold automatically and accurately for some data sets. In this paper, we propose an adaptive version of ISOMAP that integrates the advantages of local and global manifold learning algorithms. A faster convergence rate is obtained by replacing the Euclidean metric with the arc-length metric, computed as a sum of second-order approximations to the geodesic distance. Our experiments on synthetic data as well as real-world images demonstrate that the proposed algorithm achieves better performance than ISOMAP and handles some harder data sets that are intractable for existing global methods.
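
The geodesic-distance backbone that ISOMAP-style methods build on can be sketched in a few lines: connect each point to its nearest neighbours, then take graph shortest paths as an approximation of the geodesic distance (a generic illustration, not the paper's arc-length variant):

```python
import math

def knn_graph(points, k):
    """Adjacency matrix with Euclidean edge lengths to each point's k nearest
    neighbours; non-edges are infinite."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    graph = [[inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        for j in sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph

def geodesic_distances(graph):
    """Floyd-Warshall shortest paths: graph distance approximates geodesic distance."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

# Four points on a line: the geodesic from end to end is the sum of the hops.
pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
geo = geodesic_distances(knn_graph(pts, 1))
```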

Video Sage: Video Recommendation Using Graph Convolution Neural Network

ABSTRACT. Video recommendation has become an indispensable means of helping individuals navigate the vast expanse of videos and discover content that aligns with their interests. In current video recommendation systems, recommendations are made based on user-video interactions and specific individual content features. Despite the richness of information contained within videos, the challenge lies in effectively representing and extracting distinctive features that can be presented to users. Present video recommendation models predominantly rely on isolated video features for suggestions, without effectively harnessing the available information by combining these features cohesively. In this article, we introduce a video recommendation model called Video Sage, which integrates numerous video characteristics with a user's interaction history. The model is rooted in Graph Convolution Neural Networks (GCNs), which can discern user-video interactions and maintain scalability for future data. Additionally, we propose a solution for embedding and merging complex video features, which serve as input to the Video Sage model, enhancing recommendation efficiency. Our method has been rigorously tested on a dataset comprising users' video viewing histories on social networks and has yielded highly competitive results.
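
A single graph-convolution step of the kind GCN-based recommenders stack can be sketched as mean aggregation over a node's neighbourhood followed by a linear transform (a generic illustration; the actual Video Sage layers are not specified in the abstract):

```python
def gcn_layer(adj, features, weight):
    """One graph-convolution step: mean-aggregate each node's neighbourhood
    features (self-loop included), then apply a linear transform whose
    columns are given in `weight`."""
    n, f_in = len(adj), len(features[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]] + [i]   # neighbours + self
        agg = [sum(features[j][f] for j in neigh) / len(neigh)
               for f in range(f_in)]
        out.append([sum(a * w for a, w in zip(agg, col)) for col in weight])
    return out

# Two connected nodes with one-hot features and an identity transform.
adj = [[0, 1], [1, 0]]
features = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]   # columns of the weight matrix
h = gcn_layer(adj, features, identity)
```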

Optimizing GANs using Relativistic Discriminator with Margin Losses for Semi-supervised Learning

ABSTRACT. We introduce a novel framework that combines Relativistic GANs (RGANs) and Margin Losses to enhance semi-supervised learning (SSL). Termed RMGANs, our approach integrates RGANs’ discriminator, which measures relative realism, with the strengths of Margin Losses, known for their distinct class separation. This union empowers our framework to simultaneously improve sample quality and encourage class separability in the SSL context. Our work delves into the architecture and theoretical analysis, emphasizing RMGANs’ potential to leverage limited labeled data and abundant unlabeled samples in the SSL setting. Empirical evaluation on the MNIST and CIFAR-10 datasets showcases RMGANs’ efficacy in achieving higher accuracy compared to the state of the art. By uniting RGANs and Margin Losses within SSL, our approach advances SSL, providing insights into the synergy between generative modeling and class separability in real-world data scenarios with limited labeled data.

Understanding the Role of Population Experiences in Proximal Distilled Evolutionary Reinforcement Learning

ABSTRACT. Evolutionary Reinforcement Learning (ERL) combines the sample-efficiency of Reinforcement Learning with the exploration capabilities of the population-based search of Evolutionary Computation. These methods have shown promising performance on many continuous control tasks. However, such methods can exhibit instability. Several works have shown that the experiences coming from the population individuals lead to state distribution shift in the policy updating process. A vanilla remedy has been proposed to alleviate this issue by separating the experience transitions into two distinct replay buffers, one for the RL policy and one for the population, and mixing samples from the two buffers at a fixed ratio to update the RL policy. The effectiveness of this approach has been shown empirically on an ERL method where Evolution Strategies (ES) assist an external RL agent. Nevertheless, there has been no clear investigation of Genetic Algorithm (GA) based ERL to understand how this method performs on that type of ERL approach. In this paper, we analyze, both quantitatively and qualitatively, the influence of off-policy data coming from the GA population on the RL policy, and how the mixing method performs on a state-of-the-art ERL method, namely Proximal Distilled Evolutionary Reinforcement Learning (PDERL).
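
The fixed-ratio double-buffer remedy described above can be sketched directly; the buffer contents and the ratio below are hypothetical:

```python
import random

def sample_mixed(rl_buffer, pop_buffer, batch_size, rl_ratio):
    """Draw a training batch mixing the RL agent's own transitions with the
    population's transitions at a fixed ratio (the 'vanilla remedy')."""
    n_rl = round(batch_size * rl_ratio)
    n_pop = batch_size - n_rl
    batch = random.sample(rl_buffer, min(n_rl, len(rl_buffer)))
    batch += random.sample(pop_buffer, min(n_pop, len(pop_buffer)))
    random.shuffle(batch)
    return batch

# Hypothetical transitions tagged by origin, for illustration only.
rl_buf = [("rl", i) for i in range(100)]
pop_buf = [("pop", i) for i in range(100)]
batch = sample_mixed(rl_buf, pop_buf, batch_size=32, rl_ratio=0.75)
```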

Integrated Hybrid Approaches for Stock Market Prediction with Deep Learning, Technical Analysis, and Reinforcement Learning

ABSTRACT. Forecasting stock market prices through machine learning is a widely explored and vibrant research domain in both academic and industrial sectors. Although it can provide in-depth information and generate trading signals, accurately predicting stock prices remains challenging due to the volatile and unpredictable nature of the market. To address this limitation, recent research has advocated incorporating deep learning alongside other methodologies, such as reinforcement learning and technical analysis indicators, which have garnered considerable interest. In this study, we evaluate the effectiveness of two combined approaches for predicting trading signals: deep learning with reinforcement learning, and deep learning with technical analysis. We enhance previous research efforts by using modern deep learning models, including LSTM, GRU, bi-LSTM, and bi-GRU, to generate stock market predictions. We divide our experiments into two branches: first, market predictions serve as input for reinforcement learning algorithms such as DQN, DDQN, and Rainbow DQN to identify optimal trading signals; second, from the market prediction data, we discover trends combined with technical analysis indicators such as ATR, RSI, SMA, and MACD to generate trading signals. The experiments are conducted on a 10-year dataset from 2010 to 2020 for Google stock, with a daily time frame. Our experimental findings reveal that the ATR and DDQN strategies delivered the most substantial profits, reaching 29% and 30.2%, respectively, during a 16-month evaluation period spanning from September 14, 2019, to December 30, 2020. The efficacy of these strategies is influenced by specific contextual factors, which we discuss in detail in our paper.
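
Two of the technical indicators named above, SMA and RSI, have simple closed forms and can be sketched as follows (a generic textbook formulation, not the paper's exact implementation):

```python
def sma(prices, window):
    """Simple moving average over the trailing window."""
    return sum(prices[-window:]) / window

def rsi(prices, period=14):
    """Relative Strength Index over the last `period` price changes:
    RSI = 100 - 100 / (1 + average gain / average loss)."""
    changes = [b - a for a, b in zip(prices[-period - 1:], prices[-period:])]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if losses == 0:        # only gains: maximally overbought
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)
```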

Fuzzy Deep Hybrid Network for Fake News Detection

ABSTRACT. The proliferation of fake news in the digital age poses a significant threat to the democratic process and undermines trust in the media. As disinformation campaigns become more sophisticated and pervasive, it has become increasingly challenging to discern credible news sources from deceptive ones. Machine learning and deep learning techniques have shown promise in automatically detecting fake news, but there is still room for improvement. In this paper, we propose an innovative fuzzy logic-based hybrid model to improve the performance of fake news detection. The model leverages a combination of news articles and textual and numerical context information. We evaluate our proposed model on a fact-checking benchmark dataset and achieve state-of-the-art results. Our findings suggest that combining fuzzy logic with deep learning can improve fake news detection and provide a reliable tool for combatting disinformation.

10:40-12:00 Session 3A: AI Foundation and Big Data
Location: Conference Hall
10:40
Learning Algorithm for LesserDNN, a DNN with Quantized Weights

ABSTRACT. This paper presents LesserDNN, a model that uses a set of floating-point values {-1.0, -0.5, -0.25, -0.125, -0.0625, 0.0625, 0.125, 0.25, 0.5, 1.0} as quantized weights, and a new learning algorithm for the proposed model. In previous studies on deep neural networks (DNNs) with quantized weights, because DNNs employ the gradient descent method as their learning algorithm, quantized weights were applied only during the inference stage: quantized weights are not differentiable, so they cannot be used while the gradient descent method is applied during training. To address this issue, we devised an algorithm based on simulated annealing. Since simulated annealing has no differentiability requirements, LesserDNN can utilize quantized weights during training. With quantized weights and this simulated annealing-based algorithm, the learning process becomes a combinatorial problem. The proposed algorithm was applied to train networks on the MNIST handwritten digits dataset. The tested models were trained with the simulated annealing-based algorithm and quantized weights, achieving the same level of accuracy as gradient descent-based comparison methods. LesserDNN thus has a simple design and a small implementation scale because backpropagation is not applied. Moreover, the model achieves high accuracy.
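
The gradient-free training idea can be sketched as simulated annealing over the quantized weight set; the linear cooling schedule and the toy objective below are hypothetical stand-ins for the paper's schedule and a real network loss:

```python
import math
import random

WEIGHTS = [-1.0, -0.5, -0.25, -0.125, -0.0625, 0.0625, 0.125, 0.25, 0.5, 1.0]

def anneal(loss, n_weights, steps=5000, t0=1.0, seed=0):
    """Simulated annealing over the discrete weight set: no gradients are
    needed, so non-differentiable quantization is not an obstacle."""
    rng = random.Random(seed)
    w = [rng.choice(WEIGHTS) for _ in range(n_weights)]
    cur_loss = loss(w)
    best, best_loss = w[:], cur_loss
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9       # linear cooling schedule
        cand = w[:]                               # flip one weight at random
        cand[rng.randrange(n_weights)] = rng.choice(WEIGHTS)
        cand_loss = loss(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_loss < cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / t):
            w, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = w[:], cur_loss
    return best, best_loss

# Toy objective standing in for a network loss: match a target weight vector.
target = [0.5, -0.25, 1.0]
best, best_loss = anneal(lambda w: sum((a - b) ** 2 for a, b in zip(w, target)), 3)
```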

11:00
A Slim Digital Twin for a Smart City and Its Residents
PRESENTER: Matthias Melzer

ABSTRACT. In the engineering domain, representing real-world objects using a body of data, called a digital twin, which is frequently updated by “live” measurements, has shown various advantages over traditional modelling and simulation techniques. Consequently, urban planners have a strong interest in digital twin technology, since it provides them with a laboratory for experimenting with data before making far-reaching decisions. Realizing these decisions involves the work of professionals in the architecture, engineering and construction (AEC) domain who nowadays collaborate via the methodology of building information modeling (BIM). At the same time, the citizen plays an integral role in the data acquisition phase while also being a beneficiary of the improved resource management strategies. In this paper, we present a prototype for a “digital energy twin” platform we designed in cooperation with the city of Regensburg. We show how our extensible platform design can satisfy the various requirements of multiple user groups through a series of data processing solutions and visualizations, indicating valuable design and implementation guidelines for future projects. In particular, we focus on two example use cases concerning building electricity monitoring and BIM. By implementing a flexible data processing architecture, we can involve citizens in the data acquisition process, meeting the demands of modern users regarding maximum transparency in the handling of their data.

11:20
Locally Differentially Private and Fair Key-Value Aggregation

ABSTRACT. In the era of Big Data, the ability to extract meaningful insights from vast datasets while maintaining individual privacy has become an increasingly complex challenge. Recent years have witnessed the development of various locally differentially private data aggregation schemes which allow an untrusted data collector to derive meaningful statistics from user data while maintaining strong privacy guarantees for individual users. As a fundamental data type in NoSQL databases, key-value data has two important statistics of interest: the frequency of each key and the corresponding mean value. Current locally differentially private key-value aggregation schemes primarily rely on uniform sampling for mean estimation, i.e., a single key-value pair is selected randomly from each user's key-value set. This approach, however, results in high mean estimation accuracy for frequent keys and low accuracy for infrequent ones. To tackle this problem, this paper presents the design and evaluation of Adaptive, a novel locally differentially private and fair key-value aggregation scheme that can deliver uniformly high mean estimation accuracy across different keys. In the first phase, we utilize a portion of the privacy budget to estimate the frequency of each key. Subsequently, based on the key frequencies estimated in the first phase, we employ non-uniform random sampling for mean estimation, which enables higher probability sampling of values associated with low-frequency keys. Comprehensive theoretical analysis and simulation studies confirm the superiority of Adaptive over previous solutions.
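
The first phase (frequency estimation under local differential privacy) can be illustrated with generalized randomized response, a standard frequency oracle; whether Adaptive uses exactly this mechanism is not stated in the abstract, so treat the sketch as an assumption:

```python
import math
import random

def krr_perturb(key, domain, epsilon, rng):
    """Generalized randomized response: keep the true key with probability
    p = e^eps / (e^eps + d - 1); otherwise report a uniform other key."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return key
    return rng.choice([k for k in domain if k != key])

def krr_estimate(reports, domain, epsilon):
    """Unbiased key-frequency estimates from the perturbed reports."""
    d, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = (1 - p) / (d - 1)
    counts = {k: 0 for k in domain}
    for r in reports:
        counts[r] += 1
    return {k: (counts[k] / n - q) / (p - q) for k in domain}

# Hypothetical population: 1000 users, skewed key distribution.
rng = random.Random(0)
domain = ["a", "b", "c"]
true_keys = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
reports = [krr_perturb(k, domain, 2.0, rng) for k in true_keys]
est = krr_estimate(reports, domain, 2.0)
```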

11:40
Simulated Annealing with Dynamic Programming-based Vertex Insertion for Efficiently Solving the Traveling Thief Problem

ABSTRACT. Many real-world optimization problems are challenging to solve because they comprise multiple interdependent NP-Hard subproblems. The Travelling Thief Problem (TTP), a relatively new combinatorial optimization problem, was proposed to better model such problems. TTP comprises two well-known NP-Hard problems: the Travelling Salesman Problem (TSP) and the Knapsack Problem (KP). This paper introduces the SAVI algorithm, which uses Simulated Annealing with Vertex Insertion, efficiently implemented through dynamic programming. Experimental results show that the proposed algorithm runs efficiently across various instances of the TTP, yielding highly competitive outcomes compared to other state-of-the-art algorithms, especially on medium- and large-sized instances.
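
The vertex-insertion move at the heart of SAVI-style local search can be sketched for the TSP component alone: take a city and try every reinsertion position in a partial tour, keeping the cheapest (a generic illustration, not the paper's DP-accelerated version):

```python
import math

def tour_length(tour, pts):
    """Length of the closed tour over the given city coordinates."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def best_insertion(tour, pts, city):
    """Try every position for `city` and return the cheapest resulting tour:
    the basic vertex-insertion move."""
    best, best_len = None, float("inf")
    for i in range(len(tour) + 1):
        cand = tour[:i] + [city] + tour[i:]
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len

# Unit square: inserting the fourth corner optimally yields the perimeter, 4.
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour, length = best_insertion([0, 1, 2], pts, 3)
```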

10:40-12:00 Session 3B: Networking and Communication Technologies
Location: Tulip
10:40
Multi-radar interference experiment and performance evaluations on algorithm-based and learning-based schemes

ABSTRACT. The demand for high-resolution automotive radar operating in the 77 GHz band is on the rise, especially as we approach the practical implementation of autonomous driving technology. With the increasing prevalence of in-vehicle Chirp Sequence (CS) radars in the future, there is growing concern about the potential for broadband inter-radar interference. This interference poses a significant risk, potentially leading to a higher likelihood of undetected targets. To address this problem, various algorithm-based and learning-based schemes have been proposed to suppress inter-radar interference, e.g., the iterative threshold-based zero suppression method and an RNN (Recurrent Neural Network) based interference suppression method. However, their effectiveness has so far been demonstrated only in simulation. In this paper, we conducted a multi-radar interference experiment with up to four interference sources and compared the performance of the different schemes using the collected real data.

11:00
A New Transfer Learning-Based Traffic Classification Algorithm for a Multi-Domain SDN Network

ABSTRACT. To enhance the efficiency and resource utilization of a computer network, it is imperative to classify network traffic and implement distinct priority policies. Network traffic classification plays a pivotal role across various domains, including network administration, cybersecurity, and network resource optimization. As encrypted network traffic continues to evolve, as is evident in datasets from tech giants like Google, Facebook, and YouTube, traditional traffic classification methods have given way to machine learning-based approaches. Given that computer networks are primarily deployed as distributed multi-domain systems, employing machine learning for traffic classification becomes challenging when a new network domain appears with a limited dataset. One potential remedy is transfer learning, which allows knowledge transfer from a model pre-trained in an established domain to a new one. In this paper, we introduce an algorithm named Multi-class TrAdaBoost-CNN for a distributed, multi-domain SDN network. This algorithm combines a variant of the Multi-class TrAdaBoost approach with a Convolutional Neural Network (CNN) as the learner model. Our experimental results demonstrate that the proposed algorithm surpasses the performance of a traditional CNN model, achieving accuracy improvements of up to 16% even with extremely limited data.

11:20
Achieving Zero Secrecy Outage in Overlay Cognitive Radio Network

ABSTRACT. A simple scheme to guarantee zero secrecy outage in an overlay cognitive radio network is proposed with the aid of Artificial Noise (AN). The secrecy outage probability and the condition on the secrecy rates needed to achieve zero secrecy outage will be determined. Our scheme shows that the transmission between the transceivers will be secured against the eavesdropper with zero outage; that is, the message can always be secured by a simple transmission strategy aided by AN. The optimum power allocation to the information signal that achieves the maximum zero-outage secrecy throughput will be highlighted. Importantly, the resulting maximum secrecy throughput increases with increasing transmit power, which overcomes the limit of traditional secret transmissions.

11:40
Accurate Spectrum Sensing with Improved DeepLabV3+ for 5G-LTE Signals Identification

ABSTRACT. This paper presents a deep learning approach for fifth-generation (5G) and Long-Term Evolution (LTE) signal discrimination, explicitly focusing on identifying modulated signals in next-generation wireless networks. The mixture of modulated signals, which is inherently difficult to discern in its complex-envelope form, is converted into a visually informative spectrogram image by applying the Fast Fourier Transform (FFT). To segment the spectral regions of 5G New Radio (NR) and LTE in a spectrogram, we improve DeepLabV3+, a deep encoder-decoder network for semantic segmentation, by incorporating an adaptive Atrous Spatial Pyramid Pooling (ASPP) block and an attention mechanism to accommodate intrinsic signal characteristics and amplify relevant features, respectively. Besides increasing the learning efficiency of the encoder, the improvement enriches the recovery of crucial 5G and LTE details, resulting in more accurate signal identification in the spectrogram image. Based on simulation results benchmarked on a dataset of spectral images containing both LTE and 5G signals, the new network demonstrated its effectiveness compared to the original version by increasing global accuracy, mean intersection-over-union (IoU), and mean boundary F1 score (BFScore) by up to 1.37%, 2.85%, and 9.43%, respectively. At a medium SNR level, it achieves 98.28% global accuracy and 96.66% mean IoU, while also showing robustness under various practical channel impairments.

10:40-12:00 Session 3C: Operations research for Sustainable Urban Development
Location: Camellia
10:40
Managing Time Expanded Networks through Project-and-Lift

ABSTRACT. Time Expanded Networks are useful tools for modeling many Vehicle Routing Problems with synchronizations or resource transfers. However, such models are usually difficult to handle due to their large size, and because the reduction techniques used to discard some arcs and vertices may introduce errors with an unpredictable impact on feasibility and solution quality. Recently, we developed an alternative two-step approach, which we call ``Project-and-Lift'', to solve a Relocation Problem using a Mixed Integer Linear Programming model that involves two coupled flows over a Time Expanded Network. In the first step, we solve a ``projected'' version of the problem that handles part of the time constraints implicitly. In the second step, we try to convert (i.e., ``lift'') the projected solutions into solutions of the Relocation Problem. Here, we revisit the Project-and-Lift approach, propose an exact mixed integer linear programming model for the non-strong version of the Lift problem, describe heuristics to handle it efficiently, and report numerical experiments.

11:00
Ensemble Learning Technique with Novel Multi-Source Information for Stock Price Movements

ABSTRACT. Predicting stock price movement is a complex problem, involving diverse political and economic factors. Integrating these factors involves designing multiple data pre-processing schemes and ensemble learning techniques to develop a novel stock market prediction architecture with better and higher prediction accuracy rates. Both numerical and text data are utilized as inputs for the ensemble regressors and classifiers to learn features. The trained results are concatenated and fed into a final deep learning layer to predict the direction of the closing price. Empirical results on news and historical data of five companies — Apple Inc. (AAPL), Microsoft Corporation (MSFT), Alphabet Inc. (GOOGL), Amazon.com, Inc. (AMZN), and Tesla, Inc. (TSLA) — demonstrate the effectiveness of the proposed prediction model.

11:20
A Filtering System for the Large-Scale Dial-A-Ride Problem With Shared Autonomous Vehicles

ABSTRACT. In this paper, we study a prospective transportation system in which shared autonomous vehicles provide dial-a-ride service for large passenger demands in urban and rural areas. We address this very large-scale problem using a greedy insertion heuristic. Accompanying this heuristic, we propose a filtering system to speed up the routing process. The approach includes two specific data structures used as filtering devices, three different filtering modules that are applied successively to wisely select candidate insertion parameters, and a stopping mechanism to control the computational effort spent exploring the search space for candidate vehicles. The approach also relies on an original route encoding that exploits the topology of the road network. Experimental results show that the number of en-route vehicles in such a system can be reduced by more than 98%. Thanks to the filtering system, the execution time is reduced by almost 97% compared to the classic best-fit insertion heuristic while maintaining a comparable level of solution quality.

11:40
Joint Location and Cost Optimization with Market Expansion and Customer-centric Objective

ABSTRACT. This work concerns a facility location and cost optimization problem within a competitive market context. The primary goal is to simultaneously optimize the selection of facility locations and the allocation of budgets so as to maximize the expected market share, assuming that customers select a facility according to a random utility maximization (RUM) discrete choice model. We consider a novel variant that includes two realistic settings. First, total customer demand is no longer fixed but is modeled as a function that increases with customers' expected utilities. Second, the optimization objective includes a customer-centric element, seeking to optimize both market share and customer satisfaction. The problem formulation is highly relevant to real-world situations but challenging to address due to the nonlinearity of the objective function. To address this challenge, we introduce two solution approaches: a local search heuristic and an exact method built upon an outer approximation framework. The local search heuristic, leveraging monotonicity and submodularity properties, guarantees solutions within a factor (1-1/e) of optimal. The outer approximation algorithm solves the problem iteratively through a sequence of mixed-integer linear programs. Our numerical experiments, conducted on instances of varying sizes, demonstrate that our outer approximation method is highly scalable and efficient.

10:40-12:00 Session 3D: Recent Advances in Cyber Security
Location: Daisy
10:40
Enhancing Intrusion Detection and Explanations for Imbalanced Vehicle CAN Network Data

ABSTRACT. Modern automobiles rely increasingly on vehicle Controller Area Network (CAN) networks, making it essential to safeguard these networks against intrusions. In this work, we present an approach for intrusion detection and explanation within in-vehicle CAN networks that combines Extreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP). Notably, our method is tailored to address class imbalance in network data and performs well on both binary and multiclass classification tasks. Integral to our approach is the integration of SHAP values, which explain the detected intrusions and thereby improve the system's interpretability, equipping stakeholders with deeper insights. We rigorously evaluate our approach with a comprehensive analysis on a published dataset and comparisons with established literature. The results show that our method detects intrusions with high accuracy. Beyond detection precision, the explanatory power of the SHAP values supports understanding of, and decision-making about, the factors behind the classification model's detected intrusions.

11:00
Binary Representation Embedding and Deep Learning For Binary Code Similarity Detection in Software Security Domain

ABSTRACT. Binary Code Similarity Detection (BCSD) is the process of analyzing the binary representations of two functions, programs, or related entities to generate a quantitative output that signifies the similarity score between them. This task encompasses a wide range of applications, including the binary search problem, which involves searching for code segments within a binary file that are similar to a specified binary code segment. These capabilities open up numerous applications within the domain of binary code analysis, such as software vulnerability detection, clone detection, and malware analysis. In this paper, we introduce BiSim-Inspector, a binary code similarity detection tool based on deep learning. The tool leverages the bytes2vec method, which we developed to transform the bytecode of binary functions into vectors; these vectors are then fed into a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) model. Additionally, we conducted a series of experiments to assess the effectiveness of our method by comparing it with existing state-of-the-art (SOTA) tools. We use a large-scale, well-structured, and diversified dataset, BinaryCorp, for the task of BCSD. The outcomes of these experiments show that our framework achieves a recall rate of 89%, which is 25% higher than existing SOTA methods, without compromising training and prediction time.
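The abstract does not describe bytes2vec's internals. As a toy illustration of the interface it implies (raw function bytes in, fixed-size comparable vector out), a normalized byte-frequency stand-in might look like the sketch below; the real method presumably learns embeddings rather than counting bytes, and the byte strings here are hypothetical.

```python
def bytes2vec_stub(code: bytes, dim: int = 256) -> list:
    """Toy stand-in for bytes2vec: a normalized byte-frequency vector.
    Only the interface matches the paper; the real method learns embeddings."""
    vec = [0.0] * dim
    for b in code:
        vec[b] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

# Similar byte sequences score higher than unrelated ones.
v1 = bytes2vec_stub(b"\x55\x48\x89\xe5\xc3")       # hypothetical prologue + ret
v2 = bytes2vec_stub(b"\x55\x48\x89\xe5\x90\xc3")   # same, with a NOP inserted
v3 = bytes2vec_stub(b"\x00\x01\x02\x03\x04")       # unrelated bytes
```

In a BCSD pipeline such vectors would feed a learned model (here, the CNN-GRU); cosine similarity is only the simplest possible comparator.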

11:20
Automated generation of adaptive perturbed images based on GAN for motivated adversaries on deep learning models

ABSTRACT. Deep learning techniques have achieved great success in many fields, such as computer vision, natural language processing, and computer security. However, deep learning models face many security risks, in particular motivated adversaries who induce incorrect predictions or reduce the models' effectiveness. Much previous research has investigated adversarial attacks, aiming to improve the robustness and security of deep learning models. In this article, we propose a method to automatically produce adaptive perturbed images based on a GAN for motivated adversaries on deep learning models. The experiments achieved an approximately 60% success rate in evading five state-of-the-art deep learning models for image recognition (ResNet-56, MobileNetV2, VGG19_bn, ShuffleNetV2, and RepVGG_a2) on the CIFAR-10 dataset. These results are much higher than those of the AIGAN model proposed by Tao Bai et al., which achieved 10.17%. The fidelity of the distorted images generated by the proposed method is also high: its PSNR (Peak Signal-to-Noise Ratio) exceeds 40, compared to previous methods such as FGSM, DeepFool, C&W, AdvGAN, and Zhang et al., whose PSNR is below 30.

11:40
Strategic Improvements of SqueezeSegV2 for Road-Scene Semantic Segmentation Using 3D LiDAR Point Cloud

ABSTRACT. Semantic segmentation of LiDAR point clouds for road-scene analysis in autonomous vehicles and driver assistance systems is a challenging task due to the confusion of categories and the sparse distribution of point clouds, which lead to low performance. In this paper, we propose two important improvements to SqueezeSegV2, a deep encoder-decoder neural network, to improve the overall performance of semantic segmentation. The first improvement is the adaptive Fire module, which can be configured to be lightweight or accurate, depending on the service and application requirements. The second is the steady Fire Deconvolution module, which boosts the accuracy of the segmentation mask reconstruction. Remarkably, both modules are improved by aptly combining symmetric and asymmetric grouped convolutions with dilation to enhance the contextual learning efficiency of the deep model. We evaluate our proposed methods on the Panda dataset and show that they improve on the original SqueezeSegV2 model by 2.95% mean accuracy and 2.86% mean IoU, while also reducing the number of trainable parameters by around 23%.

13:30-14:10 Session 4: Keynote Speaker
Location: Conference Hall
13:30
Optimization for sustainable supply chains

ABSTRACT. In recent decades, sustainability concerns have seen a notable surge in attention and importance. This heightened awareness and interest have not only raised public and corporate consciousness but have also given rise to numerous research questions within the area of supply chain management, particularly in the context of its optimization. This presentation aims to investigate the intricate domain of supply chain management, where sustainability and environmental considerations are at the forefront. It is dedicated to addressing the multifaceted challenges associated with optimizing supply chains while making tactical decisions that align with sustainable practices. This presentation will initially establish a comprehensive framework for understanding the landscape of sustainable supply chain planning decisions. This will provide the necessary context for the subsequent exploration of some key topics that are pivotal in achieving environmentally responsible supply chain management. These topics include but are not limited to: (1) Carbon Emission Constraints: We will investigate the strategies and methods for mitigating carbon emissions within production planning problems. This involves not only understanding the carbon footprint but also the induced complexity when optimizing the production plan; (2) Energy Management: In an era where energy efficiency and sustainability are paramount, we will analyze how production planning problems are impacted when managing energy; (3) Industrial Symbiosis: As the concept of industrial symbiosis gains prominence, we will explore how supply chains can create mutually beneficial relationships within and between industries, where one organization’s waste or byproducts become valuable resources for others. We will also deal with the induced complexity in this context. 
We will close this presentation by highlighting potential research directions that hold promise for further enhancing sustainability in supply chain management.

14:10-15:10 Session 5A: AI Foundation and Big Data
Location: Conference Hall
14:10
Stratified Ranking for Dense Passage Retrieval

ABSTRACT. Dense passage retrieval has recently boosted the performance of systems such as question answering and search engines. For this problem, prior works trained a dense retriever by learning to rank, i.e., ranking relevant/positive passages higher than irrelevant/negative ones. In this paper, we propose a Stratified Ranking approach for Dense Passage Retrieval (SR-DPR), which performs three-way ranking instead of the typical two-way positive-negative ranking. SR-DPR is concerned with three relevance levels: positive, hard negative, and random/easy negative. We train the SR-DPR model by minimizing a contrastive negative log-likelihood (NLL) loss over these three relevance levels, a finer-grained version of the N-pair loss. To implement SR-DPR efficiently, we designed three data pipelines, each of which is used to learn the contrast between two of the three relevance levels. SR-DPR outperforms the strong DPR baseline by 0.6-1.5% retrieval accuracy on the Natural Questions dataset and by 3-6% on the Zalo Legal Text Retrieval dataset. SR-DPR also gives competitive results compared with current state-of-the-art methods without requiring a complicated training regime or intensive hardware resources. The idea of stratified ranking is not restricted to dense passage retrieval but can be applied to any contrastive learning problem. We conducted detailed ablation studies to give insights into SR-DPR's behavior.
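The abstract does not give the exact form of the stratified loss. Under the assumption that each pair of relevance strata contributes a standard softmax-based contrastive NLL term (one of the three pipelines per stratum pair), a minimal numerical sketch is:

```python
import math

def nll(pos_score, neg_scores):
    """Standard contrastive NLL: -log softmax probability of the
    higher-relevance item against a set of lower-relevance ones."""
    denom = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -math.log(math.exp(pos_score) / denom)

def stratified_loss(pos, hard_negs, easy_negs):
    """One plausible three-level stratified loss: contrast positive vs. hard
    negatives, positive vs. easy negatives, and hard vs. easy negatives."""
    hard_vs_easy = sum(nll(h, easy_negs) for h in hard_negs) / max(len(hard_negs), 1)
    return nll(pos, hard_negs) + nll(pos, easy_negs) + hard_vs_easy

# A correctly stratified triple (pos > hard neg > easy neg) incurs a
# smaller loss than a collapsed one where all scores coincide.
good = stratified_loss(pos=5.0, hard_negs=[2.0], easy_negs=[-1.0])
bad = stratified_loss(pos=1.0, hard_negs=[1.0], easy_negs=[1.0])
```

The third term is what makes the ranking three-way: it also pushes hard negatives above easy negatives instead of treating all negatives identically.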

14:30
Enhancing Face Anti-Spoofing with Swin Transformer-driven Multi-stage Pipeline

ABSTRACT. Facial anti-spoofing distinguishes between real and fake faces in images and videos, which is crucial for security systems such as facial authentication and payment applications. The Zalo AI Challenge organized a competition in which more than 250 teams used real-world face data to address this problem. In this article, we propose a Swin Transformer backbone combined with a suitable multi-stage pipeline, which helped us achieve second place in the competition. The evaluation results indicate that the proposed solution offers high accuracy, with a time-constrained EER of 7.1%, and a faster processing time than other solutions.

14:50
Domain Adaptation in Nested Named Entity Recognition From Scientific Articles in Agriculture

ABSTRACT. In the realm of digital agriculture, the ability to make timely, profitable, and actionable decisions depends on agronomists using agricultural data and related cultivation data, including text sources such as news articles, farm notes, and agricultural scientific reports. Named entity recognition (NER) and agricultural entity recognition (AGER) facilitate semantic understanding, enabling precise identification and categorization of farming components and knowledge discovery. However, current approaches to agricultural entity recognition are constrained by limited resources. Moreover, the need to identify nested named entities arises from the complexities inherent in the agricultural domain: relevant information often traverses multiple interconnected elements rather than residing in isolated entities. For instance, comprehending a target farming practice might require pinpointing the crop, the associated nutrients, or diseases, each constituting a nested entity within a broader context. Consequently, agricultural entity recognition from unstructured text is of paramount importance for information retrieval and knowledge construction in this domain. This study constructs the SAGRI dataset, incorporating a novel tagset for AGER that encompasses prevalent agricultural and scientific concepts, methodically established through annotation. This tagset enables the extraction of domain-independent concepts from scientific article abstracts. The study also introduces a deep learning baseline equipped with an advanced Triaffine attention mechanism for robust entity extraction. Additionally, it presents a few-shot learning strategy that optimizes cross-domain categorization, particularly when training data are scarce. Notably, this strategy achieves high F1 scores in comparison with the baseline, underscoring its potential to considerably reduce the required training data.

14:10-15:10 Session 5B: Networking and Communication Technologies
Location: Tulip
14:10
Perpetual Sensor Networks with the Minimum Number of Mobile Chargers

ABSTRACT. Wireless charging is a promising solution to the energy constraint of wireless sensor networks. In a wireless rechargeable sensor network, mobile chargers (MCs) move around the network and charge the sensor nodes. This study focuses on the optimal deployment of MCs to perpetually maintain network operations. More specifically, we aim to determine the minimum number of MCs and their charging schedule that guarantee the perpetual lifetime of the sensor nodes. To this end, we first mathematically formulate the targeted problem. We then propose a dynamic programming-based algorithm to determine the minimum number of MCs. Since the complexity of the dynamic programming-based algorithm is exponential, we introduce a lightweight algorithm based on local search. We conduct experiments to demonstrate the effectiveness of the proposed algorithm compared to other alternatives. The experimental results show that our algorithm can reduce the number of required MCs by at least 23.8% on average and 68% in the best case compared to existing algorithms. In addition, we perform theoretical analysis to derive the computational complexity of the proposed algorithm.

14:30
Secure Energy Scavenging NOMA Communication with Jamming

ABSTRACT. Energy scavenging (ES) non-orthogonal multiple access (NOMA) communication ameliorates various aspects of wireless communication, such as communication reliability and energy-and-spectral efficiencies. This paper further secures such communication with jamming, offering a comprehensive approach to improving energy-and-spectral efficiencies, reliability, and security. It provides mathematical models and practical insights into how these improvements can be achieved, taking into account factors such as ES nonlinearity and channel-and-hardware impairments. Additionally, it highlights the flexibility of the proposed approach in balancing security and reliability requirements.

14:50
GPU-Based Parallel Path Planning for Mobile Robot Navigation in Dynamic Environments

ABSTRACT. In this paper, we propose a new approach called the Parallel Kinematic Rapidly Exploring Random Tree (PK-RRT) algorithm for mobile robot path planning in dynamic environments. Our PK-RRT algorithm incorporates the robot's kinematic constraints when expanding the RRT tree to ensure that the generated paths are feasible and smooth. It also applies multithreading, with multiple threads working in parallel, to speed up path generation and improve convergence. We conducted a comprehensive performance comparison with two existing algorithms, the Bidirectional Rapidly Exploring Random Tree (Bi-RRT) and the Quadratic Rapidly Exploring Random Tree (Quad-RRT). The results show that our PK-RRT algorithm outperforms these algorithms in terms of computation time, computational efficiency, and stability.
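The paper's kinematic model is not specified in the abstract. As a rough illustration of one such constraint (the state layout, step size, and turn limit below are my own assumptions, not the paper's), an RRT steer step that clamps the per-extension heading change might look like:

```python
import math

def steer(state, target, step=1.0, max_turn=math.radians(30)):
    """Extend from state (x, y, heading) toward target (x, y), clamping
    the heading change to max_turn so the new tree edge stays
    kinematically feasible for a robot with a bounded turning rate."""
    x, y, theta = state
    desired = math.atan2(target[1] - y, target[0] - x)
    # Wrap the heading error into [-pi, pi] before clamping.
    err = (desired - theta + math.pi) % (2 * math.pi) - math.pi
    theta += max(-max_turn, min(max_turn, err))
    return (x + step * math.cos(theta), y + step * math.sin(theta), theta)

# A target straight ahead is reached directly; one off to the side only
# bends the heading by the clamped 30 degrees.
ahead = steer((0.0, 0.0, 0.0), (10.0, 0.0))
side = steer((0.0, 0.0, 0.0), (0.0, 10.0))
```

In a parallel variant like the one the abstract describes, many such extensions toward different random samples could run concurrently (e.g. via a thread pool), with the tree updated under a lock.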

14:10-15:10 Session 5C: Operations Research for Sustainable Urban Development
Location: Camellia
14:10
Synchronising Lot Sizing and Job Scheduling

ABSTRACT. We deal here with the production by a subcontractor of resources that a final job scheduler will use to carry out a sequence of jobs. Both are provided with limited storage capacities and must synchronise their respective production and consumption processes. From the point of view of the resource producer, the resulting scheduling problem appears as a multi-stage Lot Sizing problem, where every stage is delimited by transfers, which become the core of the decision. Such a problem typically arises when the resource is some renewable energy (hydrogen, photovoltaic, etc.) required by the jobs and stored in tanks or batteries. We first cast the SLSS (Synchronised Lot Sizing/Scheduling) problem in MILP format and handle it through branch and cut. Next, we get rid of the non-{0, 1} decision variables by applying Benders' cut generation. Finally, we reformulate the SLSS problem as a path search problem set in a specific Transfer space and handle it through an adaptation of the A* algorithm.

14:30
A Monte Carlo Tree Search with Ant Colony Optimization for Inter-domain Path Computation Problem

ABSTRACT. Over the past few years, the Hierarchical Path Computation Element (h-PCE) architecture has been advocated for handling packet routing in multi-domain networks. However, this architecture may not deal effectively with large-scale networks, especially as the number of network components increases. To address this challenge, the Inter-Domain Path Computation problem under the Domain Uniqueness constraint (IDPC-DU) was proposed to enhance the h-PCE architecture. The IDPC-DU limits the search space by finding paths that pass through each domain only once. Although this constraint significantly increases the efficiency of the search process, IDPC-DU is still NP-hard. Several algorithms have been proposed to solve the problem, but either their encoding methods are complex or their search easily falls into local optima. To overcome these drawbacks, we propose a two-level approach combining Monte Carlo Tree Search (MCTS) and Ant Colony Optimization (ACO). In the proposed algorithm, we represent the problem's search space as a search tree using MCTS, and the ACO algorithm is then used to find a good solution on that tree. Moreover, a pruning technique is integrated into ACO to reduce the solution space. The algorithm is tested on many benchmark datasets, and the experimental results show that it outperforms all other algorithms in most cases.

14:50
Improve the Quantum Approximate Optimization Algorithm with Genetic Algorithm

ABSTRACT. The Quantum Approximate Optimization Algorithm (QAOA) is a variational quantum optimization technique used for solving combinatorial optimization problems. However, in constrained binary optimization, QAOA's reliance on equal initial probabilities for all solutions can lead to suboptimal outcomes. To enhance the performance of QAOA in this context, we propose a novel approach that combines QAOA with genetic algorithms. In this hybrid approach, the results obtained from QAOA serve as the initial population for a genetic algorithm. We apply this methodology to address the k-heaviest subgraph problem, a critical challenge in quantum computing research. Our experiments, conducted on benchmark datasets, demonstrate a significant improvement in solution quality.
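The abstract leaves the hybridization details open. A minimal sketch of the classical half, a GA for the k-heaviest subgraph problem seeded with bitstrings that stand in for QAOA measurement samples (the quantum circuit itself is omitted, and all operator names here are illustrative), could be:

```python
import random

def subgraph_weight(vertices, weights):
    """Total weight of edges inside the chosen vertex set."""
    vs = sorted(vertices)
    return sum(weights.get((u, v), 0) for i, u in enumerate(vs) for v in vs[i + 1:])

def crossover(a, b, k, n):
    """Keep the vertices the parents agree on, fill up to k at random."""
    child = a & b
    pool = list(set(range(n)) - child)
    random.shuffle(pool)
    while len(child) < k:
        child.add(pool.pop())
    return child

def ga_refine(population, weights, k, n, generations=50):
    """Refine a population (e.g. QAOA samples) with an elitist GA."""
    pop = [set(p) for p in population]
    for _ in range(generations):
        pop.sort(key=lambda s: -subgraph_weight(s, weights))
        elite = pop[: max(2, len(pop) // 2)]          # elitism: best never lost
        children = [crossover(random.choice(elite), random.choice(elite), k, n)
                    for _ in range(len(pop) - len(elite))]
        pop = elite + children
    return max(pop, key=lambda s: subgraph_weight(s, weights))

# Usage: random k-subsets stand in for QAOA measurement outcomes.
random.seed(0)
n, k = 5, 3
weights = {(0, 1): 5, (0, 2): 5, (1, 2): 5, (3, 4): 1}
seeds = [set(random.sample(range(n), k)) for _ in range(8)]
best = ga_refine(seeds, weights, k, n)
```

Because the elite set always carries the current best individual forward, the returned solution is never worse than the best seed, which is exactly the improvement-over-QAOA guarantee such a hybrid aims for.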

14:10-15:10 Session 5D: Lifelog Event Retrieval
Location: Daisy
14:10
News Event Retrieval from Large Video Collection in Ho Chi Minh City AI Challenge 2023
PRESENTER: Minh-Triet Tran

ABSTRACT. Event retrieval from large collections of TV news videos is crucial for efficient information access, enabling researchers, journalists, and the general public to quickly locate and analyze relevant content amidst the vast sea of news coverage, facilitating informed decision-making and a comprehensive understanding of significant events. This paper presents an overview of the AI-driven video retrieval task in Ho Chi Minh City AI Challenge 2023. The competition draws inspiration from internationally recognized competitions, namely the Video Browser Showdown (VBS) and the Lifelog Search Challenge (LSC). Participants are tasked with developing AI models to retrieve specific video segments from a diverse dataset from reputable news channels. The dataset comprises a vast collection of videos, keyframes, object detections, CLIP features, and metadata. It is divided into three packs with a total of 1,270 videos, spanning approximately 360 hours of content. The challenge comprises two groups. Group A is open to students, researchers, and practitioners in artificial intelligence and information retrieval, emphasizing substantial knowledge and experience. Group B is tailored for high school students, focusing on nurturing interest, learning, and engagement among the next generation of AI enthusiasts. The wide variation in the content of queries challenged participants to demonstrate their adaptability and creativity in effectively retrieving diverse events from the extensive TV news video dataset. The winning teams showcased promising solutions by effectively harnessing artificial intelligence and information retrieval techniques to excel in event retrieval from a vast collection of TV news videos.

14:30
Integrating Multiple Models For Effective Video Retrieval and Multi-stage Search

ABSTRACT. Video is one of the most prevalent forms of data due to the widespread availability of recording devices. This makes video retrieval systems essential, since they help locate the video segment within a dataset that most closely matches a given query. One of the difficulties of video querying is the processing of multimedia data (images, audio, and text). In addition, it is important to integrate temporal information, as queries frequently describe events occurring within a specific time frame. This study therefore introduces a system capable not only of integrating various types of models but also of effectively handling temporal searches through multi-stage processes. The efficacy of the system was demonstrated at the AI Challenge 2023 in Ho Chi Minh City, where our team achieved the best accuracy among all contestants in the qualifying phase and finished in first place among the 60 participating teams.

14:50
Zero-shot Video Retrieval using CLIP with Temporally Ordered Multi-query Scoring
PRESENTER: Huy-Giap Bui

ABSTRACT. In this work, we present a new method for video retrieval using OpenAI's CLIP and Temporally Ordered Multi-query Scoring (TOMS). Our approach extends CLIP with a scoring function for matching multiple ordered queries, which enables fast, accurate video search while retaining its zero-shot capability. This allows effective video retrieval on any dataset without the cost of data annotation and model fine-tuning, both of which can be expensive if not unaffordable in Vietnam. An extensive benchmark against using CLIP alone shows superior performance in video searching. Furthermore, we also present our solution for Ho Chi Minh City AI Challenge 2023, which is built upon this method and achieved competitive results.
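The exact TOMS scoring function is not given in the abstract. One natural reading, matching Q ordered text queries to strictly increasing frame indices so as to maximize total CLIP similarity, can be computed with a simple dynamic program; the sketch below is under that assumption.

```python
def toms_score(sim):
    """sim[q][f]: similarity of text query q to video frame f (e.g. CLIP
    cosine scores). Returns the best total similarity over frame indices
    f_1 < f_2 < ... < f_Q, i.e. queries matched in temporal order."""
    INF = float("-inf")
    n_q, n_f = len(sim), len(sim[0])
    best = list(sim[0])  # best[f]: queries so far matched, last one at frame f
    for q in range(1, n_q):
        run, prefix = [INF] * n_f, INF
        for f in range(1, n_f):
            prefix = max(prefix, best[f - 1])  # best placement strictly before f
            run[f] = prefix + sim[q][f]
        best = run
    return max(best)

# Ordering matters: in the second example the two high-scoring
# (query, frame) pairs are forbidden because they appear out of order.
ordered = toms_score([[0.9, 0.1, 0.1],
                      [0.1, 0.2, 0.8]])   # picks frames 0 then 2
reversed_case = toms_score([[0.1, 0.9],
                            [0.9, 0.1]])  # forced onto the weak diagonal
```

The DP runs in O(Q x F) time per video, which is what makes scoring multiple ordered queries against a large frame collection fast.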

15:10-15:40 Session 6: Lifelog Event Retrieval & Networking and Communication Technologies (Poster Session)
BlazeSearch: A multimodal semantic search engine for retrieving in-video information for AI Challenge HCMC 2023

ABSTRACT. Exploring information has become a critical part of modern life, and search engines have shown their ability to enhance the knowledge-seeking process. However, these search engines still focus on searching websites or images; the capacity to find information within videos needs further study to extend their power. In this work, we investigate the potential of in-video information search by introducing BlazeSearch, a multimodal search engine designed to retrieve video frames from simple input text. By leveraging the CLIP model, which excels at the image-text retrieval task, our search engine offers reliability and accuracy. Furthermore, we optimize search speed and provide an easy-to-use, fully functional user interface for BlazeSearch to give users a pleasant experience.

Global Knowledge-Aware and Local Attention-Aware Framework for News Recommendation

ABSTRACT. News recommendation has become a popular way to find useful news as the number of users has increased dramatically since the COVID-19 pandemic. The recommendation system must therefore extract personalized features to make it simple for users to get the content they want and to make reading more enjoyable. In this paper, we propose a global knowledge-aware and local attention-aware recommendation framework that improves on knowledge-aware CNNs. The global knowledge-aware module extracts a knowledge graph from each word in news titles, the entities corresponding to those words, and their associated contexts. The local attention-aware module analyzes the relationship between the candidate news and various historical news items. From these two modules, our framework can effectively predict whether a user will click on the candidate news. The framework is trained on the MIND dataset and shows remarkable results, outperforming existing news recommendation methods on the AUC, MRR, nDCG@5, and nDCG@10 evaluation metrics.

DoppelSearch: A Novel Approach to Content-Based Video Retrieval for AI Challenge HCMC 2023

ABSTRACT. Video retrieval, which has recently been considered a critical task in the field of computer vision and pattern recognition, finds extensive applications in areas such as education, entertainment, security, and healthcare. However, it faces challenges due to the complexity of video data, the instability of feature extraction methods, and the semantic disparities between videos and text. In this paper, we present a novel approach for content-based video retrieval, named Insec-Tex2Vid, leveraging the CLIP (Contrastive Language-Image Pre-training) model architecture to classify and label video segments, offering users the ability to search for videos based on specific content. Our method capitalizes on the ViT-B/32 model for feature extraction and employs feature embedding, in conjunction with the Faiss library, to enhance search efficiency. Experimental results demonstrate our model's high accuracy and swift retrieval times, promising new opportunities in content-based video retrieval for researchers, developers, and end-users. This paper not only introduces the application of the CLIP and ViT-B/32 models, but also elaborates on the specific feature extraction process and the use of Faiss to optimize video retrieval. The Insec-Tex2Vid method represents a significant stride in the field of video retrieval and holds promise for diverse applications across industries.

Multi-User Video Search: Bridging the Gap Between Text and Embedding Queries

ABSTRACT. Video search is a crucial task in the modern era, as the rapid growth of video platforms has led to an exponential increase in the number of videos on the internet. Effective video management is therefore essential. Significant research has been conducted on video search, with most approaches leveraging image-text retrieval or searching by object, speech, color, and text in images. However, these approaches can be inefficient when multiple users search for the same query simultaneously, as they may overlap in their search spaces. Additionally, most video search systems do not support complex queries that require information from multiple frames in a video. In this paper, we propose a solution to these problems by splitting the search space for different users and ignoring images that have already been considered by other users to avoid redundant searches. To address complex queries, we split the query and apply a technique called forward and backward search.
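As a toy illustration of the splitting idea (the round-robin policy, names, and shared seen-set below are my own assumptions, not the paper's mechanism), one might partition the ranked candidate frames among users and skip anything another user already inspected:

```python
from itertools import cycle

def assign_frames(frame_ids, user_ids):
    """Round-robin partition of ranked candidate frames so that users
    running the same query explore disjoint slices of the ranking."""
    buckets = {u: [] for u in user_ids}
    for fid, user in zip(frame_ids, cycle(user_ids)):
        buckets[user].append(fid)
    return buckets

def next_unseen(bucket, seen):
    """Yield frames from one user's bucket, skipping frames that any
    user has already inspected, and marking new ones as seen."""
    for fid in bucket:
        if fid not in seen:
            seen.add(fid)
            yield fid

buckets = assign_frames(range(6), ["alice", "bob"])  # alice: 0,2,4  bob: 1,3,5
seen = {2}                                           # frame 2 already reviewed
fresh = list(next_unseen(buckets["alice"], seen))    # -> [0, 4]
```

Interleaving by rank keeps every user near the top of the ranking while the shared seen-set removes redundant inspections, which is the inefficiency the abstract targets.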

Efficient Video Retrieval with Advanced Deep Learning Models

ABSTRACT. Video retrieval is the process of finding specific video content in a large database, a crucial challenge in the age of digital multimedia. This article proposes a new approach to video retrieval using advanced deep learning models to extract features and perform retrieval based on them. Our method combines multiple feature extraction techniques, including keyframe extraction, OpenAI CLIP feature extraction, object detection, and automatic speech recognition (ASR). We use BERT embeddings to encode the ASR transcripts and store them in JSON and binary file formats. Our system achieves remarkable results in indexing and retrieving videos based on their visual, audio, textual, and contextual attributes, and can retrieve videos based on either a single text description or multiple text descriptions of a sequence of events. We conducted extensive tests on diverse video data provided by the Ho Chi Minh City AI Challenge 2023 organizers to validate the effectiveness of our approach. The results demonstrate that the proposed system is superior to other methods in both retrieval accuracy and speed, making it highly suitable for real-time applications.

Vi-ATISO: An Effective Video Search Engine at AI Challenge HCMC 2023

ABSTRACT. In this paper, we present the first version of Vi-ATISO, a fast and efficient video search engine on medium-scale datasets. The tool provides several search functions based on text-to-image retrieval, text-to-video retrieval, optical character recognition, and object detection algorithms. With diverse algorithms provided, our system can handle a larger amount of data from the AI Challenge HCMC 2023 and achieve good results. In addition, we feel confident that this search engine can be applied in practice because we also consider user experience during the development process.

Dialogue Attributes' Zero-Shot Classification Based Anime Scene's Matching for Japanese Listening Test

ABSTRACT. This paper proposes a method to provide anime dialogue scenes for Japanese listening test training through zero-shot classification. In this study, listening test dialogues and anime dialogue scenes were categorized by three attributes: speakers' relationship, dialogue location, and dialogue style. We collected 90 listening dialogues from each level of the past Japanese Language Proficiency Test (JLPT) and manually labeled the attributes of each dialogue. After testing the effect of the zero-shot model on the classification of listening dialogues with 2,143 different sets of label keywords, the nine sets with the highest accuracy were identified. We classified the attributes of 247,645 anime dialogue scenes using these nine sets of label keywords and counted the number of anime scenes, the word cover rate, and the text similarity between anime dialogues and the 90 listening tests when different kinds of attributes were matched. The results show that as the number of matched attributes increases, the range of selected anime scenes continues to narrow and become more precise while the word cover rate and text similarity remain essentially unchanged. The average numbers of anime scenes when matching single, double, and triple attributes were 140,988, 70,834, and 29,362, while the word cover rates of the matched anime scenes against the input listening dialogues were 96.18%, 95.22%, and 94.51%.
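The word cover rate reported above can be illustrated with a simple set-based sketch; this is a simplification for illustration, since the abstract does not give the paper's exact metric definition:

```python
def word_cover_rate(listening_dialogue, anime_dialogues):
    """Fraction of the listening dialogue's words covered by anime vocabulary."""
    target = set(listening_dialogue.split())
    vocab = set()
    for d in anime_dialogues:
        vocab.update(d.split())
    if not target:
        return 0.0
    return len(target & vocab) / len(target)

rate = word_cover_rate("where is the station",
                       ["the station is near", "where are you going"])
print(rate)  # 1.0 -- every word of the test dialogue appears in some anime line
```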

Enhancing Video Retrieval with Robust CLIP-Based Multimodal System

ABSTRACT. Content-based video retrieval has emerged as a critical task in the age of abundant multimedia data. This paper presents an interactive video retrieval system designed to address the challenges posed by the growing volume of video content on the internet. Leveraging diverse search methods, including rich text, Human Detection, and Sketch-Text retrieval, the system empowers users to efficiently retrieve relevant video frames. At its core, the system utilizes the Contrastive Language-Image Pre-training (CLIP) model. The user-friendly web application allows users to create queries, explore top results, find similar images, preview short video clips, and select and export pertinent data, enhancing the effectiveness and accessibility of content-based video retrieval.

WiFi-based Positioning System with k-means Clustering and Outlier Removal: Evidence from Multiple Datasets

ABSTRACT. Indoor positioning systems are receiving much attention due to their practical applications in our daily lives. WiFi fingerprinting is a technique commonly used for indoor positioning because it utilizes the WiFi infrastructure that is ubiquitously deployed in modern buildings. However, this technique suffers from a heavy computational burden when determining the user's position in large areas with many fingerprints and reference points. To reduce the search space as well as the computing time, clustering algorithms can be applied. In this paper, we evaluate the efficacy of k-means clustering in a WiFi fingerprinting-based positioning system. Moreover, k-means clustering with outlier removal is applied to several public datasets, in which the WiFi data was collected under different environmental conditions, to show how much the clustering algorithm can help in reducing the computing time as well as improving the positioning accuracy. The experimental results show that, compared to popular deterministic algorithms, the clustering algorithm can significantly reduce the computing time, by up to 79.07%. Nevertheless, it cannot surpass the positioning accuracy of the compared algorithms.
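The cluster-then-search idea can be sketched in pure NumPy on synthetic RSSI fingerprints: the query is first matched to the nearest k-means centroid, so only that cluster's fingerprints are scanned. The data layout and parameters below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means over fingerprint vectors."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        centroids = np.array([X[labels == j].mean(0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids, labels

def locate(query, X, positions, centroids, labels):
    """Scan only the cluster whose centroid is nearest to the query RSSI."""
    c = np.argmin(((centroids - query) ** 2).sum(-1))
    member = np.where(labels == c)[0]
    best = member[np.argmin(((X[member] - query) ** 2).sum(-1))]
    return positions[best]

rng = np.random.default_rng(1)
# two rooms: RSSI readings (dBm) from 4 access points around -70 and -50
X = np.vstack([rng.normal(m, 1, size=(50, 4)) for m in (-70, -50)])
positions = np.repeat([[0.0, 0.0], [10.0, 10.0]], 50, axis=0)
centroids, labels = kmeans(X, 2)
print(locate(np.full(4, -50.5), X, positions, centroids, labels))
```

Only one cluster's fingerprints are compared against the query, which is where the reported computing-time reduction comes from.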

An Interactive System for Multimedia Retrieval in Video Collection with Temporal Integration

ABSTRACT. Multimedia retrieval in computer science is the process of obtaining text, images, videos, and audio segments, all in digital form, relevant to an information need from a collection of these resources. With the ever-growing amount of data, scalable and interactive retrieval systems that can efficiently work on extensive data collections while maintaining high precision are in high demand by industries and researchers. This paper presents the Pumpkin system, an interactive multimedia retrieval system first used in The AI Challenge Ho Chi Minh City 2023, an annual video event and moment retrieval competition. The system is built to handle the retrieval task in a video collection of considerable size and complexity through three primary methods: visual-text association search, object-based search, and audio speech instance search. Additionally, the system has an integrated temporal workflow to search for conceptually related shots in a sequential motion, which removes out-of-context results while promoting suitable ones as the user inputs more details into the system. Our system also puts great emphasis on user experience by incorporating a clean and intuitive interface design with simplified user-side functionality, allowing a more efficient process of information retrieval, whether simple or complex, in a huge collection of multimedia data.

Packet Timeout Probability for Cognitive Cooperative Radio Network under Security Constraint

ABSTRACT. In this paper, we analyze the packet timeout probability in a cognitive cooperative radio network (CCRN) under the security constraints of multiple primary users. In particular, we assume that a secondary transmitter (SU-Tx) sends packets to a secondary receiver (SU-Rx) with the assistance of a secondary relay (SR). The SU-Tx and SR employ a strategy to select a suitable licensed band of the primary users (PUs). Furthermore, they must adhere to joint constraints on the security of the multiple PUs and the peak transmit powers of the secondary users in order to maintain the security performance of the secondary users. We consider adaptive transmit power allocation policies for both the SU-Tx and SR, incorporating the concept of timeout. Additionally, we derive expressions for both the upper bound and the lower bound of the end-to-end packet timeout probability in this model. We also investigate the impact of the selected licensed frequency band, the transmit power of the PUs, and the channel mean powers on the end-to-end packet timeout probability.

Graph Neural Network-based Federated Learning for Sum-rate Maximization in Small-cell Wireless Network

ABSTRACT. This paper investigates the scalability of Graph Neural Networks (GNNs) for solving resource allocation problems in wireless networks. Although GNNs are able to work across diverse network settings, their performance decreases significantly on unseen network settings. To overcome this issue, we propose a combination of GNN and the Federated Learning (FL) framework, namely GraphFL. To illustrate the proposed framework, we consider the power control optimization problem of a small-cell wireless system, where the access point in each cell trains a local GNN model to manage the power allocation for its own users. A global GNN model is aggregated from the local models following the FL procedure. Thereby, the global GNN is able to work across diverse network settings. Experimental results demonstrate the adaptability of the GNN model to both seen and unseen network settings.
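The aggregation step of such an FL procedure is commonly FedAvg, a sample-weighted average of local parameters. A minimal sketch, with model parameters as NumPy arrays and hypothetical per-cell sample counts:

```python
import numpy as np

def fed_avg(local_weights, num_samples):
    """Sample-weighted average of local model parameters (FedAvg)."""
    total = sum(num_samples)
    keys = local_weights[0].keys()
    return {k: sum(n / total * w[k] for w, n in zip(local_weights, num_samples))
            for k in keys}

# two cells' local GNN parameters (toy shapes), trained on 10 and 30 samples
cell_a = {"W": np.array([1.0, 2.0]), "b": np.array([0.0])}
cell_b = {"W": np.array([3.0, 4.0]), "b": np.array([1.0])}
global_model = fed_avg([cell_a, cell_b], num_samples=[10, 30])
print(global_model["W"])  # [2.5 3.5] -- pulled toward the larger cell
```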

GAPRO: An Adaptive User-centric Resource Allocation and Task Offloading Strategy for Multi-access Edge Computing

ABSTRACT. Multi-access Edge Computing (MEC) is a promising technology to enhance the performance of latency-critical applications by providing additional computing capabilities at network edges. However, in traditional cellular-based MEC, users at the cell edge experience signal attenuation and inter-cell interference, resulting in increased risks of transmission outages and offloading failures. We introduce an integration of MEC functionalities and the User-centric Network, harnessing the strengths of both paradigms. The main objective is to minimize overall delay and energy consumption while adhering to specific delay and energy constraints. To address this problem, we formulate it as a Markov Decision Process and then propose a Deep Reinforcement Learning algorithm to jointly optimize task partitioning, transmit power control, and computing resource allocation. Simulation results demonstrate that the proposed optimization scheme significantly reduces energy consumption and delay for users while also guaranteeing the delay and battery constraints.

Enhancing Security in WhatsApp: A System for Detecting Malicious and Inappropriate Content

ABSTRACT. With the increasing reliance on instant messaging applications like WhatsApp, ensuring user privacy and security has become paramount. This paper argues for the need to enhance the security of the WhatsApp messaging service. An API endpoint and Chrome extension capable of detecting profanity, malicious URLs, and inappropriate images are developed to satisfy this need. At present, not only can malicious URLs circulate freely through WhatsApp without any checks, but inappropriate images and texts can also be sent to unsuspecting recipients. We survey existing publicly available models and APIs in the domains of profanity, malicious URLs, and inappropriate images. A novel approach that incorporates the most appropriate models for our purpose in an integrated detection system, based on simple statistical models and output from the third-party APIs, is proposed. We compile two extensive datasets of profane text and malicious URLs and employ them in testing the effectiveness of our proposed model using statistical methods. Additionally, we present a publicly available proof-of-concept Chrome extension that incorporates our model to provide users with an added layer of protection. Finally, we discuss potential areas for future research and suggest improvements to enhance the effectiveness of the proposed system and the cybersecurity of chat applications in general.

Blockchain Oracles: Implications for Smart Contracts in Legal Reasoning and Addressing the Oracle Problem

ABSTRACT. This paper presents a comprehensive investigation into the role, functionalities, and complexities of blockchain oracles, focusing particularly on the implications for smart contracts in legal reasoning contexts. Oracles serve as a vital bridge around smart contracts' inability to interact with external or "off-chain" data, enabling them to be used in a variety of real-world situations. The integration of oracles, however, introduces a number of complexities, including security vulnerabilities, collectively referred to as the Oracle Problem. In addition to a review of existing literature, we provide a mathematical analysis quantifying the computational complexity associated with automating legal reasoning and a novel design framework aimed at establishing oracles that are secure, efficient, and legally compliant. The paper aims to serve as a foundational text for researchers, legal practitioners, and blockchain developers, advancing the academic discourse surrounding blockchain oracles and their role in smart contracts.

A Q-learning-based Energy Efficiency Optimization in LoRa Networks

ABSTRACT. Long-range (LoRa) technology has become the mainstream low-power wireless communication technology for long propagation distances. Developing mobile nodes with low-power technology is a crucial aspect; thus, energy efficiency (EE) has been considered in LoRa network designs. This area has been the subject of extensive research over the years, with a primary focus on enhancing signal transmission quality and minimizing noise components during communication. In this research, we propose a novel approach to optimizing EE for LoRa networks, leveraging the principles of Q-Learning. Our method dynamically adjusts the transmission power based on the evolving state of the network environment, striving to attain the highest possible EE. To benchmark the optimality of the learned transmission power, the gradient descent method is used as a reference. Numerical results reveal good convergence of the learning-based and gradient-based approaches and describe the trade-off between performance and inference time. Our proposed algorithm achieves 91% of the gradient-based solution while reducing the computation time by an average of 30%. These findings emphasize the substantial potential of reinforcement learning techniques, particularly Q-Learning, for LoRa networks.
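The idea of Q-Learning-driven power adjustment can be illustrated with a small tabular sketch. The environment below (coarse link-quality states and a throughput-minus-energy reward) is a toy assumption for illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 3   # link quality: bad/ok/good; action: lower/hold/raise power
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    """Toy environment: raising power improves the link but costs energy."""
    power_delta = action - 1
    next_state = int(np.clip(state + power_delta, 0, n_states - 1))
    reward = next_state - 0.5 * max(power_delta, 0)  # throughput minus energy cost
    return next_state, reward

state = 0
for _ in range(20000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    nxt, r = step(state, action)
    # standard Q-learning update
    Q[state, action] += alpha * (r + gamma * Q[nxt].max() - Q[state, action])
    state = nxt
print(np.argmax(Q, axis=1))  # greedy power action learned for each state
```

The learned policy raises power from the worst state and holds it once the link is good, mirroring the power-versus-EE trade-off the abstract describes.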

15:40-17:00 Session 7A: AI Foundation and Big Data
Location: Conference Hall
15:40
RoSENet: Rotary Squeeze and Excitation for Vietnamese Food Recognition

ABSTRACT. Along with the accelerating impact of social media in Vietnam, food recognition presents unique opportunities for food identification, food sharing, and tourism. However, the literature on Vietnamese food remains largely unexplored. Moreover, real-world deployment requires an approach that does not demand heavy computational resources. This prompts us to explore an approach for recognizing Vietnamese cuisine that delivers acceptable accuracy while requiring reasonably affordable resources. As a result, we present the RoSENet architecture, a modified classification network equipped with our novel Rotary Squeeze and Excitation (RoSE) block, which takes advantage of the rotational invariance of Vietnamese food images presented on diverse serving platters. Additionally, this aids RoSENet in addressing the narrow receptive field that commonly arises in convolution-based architectures. When evaluated across multiple popular backbones, RoSENet demonstrates improved accuracy on two public Vietnamese food datasets and even sets a new state-of-the-art (SOTA) benchmark on the 30VNFoods dataset.
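For reference, the standard Squeeze-and-Excitation block that RoSE builds upon can be sketched in plain NumPy; the paper's rotary variant is its novelty and is not reproduced here, and the weights `w1`, `w2` are random placeholders:

```python
import numpy as np

def squeeze_excitation(feature_map, w1, w2):
    """Standard SE block: global average pool, two FC layers, channel reweighting."""
    # feature_map: (H, W, C); w1: (C, C//r); w2: (C//r, C)
    z = feature_map.mean(axis=(0, 1))          # squeeze: per-channel statistic
    s = np.maximum(z @ w1, 0.0)                # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))        # FC + sigmoid -> channel weights
    return feature_map * s                     # recalibrate each channel

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                # a toy feature map
w1 = rng.normal(size=(16, 4)) * 0.1            # reduction ratio r = 4
w2 = rng.normal(size=(4, 16)) * 0.1
y = squeeze_excitation(x, w1, w2)
print(y.shape)  # same shape as the input; channels are rescaled, not resized
```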

16:00
A Question-Answering System for Vietnamese Public Administrative Services

ABSTRACT. In the realm of legal question-answering (QA) systems, information retrieval (IR) plays a pivotal role. Despite thorough research in numerous languages, the Vietnamese research community has shown limited interest in legal information retrieval, particularly in the context of public administrative services. In this paper, we propose the development of a QA system tailored to the Vietnamese language, specifically focusing on the domain of public administrative services. Our system provides legal-based responses, and it is built upon a combination of retrieval and re-ranking techniques. We employ both lexical-based and semantic-based retrieval models and integrate them to create the final model. Our research shows that the system outperforms existing models in retrieving public administrative information and answering questions related to Vietnamese legal documents.

16:20
Evaluating Audio Feature Extraction Methods for Identifying Bee Queen Presence

ABSTRACT. Beehive monitoring is an essential task for beekeepers to keep the health and productivity of their beehives under surveillance. Traditional monitoring methods, such as visual inspection, are labor-intensive, time-consuming, and have negative effects on bee colonies. Recently, machine learning (ML) algorithms have emerged as a powerful tool for the automated monitoring of beehives using bee sounds. To apply ML methods, the first main step is to extract important features from the original audio data. In this study, we examine the performance of various ML algorithms using six different audio feature extraction methods. Experiments were conducted on an audio dataset collected in Vietnam with the bee queen either present or absent. The results indicate that the audio-based approach can effectively monitor beehives and that, by choosing a suitable feature extraction technique, the performance of the ML methods for detecting the absence of the bee queen can be improved significantly.
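As an illustration of the feature extraction step, one simple audio feature is the log energy of FFT frequency bands; this is a crude stand-in for MFCC-style features, not any of the paper's six specific methods:

```python
import numpy as np

def band_log_energies(signal, sr, n_bands=8, frame_len=1024):
    """Frame the signal, take FFT magnitudes, sum into bands, log-compress."""
    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    bands = np.array_split(spectrum.T, n_bands)        # split along frequency axis
    energies = np.stack([b.sum(0) for b in bands], axis=1)
    return np.log(energies + 1e-10)                    # shape (n_frames, n_bands)

sr = 8000
t = np.arange(sr) / sr
buzz = np.sin(2 * np.pi * 250 * t)      # toy 250 Hz "bee buzz" tone, 1 second
feats = band_log_energies(buzz, sr)
print(feats.shape)  # one compact feature vector per frame for the ML classifier
```

The energy concentrates in the lowest band here because the tone sits well below the first band edge (about 500 Hz at this frame length).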

16:40
Scoring model for NFT evaluation

ABSTRACT. The non-fungible token (NFT) market has experienced significant growth in recent years. Similar to real estate and artwork, NFT is illiquid and it does not have a market price. Thus, there arises a need for a comprehensive scoring system to analyze the value of NFTs. Using a single score is often inadequate, as NFT value is influenced by numerous factors and is assessed based on personal objectives. Addressing the issue, this study explores critical factors influencing NFT value and develops three scores to rank NFTs including rarity, return on investment (ROI), and reputation score. The scores serve to evaluate both NFTs and NFT collections. (i) The rarity score evaluates the level of scarcity, and it applies to NFTs within a collection. (ii) The reputation score evaluates the interest of communities in NFT projects, considering the number of followers and the interaction rate; the score is for NFT collections. (iii) The ROI score assesses the profit generated by NFTs; it applies to both NFTs and NFT collections. Our empirical results show that a well-distributed rarity score of NFTs enhances their demand and profit; high-rarity NFTs are typically associated with a low number of sale transactions. NFT collections with high ROI and high reputation generally draw more attention and yield a positive return. In addition to the three scores, this paper presents a system to collect and process a huge amount of blockchain and social network data for NFT evaluation.
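A widely used formulation of trait rarity is the sum of inverse trait-value frequencies; whether the paper uses exactly this formula is not stated, so the sketch below is illustrative:

```python
from collections import Counter

def rarity_scores(collection):
    """Score each NFT as the sum of 1/frequency over its trait values."""
    freq = {}  # per trait type, how often each value occurs in the collection
    for nft in collection:
        for trait, value in nft.items():
            freq.setdefault(trait, Counter())[value] += 1
    n = len(collection)
    return [sum(1.0 / (freq[t][v] / n) for t, v in nft.items())
            for nft in collection]

collection = [
    {"background": "blue", "hat": "none"},
    {"background": "blue", "hat": "none"},
    {"background": "gold", "hat": "crown"},   # unique traits -> rare
]
print(rarity_scores(collection))  # the gold/crown NFT scores highest
```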

15:40-17:00 Session 7B: Networking and Communication Technologies
Location: Tulip
15:40
UAV Path Planning for Backscatter and Cache-aided Wireless Power System Under Imperfect CSI

ABSTRACT. Unmanned aerial vehicles (UAVs) are essential in sixth-generation (6G) networks, as they can operate as data relay devices or even as dynamic base stations between the source and the destination. In this paper, we study a radio communication system with the support of a UAV to cope with a harsh propagation environment. Besides, caching and backscatter technologies are exploited to boost network efficiency. Specifically, the caching technology reduces network propagation delay, while the backscatter technology provides energy transmitted by the source to the UAV to extend its lifetime. We design the UAV trajectory by formulating and solving a total-throughput maximization problem at the destination over a time window, subject to the limited energy and the quality-of-service requirement. The proposed algorithm is based on the backtracking method and obtains a solution in polynomial time. Numerical results demonstrate an improvement in the sum data throughput of the proposed algorithm design compared with different fixed UAV path-planning benchmarks.

16:00
Research and Development of a Smart Solution for Runtime Web Application Self-Protection

ABSTRACT. In contemporary times, ensuring web application security is a critical concern for organizations due to the prevalence of numerous types of attacks that serve diverse purposes. Although traditional security measures like web application firewalls (WAF) and intrusion detection systems (IDS) can aid in mitigating attacks, they may still be circumvented or compromised. A more efficacious approach is to adopt runtime application self-protection (RASP) solutions integrated within the web application. In this research, we propose a smart solution for runtime web application self-protection (RASP) to protect against vulnerabilities, attacks, and common weaknesses rated among the top ten web security risks in 2021 by the Open Web Application Security Project (OWASP). The proposed solution leverages convolutional neural network (CNN) and recurrent neural network (RNN) techniques to build a deep learning model with a shallow network architecture that scrutinizes user requests, thereby detecting potential SQL injection (SQLi), cross-site scripting (XSS), command injection (CMDi), and other types of attacks. It is designed to adapt dynamically to the behavior and traffic of the application, minimizing false positives and the blocking of legitimate traffic. Furthermore, the proposed solution is based on a microservices architecture, which enhances the flexibility of the prediction module during upgrades and automated deployment, and it is also built to be compatible with RESTful API servers. Our results validate the efficacy of this solution in providing real-time application protection.
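Before requests reach CNN/RNN models of this kind, they are typically encoded at the character level. The sketch below is a hypothetical preprocessing step; the alphabet and sequence length are illustrative choices, not the paper's:

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 '\"=<>/();-_&?%"

def encode_request(request, max_len=64):
    """Map each character to an integer index (0 = padding/unknown), fixed length."""
    idx = {c: i + 1 for i, c in enumerate(ALPHABET)}
    codes = [idx.get(c, 0) for c in request.lower()[:max_len]]
    return np.array(codes + [0] * (max_len - len(codes)))

x = encode_request("id=1' or '1'='1")   # a classic SQLi payload
print(x.shape)  # a fixed-length vector, ready for an embedding + CNN/RNN stack
```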

16:20
Security and Reliability Performance Analysis of Cognitive NOMA Network Under Outage Constraint of Multiple Primary Users

ABSTRACT. In this paper, we conduct an investigation into the security and reliability aspects of a cognitive non-orthogonal multiple access (NOMA) network operating under outage constraints imposed by multiple primary users (PUs). Specifically, we examine a scenario where secondary users (SUs) can operate in NOMA mode, allowing them to harness the licensed spectrum band of Np PUs, provided that the interference generated by the SUs remains below a predefined threshold. In this spectrum-sharing environment, both SUs and PUs are vulnerable: a potential threat arises in the form of an eavesdropper who seeks to compromise the secrecy of messages exchanged among SUs. Given this setting, we derive a power allocation policy for the SUs, formulate closed-form expressions for the outage probability (OP) and intercept probability (IP) to assess system performance, and analyze the trade-off between reliability and security. Finally, we supplement our findings with numerical examples and discussions to illustrate the proposed concepts.

15:40-17:00 Session 7C: Operations research for Sustainable Urban Development
Location: Camellia
15:40
Pickup and Delivery Problem with Cooperative Robots

ABSTRACT. This paper explores the Pickup and Delivery Problem with Cooperative Robots (PDP-CR), a new problem that has emerged in warehouse settings with automated vehicles, where multiple robots must cooperate to complete a task, i.e., when a task requires more than one robot. In PDP-CR, a fleet of identical robots has to handle a set of tasks. Each task consists of a pickup point, a destination, a processing time, and the number of robots required. The primary objective is to complete all tasks with the available robots while minimizing the makespan, i.e., the time until the last robot returns to the depot. PDP-CR is NP-hard. In this paper, we propose two MILP formulations that are solved using CPLEX. The primary goal of this paper is to introduce PDP-CR and to develop, test, and compare the mathematical models proposed for it.

16:00
Bias-free Trading Algorithms with Momentum Scores for the Vietnamese Stock Market

ABSTRACT. This paper aims to assess the profitability of the Momentum Score trading strategy when applied to the Vietnamese stock market, while concurrently comparing its performance against the VN100 Index and addressing the potential effects of survivorship bias. Our findings reveal that the Momentum Score trading strategy has yielded impressive results, exhibiting a cumulative return of 498.90%, which is nearly 5 times greater than the 95.97% return achieved by the VN100 Index over the same period from February 2015 to June 2023. Moreover, when evaluating returns against the maximum drawdown, the Momentum Score strategy outperforms the VN100 Index by approximately 9.7 times. The robustness of the Momentum Score's profitability is particularly noteworthy given that both strategies adhere to the same trading rules. However, the examination of these results also underscores the considerable influence of survivorship bias on strategy outcomes, emphasizing the importance of accounting for this bias in performance evaluations.
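The mechanics of a momentum-score rotation can be sketched as follows; the lookback window and the single-asset holding rule are illustrative assumptions, not the paper's exact trading rules:

```python
import numpy as np

def momentum_backtest(prices, lookback=3):
    """Each period, hold the asset with the best trailing `lookback` return."""
    wealth = 1.0
    for t in range(lookback, prices.shape[0] - 1):
        scores = prices[t] / prices[t - lookback] - 1.0   # trailing returns
        pick = int(np.argmax(scores))                     # highest momentum score
        wealth *= prices[t + 1, pick] / prices[t, pick]   # hold for one period
    return wealth

# two toy assets: one trending up 2% per period, one flat
up = 100 * 1.02 ** np.arange(12)
flat = np.full(12, 100.0)
prices = np.column_stack([up, flat])
print(momentum_backtest(prices))  # > 1.0: the strategy rides the trending asset
```

Survivorship bias, which the paper highlights, would enter such a backtest through the choice of the price universe itself: delisted stocks must be kept in `prices` up to their delisting date.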

16:20
Metaheuristic for a soft-rectangle packing problem with guillotine constraints

ABSTRACT. We investigate the partitioning of a rectangular region with specified length and height into $n$ soft rectangles of given areas using two-stage guillotine cuts. The goal is to minimize the largest perimeter of the resulting rectangles. The problems hold significance in the ongoing land-allocation reform in Vietnam, as well as in optimizing matrix multiplication algorithms, and they have been established to be NP-hard. Within the existing literature, solutions have primarily been based on Mixed Integer Programming approaches, which, unfortunately, are limited to handling small instances. This limitation is the impetus for our work, where we introduce metaheuristic algorithms designed to tackle larger instances of these problems. Specifically, we propose Iterated Local Search, Variable Neighborhood Search, and Tabu Search for addressing the problem. To evaluate the effectiveness of these solution approaches, we conduct experimental analyses comparing their computational efficiency and solution quality. Furthermore, these analyses demonstrate the efficiency of the proposed approaches in managing medium and large-sized instances of these problems.

16:40
Self-Adaptive Ant System with Hierarchical Clustering for the Thief Orienteering Problem

ABSTRACT. The Thief Orienteering Problem (ThOP) is a multi-component problem with two interdependent sub-problems: the Knapsack Problem and the Orienteering Problem. ACO++, a state-of-the-art heuristic for the ThOP, combines the MAX-MIN Ant System (MMAS) algorithm for route construction, a randomized heuristic for packing-plan creation, and the 2-opt method for local search. The excellent reported performance of ACO++, however, is obtained using different sets of parameter values that have been extensively fine-tuned for each specific group of problem instances. In this paper, we present a novel self-adaptive variant of ACO++. Without requiring a cumbersome tuning process, our approach employs adaptive mechanisms to adjust the parameters to each particular problem instance at runtime. We also use a lazy evaporation technique and a hierarchical clustering procedure to improve the efficiency with which the ants explore the search space. On the 432 benchmark instances, our proposed Self-Adaptive Ant System with Hierarchical Clustering (SAAS-HC) produces superior results compared to previous state-of-the-art approaches.
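The MMAS component mentioned above bounds pheromone values between limits; its core update, evaporation plus a best-ant deposit followed by clipping, can be sketched as:

```python
import numpy as np

def mmas_update(tau, best_edges, best_quality, rho=0.1,
                tau_min=0.01, tau_max=5.0):
    """Evaporate all pheromone, deposit on the best ant's edges, then clip."""
    tau = (1.0 - rho) * tau                  # global evaporation
    for i, j in best_edges:
        tau[i, j] += best_quality            # deposit proportional to tour quality
    return np.clip(tau, tau_min, tau_max)    # enforce the MAX-MIN bounds

tau = np.full((4, 4), 1.0)                   # pheromone matrix over 4 nodes
tau = mmas_update(tau, best_edges=[(0, 1), (1, 2)], best_quality=0.5)
print(tau[0, 1], tau[0, 2])  # reinforced edge vs. merely evaporated edge
```

The bounds keep every edge's selection probability strictly positive, which is what prevents the premature convergence that plain Ant System suffers from.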

15:40-17:00 Session 7D: Lifelog Event Retrieval
Location: Daisy
15:40
AGAIN: A Multimodal Human-Centric Event Retrieval System using dual image-to-text representations

ABSTRACT. Event retrieval from visual data presents a formidable challenge, requiring the identification of specific events from a pool of similar occurrences based on concise textual descriptions or a sequence of consecutive frames. The AI Challenge HCMC 2023 - Event Retrieval from Visual Data was a competition to encourage scientists to create interactive retrieval systems tailored for extensive Vietnamese video news databases. Participating teams promptly submitted precise event identifications derived from visual data or natural language descriptions. Therefore, we propose a lightweight and user-friendly yet robust event retrieval system, incorporating functionalities such as image-text matching, object detection, automatic speech recognition, and optical character recognition. This system enables users to search for events through natural language descriptions and relevant images. Notably, our team, named "Again," secured the second position on the public leaderboard of the competition using this system.

16:00
NewsInsight: A Comprehensive Video Event Retrieval System with Spatial Insights and Query Assistance

ABSTRACT. Video event retrieval is the task of finding videos that are relevant to a given query. It is a challenging problem because videos are typically much larger than images and can contain a variety of different objects and scenes. However, there are a number of different approaches to video retrieval, and the field is rapidly evolving; some of the most promising research directions include the use of deep learning and multimodal features. In this paper, we introduce NewsInsight -- a comprehensive video event retrieval system developed for participation in the AI Challenge 2023. The system leverages the Bootstrapping Language-Image Pre-training (BLIP) model for zero-shot image-text retrieval, which demonstrates superior recall scores on the Flickr30K dataset compared to the Contrastive Language-Image Pre-training (CLIP) model. In addition, it employs an ElasticSearch filtering mechanism to discard irrelevant images. Beyond semantic search, the system supports visual similarity search by calculating the inner-product distance between vectors in the video-frame corpus and the query image. The system also incorporates an explicit relevance feedback function, AI-based query description rewriting, and visual-example-generating features, enhancing the precision of the query description and aiding end-users in formulating a more accurate depiction of the targeted image for retrieval.

16:20
Diverse Search Methods and Multi-Modal Fusion for High-Performance Video Retrieval

ABSTRACT. Querying events within extensive video datasets currently stands as a prominent research focus within the field of multimedia information retrieval. Achieving high-performance retrieval within such contexts necessitates the efficient extraction and effective storage of information from videos to expedite the retrieval process. These challenges become notably pronounced when handling substantial datasets. In this paper, we introduce a system tailored for event querying within video data. Our system is meticulously crafted to optimize information retrieval speed and to efficiently organize storage, harnessing the power of FAISS and ElasticSearch. It boasts the capability to process diverse forms of input information, including textual video descriptions, Optical Character Recognition (OCR) results, Automatic Speech Recognition (ASR) transcriptions, visually similar images, and details about objects within videos, encompassing aspects such as color and position. By amalgamating these various input types, our system delivers optimal results.
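The FAISS-backed retrieval step reduces to maximum inner product search; the NumPy sketch below performs the exact search that a flat inner-product index would, while FAISS adds indexing structures to accelerate it at scale:

```python
import numpy as np

def inner_product_search(index_vectors, query, top_k=2):
    """Exact maximum inner product search, as done by a flat IP index."""
    scores = index_vectors @ query
    top = np.argsort(-scores)[:top_k]      # indices of the best-scoring items
    return top, scores[top]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # normalized: IP == cosine
query = corpus[7]                                        # query identical to item 7
ids, scores = inner_product_search(corpus, query)
print(ids[0])  # item 7 is its own best match
```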

16:40
Anomaly Event Retrieval System from TV News and Surveillance Cameras

ABSTRACT. This paper introduces a novel approach for lifelogging event retrieval using TV news and surveillance cameras as valuable sources. Our focus is on rapidly identifying candidate shots, or moments of interest, across diverse scenarios to cater to various user intentions. To facilitate this, we curate a dataset of anomaly events from Ho Chi Minh City, leveraging the CLIP model to encode visual information and harness text subtitles for efficient abnormal event annotation in video clips. Additionally, we present a user-friendly system for expressing retrieval preferences. This open system is designed for seamless future upgrades, promising enhanced functionality and adaptability.