SOICT 2025: THE 14TH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY
PROGRAM FOR FRIDAY, DECEMBER 12TH

08:50-09:30 Session 1: Keynote I: Vincent Wong (The University of British Columbia, Canada)

Integrated sensing and communication (ISAC) is a key technology for the sixth-generation (6G) wireless networks, where the same spectral and hardware resources are used for both communication and environmental sensing. Many optimization problems in ISAC require accurate sensing and communication channel models, which are often difficult to obtain. Machine learning (ML) is a powerful tool for solving ISAC problems by enabling data-driven solutions that can bypass the reliance on explicit models. This talk will explore how ML techniques can improve ISAC performance beyond traditional optimization approaches. Two case studies will be discussed: sensing-assisted predictive beamforming and cooperative sensing through ML. These examples will demonstrate the potential of ML to enable end-to-end signal processing for ISAC in 6G wireless networks.

Location: Ballroom
09:30-10:10 Session 2: Keynote II: John C.S. Lui (The Chinese University of Hong Kong, Hong Kong)

In this talk, I will begin with a brief introduction to quantum computing, highlighting the importance and opportunities for pursuing fundamental research in the quantum Internet. In particular, I will discuss how quantum networks can enable quantum information transmission, parallel processing, and distributed processing. Next, I will introduce online learning theory and explain how it can help us explore compelling challenges in building quantum networks and the quantum Internet. To this end, I will delve into the quantum path selection problem, as well as the quantum border gateway protocol (QBGP) if time allows. Finally, I will outline several exciting open research problems at the intersection of quantum networks and quantum computing.

Location: Ballroom
10:40-12:00 Session 3A: SOICT Technical Session I: Quantum Information
Location: Ballroom A
10:40
Quantum Circuit Resource Assessment for ChaCha20 Stream Cipher

ABSTRACT. The emergence of Grover's algorithm has significantly impacted the perceived security of symmetric-key cryptography in the quantum era. In response, NIST proposed three security levels for symmetric ciphers based on their resistance to quantum adversaries. This paper investigates the quantum implementation of the ChaCha stream cipher, focusing specifically on ChaCha20, which is the 20-round variant of ChaCha. We construct and simulate a quantum circuit for ChaCha20 using the ProjectQ framework, and evaluate its quantum resource requirements. Our implementation requires 1025 qubits, 64,512 CNOT gates, 21,504 Toffoli gates, and achieves a circuit depth of 511. Compared to existing designs, our circuit offers a significant depth reduction and uses only one ancillary qubit, making it more suitable for depth-constrained quantum environments. This result contributes to the broader understanding of quantum cost for stream ciphers, and provides a useful reference point for post-quantum cryptographic analysis.
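The gate counts reported above track the cipher's core arithmetic: each ChaCha quarter round mixes four 32-bit words with modular additions (realized by Toffoli-based adders), XORs (CNOT layers), and rotations (free relabelings of qubit wires). As a point of reference, here is the classical quarter round from RFC 8439 that the quantum circuit must implement reversibly:

```python
# Classical reference of the ChaCha quarter round (RFC 8439).
# In a quantum circuit, the XORs map to CNOT layers, the modular
# additions to Toffoli-based adders, and the rotations are free
# (they are just permutations of qubit wires).
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def quarter_round(a: int, b: int, c: int, d: int):
    a = (a + b) & MASK; d = rotl(d ^ a, 16)
    c = (c + d) & MASK; b = rotl(b ^ c, 12)
    a = (a + b) & MASK; d = rotl(d ^ a, 8)
    c = (c + d) & MASK; b = rotl(b ^ c, 7)
    return a, b, c, d

# Test vector from RFC 8439, Section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in result])
# → ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

ChaCha20 applies 20 rounds of such quarter rounds to a 16-word state, which is where the CNOT and Toffoli totals quoted in the abstract accumulate.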

11:00
EDM4QS: An Emulator-Driven Model for Quantum Scheduling

ABSTRACT. Today, quantum processors have evolved from prototypes to real backends. Various platforms offer quantum computing resources, including those provided by high performance computing centers and cloud providers. Quantum jobs, i.e., the input circuits submitted by users, can vary in qubit requirements and complexity, and depend on hardware topology and qubit connectivity. To manage this variability, quantum jobs go through a scheduling pipeline that maps, optimizes, and assigns tasks to the hardware under specific physical constraints. While numerous scheduling methods have been proposed, no evaluation models or pipelines exist to address performance bottlenecks that occur when one phase impacts the others. Our paper presents EDM4QS, an emulator-driven model for quantum scheduling. This pipeline aims to synthesize quantum scheduling components that can evaluate various related techniques within a comprehensive framework, analyzing the interdependencies between phases and their cumulative impact on system performance. As a long-term vision, this work facilitates holistic optimization and identifies the most essential steps in scheduling quantum jobs.

11:20
Toward Acceleration of Variational Quantum Classifier Simulation on GPUs

ABSTRACT. The Variational Quantum Classifier (VQC) is among the most widely studied models in Quantum Machine Learning (QML). However, due to the current limitations of quantum hardware, simulating QML algorithms on classical platforms such as CPUs, GPUs, or FPGAs has become a crucial step to assess performance and feasibility before deployment on real quantum devices. In this work, we present an accelerated VQC simulation framework, referred to as A-VQC, which leverages the parallelism of classical hardware, particularly GPUs, to achieve efficient simulation. Specifically, A-VQC introduces two complementary acceleration strategies: (1) data worker concurrency, which speeds up data transfer to GPUs by employing independent and asynchronous data-loading processes alongside VQC execution; (2) stream-wise concurrency, which exploits parallel GPU streams to train the VQC on mini-batches concurrently. We implement A-VQC using a cross-platform integration of PennyLane and PyTorch. Our experiments demonstrate improvements in training speed (~10%) and GPU utilization (~30%) compared to conventional VQC simulations.

11:40
Performance Analysis of Quantum Federated Learning with Personalized Layer

ABSTRACT. Quantum computing has recently emerged as a groundbreaking field, promising unprecedented computational power and information processing capabilities. These unique advantages of quantum mechanics present a potential solution to the inherent challenges in personalized federated learning, including high communication costs due to the transmission of local updates and the limited computational capacity of classical devices. Inspired by this synergy, we propose a novel architecture named Quantum Federated Learning with Personalized Layer (QFL-PL). Our method significantly accelerates convergence, outperforming state-of-the-art approaches with 99.62% accuracy on MNIST and 86.23% on CIFAR-10.

10:40-12:00 Session 3B: SOICT Technical Session II: AI Applications
Location: Ballroom B
10:40
TriFusion: GNN-Based Multimodal Fusion for 3D Object Detection in Autonomous Driving

ABSTRACT. Reliable 3D object detection is critical for autonomous driving, yet LiDAR-only methods often fail under adverse weather, occlusion, or sensor degradation. We introduce TriFusion, a GNN-based multimodal fusion framework that integrates LiDAR, camera, and radar for robust 3D detection. Our approach builds a heterogeneous graph with nodes representing modality-specific features and edges encoding spatial and cross-modal correspondences, enabling attention-based message passing across sensors. Evaluated on the nuScenes benchmark against leading baselines (e.g., PointPainting, MVX-Net, BEVFusion), TriFusion achieves superior accuracy and robustness in challenging conditions while maintaining efficiency. These results underscore the promise of graph-based fusion for reliable perception in autonomous driving.
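The abstract's core mechanism, attention-weighted message passing between modality nodes of a heterogeneous graph, can be sketched in a few lines of NumPy. The feature size and the single-head dot-product attention below are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def attention_message_passing(nodes: np.ndarray, edges: list) -> np.ndarray:
    """One round of dot-product-attention message passing.

    nodes: (N, d) matrix, one feature row per modality node
           (e.g. LiDAR, camera, radar proposals).
    edges: directed (src, dst) pairs encoding spatial or
           cross-modal correspondences.
    """
    d = nodes.shape[1]
    out = nodes.copy()
    for dst in range(nodes.shape[0]):
        srcs = [s for s, t in edges if t == dst]
        if not srcs:
            continue
        # Attention scores of dst against each connected source node.
        scores = nodes[srcs] @ nodes[dst] / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()            # softmax over neighbors
        out[dst] += weights @ nodes[srcs]   # aggregated cross-modal message
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))             # 3 modality nodes, 8-dim features
updated = attention_message_passing(feats, [(0, 2), (1, 2)])
print(updated.shape)  # → (3, 8)
```

Node 2 here receives attention-weighted messages from nodes 0 and 1, while nodes with no incoming edges keep their original features; a full detector would stack several such rounds before the detection head.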

11:00
A Novel Approach for Sino-Vietnamese Text Transcription by Leveraging a Pre-trained BERT and Self-Attention Mechanism

ABSTRACT. The Sino-Vietnamese (aka Hán Việt) vocabulary, derived from ancient Chinese characters (Han) but read with Vietnamese pronunciations, served as Vietnam’s primary writing system until it was replaced by the modern script (chữ Quốc ngữ) in the 20th century. Today, most Vietnamese cannot read Han texts, making transcription tools crucial for preserving cultural heritage. However, the task of Sino-Vietnamese text transcription is challenging due to the presence of single-reading and multiple-reading characters, where the correct reading depends on meaning, sentence position, context, etc. Capturing contextualized information is therefore essential. This study proposes a neural network model based on a pre-trained BERT architecture, enhanced with specialized layers to capture contextual relationships in Han character sequences. Trained on expert-annotated data, the model achieved 96.08% accuracy and a 95.59% F1-score, outperforming existing approaches and providing a robust transcription solution.

11:20
A Comparison of Machine Learning Methods for Alzheimer's Disease Classification in Vietnamese Patients

ABSTRACT. Alzheimer's disease (AD) is a neurodegenerative disorder that poses an increasing burden in middle- and low-income countries. However, computational research on AD in these areas is limited, primarily due to resource constraints and small sample sizes. Traditional machine learning (ML) methods, designed for large datasets, thus may not perform optimally on smaller datasets under resource-limited settings. To evaluate which ML methods may be helpful for AD classification, given limited data, we present a modular framework for the analysis of a private Vietnamese AD dataset comprising 113 subjects. The framework incorporates a predictor module for model training and an explainer module for interpretation. We compared the classification performance, robustness, resilience, and reliability of the models. We also compared interpretable ML models with black-box models using post hoc explainability techniques. Our results indicated that the black-box XGBoost achieved the highest accuracy (81.4%), while the generalized additive model achieved a competitive performance (78.8% accuracy, 87.2% AUC). The generalized additive model also demonstrated greater robustness and resilience when compared to linear and tree-based models. Explainability analysis on inherently interpretable models and post hoc analysis on black-box models suggested the hippocampus as a common important brain region for AD classification, which aligns with previous medical findings. Overall, this study demonstrates the feasibility of using ML approaches for AD diagnosis using small datasets while balancing predictive performance and explainability.

11:40
CodeLit: A Skill-Based Framework for Automated Assessment of Code Comprehension

ABSTRACT. In programming education, verifying whether students genuinely comprehend the code they submit, especially in the age of AI-generated solutions and peer imitation, poses a growing pedagogical challenge. This paper introduces CodeLit, a skill-based framework for automated assessment of code comprehension, bridging insights from story-based reading comprehension with Bloom’s Taxonomy. CodeLit defines nine essential code comprehension skills, ranging from basic syntactic understanding to recognizing implicit logic and design abstractions, mapped across multiple cognitive levels.

Leveraging large language models (LLMs), CodeLit automatically generates targeted multiple-choice questions (MCQs) to assess these skills, applying a consistent prompt engineering strategy to both general-purpose and programming-oriented models. Our evaluations demonstrate that programming-oriented models significantly outperform general-purpose models in both completeness and quality, highlighting the value of domain-adapted LLMs for code comprehension assessment. By aligning computational assessment with cognitive theory, CodeLit provides a novel pathway to reinforce academic integrity, personalize feedback, and deepen learning in programming courses.

10:40-12:00 Session 3C: SOICT Technical Session III: Software Engineering
Location: Yersin A
10:40
CandleGen: Generating Synthetic OHLC Data for Different Market Trends using GANs

ABSTRACT. Financial data has been widely used in various applications, such as stock price prediction, algorithmic trading, and risk management. In algorithmic trading, for instance, traders often use historical OHLC financial data to develop and backtest trading strategies. This process allows traders to refine their approaches, understand risk, and ensure strategies are robust across different market conditions. However, relying on historical financial data can be challenging, as past patterns may not repeat in the future. In this project, we propose CandleGen, a generative adversarial network (GAN)-based system for synthetic OHLC data generation. Taking advantage of the strong generation ability of GANs, CandleGen creates OHLC data for different market conditions (strong bull, bull, flat, bear, strong bear). We conduct both qualitative and quantitative experiments to evaluate the performance of CandleGen. The results show that the generated OHLC data align with real data within a small margin of error.
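Synthetic candles are only useful if they satisfy the structural invariants of OHLC data, so any generator's output should at minimum pass a well-formedness check. A small validator, independent of the paper's GAN architecture (the bar layout below is the standard open/high/low/close tuple, not CandleGen's internal format):

```python
def is_valid_candle(o: float, h: float, l: float, c: float) -> bool:
    """An OHLC bar is well-formed iff the high is the maximum
    and the low is the minimum of the four prices."""
    return h >= max(o, c) and l <= min(o, c) and h >= l

def validate_series(candles: list) -> bool:
    """Check every (open, high, low, close) bar in a generated series."""
    return all(is_valid_candle(*bar) for bar in candles)

# A plausible synthetic "bull" segment and one malformed bar.
bull_bars = [(100.0, 103.0, 99.5, 102.5), (102.5, 105.0, 102.0, 104.8)]
print(validate_series(bull_bars))                  # → True
print(is_valid_candle(100.0, 99.0, 98.0, 100.5))  # high below close → False
```

Checks like these complement distribution-level evaluation: a generator can match real data statistically while still emitting individually impossible candles.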

11:00
Graph-based Multi-Agents for Text-to-SQL
PRESENTER: Quoc-Hung Pham

ABSTRACT. As a structured text generation task, Text-to-SQL translates natural language queries (NLQs) into executable SQL, enabling seamless database access. Despite advances in Large Language Models (LLMs), challenges persist with large relational databases and complex, multi-step queries requiring precise schema reasoning. We present GMA-SQL, a graph-based multi-agent framework that builds a three-layer schema graph and coordinates three LLM agents: a Graph Selector for schema pruning, a Graph CoT Decomposer for query reasoning, and a Reflexive Refiner for iterative validation. On Spider and BIRD benchmarks, GMA-SQL achieves higher execution accuracy than strong baselines, with notable gains on hard and extra-hard queries. Beyond SQL parsing, the framework supports bidirectional augmentation: graph reasoning generates synthetic NLQs from schemas for data enrichment, fostering advancements in natural language generation pipelines.

11:20
URAG 2.0: An Agentic Dual Retrieval Framework for Enhanced Reasoning in RAG-based QA Systems

ABSTRACT. Large Language Models (LLMs) have advanced Question-Answering (QA) systems but still suffer from factual errors and limited reasoning when relying solely on implicit knowledge. Retrieval-Augmented Generation (RAG) mitigates these issues by grounding responses in external corpora, yet existing pipelines often depend on a single retrieval channel, which hampers multi-hop reasoning and underutilizes heterogeneous evidence. Graph-based extensions attempt to capture structural relations but remain costly, noisy, and ultimately constrained to one stream. To address these limitations, we propose URAG 2.0, an agentic dual-retrieval framework that extends our original URAG design. URAG 2.0 constructs two complementary indices: Frequently Asked Questions (FAQs) distilled and paraphrastically enriched from documents, and semantically chunked documents refined with context-aware rewriting. At inference, both indices are queried in parallel, and an orchestration layer fuses and ranks evidence before synthesis. Experiments across multiple QA benchmarks demonstrate that URAG 2.0 consistently outperforms advanced RAG baselines in both factual QA and multi-hop reasoning, establishing dual retrieval as a promising direction for building more accurate and explainable QA systems.
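The orchestration layer's fuse-and-rank step can be approximated with reciprocal rank fusion (RRF), a standard way to merge ranked lists from heterogeneous indices such as the FAQ and chunk channels described above. The abstract does not specify URAG 2.0's exact fusion rule, so the constant k=60 and the two-list setup here are illustrative:

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge ranked evidence lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from the two parallel indices.
faq_hits   = ["faq:refund-policy", "doc:terms", "faq:shipping"]
chunk_hits = ["doc:terms", "doc:returns", "faq:refund-policy"]
fused = reciprocal_rank_fusion([faq_hits, chunk_hits])
print(fused[0])  # → 'doc:terms'
```

Documents surfaced by both channels float to the top (here "doc:terms" beats the FAQ's top hit), which is the behavior a dual-retrieval orchestrator needs before evidence synthesis.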

11:40
Boosting Test Smell Prediction Using Deep Learning

ABSTRACT. Test smells are indicative symptoms of poor design choices in test code, potentially reducing maintainability and compromising test effectiveness. While machine learning-based methods have been proposed to automate test smell detection, their predictive performance is still limited. Deep learning offers a promising solution due to its ability to learn complex context and patterns from data. However, its application to test smell prediction, particularly with sequence data extracted from test code, remains underexplored. To address these motivations, this study presents a deep learning-based approach for test smell prediction using input data in the form of sequences. The proposed method is experimentally evaluated on two popular test smells: Eager Test and Mystery Guest. The performance of all proposed models demonstrated significant improvement over baseline models, with the highest F1-score increase of approximately 24%. A comparative evaluation of three deep learning models, including Convolutional Neural Network, Bidirectional Long Short-Term Memory, and Gated Recurrent Unit, reveals that Bidirectional Long Short-Term Memory achieved the highest F1-score of 0.7475 for Eager Test, while Convolutional Neural Network performed best on Mystery Guest with an F1-score of 0.6529. This work is considered the first effective application of deep learning for predicting test smells on sequence data, highlighting the promise of this approach in the area.

10:40-12:00 Session 3D: SOICT Technical Session IV: Networking and Communication Technologies
Location: Yersin B
10:40
Propagated Presence: A Bluetooth Propagation-Based Method for Automated Classroom Attendance on Mobile Devices

ABSTRACT. In today's fast-moving, technology-driven world, manual routines are steadily giving way to automated workflows. Yet in many universities, attendance systems remain outdated: students check in once at the start, with no location verification, no continuous tracking, and little defense against cheating. These gaps waste class time, invite errors, and make proxy check-ins easy. To tackle these shortcomings, we propose a solution that is fast, accurate, scalable, and cost-effective: an Android application that turns every student's smartphone into a Bluetooth Low Energy (BLE) beacon. Instead of installing dedicated hardware, our approach reuses the radio already built into modern phones. Each device periodically broadcasts a unique identifier linked to the user, along with session-specific data. At the same time, the lecturer's mobile device broadcasts its own beacon. After each round, every phone uploads its scan log to the server, where a propagation algorithm determines the set of students actually in attendance. Using a round-based cycle (scan, upload, repeat) prevents data gaps and continuously confirms that students remain in the room for the entire session. We also embed real-time validity checks to block impersonation attempts. Because no extra hardware is needed, campus-wide deployment is inexpensive, and the algorithm's design minimizes calibration headaches even when phone models differ. Our experiments show that the method is reliable, highly accurate, and ready to scale. In future work, we plan to adapt the system to other settings, such as workplaces, and to develop a companion iOS app.
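The server-side propagation step can be read as graph reachability: a student counts as present in a round if their beacon connects, directly or through other verified phones, to the lecturer's beacon. The scan-log format below is a guess at the idea for illustration, not the paper's actual data model:

```python
from collections import deque

def present_students(scan_logs: dict, lecturer: str) -> set:
    """scan_logs maps each device ID to the set of beacon IDs it
    heard this round. A device is 'present' if it is reachable from
    the lecturer's beacon through the sighting graph."""
    # Build an undirected sighting graph: an edge exists when
    # either device heard the other.
    devices = set(scan_logs) | {lecturer}
    neighbors = {d: set() for d in devices}
    for dev, heard in scan_logs.items():
        for other in heard:
            if other in devices:
                neighbors[dev].add(other)
                neighbors[other].add(dev)
    # BFS outward from the lecturer's device.
    seen, queue = {lecturer}, deque([lecturer])
    while queue:
        for nxt in neighbors[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen - {lecturer}

logs = {
    "alice": {"lecturer"},   # heard the lecturer directly
    "bob":   {"alice"},      # only heard Alice (back of the room)
    "carol": set(),          # heard nobody → marked absent
}
print(sorted(present_students(logs, "lecturer")))  # → ['alice', 'bob']
```

Propagating presence through intermediate phones is what lets the scheme cover large rooms where not every student's beacon reaches the lecturer's device directly; the real system would additionally apply the validity checks mentioned in the abstract before trusting an edge.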

11:00
An Evaluation on Defragmentation with CDC ROADMs in Elastic Optical Networks

ABSTRACT. The anticipated widespread adoption of Colorless, Directionless, and Contentionless (CDC) ROADMs in Elastic Optical Networks (EONs) is driven by their ability to eliminate spectrum contention. However, a significant gap exists in the literature regarding a quantifiable evaluation of their benefits, particularly in the context of network defragmentation. To address this, we present a new exact solution and two efficient heuristic algorithms for the make-before-break defragmentation problem in EONs with CDC ROADMs. Our exact algorithm marks a significant advance over existing solutions, and our heuristics enable us to perform a large-scale evaluation on a realistic 24-node, 86-link network. The results from our experiments show that CDC ROADMs provide a 2% improvement in the blocking rate, offering crucial data for their cost-benefit analysis and strategic deployment.

11:20
Fusing Gated Spatial-Channel Units and Fractal Cross-Scale Attention for Lightweight Waveform Classification

ABSTRACT. Accurately classifying radar-communication signals remains a fundamental yet challenging task, as these waveforms often exhibit high variability and are easily distorted by noise, channel fading, and spectrum overlap. Conventional deep learning approaches, despite their remarkable progress, tend to capture either local spatial cues or long-range dependencies in isolation, thereby limiting their ability to achieve robust recognition in complex environments. To address these challenges, we propose SynerNet, a lightweight deep neural architecture that effectively integrates spatial diversity and cross-scale attention mechanisms. Specifically, SynerNet leverages the smoothed pseudo Wigner-Ville distribution to generate informative time-frequency representations, which retain high resolution while mitigating undesired cross-term interference. Building upon these representations, the network is enhanced by two key components: the Gated Spatial-Channel Unit module, which jointly models spatial and channel dependencies to selectively emphasize salient features while suppressing noise, and the Fractal Cross-Scale Attention module, which employs a hierarchical fractal-inspired attention scheme to preserve fine-grained details across multiple scales while ensuring global consistency. Simulation results on 12 waveform classes encompassing both radar and communication signals demonstrate that SynerNet achieves an average classification accuracy of 90.61%, with only 47K parameters and an inference latency of 0.552 ms, outperforming existing deep learning approaches. These results highlight the strong potential of SynerNet for real-world deployment in intelligent sensing and wireless communication systems under resource-constrained environments.

11:40
A mobile-based attendance system using Bluetooth MAC address scanning

ABSTRACT. The development of automated attendance systems is a mature area of research, with a wide body of literature exploring various technologies to overcome the limitations of manual methods. Traditional paper-based roll calls or sign-in sheets are widely considered inefficient, consuming valuable instructional time, and are highly susceptible to inaccuracies and academic dishonesty, such as proxy attendance. In response, various technological paradigms have been proposed and implemented, each presenting a unique set of trade-offs between security, cost, usability, and precision. This paper presents the design, implementation, and evaluation of a mobile-based attendance management system for academic institutions. The system leverages the classic Bluetooth device discovery process, capitalizing on the observation that modern smartphones broadcast their static, non-randomized MAC address when placed in a user-initiated discoverable mode. It combines a robust Android application with a cloud backend to provide an automated, contactless attendance verification mechanism that is both efficient and scalable.

10:40-12:00 Session 3E: Poster Exhibition
Feature Optimization for Improving Locust Detection

ABSTRACT. Locusts are a major contributor to economic losses in global agriculture. Early detection of these insects enables growers to implement effective control strategies, thereby enhancing crop yield and quality. However, current object detection models have achieved suboptimal results in locust detection, primarily because locusts are often small, occluded, and camouflaged. In this work, we propose an effective approach to address this challenge, dubbed FO-YOLO. A Feature Optimization module is developed and integrated into the head of YOLOv10 to improve feature extraction for detection. The proposed module concurrently receives features from multiple levels and establishes a shorter pathway between low-level and high-level features through feature fusion. In addition, we add an extra prediction layer to enhance the detection performance for small locusts. Experimental results verify that our FO-YOLO achieves state-of-the-art performance, surpassing competitive models.

HMCT: A Hybrid Multi-Scale CNN-Transformer Encoder for Fault Diagnosis in WSNs

ABSTRACT. Fault diagnosis in wireless sensor networks (WSNs) is a critical task to ensure the reliability and precision of the collected data. Sensor nodes are often deployed in harsh and unattended environments, making them prone to faults such as bias shifts, drifts, spikes, erratic fluctuations, and stuck failures. If not detected, these faults can reduce network performance and cause wrong decisions. This paper proposes a hybrid deep learning framework that integrates multiscale convolutional neural networks (CNNs) with a Transformer-based attention mechanism to extract temporal and spatial features from collected sensor data, maximizing fault-diagnosis capability. To validate our approach, we conduct various experiments on a realistic dataset that includes temperature, humidity, and surface pressure measurements. The experimental results show that our proposed framework achieves strong performance in fault classification, significantly outperforming conventional machine learning baselines. Compared to the strongest baseline, the proposed approach achieves an accuracy improvement of more than X%.

SafeGen: Embedding Ethical Safeguards in Text-to-Image Generation

ABSTRACT. Generative Artificial Intelligence (AI) has created unprecedented opportunities for creative expression, education, and research. Text-to-image systems such as DALL·E, Stable Diffusion, and Midjourney can now convert ideas into visuals within seconds, but they also present a dual-use dilemma, raising critical ethical concerns: amplifying societal biases, producing high-fidelity disinformation, and violating intellectual property. This paper introduces SafeGen, a framework that embeds ethical safeguards directly into the text-to-image generation pipeline, grounding its design in established principles for Trustworthy AI. SafeGen integrates two complementary components: BGE-M3, a fine-tuned text classifier that filters harmful or misleading prompts, and Hyper-SD, an optimized diffusion model that produces high-fidelity, semantically aligned images. Built on a curated multilingual (English-Vietnamese) dataset and a fairness-aware training process, SafeGen demonstrates that creative freedom and ethical responsibility can be reconciled within a single workflow. Quantitative evaluations confirm its effectiveness, with Hyper-SD achieving IS = 3.52, FID = 22.08, and SSIM = 0.79, while BGE-M3 reaches an F1-Score of 0.81. An ablation study further validates the importance of domain-specific fine-tuning for both modules. Case studies illustrate SafeGen’s practical impact in blocking unsafe prompts, generating inclusive teaching materials, and reinforcing academic integrity.

Evaluating Syllabus via Sub-Criteria: A Comparative Study of LLM and Experts

ABSTRACT. Syllabus design and evaluation are critical tasks for high school teachers but remain time-consuming, resource-intensive, and difficult to scale in Vietnam’s education reforms. Although the Ministry of Education and Training (MOET) provides official criteria, these standards are often vague, leading to inconsistent application. Large language models (LLMs) offer efficiency and scalability for educational tasks, yet their role in aligning with national evaluation criteria remains underexplored. In this study, we investigate the use of LLMs to automatically decompose MOET’s criteria into fine-grained sub-criteria and compare them with human-expert decompositions. We then evaluate the same syllabus against both sets of sub-criteria to assess consistency, reliability, and scalability. Our findings reveal that LLMs can effectively transform broad standards into actionable components, reducing workload and enabling replication, while also presenting limitations in accuracy and contextual alignment. This study contributes to the growing discourse on AI in education by highlighting how LLMs can complement human expertise in curriculum evaluation within the Vietnamese context.

ViTrustKOL: A Vietnamese Dataset for Consumer Trust Classification toward Key Opinion Leaders

ABSTRACT. We present a novel annotated Vietnamese dataset for the study of consumer trust toward Key Opinion Leaders (KOLs) on the TikTok platform. The corpus comprises 16,000 user comments manually labeled into three trust categories—Positive, Neutral, and Negative—reflecting authentic consumer–KOL interactions and a wide spectrum of linguistic expressions of trust and distrust. To demonstrate the dataset’s utility, we benchmark two modeling paradigms: a Vietnamese-specific pre-trained encoder (PhoBERT) and a large language model (LLM)-based pipeline. Experimental results indicate that the LLM-based approach substantially outperforms PhoBERT, achieving 69.2% accuracy and a 68.5% macro F1-score versus PhoBERT’s 60.7% accuracy and 59.3% macro F1-score. The primary contributions of this work are threefold: (1) the introduction of a large, manually curated Vietnamese dataset tailored for trust classification in social media, (2) a systematic benchmarking of both language-specific and LLM-based methods on this resource, and (3) empirical evidence that LLM-based pipelines can provide notable performance gains for trust analysis in Vietnamese. This dataset and the accompanying benchmarks establish a foundation for future research on context-aware trust modeling and practical applications in Vietnamese natural language processing.

Efficient Caching for Conditional Flow Matching in Vietnamese Zero-Shot TTS

ABSTRACT. Zero-shot text-to-speech (ZS-TTS) has advanced rapidly in high-resource languages, but Vietnamese remains challenging due to its complex phonology and the mismatch between orthography and pronunciation. We investigate Conditional Flow Matching (CFM) for Vietnamese ZS-TTS and find that naive multilingual fine-tuning fails to close the quality gap. To address this, we propose two components: a phoneme-based input representation that better aligns linguistic and acoustic units for Vietnamese, and a cache-based sampler that reuses intermediate computations to reduce inference time without retraining. Implemented on F5-TTS, our system achieves strong perceptual quality and speaker similarity on Vietnamese (MOS 4.42, SIM-o 0.8093) with competitive intelligibility, and generalizes well to cross-lingual synthesis (MOS 3.84, WER 2.94%). Ablation results reveal a clear balance between quality and efficiency: moderate caching retains most perceptual quality while significantly improving synthesis speed. These findings demonstrate that phoneme-level modeling and caching together offer a simple and effective path toward high-quality, efficient CFM-based Vietnamese ZS-TTS.
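The payoff of a cache-based sampler can be illustrated with a toy Euler ODE integrator that memoizes velocity-field evaluations: when the same conditioning is resynthesized (e.g. when re-ranking candidate outputs), cached steps are reused instead of recomputed. The velocity field, keying scheme, and step count below are stand-ins for the CFM model, not the authors' implementation:

```python
class CachedEulerSampler:
    """Toy Euler ODE sampler with a per-(condition, step) cache of
    velocity evaluations, mimicking a cache-based CFM sampler."""

    def __init__(self, velocity_fn):
        self.velocity_fn = velocity_fn
        self.cache = {}
        self.evals = 0          # counts real (non-cached) evaluations

    def sample(self, x0: float, condition: str, steps: int = 8) -> float:
        # With identical x0 and condition, the trajectory is identical,
        # so cached velocities from a previous run remain valid.
        x, dt = x0, 1.0 / steps
        for i in range(steps):
            key = (condition, i)
            if key not in self.cache:
                self.cache[key] = self.velocity_fn(x, i * dt, condition)
                self.evals += 1
            x += dt * self.cache[key]
        return x

# Hypothetical velocity field standing in for the CFM network.
sampler = CachedEulerSampler(lambda x, t, c: -x + len(c) * t)
first  = sampler.sample(1.0, "xin chào")   # cold run: 8 model evaluations
second = sampler.sample(1.0, "xin chào")   # warm run: 0 new evaluations
print(sampler.evals, first == second)       # → 8 True
```

The quality-efficiency trade-off the ablation describes corresponds to how aggressively such a cache is reused across inputs that are similar but not identical.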

A Robust Multi-Modal Framework for Explicit Content Detection in Digital Forensics via Adversarial-Resilient Ensemble Learning and Homomorphic Encryption

ABSTRACT. The rapid expansion of digital media, driven by generative AI developments as of 2025, has posed formidable challenges in digital forensics, especially for identifying explicit materials such as child sexual abuse material (CSAM). Conventional detection systems, often based on skin-tone heuristics or basic convolutional neural networks (CNNs), remain susceptible to adversarial distortions and synthetic deepfakes, resulting in elevated false-positive rates in practical settings. In this study, we propose an integrated multi-modal architecture that combines CNNs with adversarial hardening, feature extraction in alternative color spaces (YCbCr, HSV), and ensemble methods (SVM and RNN) to deliver enhanced durability and precision. To mitigate privacy risks associated with sensitive CSAM handling, we incorporate the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption scheme, facilitating operations on ciphertexts without revealing plaintext, thereby upholding investigator confidentiality and forensic integrity. Tested on benchmarks including NPDI, UTKFace, and bespoke adversarial datasets, our approach yields 98.5% accuracy, reducing false positives by 15-20% relative to references like NudeNet and DeepPornDetection, while sustaining efficacy in encrypted computations with negligible performance penalties. Key innovations encompass a specialized adversarial perturbation generator for forensic contexts, CKKS-enabled secure ensemble inference, and compatibility with platforms such as Autopsy. This research fills notable voids in the 2025 scholarly landscape, where emphases on deepfake resilience and privacy-preserving machine learning (PPML) prevail, yet the synergy of explicit-content-targeted adversarial defenses with homomorphic encryption is largely uncharted, presenting a methodologically fresh strategy to bolster forensic inquiries while prioritizing data security.

A multimodal framework for Vietnamese Sign Language Recognition

ABSTRACT. In this paper, we propose a multimodal framework that integrates multiple input modalities, namely RGB video, optical flow, and keypoint information, after each modality is processed through deep learning architectures. This design is based on the idea that the various input sources provide complementary features of sign language: RGB frames offer appearance cues, optical flow encodes motion dynamics, and keypoints highlight skeletal structures. By performing a late fusion, our method leverages the strengths of each modality. We evaluated the proposed framework on the ViSL120 dataset of isolated Vietnamese Sign Language (ViSL) and performed systematic comparisons with single-modal baselines and other fusion strategies. The results demonstrate that our multimodal approach significantly improves recognition accuracy over the baseline models.
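As an illustration of the late-fusion idea described above, class probabilities from the three modality branches can be averaged before taking the arg-max (a minimal sketch with uniform weights; the paper's actual fusion scheme and weights are not specified here):

```python
import numpy as np

# Illustrative late fusion: each modality branch outputs class probabilities
# for the same sign clip; the fused score is a weighted average.
def late_fusion(prob_rgb, prob_flow, prob_kpt, weights=(1/3, 1/3, 1/3)):
    probs = np.stack([prob_rgb, prob_flow, prob_kpt])   # (3, num_classes)
    w = np.asarray(weights)[:, None]
    fused = (w * probs).sum(axis=0)                     # (num_classes,)
    return int(np.argmax(fused)), fused

# Toy 3-class example: modalities disagree, the fused vote decides.
p_rgb  = np.array([0.6, 0.3, 0.1])
p_flow = np.array([0.2, 0.5, 0.3])
p_kpt  = np.array([0.3, 0.4, 0.3])
pred, fused = late_fusion(p_rgb, p_flow, p_kpt)
```

With uniform weights this is simple probability averaging; learned or validation-tuned weights are a common alternative.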

Addressing Data Scarcity and Imbalance in Depression Screening with Persona-Driven Synthetic Data

ABSTRACT. Data scarcity and privacy concerns are significant barriers to developing machine learning models for depression screening. This research introduces and validates a novel framework for generating high-quality synthetic data to address this challenge. The core of the methodology is a three-stage pipeline where a Large Language Model (LLM), guided by clinical scales (PHQ-8) and diverse user personas, generates realistic narrative synopses from clinical interviews. Empirical validation demonstrates the framework's efficacy: a model trained exclusively on our synthetic data achieved a high F1-score of 0.84 on the DAIC test set, indicating a more stable and reliable training process compared to prior methods while approaching the performance of models trained on real data. Further analysis, including embedding space visualization, confirms that our synthetic data closely mirrors the semantic features and label distribution of real-world data, thereby mitigating the risks of label imbalance and enhancing model generalization. This study contributes (1) a validated framework for synthetic data generation and (2) a high-quality dataset released to the research community. Our work lays the groundwork for developing more robust and equitable AI-driven tools for mental health screening, underscoring that the quality of synthetic data is paramount for building reliable applications.

Fish-Net: an Effective Model for Underwater Fish Detection

ABSTRACT. The automated detection of fish is in growing demand for different applications, such as aquaculture monitoring and oceanographic research. Nevertheless, the performance of existing object detectors is often limited by changing illumination and low-light conditions underwater. In this study, we propose an effective model to improve the accuracy of fish detection, dubbed Fish-Net. Our approach, a feature-based learning method, attaches a proposed feature improvement (FI) module before the backbone of YOLOv10. The FI module is responsible for enhancing low-light images and generating sharp features of fish to feed the baseline detector. Comprehensive experimental results on publicly available datasets validate that our approach outperforms existing object detection models.

VRAE: Vertical Residual Autoencoder for License Plate Denoising and Deblurring

ABSTRACT. In real-world traffic surveillance, vehicle images captured under adverse weather, poor lighting, or high-speed motion often suffer from severe noise and blur. Such degradations significantly reduce the accuracy of license plate recognition systems, especially when the plate occupies only a small region within the full vehicle image. Restoring these degraded images in real time is thus a crucial pre-processing step to enhance recognition performance. In this work, we propose a Vertical Residual Autoencoder (VRAE) architecture designed for the image enhancement task in traffic surveillance. The method incorporates an enhancement strategy that employs an auxiliary block, which injects input-aware features at each encoding stage to guide the representation learning process, enabling better preservation of general information throughout the network compared to conventional autoencoders. Experiments on a vehicle image dataset with visible license plates demonstrate that our method consistently outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches. Compared with AE at the same depth, it improves PSNR by about 20%, reduces NMSE by around 50%, and enhances SSIM by 1%, while requiring only a marginal increase of roughly 1% in parameters.
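The reported metrics follow their standard definitions; for reference, PSNR and NMSE can be computed as below (SSIM omitted for brevity; a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means less distortion.
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def nmse(ref, est):
    # Normalized mean squared error; lower means a closer reconstruction.
    ref = ref.astype(np.float64)
    est = est.astype(np.float64)
    return float(np.sum((ref - est) ** 2) / np.sum(ref ** 2))

clean = np.full((4, 4), 100.0)
noisy = clean + 10.0          # uniform error of 10 gray levels
p = psnr(clean, noisy)        # 10 * log10(255^2 / 100), about 28.1 dB
e = nmse(clean, noisy)        # 1600 / 160000 = 0.01
```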

A Hybrid Quantum-Classical Machine Learning Framework for Robust Sepsis Detection Utilizing Immune Gene Signatures

ABSTRACT. Sepsis remains a critical medical condition demanding rapid and accurate diagnosis to improve patient outcomes. While gene expression data offers a promising avenue for precise diagnostics, its high dimensionality and inherent platform-specific bias pose significant challenges for conventional machine learning models. This paper introduces a novel hybrid quantum-classical machine learning framework for robust, cross-platform sepsis detection. Our methodology leverages a carefully curated set of public gene expression datasets (Affymetrix U133, AgilentV2, AffyU219) partitioned into distinct training and testing cohorts to ensure generalizability. We employ a rigorous feature engineering pipeline focused on immune-related genes, involving differential expression analysis and Random Forest-based selection, to identify a minimal, high-impact gene signature. The core of our approach is the development of a Quantum Support Vector Machine (QSVM) model, where gene expression data is encoded into a quantum feature space using a parameterized quantum circuit, enabling the calculation of a complex, high-dimensional kernel. We benchmark our developed QSVM-based method against a state-of-the-art XGBoost model and a Classical SVM. Results demonstrate that our proposed model consistently outperforms these classical counterparts across multiple independent test sets, achieving superior accuracy (up to 99.42%), sensitivity (up to 99.79%), and F1-score (up to 99.69%).

ViFin-MARS: A Question-Answering System for Financial News Dataset integrating User Intent Identification and Multi-Agent RAG Systems

ABSTRACT. Large Language Models (LLMs) have significantly advanced Natural Language Processing, demonstrating remarkable capabilities in general tasks such as text generation, summarization, and question answering. However, when answering questions in the rapidly changing financial news domain, they are prone to producing hallucinated responses. Additionally, retraining LLMs on new data raises practical concerns such as computational cost and data quality. This paper introduces ViFin-MARS, a question-answering system for financial news. Our approach uniquely integrates User Intent Identification with a Multi-Agent Retrieval Augmented Generation (RAG) framework. The system first classifies user queries into specific financial intents. This classification then directs the query to a specialized agent within the multi-agent system, which is optimized to retrieve relevant context from our financial news dataset and generate an answer. Experimental results on our evaluation dataset show that ViFin-MARS achieves 76% accuracy, demonstrating the effectiveness of this integrated architecture in enhancing the reliability and precision of financial question-answering.
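The intent-to-agent routing described above can be sketched as a simple dispatch table (purely illustrative; the intent labels and agent names here are our placeholders, not those of ViFin-MARS):

```python
# Hypothetical intents and agents for illustration only; real agents would
# retrieve context from the news corpus before generating an answer.
def stock_agent(q):   return f"[stock] {q}"
def macro_agent(q):   return f"[macro] {q}"
def default_agent(q): return f"[general] {q}"

AGENTS = {"stock_price": stock_agent, "macro_news": macro_agent}

def route(query, intent):
    # The classified intent selects a specialized agent; unknown intents
    # fall back to a general-purpose agent.
    return AGENTS.get(intent, default_agent)(query)
```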

Polynomial-Augmented Instant Neural Graphics Primitives

ABSTRACT. Neural Radiance Fields (NeRFs) provide high-quality view synthesis and 3D reconstruction. However, training is still expensive. Instant Neural Graphics Primitives (Instant-NGP) reduce this cost by combining hash encoding with shallow MLPs. This design enables real-time training but keeps the dense layers linear, which limits expressiveness and slows convergence. We propose MPLP (Multi-degree Polynomial Layer Perceptrons), a lightweight extension of Instant-NGP. Each layer augments its input with second-degree polynomial terms before projection and normalization. This expansion improves non-linear modeling without increasing depth and with only a minor parameter increase. On the NeRF-Synthetic Lego dataset, MPLP-NeRF improves PSNR from 24.34 dB to 24.93 dB (+0.59 dB). Reconstruction loss drops by 12%. The model also converges 40% faster. Training overhead is modest: only +15% runtime and negligible memory use. These results show that MPLP is a practical and scalable upgrade to Instant-NGP. It balances speed and quality, making polynomial-enhanced NeRFs a strong candidate for real-time view synthesis and 3D reconstruction.
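The layer-wise augmentation is easy to sketch: each layer concatenates element-wise squares to its input before the linear projection and normalization (an illustrative NumPy sketch under our own assumptions about the normalization; the paper's exact layer may differ):

```python
import numpy as np

def mplp_layer(x, W, b):
    # Augment the input with element-wise second-degree terms, then apply a
    # linear projection and a simple normalization (the normalization form
    # here is our assumption for the sketch).
    z = np.concatenate([x, x * x], axis=-1)   # [x, x^2] doubles the width
    h = z @ W + b                             # linear projection
    return (h - h.mean()) / (h.std() + 1e-6)  # normalize activations

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((16, 4))              # 2*8 inputs -> 4 outputs
b = np.zeros(4)
out = mplp_layer(x, W, b)
```

The projection weight matrix simply grows with the doubled input width, which is why the parameter overhead stays small relative to the full model.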

Improve the Effectiveness of Predicting Student Learning Outcomes using a MoE Network with LSTM Routing

ABSTRACT. In online or blended courses, learners must self-manage their learning progress. Predicting learning outcomes is therefore essential to provide timely warnings to learners and help lecturers adjust teaching plans to enhance training quality. Previous studies have shown that LSTM models using time-series data are effective. Still, they often rely on multi-layer deep networks, which increase computational costs and make it difficult to model the behaviors of different learner groups. This study proposes a hybrid expert network architecture to predict whether learners will pass or fail. Each expert comprises an LSTM layer combined with an Attention mechanism, while a routing mechanism, implemented as an LSTM network, selects the appropriate expert based on time-series data. The number of experts is determined by clustering learners using the Fuzzy C-Means algorithm. Experiments on the OULAD dataset demonstrate that the proposed architecture outperforms traditional stacked multi-layer LSTM models.

Contrastive Preference Optimization for Low-Resource Vietnamese to Khmer Neural Machine Translation

ABSTRACT. With the rapid development of multilingual large language models (LLMs), machine translation has made significant progress. However, translating low-resource languages remains challenging because LLMs are often trained on limited data for these languages. In addition, the available bilingual datasets for such language pairs are typically small and may contain noise, leading to suboptimal translation quality. In this paper, we address this problem in the context of Vietnamese to Khmer machine translation. We propose a three-stage training pipeline that effectively leverages both monolingual and bilingual data while keeping computational costs low. First, we continually pre-train the LLM on Vietnamese and Khmer monolingual data to improve its language understanding. Next, we apply parameter-efficient fine-tuning using LoRA to align the model with bilingual data. Finally, we train the model using the Contrastive Preference Optimization (CPO) method, which helps it better distinguish between high- and low-quality translations, thereby producing more accurate and natural outputs. Experimental results show that our approach outperforms existing models and few-shot prompting with GPT-OSS-120b across both traditional metrics (BLEU, METEOR) and semantic metrics (COMET, KIWI-COMET).

13:30-15:30 Session 4A: SOICT Technical Session V: Generative AI
Location: Ballroom A
13:30
SEA-LION: Southeast Asian Languages in One Network
13:50
AD-GENESIS: Anomaly Detection through Gradient-Guided Generative Synthesis

ABSTRACT. The generalization capacity of anomaly detection models when confronted with previously unseen anomalies continues to pose a substantial challenge, especially within mission-critical systems. To tackle this fundamental problem, we present AD-GENESIS, an innovative framework that leverages generative artificial intelligence in conjunction with gradient-based optimization for training anomaly detection models. The framework performs direct optimization of latent variables within the generative model to synthesize diverse and novel anomalous samples that can effectively evade existing detection models. These synthetically generated samples are subsequently employed to enhance the detection model, thereby improving its capability to handle complex anomalous patterns. Additionally, AD-GENESIS incorporates a state-of-the-art Mamba-based architecture for the detection model. This choice leverages Mamba's linear-time complexity and ability to capture long-range dependencies, ensuring high accuracy and computational efficiency during inference. Experimental evaluation conducted on the ADBench benchmark reveals that AD-GENESIS achieves superior performance compared to contemporary state-of-the-art models while maintaining minimal inference latency.

14:10
PRADA-QA: Product QA with Multi-Agent Planning and Dynamic Knowledge Retrieval

ABSTRACT. Large Language Model (LLM)-based autonomous agents have demonstrated strong capabilities in decision-making and handling complex tasks. However, there remains a notable gap in public research on leveraging multi-agent systems for Product Question Answering (PQA), a crucial area in modern e-commerce. In this work, we introduce PRADA-QA, a framework that enhances the user experience through multi-agent collaboration, enabling dynamic information retrieval from diverse sources to respond to user queries accurately. In addition, we propose a planning module that adaptively guides the agents' objectives, improving task fulfillment efficiency while minimizing redundant steps and operational costs. For evaluation, we employ a reward model, an indispensable component in reinforcement learning-based LLM post-training, as a proxy for human preferences. This approach was designed to capture user-centric quality and may also be generalizable to other open-ended QA scenarios. Leveraging a reward model-based evaluation strategy, we conduct extensive experiments across three distinct domains to assess the effectiveness of PRADA-QA. The experimental results demonstrate its superiority compared to traditional approaches, highlighting its enhanced ability to generate accurate and contextually appropriate responses for the product question-answering task.

14:30
Enhancing RAFT with Knowledge Graphs for Question Answering on Vietnamese Legal Texts

ABSTRACT. Vietnamese Legal Question Answering (Legal QA) is an emerging field with the potential to enhance access to legal information, yet it faces challenges such as limited datasets, insufficient reasoning capabilities, and a lack of unified benchmarks. This study addresses these gaps by constructing an expert-verified dataset in two key domains, Labor Law and Enterprise Law, and by advancing Retrieval-Augmented Generation (RAG) methods for legal QA. We implement multiple approaches, including Naive RAG, RAG with query expansion, RAFT with a fine-tuned reranker, and RAFT with knowledge graph integration (RAFT-KG). Experimental evaluations compare open-source models (LLaMA-based) with GPT-4.1 using automatic LLM-based metrics (RAGAs) across dimensions of context recall, faithfulness, answer relevancy, and factual correctness. Results show that while GPT-4.1 achieved the highest factual correctness, RAFT-KG and query expansion methods improved context recall and faithfulness, and the RAFT reranker provided balanced, domain-adapted performance. These findings demonstrate that integrating knowledge graphs and domain-tuned rerankers enables open-source models to approach proprietary LLM performance, paving the way for transparent, robust, and scalable legal QA systems tailored to the Vietnamese context.

14:50
Segmentation-Free Handwriting Recognition from Historical Handwritten Documents Using Large Vision-Language Models

ABSTRACT. Handwriting recognition and information extraction are valuable tasks for preserving cultural heritage. Traditional deep learning approaches depend on two principal success factors, which are often difficult to ensure. First, an explicit segmentation of scanned pages into text lines to facilitate handwriting recognition. Second, a large amount of training samples for fine-tuning to specific manuscripts. In contrast, pre-trained Large Vision-Language Models (LVLMs) do not depend on these factors, as they are able to extract handwritten information from entire pages by means of a prompt. In this paper, we contribute an experimental benchmark that aims to assess to what degree explicit segmentation and/or fine-tuning are still necessary for LVLMs. Our findings on the Washington, IAM, and CM1 datasets indicate that both traditional success factors are becoming less important for the best performing LVLMs and optimized prompts.

15:10
GenAI-Enabled Backlog Grooming in Agile Software Projects: An Empirical Study

ABSTRACT. Effective backlog management is critical for ensuring that development teams remain aligned with evolving requirements and stakeholder expectations. However, as product backlogs consistently grow in scale and complexity, they tend to become cluttered with redundant, outdated, or poorly defined tasks, complicating prioritization and decision-making processes. This study investigates whether a generative AI (GenAI) assistant can automate backlog grooming in Agile software projects without sacrificing accuracy or transparency. Through Design Science cycles, we developed a Jira plug-in that embeds backlog issues into a vector database, detects duplicates via cosine similarity, and leverages the GPT-4o model to propose merges, deletions, or new issues. We found that AI-assisted backlog grooming achieved 100% precision while reducing the time-to-completion by 45%. The findings demonstrated the tool's potential to streamline backlog refinement processes while improving user experiences.
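The duplicate-detection step reduces to pairwise cosine similarity over issue embeddings; a minimal sketch (the 0.9 threshold is our placeholder, not the study's setting):

```python
import numpy as np

def find_duplicates(embeddings, threshold=0.9):
    # Flag issue pairs whose embedding cosine similarity exceeds threshold.
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    sims = E @ E.T                                    # pairwise cosine matrix
    return [(i, j)
            for i in range(len(E))
            for j in range(i + 1, len(E))
            if sims[i, j] >= threshold]

# Toy embeddings: the first two issues are near-duplicates.
pairs = find_duplicates([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
```

In practice the embeddings would come from the vector database's embedding model, and flagged pairs would be passed to the LLM for a merge/delete proposal.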

13:30-15:30 Session 4B: SOICT Technical Session VI: AI Applications
Location: Ballroom B
13:30
Optimization Approaches for Language Models in the Task of Translating Sino-Vietnamese Texts into Modern Vietnamese

ABSTRACT. This study proposes a method for developing compact language models in the task of translating Sino-Vietnamese texts into Modern Vietnamese. In the field of language modeling, developing models from scratch and deploying them is costly due to the large amount of data and computational resources required. To address this challenge, we introduce the KD-QLoRA method, which combines Knowledge Distillation (KD) with Quantized Low-Rank Adapter (QLoRA). This approach inherits and extends existing techniques by integrating multiple optimization methods to improve the development and deployment of language models. Experiments conducted on open-source pretrained models such as LLaMA 3, Qwen 3, and Phi 3 demonstrate that KD-QLoRA enables the development of smaller language models that outperform QLoRA. Moreover, Qwen 3 1.7B, when fine-tuned with KD-QLoRA, achieves performance comparable to or better than larger models in the task of translating Sino-Vietnamese texts into Modern Vietnamese.

13:50
Motion-Gated Adaptive Filtering for Continuous Sign Language Recognition

ABSTRACT. Continuous Sign Language Recognition (CSLR) is challenged by the complex spatio-temporal dynamics inherent in sign language videos. Existing methods often rely on uniform processing strategies, computationally expensive external cues like optical flow, or struggle with undertrained feature extractors. To address these limitations, we propose Motion-Gated Adaptive Spatio-Temporal Filtering (MG-ASTF), a novel plug-and-play module for deep CSLR networks. Crucially, MG-ASTF computes motion estimations directly from intermediate feature maps, eliminating the need for external data. It uses these internal motion cues to dynamically gate two parallel, specialized filtering pathways: one prioritizing temporal dynamics for high-motion segments and another emphasizing spatial detail for static or low-motion regions. We integrate MG-ASTF into a standard ResNet-based architecture and demonstrate its efficacy on the PHOENIX14 and PHOENIX14-T benchmarks. Our RGB-only approach achieves highly competitive results, notably matching the performance of complex multi-modal systems, thereby showcasing a more efficient path to robust feature learning in CSLR.

14:10
Fine-Tuning Large Language Models for Automated English Speaking Proficiency Assessment Using Multimodal Linguistic and Prosodic Features

ABSTRACT. Automated Spoken Language Assessment (ASLA) presents a scalable solution for evaluating English as a Second Language (ESL) learners, yet requires robust and accurate systems. This paper proposes a novel approach by fine-tuning several Large Language Model (LLM) variants from the Qwen family on a rich, multimodal dataset incorporating both prosodic and linguistic features. Our systematic evaluation demonstrates that a fine-tuned Qwen 2.5 7b model achieves the best overall performance in predicting CEFR proficiency levels, outperforming even larger, newer models. The results confirm that fine-tuning is essential for predictive accuracy and that both prosodic and linguistic features provide complementary information that enhances performance. Furthermore, we show that post-processing with isotonic regression substantially improves score calibration. However, a detailed per-score analysis reveals a primary limitation: the models exhibit a systematic bias, overestimating low-proficiency speakers and underestimating high-proficiency speakers, largely due to data imbalance. While performance is strong in the mid-range proficiency levels, this study highlights the critical challenge of ensuring accuracy at the extremes. Overall, this work charts a practical path for integrating advanced LLMs into ASLA systems and provides a clear roadmap for future research to address model bias and enhance discriminative power across the full proficiency spectrum.
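Isotonic regression, used above for score calibration, fits a nondecreasing mapping from raw model scores to targets; the classic pool-adjacent-violators algorithm behind it can be sketched as follows (equal-weight case; a generic illustration, not the authors' calibration code):

```python
import numpy as np

def pav_isotonic(y):
    # Pool-adjacent-violators: least-squares nondecreasing fit to y
    # (equal weights). Violating neighbors are merged into blocks whose
    # fitted value is the block mean.
    vals, wts = [], []
    for v in map(float, y):
        vals.append(v)
        wts.append(1.0)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            m = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals[-2:] = [m]
            wts[-2:] = [w]
    out = []
    for v, w in zip(vals, wts):
        out.extend([v] * int(w))
    return np.array(out)

calibrated = pav_isotonic([1.0, 3.0, 2.0, 4.0])   # -> [1.0, 2.5, 2.5, 4.0]
```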

14:30
DRONEs: Deep Reinforcement Optimization for Network k-Connectivity Restoration Enhancement in UAVs

ABSTRACT. Maintaining k-connectivity is essential for resilient multi-hop UAV communication, yet mixed-integer and heuristic approaches scale poorly with swarm size and operating area. We present a deep reinforcement learning framework with a permutation-invariant, cross-attention encoder that captures inter-agent interactions and generalizes zero-shot from small training swarms (5–50 UAVs) to much larger fleets (e.g., 100) without retraining. The policy is trained under centralized training with decentralized execution (CTDE), enabling fast, feed-forward decisions at test time. We further apply a Hungarian post-assignment refinement to map goals to vehicles, minimizing total displacement without altering the learned topology. Across benchmarks, our method matches or surpasses strong heuristics on small instances and, at larger scales, achieves real-time performance with markedly lower latency and equal or better solution quality. These results highlight a practical path to scalable k-connectivity restoration in UAV swarms.
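The Hungarian post-assignment step solves a minimum-cost bipartite matching between current UAV positions and restored-topology goals. For small fleets the same objective can be checked by brute force (an illustrative stand-in for the polynomial-time Hungarian algorithm, e.g. SciPy's linear_sum_assignment):

```python
from itertools import permutations
import math

def min_total_displacement(uav_pos, goal_pos):
    # Exhaustively search goal-to-vehicle assignments and return the one
    # minimizing total displacement; the Hungarian algorithm computes the
    # same optimum in O(n^3) for larger fleets.
    n = len(uav_pos)
    dist = [[math.dist(u, g) for g in goal_pos] for u in uav_pos]
    best_cost, best_perm = float('inf'), None
    for perm in permutations(range(n)):
        cost = sum(dist[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

# Two UAVs, two goal positions: the crossed assignment is cheaper.
perm, cost = min_total_displacement([(0, 0), (10, 0)], [(9, 0), (1, 0)])
```

Because only the goal-to-vehicle mapping changes, the learned topology itself is untouched, as the abstract notes.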

14:50
XMedCLIP: A Multimodal Deep Neural Network for Bone Pathology Classification from X-ray Image

ABSTRACT. This study proposes a two-stage framework for bone pathology classification under limited data. The pretraining stage aligns a ViT image encoder and a PubMedBERT text encoder in a shared space using bidirectional contrastive learning on paired X-rays and physician diagnoses. The fine-tuning stage then freezes both encoders and trains cross-attention fusion and a classifier on paired X-rays with patient self-reports and initial examination notes. The model employs a cross-attention fusion head to combine image and text features before a linear classifier. This framework simulates the reasoning of clinicians. With a cross-attention fusion head (Ours-CrsAtt), it reaches 72.78% accuracy in the full-shot setting, outperforming strong unimodal and multimodal baselines. In addition, with a cosine classification head (Ours-Cosine), the model achieves 51.85% accuracy in the one-shot setting and 66.30% in few-shot. Overall, the proposed multimodal architecture learns effectively even with few examples and delivers higher accuracy than single-modality baselines. The method offers a practical and scalable solution for medical imaging workflows.

15:10
Automated ESG classification by using Natural Language Processing Techniques from Vietnamese Company Annual Reports

ABSTRACT. As sustainable development gains increasing attention, more and more companies and investors are using environmental, social, and governance (ESG) performance, i.e., non-financial activities, as evaluation indicators. Currently, ESG classification and ratings are performed by numerous institutions, and these assessments are subject to human bias, resulting in varying ESG classifications and ratings for the same company. Although many automated ESG classification and rating models employing natural language processing (NLP) techniques have been developed to address this shortcoming, these models are primarily based on the English language. For Vietnam, a developing country, such models are currently lacking. Therefore, in this work, we first construct a Vietnamese-language ESG dataset, collected from the annual sustainability reports of listed companies in Vietnam. We then employ fine-tuning techniques to fine-tune the bidirectional encoder representations from transformers (BERT) model on this dataset, resulting in an ESG classification model tailored for the Vietnamese market. This model achieves 81.88% accuracy on this dataset. The trained model improves transparency in ESG classification and ratings and reduces human bias, providing Vietnamese companies and investors with a reliable tool for assessing corporate ESG performance.

13:30-15:30 Session 4C: SOICT Technical Session VII: Applied Operations Research and Optimization
Location: Yersin A
13:30
Exponential Cone Reformulation for Scalable Estimation of Quantal Response and Multinomial Logit Models

ABSTRACT. Quantal response (QR) and multinomial logit (MNL) models are fundamental in behavioral and discrete choice modeling, providing probabilistic frameworks that capture bounded rationality in strategic environments and heterogeneous preferences in individual decision-making. Traditional parameter estimation relies on maximum-likelihood methods solved via gradient-based algorithms, which are sensitive to step-size choices and often scale poorly in high-dimensional or large-scale settings. In this work, we revisit the estimation of both QR and MNL models through convex conic optimization. We show that their maximum-likelihood problems admit exact reformulations as exponential cone programs (ECPs), enabling the replacement of log-sum-exp terms in the likelihood with convex conic constraints. This reformulation allows efficient solution by modern conic solvers using interior-point algorithms with polynomial-time complexity guarantees, thereby ensuring robustness and stable convergence. Numerical experiments on synthetic datasets demonstrate that the ECP approach consistently outperforms gradient-based methods in both runtime and solution quality, highlighting exponential cone programming as a practical and scalable alternative for estimating MNL and QR models.
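The key reformulation step is standard: each log-sum-exp term in the likelihood is bounded by an auxiliary variable whose epigraph is expressed with exponential-cone constraints (a sketch of the usual construction, not a formula taken from the paper):

```latex
% Exponential cone: K_exp = cl{ (x, y, z) : y > 0, y e^{x/y} <= z }.
\begin{aligned}
t \ge \log \sum_{j=1}^{J} e^{u_j}
\quad\Longleftrightarrow\quad
& \exists\, s_1, \dots, s_J:\ \sum_{j=1}^{J} s_j \le 1,\\
& (u_j - t,\ 1,\ s_j) \in K_{\exp}, \qquad j = 1, \dots, J.
\end{aligned}
```

Substituting such a variable $t_n$ for each observation's log-sum-exp term turns maximum-likelihood estimation into a conic program that interior-point solvers with exponential-cone support (e.g. MOSEK) handle directly.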

13:50
Reinforcement Learning-Enhanced GRASP for the Multiple Traveling Repairmen Problem with Workload Balance

ABSTRACT. The Multiple Traveling Repairmen Problem with Balanced Workloads (mTRP-WB) is a novel variant of the classical Multiple Traveling Repairmen Problem (mTRP), motivated by real-world applications where fairness in workload distribution is as important as service efficiency. Typical scenarios include urban logistics, preventive maintenance, and healthcare services, where tasks must not only minimize customer waiting times but also be allocated equitably among repairmen to avoid overburdening certain agents. In mTRP-WB, the objective is to minimize a weighted combination of total customer latency and workload imbalance, making the problem computationally challenging as it generalizes both mTRP and load balancing. To address this challenge, we propose a hybrid metaheuristic called GRASP-RL, which combines the Greedy Randomized Adaptive Search Procedure (GRASP) with Reinforcement Learning (RL). In GRASP, the RL-based adaptive strategy dynamically guides the restricted candidate list (RCL) selection, allowing the algorithm to gradually learn which choices lead to a higher-quality initial solution, thereby balancing exploration and exploitation during the construction phase. In the improvement phase, Variable Neighborhood Search (VNS) is employed to exploit promising solution spaces and further enhance solution quality. Extensive computational experiments on benchmark instances demonstrate the impressive efficiency of GRASP-RL across many cases. Compared with state-of-the-art metaheuristics for related problems, GRASP-RL achieves competitive performance despite not being specifically designed for them, highlighting its robustness and scalability.

14:10
The Min-makespan Vehicle Routing Problem with Drones under Multiple Trips and Visits

ABSTRACT. This paper introduces a novel hybrid pickup routing problem that integrates drones and ground vehicles operating in parallel and independently to minimize the system makespan, while ensuring that the time from each customer’s pickup to depot arrival does not exceed a predefined waiting time threshold. The model incorporates practical constraints, including vehicle capacity and drone endurance limits, as well as support for multiple trips with multiple customer visits per trip. We formalize this problem as the Min-makespan Vehicle Routing Problem with Drones under Multiple Trips and Visits, which has applications in epidemic response and reverse logistics. To the best of our knowledge, this represents the first comprehensive study of this particular problem formulation.

We develop a Mixed Integer Linear Programming model to capture the problem structure and propose an adaptive tabu search algorithm that incorporates multiple neighborhood structures and memory mechanisms to enable effective exploration of the solution space. Extensive computational experiments on a newly developed benchmark dataset with up to 1,000 customers demonstrate the algorithm’s efficiency. Furthermore, we show the algorithm’s versatility by adapting it to solve related routing problem variants, with comparative results confirming its competitiveness against state-of-the-art methods.

14:30
Grey Wolf Optimization with Entropy Control for Coverage in DSNs

ABSTRACT. Ensuring reliable and efficient q-coverage in directional sensor networks (DSNs) is a crucial yet challenging task due to heterogeneous coverage requirements, directional limitations, and the NP-hard nature of deployment optimization. To tackle this problem, we propose AACGWO (Adaptive A/C Dynamics Grey Wolf Optimizer), a novel metaheuristic algorithm designed for the target-oriented q-coverage problem. Unlike conventional Grey Wolf Optimizer (GWO) variants that adopt linearly decreasing parameters, AACGWO introduces a cosine-based decay mechanism to ensure smoother phase transitions. In addition, it employs entropy-guided adaptation to dynamically balance exploration and exploitation by monitoring population diversity. The algorithm further incorporates an angle-based encoding strategy to optimize sensor orientations in DSNs. Extensive experiments are carried out under multiple deployment scenarios, covering variations in sensor density, target distribution, sensing range, and spatial configuration. The results consistently demonstrate that AACGWO outperforms recent GWO-based approaches, including DEGWO, IGWO2, ACGWO, and CGGWO, with respect to coverage satisfaction rate, fairness, resource efficiency, and redundancy minimization. These findings confirm the robustness, scalability, and adaptability of AACGWO for real-world DSN applications, particularly in complex and dynamic deployment environments.
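A cosine-based decay of GWO's control parameter a, in contrast to the conventional linear schedule, can take the following illustrative form (our sketch; the paper's exact schedule may differ):

```python
import math

def cosine_decay_a(t, T, a_max=2.0):
    # Half-cosine schedule: a decays smoothly from a_max at t = 0 to 0 at
    # t = T, with near-zero slope at both endpoints, unlike the abrupt
    # conventional linear schedule a(t) = a_max * (1 - t / T).
    return a_max * 0.5 * (1.0 + math.cos(math.pi * t / T))
```

Larger a favors exploration (wolves roam widely); as a shrinks, the pack converges on the best candidates, so a smoother decay yields smoother phase transitions.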

14:50
Modeling and Solving the Bin Packing Problem with Relaxed Capacity Constraints: Applications in Agricultural Land Consolidation in Vietnam

ABSTRACT. This paper introduces the Bin Packing Problem with Relaxed Capacity Constraints (BPRC), a novel variant of the well-known Bin Packing Problem arising from Agricultural Land Consolidation in Vietnam. BPRC aims to assign farming households, each with an expected land area, to agricultural fields of fixed areas such that the total absolute difference between each field's area and the sum of its assigned households' expected areas is minimized. In this paper, we prove BPRC is NP-hard via a reduction from the 2-Partition problem. To solve it, we propose a Mixed Integer Programming model, solved exactly with solvers such as CPLEX and incorporating optimality properties to enhance efficiency, as well as a Tabu Search metaheuristic for large-scale instances. Extensive experiments on a real-world instance from Thai Binh province and 27 synthetic instances demonstrate that our methods outperform Vietnam's government guidelines, eliminating added parcels and significantly reducing area deviations. The exact method solves small instances optimally within 30 minutes, while the metaheuristic yields near-optimal or optimal solutions in under 2 minutes, proving practical for real-world applications.

13:30-15:30 Session 4D: SOICT Technical Session VIII: Multimedia Processing
Location: Yersin B
13:30
DTD-Mamba: Dual Teacher Distillation for Mamba in Head and Neck Abscess Segmentation

ABSTRACT. Accurate delineation of head and neck abscess boundaries on contrast-enhanced CT is essential for diagnosis and treatment planning. However, it remains difficult due to ambiguous lesion margins and this region’s complex anatomy. In this study, we present Dual Teacher Distillation from Mamba and CNN-based models (DTD-Mamba). This efficient segmentation framework trains a compact Mamba student network under guidance from two heterogeneous teachers. A CNN teacher emphasizes local textures and sharp edges, while a Mamba-based teacher captures long-range dependencies and the global context of related anatomy. Our distillation objective jointly transfers knowledge at multiple levels, allowing the student to inherit both fine-grained details and holistic structure without incurring the computational cost of large models. We evaluate DTD-Mamba on our head and neck abscess dataset, characterized by heterogeneous appearance and intricate anatomy. Our proposed model achieves a Dice Similarity Coefficient of 0.44, an Intersection-over-Union of 0.31, and a Normalized Surface Distance of 0.74. Moreover, this architecture substantially reduces computation and memory requirements compared to its predecessors. These results highlight both the intrinsic challenge of precise boundary delineation in this clinical setting and the benefit of combining local and global supervision during training. DTD-Mamba provides a practical approach to deploying resource-efficient segmentation for complex neck infections. Our code will be released upon paper acceptance.

13:50
VietMed-VQA: A Novel Dataset and Benchmark for Vietnamese Medical Visual Question Answering

ABSTRACT. Medical visual question answering systems hold significant promise for assisting healthcare through automated image analysis. However, most existing systems are English-centric, limiting their accessibility in non-English-speaking regions such as Vietnam. To bridge this gap in multilingual medical AI, we introduce VietMed-VQA, a novel Vietnamese dataset for medical visual question answering, derived from translating established English benchmarks including PathVQA, SLAKE, and VQA-RAD. Leveraging Llama-3.1-70B-Instruct with domain-tailored prompts, we ensure translation accuracy via back-translation, embedding-based semantic checks, and lightweight filtering that removes only the bottom 1% of pairs. LoRA fine-tuning on three state-of-the-art vision-language models (Llama-3.2-11B-Vision-Instruct, Qwen2-VL-7B-Instruct, and LLaVA-1.5-7B) achieves strong results, including 82.9% accuracy for Llama-3.2-11B on SLAKE. Our ablation studies reveal that subtle filtering enhances performance without sacrificing data volume. Furthermore, evaluations on unseen datasets like MedVQA and WorldMedQA-V demonstrate robust generalization, with accuracies exceeding 64% on challenging out-of-domain samples. These resources lay the groundwork for AI-driven diagnostics and training in Vietnamese healthcare, paving the way for broader adoption of multilingual medical AI in underserved regions.

14:10
MasHeNe: A Benchmark for Head and Neck CT Mass Segmentation using Window-Enhanced Mamba with Frequency-Domain Integration

ABSTRACT. Head and neck masses are space-occupying lesions that can compress the airway and esophagus and may affect nerves and blood vessels. Available public datasets primarily focus on malignant lesions and often overlook other space-occupying conditions in this region. To address this gap, we introduce MasHeNe, an initial dataset of 3,779 contrast-enhanced CT slices that includes both tumors and cysts with pixel-level annotations. We also establish a benchmark using standard segmentation baselines and report common metrics to enable fair comparison. In addition, we propose the Windowing-Enhanced Mamba with Frequency integration (WEMF) model. WEMF applies tri-window enhancement to enrich the input appearance before feature extraction. It further uses multi-frequency attention to fuse information across skip connections within a U-shaped Mamba backbone. On MasHeNe, WEMF attains the best performance among evaluated methods, with a Dice of 70.45%, IoU of 66.89%, NSD of 72.33%, and HD95 of 5.12 mm. The model delivers stable, strong results on this challenging task. MasHeNe provides a benchmark for head-and-neck mass segmentation beyond malignancy-only datasets. The observed error patterns also suggest that this task remains challenging and requires further research. Our dataset and code will be made publicly available upon acceptance of the paper.

14:30
An Optimization-Driven Fusion Framework of Vision–Language Foundation Models for Large-Scale Video Retrieval

ABSTRACT. Video retrieval from large-scale datasets has become increasingly vital as the demand for efficient access to multimodal information continues to grow. Yet, existing approaches often fall short when confronted with complex queries that require understanding both global semantics and fine-grained temporal or contextual dependencies. To overcome these limitations, we propose a hybrid multimodal retrieval framework that unifies the complementary strengths of CLIP and BEiT-3. CLIP offers robust global alignment between visual and textual representations, while BEiT-3 captures nuanced contextual relationships within frames. The framework further integrates LLM-based per-frame captioning, OCR, and ASR modules to enrich multimodal semantics. A temporally aware multi-channel re-ranking mechanism then fuses these representations to accurately retrieve sequential and multi-stage events. Evaluated on the 2025 Ho Chi Minh City AI Challenge dataset, our system achieves notable gains in retrieval accuracy for complex and context-dependent video queries.

14:50
Text-Driven 3D Interior Scene Generation using 3D Gaussian Splatting

ABSTRACT. Text-driven generation of three-dimensional (3D) indoor environments underpins applications in AI-assisted interior design, e-commerce visualization, and immersive XR. Yet practical systems still struggle to turn natural language into coherent, editable 3D scenes with high visual and geometric fidelity. We present a pipeline focused on robust panorama initialization, geometry-preserving optimization, and efficient view completion. Concretely, we introduce (i) a Best-of-N panorama selection guided by a composite CLIP–Discontinuity score to enforce semantic alignment and global wrap-around consistency, (ii) engineering optimizations to Progressive Novel View Inpainting (PNVI) that improve throughput, and (iii) depth-regularized 3D Gaussian Splatting (3DGS) to stabilize geometry during optimization. We further curate a bilingual (EN/VI) evaluation set of 50 prompts spanning 34 room types to stress diversity and linguistic coverage. Empirically, our components jointly yield more faithful, structurally consistent, and editable indoor scenes from text: Best-of-N selection improves semantic alignment (CLIP: 55.69 to 60.03) and reduces discontinuities (111.48 to 45.24), depth-regularized 3DGS enhances geometric fidelity (PSNR: +1.01 dB, SSIM: +0.0193), and PNVI optimizations achieve a 35% runtime reduction (122 s to 80 s) while maintaining practical efficiency.

15:10
When Events Speak: MLLM-Guided Video Retrieval with Temporal Reranking

ABSTRACT. The rapid expansion of online video content has intensified the need for efficient retrieval of specific moments. However, most existing video retrieval methods fail to capture short-lived events that occur within only a few frames and overlook dynamic cues such as motion, audio, and dialogue. This lack of event awareness prevents the system from understanding how actions evolve, leading to fragmented interpretations of event sequences. To address these challenges, we propose an event-aware video retrieval system that integrates MLLM-based event capturing with a temporal reranking mechanism. The MLLM captures both coarse-grained and fine-grained events, providing semantically rich and temporally aligned descriptions. The temporal reranking module then re-assesses the entire event sequence, dynamically relinking results based on recomputed sequence scores to maintain temporal coherence. This design allows the system to retrieve results that are both semantically meaningful and temporally coherent. Evaluated on the Ho Chi Minh AI Challenge 2025, our system achieved 85 out of 88 correct results with accuracy above 95% across three evaluation stages, demonstrating strong robustness in complex event-sequence retrieval.

13:30-18:00 Session 4E: Poster Exhibition
VisionCare: Compute-Aware Hybrid CNN–Transformer Heads for Multi-Disease Retinal Diagnosis with Explainable AI

ABSTRACT. Population-scale retinal screening demands models that are accurate, interpretable, and computationally efficient. We present VisionCare, a unified and reproducible framework for multi-disease diagnosis from color fundus photographs, trained end-to-end using a single, fixed recipe at a standard input resolution of 224×224. Built on a ConvNeXt-Base backbone, VisionCare explores three progressively more expressive classification heads: (A) a top-down feature pyramid with generalized mean pooling (GeM) and squeeze-and-excitation (SE); (B) a compute-aware reverse-pyramid design that compresses multi-scale features to the deepest stride with dilated refinement; and (C) a dual-path fusion module that incorporates pooled-key/value self-attention with a learned gating mechanism to balance convolutional and non-local representations. VisionCare supports case-level interpretability via Grad-CAM and Grad-CAM++, and includes lightweight integration points for tele-ophthalmology deployment. On a benchmark dataset with 11 diagnostic categories, the attention-gated head (C) achieves a test accuracy of 0.9138, macro-F1 of 0.9289, Cohen's kappa of 0.9014, and a macro-AUROC of ≈0.996 without increasing input resolution or modifying the training procedure. By standardizing the pipeline and holding optimization constant, we isolate architectural contributions and maintain practical feasibility for deployment on commodity GPUs.

An In-Depth Investigation into Vietnamese Lexical Text Normalization on Social Media

ABSTRACT. Informal Vietnamese on social media, often rich in abbreviations and misspellings, poses significant challenges for natural language processing (NLP). This study presents an in-depth investigation of Vietnamese lexical normalization through evaluating a broad range of approaches, including deep learning models, transformer-based transfer learning, and prompting-based large language models. The experimental results show that ViT5, a Vietnamese-specific T5 variant, delivers the highest performance, achieving a 63.06% error reduction rate (ERR) over the Leave-As-Is baseline. At the same time, LLM-based prompting approaches yield only moderate gains and, in some cases, even underperform due to over-normalization and loss of context. We then assess the impact of normalization on five downstream Vietnamese social media NLP tasks. Normalization markedly boosts performance in Constructive Speech and Toxic Speech Detection (F1 improvements from 69.29% to 83.82% and from 68.25% to 79.77%, respectively), while showing limited or even negative effects on tasks that require the preservation of subtle linguistic cues, such as Emotion Classification and Hate Speech Detection. These findings underscore the value of targeted normalization in improving task performance where non-standard text forms impede model understanding, while cautioning against its blanket application in contexts reliant on nuanced linguistic features.

A Method for Composing Concerns into a Unified Domain Model in Domain-Driven Design

ABSTRACT. Domain-Driven Design (DDD) emphasizes iterative development around a rich domain model to align developers and domain experts. While ubiquitous language and Domain-Specific Languages (DSLs) improve expressiveness and maintainability, modern systems often require multiple heterogeneous DSLs to cover diverse concerns. Existing DDD approaches, however, lack systematic methods to compose such DSLs, resulting in fragmented models and limited automation. Although meta-modeling offers a standard way to define DSLs, it is often rigid and framework-dependent. This paper introduces a novel method for composing heterogeneous concern DSLs into a unified domain model within DDD. Each DSL is defined with consistent syntax and formal semantics, and integrated via an annotation-based composition mechanism at the abstract syntax tree (AST) level. This ensures concern orthogonality, model cohesion, and supports consistency checking, automated code generation, and traceability. The approach is implemented using JetBrains MPS and the JDA framework, and validated through representative case studies, advancing modular and executable domain modeling for complex systems.

MedPRS: Scientific Paper Submission Recommendation System for Medical Research

ABSTRACT. A paper submission recommendation system aims to assist researchers in choosing suitable journals or conferences for their work. This research topic has been extensively studied during the last five years. The need for such systems is especially critical in the medical field, where the rapid expansion of biomedical literature makes selecting appropriate venues increasingly challenging. In this study, we propose three approaches developed on a newly constructed dataset comprising 1.2 million biomedical articles from 1,406 journals. The dataset is enriched with metadata such as titles, abstracts, keywords, journal Aims & Scope, and a newly introduced feature, Categories. By leveraging domain-specific transformer models, including BioBERT and BioMedBERT, our system achieves strong performance, with Top-1 to Top-10 accuracies ranging from 0.6865 to 0.9582.

Enhancing YOLOv11n for Reliable Child Detection in Noisy Surveillance Footage

ABSTRACT. This paper presents a practical and lightweight solution for enhancing child detection in low-quality surveillance footage, a critical component in real-world missing child alert and daycare monitoring systems. Building upon the efficient YOLOv11n architecture, we propose a deployment-ready pipeline that improves detection under challenging conditions, including occlusion, small object size, low resolution, motion blur, and poor lighting, that are common in existing CCTV infrastructures. Our approach introduces a domain-specific augmentation strategy that synthesizes realistic child placements using spatial perturbations (e.g., partial visibility, truncation, and overlaps) combined with photometric degradations (e.g., lighting variation and noise). To improve recall of small and partially occluded instances, we integrate Slicing Aided Hyper Inference (SAHI) at inference time. All components are trained and evaluated on a filtered, child-only subset of the Roboflow Daycare dataset. Compared to the baseline YOLOv11n, our enhanced system achieves a mAP@0.5 of 0.967 and mAP@0.5:0.95 of 0.783, yielding absolute improvements of 0.7% and 2.3%, respectively, without architectural changes. Importantly, the entire pipeline maintains compatibility with low-power edge devices and supports real-time performance, making it highly suitable for cost-sensitive deployments in industrial surveillance applications. The example augmented dataset and the source code used to generate it are available at: https://github.com/html-ptit/Data-Augmentation-YOLOv11n-child-detection

Accurate Mixed-Gas Concentration Prediction in Electronic Nose Using Image-Guided Autoencoder–TCN Hybrid Model

ABSTRACT. Reliable estimation of gas mixture concentrations is fundamental for advancing intelligent electronic-nose technologies. Nevertheless, the accurate prediction of gas mixture concentrations remains a challenge because of the complex interactions between the gases, drift, and noise from the sensors. Many of the existing methods are limited in terms of the generalizability of the model. To address this challenge, this study presents an image-based Autoencoder–TCN hybrid model to predict gas concentrations in mixtures using an electronic nose sensor array. Raw sensor signals are converted into images, allowing the autoencoder to extract deep nonlinear features that reflect the complex interactions between different gases. These features are then fed into the temporal convolutional network (TCN) regressor, resulting in an accurate estimation of multicomponent gas concentrations. The experiment was conducted on both a public dataset consisting of a mixture of methane and ethylene and a private dataset consisting of a mixture of ammonia and hydrogen sulfide. The proposed method achieved a high accuracy and demonstrated noise tolerance and generalizability. Under five-fold cross-validation, our model achieved superior performance compared to baseline models, such as MLP, CNN, RNN, and XGBoost. This combination opens up the potential for real-time gas monitoring in environmental and industrial applications.

Merging-based Federated Learning for Lifelong Whole Slide Image Analysis with Vision-Language Models

ABSTRACT. Whole Slide Images (WSIs) are gigapixel-scale pathology images essential for accurate cancer diagnosis and prognosis. However, current WSI analysis methods typically require training a separate model for each cancer type, leading to scalability issues, high computational cost, and practical limitations in data sharing across institutions. Recent continual learning approaches aim to build unified multi-task models but remain constrained by fixed class/task assumptions and the requirement of centralized training data, which is often impractical due to privacy concerns and the massive size of WSIs. In this study, we propose an efficient federated learning framework for WSI analysis that aggregates knowledge from distributed models into a single multi-purpose model without requiring raw data sharing. Our method leverages pathology vision–language models (VLMs) as the backbone, fine-tunes them on local tasks using class prompts, and merges the resulting weights through a model-merging strategy. We evaluate the approach on six TCGA cancer subtyping tasks under both class-incremental (CLASS-IL) and task-incremental (TASK-IL) settings. Experimental results demonstrate that our method consistently outperforms continual learning and zero-shot baselines, achieving the best trade-off between accuracy and forgetting. These findings highlight the potential of federated model merging to enable scalable, privacy-preserving, and clinically useful WSI analysis.

Domain-Incremental Learning for UAV Traffic Video Anomaly Detection

ABSTRACT. Anomaly detection in video is a crucial research area, but traditional methods face challenges when environmental conditions change, particularly in real-world scenarios. This paper addresses this issue by applying Domain-Incremental Learning (DIL) to anomaly detection in UAV-based traffic surveillance, enabling the model to better adapt to different weather conditions while mitigating catastrophic forgetting. We experiment with four prominent anomaly detection methods: Future-Frame Prediction (FFP), Spatio-Temporal Dissociation (STD), Margin Learning Embedded Prediction (MLEP), and Memory-guided Normality for Anomaly Detection (MNAD). These methods are combined with forgetting mitigation strategies such as Elastic Weight Consolidation (EWC) and Experience Replay (ER). The experiments are conducted across three weather domains: clear, snow, and fog. The results show that FFP outperforms the domain-specific models, achieving 1–4% higher performance, demonstrating good generalization across domains. STD performs well in foggy conditions (AUC = 56.54), while MLEP and MNAD struggle with forgetting knowledge from previous domains, showing unstable performance (AUC = 51.63 and 51.14 on the Original domain). These results highlight the potential of DIL in developing flexible and efficient anomaly detection systems, and also point to the need for improving forgetting mitigation strategies to optimize model generalization, particularly with methods such as MLEP and MNAD.

A Dual-Path approach for Time Series Anomaly Detection in Building Environmental Sensors

ABSTRACT. Anomaly detection within environmental time series data plays a crucial role in modern monitoring systems, yet it continues to pose challenges due to the inherently complex and nonlinear nature of sensor-generated signals. This study proposes a dual-path approach for time series anomaly detection that combines the expressive capabilities of deep learning with the transparency of classic machine learning techniques. The approach integrates a bidirectional Long Short-Term Memory (LSTM) autoencoder for extracting temporal features with density-based outlier detection algorithms, specifically Local Outlier Factor (LOF) and Isolation Forest. This methodology effectively models time-dependent patterns while maintaining a balance between interpretability and computational cost. The proposed approach shows significant improvements over standalone deep learning and conventional statistical approaches across various evaluation metrics, through extensive testing on indoor environmental sensor datasets. The results analyze the contributions of the individual components to the anomaly detection process: Isolation Forest (49.08%), Reconstruction Error (39.27%), and LOF (11.65%). Using synthetic data with differing noise intensities improved the model's resilience across diverse anomaly categories (point, contextual, and collective), achieving detection rates above 86%. These findings highlight the approach's practical value in real-world environmental monitoring by balancing high accuracy and interpretability.

FLoRA-KD: Efficient Communication in Federated Learning for Multi-Organ Segmentation through LoRA Knowledge Distillation

ABSTRACT. Models for medical image analysis require large-scale datasets with expert-annotated labeling. However, most datasets are either partially labeled or collected from multiple institutions, leading to issues such as data inconsistency and quality problems. Additionally, medical data sharing is restricted by privacy regulations. Federated learning is a well-established approach to address these challenges. It ensures data privacy by sharing model parameters instead of raw data, mitigating privacy concerns. However, previous studies on multi-organ CT segmentation and federated learning did not consider client scalability and communication overhead. Due to the heterogeneity of partially labeled data, repetitive parameter sharing between the server and clients can lead to issues including increased communication overhead, scalability limitations, and potential delays. To address these limitations, we propose "FLoRA-KD", an efficient federated learning method for reducing communication costs in multi-organ segmentation, which uses LoRA aggregation and LoRA knowledge distillation with partially labeled datasets. FLoRA-KD initializes and trains each client with the global LoRA, allowing efficient fine-tuning on local private datasets with minimal parameter updates. Moreover, sharing each client's LoRA adapter enables knowledge transfer from the latest updated parameters on other datasets. Our proposed method was evaluated on three publicly available abdominal CT datasets. Experimental results demonstrated that FLoRA-KD outperformed state-of-the-art methods in communication efficiency while achieving high accuracy.

Auto-Prompting with Retrieval Guidance for Frame Detection in Logistics

ABSTRACT. Prompt engineering plays a critical role in adapting large language models (LLMs) to complex reasoning and labeling tasks without the need for extensive fine-tuning. In this paper, we propose a novel prompt optimization pipeline for frame detection in logistics texts, combining retrieval-augmented generation (RAG), few-shot prompting, chain-of-thought (CoT) reasoning, and automatic CoT synthesis (Auto-CoT) to generate highly effective task-specific prompts. Central to our approach is an LLM-based prompt optimizer agent that iteratively refines the prompts using retrieved examples, performance feedback, and internal self-evaluation. Our framework is evaluated on a real-world logistics text annotation task, where reasoning accuracy and labeling efficiency are critical. Experimental results show that the optimized prompts, particularly those enhanced via Auto-CoT and RAG, improve real-world inference accuracy by up to 15% compared to baseline zero-shot or static prompts. The system demonstrates consistent improvements across multiple LLMs, including GPT-4o, Qwen 2.5 (72B), and LLaMA 3.1 (70B), validating its generalizability and practical value. These findings suggest that structured prompt optimization is a viable alternative to full fine-tuning, offering scalable solutions for deploying LLMs in domain-specific NLP applications such as logistics.

Factors Influencing the Actual Use of AI-Enabled Chatbots in Digital Wallets for Personal Financial Management Among Vietnamese Online Users

ABSTRACT. As digital wallets gain popularity in Vietnam, AI-enabled chatbots are powerful tools for personal financial management, yet their actual usage remains limited and underexplored. This study investigates the key factors influencing their adoption by integrating the Information Systems Success model, Unified Theory of Acceptance and Use of Technology 2, and Innovation Resistance Theory into a unified framework. Survey data from 543 Vietnamese users reveal that social influence, effort expectancy, performance expectancy, hedonic motivation, and AI self-efficacy positively impact behavioral intention, which in turn strongly drives actual usage. Meanwhile, the tradition barrier, image barrier, and risk barrier negatively affect intention. Quality factors like system quality, information quality, and anthropomorphism also contribute to shaping performance and effort expectations. Interestingly, service quality did not show a significant direct effect on performance expectancy. These findings suggest that to foster adoption, digital wallet providers should focus on enhancing user trust, reducing perceived barriers, and delivering intuitive, personalized, and socially supported AI experiences.

Toward Adaptive Web Application Honeypots: Fine-Tuned Large Language Models for Realistic Response Emulation

ABSTRACT. With the rise of complex attacks, traditional web honeypots struggle to maintain authenticity due to static and easily fingerprinted responses. This paper presents a fine-tuned Large Language Model (LLM)-powered honeypot framework that generates dynamic, context-aware responses closely aligned with real web application behaviors. To achieve this, we collect realistic requests and responses from target applications, preprocess them by extracting essential information from requests and normalizing responses, and fine-tune the LLM on these request-response pairs. Experimental results demonstrate that the proposed method consistently outperforms both raw-training baselines and existing LLM-based honeypots, namely Galah and VelLMes. Specifically, it achieves higher similarity scores across all metrics, with Cosine similarity reaching 0.9396 compared to 0.4506 for Galah and 0.7357 for VelLMes. Moreover, it yields a substantially lower Levenshtein distance at 329.35 compared to 564.98 for the baseline, 2940.25 for Galah, and 2340.99 for VelLMes. These improvements confirm the model’s ability to generate highly realistic, structurally valid, and functionally robust responses, thereby enhancing attacker engagement and deception effectiveness.

GAFB-MKL: Adaptive Filter Banks via Genetic Algorithm and Sparse Multiple Kernel Learning for EEG-based Motor Imagery Classification

ABSTRACT. Motor-imagery EEG (MI-EEG) decoding is difficult due to non-stationarity, inter-subject variability, and the concentration of discriminative information in narrow, person-specific sub-bands. We present GAFB-MKL, a compact and interpretable pipeline that adapts continuous, subject-specific filter banks with a Genetic Algorithm (GA), weights bands via sparse, nonnegative Multiple Kernel Learning (MKL) on FBCSP features, and classifies with a precomputed-kernel SVM. Unlike fixed-bank designs with post hoc feature selection, GAFB-MKL unifies supervised band adaptation and label-aligned fusion, yielding band-level attribution with few hyperparameters and no heavy training. On BCI Competition IV-2b, GAFB-MKL attains 79.65% average accuracy and peaks at 96.56% on Subject 4, surpassing fixed-bank FBCSP variants and slightly exceeding EEGNet (79.44%) under a unified preprocessing/evaluation pipeline. On the HCMIU hand-binary dataset, it reaches 63.21% on average, outperforming FA-GPNet (57.44%), CNN baselines, and classical Riemannian/FBCSP pipelines. Ablations corroborate the design principle "discover, then distill": removing GA (fixed bank + MKL) reduces performance to 75.55%, while removing MKL (GA + SVM) yields 74.44%. Beyond accuracy, learned band weights consistently reveal subject-specific µ/β peaks, supporting fast calibration and transparent deployment in resource-constrained BCI.

Linguistic and Semantic Graph-based Neural Networks for Hate Speech Detection

ABSTRACT. Hate speech detection remains a challenging problem due to the diversity of expressions, implicit abuse, and frequent use of slang or coded language. Although pretrained and large language models (LLMs) have demonstrated strong capabilities, they often struggle with these issues. This paper proposes a graph-based framework that models relationships between text samples using both syntactic and semantic similarities. Each dataset is represented as a graph where nodes are text samples and edges are constructed based on token matching and cosine similarity of pretrained embeddings. Two graph neural network architectures—Graph Attention Networks (GAT) and the Unified Message Passing model (UniMP)—are employed to capture structural and contextual dependencies. Experiments on four real datasets show that our models consistently outperform LLMs on multi-class tasks. These findings highlight the effectiveness of GNNs as a resource-efficient alternative to LLMs for robust hate speech detection.

A Deep Learning Model for Drug–Target Interaction Prediction in Drug Discovery

ABSTRACT. Accurately predicting interactions between drugs and proteins is a crucial step in the drug discovery process. While most current research focuses on regression-based prediction of binding affinity values, many real-world applications only require identifying whether a significant interaction exists. This study proposes an effective model for predicting binding affinity values in drug–target interaction (DTI) tasks by leveraging a Long Short-Term Memory (LSTM) network to extract sequential features from SMILES strings of drugs and amino acid sequences of proteins. The learned representations are then concatenated and passed through a regression layer to estimate the binding affinity. The training data is constructed from benchmark datasets, where binary interaction labels are derived based on predefined affinity thresholds. Experimental results demonstrate that the LSTM-based model achieves high predictive accuracy while reducing model complexity compared to regression-based approaches, highlighting its feasibility and effectiveness for virtual screening of drug–target interactions.

Optimization of Resource Allocation Using SLA Violation Penalty and Workload Prediction in Cloud Datacenters

ABSTRACT. This paper proposes a Service Level Agreement (SLA) Violation Penalty (SVP) mechanism as a new determinant for optimizing resource allocation in cloud datacenters. The proposed approach leverages time series workload prediction models to comprehensively consider prediction accuracy, inference latency, and node workload, effectively reducing SLA violations while improving system availability and operational efficiency. We evaluate ARIMA, LSTM, Bi-LSTM, LSTM-Attention, and LSTM-Autoencoder models on real-world datasets from Alibaba, Google Cluster, and Bitbrains, simulating diverse workload patterns. Results show that SVP-based allocation significantly improves node availability and lowers violation rates compared to prediction-based and minimum load methods. The mechanism also maintains stable performance in both resource-limited and abundant environments. Overall, SVP provides an effective strategy for minimizing SLA-related costs and improving datacenter performance.

Lightweight Multi-Trait IELTS Essay Scoring with Prompt- and Topic-Awareness

ABSTRACT. Automated essay scoring (AES) systems have been increasingly developed to assist in the evaluation of IELTS Writing tasks. However, most existing systems predict only a single overall score and are often too computationally expensive for low-end servers. In this paper, we introduce a lightweight, multi-trait AES model that scores essays on the four IELTS criteria – Task Response (TR), Coherence and Cohesion (CC), Lexical Resource (LR), and Grammatical Range and Accuracy (GRA) – by jointly encoding the prompt and essay with BERT, augmenting representations with topic-distribution features from FASTopic, and modeling inter-trait interactions with an attention mechanism and a trait-similarity loss. Each trait score is derived from the shared prompt–essay representation, informed by topic features, and refined through attention, producing interpretable trait-level scores. To enable low-resource deployment, we distill the model into a more compact one. Evaluated on the chillies IELTS writing dataset (10,000+ essays), our model raises quadratic weighted kappa (QWK) by 2.78% over base BERT and outperforms prompting-based LLMs; the distilled model is approximately 5× faster and 4× smaller while maintaining competitive accuracy, offering a practical solution for real-world EdTech applications. This work provides a practical and interpretable AES solution, bridging research models and scalable deployment. Our prototype is available at https://study.engonow.com/ielts_writing_scoring
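For readers unfamiliar with the metric, quadratic weighted kappa (QWK), the agreement measure reported above, can be computed in a few lines; the integer rating encoding below is illustrative and not tied to the paper's implementation:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK between two integer-coded rating vectors (values in 0..n_classes-1)."""
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1                                  # observed rating co-occurrences
    idx = np.arange(n_classes)
    # quadratic penalty weights: 0 on the diagonal, growing with disagreement
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # expected co-occurrences under independent marginals, same total count
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()

# perfect agreement yields kappa = 1
assert quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4) == 1.0
```

Perfect agreement gives 1 and chance-level agreement gives 0; the quadratic weights penalize large band disagreements more heavily than adjacent ones.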

ViLexCPO: A Multi-Task and Preference-Aligned Framework for Legal Question Answering

ABSTRACT. In legal Question Answering (QA), especially for Vietnamese, building reliable systems is challenging because there is not enough high-quality training data, and the tasks often require complex logical reasoning and accurate legal citations. Previous studies have focused on large language models (LLMs), but these models are difficult to use in practice due to high infrastructure requirements. This creates a need for small-to-medium language models (SLMs) that are specially trained and optimized for the legal domain. To address this need, we propose a two-stage training framework. In the first stage, we use multi-task supervised fine-tuning (SFT) to train the model to perform three tasks at the same time: (1) determine the usefulness of legal citations, (2) answer multiple-choice legal questions, and (3) predict the most relevant citations. In the second stage, we apply Contrastive Preference Optimization (CPO) to further align the model with high-quality human feedback, improving both the accuracy and legal soundness of its responses. Experiments on the VLSP LegalSLM dataset show that our approach improves citation accuracy by up to 20% compared to using SFT alone. It also performs well across all three tasks, with especially strong results in syllogism-based QA.

MSA: Breaking Down MOET Criteria into Sub-Criteria for Education

ABSTRACT. At the global level, efforts to decompose curricula and educational standards have applied techniques such as natural language processing (NLP) and adaptive syllabus generation. While effective in certain contexts, these approaches struggle with vague or high-level criteria and often lack the flexibility required for practical classroom use. Recent studies suggest that large language models (LLMs) such as GPT-4 can support tasks like exercise generation, syllabus development, and course mapping. However, they frequently exhibit overgeneralization, omissions, and cultural bias, indicating the need for structured decomposition strategies and human oversight. To address this gap, we propose MOET Sub-Criteria Analytic (MSA), a method that leverages in-context learning with LLMs to decompose the Ministry of Education and Training (MOET) criteria into actionable sub-criteria. To ensure the quality and pedagogical soundness of the generated outputs, we introduce EduQual-5, a systematic evaluation framework grounded in educational measurement theory and policy research. EduQual-5 operationalizes five interrelated dimensions—relevance, validity, clarity, feasibility, and fairness—into a transparent Likert-scale assessment conducted through both expert judgment and structured analysis. Experiments indicate that MSA produces sub-criteria rated at competitive Likert results compared to real MOET sub-criteria.

XGPhy: A Machine Learning Framework for Predicting Optimization Difficulty in Maximum Likelihood Phylogenetic Inference

ABSTRACT. In the task of inferring phylogenetic trees from multiple sequence alignments, widely used maximum likelihood tools such as IQ-TREE and RAxML rely on heuristic optimization techniques to approximate globally optimal solutions. When the solution space exhibits a high density of near-optimal trees—indicative of a flat or rugged combinatorial landscape—search effectiveness can be compromised. A key open problem is how to automatically characterize the hardness of a phylogenetic inference instance before performing the search, enabling adaptive selection of search heuristics and hyperparameters. In this study, we present XGPhy, a machine learning framework that predicts instance-level difficulty in maximum likelihood phylogenetic inference. The model is trained using features and training data derived through a procedure adapted from Pythia (Haag, 2022), and leverages the XGBoost algorithm for supervised prediction. Preliminary results show that XGPhy's difficulty scores are positively correlated with Pythia's, indicating its promise as an automated pre-search diagnostic tool.

Enhancing User-Based Context-Aware Collaborative Filtering Using Energy Distance with Post-Filtering Contextual Features

ABSTRACT. This paper introduces an enhanced framework for user-based context-aware collaborative filtering (CACF-EDPF) that incorporates energy distance (ED) as a robust metric to capture distributional discrepancies in user–item interactions, alongside a post-filtering (PF) mechanism to refine recommendations based on contextual features. Unlike conventional CACF methods, which often struggle to effectively integrate contextual information, the proposed CACF-EDPF approach models user preferences with greater fidelity by jointly leveraging statistical distance and contextual relevance. Comprehensive experiments on three benchmark datasets—MovieLens, Amazon, and Yelp—demonstrate that CACF-EDPF consistently outperforms both standard CACF and energy-based models (EBM) in terms of prediction accuracy and adaptability to context. These findings highlight the effectiveness of combining distribution-aware similarity with contextual post-filtering, pointing toward a promising direction for developing more accurate, flexible, and context-sensitive recommender systems for real-world applications.
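As background, the energy distance this framework builds on has a simple sample form; the minimal sketch below is for one-dimensional samples and does not reflect the paper's CACF-EDPF integration:

```python
import numpy as np

def energy_distance(x, y):
    """Energy distance between two 1-D samples:
    2*E|X-Y| - E|X-X'| - E|Y-Y'|; nonnegative, and zero for identical samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    d_xy = np.abs(x[:, None] - y[None, :]).mean()  # mean cross-sample gap
    d_xx = np.abs(x[:, None] - x[None, :]).mean()  # mean within-sample gap of x
    d_yy = np.abs(y[:, None] - y[None, :]).mean()  # mean within-sample gap of y
    return 2 * d_xy - d_xx - d_yy

# identical samples have zero energy distance
assert energy_distance([1, 2, 3], [1, 2, 3]) == 0.0
```

Unlike a pointwise rating distance, this statistic compares whole interaction distributions, which is what makes it attractive for measuring user similarity.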

TI-FS: Text and Images Mutual Support for Improving Few-Shot Learning in Cross-Device Image Recapture Detection

ABSTRACT. Text-image joint training has been extensively employed in computer vision tasks, particularly in cross-device image recapture detection. Moreover, utilizing text-image combined models for few-shot deep learning model fine-tuning has shown promising performance improvements. However, the integration of these two modalities presents significant challenges, and issues related to the quality of cross-device image recapture detection are often difficult to distinguish. To address these challenges, we develop a mutual guidance mechanism that enables the text-image joint training model to guide the few-shot deep learning model (TI-FS) through image representations and textual guidance components. Our extensive experiments demonstrate that our TI-FS model significantly outperforms current state-of-the-art methods in both general image recognition tasks and specifically in cross-device image recapture detection.

ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations

ABSTRACT. Recent advances in contextualized word embeddings have greatly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, but most progress has been limited to high-resource languages like English. Vietnamese, in contrast, still lacks robust models and evaluation resources for fine-grained semantic understanding. In this paper, we present ViConBERT, a novel framework for learning Vietnamese contextualized embeddings that integrates contrastive learning (SimCLR) and gloss-based distillation to better capture word meaning. We also introduce ViConWSD, the first large-scale synthetic dataset for evaluating semantic understanding in Vietnamese, covering both WSD and contextual similarity. Experimental results show that ViConBERT outperforms strong baselines on WSD (F1 = 0.87) and achieves competitive performance on ViCon (AP = 0.88) and ViSim-400 (Spearman’s rank correlation = 0.60), demonstrating its effectiveness in modeling both discrete senses and graded semantic relations. Our code, models, and data are available at https://github.com/tkhangg0910/ViConBERT.

LoDiBi: Automated Course Quality Evaluation Framework with LOQCA, DeepIFSA, and BiLSTM

ABSTRACT. Course quality is a critical factor shaping learner experience and institutional success. High-quality courses help learners achieve goals and ensure compliance with education standards. Traditional evaluation relies on expert review after course completion, which is slow, costly, and delays improvement. In Massive Open Online Courses (MOOCs), the challenges are greater due to sparse and fragmented learning data. We propose LoDiBi (LOQCA, DeepIFSA, BiLSTM), a framework with three integrated modules. LOQCA automatically labels course quality from learner behavior. DeepIFSA imputes missing values using attention, CutMix, and contrastive learning, making it effective in sparse settings. BiLSTM captures temporal learning patterns to enhance prediction accuracy. Combined, these modules enable early prediction of course quality and provide instructional designers with actionable evidence for timely adjustments. Experiments on real-world MOOC datasets show that LoDiBi outperforms existing methods: data quality was maximal (Completeness and Consistency both reached 1); classification was balanced (Macro-F1 and Balanced Accuracy above 0.9); agreement with ground-truth labels was strong (MCC and Kappa above 0.9); and predictive performance was high (Accuracy, Precision, and Recall between 0.93 and 0.94). LoDiBi provides a scalable solution for automated course evaluation, helping institutions make faster, data-driven decisions to enhance learning outcomes.

Exploring Consumer Behavior in Clean Food Consumption using Positive–Negative Association Rule Mining: A case study in Vietnam

ABSTRACT. The growing demand for clean food, including safe and organic products, reflects increasing global concerns about food safety, health, and sustainability. While developed markets such as Europe, North America, and parts of Asia have experienced steady growth in organic food consumption, Vietnam has more recently witnessed a rapid rise in demand, driven by greater health awareness and concerns over food contamination. However, the domestic market still faces challenges such as high prices, supply chain constraints, and limited consumer trust in certification. Previous studies mainly applied econometric methods to analyze purchasing intentions and willingness to pay, whereas recent advances in machine learning have enabled deeper insights into consumer behavior. Among these, association rule mining is particularly effective in uncovering hidden consumption patterns. This study employs positive–negative association rule mining to identify both frequent and infrequent purchasing habits in the clean food sector. The analysis was conducted using the dataset of a clean food retail chain in Hanoi. The experimental results provide implications for supply planning, inventory management, and the promotion of sustainable consumption.

Optimization of Kolmogorov–Arnold Networks for Reinforcement Learning via NeuroEvolution of Augmenting Topologies

ABSTRACT. The Kolmogorov–Arnold Network (KAN) emerges as an alternative neural architecture that replaces fixed activation functions with learnable basis functions, enabling improved interpretability and efficiency in function approximation. In this work, we propose a novel method for evolving KAN architectures using the NeuroEvolution of Augmenting Topologies (NEAT) algorithm. Unlike conventional gradient-based optimization, our approach explores both the structure and functional composition of networks through evolutionary search, allowing the discovery of compact and expressive models without backpropagation. We modified NEAT to incorporate the KAN formulation at each node, enabling the evolution of not only connectivity patterns but also node-specific functional mappings. Experimental results demonstrate that evolved KANs can achieve competitive performance in reinforcement learning tasks. This study highlights the potential of combining KANs with evolutionary computation to develop interpretable and gradient-free learning systems.
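As a rough illustration of the KAN idea described above (learnable univariate functions in place of fixed activations), the sketch below uses Gaussian radial basis functions as a stand-in for the B-splines of the original KAN; all names, shapes, and parameters are illustrative, not the paper's formulation:

```python
import numpy as np

def kan_edge(x, coeffs, centers, width=1.0):
    """One KAN edge: a learnable univariate function, here a weighted sum of
    Gaussian RBFs (an assumed stand-in for the splines in the KAN literature)."""
    basis = np.exp(-((x - centers) / width) ** 2)  # one value per basis function
    return float(coeffs @ basis)

def kan_node(inputs, edge_params):
    """A KAN node simply sums its incoming edge outputs (no fixed activation)."""
    return sum(kan_edge(x, *p) for x, p in zip(inputs, edge_params))

centers = np.array([-1.0, 0.0, 1.0])
coeffs = np.array([0.0, 1.0, 0.0])
# under NEAT, both the wiring and these per-edge coefficients would be evolved
value = kan_node([0.0, 0.5], [(coeffs, centers)] * 2)
```

Because each edge carries its own coefficient vector, an evolutionary search can mutate node-specific functional mappings alongside connectivity, which is the combination the abstract describes.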

Towards Regional AQI Mapping in Northern Vietnam: Multi-Source Data Fusion and Ensemble Learning

ABSTRACT. Air pollution, particularly fine particulate matter (PM2.5), poses a growing threat to both the environment and public health in rapidly urbanizing regions. In Northern Vietnam, this challenge is amplified by dense population, concentrated industrial activities, and seasonal meteorological factors, highlighting the need for scalable and reliable forecasting solutions. We propose a multi-source data fusion framework that integrates ground monitoring data, meteorological variables from GFS, remote sensing data from MODIS/Terra and Sentinel-5P, and geographical indicators for PM2.5–AQI prediction and regional AQI mapping. Leveraging tree-based models (Random Forest, LightGBM, Extra Trees, Gradient Boosting) and the advanced ensemble technique Stacking, our framework outperforms traditional baselines. On the test set, it achieves an Accuracy of 0.6991 and F1-score of 0.6890, with strong balance across metrics such as Cohen’s Kappa and MCC. The resulting AQI maps highlight spatial–temporal disparities, with higher pollution levels in Hanoi and lower levels in coastal and mountainous areas. This study demonstrates a practical and scalable approach to bridging monitoring gaps and supporting air quality management at the regional scale.

Balanced Multimodal Training through Unified Forward-Backward Modulation Strategy

ABSTRACT. Multimodal learning aims to integrate complementary information from diverse modalities such as language, vision, and audio. A persistent challenge in this field is modality imbalance, where dominant modalities suppress weaker ones during training, limiting the benefits of multimodal fusion. Existing approaches often suffer from inaccurate estimation of modality contributions and fail to jointly consider feed-forward and backpropagation dynamics. We propose Balanced Classifier-Guided Modulation (BCGM), a general training strategy that dynamically balances modality learning across both forward and backward stages. BCGM introduces three key components: (1) Classifier-Guided Dropout, which uses discriminative scores from lightweight unimodal classifiers to guide information flow; (2) Adaptive Gradient Modulation, which scales gradients based on modality-specific learning progress; and (3) Directional Gradient Alignment, which aligns fusion gradients with unimodal signals to preserve modality diversity. BCGM is a plug-and-play module compatible with diverse multimodal architectures. Experiments demonstrate consistent improvements over strong baselines, and ablation studies confirm the importance of jointly optimizing forward and backward modality learning. Code is available at: https://anonymous.4open.science/r/BCGM-7D83.

Vehicle routing problems via Quantum Graph Attention Network Deep Reinforcement Learning

ABSTRACT. The Vehicle Routing Problem (VRP) is a fundamental NP-hard task in intelligent transportation systems with broad applications in logistics and distribution. Deep reinforcement learning (DRL) with Graph Neural Networks (GNNs) has shown promise, yet classical models rely on large multi-layer perceptrons (MLPs) that are parameter-heavy and memory-bound. We propose a Quantum Graph Attention Network (Q-GAT) within a DRL framework, where parameterized quantum circuits (PQCs) replace conventional MLPs at critical readout stages. The hybrid model maintains the expressive capacity of graph attention encoders while reducing trainable parameters by more than 50%. Using proximal policy optimization (PPO) with greedy and stochastic decoding, experiments on VRP benchmarks show that Q-GAT achieves faster convergence and reduces routing cost by about 5% compared with classical GAT baselines. These results demonstrate the potential of PQC-enhanced GNNs as compact and effective solvers for large-scale routing and logistics optimization.

TaP-GA: A Novel Genetic Algorithm for Target-Prioritized, Orientation-Constrained, and Adaptive Coverage Optimization in Wireless Multimedia Sensor Networks

ABSTRACT. Wireless Multimedia Sensor Networks (WMSNs) are ushering in a new era of target monitoring systems, in which each network node can record multimedia data such as video, images, and audio; however, they also pose greater challenges than typical Wireless Sensor Networks (WSNs) in ensuring quality of service (QoS), satisfying coverage requirements, and maintaining network lifetime. The Heterogeneous Target Coverage (HTC) problem is particularly noteworthy among these difficulties, as sensor orientation is limited to discrete directions and coverage requirements vary for each target according to its relevance. This paper addresses the HTC problem in two crucial scenarios: a fixed number of sensors and a fixed number of targets. Taking advantage of evolutionary algorithms' ability to find near-optimal solutions, we offer an improved version of the traditional Genetic Algorithm (GA), called Target-Prioritized GA (TaP-GA), with Hybrid Population Initialization, Coverage-Biased Crossover, Target-Guided Exploratory Mutation, and an adaptive selection mechanism. These enhancements broaden the search space while preserving high-quality solution characteristics. Multiple simulation scenarios reveal that the proposed strategy is more efficient and effective than previous solutions.

Fast Stochastic Greedy Algorithm for k-Submodular Cover Problem

ABSTRACT. We study the k-Submodular Cover (kSC) problem, a natural generalization of the classical Submodular Cover problem that arises in artificial intelligence and combinatorial optimization tasks such as influence maximization, resource allocation, and sensor placement. Existing algorithms for kSC often provide weak approximation guarantees or incur prohibitively high query complexity. To overcome these limitations, we propose a Fast Stochastic Greedy algorithm that achieves strong bicriteria approximation while substantially lowering query complexity compared to state-of-the-art methods. Our approach dramatically reduces the number of function evaluations, making it highly scalable and practical for large-scale real-world AI applications where efficiency is essential.
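For context, the sampling idea behind stochastic greedy methods can be sketched in the simpler setting of monotone submodular cover; this is an assumption-laden illustration of why sampling cuts query complexity, not the paper's k-submodular bicriteria algorithm:

```python
import random

def stochastic_greedy_cover(ground, f, tau, sample_size=4, seed=0):
    """At each step, evaluate marginal gains only on a small random sample of
    the remaining elements and add the best one, until f(S) >= tau.
    (Sketch of the sampling idea only; guarantees are not reproduced here.)"""
    rng = random.Random(seed)
    S, remaining = [], set(ground)
    while f(S) < tau and remaining:
        cand = rng.sample(sorted(remaining), min(sample_size, len(remaining)))
        best = max(cand, key=lambda e: f(S + [e]) - f(S))  # best sampled gain
        S.append(best)
        remaining.discard(best)
    return S

# toy coverage function: number of distinct items covered by the chosen sets
sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4}}
cover = lambda S: len(set().union(*(sets[i] for i in S)) if S else set())
S = stochastic_greedy_cover(list(sets), cover, tau=4)
assert cover(S) >= 4
```

Each step costs only `sample_size` function evaluations instead of one per remaining element, which is the source of the query-complexity savings the abstract emphasizes.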

A Mathematical Model and Exact Column Generation Approach for RMSA Problem in Elastic Optical Networks

ABSTRACT. The Routing, Modulation, and Spectrum Assignment (RMSA) problem extends the Routing and Spectrum Assignment (RSA) problem by incorporating modulation format selection to balance transmission reach and spectrum efficiency. As RMSA is NP-complete, most prior work has relied on heuristic, hybrid, or machine learning–based methods to improve scalability. This paper proposes a configuration-based Integer Linear Programming model and an exact solution framework using a Column Generation approach for the RMSA problem. Experimental evaluations on the NSFNET and USNET topologies demonstrate that the proposed approach consistently attains near-optimal solutions within feasible runtimes for medium- and large-scale networks, underscoring both its effectiveness and practical applicability.

An Improved Initialization-based Evolutionary Algorithm for the Top k 2-Clubs Problem

ABSTRACT. The s-club problem has been widely applied in social network analysis and biological network studies. Among its variants, the Top k 2-Clubs problem has attracted considerable attention from the research community. The objective of this problem is to identify k large 2-clubs that maximize a combined score reflecting both their sizes and pairwise dissimilarities. This study focuses on enhancing the efficiency of the population initialization process. The proposed initialization method integrates both greedy and random strategies to achieve a balance between individual solution quality and population diversity. Experimental evaluations conducted on DIMACS benchmark datasets demonstrate that the proposed algorithm outperforms the original approach in approximately two-thirds of the test cases.

Evaluating Phylogenetic and Ancestral Recombination Graph Approaches for Analyzing RNA Virus Recombination: A Case Study of SARS-CoV-2 in Vietnam

ABSTRACT. Detecting recombination in rapidly evolving RNA viruses presents a significant computational challenge. This paper presents a case study using SARS-CoV-2 genomes from Vietnam to compare the performance of two complementary computational approaches: (i) RIPPLES, a specialized, tree-based method designed for SARS-CoV-2, and (ii) ARG4WG, a general-purpose tool for reconstructing ancestral recombination graphs (ARGs). RIPPLES demonstrated a clear advantage, identifying a strong, high-confidence recombination signal supported by the highest parsimony gain. The inferred breakpoints were consistent with previous findings and biologically plausible, being enriched in the Spike coding region. In contrast, ARG4WG inferred an unrealistically high number of recombination events that lacked clear functional patterns, suggesting its methodology is ill-suited for the unique characteristics of short, fast-mutating viral genomes. This case study demonstrates that specialized tools like RIPPLES are currently more reliable for SARS-CoV-2 analysis and highlights the critical need for novel or adapted ARG-based methods to accurately model the evolutionary history of RNA viruses.

Comprehensive Assessment of SLM Performance on Vietnamese High School History Tasks

ABSTRACT. Due to their low inference costs and ease of large-scale deployment, small language models (SLMs) are becoming an increasingly viable choice in education. While SLMs have been validated and benchmarked comprehensively on subjects such as Mathematics, the Natural Sciences, and Foreign Languages, for Social Studies, and history in particular, no publicly available, standardized benchmark exists to assess their performance. This paper introduces such a benchmark, along with a reproducible evaluation framework designed for models with up to around 9 billion parameters. Our dataset was compiled from official Vietnamese National High School history examination questions from the five-year period 2020-2024. Each question is labeled with context and difficulty; labelling was performed manually and cross-validated to ensure consistency and objectivity. Additionally, we evaluated 25 SLMs on our dataset, measuring their performance on Multiple Choice and Essay Question tasks in Vietnamese history, analyzing each model's overall accuracy as well as scrutinizing its reasoning toward the final answer. This paper contributes a curated dataset, an evaluation protocol, and standardized analyses, laying the foundation for objective comparisons between models and providing valuable insights into selecting appropriate SLMs in the context of Vietnamese education. The source code and illustrative charts, as well as the results and key findings of our experiments, can be found at https://github.com/HoTuMinh/Framework-for-evaluating-SLMs-in-History-QA.

An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling

ABSTRACT. Large Language Models (LLMs) are increasingly considered for educational counseling, yet most existing efforts remain limited to prototypes or synthetic benchmarks, leaving little evidence from real-world deployments. We present MARAUS (Multi-Agent and Retrieval-Augmented University Admission System), a domain-specific conversational platform designed for higher-education admissions in Vietnam. MARAUS combines hybrid retrieval, multi-agent orchestration, and LLM-based response generation into a lightweight and practical system. We conducted a two-phase study encompassing both development and live deployment. During two weeks of operation, MARAUS processed over 6,000 authentic user queries across six admission-related categories, achieving 87–94% accuracy with mean response times below 4 seconds. The system also proved highly cost-efficient, incurring only USD 11.58 using GPT-4o mini. Our findings provide rare empirical evidence on deploying agentic RAG systems in low-resource educational contexts and offer design insights for building trustworthy, scalable, and domain-adaptive advisory services.

Synthesizing Cultural Heritage: An End-to-End System for Designing Jewelry with Vietnamese Hue Imperial Motifs

ABSTRACT. The preservation and modernization of cultural heritage in the digital age pose multifaceted challenges. Key difficulties include: (1) the risk of cultural dilution when traditional motifs are adapted into modern forms, (2) the absence of computational frameworks capable of integrating visual and textual modalities for design synthesis, and (3) the lack of standardized benchmarks for evaluating cultural fidelity in AI-generated artifacts. To address these challenges, this paper introduces an end-to-end diffusion-based framework that leverages generative artificial intelligence for the creative reinterpretation of Vietnamese cultural artifacts. Our system accepts a content image, a style image, and a textual prompt to generate unique jewelry designs that harmonize contemporary aesthetics with the rich artistic heritage of Hue imperial motifs. As a foundational contribution, we introduce the HueJewelry-500 dataset, which consists of approximately 500 labeled jewelry images and 200 authentic Hue motifs, enabling reproducible evaluation in this emerging domain. Quantitative and qualitative assessments validate the effectiveness of our approach: the framework achieves a CLIP score of 0.28, outperforming contemporary baselines, while a user study with 25 cultural and design experts yielded the highest overall rating of 3.98/5 for quality and cultural authenticity. These results demonstrate that our approach not only supports digital preservation but also facilitates modern reinterpretation of heritage, bridging historical artistry with contemporary design paradigms.

Self-training from Self-memory in Data-to-text Generation

ABSTRACT. This paper introduces a novel training model, self-training from self-memory (STSM), for data-to-text generation (DTG), allowing the model to self-train on subsets that include self-memory (outputs inferred directly from the trained models) and/or new data. The quality of self-memory is validated by two models, data-to-text (D2T) and text-to-data (T2D), against two pre-defined conditions: (1) the appearance of all source values in the outputs of the D2T model and (2) the ability of the T2D model to convert the outputs back into the source data. We utilize a greedy algorithm to generate shorter D2T outputs provided they contain all source values. Subsequently, we use the T2D model to confirm that these outputs capture the input relationships by demonstrating their capacity to convert text back into data. With 30% of the dataset, we can train the D2T model to a performance competitive with full training in the same setup. We experiment with our model on two datasets, E2E NLG and DART. STSM offers the D2T model a generalization capability from its subset memory while reducing training data volume. Ultimately, we anticipate that this paper will contribute to continual learning solutions that adapt to new training data, incorporating it as a form of self-memory in DTG tasks. Our repo is publicly available at https://github.com/hoangthangta/STSM.
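The greedy length filter described above (keep a shorter output only if it still contains all source values) can be illustrated with a hypothetical helper; the function name and the plain substring-containment check are assumptions for illustration, not the paper's exact procedure:

```python
def shortest_covering_output(candidates, source_values):
    """Among candidate texts, return the shortest one that still mentions
    every source value; None if no candidate covers them all.
    (Hypothetical helper; a real system would normalize value surface forms.)"""
    covering = [c for c in candidates if all(v in c for v in source_values)]
    return min(covering, key=len) if covering else None

candidates = [
    "Alice, aged 30, works as an engineer in Paris.",
    "Alice is 30.",
]
short = shortest_covering_output(candidates, ["Alice", "30"])  # shorter covering text
```

The T2D check in the abstract then acts as a second gate: a shortened output is admitted to self-memory only if its relations can still be recovered as data.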

Vietnamese-guided Post-OCR Processing for Historical Nom Scripts

ABSTRACT. Despite its historical prevalence as a writing system in Vietnam, the majority of Nom documents remain inaccessible due to the obsolescence of the language. While optical character recognition (OCR) offers a pathway to digitize Nom documents, its performance is severely constrained by the limited availability of Nom annotations for training. To circumvent this, this paper proposes a post-OCR processing framework that exploits existing Vietnamese transcriptions, which are often available even when parallel Nom texts are not. Specifically, we propose the Vietnamese-guided Residual Connections (VRC) module, a module that enriches Nom encoder states by attending to contextualized Vietnamese representations, enabling the model to recover from noisy OCR predictions while avoiding overfitting to errors. In the absence of established benchmarks, we also construct a novel corpus through automatic data generation to evaluate post-OCR processing methods. Experimental results demonstrate that even a simple Seq2Seq equipped with VRC achieves state-of-the-art performance, yielding up to 38% absolute gains in correction F1 and up to 11% in detection F1 over baselines, substantially improving both error detection and correction.

SelfCheckHybrid: A Hybrid Framework for Hallucination Detection in Vietnamese Large Language Models

ABSTRACT. Large language models (LLMs) demonstrate strong text generation ability but remain vulnerable to hallucinations, producing responses that are fluent and coherent yet factually incorrect. Although hallucination detection has been studied extensively in English, there is still no systematic approach for Vietnamese. As a low-resource language, Vietnamese lacks annotated datasets and specialized NLP tools, which limits the direct transfer of existing techniques. We address this challenge by adapting SelfCheckGPT, a black-box and zero-resource hallucination detection method, to Vietnamese. We explore multiple variants, including BERTScore, N-gram, Multiple-choice Question Answering and Generation, Natural Language Inference (NLI), and LLM Prompting. Building on these, we propose SelfCheckHybrid, a novel combination of NLI and prompting strategies. SelfCheckHybrid achieves accuracy comparable to SelfCheckPrompt while reducing computational cost by 44 percent, making it more efficient for practical use. Furthermore, we introduce the Vietnamese Hallucination Dataset, which consists of 210 manually annotated sentences with sentence-level hallucination labels. This benchmark represents the first resource for hallucination detection in Vietnamese. Our study provides both methodological advances and a new dataset, offering a foundation for reliable hallucination detection in Vietnamese and other low-resource languages.

R2E - Requirements-to-Execution System

ABSTRACT. Requirements analysis and task planning in software engineering often demand substantial time and effort. Recent advances in Large Language Models (LLMs), combined with Agentic AI, enable these processes to be performed more efficiently. In this work, we develop a system that generates task lists from requirements. Based on the Scrum framework, our system accepts project information as input, breaks it down into user stories, and ultimately creates an optimized task schedule visualized as a Gantt chart. Experiments conducted on real-world projects across companies demonstrate that the generated outputs align with realistic project plans by up to 80% in terms of features and task content. These results highlight the effectiveness of the system in supporting task planning and project management, contributing to cost reduction and improved overall performance. This work illustrates the potential of LLMs in improving task scheduling and project management practices.

P-PQGC: A Proposed Post Quantization Gain Control for Offline and Streaming Whisper under Different Speaker-to-Microphone Distances

ABSTRACT. OpenAI's offline Whisper and its streaming architectures provide robust automatic speech recognition (ASR), which is crucial for real-time communication. However, in single-microphone meeting rooms, increasing the speaker-to-microphone distance (SMD) might degrade speech-to-text quality. This paper proposes a Post Quantization Gain Control (PQGC) to address the challenge, focusing on adjusting the signal's PCM amplitude, since factors such as noise, reverberation, and distance-related attenuation, if not properly handled, might distort the waveform and ultimately degrade the ASR performance. Furthermore, the research provides an effective architecture to integrate our PQGC into the streaming Whisper. By ensuring that every homogeneous segment is normalized separately before being combined, this proposed PQGC streaming approach maintains local consistency and enhances recognition performance under non-uniform SMD situations. Our proposed method consistently outperforms the RMS-based approach, achieving its best performance at a peak value of 0.8 with significant improvements for distant speech. When evaluated on the 100-hour AMI Meeting Corpus benchmark and our self-gathered Vietnamese datasets in both offline and streaming modes, it achieves WER reductions of up to 2.5%, demonstrating stable and consistent gains across conditions.

A No-Code Solution for Creating AR Indoor Navigation Applications

ABSTRACT. Indoor navigation in complex environments such as hospitals, malls, and university campuses is challenging due to complicated layouts and unclear signage. This paper presents an Augmented Reality (AR) navigation system using existing visual features as landmarks, requiring no additional infrastructure. Developed with Unity and Vuforia, it generates AR maps from floor plans and provides desktop and Android apps for map creation and navigation. The system offers prompt response times and reduces implementation costs while preserving interior design, making it suitable for healthcare, education, and retail environments. Future work will address dynamic indoor changes and advanced AR integration.

Fast and Lightweight CNN Model for EEG Person Identification on Constrained Hardware

ABSTRACT. Electroencephalography (EEG)-based person identification has emerged as a secure and spoof-resistant biometric solution, leveraging the uniqueness of individual brain activity. While convolutional neural networks (CNNs) have demonstrated strong performance in this domain, existing models often require large numbers of parameters and high-dimensional inputs due to reliance on many EEG channels, leading to excessive computational demands. Recent works have attempted to introduce lightweight models by reducing the number of EEG channels and network parameters. Yet, these approaches still generate excessive amounts of input data, which significantly slows down training and increases computational cost. This undermines the core objective of lightweight modeling by shifting the computational burden to the data pipeline, making the overall system far from efficient. We introduce a CNN-based EEG identification model that is lightweight in every aspect, from architecture to data pipeline. The model uses only three EEG channels and 3-second input segments, keeping data compact and efficient. With approximately 64,000 learnable parameters—making it, to the best of our knowledge, the most lightweight CNN-based EEG identification model reported—it trains rapidly, requiring only ~2 seconds per epoch on a modest CPU. Despite its simplicity, the model achieves a 98.2% rank-1 test accuracy. This balance of accuracy, speed, and ultra-low complexity makes the model especially suitable for small-scale applications, portable EEG devices, and scenarios where developers lack access to high-end GPUs or CPUs.

SolARG: A Collaborative Tangible Augmented Reality Game for Learning Gravity and Solar System Planets

ABSTRACT. Teaching children (aged 6–12) about gravity and the relative sizes of planets in the solar system poses a well-known challenge: these concepts are highly abstract, invisible in daily life, and difficult to grasp through traditional instructional methods. To address this, we designed and implemented SolARG, a collaborative augmented reality (AR) multiplayer game that transforms these abstract principles into interactive, embodied experiences. In SolARG, two players work together in a competitive mission to stabilize an asteroid by selecting planets on opposite sides that exert equivalent gravitational forces. Because distances between the asteroid and planets differ, players must reason about both mass (represented by planet size) and distance to achieve balance while avoiding planetary collisions. We deployed the game with 60 students aged 6–12 in real-world classroom settings. Results show that SolARG significantly enhanced engagement, supported rapid understanding of gravitational balance, and improved long-term retention of knowledge about gravity and planetary sizes. These findings suggest that collaborative AR games can make abstract STEM concepts more tangible, accessible, and memorable for young learners.

CAMIronment: Supporting Environmental Design Prototyping With Generative AI and Context-Aware Multimodal Interaction

ABSTRACT. Extended Reality (XR) technologies are increasingly expanding the opportunities of environmental design prototyping (e.g., interior design or MR classroom environments) by enabling users to create and manipulate virtual objects within immersive environments. These capabilities support rapid externalization of ideas, exploration of spatial configurations, and in-situ evaluation of experiential qualities. In contrast, conventional design approaches—such as sketches and CAD-based 3D modeling—are constrained by steep expertise requirements and lengthy iteration cycles. Recent advances in Generative AI present promising alternatives, particularly through text-to-3D pipelines that generate diverse assets from natural language prompts. Yet, systems that rely solely on textual input place significant cognitive and descriptive demands on users and often yield outputs misaligned with intended spatial or contextual requirements. To overcome these limitations, we introduce CAMIronment, a context-aware multimodal interaction system that combines voice, gesture, and scene context as inputs for efficient 3D asset creation in environmental design prototyping. We conducted a comparative evaluation of CAMIronment against a baseline text-to-3D system to examine usability and user effort. Results demonstrate that CAMIronment alleviates users' descriptive burden and enables more effective integration of generated assets into immersive design workflows.

Finite-time error control combining neural networks in noisy environments and mobile targets

ABSTRACT. This paper presents an integrated guidance and control approach for a single-channel high-speed aircraft (HSA) based on finite-time error control (FTC) augmented with an adaptive neural network (NN). The FTC scheme is devised to ensure that the tracking error converges to zero within a user-specified finite time, thereby guaranteeing fast response and precise interception performance even against highly maneuvering targets. To enhance robustness, a radial basis function neural network is incorporated into the controller design to estimate and compensate unknown nonlinear dynamics and external disturbances online. The proposed control law consists of an equivalent control component and a switching component, among which the NN serves as a real-time intelligent approximation module embedded in the equivalent part. An adaptive weight update rule is developed to automatically adjust the neural parameters during operation. Lyapunov-based theoretical analysis is conducted, confirming that the closed-loop system is finite-time stable and both tracking and estimation errors remain bounded in the presence of uncertainties. Numerical simulations are carried out to evaluate the effectiveness of the proposed FTC–NN scheme. Simulation results demonstrate clear superiority over conventional FTC and sliding mode control (SMC) methods in terms of tracking accuracy, convergence time, and disturbance rejection capability.

16:00-18:00 Session 5A: SOICT Technical Session IX: AI Applications, AI Foundations and Big Data
Location: Ballroom A
16:00
FedEABoost: A Client Entropy Adaptive Boosting Framework for Federated Learning

ABSTRACT. Federated Learning (FL) enables distributed training of machine learning models across decentralized devices without sharing raw data, thus providing privacy-preserving solutions suitable for various applications. However, FL is significantly hindered by data heterogeneity across clients, known as the non-IID problem, leading to degraded model performance, slower convergence, and challenges in achieving fairness across client models. Existing methods typically struggle to adequately handle class imbalance, particularly when certain classes are underrepresented locally. In this paper, we propose FedEABoost, a novel FL framework integrating AdaBoost training principles to address the non-IID issue effectively. FedEABoost generates multiple local models in each client by iteratively emphasizing harder-to-classify samples, often from minority classes, and then employs an entropy-based selection mechanism to choose the most suitable local models for aggregation. Extensive evaluations conducted on Fashion-MNIST and CIFAR-10 datasets demonstrate that FedEABoost outperforms conventional FL baselines, achieving up to 25% improvement in global model accuracy across various non-IID scenarios. Our results highlight the robustness and effectiveness of FedEABoost in mitigating the impact of data heterogeneity, promoting stable convergence, and enhancing generalization.

16:20
Entropy-Based Gradient Weighting and Batch-Size Adaptation for Virtual Data-Parallel Training

ABSTRACT. Distributed deep learning commonly relies on uniform gradient averaging and fixed per-node batch sizes. These choices ignore that model confidence and mini-batch difficulty vary across workers and over training, which can dilute informative updates and produce unstable progress. We provide a simple control signal derived from predictive uncertainty. For each worker we compute entropy- and margin-based measures from its predictions, normalize them to obtain a weight, and use this weight in two ways: (i) to reweight gradients during aggregation, and (ii) to adjust the next-epoch batch size per worker so that computation is allocated where it is most informative. The method is optimizer-agnostic, communication-compatible, and easy to integrate into existing code. Experiments on CIFAR-10 with ResNet-18, VGG-16, MobileNet-V2, and a lightweight CNN show that our approach consistently improves the stability of training, yields higher area under the accuracy curve (AUC), and reaches target accuracy faster or on par with baselines. These results indicate that predictive uncertainty is an effective control signal for both gradient aggregation and resource allocation in distributed training.
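The per-worker control signal described above is simple to reconstruct in outline. The sketch below is an illustrative reading of the abstract, not the authors' code: the function names, the use of entropy alone (omitting the margin term), and the choice to give higher-entropy workers larger weights are all our assumptions.

```python
import math

def entropy_weights(prob_batches, eps=1e-12):
    # Mean predictive entropy per worker, normalized so the weights sum to 1.
    # Here higher-entropy (harder) workers receive larger weights; the paper
    # may combine this with margin-based measures and a different direction.
    ents = []
    for batch in prob_batches:
        per_sample = [-sum(p * math.log(p + eps) for p in probs) for probs in batch]
        ents.append(sum(per_sample) / len(per_sample))
    total = sum(ents)
    return [e / total for e in ents]

def weighted_aggregate(grads, weights):
    # Weighted average of per-worker gradient vectors, replacing uniform averaging.
    dim = len(grads[0])
    return [sum(w * g[i] for w, g in zip(weights, grads)) for i in range(dim)]
```

The same normalized weights can then scale each worker's next-epoch batch size, allocating more computation to workers whose mini-batches are currently more informative.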

16:40
AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control

ABSTRACT. Training Large Language Models (LLMs) is highly memory-intensive, largely due to optimizer state overhead. The FRUGAL framework addresses this with gradient splitting, combining a stateful optimizer on a small subspace and a memoryless one on the remainder. However, FRUGAL relies on two static hyperparameters—the subspace ratio (ρ) and update frequency (T)—which may be suboptimal across training phases. We present AdaFRUGAL, an adaptive extension that introduces (i) a linear decay schedule for ρ to progressively reduce memory usage, and (ii) a loss-aware adaptive schedule for T to lower computational overhead as convergence slows. Experiments on large-scale English (C4), Vietnamese (VietVault), and GLUE fine-tuning show that AdaFRUGAL maintains competitive perplexity and downstream performance compared to AdamW and static FRUGAL, while significantly reducing GPU memory and training time. AdaFRUGAL offers a practical, autonomous solution for efficient LLM training in resource-constrained environments.

17:00
Part-GNN: A partitioning-based graph neural network for memory-efficient large-scale data classification

ABSTRACT. Graph Neural Networks (GNNs) are a subset of deep learning models that have shown great promise in various forms of classification problems, including image detection and classification as well as predicting labels for nodes or edges. They are designed to operate on graphs and rely on graph-structured data, which may include node and/or edge attributes. Several GNN variants have been developed, such as GCN, Cluster-GCN, GraphSAGE, and FastGCN. However, GNNs also have high memory requirements, largely due to the large adjacency matrix, which becomes computationally expensive for large-scale graphs, as well as long training times. In this work, we describe a new technique for graph partitioning in order to reduce the effective size of a graph while still achieving as high a predictive F1-score as possible. Additionally, we combine the graph partitioning method with layer-wise training to achieve greater computational efficiency.

17:20
Enhancing Survey Efficiency: A Validated Vietnamese Short-Form of the MBTI Developed Through Machine Learning

ABSTRACT. In academic and career counseling, validated personality tools like the MBTI are crucial, but standard versions with over 90 questions are time-consuming and difficult to implement at scale, while short forms with fewer items often sacrifice psychometric integrity by providing insufficient coverage of the four personality axes, thereby reducing reliability and classification accuracy. Compounding this, the direct application of international instruments in Vietnam risks cultural and linguistic inaccuracies, necessitating a tool that is not only parsimonious but also culturally resonant. This study addresses this gap by developing and validating an optimized MBTI-based questionnaire for Vietnamese youth. The methodology involved a rigorous language and context conversion process to ensure conceptual and psychometric equivalence, validated through qualitative evaluation by language, psychological, and cultural experts. Subsequently, a feature extraction process utilizing a Random Forest model with Recursive Feature Elimination with Cross-Validation (RFECV) was employed to identify and select items with the highest classification power. The optimized instrument was then administered to a sample of approximately 1800 participants for quantitative validation. The results were positive, with the instrument demonstrating acceptable to excellent psychometric properties and a valid factor structure confirmed by Confirmatory Factor Analysis, indicating a good model fit. The study successfully produced a culturally resonant and psychometrically robust personality assessment tool suitable for the target audience.

17:40
CITADEL: A Web-Based Faculty Performance Evaluation and Decision-Support System for Higher Education Institutions

ABSTRACT. Faculty Performance Evaluation (FPE) in higher education often depends on PDFs and spreadsheets, producing delayed consolidation, non-transparent assessments, and weak decision support. This paper presents a university-wide platform that unifies configurable Performance Indicators (PIs), distributed scoring with endorsements and governed overrides, real-time dashboards and reports, and period archiving with provenance. The architecture adopts an evidence-first, live-consolidation model in which each PI serves as both intake form and data source, enabling permission-scoped views from criterion to PI to evidence. Related work shows gains in automation but persistent fragmentation and limited analytics; the design addresses these gaps via modular services (PI management, scoring, workflow, reporting, audit), role-based access control (RBAC) with attribute-based access control (ABAC), and immutable audit events. An expert appraisal with n = 15 IT professionals used a guided walkthrough and an ISO/IEC 25010 survey. All constructs—Functional Suitability, Performance Efficiency, Usability, Reliability, Security, Maintainability, and Portability—were above neutral (95% CI lower bounds ≥ 5.05) with acceptable to excellent internal consistency. Findings indicate early fitness for purpose and support mid-cycle decision making.

18:00
AUF iAssist: A Web-Based Helpdesk System for Efficient Support and Concern Resolution

ABSTRACT. This study presents AUF iAssist, a centralized, web-based helpdesk for Angeles University Foundation that consolidates fragmented channels into a unified platform integrating ticket workflows, a knowledge-driven FAQ chatbot, AI-powered solution recommendations, role-based dashboards, and automated notifications. Built on a secure client–server architecture with modular services, the system enhances traceability, scalability, and policy compliance. Using a structured walkthrough with 15 IT professionals (purposive sampling), we evaluated AUF iAssist against ISO/IEC 25010. Results show strong reliability across Functional Suitability, Performance Efficiency, Security, and Reliability (Cronbach's α > .90), with Usability highest (M = 5.29). AUF iAssist reduces administrative workload, improves response times, and increases user satisfaction, aligning with SDG 4 and SDG 9. We outline targeted refinements in usability, reliability, and portability for broader deployment.

16:00-18:00 Session 5B: SOICT Technical Session X: AI Applications
Location: Ballroom B
16:00
GRACE: A Knowledge Graph–Enhanced Conversational Recommendation System via Retrieval-Augmented Generation

ABSTRACT. We present GRACE (Graph-Reasoning Augmented Conversational Engine), a multi-stage conversational recommender system that integrates Large Language Models (LLMs) without costly fine-tuning by using a Retrieval-Augmented Generation (RAG) framework grounded in a knowledge graph. Unlike prior CRS approaches that rely on either shallow KG lookups or expensive fine-tuning, GRACE introduces a novel hybrid retriever that fuses three complementary strategies: (i) semantic similarity over plot embeddings, (ii) schema-aligned content filtering with LLM-extracted genres, and (iii) collaborative filtering via graph expansion along creator relationships. GRACE, evaluated on the ReDial and INSPIRED datasets, effectively addresses the performance–cost trade-off inherent in the existing methods. Without requiring any training, the framework achieves 0.062 Recall@1 and 0.302 Recall@10 on ReDial, as well as 0.116 Recall@1 and 0.307 Recall@10 on INSPIRED, and consistently outperforms traditional CRS models, zero-/few-shot LLM baselines, and state-of-the-art knowledge-enhanced approaches. These results demonstrate that tightly coupling LLM reasoning with KG grounding yields a practical, scalable foundation for high-performance conversational recommendation. Our code is publicly available at: https://github.com/DatPhan06/GRACE.

16:20
Effectiveness of Rolling-Sum Preprocessing in River Mouth Water Depth Prediction Using Machine Learning

ABSTRACT. Accurate prediction of river-mouth water levels is critical for flood preparedness in Japan’s steep catchments, where short response times heighten climate-related risks. Conventional hydrologic and statistical approaches demand dense input data and extensive calibration, while machine-learning models trained solely on raw station records often overlook prior flow conditions and deliver weak forecasts. This study tests a physically informed preprocessing method—rolling sums of upstream water depths—within the Niyodo River system. Daily observations from 14 upstream stations were used to compare models fed with raw data against those using rolling-sum features, spanning linear (Lasso, ElasticNet, Ridge) and ensemble (CatBoost, Extra Trees, LightGBM, XGBoost) algorithms under a chronological split. Incorporating rolling sums substantially increased predictive accuracy, raising the coefficient of determination from below 0.25 to around 0.65 and cutting root-mean-square error by more than 10 cm. Feature-importance analyses identified main-stem stations as key predictors, aligning with hydrological expectations. These results show that lightweight, domain-guided preprocessing can significantly enhance water-level forecasting in flood-prone basins.
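The rolling-sum feature at the heart of this preprocessing is straightforward to compute. A minimal sketch follows; the window length and the partial-window handling at the start of the series are our assumptions, not details from the paper.

```python
def rolling_sum(series, window):
    # Rolling sum of the most recent `window` observations, inclusive of the
    # current one; positions before a full window use a partial sum.
    out, acc = [], 0.0
    for i, value in enumerate(series):
        acc += value
        if i >= window:
            acc -= series[i - window]
        out.append(acc)
    return out

# e.g. a 2-day rolling sum of daily depths:
# rolling_sum([1, 2, 3, 4], 2) -> [1.0, 3.0, 5.0, 7.0]
```

In the study's setting, such a feature would be computed per upstream station and fed to the regressors alongside (or in place of) the raw daily depth records, encoding antecedent flow conditions.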

16:40
Enhance Sequential Recommendation via Linear Recurrent Units

ABSTRACT. User interactions on online platforms naturally form sequences, making sequential recommendation essential for capturing users' preferences. Self-attention-based models have recently driven progress in this area, but their high computational cost and latency hinder real-time use. In contrast, lightweight models are efficient but often overlook fine-grained item order, which reduces accuracy. To address this gap, we propose PLRec, a sequential recommendation model based on efficient linear recurrent architectures combined with a dual-task learning framework. By extending linear recurrent units with a recursive parallelization strategy, PLRec enables incremental updates while preserving sequential order information. The dual-task design incorporates category prediction as an auxiliary objective, which provides complementary supervision and strengthens preference modeling. Extensive experiments on multiple real-world datasets show that PLRec achieves state-of-the-art accuracy with significantly better efficiency on long user sequences, highlighting its potential for real-time and scalable personalized recommendation. Our code is publicly available at this site.

17:00
Aspect-Based Sentiment Analysis for Stock Price Movement Prediction

ABSTRACT. Forecasting short-term stock price movements remains a central challenge in financial prediction, particularly in emerging markets where volatility and data asymmetry complicate traditional approaches. This paper presents an integrated framework that combines Aspect-Based Sentiment Analysis (ABSA), fundamental indicators, and technical signals to improve market forecasting accuracy and interpretability. Using GPT-4o, we extract fine-grained sentiment on 33 finance-specific aspects from over 16,000 Vietnamese financial news articles and corporate disclosures, then merge these structured signals with technical indicators and firm-level fundamentals for model training. Experiments on 30 VN30 equities (2019–2024) show that gradient boosting models (notably XGBoost) achieve superior performance, with AUC up to 0.67, surpassing technical- and fundamental-only baselines. SHAP analyses highlight that ABSA-derived features enhance predictive stability, especially during neutral market conditions. Backtesting further demonstrates that probability-thresholded trading strategies deliver annualized returns exceeding both buy-and-hold and deep learning baselines. These findings underscore the value of disentangling what the market discusses from how it is discussed, and illustrate how multi-source fusion with modern NLP can provide scalable and explainable decision support for data-driven trading in emerging markets.

17:20
Tokenization in Protein Language Models: Methods, Taxonomy, and Applications

ABSTRACT. The application of natural language processing (NLP) to biological sequences is reshaping computational biology, particularly protein analysis. At the center of this paradigm lies tokenization: the process of segmenting protein sequences into discrete units for language models. Unlike human language, protein sequences lack explicit delimiters, making tokenization both a critical and challenging design choice, with direct consequences for downstream performance. In this survey, we propose a systematic taxonomy of protein sequence tokenization methods and analyze their trade-offs in terms of biological interpretability, computational efficiency, and predictive performance. We also review evaluation metrics, summarize applications across a range of protein analysis tasks, and examine the interpretability of learned protein tokens. Our findings show that no single tokenization strategy dominates across all contexts; rather, the optimal choice depends on the biological objectives and computational constraints at hand.

16:00-18:00 Session 5C: SOICT Technical Session XI: Applied Operations Research and Optimization
Location: Yersin A
16:00
Non-Parametric Feature Combination For Explainable Credit Scoring

ABSTRACT. Credit scoring has become a cornerstone of modern financial risk assessment, enabling lenders to evaluate borrowers' creditworthiness with unprecedented precision. By transforming complex borrower data into standardized numerical scores, credit scoring predicts the likelihood of loan repayment, thereby reshaping lending practices across diverse industries. Recently, machine learning models have attracted scholars and stakeholders as their accuracy surpasses that of traditional models based on logistic regression and decision trees, which are simple, explainable, and adaptable. However, machine learning models lack interpretability, which creates a barrier to their real-world adoption. In traditional model development, feature selection is widely utilized to eliminate low-discriminative features, while feature combination, which creates new features by integrating existing ones to capture latent relationships, remains underexplored in the credit scoring field. This paper proposes a novel non-parametric approach to feature combination, designed to maximize the use of available features' information. The experimental results on benchmark datasets demonstrate the potential of this approach in enhancing the accuracy of credit scoring models, as the model using the combined features achieves the highest accuracy among the experimental models across all the datasets.

16:20
Deterministic one-pass streaming algorithm for non-monotone DR-submodular maximization under a size constraint

ABSTRACT. Submodular optimization on the integer lattice has recently attracted significant attention due to its ability to capture problems involving multiple occurrences of elements, with diverse applications in influence maximization, budget allocation, and data summarization. In this paper, we study the problem of maximizing a non-monotone DR-submodular function under a size constraint in a streaming setting, denoted as $\DrSMC$. We propose the first deterministic one-pass streaming algorithm for this problem, achieving a theoretical approximation guarantee of $\frac{1}{6}-\epsilon$ with query complexity $O\left(\tfrac{n \log^2 K}{\epsilon}\right)$ and space complexity $O\left(\tfrac{K \log K}{\epsilon}\right)$. Extensive experiments on the Revenue Maximization benchmarks demonstrate that our proposed methods consistently outperform existing baselines in terms of solution quality, query efficiency, and memory usage. These results establish the practicality and robustness of our approach for large-scale streaming applications.

16:40
DESW: Reducing Concentration in Proof-of-Stake with Dynamic Exponential Stake Weighting

ABSTRACT. In permissionless blockchains, Proof-of-Stake (PoS) selects validators in proportion to staked assets, securing the ledger by aligning incentives. However, this proportionality amplifies advantages for large validators through rewards, delegation, and liquid staking, leading to stake concentration and threatening decentralization. To address this issue, we introduce the Dynamic Exponential Stake Weighting (DESW) model, grounded in the principle of weighted probability distribution and equilibrium theory. DESW defines an adaptive validator selection rule that rebalances stake weights according to network-wide inequality, measured by the Gini coefficient. By employing weights in the value domain, DESW reduces the influence of oversized validators as centralization increases, without altering rewards, penalties, or protocol flow. Thus, DESW directly addresses the “rich-get-richer” dynamic in PoS and enhances decentralization. We evaluate DESW across three scenarios that are representative of public PoS networks: (i) stable distributions, (ii) large high-turnover systems, and (iii) adversarial stake injections. Relative to baseline PoS, DESW reduces the Gini coefficient from 0.623 to 0.318 (-48.9%), from 0.876 to 0.501 (-42.8%), and from 0.945 to 0.462 (-52.1%), respectively. Simultaneously, it increases the Nakamoto coefficient from 134 to 303 (+126.1%), from 384 to 2138 (+456.8%), and from 9 to 196 (+2,077.8%), respectively. When applied to Ethereum’s validator set, DESW would raise the Nakamoto coefficient from 3 to 391 (about 130×) and lower Gini from 0.99 to 0.10. These results indicate that DESW strengthens decentralization and fairness while preserving PoS efficiency, offering a practical drop-in improvement for networks.
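The Gini coefficient that drives DESW's adaptation is a standard statistic; a minimal sketch is below. The `selection_weights` rule is a hypothetical illustration of "dampening proportionality as inequality rises", not the paper's exact exponential weighting.

```python
def gini(stakes):
    # Gini coefficient via the sorted-rank formula (0 = perfectly equal,
    # approaching 1 = fully concentrated).
    xs = sorted(stakes)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

def selection_weights(stakes, g):
    # Hypothetical dampening: as measured inequality g grows, the effective
    # exponent on stake shrinks below 1, flattening selection probabilities
    # without touching rewards or penalties.
    alpha = max(0.0, 1.0 - g)
    raw = [s ** alpha for s in stakes]
    total = sum(raw)
    return [r / total for r in raw]
```

At g = 0 this reduces to ordinary stake-proportional selection, and at high g it approaches uniform selection, which is the qualitative behavior the abstract attributes to DESW.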

17:00
Balancing Efficiency and Fairness in the Integrated Truck–Drone Dispatching Problem with Dynamic Endurance via Pareto Front Grid Guided Multi-objective Optimization

ABSTRACT. The coordination of trucks and drones in logistics systems offers new opportunities for cost-effective and sustainable operations while enhancing customer satisfaction, yet significant challenges arise from heterogeneous vehicle capabilities and the payload-dependent endurance of drones. This paper introduces the Integrated Truck–Drone Dispatching Problem with Dynamic Endurance (ITDDPDE), a novel multi-objective formulation that jointly schedules a fleet of trucks and drones, capturing two realistic features of drone operations: (i) dynamic endurance that depends on the payload carried, and (ii) flexible launch and retrieval points along truck routes. The problem simultaneously optimizes three objectives: minimizing total operational cost, maximizing fairness across customers, and maximizing fairness among vehicles. To tackle ITDDPDE, we propose a multi-objective evolutionary algorithm based on Pareto front grid decomposition, incorporating a two-level encoding and a heuristic mechanism to determine drone launch and landing points. Numerical experiments on diverse benchmark scenarios demonstrate that the proposed algorithm significantly outperforms state-of-the-art multi-objective algorithms in terms of Hypervolume and Inverted Generational Distance.
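The payload-dependent endurance constraint can be pictured with a small sketch; the linear endurance curve and all parameter values below are illustrative assumptions, not the paper's model:

```python
def drone_endurance(payload_kg, max_endurance_min=30.0, max_payload_kg=5.0,
                    min_endurance_min=10.0):
    """Hypothetical linear payload-endurance model: flight time shrinks from
    max_endurance at zero payload to min_endurance at full payload.
    ITDDPDE may use a different (e.g., nonlinear) energy curve."""
    if not 0.0 <= payload_kg <= max_payload_kg:
        raise ValueError("payload outside drone capacity")
    frac = payload_kg / max_payload_kg
    return max_endurance_min - frac * (max_endurance_min - min_endurance_min)

def feasible_sortie(payload_kg, flight_time_min):
    """A drone sortie is feasible only if its flight time fits within the
    payload-dependent endurance budget."""
    return flight_time_min <= drone_endurance(payload_kg)

print(drone_endurance(0.0))         # 30.0 min empty
print(drone_endurance(5.0))         # 10.0 min fully loaded
print(feasible_sortie(2.5, 18.0))   # True: 18 min fits the 20 min budget
```

A feasibility check of this kind is what makes launch/retrieval point selection along the truck route nontrivial: heavier deliveries force shorter detours.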

17:20
Budgeted Object Detection via Online Submodular Approximation Algorithm

ABSTRACT. We propose two online algorithms, $\ST$ and $\IST$, for object detection via a reduction to the Submodular Subset Selection problem under a budget constraint. $\ST$ provides a baseline framework, while $\IST$ improves it with an approximation ratio of $1/2$ using only $5n$ queries and $\mathcal{O}(n)$ time. Our methods combine attribution techniques with online approximation to handle streaming inputs efficiently, achieving near-Greedy quality with significant runtime and memory savings. Theoretical guarantees and analysis confirm their practicality for large-scale or real-time scenarios.
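The abstract does not detail the $\ST$ and $\IST$ algorithms themselves; as background, here is a sketch of the classic offline cost-greedy baseline for budgeted monotone submodular maximization that such online algorithms approximate in a single streaming pass (the coverage objective and all item names are made up for illustration):

```python
def greedy_budgeted(items, cost, budget, value):
    """Offline greedy baseline: repeatedly add the affordable item with the
    best marginal-gain-per-cost ratio until nothing fits the budget.
    `value` must be monotone submodular on lists of items."""
    chosen, spent = [], 0.0
    remaining = sorted(items)          # deterministic iteration order
    while True:
        base = value(chosen)
        best, best_ratio = None, 0.0
        for it in remaining:
            if spent + cost[it] > budget:
                continue               # item no longer affordable
            gain = value(chosen + [it]) - base
            if gain / cost[it] > best_ratio:
                best, best_ratio = it, gain / cost[it]
        if best is None:
            return chosen
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)

# A coverage objective (monotone submodular): distinct elements covered.
regions = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5}}
cover = lambda S: len(set().union(*(regions[i] for i in S))) if S else 0
picked = greedy_budgeted(regions, {"a": 2, "b": 1, "c": 1, "d": 4},
                         budget=3, value=cover)
print(picked, cover(picked))  # ['b', 'a'] covering 4 elements within budget 3
```

This plain ratio greedy re-evaluates marginal gains on every round, which is what the streaming setting forbids; per the abstract, $\IST$ reaches a 1/2 approximation with only $5n$ value queries.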

16:00-18:00 Session 5D: SOICT Technical Session XII: Networking and Communication Technologies, Software Engineering
Location: Yersin B
16:00
A Lightweight and Robust Framework for Waveform Classification Using Dynamic Warping and State-Space Models

ABSTRACT. Accurate and efficient waveform classification is a critical challenge in dense radio frequency environments. While deep learning offers solutions, many models struggle to balance high performance with computational constraints. This paper introduces the Warping State Space Model (WarpSSM), a lightweight and robust framework designed to address this trade-off. Our approach processes spectrograms using two novel components: the Dynamic Adaptive Warp block to mitigate channel-induced geometric distortions, and the Visual State Space Block with Directional Selective Fusion to capture long-range dependencies. Evaluated on a synthetic dataset of 12 waveforms under realistic channel degradation, WarpSSM achieves a state-of-the-art average accuracy of 90.61%. It demonstrates exceptional efficiency, with an inference latency of 0.55 ms from a model of only 42.2K parameters.

16:20
Channel-Aware Power and Rate Control for UOWC with DRL and HARQ Integration

ABSTRACT. Underwater optical wireless communication (UOWC) has difficulties in fulfilling ultra-reliable low-latency communication (URLLC) standards owing to channel distortions, including absorption, scattering, and oceanic turbulence. This paper presents a deep reinforcement learning (DRL) approach, based on proximal policy optimization (PPO), that concurrently adjusts transmit power and coding rate in a point-to-point UOWC system employing hybrid automatic repeat request (HARQ) protocols: chase combining (CC-HARQ) and incremental redundancy (IR-HARQ). The controller exploits statistical channel information and signal-to-noise ratio feedback, and the problem is structured as a Markov decision process (MDP) with rewards that penalize power consumption and delay violations. By reducing the long-term average power while adhering to stringent delay constraints (e.g., 99.9% reliability at 13 dBm in pristine marine conditions), the approach enables energy-efficient and dependable UOWC for Beyond 5G (B5G) and 6G applications, such as ocean monitoring.
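The reward structure described above can be sketched as follows; the linear power charge, the penalty weights, and the delay budget are all illustrative assumptions rather than the paper's exact reward:

```python
def uowc_reward(tx_power_dbm, delay_ms, delivered,
                delay_budget_ms=10.0, power_weight=0.05, delay_penalty=5.0):
    """Hypothetical MDP reward for joint power/rate control: reward a
    successful HARQ delivery, charge for transmit power, and heavily
    penalize delay-budget violations. Weights are made up for illustration."""
    reward = 1.0 if delivered else 0.0
    reward -= power_weight * tx_power_dbm      # energy cost of transmission
    if delay_ms > delay_budget_ms:
        reward -= delay_penalty                # URLLC delay violation
    return reward

print(uowc_reward(13.0, 8.0, True))    # in-budget delivery at moderate power
print(uowc_reward(13.0, 12.0, True))   # delay violation dominates the reward
print(uowc_reward(20.0, 8.0, False))   # wasted high-power transmission
```

A PPO agent maximizing the expected discounted sum of such rewards is pushed toward the lowest power and coding rate that still keep the delay-violation term rare, which matches the long-term average-power objective in the abstract.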

16:40
Threshold-based AP Filtering and Distance Measure Analysis for K-means Clustering in WiFi Fingerprinting-based Indoor Localization System

ABSTRACT. WiFi Fingerprinting is widely used in indoor localization due to its cost-effectiveness. However, scalability remains a challenging problem. For large WiFi fingerprint datasets, clustering reduces the computational burden by limiting the search space during matching. K-means is a commonly used clustering algorithm valued for its speed and simplicity. This paper examines the effectiveness of threshold-based access point (AP) filtering, which aims to choose valuable APs, and of six different distance measures for k-means clustering, which can optimize the clustering process. To the best of our knowledge, this is the first work to evaluate various distance measures for k-means clustering in WiFi Fingerprinting-based localization systems. We conduct experiments on five public datasets with three AP filtering strategies. The experimental results show that AP filtering significantly affects localization performance. Using the Sørensen distance measure, compared to the original K-Nearest Neighbors method, k-means shows less than a 2% increase in average localization error while achieving over a 98% reduction in average computation time. The analysis shows the scalability potential of the k-means algorithm and its effectiveness in improving the WiFi Fingerprinting technique.
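The Sørensen (Bray–Curtis) distance and its use for cluster assignment can be sketched as follows; the centroid values are hypothetical, and the sketch assumes non-negative feature vectors (e.g., RSSI readings shifted to positive values):

```python
def sorensen(x, y):
    """Sørensen (Bray–Curtis) distance between two non-negative feature
    vectors: sum of absolute differences over sum of coordinate sums."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def assign_cluster(fingerprint, centroids, dist=sorensen):
    """Assign a fingerprint to the nearest k-means centroid under the given
    distance measure; matching then searches only that cluster, shrinking
    the KNN search space."""
    return min(range(len(centroids)), key=lambda k: dist(fingerprint, centroids[k]))

centroids = [[60, 20, 5], [10, 55, 40]]          # hypothetical cluster centers
print(assign_cluster([58, 22, 7], centroids))    # -> 0
print(assign_cluster([12, 50, 38], centroids))   # -> 1
```

Because the denominator normalizes by signal magnitude, Sørensen distance lies in [0, 1] for non-negative inputs, which can make it less sensitive to absolute RSSI offsets than plain Euclidean distance.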

17:00
A Bounded Model Checking Approach for Verifying OSEK/VDX Applications

ABSTRACT. We propose a bounded model checking technique for verifying applications running on the OSEK/VDX operating system (OS). Our approach encodes task behaviors into logical formulas by exploring the control-flow graphs of the application's tasks, following the scheduling policy of the OSEK/VDX OS and its API function calls, and unrolling the program up to a given bound. The resulting formula is then solved with a SAT/SMT solver to detect potential bugs in the application (within the specified bound). The method is implemented as an extension of the CBMC model checker. Several experiments were conducted to demonstrate the accuracy and performance of our method.
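The essence of bounded model checking is unrolling the transition relation up to a fixed depth and asking whether a bad state is reachable. A real BMC tool like CBMC encodes the unrolling as a SAT/SMT formula; the toy below enumerates states explicitly instead, and the two-task scheduler example is entirely made up for illustration:

```python
def bmc(init_states, step, bad, bound):
    """Toy bounded model checking: explore all states reachable from the
    initial set within `bound` transition steps and return the first depth
    at which a bad state appears (None if safe within the bound)."""
    frontier = set(init_states)
    seen = set(frontier)
    for depth in range(bound + 1):
        if any(bad(s) for s in frontier):
            return depth
        nxt = set()
        for s in frontier:
            for t in step(s):
                if t not in seen:
                    seen.add(t)
                    nxt.add(t)
        frontier = nxt
    return None

# Hypothetical two-task system: state = (running task, counter).
# The scheduler may switch tasks at any step, and task B increments the counter.
def step(state):
    task, c = state
    yield ("A" if task == "B" else "B", c)   # scheduler switches the running task
    if task == "B":
        yield (task, c + 1)                  # task B's local action

print(bmc({("A", 0)}, step, lambda s: s[1] >= 3, bound=10))  # -> 4 (bug found)
print(bmc({("A", 0)}, step, lambda s: s[1] >= 3, bound=2))   # -> None (bound too small)
```

The second call shows the defining limitation of the approach: verification results hold only up to the chosen bound, which is why the abstract qualifies bug detection as "within the specified bound".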

17:20
UAV-Based Target Terminal Search System for Emergency Rescue

ABSTRACT. It is very difficult to obtain accurate location information for a target terminal owned by a person in distress. In particular, if the target terminal is a legacy mobile phone or is located in a GPS (Global Positioning System)-denied area, generally only the cell in which the terminal resides can be identified. In that case, the search area is so large that the person in distress may not be found within the golden time.

In this paper, a novel search system is presented to find the accurate location of a target terminal owned by a person in distress. In the proposed system, a UAV (Unmanned Aerial Vehicle) is equipped with SME (Signal Measurement Equipment), which measures the signal transmitted from the target terminal. The UAV navigates and changes its direction based on the AoA (Angle of Arrival) of the measured signal. To evaluate the performance of the proposed system, a simulator was built. The simulation results show that accurate location information can be obtained within the golden time in most cases.
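The AoA-driven navigation loop can be pictured with a simple proportional steering law; the gain value and the control law itself are illustrative assumptions, not the paper's guidance algorithm:

```python
def update_heading(current_heading_deg, aoa_deg, gain=0.5):
    """Hypothetical proportional steering law: turn the UAV toward the
    measured angle of arrival of the terminal's signal. `gain` in (0, 1]
    controls how aggressively the heading tracks the AoA; the modular
    arithmetic keeps the heading error in (-180, 180] so the UAV always
    takes the shorter turn."""
    error = (aoa_deg - current_heading_deg + 180.0) % 360.0 - 180.0
    return (current_heading_deg + gain * error) % 360.0

# Repeated AoA measurements steer the UAV onto the target's bearing.
heading = 0.0
for _ in range(8):
    heading = update_heading(heading, aoa_deg=90.0)
print(round(heading, 2))  # close to 90.0 after a few updates
```

The wrap-around handling matters near north: from a heading of 350° toward an AoA of 10°, the error is +20°, so the UAV turns right through 0° rather than sweeping 340° the long way around.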