ACIIDS2026: 18TH ASIAN CONFERENCE ON INTELLIGENT INFORMATION AND DATABASE SYSTEMS
PROGRAM FOR TUESDAY, APRIL 14TH

09:00-10:00 Session 7A: Data Manipulation, Processing and Integration
09:00
A Semantic Approach for Legal Norm Representation with Explicit Scope and Deontic Modality

ABSTRACT. The formal representation of legal norms remains a major challenge for semantic legal information systems, particularly with respect to the explicit modeling of the scope of applicability and deontic modality. Existing approaches to legal knowledge representation, including ontology-based standards such as OWL, LKIF-Core, and LegalRuleML, provide valuable conceptual foundations but lack a uniform and explicit treatment of normative scope across subject, object, spatial, temporal, and situational dimensions. As a result, reconstructing normative structures from legal texts and applying them to concrete legal situations remains difficult and error-prone.

This paper introduces a semantic metamodel for legal norm representation that explicitly captures normative scope and deontic modality as first-class semantic constructs. The proposed metamodel defines a set of formal categories for normative elements, including conditions of applicability, scope constraints, normative effects, and deontic operators, while maintaining interoperability with external domain ontologies. By separating normative structure from domain-specific knowledge, the model enables consistent interpretation and reuse of legal norms across different application contexts.

09:20
FLAT: A Method for Accurate Evaluation of Semantic Preservation in Conceptual Model Transformations

ABSTRACT. This paper introduces a method for evaluating the semantic fidelity of transformations between conceptual metamodels. Using a semi-formal system, we define the FLAT method for quantifying the enforcement, loss, alteration, and preservation (translation) of meaning through formal metrics. The method is validated by evaluating mappings from an Association-Oriented Metamodel (AOM) to UML, EER, and ORM, demonstrating its effectiveness in identifying semantic discrepancies.

09:40
Secure medical text classification through decentralized Federated Learning using Ensemble Learning

ABSTRACT. Medical text classification is crucial in healthcare but the complexity of medical terminology, the issue of polysemy, confidentiality of patient data and the need for an accurate and robust classification model pose significant challenges. This paper introduces a novel approach designed to overcome these obstacles. We propose an approach that combines advanced text representation techniques, ensemble learning with deep learning models and federated learning to address these problems. Initially, our approach combines advanced language models like GPT and BERT, enabling a thorough analysis of complex medical texts. Then, to further improve the classification performance, we implemented an ensemble learning strategy, leveraging the strengths of multiple deep learning models. Crucially, to ensure data privacy, we incorporated a federated learning framework. This allows multiple hospitals to collaboratively train ensemble learning models without sharing sensitive patient data, thus adhering to data protection regulations. Our comprehensive experiments on a medical text dataset demonstrate that our proposed approach significantly outperforms existing methodologies across key performance metrics, highlighting the effectiveness of our integrated framework in addressing the challenges of medical text classification, particularly in ensuring the security and privacy of patient data.

09:00-10:00 Session 7B: Towards an Understanding of Artificial Intelligence: Bridging Theory, Explainability, and Practical Applications
Location: BANQUET HALL
09:00
Shapley Pruning for Neural Network Compression

ABSTRACT. Neural network pruning is a rich field with a variety of approaches. In this work, we propose to connect existing pruning concepts such as leave-one-out pruning and oracle pruning and to develop them into a more general Shapley value-based framework that targets the compression of convolutional neural networks. To make the Shapley value practical to use, this work presents Shapley value approximations and performs a comparative analysis of their cost-benefit utility for neural network compression. The proposed ranks are evaluated against a new benchmark, Oracle rank, constructed from oracle sets. Broad experiments show that the proposed normative ranking and its approximations deliver practical results, achieving state-of-the-art network compression.
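
A minimal sketch of the permutation-sampling approximation commonly used to estimate Shapley values for channel importance; the evaluate callback and the sampling budget are illustrative assumptions, not the paper's exact procedure.

    import random

    # Monte Carlo Shapley estimate: average each channel's marginal contribution
    # over random orderings. `evaluate(active)` is a hypothetical callback that
    # returns validation accuracy with only the channels in `active` kept.
    def shapley_channel_scores(n_channels, evaluate, n_permutations=50):
        phi = [0.0] * n_channels
        for _ in range(n_permutations):
            order = random.sample(range(n_channels), n_channels)  # random permutation
            active = set()
            prev = evaluate(active)
            for c in order:
                active.add(c)
                curr = evaluate(active)
                phi[c] += (curr - prev) / n_permutations  # marginal contribution
                prev = curr
        return phi  # prune the channels with the lowest scores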

09:20
Explaining LSTM Battery RUL Prediction via Temporal Attribution

ABSTRACT. Accurate prediction of battery Remaining Useful Life (RUL) is essential for the safe operation of industrial systems, such as Autonomous Guided Vehicles (AGVs). Although deep learning has advanced in RUL prediction, these models often operate as black boxes, limiting operator trust, regulatory compliance, and root-cause analysis. Prior research provides overall feature attributions for RUL prediction but lacks temporal attribution, making it difficult to identify which historical cycles or operating intervals influence the predicted failure time. This study introduces the first application of TimeSHAP, a temporal extension of SHAP values for recurrent neural networks, to battery RUL prediction. The proposed methodology offers three levels of explanation: event-level attribution to identify influential past charge/discharge cycles, feature-level attribution to monitor the evolution of sensor importance (voltage, temperature, current) toward end-of-life, and cell-level attribution to pinpoint specific measurements at specific cycles. Additionally, sequence pruning determines the minimum historical window required for accurate forecasting. Experiments on the NASA PCoE battery dataset show that the Attention-LSTM model achieves an RMSE of 0.0599 in normalized RUL space (corresponding to approximately 7.8 cycles given the test battery’s 130-cycle lifespan), while TimeSHAP demonstrates that 16.9% of historical cycles can be pruned while maintaining predictions within 5% relative error. Temporal features (discharge time, time-to-voltage thresholds) and voltage curve characteristics dominate importance throughout the lifecycle, whereas temperature-derivative features maintain moderate, stable importance.
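
A sketch of the underlying event-level attribution idea, using simple occlusion rather than the TimeSHAP API itself; the model interface, the sequence shape, and the background value are assumptions.

    import numpy as np

    # Replace each past cycle with a background value and measure how much the
    # predicted RUL shifts. `model.predict` (Keras-style) and `sequence` of
    # shape (cycles, features) are hypothetical.
    def event_attribution(model, sequence, background=0.0):
        base = float(np.squeeze(model.predict(sequence[np.newaxis])))
        scores = np.zeros(len(sequence))
        for t in range(len(sequence)):
            perturbed = sequence.copy()
            perturbed[t, :] = background  # mask cycle t
            scores[t] = base - float(np.squeeze(model.predict(perturbed[np.newaxis])))
        return scores  # large |score| marks an influential historical cycle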

09:40
An Adaptive Resampling Hyperparameter Prediction Framework for Improved Classification on Imbalanced Datasets based on Self-Feature Analysis
PRESENTER: Wei-Tsung Su

ABSTRACT. Imbalanced datasets often lead to biased classifiers and suboptimal generalization performance. To address this challenge, this study proposes a two-stage resampling hyperparameter prediction framework that automatically determines near-optimal resampling configurations based on dataset characteristics. In the training stage, a large collection of imbalanced datasets is compiled, from which self-features are extracted. Pre-sampling and classification are performed across a predefined hyperparameter space, and the resulting performance metrics—accuracy, sensitivity, and specificity—are used to train a Random Forest regression model capable of predicting continuous resampling hyperparameters. In the deployment stage, self-features from a new dataset are fed into the trained model to estimate suitable resampling parameters, which are subsequently used to balance the dataset and train the final classifier. Experimental results demonstrate that the proposed framework effectively reduces manual tuning effort while achieving competitive or improved classification performance compared to traditional resampling approaches. This work provides a practical and scalable solution for automated handling of imbalanced datasets.
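
A hedged sketch of the two-stage idea: dataset self-features are mapped to a resampling hyperparameter by a Random Forest regressor. The choice of meta-features and the use of a SMOTE ratio as the predicted parameter are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from imblearn.over_sampling import SMOTE

    def self_features(X, y):
        # a few simple dataset descriptors (binary labels assumed)
        return [X.shape[0], X.shape[1], float(np.mean(y)), float(X.std())]

    # Training stage (sketch): F = rows of self_features over many datasets,
    # r = best resampling ratio found by search; reg = RandomForestRegressor().fit(F, r)

    def balance_new_dataset(reg, X, y):
        # Deployment stage: predict a ratio, then resample with it.
        ratio = float(np.clip(reg.predict([self_features(X, y)])[0], 0.1, 1.0))
        return SMOTE(sampling_strategy=ratio).fit_resample(X, y)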

09:00-10:00 Session 7C: Advances in Natural Language, Speech, and Multimodal Processing
Location: UNICORN HALL
09:00
Data-driven model transformation of Resource-Constrained Problems in Production and Logistics
PRESENTER: Paweł Sitek

ABSTRACT. Resource-constrained decision and optimization problems in production and logistics are characterized by combinatorial complexity, heterogeneous constraints, and dynamic environments. Traditional declarative models offer expressive representations of such problems, but their direct formulations often result in intractable models. Building upon the research on hybrid declarative approaches, this paper presents a data-driven model transformation approach that systematically restructures resource-constrained problems before solution. This approach combines presolving techniques with data-driven analysis of problem structures to detect redundancies, reformulate constraints, and generate solver-efficient models. The transformation process can be applied to representative classes of production scheduling and logistics problems, where experiments confirm significant improvements in computational efficiency and scalability, while preserving solution accuracy. The paper proposes an original production optimization model with a data-driven model transformation approach for a real crayon factory. The results extend earlier work on model-driven problem solving by demonstrating the effectiveness of data-driven transformation strategies.

09:20
TagFill: Leveraging LLMs for Privacy-Preserving Administrative Form Filling via Semantic Tagging
PRESENTER: Tien Huy Nguyen

ABSTRACT. With the rapid progress of digital transformation, the manual completion of administrative forms remains inefficient and error prone, creating significant challenges for public service delivery. Large Language Models (LLMs) offer strong potential for structured text generation, yet their adoption in this domain is hindered by concerns over privacy and reliability. To address these issues, we introduce the TagFill System, which automatically replaces blank fields in Vietnamese administrative forms with semantically meaningful tags (e.g., [user1_full_name]) rather than real values, thereby preserving user privacy. Our approach combines in-context learning with a post-processing pipeline designed to enhance reliability and reduce common LLM errors. We further construct both synthetic and real-world datasets to facilitate comprehensive evaluation. Experimental results with Gemini-2.0-Flash and GPT-4o-mini show that nearly 70% of fields can be correctly tagged, with over 95% accuracy on predefined fields. These findings demonstrate the feasibility of applying LLMs to privacy-preserving form automation and lay the groundwork for future integration with value mapping in public service workflows.

09:40
Early-Stage Integration of Attention Mechanisms in OmniPose for Human Pose Estimation
PRESENTER: Khac Anh Phu

ABSTRACT. Human pose estimation (HPE) is one of the most high-profile topics in the computer vision community. Benchmark evaluation nevertheless remains difficult and challenging due to large variation in poses, the number of interacting persons, and frequent occlusions. Recently, attention modules such as SE, CBAM, and ELA have shown great potential for improving feature extraction, particularly for HPE models and more broadly for computer vision tasks. This paper studies the effects of integrating these attention modules at the early stages of feature extraction in the OmniPose model. Three variants of OmniPose, incorporating SEBlock, CBAM, and ELA respectively, are described and trained under identical conditions to allow a fair comparison. SE yields the best overall accuracy, ELA balances accuracy and training efficiency at larger batch sizes, while CBAM provides improved stability but sacrifices final accuracy. These results underscore that the effectiveness of attention mechanisms varies largely with where they are positioned within a model's architecture, rather than with how they are designed internally. This work lays a foundation for more appropriately placed attention mechanisms and opens up their usage across disparate HPE models at various integration levels.
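
For reference, a minimal SE (squeeze-and-excitation) block in PyTorch, the first of the three attention modules compared; the reduction factor is a common default, not necessarily the paper's setting.

    import torch.nn as nn

    class SEBlock(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())

        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation
            return x * w                                  # channel-wise reweighting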

10:00-10:20 Coffee Break
10:20-12:00 Session 8: Keynote session III
Location: BANQUET HALL
10:20
Composing Music by Evolutionary Computation

ABSTRACT. Composing music is a creative yet challenging task, in that the process requires careful consideration of various musical elements such as pitch, rhythm, chords, timbre, form, and accompaniment. Recently, algorithmic composition using computational intelligence techniques has received significant attention. In particular, evolutionary computation is widely used in the creation of music and has shown promising results. This talk will introduce the designs and applications of evolutionary computation for music composition. Furthermore, I will present our recent work and discuss the research challenges and future directions.

11:05
Towards organically working AI

ABSTRACT. Starting from the philosophical inspirations from Western countries, Central Europe, and the Far East, I'd like to go through the current situation in politics, business, science, and education, considering AI. What we do, what we don't do, and what we should think about in the near future. This is, of course, a personal view on the matter; however, it might be a good start for the discussion and give an insight into the current trends in technology and science, going beyond all that we imagined before 2017, when we did not know that all we lacked was attention.

12:00-13:20 Lunch Break
13:00-14:00 Session 9A: Intelligent Data Processing and Human-Centered AI Applications
Location: Online room 1
13:00
On Discovering High Utility Core Co-Location Patterns From Spatial Datasets
PRESENTER: Vanha Tran

ABSTRACT. High-utility co-location patterns (HUCPs) can effectively reveal valuable knowledge about spatial features and instances hidden in spatial datasets. However, current HUCPs do not consider user reference features. For example, users may be concerned with which facilities within 500 meters of a school form a HUCP, but not with the utility of the school itself; the school's utility should therefore not be counted when calculating the utility value of the pattern. To address the shortcomings of traditional HUCPs, this work proposes mining high-utility core co-location patterns (HUCCPs), which treat user reference features as core features. The interestingness measure of HUCCPs still does not satisfy the downward closure property: if generate-and-test candidate-based methods are used, the candidate search space becomes large, resulting in low mining efficiency. This paper therefore adopts a clique-query-based mining method to improve mining efficiency. First, maximal cliques are enumerated from the neighboring-instance graph, and then the maximal cliques are combined into a compact hash structure. The keys of this hash serve as the initial candidates; only those candidates that contain core features are considered. The utility value of a candidate is calculated by querying the hash value, thereby improving mining efficiency. Extensive experiments were conducted on the real Yelp dataset, and the results demonstrate the effectiveness and efficiency of the proposed method.
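
A sketch of the clique-query step under stated assumptions: maximal cliques come from networkx, nodes are instance objects with a feature attribute, and the compact hash keyed by feature sets follows the abstract's description.

    import networkx as nx
    from collections import defaultdict

    def build_clique_hash(neighbor_graph):
        table = defaultdict(list)
        for clique in nx.find_cliques(neighbor_graph):        # maximal cliques
            key = frozenset(inst.feature for inst in clique)  # feature set as key
            table[key].append(clique)
        # initial candidates = keys; keep only those containing a core feature
        return table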

13:20
Global Classification from Heterogeneous Sources Using Neighborhood Rough Decision Trees. A Comparative Study of Fusion Strategies

ABSTRACT. The paper introduces a novel two-stage hierarchical framework for global classification from heterogeneous sources. In the first stage, local decision trees based on the CART algorithm generate probabilistic predictions for a shared validation set. These predictions are then fused using two alternative strategies: a concatenation model, which preserves source-level distinctions, and an averaging model, which emphasizes aggregation. In the second stage, a Neighborhood Rough Decision Tree (NRDT) serves as the global classifier. Unlike conventional decision trees, NRDT leverages rough set theory and adaptive neighborhood relations to handle continuous, probabilistic features without discretization, producing simpler and more interpretable rules. Experiments conducted on dispersed versions of three UCI datasets - Vehicle Silhouettes, Soybean (Large), and Lymphography - demonstrate that both fusion strategies achieve competitive performance across accuracy, F-measure, and balanced metrics, with no statistically significant difference between them. The proposed approach offers a robust, interpretable, and privacy-preserving solution for synthesizing knowledge in fragmented data environments.

13:40
Adapting Emotional Expressiveness in Text-to-Speech for Low-Resource Languages
PRESENTER: Luong Ho

ABSTRACT. Emotional text-to-speech (TTS) has advanced rapidly in recent years, yet building such systems for languages without emotional data remains a fundamental challenge. The scarcity of annotated emotional corpora makes it difficult to model expressive prosody and transfer emotional nuances, particularly in low-resource languages. To address this problem, we propose a cross-lingual emotional TTS framework that transfers emotional expressiveness from a source language with rich emotional resources to a target language with scarce emotional data. Our approach builds on a VITS-based architecture and introduces three key innovations. First, we condition the model with emotion embeddings and zero-shot speaker embeddings, enabling expressive and speaker-specific synthesis across languages. Second, we adopt a dual-tokenizer design that maintains distinct phoneme vocabularies for the source and target languages, thereby reducing accent leakage and improving pronunciation robustness in cross-lingual adaptation. Finally, to enhance emotion controllability, we incorporate an auxiliary emotion classification objective. Experimental results show that our method synthesizes emotionally expressive speech in the target language despite the absence of target-language emotional data, with naturalness and emotional clarity comparable to systems trained on fully emotional corpora. This work demonstrates a practical and effective pathway for developing emotional TTS in low-resource settings.

13:00-14:00 Session 9B: Advanced NLP and Cognitive Architectures for Ethical and Multi-Agent AI Systems
Location: Online room 2
13:00
Advancing Arabic Named Entity Recognition in Invoices with YOLOv11, BERT, Active Learning, and LLM Integration
PRESENTER: Hassen Mahdhaoui

ABSTRACT. In the fields of Computer Vision and Natural Language Processing (NLP), there has been significant progress in automating tasks such as text region detection and Named Entity Recognition (NER). This research integrates active learning with deep learning models to enhance the efficiency of these tasks for Arabic invoice documents. For text region detection, we employed YOLOv11, the latest iteration of the YOLO object detection model, while AraBERT, a transformer-based model pre-trained on large Arabic corpora, was used for NER. By incorporating active learning, we minimized the annotation burden by selecting the most informative data points for manual labeling. In the text region detection task, the model achieved a mean Average Precision (mAP) of 93.42%, and AraBERT achieved an F1 score of 89.36% for detecting key entities such as persons, organizations, locations, dates, and prices. The results demonstrate that active learning can significantly reduce the amount of labeled data required while maintaining high performance, showcasing its potential for real-world applications in Arabic document processing. This approach opens avenues for more efficient and scalable document analysis, reducing the need for extensive human annotation.

13:20
Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring
PRESENTER: Jakub Masłowski

ABSTRACT. Large Language Models (LLMs) are increasingly used as autonomous agents in complex reasoning tasks, opening a niche for dialectical interactions. However, unconstrained multi-agent systems systematically undergo semantic drift and logical deterioration, and thus can hardly be used for ethical tutoring, where a precise answer is required. Current simulations often degenerate into dialectical stagnation, with agents lapsing into recursive concurrence or circular arguments. A critical challenge remains: how to enforce doctrinal fidelity without suppressing the generative flexibility required for dialectical reasoning? To address this gap, we contribute the Heterogeneous Debate Engine (HDE), a cognitive architecture that combines Identity-Grounded Retrieval-Augmented Generation (ID-RAG) for doctrinal fidelity with a Heuristic Theory of Mind for strategic opponent modeling. Our evaluation shows that architectural heterogeneity is a crucial variable for stability: contrary doctrinal initializations (e.g., Deontology vs. Utilitarianism) increased students' Argument Complexity Scores by an order of magnitude over baselines. These findings validate ID-RAG and Heuristic ToM as architectural requirements for maintaining high-fidelity (adversarial) pedagogy.

13:40
Multi-Agent Dialectical Refinement for Enhanced Argument Classification
PRESENTER: Jakub Bąba

ABSTRACT. Argument Mining (AM) is a foundational technology for automated writing evaluation, yet traditional supervised approaches rely heavily on expensive, domain-specific fine-tuning. While Large Language Models (LLMs) offer a training-free alternative, they often struggle with structural ambiguity, failing to distinguish between similar components like Claims and Premises. Furthermore, single-agent self-correction mechanisms often suffer from sycophancy, where the model reinforces its own initial errors rather than critically evaluating them. We introduce MAD-ACC (Multi-Agent Debate for Argument Component Classification), a framework that leverages dialectical refinement to resolve classification uncertainty. MAD-ACC utilizes a Proponent-Opponent-Judge model where agents defend conflicting interpretations of ambiguous text, exposing logical nuances that single-agent models miss. Evaluation on the UKP Student Essays corpus demonstrates that MAD-ACC achieves a Macro F1 score of 85.7%, significantly outperforming single-agent reasoning baselines, without requiring domain-specific training. Additionally, unlike "black-box" classifiers, MAD-ACC's dialectical approach offers a transparent and explainable alternative by generating human-readable debate transcripts that explain the reasoning behind decisions.

13:20-14:20 Session 10A: Computer Vision and Intelligent Systems
13:20
Improving Skin Cancer Diagnosis via Deep Metric Learning, Center-Based Down Sampling, and Test-Time Augmentation

ABSTRACT. Skin cancer is one of the most common cancers worldwide, with melanoma being the subtype associated with the highest mortality rate, even though it is highly curable when detected early, which makes reliable automated diagnosis especially important. Severe class imbalance, large intra-class variation, and strong visual similarity across cancer categories limit the effectiveness of deep learning models, often causing even advanced architectures to degrade on rare or ambiguous classes. To overcome these difficulties, we propose a unified end-to-end approach that increases feature discriminability, improves sample representativeness, and stabilizes inference. At its core, the approach uses Deep Metric Learning, exploiting Center Loss to encourage compact intra-class structure and more distinct inter-class boundaries. We also present Center-Based Down Sampling to choose representative samples close to class centers and minimize redundancy, and Test-Time Augmentation to enhance prediction stability via multi-view inference. These components consistently provide complementary improvements, as demonstrated by extensive evaluations across various training configurations, ablation settings, and comparisons with recent state-of-the-art architectures. The complete method achieves 91.16% precision, 88.37% recall, and an 89.68% F1-score. Overall, combining Deep Metric Learning, Center-Based Down Sampling, and Test-Time Augmentation yields more compact embeddings and more accurate predictions across a variety of cancer categories. Our method also achieves competitive performance relative to existing systems such as ConvNeXt-ST-AFF, EFAM-Net, and Enhanced-MobileNet while delivering more reliable per-class behavior.
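
A minimal Center Loss sketch in PyTorch, the component credited with encouraging compact intra-class structure; the feature dimension and the weighting constant are assumptions.

    import torch
    import torch.nn as nn

    class CenterLoss(nn.Module):
        def __init__(self, num_classes, feat_dim):
            super().__init__()
            # one learnable center per class, trained jointly with the network
            self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

        def forward(self, features, labels):
            diff = features - self.centers[labels]  # distance to own class center
            return 0.5 * (diff ** 2).sum(dim=1).mean()

    # typical use: total_loss = ce_loss + lambda_c * center_loss(feat, y),
    # with lambda_c around 0.01 (an assumed, commonly used weighting)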

13:40
Integrating Large Language Models with Symbolic Reasoning for Knowledge Graph–Driven Legal Query System
PRESENTER: Hien Nguyen

ABSTRACT. The need to search and exploit legal information is constantly increasing, requiring more efficient, accurate, and accessible query support systems. The key to building such systems is the ability to organize legal knowledge in a comprehensive and structured way, while understanding and accurately interpreting users' questions. This study proposes a method for integrating Large Language Models into the process of building knowledge graphs to represent, standardize, and organize legal information. It automates the extraction and structuring of data from legal documents, while combining the language understanding and generation power of LLMs with symbolic reasoning over graphs to enhance querying, inference, and the generation of natural responses for users. Experimental results show that the solution has significant potential for developing intelligent legal question-and-answer systems, contributing to improved knowledge retrieval quality and user experience.

14:00
A parallel U-Net concept for image recontextualization
PRESENTER: Maciej Zięba

ABSTRACT. Image recontextualization involves transforming an image from its original setting to a new context while preserving the details of the subject, including broad applications in media, marketing, and e-commerce. Recent advances in deep generative modeling, particularly diffusion-based approaches, have significantly improved recontextualization techniques. However, existing methods such as IP-Adapter and Magic Clothing struggle with balancing contextual integration and precise control over image generation, while HyperLoRA suffers from information loss due to CLIP-based encoding. On the other hand, fine-tuning methods like Dreambooth or textual inversion require time-consuming adaptation for a single subject and struggle to preserve the details of the item. In this work, we introduce Parallel U-Net for Image Recontextualization (PUIR), a novel approach that leverages dual U-Net architectures, namely, one U-Net is dedicated to extracting features from the conditioning image and another U-Net allows generating the target image in a new context. By employing self-attention layers to fuse parallel features from both U-Nets, PUIR enhances detail preservation and improves image fidelity. Moreover, we develop a method for enabling multi-subject recontextualization while requiring only single-item training pairs. During the denoising, we propose to use a novel noise-mixing strategy together with a masking approach to generate the target image composed of many subjects. We demonstrate the effectiveness of PUIR in fashion applications. Our experiments show that PUIR outperforms HyperLoRA in maintaining item-specific details and overall image quality.

13:20-14:20 Session 10B: Advanced Data Mining Techniques and Applications
Location: BANQUET HALL
13:20
Voting Rule-Based Framework for Multi-Label Emotion Detection
PRESENTER: Minh Hieu Le

ABSTRACT. Emotion detection is an essential tool for gaining insights into user or customer feedback on products, whether through social media comments, product reviews, or news discussions. However, accurately identifying emotions in text is a challenging task due to the complexity and variability of human language. To address this challenge, machine learning and data analysis techniques are often employed. This paper presents an enhanced voting rule-based framework that systematically combines predictions from six transformer models through adaptive fusion strategies. By combining the outputs of multiple BERT models through hierarchical decision rules, this approach leverages the complementary strengths of diverse transformer architectures to enhance overall classification accuracy. The voting mechanism systematically reduces prediction errors and model-specific biases that individual models might introduce, ensuring that the final prediction is more reliable by reflecting the consensus of multiple models rather than relying on a single model. Experimental evaluation on the SemEval 2025 Task 11 dataset demonstrates that our proposal achieves a Macro F1 of 0.7462 and a Micro F1 of 0.7770, with consistent improvements over individual transformer models (Macro F1 up 1.22 percentage points over the best-performing DeBERTa) and substantial gains over conventional ensemble approaches (9.67 percentage points over majority voting). These results demonstrate that aggregating transformer models through our efficient voting mechanism with hierarchical decision rules is a highly effective strategy for improving emotion detection accuracy in natural language processing tasks, and is readily adaptable to other emotion detection methods.
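
An illustrative soft-voting fusion of per-model sigmoid outputs for multi-label prediction; the uniform weights and the 0.5 threshold are simplifying assumptions, not the paper's hierarchical decision rules.

    import numpy as np

    def fuse_predictions(probas, weights=None, threshold=0.5):
        # probas: list of (n_samples, n_labels) sigmoid outputs, one per model
        stacked = np.stack(probas)                         # (n_models, n, L)
        w = np.ones(len(stacked)) if weights is None else np.asarray(weights, float)
        avg = np.tensordot(w / w.sum(), stacked, axes=1)   # weighted soft vote
        return (avg >= threshold).astype(int)              # multi-label decisions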

13:40
Erasable Itemset Mining with Itemset-Range Constraints by Utilizing Bit Vectors and Data Shrinking
PRESENTER: Jyun Lin

ABSTRACT. Erasable itemset mining assists users in analyzing the production planning of factories. When a financial crisis occurs, manufacturing factories may stop the production lines of products with less profit to minimize the effect on overall profit. Discovering specific itemsets is one of the fundamental problems in data mining, and different types of constraints in erasable itemset mining are used to find erasable itemsets that better fulfill users' needs. In this paper, erasable itemset mining with itemset-range constraints is proposed. A conditional requirement on user-assigned subsets and supersets is used to avoid generating unrelated patterns. The proposed method adopts bit vectors together with data shrinking. The former encodes the material usage of products into bit vectors, accelerating the calculation of the gain of each itemset; the subset and superset constraints eliminate the generation of unnecessary candidates during mining. In addition, three data-shrinking strategies compact the size of the dataset to further speed up the mining operation. Experiments conducted on multiple datasets indicate that the proposed approach outperforms previous work in terms of running time and memory consumption, especially when the real datasets are enlarged in multiples, which simulates batch production of products.
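
A sketch of the bit-vector mechanism under stated assumptions: each material's vector marks the products that use it, and the gain of an itemset is the total profit of the products hit by OR-ing those vectors.

    def itemset_gain(itemset, bitvec, profits):
        # bitvec: dict material -> int bitmask over products
        # profits: list of per-product profits
        mask = 0
        for material in itemset:
            mask |= bitvec[material]   # products affected by erasing the itemset
        total, i = 0, 0
        while mask:
            if mask & 1:
                total += profits[i]
            mask >>= 1
            i += 1
        return total  # the itemset is erasable if total <= the user's threshold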

14:00
Federated Two-Phase High-Utility Mining
PRESENTER: Tzung-Pei Hong

ABSTRACT. In the era of big data, securely and effectively mining data in distributed environments has become a critical issue, and balancing data privacy against the utility of mining results is a significant challenge. Federated learning is a well-known framework that balances data privacy and utility in decentralized environments. This paper addresses the federated utility mining problem in such environments. We propose an effective horizontal federated utility mining algorithm, FTPHUIM, which transmits local candidate high-utility itemsets (also known as high transaction-weighted utility itemsets) rather than the original transaction data to the server, enhancing data privacy. The server aggregates these local candidates to generate global high-utility itemsets. By leveraging decentralized computing resources while preserving data privacy, the proposed algorithm significantly reduces computational and communication overheads, achieving remarkable efficiency gains. Numerical experiments demonstrate that the proposed FTPHUIM algorithm is more effective than previous work on real-world datasets.
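
A hedged sketch of the server-side aggregation in the horizontal setting: clients send candidate itemsets with local utilities rather than raw transactions, and the server sums them. Names and the report format are illustrative.

    from collections import defaultdict

    def aggregate_round(client_reports, min_util):
        # client_reports: list of dicts {frozenset(items): local utility}
        total = defaultdict(float)
        for report in client_reports:
            for itemset, util in report.items():
                total[itemset] += util   # horizontal partition: utilities add up
        return {s: u for s, u in total.items() if u >= min_util}  # global HUIs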

14:20-14:40 Coffee Break
14:40-15:40 Session 11: Poster session II
Location: Hallway
Efficient Logic Gate Networks for Video Copy Detection

ABSTRACT. Video copy detection requires robust similarity estimation under diverse visual distortions while operating at very large scale. Although deep neural networks achieve strong performance, their computational cost and descriptor size limit practical deployment in high-throughput systems. In this work, we propose a video copy detection framework based on differentiable Logic Gate Networks (LGNs), which replace conventional floating-point feature extractors with compact, logic-based representations. Our approach combines aggressive frame miniaturization, binary preprocessing, and a trainable LGN embedding model that learns both logical operations and interconnections. After training, the model can be discretized into a purely Boolean circuit, enabling extremely fast and memory-efficient inference. We systematically evaluate different similarity strategies, binarization schemes, and LGN architectures across multiple dataset folds and difficulty levels. Experimental results demonstrate that LGN-based models achieve competitive or superior accuracy and ranking performance compared to prior models, while producing descriptors several orders of magnitude smaller and delivering inference speeds exceeding 11k samples per second. These findings indicate that logic-based models offer a promising alternative for scalable and resource-efficient video copy detection.
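
For illustration, similarity between the compact binary descriptors a discretized LGN would emit can be computed with a Hamming score over packed bits; this is a generic sketch, not the paper's exact matcher.

    import numpy as np

    def hamming_similarity(a, b):
        # a, b: equal-length uint8 arrays of packed bits (e.g., np.packbits output)
        dist = int(np.unpackbits(np.bitwise_xor(a, b)).sum())
        return 1.0 - dist / (8 * a.size)   # 1.0 means identical descriptors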

Evaluation of Large Language Models in Difficult Spelling Error Detection in Polish Parliamentary Transcripts

ABSTRACT. This paper investigates the applicability of large language models (LLMs) to the task of automatic error detection and correction in the Polish Parliamentary Corpus. Building on extensive evaluation data collected during manual proofreading, the study focuses on a challenging error category: in-vocabulary words used in incorrect contexts, which are difficult to detect both for humans and automated systems. A curated dataset of 300 paragraphs, reflecting the real-world imbalance between erroneous and error-free texts, is used to evaluate eleven state-of-the-art proprietary and open-source LLMs under prompts with varying levels of constraints. Performance is assessed using relaxed and strict precision metrics, recall, and a detailed error-type taxonomy that emphasizes the cost of false positives. The results demonstrate that stronger prompt constraints significantly reduce overcorrection, with some mid-sized models outperforming larger ones by being more conservative. The study highlights the importance of cautious correction strategies and confirms the value of authentic error data for evaluating LLMs in corpus maintenance tasks.

Evaluating the Impact of External Perturbations on Human Gait Dynamics Using Entropy Measures

ABSTRACT. The primary objective of this study was to determine and compare multiscale entropy values for gait data representing sagittal plane joint angles of the lower limb and pelvis, recorded in the CAREN Extended environment under three conditions: in the presence of audio-visual disturbances, mechanical perturbations, and without disturbances. The research group consisted of 14 young women. The findings demonstrated that multiscale entropy measures effectively distinguish gait with mechanical perturbations from natural gait and gait with mild audio-visual disturbances. A statistical analysis of the computed results was performed, including an assessment of the effect size of the observed differences.

Onto-DP: Constructing neighborhoods for differential privacy on ontological databases
PRESENTER: Yasmine Hayder

ABSTRACT. In this paper, we investigate how attackers can discover sensitive information embedded within databases by exploiting inference rules. We demonstrate the inadequacy of existing state-of-the-art differential privacy (DP) models, when naively applied, in safeguarding against such attacks.

We introduce ontology-aware differential privacy (Onto-DP), a novel extension of differential privacy paradigms, built on top of any classical DP model by enriching it with semantic awareness. We show that this extension is a sufficient condition to adequately protect against attackers aware of inference rules.

MAP-Seg: Semi-Parametric Prototype Learning for Few-Shot Logging Detection in Aerial Imagery
PRESENTER: Pi Wei Chen

ABSTRACT. Few-shot semantic segmentation is a promising approach for agricultural monitoring scenarios where dense annotations are expensive, yet existing methods often rely on opaque encoder–decoder models or manual clustering pipelines, limiting reliability in real-world systems. We propose MAP-Seg, a prototype-based few-shot segmentation framework for rice lodging detection from aerial imagery, built upon a frozen self-supervised vision foundation model (DINO). MAP-Seg reformulates clustering as a differentiable, end-to-end learnable process through learnable prototypes and a cluster selector, eliminating the non-differentiability and manual intervention of traditional K-means–based workflows. Lightweight feature adaptation and multi-scale representations are further incorporated to address domain shift while maintaining parameter efficiency. By producing segmentation through a structured set of prototypes, MAP-Seg offers a more interpretable and controllable intermediate representation, which is desirable for trustworthy deployment in agricultural cyber-physical systems. Experiments on real-world aerial datasets demonstrate consistent improvements over conventional encoder–decoder methods under strict few-shot settings.

A Comparison of Feature Selection using SHAP-Values for Tree and Non-tree-based Models
PRESENTER: Harald Rietdijk

ABSTRACT. In healthcare, clinical trial datasets are often high-dimensional and have small sample sizes, increasing the risk of overfitting. Furthermore, research in this context usually focuses not only on the accuracy of the resulting models but also on their explainability. Feature selection can mitigate overfitting and provide valuable insights into relevant features. Using SHAP (SHapley Additive exPlanation) values has proved to be a useful tool to identify significant features. In this paper, we propose implementing Kernel SHAP in the shap-select module to extend its functionality to non-tree-based models. This is done by introducing a masker that generates background values that SHAP uses to simulate the removal of a feature. The performance is then evaluated on a small high-dimensional dataset, and the resulting feature selection is evaluated. The results show that, despite higher instability in the performance metrics, the method's capacity to identify significant features remains promising. These findings can serve as a starting point for further research into the relationship between feature characteristics and feature selection methods, as well as for improving the capacity to identify relevant features using SHAP values for feature selection.
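
A hedged sketch of masker-style Kernel SHAP on a non-tree model with the shap library; the fitted model and the data splits are assumptions, and the shap-select integration itself is the paper's contribution and is not reproduced here.

    import shap

    # X_train, X_val, and the fitted non-tree `model` are assumed to exist.
    background = shap.sample(X_train, 100)   # background values used to
                                             # simulate the removal of a feature
    explainer = shap.KernelExplainer(model.predict_proba, background)
    shap_values = explainer.shap_values(X_val)  # per-feature attributions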

PhishFusion: A Multimodal Phishing Website Detection Framework using Joint URL and JavaScript Features
PRESENTER: Duc Doan Manh

ABSTRACT. Phishing websites pose a significant cybersecurity threat, leveraging deceptive techniques to steal sensitive user information. Traditional detection methods often rely on single-modal approaches, such as URL-based heuristics or content-based analysis, which fail to generalize effectively against evolving phishing tactics. In this paper, we develop PhishFusion, a multimodal phishing detection framework that integrates diverse feature representations for improved accuracy and robustness. Our approach processes three complementary modalities: the URL itself, extracted URL-based features, and JavaScript codes present in webpages. By fusing these sources of information into various deep learning models such as transformers and convolutional neural networks, PhishFusion captures both lexical patterns and functional malicious behavior of phishing websites, thus improving the detection accuracy and robustness against recent developments of phishing tactics. We perform extensive experiments on benchmarking datasets collected from well-known anti-phishing platforms such as PhishTank and Phishing.Database. Experimental results show that our framework outperforms state-of-the-art methods in phishing detection, achieving higher precision and recall (up to 99.54%) while maintaining computational efficiency. Our findings highlight the importance of multimodal learning in combating phishing threats and suggest future directions for enhancing real-time phishing prevention systems.

Causal Interactions Among Climate, Prices, and Trading Volumes in Vegetable Price Forecasting

ABSTRACT. This study investigates the causal relationships between climate, vegetable prices, and trading volumes, as well as their impact on vegetable price forecasting. In the context of climate change, agricultural products are susceptible to climate impacts like typhoons and plum rain seasons, damaging farmer incomes. In the past, farmers relied mainly on experience for planting, leading to vegetable supply exceeding demand. In recent years, scholars have introduced technologies like big data and artificial intelligence for forecasting vegetable prices. However, although many predictive features like climate and vegetable production are considered to influence vegetable prices, the causal relationships between these features are often neglected. This study analyzes the predictive effects of features with and without causal relationships, and uses t-tests to examine their significant impacts on vegetable price forecasting. The ultimate goal is to improve the applicability and effectiveness of vegetable price forecasting.

A High-Readability and Image-Integrated QR Code Generation Method
PRESENTER: Jun-Sheng Lin

ABSTRACT. This research presents an aesthetic and readable QR Code beautification method that embeds image information into QR Codes in a structured manner, achieving natural visual fusion while preserving stable scanning performance. The proposed approach exploits the linear properties of Reed–Solomon (RS) codes to redistribute data and error-correction symbols. By constructing an RS extension matrix via Gauss–Jordan elimination, the control bits originally located at the QR Code center are remapped to peripheral modules, producing results that maintain both data robustness and visual alignment. To enhance visual quality, the QR Code is partitioned into central and peripheral regions. The central modules are rendered using circular elements with an 8-pixel diameter and are further refined through brightness and color adjustments, maximizing retention of the original image details. In contrast, the peripheral modules preserve the traditional black-and-white structure to ensure decoding stability. For improved recognizability, the system automatically adjusts brightness based on grayscale balance and the contribution of each RGB channel, applying weighted corrections when the available dynamic range becomes limited. Experimental results demonstrate that the proposed method significantly enhances the visual appeal and perceptual integration of QR Codes while maintaining a high scanning success rate. The overall design is compatible, reproducible, and provides a practical technical foundation for the generation and application of visually enriched QR Codes.

Behind PRISMA’s Success: Generative Artificial Intelligence Support in Systematic Reviews
PRESENTER: Dariusz Krol

ABSTRACT. This paper explores how generative artificial intelligence (GAI) can automate the selection process for systematic reviews in accordance with the PRISMA guidelines. A key challenge in this process is ensuring the accuracy of the output list. The findings suggest that GAI can be advantageous for both experts and non-experts. From an expert’s perspective, GAI enhances recall and enables the use of Scopus AI tools for automated classification or selection, thereby reducing the risk of missing relevant literature. Similarly, non-experts can significantly enhance their efficiency by integrating AI-based tools into their screening processes. This paper demonstrates a clear advantage in using Scopus AI, as it shows that expert decisions are more consistently aligned with the records retrieved through Scopus AI compared to those obtained from traditional Scopus searches. This conclusion reinforces the case for assisting and partially automating the PRISMA 2020 flow diagram phases.

Design and Implementation of an Intelligent Adaptive Path Selection Mechanism for SDN-Orchestrated SRv6 Networks
PRESENTER: Chu-Sing Yang

ABSTRACT. With the rapid advancement of networking technologies, low latency, high bandwidth, and reliable routing strategies have become critical requirements. IPv6 Segment Routing (SRv6) offers a flexible and programmable solution for traffic engineering, enabling optimized path selection based on real-time network conditions. However, traditional routing mechanisms often fail to achieve efficient resource management and rapid path switching in the presence of network failures or congestion. This study proposes a Probabilistic Routing Optimization Algorithm (PROA) designed for SRv6 networks under the Software-Defined Networking (SDN) architecture to enhance network resilience and adaptability. By dynamically adjusting routing strategies, PROA selects the optimal path based on real-time link conditions, improving fault tolerance, and reducing packet loss. Experimental results demonstrate that, compared to conventional intradomain routing protocols, PROA significantly improves fault recovery time, resource utilization, and overall network efficiency. These findings validate that the integration of SDN and SRv6 can effectively enhance the intelligence and adaptive routing capabilities of next-generation networks.

Body mass index prediction on the basis of gait data

ABSTRACT. A method for predicting the body mass index (BMI), as well as the weight, height, and age of a human on the basis of kinematic data collected during walking, is proposed and validated. It contains two main components. In the first stage, feature extraction from motion time sequences is carried out. The baseline sample entropy, as well as its composite multiscale extension, assessing the uncertainty of univariate time series, is selected for this purpose. As a result, gait descriptors containing entropy values for every pose attribute are obtained. Ultimately, the predicted parameters are determined using supervised techniques appropriate for predicting numerical values. Moreover, feature rankings are constructed, allowing us to assess the impact of selected anthropometric parameters on the movements of successive human body segments.

In the numerical experiments, highly precise marker-based motion capture measurements, containing data of 30 individuals, are utilized. They provide 3D rotation angles for human body segments. Two variants of the walking activity - overground and performed on a treadmill - with comfortable, slow and fast paces are taken into consideration.

The obtained results confirm relationships between the anthropometric parameters of a human and the way of walking. Their prediction is much more robust in comparison to the naive approach based only on the parameters' distributions; the relative improvement in the best case is almost 30%. Furthermore, the selected entropy measures provide a valuable and interpretable description of the gait data. In general, the influence of body mass and height on gait is most strongly noticeable in the lower and upper limb movements.
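
A minimal sample entropy sketch for a univariate series, following the standard definition with embedding dimension m and tolerance r under the Chebyshev distance; the parameter defaults are common choices, not the paper's exact settings.

    import numpy as np

    def sample_entropy(x, m=2, r=0.2):
        x = np.asarray(x, dtype=float)
        tol = r * x.std()
        n = len(x) - m                 # same template count for lengths m and m+1
        def matches(mm):
            t = np.array([x[i:i + mm] for i in range(n)])
            c = 0
            for i in range(n):
                d = np.max(np.abs(t - t[i]), axis=1)  # Chebyshev distance
                c += int(np.sum(d <= tol)) - 1        # exclude the self-match
            return c
        return -np.log(matches(m + 1) / matches(m))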

A New Model of Fuzzy Tournament Selection in Evolutionary Algorithms for Solving Multimodal Problems

ABSTRACT. This paper presents a novel tournament selection method in evolutionary algorithms for solving multimodal optimization problems. The method applies fuzzy logic to select individuals for the parent pool. The proposed approach improves the performance of evolutionary algorithms and directs the final solution toward specific regions of the Pareto front. Experiments on the Leading Ones Trailing Zeroes problem confirm the robustness of the algorithm, and the method can also be applied to other optimization tasks, such as determining optimal paths for mobile robots or vehicles. Compared to the standard evolutionary algorithm with tournament selection, the proposed algorithm requires less running time and fewer fitness function evaluations. The method is particularly effective for solving multimodal optimization problems in both minimization and maximization.

Fast High-Fidelity Latency Predictor for Homogeneous Neural Networks on ST Microcontrollers

ABSTRACT. Fast, accurate inference latency prediction is an important requirement for applying Neural Architecture Search (NAS) to resource-constrained microcontrollers. Conventional methods often rely on complex architectural features. This work explores a simpler, more direct approach. We systematically benchmarked thousands of neural network models on an ST microcontroller and discovered a highly predictable, piecewise linear relationship between a model's memory footprint and its on-device inference time. Our analysis identified a distinct performance breakpoint corresponding to the device's internal memory capacity, which defines two separate linear performance regimes. The resulting piecewise linear models demonstrated high fidelity, achieving R^2 scores of 0.999 for fully connected and 0.996 for convolutional architectures, with error percentages of 0.26% and 5.5%, respectively. We propose using this robust correlation as a lightweight yet highly accurate latency prediction proxy to guide the NAS process.
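
A minimal sketch of fitting the two linear regimes around a known memory breakpoint (e.g., the device's internal RAM size); mem and latency are assumed to be arrays of measured values.

    import numpy as np

    def fit_piecewise_latency(mem, latency, breakpoint):
        lo, hi = mem <= breakpoint, mem > breakpoint
        c_lo = np.polyfit(mem[lo], latency[lo], 1)  # slope/intercept below break
        c_hi = np.polyfit(mem[hi], latency[hi], 1)  # slope/intercept above break
        def predict(m):
            m = np.asarray(m, dtype=float)
            return np.where(m <= breakpoint, np.polyval(c_lo, m), np.polyval(c_hi, m))
        return predict  # lightweight latency proxy for the NAS loop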

HeritKAN: Robust and Ensemble Kolmogorov–Arnold Networks for Intangible Cultural Heritage Classification
PRESENTER: Thanh Ma

ABSTRACT. The classification of Intangible Cultural Heritage (ICH) imagery remains challenging due to subtle semantics, fine-grained inter-class boundaries, and culturally diverse representations. Although Kolmogorov–Arnold Networks (KANs) exhibit strong representational power, their practical use is limited by the computational overhead of B-spline activations and training instability from high variance. To overcome these issues, we propose HeritKAN, a computationally efficient KAN variant employing compactly supported Wendland C^2 kernels for sparse and accelerated inference. We further introduce BaggingHeritKAN, an ensemble model using bootstrap aggregation to reduce variance and improve generalization. In addition, we curate the Vietnamese ICH dataset with 10,143 images across 11 categories and develop a web-based recognition system for real-time heritage identification. Experiments show that HeritKAN improves efficiency while BaggingHeritKAN achieves the highest accuracy (96.25%), surpassing CNN (93.36%) and standard KAN (94.94%), providing a principled framework for cultural heritage analysis and digital preservation.
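
For reference, the compactly supported Wendland C^2 kernel (valid up to dimension 3) that replaces B-spline activations: phi(r) = (1 - r)^4 (4r + 1) for r in [0, 1] and 0 outside, which is what yields sparse, accelerated inference.

    import numpy as np

    def wendland_c2(r):
        r = np.abs(np.asarray(r, dtype=float))
        # zero outside the unit support -> sparse activations
        return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)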

Evaluation of DPO Configurations for LLM Alignment
PRESENTER: Jan Majkutewicz

ABSTRACT. Aligning large language models (LLMs) with human preferences is crucial for ensuring their helpfulness and safety. Direct Preference Optimization (DPO) has emerged as a popular method due to its simplicity and efficiency, but its practical aspects remain underexplored. We aim to address some of the gaps, examine key trade-offs, and provide empirical guidance. We align two open-weight LLMs, Zephyr 7B SFT and Llama 3.1 Tülu3 8B SFT, using DPO across four preference datasets and multiple β values that control deviation from the base model. We also compare dataset mixing against five model-merging methods under different weighting schemes. Our experiments show that lower β values often lead to the largest gains in helpfulness and safety, but the latter often comes with response over-refusal. Dataset mixing and model merging frequently underperform relative to alignment on a single AI feedback dataset, while linear merging with suitable weighting can outperform other strategies. We also demonstrate that the performance of DPO-aligned LLMs as implicit reward models on RewardBench correlates with their downstream helpfulness and safety. This correlation suggests that RewardBench can serve as a helpful indicator for alignment evaluation, potentially reducing the need for costly generative benchmarking. Overall, our results underscore the importance of dataset selection, hyperparameter tuning, and merging strategies, providing practical insights for balancing helpfulness, safety, and response refusal behavior in aligned models.
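
For context, the standard per-pair DPO objective that the β values control, as commonly implemented; the log-probabilities come from the policy and a frozen reference model, passed in as tensors.

    import torch.nn.functional as F

    def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        # beta scales how far the policy may drift from the reference model
        margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
        return -F.logsigmoid(beta * margin).mean()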

Enhancing ACCENT with Causality

ABSTRACT. Explainable AI (XAI) draws widespread attention, especially in recommendation systems. In this context, counterfactual explanations stand out as an efficient method for providing explanations with actionable insights into how the system makes decisions. A counterfactual explanation is a small set of input changes that produces a different output (e.g., by removing or modifying data points). Methods like ACCENT (Action-based Counterfactual Explanations for Neural Recommenders for Tangibility) have proven their ability to generate counterfactual explanations by removing user actions. However, these methods assume that user actions are independent of each other, ignoring the causal relationships between actions and leading to unrealistic counterfactual scenarios. In this work, we propose a new way to improve ACCENT with causality, by integrating causal inference into the process of generating counterfactual explanations. We build a causal data structure to maintain the logic and consistency of the user interaction history. Specifically, when a data point is removed, causally related points are also adjusted synchronously. Experimental results show that this improved method not only improves the causal percentage but also improves the CF set size, leading to more reliable and interpretable counterfactual explanations.

AutoViVQA: A Large-Scale, Automatically Constructed Dataset for Vietnamese Visual Question Answering
PRESENTER: Duc Phan Ba

ABSTRACT. Visual Question Answering (VQA) is a fundamental multimodal task that requires models to jointly understand visual and textual information. Early VQA systems relied heavily on language biases, motivating subsequent work to emphasize visual grounding and balanced datasets. With the success of large-scale pre-trained transformers for both text and vision domains, such as PhoBERT for Vietnamese language understanding and Vision Transformers (ViT) for image representation learning, multimodal fusion has achieved remarkable progress. For Vietnamese VQA, several datasets have been introduced to promote research in low-resource multimodal learning, including ViVQA, OpenViVQA, and the recently proposed ViTextVQA. These resources enable benchmarking of models that integrate linguistic and visual features in the Vietnamese context. Evaluation of VQA systems often employs automatic metrics originally designed for image captioning or machine translation, such as BLEU, METEOR, CIDEr, WUPS, and BERTScore. However, recent research suggests that large language models can further improve the alignment between automatic evaluation and human judgment in VQA tasks. In this work, we explore Vietnamese Visual Question Answering using transformer-based architectures, leveraging both textual and visual pre-training while systematically comparing automatic evaluation metrics under multilingual settings.

Leveraging LLMs to Support Malware Analysis from Structured and Semantic Binary Data
PRESENTER: Tran Tuan Anh

ABSTRACT. The rapid growth of advanced malware and sophisticated attack campaigns poses critical challenges to traditional detection and analysis methods, which often fail against obfuscation and evolving adversarial techniques. In this study, we propose a novel framework that leverages Large Language Models (LLMs) in combination with Retrieval Augmented Generation (RAG) and real-world Threat Intelligence (TI) sources to enhance binary malware analysis. Our system processes both static (PE features, strings, APIs) and dynamic (sandbox logs, behavioral traces) reports, automatically extracting and mapping Tactics, Techniques, and Procedures (TTPs) under the MITRE ATT&CK framework. To strengthen contextual accuracy, the framework employs a RAG-based knowledge retrieval pipeline together with external TI enrichment from platforms such as ThreatFox, and further supports adversary attribution through Advanced Persistent Threat (APT) group inference. We evaluate the approach using a ground-truth APT-labeled dataset with over 600 malware samples, measuring performance with Coverage Rate (CR) and Mean Reciprocal Rank (MRR). Results demonstrate the effectiveness of our model in producing explainable, evidence-backed TTP mappings and accurate APT group suggestions. This work highlights the potential of combining LLMs with enriched contextual reasoning to improve scalability, attribution, and operational utility in modern malware analysis.

Evaluating Deep Learning Method for Risk-Adjusted Portfolio Optimization: Evidence from the Warsaw Stock Exchange
PRESENTER: Marcin Hernes

ABSTRACT. The aim of the paper is to develop a method for portfolio construction that maximizes the return-to-risk ratio, based on deep learning. It includes separate stages for data preprocessing, deep learning parameter optimization, and backtesting. The method was verified using data from stocks listed on the Warsaw Stock Exchange (WSE). In this study, portfolio construction was guided by signals generated through the Triple Barrier Method with deep learning models trained on historical financial indicators.

Identifying distress in Vietnamese narratives: dataset construction and model benchmarking

ABSTRACT. Mental health concerns are becoming increasingly prevalent in modern society. Personal narrative texts—particularly online confessions—provide a rich and authentic source of self-expressed emotions, where individuals reveal their thoughts, experiences, and inner struggles. Such narratives offer valuable cues for identifying early signs of emotional distress, which often appear subtly in language before clinical symptoms become evident. This paper presents a study on the binary classification of emotional distress in Vietnamese personal narratives, employing a comparative evaluation of machine learning, deep learning, transformer-based pretrained language models and Large Language Models to detect distress from textual expressions. We construct a dataset called VNMH1550, comprising 1550 user-submitted confessions from the “Tâm sự” (Confessions) section of VnExpress, the most widely read Vietnamese online newspaper. Annotation guidelines were developed in collaboration with a clinical psychologist, and well-trained annotators labeled each text, achieving an agreement accuracy ratio of at least 0.8, highlighting the inherent complexity of the task and the Vietnamese language nuances. Experiments demonstrate that among traditional machine learning models, Random Forest achieved the highest performance with a macro-F1 of 69.66%, confirming the robustness of ensemble methods in capturing linguistic variability. Within deep learning approaches, the BiLSTM model obtained a macro-F1 of 65%, reflecting its capacity to model sequential dependencies in Vietnamese narrative texts. However, transformer-based models significantly outperformed all baselines. In particular, PhoBERT achieved the best results with an accuracy of 72.58% and a macro-F1 of 74.93%, highlighting its superior ability to detect emotional distress in Vietnamese personal narratives. Additionally, a zero-shot LLM baseline (ChatGPT, GPT-4) achieved an accuracy of 60.32% and a macro-F1 of 60.14%, demonstrating moderate alignment with human annotations.

15:40-16:00 Coffee Break
16:00-17:00 Session 12A: Computer Vision and Intelligent Systems
16:00
A Lightweight CNN-Transformer Hybrid Network for Efficient Cancer Detection Using Ultrasound Images
PRESENTER: Thanh-An Pham

ABSTRACT. Ultrasound is a practical, non-invasive clinical method and the first choice for screening and detecting many diseases in current medical examination and treatment, especially breast cancer and thyroid cancer. In this study, a novel model based on stacking attention blocks and depth-wise convolutional blocks is proposed. This hybrid deep learning architecture enables highly accurate detection of breast and thyroid cancer, outperforming state-of-the-art models. The model employs window attention and coordinate attention mechanisms together with large depth-wise convolution kernels to reduce the number of parameters and expand the receptive field. The depth-wise convolutional blocks extract local features, while the attention blocks capture long-range dependencies; their combination produces feature representations rich in both texture and contextual information, thereby improving detection performance. Two datasets, BUSI for breast cancer and TN5000 for thyroid cancer, are selected to experimentally evaluate the proposed model against state-of-the-art models. Our method achieves 96.13% accuracy on the BUSI dataset and 89.99% on the TN5000 set, outperforming the SOTA models. The proposed model has 14.5 million parameters and 5.02 GFLOPs, making it efficient in both parameter count and computational cost compared to state-of-the-art models.
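
As a minimal illustration of the local-feature pathway described above, the sketch below (PyTorch) shows a large-kernel depth-wise convolutional block; the block layout and kernel size are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class DepthwiseBlock(nn.Module):
        def __init__(self, channels: int, kernel_size: int = 7):
            super().__init__()
            # groups=channels makes the convolution depth-wise: one filter per
            # channel, which keeps the parameter count low even for large kernels.
            self.dw = nn.Conv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2, groups=channels)
            self.pw = nn.Conv2d(channels, channels, 1)  # point-wise channel mixing
            self.norm = nn.BatchNorm2d(channels)
            self.act = nn.GELU()

        def forward(self, x):
            return x + self.act(self.norm(self.pw(self.dw(x))))  # residual connection

    x = torch.randn(1, 64, 56, 56)
    print(DepthwiseBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])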

16:20
Asymmetric Hierarchical Fusion for Visual Grounding in SAR Images

ABSTRACT. Visual grounding on synthetic aperture radar (SAR) imagery aims to localize objects described by natural language expressions. However, existing SAR grounding studies remain limited to single-category scenarios and lack relational descriptions, preventing models from generalizing to realistic multi-object scenes. To address this issue, we introduce SAAVG, a new multi-class SAR visual grounding dataset built from multiple high-resolution SAR detection sources and enriched with automatically generated and manually verified referring expressions. Building upon the LQVG framework, we propose an Asymmetric Hierarchical Fusion mechanism that deepens the Vision–Language Interaction pathway through iterative cross-modal refinement, motivated by the dominant role of this pathway in visual grounding performance. We examine two variants, Shared-Weight and Per-Scale, to characterize their behavior across SAR and optical domains. Experiments on DIOR-RSVG, SARVG1.0, and SAAVG show that the Shared-Weight variant achieves state-of-the-art results on SARVG1.0 (Pr@0.5 92.01%, mIoU 83.39%), while the Per-Scale design performs better in optical imagery, revealing modality-dependent fusion dynamics. The proposed dataset and findings establish a useful benchmark and offer valuable insights into hierarchical cross-modal interaction for SAR visual grounding.

16:40
3W-DBSCAN-IS: A Three-Way Density-Based Clustering Framework for Uncertainty-Aware Medical Image Segmentation
PRESENTER: Jingtao Yao

ABSTRACT. Medical image segmentation often depends on expert-annotated data, which is costly and labor-intensive to obtain. Unsupervised clustering methods offer a scalable alternative. However, they struggle with uncertain anatomical boundaries and uneven intensity distributions, particularly in complex modalities such as MRI. This study presents 3W-DBSCAN-IS, a three-way decision-based extension of the DBSCAN algorithm that integrates spatial–color feature fusion with adaptive density scaling. Each pixel is assigned to one of three regions: accept, defer, or reject, capturing confidence levels within the segmentation. The defer region highlights ambiguous boundaries that are clinically significant. To the best of our knowledge, this is the first application of 3W-DBSCAN to brain MRI segmentation. Experimental results on synthetic and BraTS 2018/2020 datasets demonstrate improved boundary consistency, interpretable uncertainty visualization, and strong Dice, mIoU, and HD95 scores compared to other clustering methods. These results also indicate that 3W-DBSCAN-IS provides an effective unsupervised framework for reliable and interpretable medical image segmentation.
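
The three-way assignment can be pictured as two thresholds on a per-pixel density (or confidence) value; the sketch below uses hypothetical thresholds purely to illustrate the accept/defer/reject split, not the paper's adaptive density scaling.

    import numpy as np

    def three_way_regions(density, t_accept=0.7, t_reject=0.3):
        # accept: dense enough to belong to a cluster with confidence
        # reject: sparse enough to be treated as background/noise
        # defer:  the ambiguous band in between (uncertain boundary pixels)
        regions = np.full(density.shape, "defer", dtype=object)
        regions[density >= t_accept] = "accept"
        regions[density < t_reject] = "reject"
        return regions

    density = np.array([0.9, 0.5, 0.1, 0.75, 0.31])
    print(three_way_regions(density))  # ['accept' 'defer' 'reject' 'accept' 'defer']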

16:00-17:00 Session 12B: Advanced Data Mining Techniques and Applications
Location: BANQUET HALL
16:00
Enhancing Opening Range Breakout Strategies with LSTM-based True Range Prediction
PRESENTER: Jun-Yo Wu

ABSTRACT. This study investigates the potential of replacing the Average True Range (ATR) with a predictive True Range (TR) estimate to enhance the performance of the Opening Range Breakout (ORB) trading strategy. An initial oracle-based backtest demonstrates that access to future TR values significantly improves cumulative returns. Building on this insight, a Long Short-Term Memory (LSTM) neural network is employed to forecast next-day TR using a sliding window framework, focusing specifically on S&P 500 Index Futures (ES). The results indicate that incorporating the predicted TR into the ORB strategy substantially improves performance, achieving a cumulative return gain of approximately 70% relative to the ATR-based baseline. Forecasting accuracy is assessed using the Mean Absolute Percentage Error (MAPE), demonstrating consistent predictive capability. These findings provide empirical evidence that machine learning-based volatility forecasting can mitigate the lag inherent in traditional indicators, supporting more adaptive and data-driven intraday trading strategies.
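
The sliding-window setup can be sketched as follows: each training sample is a window of recent true-range values and the target is the next day's TR. The window length and the synthetic prices are assumptions for illustration; the paper's configuration may differ.

    import numpy as np

    def true_range(high, low, prev_close):
        # Wilder's true range: the largest of the day's range and the gaps
        # from the previous close.
        return np.maximum(high - low,
                          np.maximum(np.abs(high - prev_close),
                                     np.abs(low - prev_close)))

    def sliding_windows(tr, lookback=20):
        X = np.stack([tr[i:i + lookback] for i in range(len(tr) - lookback)])
        y = tr[lookback:]
        return X[..., None], y  # (n, lookback, 1) shape expected by an LSTM

    rng = np.random.default_rng(0)
    high = rng.random(300) + 1.0
    low = high - rng.random(300) * 0.1
    close = (high + low) / 2
    tr = true_range(high[1:], low[1:], close[:-1])
    X, y = sliding_windows(tr)
    print(X.shape, y.shape)  # (279, 20, 1) (279,)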

16:20
A Personalized News Recommendation Technique based on Generalized Additive Mixed Effect Model
PRESENTER: Chun-Hao Chen

ABSTRACT. With the rapid development of the Internet, people can browse and read a massive amount of information online. Although the information is diverse and abundant, the sheer volume, coupled with excessive duplication or similarity, can easily leave users feeling overwhelmed. In addition, users often spend a great deal of time searching for content they are genuinely interested in. Recommender systems were thus created to address this problem. The construction of a recommender system is not a unified, formulaic process; different application domains consider different issues. In a news recommender system, in addition to common issues such as "data filtering and extraction," "user interest drift," and "recommendation efficiency and quality," the timeliness of news is an important indicator of concern for both readers and operators. This means that real-time recommendation capability must be considered as an additional factor. Therefore, news recommender system models need to possess the ability to rapidly update streaming data. This research aims to develop a news recommendation technology that features news data filtering and extraction, personalized recommendations, and rapid progressive updates. Based on the generalized additive mixed effect model (GAME), the personalized news recommendation model, which comprises a fixed-effect model and many random-effect models, is designed in this paper. The recommendation results from these two types of models are then combined according to their weights to produce the final recommendation. Based on the model trained offline, incremental learning is incorporated to update news data and user viewing content in real time, resulting in a fast, progressive personalized news recommendation technology. Experiments were also conducted to show the performance of the proposed model.
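
A toy sketch of the weighted combination described above: a global fixed-effect score and a per-user random-effect score are blended into the final recommendation score. The weight below is hypothetical, purely to illustrate the mechanism.

    def combined_score(fixed_score: float, random_score: float, w: float = 0.6) -> float:
        # w weights the population-level (fixed-effect) prediction against the
        # user-specific (random-effect) prediction.
        return w * fixed_score + (1.0 - w) * random_score

    print(combined_score(0.72, 0.55))  # 0.652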

16:40
Mechanistic Traceability in Drug-Disease Association Discovery: A Deterministic Graph-RAG Approach
PRESENTER: Ujjwal Maulik

ABSTRACT. Retrieval-Augmented Generation (RAG) systems for biomedicine frequently produce plausible but unverifiable mechanistic claims, undermining their utility for safety-critical applications such as adverse event prediction, drug repurposing, and clinical decision support. The core pathology is generation-before-verification: systems that permit language models to propose mechanistic links without requiring explicit, structured evidence paths. We address this by inverting the pipeline: structural traceability precedes and constrains generation. Our deterministic Graph-RAG method validates drug-disease associations if and only if an explicit mechanistic path exists: Drug (D) to Target (T) to Phenotype (P), where edges are constructed from curated interaction databases (ChEMBL, OpenTargets) and validated via MedCPT-encoded retrieval over PubMed. We introduce a hybrid weighted scoring mechanism (0.8·Retrieval + 0.2·LLM) where the LLM acts as a "negation detector" to penalize factually incorrect high-similarity vectors below a strict acceptance threshold of 0.52. We evaluate against the Comparative Toxicogenomics Database (CTD) on 50 rare and infrequently prescribed drugs spanning oncology, psychiatry, and cardiology, measuring retrieval quality (Precision@K, Recall@K) and generation faithfulness via a defined claim-extraction protocol. Every emitted association carries an audit-ready provenance trace: concrete protein targets, stable database identifiers, and verifiable literature support.
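
The scoring rule itself is simple enough to state in a few lines; the sketch below reproduces the stated weights (0.8 retrieval, 0.2 LLM) and the 0.52 acceptance threshold, with illustrative function names.

    def hybrid_score(retrieval_sim: float, llm_score: float) -> float:
        # 0.8 * retrieval similarity + 0.2 * LLM "negation detector" score
        return 0.8 * retrieval_sim + 0.2 * llm_score

    def accept(retrieval_sim: float, llm_score: float, threshold: float = 0.52) -> bool:
        return hybrid_score(retrieval_sim, llm_score) >= threshold

    # A high-similarity passage that the LLM flags as negated drops below threshold:
    print(accept(0.60, 0.9))  # True  (score 0.66)
    print(accept(0.60, 0.0))  # False (score 0.48)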

16:00-17:00 Session 12C: Information Security and Cryptology
16:00
Private and Secure Decentralized Federated Learning for IoT Smart Healthcare via Blockchain

ABSTRACT. The Internet of Medical Things (IoMT) yields highly sensitive and non-Independent and Identically Distributed (non-IID) data. While blockchain-based decentralized Federated Learning (BFL) eliminates the single-point-of-failure of server-centric Federated Learning (FL), existing proposals remain poorly adapted to resource-constrained IoMT. In addition, their Byzantine filters (e.g., Multi-Krum) frequently misclassify updates under severe non-IID. We present PS-DFL, a blockchain-based framework that balances privacy, security, robustness, and efficiency. PS-DFL introduces a cluster-then-verification pipeline: masked client updates are first partitioned by K-means to respect non-IID structure, then Multi-Krum is applied within clusters for fine-grained vetting. Privacy is provided in two complementary layers: verification-time masking (to hide updates during vetting) and aggregation-time light Differential Privacy (DP). Each client performs l2-clipping and adds calibrated Gaussian noise (DP-SGD) before secure aggregation via Shamir Secret Sharing (SSS). A stake-based policy mitigates stake concentration and encourages honest participation. In experiments on a non-IID wearables dataset with a 10% label-flipping adversary and 10–50 clients, PS-DFL achieves an accuracy of approximately 86–89% with stable convergence, exhibiting lower variance and higher resilience than previous approaches. Notably, in attack scenarios with varying threat levels, our method exhibits a roughly 1–15% lower attack success rate relative to existing ones. These results indicate that PS-DFL is a practical and robust solution for collaborative intelligent systems in sensitive IoMT environments.
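
The per-client privacy step (l2-clipping plus calibrated Gaussian noise) can be sketched as below; the clip norm and noise multiplier are illustrative values, not the paper's settings.

    import numpy as np

    def privatize_update(update, clip_norm=1.0, noise_multiplier=0.5, rng=None):
        # Scale the update down so its l2 norm is at most clip_norm, then add
        # Gaussian noise calibrated to the clip norm (DP-SGD style).
        rng = rng or np.random.default_rng(0)
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
        return clipped + noise

    u = np.array([3.0, 4.0])  # norm 5 -> rescaled to norm 1 before noising
    print(np.round(privatize_update(u), 3))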

16:20
Speech-EA: Evolutionary Algorithm-based Attack on Automatic Speech Recognition Systems

ABSTRACT. Automatic Speech Recognition (ASR) systems are powerful tools that can transform speech into text, allowing the possibility of turning spoken requests into actions. However, these tools are susceptible to adversarial attacks that can have dramatic consequences. It is therefore of paramount importance to expand the knowledge on the weaknesses of these models. In this context, our contribution is twofold. First, we describe Speech-EA, an evolutionary algorithm designed to work as a black-box attack against ASR systems in the untargeted scenario. Second, we experimentally validate Speech-EA against Wav2Vec 2.0. With 10,000 attack runs on 1,000 audio samples taken from the Synthetic Speech Command Dataset, our attack achieves a success rate of 89.9% and creates an adversarial audio sample in 13.82 seconds on average, with many created in less than 2 seconds. These results demonstrate that Speech-EA is effective, competitive, and fast, and they highlight the need for more robust ASR systems.

16:40
A Knowledge-Based Method for Designing Knowledge Querying Systems in Computer Programming Education
PRESENTER: Huynh Thuong

ABSTRACT. Computer programming knowledge is fundamental in information technology education. Supporting students in querying programming knowledge is therefore important for teaching and digital transformation. To develop such systems, programming knowledge must be modeled and organized into structured knowledge bases, and an expressive query language is required to capture diverse learner information needs. Existing studies on knowledge representation and query languages remain incomplete, particularly for representing programming language knowledge. This paper proposes a combined knowledge representation model for computer programming together with a query language tailored to the proposed model. We also investigate query parsing and forward-chaining inference mechanisms to support query execution. The approach is implemented in a programming knowledge querying system and evaluated with real information technology students. Experimental results demonstrate the effectiveness and practical applicability of the proposed system.

16:00-17:00 Session 12D: Natural Language Speech and Text Processing
16:00
A Novel Approach to Arrhythmia Classification Using Explanations and Domain Expertise
PRESENTER: Phuc Ho

ABSTRACT. This paper addresses the challenge of improving both the performance and interpretability of deep learning models for arrhythmia classification using 12-lead electrocardiogram (ECG) signals. While deep neural networks have demonstrated high accuracy in medical signal analysis, their black-box nature often limits clinical trust and adoption. We propose an explanation-guided learning framework that incorporates expert-driven explanations into the training process, thereby enhancing the model’s ability to focus on clinically relevant features. The approach leverages saliency maps and domain knowledge to guide the model’s attention during classification tasks. Experimental results on the CPSP-2018 dataset demonstrate that the proposed method not only achieves superior classification performance compared to previous studies but also produces more interpretable decision rationales, as validated by quantitative and qualitative assessments. This study highlights the potential of explanation-guided learning to bridge the gap between model accuracy and interpretability in critical healthcare applications.

16:20
SBV-LawGraph: A Hybrid RAG Approach Integrating Knowledge Graph for the State Bank of Vietnam Legal Documents
PRESENTER: Khoa Phan

ABSTRACT. While Retrieval-Augmented Generation (RAG) pipelines often demonstrate strong performance in general settings, they often struggle with legal texts, where interpreting the structure and relationships between laws is crucial. To address this, we introduce SBV-LawGraph - a dual-retrieval framework designed specifically for Vietnamese legal documents. It combines semantic retrieval with graph-based reasoning by integrating two modules: a Legal Retrieval module that uses sparse–dense reranking for textual accuracy, and a Relationship Retrieval module that traverses a curated Legal Knowledge Graph to capture links like amendments, citations, and definitions. This design enables SBV-LawGraph to generate responses that are not only relevant but also structurally grounded, addressing the limitations of standard RAG systems. Evaluations on the ALQAC2025 and SBV Legal Questions datasets show it consistently outperforms strong baselines, highlighting its effectiveness for precise and explainable legal QA.
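
As a toy illustration of the sparse–dense reranking idea in the Legal Retrieval module, candidates from a sparse scorer (e.g., BM25) can be re-ordered by a weighted mix with dense cosine similarity; the fusion weight and document IDs below are hypothetical.

    def rerank(candidates, alpha=0.5):
        # candidates: list of (doc_id, sparse_score, dense_score), scores in [0, 1]
        fused = [(doc, alpha * s + (1 - alpha) * d) for doc, s, d in candidates]
        return sorted(fused, key=lambda item: item[1], reverse=True)

    docs = [("art_12", 0.81, 0.42), ("art_7", 0.55, 0.91), ("art_3", 0.60, 0.58)]
    print(rerank(docs))  # art_7 rises to the top on dense similarity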

16:00-17:00 Session 12E: Computational Imaging and Vision
Location: Online room 2
16:00
Explainable and Robust Conformer for Multi-Label Chest X-Ray Classification
PRESENTER: Rahma Fourati

ABSTRACT. Accurate and trustworthy diagnosis from chest radiographs remains a critical challenge in medical imaging. In this work, we propose a robust deep learning framework based on the Conformer architecture to address multi-label classification of thoracic diseases using the CheXpert dataset. The Conformer, which integrates convolutional and self-attention mechanisms, is particularly well-suited for modeling both local and global dependencies in high-resolution medical images. Our approach incorporates three key pillars for clinical readiness: (i) robustness, evaluated through adversarial perturbations and sensitivity analysis to ensure resilience to input noise; (ii) uncertainty quantification, enabling calibrated confidence estimates essential for high-stakes decision-making; and (iii) explainability, achieved through feature attribution techniques. Extensive experiments demonstrate that our model not only achieves competitive performance across multiple pathologies but also maintains stable predictions under distributional shifts and perturbations. These results highlight the potential of hybrid transformer-based models as reliable tools for real-world medical imaging applications, bridging the gap between performance and deployability.

16:20
Enhancing AGWO-YOLO for Surveillance: Real-Time Human Detection with Adaptive Grey Wolf Optimizer
PRESENTER: Minh Ngoc Le

ABSTRACT. In the rapidly evolving field of intelligent surveillance, real-time human detection is essential for enhancing security and operational efficiency. This study optimizes human detection and tracking by integrating YOLOv10 with ByteTrack, leveraging its high accuracy, low computational cost, and real-time processing capabilities. Among YOLO variants with fewer than 26 million parameters, YOLOv10-L achieves a mAP50 of 91% on INRIA (900 images) and 68% on Human Detection v2 (13,659 images), outperforming YOLOv5-m, YOLOv8-m, and YOLOv9-c. The integration with ByteTrack enhances multi-object tracking, leading to a 37.3% increase in precision and a 61.3% improvement in mAP50 compared to YOLOv5-m. To further optimize performance, the Adaptive Grey Wolf Optimizer (AGWO) is employed for hyperparameter tuning, yielding an additional 4% increase in mAP50, with final scores of 0.945 on INRIA and 0.723 on Human Detection v2. Experimental results confirm that AGWO-YOLOv10-L is a robust and scalable solution for real-time surveillance, providing higher precision, reduced computational overhead, and enhanced adaptability across diverse environments.

16:40
Reducing Estimation Uncertainty Using Normalizing Flows and Stratification

ABSTRACT. Estimating the expectation of a real-valued function of a random variable from sample data is a critical aspect of statistical analysis, with far-reaching implications in various applications. Current methodologies typically assume (semi-)parametric distributions such as Gaussian or mixed Gaussian, leading to significant estimation uncertainty if these assumptions do not hold. We propose a flow-based model, integrated with stratified sampling, that leverages a parametrized neural network to offer greater flexibility in modeling unknown data distributions, thereby mitigating this limitation. Our model shows a marked reduction in estimation uncertainty across multiple datasets, including high-dimensional (30 and 128) ones, outperforming crude Monte Carlo estimators and Gaussian mixture models.
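
A minimal sketch of the stratified-sampling component: the unit interval is split into equal-probability strata and the expectation is averaged across per-stratum means. It is shown here for a known 1-D distribution via its inverse CDF; in the proposed model the strata would live in the latent space of the normalizing flow.

    import numpy as np
    from scipy.stats import norm

    def stratified_estimate(f, inv_cdf, n_strata=10, n_per_stratum=100, rng=None):
        rng = rng or np.random.default_rng(0)
        total = 0.0
        for k in range(n_strata):
            # uniform draws restricted to the k-th equal-probability stratum
            u = rng.uniform(k / n_strata, (k + 1) / n_strata, n_per_stratum)
            total += f(inv_cdf(u)).mean()
        return total / n_strata

    est = stratified_estimate(lambda x: x**2, norm.ppf)  # E[X^2] = 1 for N(0, 1)
    print(round(est, 3))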

16:00-17:20 Session 12F: Online poster session II
Location: Online room 3
16:00
Multilingual Depression Detection Using Transformer and Hybrid Deep Learning Models
PRESENTER: Surajit Dutta

ABSTRACT. In recent years, the growing usage of social networking sites among youngsters has generated a wealth of data for understanding mental health issues such as depression. The research focuses on the early identification of depression in youth by analysing social media posts in English, Assamese, and Hindi. In this study, a dataset of 114,000 posts/comments, with 38,000 in each language, is utilised to train and test transformer-based models. Two transformer-based models, BERT and DistilBERT, are used within hybrid deep learning models to compare their effectiveness for each language. The experimental results show that BERT-based frameworks are effective for early depression identification among youths from multilingual social media data by successfully capturing the contextual characteristics of the three languages.

16:05
An Empirical Study of the Role of Incompleteness and Ambiguity in Interactions with Large Language Models
PRESENTER: Riya Naik

ABSTRACT. Natural language as a medium for human-computer interaction, long anticipated, has been undergoing a sea-change with the advent of Large Language Models (LLMs) and their startling capacities for processing and generating language. Many of us now treat LLMs as modern-day oracles, asking them almost any kind of question. Unlike its Delphic predecessor, consulting an LLM does not have to be a single-turn activity (ask a question, receive an answer, leave); and, also unlike the Pythia, it is widely acknowledged that answers from LLMs can be improved with additional context. In this paper, we aim to study when we need multi-turn interactions with LLMs to successfully get a question answered, or to conclude that a question is unanswerable. We present a neural-symbolic framework that models the interactions between human and LLM agents. Through the proposed framework, we define incompleteness and ambiguity in questions as properties deducible from the messages exchanged in the interaction, and provide results from benchmark problems, in which answer-correctness is shown to depend on whether or not questions demonstrate the presence of incompleteness or ambiguity (according to the properties we identify). Our results show that multi-turn interactions are usually required for datasets with a high proportion of incomplete or ambiguous questions, and that increasing interaction length has the effect of reducing incompleteness or ambiguity. The results also suggest that our measures of incompleteness and ambiguity can be useful tools for characterising interactions with an LLM on question-answering problems.

16:10
Architecting AI for Compliance: Integrating EU AI Act Principles with MITRE ATLAS Threat Models
PRESENTER: Tomas Valenta

ABSTRACT. We present a compliance-aware reference architecture that operationalises the EU Artificial Intelligence Act (AI Act) by design and aligns security controls with the MITRE ATLAS threat model. The architecture (C-AIA) integrates three layers—Compliance (traceability and technical documentation), Security (ATLAS-informed mitigations), and Monitoring (observability, oversight, and post-market processes)—and is instantiated as a BPMN lifecycle that unifies pre-market conformity assessment with post-market monitoring and continuous risk management. To make conformity verifiable and repeatable, we provide (i) an architecture–obligation–threat mapping (AI Act Arts. 9–15, 72 ↔ ATLAS techniques) and (ii) an evaluation plan with pre-registered questions (EQ1–EQ3) and design-level metrics (M1–M6) covering compliance coverage, evidence completeness, threat-informed prevention/detection/response, and PMM→RMS feedback. Results are reported analytically via a design-justified matrix that links each obligation cluster to concrete control points and evidence artefacts, enabling auditability without benchmark numbers. The contribution is a practical blueprint—reference architecture, BPMN, and evaluation plan—that enables organisations to build AI systems that are secure and compliant by design and immediately testable with future empirical studies. Future work will execute the registered threat-emulation scenarios and publish quantitative outcomes.

16:15
Secure and Semi-Fragile Image Authentication Using Autoencoder-Based Watermarking and Al-Kindi Encryption
PRESENTER: Hanen Rhayma

ABSTRACT. This paper presents a novel hybrid image authentication scheme that combines watermarking, cryptography, and deep learning to ensure image authentication. The proposed method first uses an auto-encoder architecture to extract salient features from the input image via unsupervised training. These features are then transformed into a secure representation by a cryptographic method based on Al-Kindi encryption. The resulting encrypted watermark is embedded in the approximation sub-band of the Discrete Wavelet Transform (DWT) using the Quantization Index Modulation (QIM) technique. During verification, the embedded watermark is extracted and decrypted to verify the image integrity. Experimental results demonstrate that the proposed hybrid approach provides high imperceptibility, robustness against JPEG compression attacks, and effective tamper detection.
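
QIM embedding admits a compact sketch: a coefficient is quantized onto one of two interleaved lattices depending on the watermark bit, and the decoder returns the bit whose lattice lies closer. The step size below is illustrative, not the paper's parameter.

    import numpy as np

    def qim_embed(coeff, bit, delta=4.0):
        # Quantize onto the lattice offset by bit * delta / 2.
        return delta * np.round((coeff - bit * delta / 2) / delta) + bit * delta / 2

    def qim_extract(coeff, delta=4.0):
        d0 = abs(coeff - qim_embed(coeff, 0, delta))
        d1 = abs(coeff - qim_embed(coeff, 1, delta))
        return 0 if d0 <= d1 else 1

    for b in (0, 1):
        w = qim_embed(13.7, b)
        print(b, w, qim_extract(w))  # the embedded bit is recovered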

16:20
A Use-Case Driven Decision Model for CRIS Users: Integrating Data Fabric, Data Mesh, Knowledge Graphs, and AI for Informed Research Management

ABSTRACT. Research management is increasingly driven by data, yet many current research information systems (CRIS) remain fragmented and provide limited decision support. This paper introduces a use-case-driven decision model for CRIS that systematically integrates advanced technologies—including data fabric, data mesh, knowledge graphs, predictive analytics, and AI/ML methods—to enable informed, transparent, and explainable decision-making. The proposed framework addresses key challenges such as data heterogeneity, manual processing, and siloed information, offering a scalable, flexible, and user-centric solution for research managers, scientists, and university leaders. By combining analytical depth with governance and interpretability, the model provides actionable insights that enhance strategic planning, collaboration, and resource allocation in research institutions.

16:25
Method of Optimizing Microservices Security Based on Workload
PRESENTER: Michał Sikacki

ABSTRACT. Security in microservice architectures remains a major challenge due to their dynamic nature and the complexity of service interdependencies. To address this, a method is proposed for quantitatively assessing security alongside the effort invested in implementing it. Effort is estimated through source code analysis, detecting security patterns and anti-patterns, while security is measured based on identified vulnerabilities. The approach combines dynamic application security testing (DAST) with the ZAP tool and static analysis (SAST) with Semgrep. From these analyses, two normalized metrics are constructed in the range [−1, 1]: a security metric, reflecting the level of risk based on discovered vulnerabilities, and an effort metric, representing the cost of secure implementation in code. The method was initially validated on three microservice-based systems. Results indicate that the proposed metrics provide a practical, quantitative representation of security and effort, while also revealing that higher effort does not necessarily correspond to higher security. This highlights the importance of treating effort as an independent dimension of evaluation. The results suggest that the proposed approach could be further developed towards integration with CI/CD pipelines, enabling continuous monitoring of security posture and implementation effort in microservice systems.

16:30
Modern C++ in Resource-Constrained Edge Systems: An ESP32 Evaluation Bridging Efficiency and Practical AI Applications
PRESENTER: Dariusz Marek

ABSTRACT. The article presents the results of a study on the use of modern C++ language features in embedded systems. The experiments were carried out on an ESP32 microcontroller, comparing traditional programming approaches with features introduced in C++11 and later standards, such as constexpr/consteval, move semantics, smart pointers, std::optional/std::expected, static containers, and if constexpr conditional expressions.

For each test case, execution time, RAM and Flash memory usage, binary size, and the extent of dynamic memory allocation were measured.

The results demonstrate that modern C++ idioms can be effectively employed in resource-constrained environments without incurring significant performance penalties. In some instances, performance even improved due to compiler optimizations. Memory and binary size overheads were minimal and within acceptable limits. Furthermore, the use of modern C++ features enhances code quality, improves memory safety, and increases the predictability of application behavior. In particular, the study highlights their suitability for edge devices that must execute local AI-oriented workloads such as sensor fusion under strict constraints on determinism, energy efficiency, and long-term maintainability.

16:35
Automated Fraud Detection in Financial Transactions: A Comparative Study of PCA, t-SNE, UMAP, Autoencoder, Factor Analysis, and Deep Learning Approaches
PRESENTER: Mohammad Nusir

ABSTRACT. With the rise in online transactions and the growing complexity of fraud, it is crucial to create automated systems for detecting credit card fraud. This can help reduce financial losses and maintain consumer trust. This paper presents an improved framework for automated fraud detection that tackles important gaps in current research. It does this by comparing five dimensionality reduction techniques, optimizing hyperparameters, and assessing stability over time. A thorough baseline evaluation of 30-dimensional features shows that Random Forest achieves an accuracy of 93.92%. A systematic comparison with six algorithms supports the choice of Random Forest. Five dimensionality reduction methods are tested: Principal Component Analysis (PCA), Factor Analysis (FA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Autoencoders. These methods are applied to balanced, standardized transaction data to reduce dimensionality to 10 components. Hyperparameter optimization using RandomizedSearchCV addresses concerns about overfitting. A dual evaluation with stratified 5-fold cross-validation and 95% confidence intervals shows that Factor Analysis yields better results. It achieves 94.26% accuracy and 94.59% recall while reducing dimensionality by 66.7%. This beats the baseline recall of 89.86% and PCA's recall of 89.19%. An analysis of concept drift shows that the model stays stable despite changes over time. Robustness tests under noise indicate that PCA is slightly better at handling noise, with 92.57% compared to 92.23% for Factor Analysis. The evaluation framework provides strong evidence that combining dimensionality reduction with improved ensemble learning offers a practical and scalable solution for automated financial fraud detection.
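
The winning configuration is straightforward to reproduce in outline with scikit-learn: standardize, reduce to 10 components with Factor Analysis, then classify with a Random Forest under stratified 5-fold cross-validation. The sketch runs on synthetic data as a stand-in for the transaction set.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import FactorAnalysis
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
    pipe = make_pipeline(StandardScaler(),
                         FactorAnalysis(n_components=10, random_state=0),
                         RandomForestClassifier(n_estimators=200, random_state=0))
    # cross_val_score uses stratified 5-fold CV by default for classifiers
    print(cross_val_score(pipe, X, y, cv=5).mean())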

16:40
Enhanced Flight Path Planning for Fixed-wing Unmanned Aerial Vehicles

ABSTRACT. This paper presents research focused on enhancing the efficiency of flight path planning for fixed-wing UAVs. The core path planning algorithm operates on a continuous search space and proceeds in two main stages. First, an intelligent random tree method is used to generate a set of feasible flight paths. Then, improved genetic algorithm operators are applied to identify the optimal path from this set. To accelerate the planning process, a parallel computing model was explored and implemented. Simulation results show that the algorithm successfully generates an optimal flight path, and the initialization speed of feasible flight paths increases by approximately 3.5 times when using eight parallel processes.

16:45
CAMO-InstSynth: Few-shot Camouflage Instance Segmentation with Multi-Conditional Background Synthesis and Generative Augmentation

ABSTRACT. Camouflage object detection and instance segmentation remain a challenging frontier in computer vision due to the intrinsic high similarity between foreground objects and their background surroundings. Furthermore, the scarcity of annotated camouflage data exacerbates the difficulty of training robust models, particularly in data-sparse regimes. In this paper, we introduce CAMO-InstSynth, a novel data enhancement framework exploiting background context understanding to address few-shot camouflage object detection and instance segmentation. Accordingly, we propose a Multi-Conditional Background Synthesis Module that utilizes diffusion models to generate diverse, high-fidelity backgrounds that maintain semantic consistency with camouflaged foregrounds. Unlike traditional augmentation techniques, which increase only the instance diversity, our method further conditions the synthesis process to simulate the camouflage environment. We validate our approach on the common CAMO-FS dataset over strong few-shot baselines. Our experiments demonstrate that CAMO-InstSynth significantly outperforms state-of-the-art methods, improving the iFS-RCNN baseline in both segmentation and detection tasks. Source code is available upon acceptance of this paper.

16:50
ViTKGQA: A Vietnamese Temporal Knowledge Graph Question Answering Method
PRESENTER: Long Nguyen

ABSTRACT. Temporal Knowledge Graph Question Answering (TKGQA) aims to answer natural language questions involving temporal constraints by reasoning over time-aware knowledge graphs. Despite recent progress, existing datasets and models mainly target high-resource languages, leaving low-resource languages such as Vietnamese largely unexplored. Moreover, many temporal knowledge graph embedding methods rely on monolithic temporal representations that inadequately capture both long-term temporal evolution and short-term variations. This paper introduces ViTKGQA, the first Vietnamese dataset for Temporal Knowledge Graph Question Answering. The dataset is constructed from Wikidata temporal facts with explicit timestamps and supports diverse temporal reasoning patterns, including single-constraint and multi-constraint queries, multiple temporal granularities (day, month, and year), and both entity-valued and time-valued answers. To better model temporal dynamics, we propose a wavelet-based temporal knowledge graph embedding model that employs the discrete Haar wavelet transform to decompose temporal information into low-frequency components representing long-term trends and high-frequency components capturing short-term changes. Experimental results on standard temporal knowledge graph completion benchmarks and on ViTKGQA show that the proposed approach consistently outperforms strong baselines, demonstrating its effectiveness for complex temporal reasoning in Vietnamese and other low-resource language settings.
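
The wavelet decomposition described above can be illustrated with a single level of the discrete Haar transform: pairwise averages give the low-frequency component (long-term trend) and pairwise differences the high-frequency component (short-term change). The time series below is synthetic.

    import numpy as np

    def haar_level(x):
        x = np.asarray(x, dtype=float)
        low = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation (trend)
        high = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail (short-term change)
        return low, high

    t = np.array([1.0, 1.2, 1.1, 0.9, 3.0, 3.1, 2.9, 3.2])
    low, high = haar_level(t)
    print(low)   # long-term trend components
    print(high)  # short-term variations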

16:55
From High-Accuracy to Resource-Efficient Cloud–Edge Deployment: Compression-Aware Pruning of Deep Neural Networks for EEG Epilepsy Prediction
PRESENTER: Salem Trabelsi

ABSTRACT. Real-time seizure prediction using electroencephalography (EEG) signals can significantly enhance patient safety and autonomy for individuals living with epilepsy. Although deep learning models have shown strong performance in seizure forecasting, their high computational complexity and memory footprint make deployment difficult in real-world healthcare systems and resource-constrained edge-IoT environments. To address this challenge, this paper proposes a hardware-independent model compression framework for EEG-based epilepsy prediction, evaluated on the CHB-MIT dataset using a CNN-BiLSTM-Attention architecture comprising 2.19 M parameters. The proposed compression strategy combines aggressive global magnitude pruning with reparameterization, followed by post-training FP16 and INT8 quantization. This approach achieves up to 14.8× model compression with a 93.3% reduction in parameters, while incurring only minimal degradation in predictive performance. In patient-specific evaluation, the compressed model preserves segment-level accuracy above 96% and achieves an event sensitivity of 94.7% at low false alarm rates. Furthermore, cross-subject generalization is improved through lightweight fine-tuning using a single seizure per patient. System-level evaluation demonstrates consistent inference acceleration across cloud, on-premise, and edge platforms. On cloud CPU and GPU servers, the optimized models achieve speedups of up to 2.7× compared to FP32 baseline inference, with sub-millisecond latency when deployed using ONNX Runtime. On resource-constrained edge hardware, specifically a Raspberry Pi 4, the compressed INT8 model enables real-time inference with latency below 5 ms per segment and throughput exceeding 450 segments/s, corresponding to an 11× speedup over the non-compressed baseline. Overall, this work demonstrates that compression-driven pruning effectively bridges the gap between high predictive accuracy and practical deployment constraints. By offering a balanced trade-off between accuracy, latency, and model footprint, the proposed framework provides a scalable solution for cloud-edge EEG-driven seizure prediction systems.
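
The compression recipe maps onto standard PyTorch utilities: global magnitude pruning across weight tensors followed by post-training dynamic INT8 quantization. The toy model below stands in for the CNN-BiLSTM-Attention network, and the 0.933 sparsity echoes the reported 93.3% parameter reduction; the paper's exact procedure may differ.

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))
    params = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
    # Remove the 93.3% of weights with the smallest magnitudes, ranked globally.
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.933)
    for module, name in params:
        prune.remove(module, name)  # make the sparsity permanent

    # Post-training dynamic quantization of the remaining weights to INT8.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    print(quantized(torch.randn(1, 64)).shape)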

17:00
Spatial Audio Simulation in Video Games
PRESENTER: Marek Kopel

ABSTRACT. This study investigates the current state of spatial audio solutions in video games, focusing on accessibility. The research specifically compares two prominent approaches in the Unity engine: the commercial Wwise Reflect + RoomVerb and the open-source Steam Audio. The primary evaluation metric is the subjective player experience, assessed through a custom game environment and a Likert-scale survey administered to 26 participants. Results indicate a significant user preference and lower perceived dissatisfaction for the commercial Wwise Reflect + RoomVerb solution compared to standalone Steam Audio, particularly concerning reverberation effects. A key finding is that the implementation of a simple, non-physically accurate, ceiling-height-steered reverb effect closed the quality gap, demonstrating no statistically significant difference between the enhanced Steam Audio and the commercial Wwise solution.

17:05
An approach towards object identification in logistics using QR codes
PRESENTER: Jakub Mikłasz

ABSTRACT. Augmented reality (AR) is increasingly adopted in Industry 4.0 logistics to support tasks such as picking, sorting, and inventory handling. A core requirement for these systems is the reliable, real-time identification of objects and locations, commonly achieved using QR codes. In industrial environments, however, mobile operators and wearable devices introduce challenges, including motion blur, changing distances and angles, limited on-device computation, and intermittent network connectivity. This paper proposes an architecture for QR code detection and recognition tailored to AR devices used in warehouses. The contribution is architectural rather than algorithmic: we design an end-to-end identification pipeline and a hybrid processing strategy that can run fully on-device when connectivity is limited or offload processing to a server when device resources are constrained. We also incorporate tracking and buffering mechanisms to improve user experience and reduce duplicate reads. An experimental study evaluates the feasibility of QR decoding across different printed code sizes, camera types, and viewing angles, providing concrete guidelines for QR code sizing and scanning distances in practice. The proposed architecture supports near-real-time operation and provides a practical reference for deploying QR-based identification in AR-assisted industrial workflows.

17:10
Adaptive Medical Image Watermarking Based on Multi-Resolution Function Contribution and Wavelet Neural Networks
PRESENTER: Rayen Ben Salah

ABSTRACT. This paper presents a novel medical image watermarking method that combines Multi-Resolution Analysis (MRA), Fast Wavelet Transform (FWT), and Wavelet Neural Networks (WNNs) to ensure imperceptibility and robustness. The proposed approach constructs a library of wavelet and scaling functions and analyzes their contributions to image reconstruction. Functions with the least impact are identified and selected as optimal zones for watermark embedding. Unlike conventional neural models, the WNN used here is adaptive and does not require training, making it lightweight and scalable. The watermark is inserted directly into the identified low-contribution regions using a simple modulation scheme. Experimental validation on various medical image modalities demonstrates high imperceptibility (PSNR > 60 dB), strong structural fidelity (SSIM = 1), and robustness against common image processing attacks. The method is well-suited for securing medical images in telemedicine applications, offering a balance between visibility preservation and resilience.

17:15
CSMC-VQA: Dynamic Code-Switch Multimodal Curriculum Learning for Vietnamese Visual Question Answering
PRESENTER: Khoi Tran Tam

ABSTRACT. The Visual Question Answering (VQA) problem on Vietnamese data is still challenging due to the complex and heterogeneous language structure and the code-switching phenomenon, in which Vietnamese and English appear together in real-life situations. To address these issues, this paper proposes a CSMC-VQA model utilizing Dynamic Code-Switch Multimodal Curriculum Learning to enhance the accuracy and adaptability of the VQA model in bilingual and multimodal settings. The proposed method includes three main components: (1) semantically guided bilingual data augmentation, ensuring that the mixed English-Vietnamese “question-answering” pairs retain their meaning through cosine similarity testing; (2) determining the complexity of each sample based on the semantic difference between linguistic and visual features; (3) learning according to the Dynamic Curriculum Learning path, helping the model learn from easy to difficult and self-adjust the training sequence based on actual performance. Experimental results on two datasets, ViVQA and OpenViVQA, show that the combination of CLIP + PhoBERT for the CSMC-VQA model achieves the highest accuracies of 68.5% and 65.2%, respectively. This result demonstrates a significant improvement in accuracy performance for the VQA problem, while also opening up a new approach for the multimodal VQA problem and its application to English-Vietnamese bilingual data.

17:20
CBiF-Net: A Lightweight CNN–Transformer Network with Bilinear Feature Fusion for Crack Segmentation
PRESENTER: Dang Ho

ABSTRACT. Robust crack segmentation plays a crucial role in automated infrastructure inspection as well as structural health monitoring. In this paper, we introduce a novel dual-branch hybrid architecture, named CBiF-Net, that effectively integrates local texture representation and global contextual modeling for robust crack segmentation. The proposed framework consists of a CNN branch for capturing fine-grained local details and a parallel Transformer branch for modeling long-range dependencies. To tightly couple the complementary features from the two branches, we introduce a Bilinear Interaction Module (BiFusion), which enables effective cross-branch feature interaction and significantly enhances representational capability. Extensive experiments on three benchmark datasets, namely DeepCrack, SteelCrack, and CrackVision12k, show that CBiF-Net consistently achieves state-of-the-art performance in F1-score and mean Intersection-over-Union. Moreover, by adopting the lightweight MobileNetV3-Large backbone, the proposed model achieves an excellent trade-off between segmentation accuracy and computational cost. This makes it highly suitable for real-time and large-scale practical deployment in automated crack inspection systems.

17:25
An Efficient Hybrid Deep Learning Approach for Detecting Online Abusive Language
PRESENTER: Vuong Ngo

ABSTRACT. The digital age has expanded social media and online forums, allowing free expression for nearly 45% of the global population. Yet, it has also fueled online harassment, bullying, and harmful behaviors like hate speech and toxic comments across social networks, messaging apps, and gaming communities. Studies show 65% of parents notice hostile online behavior, and one-third of adolescents in mobile games experience bullying. A substantial volume of abusive content is generated and shared daily, not only on the surface web but also within dark web forums. Creators of abusive comments often employ specific words or coded phrases to evade detection and conceal their intentions. To address these challenges, we propose a hybrid deep learning model that integrates BERT, CNN, and LSTM architectures with a ReLU activation function to detect abusive language across multiple online platforms, including YouTube comments, online forum discussions, and dark web posts. The model demonstrates strong performance on a diverse and imbalanced dataset containing 77,620 abusive and 272,214 non-abusive text samples (ratio 1:3.5), achieving approximately 99% across evaluation metrics such as Precision, Recall, Accuracy, F1-score, and AUC. This approach effectively captures semantic, contextual, and sequential patterns in text, enabling robust detection of abusive content even in highly skewed datasets, as encountered in real-world scenarios.

16:00-17:00 Session 12G: Applied Artificial Intelligence and Predictive Analytics
Location: Online room 1
16:00
A Comparative Study of Advanced Hybrid and Deep Learning Architectures for Forecasting 10-Year Indian Government Bond Yields
PRESENTER: Manjula Pilaka

ABSTRACT. Government bond yield forecasting occupies a central place in financial economics, with direct implications for monetary policy, portfolio allocation, and risk control. This paper undertakes a detailed empirical assessment of a wide spectrum of forecasting techniques—including classical, machine-learning, and advanced hybrid models—using the 10-Year Indian Government Bond Yield as the focal case. We benchmark a simple predictive model against a suite of advanced approaches, including an ensemble with residual Multi-Layer Perceptron (MLP), a hybrid ARIMA-LSTM model, an LSTM on differenced yields (∆-Yields), and a novel Two-Head LSTM architecture. Models are evaluated on a test sample of 250 observations using both quantitative metrics (MAE, RMSE, DA) and qualitative visual analysis. Results demonstrate that hybrid and multi-output models significantly outperform traditional benchmarks. The novel Two-Head LSTM, which simultaneously learns to predict the yield level and its first difference, achieves the lowest error (MAE: 0.18, RMSE: 0.24) and highest directional accuracy (89.5%), almost perfectly capturing the complex dynamics and turning points of the yield series. The Ensemble + Residual MLP model also proves highly effective. This study concludes that architectures explicitly designed to model both the integrated (I(1)) and stationary (I(0)) components of a financial time series offer a substantial and statistically significant improvement in forecasting accuracy for emerging market government bonds.

16:20
A Combined Back-Translation and Self-Distillation Approach for Robust and Calibrated Polish Medical Text Classification

ABSTRACT. Classifying Polish medical texts in conditions of low data availability places high demands on effectiveness, reliability, and calibration. This article examines the interaction between back-translation data augmentation (BT) and self-distillation (SD) for the mDeBERTa-v3-base model, analysing their impact on effectiveness, calibration, and two dimensions of stability: resistance to linguistic noise and OOD degradation. Our experiments reveal a key trade-off: we show that BT alone improves calibration (ECE 0.115) but is unstable on out-of-distribution data (degradation ΔECE +0.023). In contrast, SD worsens raw calibration (ECE 0.125) but provides exceptional robustness to linguistic perturbations (degradation ΔF1 ≈ 0). The combination of both methods (BT+SD) resolves this conflict, achieving the best balance: highest effectiveness (F1-ID 0.869), best probabilistic quality (Brier-ID 0.239; Brier-OOD 0.266), and stability on both dimensions. The combined model achieves OOD stability (degradation ΔECE +0.006) while maintaining the noise resistance inherited from SD. We conclude that BT+SD is the most balanced strategy, combining the semantic benefits of BT with the stability of SD, making it a promising method for reliable NLP systems in clinical applications.

16:40
Bringing Traditional Manufacturing Assets Under the Industry 4.0 Predictive Maintenance Framework
PRESENTER: Syed Shafiq

ABSTRACT. In the era of Industry 4.0, predictive maintenance has emerged as a key enabler for improving reliability, reducing downtime, and optimizing asset performance in smart manufacturing environments. Rolling element bearings, critical components in rotating machinery, are susceptible to degradation that can lead to costly failures if not detected early. This study presents an unsupervised machine learning framework for automated anomaly detection in bearing condition monitoring data in an Industry 4.0 environment. Vibration signals were acquired from the motor drive end of an industrial circulating water pump and processed using envelope demodulation to enhance sensitivity to bearing-related impacts. A comprehensive set of time-domain and spectral features, including RMS, kurtosis, crest factor, spectral entropy, and dominant frequency, was extracted from 60 quarterly measurement snapshots. Isolation Forest and One-Class SVM models were applied to the engineered feature set to compute anomaly scores, with results compared against global RMS (GRMS) trends. The analysis revealed consistent alignment between detected anomalies and elevated GRMS levels, indicating the effectiveness of the proposed approach in identifying potential fault progression. The framework demonstrates a scalable, sensor-driven anomaly detection strategy suitable for real-time deployment in Industry 4.0 predictive maintenance systems.
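
The anomaly-scoring step has a compact scikit-learn form: an Isolation Forest fitted on the engineered feature vectors yields a per-snapshot anomaly score. The feature values below are synthetic stand-ins for the extracted vibration features.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    features = rng.normal(size=(60, 5))  # 60 snapshots x 5 engineered features
    features[-3:] += 4.0                 # simulated degradation in the latest snapshots

    iso = IsolationForest(random_state=0).fit(features)
    scores = -iso.score_samples(features)  # higher = more anomalous
    print(np.argsort(scores)[-3:])         # indices of the most anomalous snapshots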