View: session overviewtalk overview
| 13:00 | Sentiment Analysis in Recommendation Systems: An Exploration with a Special Focus on Arabic Language PRESENTER: Sarra Zrigui ABSTRACT. With the exponential growth of textual data on the internet, sentiment analysis has emerged as a crucial technique for enhancing the performance of recommendation systems by extracting user opinions from textual data such as reviews, comments, and social media posts. Sentiment analysis is the process that aims to identify and extract the sentiments and opinions expressed in a text. It is a research field whose main objective is to extract information about the polarity of a text, that is, to determine whether the author expresses positive, negative, or neutral sentiments towards the subject in question. While traditional recommendation systems usually rely on explicit ratings and user-item interactions, sentiment analysis can provide additional deeper insights into user preferences by assessing the polarity and emotional tone of text. Nowadays, sentiment analysis is gaining increasing attention in many domains, including politics, education, marketing, and economics, among others. This is due to the fact that opinions have a considerable influence as they significantly contribute to decision-making processes. This has led to the emergence of various sentiment analysis methods, including lexicon-based methods, machine learningbased methods, hybrid methods, and deep learning-based methods. Despite these advancements, the field still faces significant challenges, such as developing robust models capable of handling cultural and contextual variations, as well as the need for larger and more diverse datasets. This article presents an in-depth study on sentiment analysis approaches in different languages and its role in recommendation systems, with a particular focus on sentiment analysis in Arabic. |
| 13:20 | Improving Cross-Dataset Generalization in Facial Emotion Recognition Through EmoSet: A Balanced and Diverse Dataset PRESENTER: Jihed Jabnoun ABSTRACT. Facial Emotion Recognition (FER) is crucial for applications in human-computer interaction and mental health. However, existing FER datasets often suffer from limitations such as class imbalance and limited diversity, hindering the development of robust models. This paper introduces EmoSet, a novel dataset designed to address these challenges. Comprising 29K images across seven basic emotions (anger, disgust, fear, happiness, neutral, sadness, and surprise), EmoSet integrates diverse sources, including movies, TV shows, GIFs, internet images, and AI-generated content, ensuring balanced representation across categories. We evaluate EmoSet using a Vision Transformer (ViT) architecture, comparing its performance against established datasets (FER2013, RAF-DB, and RAVDESS). Our results demonstrate EmoSet’s balanced performance, particularly excelling in challenging emotions like disgust, and significantly improved accuracy when combined with existing datasets (e.g., achieving 65.27% on AffectNet and 69.32% on FER2013). These findings highlight EmoSet’s contribution to developing more generalizable and robust FER systems. |
| 13:40 | Adaptive Multimodal Fusion for Interpretable and Efficient Conversational Emotion Recognition PRESENTER: Jihed Jabnoun ABSTRACT. Conversational emotion recognition requires integrating text, speech, and visual cues in real time, often under noisy and resource-constrained conditions. We propose an adaptive multimodal fusion method that learns how different emotions rely on different modalities while remaining efficient and interpretable. Three pretrained experts for language, speech, and vision are partially fine-tuned for conversational data and projected into a shared representation space, where cross-modal attention enables mutual feature refinement. An emotion-aware gating mechanism dynamically assigns modality weights per input, revealing a learned emotion–modality affinity matrix. Seven parallel classifiers then operate on the gated representations to capture emotion-specific decision boundaries and improve minority-class recognition. We evaluate our Adaptive Multimodal Fusion on MELD, our approach surpasses state-of-the-art accuracy with lower model size and inference cost, while exposing interpretable modality dominance patterns (e.g., happiness is text-dominant, disgust face-dominant), providing both performance gains and insight into multimodal emotion expression. |
| 13:00 | Interpretable Cooperative MARL for Intrusion Detection via Logical Rule Integration PRESENTER: Faten Louati ABSTRACT. Explainability is vital in intrusion detection, as it clarifies the reasoning behind system predictions, especially when misclassifications could cause serious security breaches. By making model decisions interpretable, explainability facilitates debugging and continuous improvement. This paper advances the field by focusing on explainability and interpretability in Machine Learning (ML), particularly within Reinforcement Learning (RL). We propose a decentralized Multi-Agent Reinforcement Learning (MARL) framework with a hybrid architecture that integrates logical reasoning to design an explainable Intrusion Detection System (IDS). Logical rules provide agents with prior knowledge of their environment, improving learning efficiency and decision transparency and enhancing overall performance and reliability. The model, evaluated on the NSL-KDD benchmark dataset, achieves a 99% detection rate, demonstrating strong robustness and effectiveness. |
| 13:20 | PRESENTER: Swetha Krishna Sriram ABSTRACT. Adversarial attacks have consistently posed a critical challenge to the reliability of deep learning models. This warrants heightened concern towards malicious cyber attacks in sensitive application domains of cybersecurity where such learning models are relied upon to detect and mitigate adversarial threats. In this work, we present a method to improve the robustness of network intrusion detection system (NIDS) models by leveraging a game theoretical adversarial training algorithm with a variational adversary to generate optimal perturbations causing misclassifications. Subsequently, we show that integrating such adversarially manipulated data samples into the training algorithm leads to an improvement in the resilience of the NIDS classifiers against such attacks. To validate our method, we also simulate cyber attacks in a Software Defined Networking (SDN) environment, generating synthetic traffic data for adversarial training. We detail the setup for this simulation, providing a reproducible framework for adversarial training in security contexts. This complements our experiments on widely used benchmark datasets for network intrusion detection systems such as CSE-CIC-IDS2018 and NSL-KDD. We also assess the generality of the proposed method by applying it to the emerging Kolmogorov Arnold Networks (KANs). Our results confirm that the game theory based adversarial training algorithm significantly enhances the robustness of KANs, highlighting the value of our approach. |
| 13:40 | Explainable ResNet–Transformer Fusion with Belief Function Theory for Automated Infection Detection and Severity Assessment in Cactus Pears PRESENTER: Adel Ben Ali ABSTRACT. Climate change exacerbates food security threats by intensifying the spread of agricultural pests and diseases. This dissertation addresses the cultivation challenges of the prickly pear (Opuntia sp.), a vital crop for food security in arid regions, which is highly vulnerable to pest infestations. To tackle this issue, the intelligent detection framework ResViT-BeliefFusion (RVBF) is introduced, integrating a customized ResNet50 with Vision Transformer architectures through evidential belief function theory for enhanced decision-making. The framework was trained on 1,173 high-resolution images collected under diverse field conditions, with particular emphasis on challenging scenarios such as poor lighting and severe infestations. Experimental results demonstrate RVBF’s superiority over individual models, achieving G-mean scores of 97.23% for binary classification and 93.55% for multi-class classification. Furthermore, interpretability analysis using Local Interpretable Model-Agnostic Explanations (LIME) confirms the model’s robustness and transparency. RVBF represents a significant advancement by delivering an intelligent and explainable system for pest and disease monitoring, thereby contributing to the sustainability, productivity, and resilience of prickly pear cultivation in the face of climate change. |
| 13:00 | Meta-Plasticity and Memory Mechanisms for Multi-Activity Recognition under Environmental Variability PRESENTER: Ahmed Zaghdoud ABSTRACT. Recurrent neural networks (RNNs) process sequential data by preserving past states. However, they face the vanishing gradient problem, which makes them difficult to process long sequences. To solve this problem, long short-term memory (LSTM) networks, developed by Hochreiter and Schmidhuber in 1997, use gates to regulate information flow, supporting the selective retention of long-term dependencies. However, traditional LSTM networks have limited dynamic adaptability, resulting in suboptimal performance when dealing with non-stationary data or evolving patterns. Based on neuroscience insights, the concept of synaptic plasticity, which governs plasticity, offers a promising approach to enhancing flexibility. This research explores the integration of meta-plasticity into a hybrid framework that combines a convolutional neural network (CNN) and an LSTM framework, resulting in CLSTM-META, a compact architecture that combines convolutional feature extraction and meta-recurrence dynamics. Comparative analysis with standard LSTMs on Alzheimer's disease data shows that CLSTM-META continuously adapts to changing biomarkers, providing accurate monitoring tools for cognitive assessment and a deeper understanding of daily activities associated with Alzheimer's disease. |
| 13:20 | Cross-Modal Attention and Residual Dense Learning Approach for Multimodal Brain Tumor Segmentation us-ing 3D MRI Scans PRESENTER: Muhammad Attique Khan ABSTRACT. Accurate segmentation of brain metastasis is essential for treatment planning and surgical procedures. Still, it is difficult due to the heterogeneous nature and complex spatial relation-ships of tumor regions. The objective of this study is to develop an automated segmentation model that labels an MRI image into the background + resection cavity (BC+RC), non-enhancing tumor core (NETC), surrounding non-enhancing FLAIR hyperintensity (SNFH), and enhancing tumor (ET) regions. For this purpose, we proposed a deep learning model, SECBAMUNet, that integrates the standard U-Net architecture with CBAM and Squeeze-and-Excitation blocks, as well as residual dense blocks. It is trained on a BraTS dataset using a combined loss function and obtained 71.2% Dice coefficient, 7.0 mm Hausdorff distance (95th percentile), and 1.8 mm average symmetric surface distance, outperforming all the SOTA models. These results demonstrate the robustness and practicality of the pro-posed model in real-world clinical settings. |
| 13:40 | Enhancing the Efficiency and Accuracy of Arabic NER with DeepSeek R1 and active learning PRESENTER: Abdelkarim Mars ABSTRACT. Named Entity Recognition (NER) is a core Natural Language Processing (NLP) task that identifies and classifies entities such as people, organizations, locations, and dates within text. While progress in Arabic NER has accelerated with the advent of transformer-based models, the annotation cost and dialectal variability of Arabic still pose major challenges. This paper presents a novel semi-automatic annotation framework that integrates the reasoning capabilities of the large language model DeepSeek- R1 with a calibrated active learning (AL) strategy to improve both efficiency and accuracy in Arabic NER. DeepSeek-R1 is employed to prelabel unlabeled sentences using a fixed JSON schema with UTF-8 offsets and confidence scores. These pseudo-labels are then filtered and calibrated through temperature scaling, while a diversity-aware acquisition function guides sample selection for human review under a cost budget. Experiments on the AQMAR and ANERcorp datasets demonstrate that the proposed method achieves an F1-score of 90.32% and 92.54%, respectively—outperforming strong baselines such as AraBERT and MARBERT with 40% less human annotation effort. The results confirm that coupling LLM-assisted pre-labeling with calibrated active learning provides a scalable and reliable solution for developing high-quality Arabic NER resources. |
| 13:20 | Distributed Seasonal Temporal Pattern Mining PRESENTER: Van Ho-Long ABSTRACT. The explosive growth of IoT-enabled sensors is producing enormous amounts of time series data across many domains, offering valuable opportunities to extract insights through temporal pattern mining. Among these patterns, an important class exhibits periodic occurrences, referred to as seasonal temporal patterns (STPs). However, mining STPs poses challenges, as traditional measures such as support and confidence cannot capture seasonality, and the lack of the anti-monotonicity property results in an exponentially large search space. Existing STP mining methods operate sequentially and therefore do not scale to large datasets. In this paper, we propose the Distributed Seasonal Temporal Pattern Mining (DSTPM), the first distributed framework for mining seasonal temporal patterns from time series. DSTPM leverages efficient data structures, specifically distributed hierarchical lookup hash structures, to enable efficient computation. Extensive experimental evaluations demonstrate that DSTPM significantly outperforms sequential baselines in runtime and memory usage, while scaling effectively to very large datasets. |
| 13:40 | Utilizing Representation Learning for ECG-based Authentication PRESENTER: Stanisław Saganowski ABSTRACT. ECG-based authentication is an emerging biometric modal- ity that provides strong resistance to spoofing attacks. This study eval- uates the effectiveness of representation learning for generating discrim- inative ECG embeddings by comparing the TS-TCC and CLOCS ar- chitectures, and identifies optimal pipeline configurations for real-world ECG-based authentication. Using the LarField dataset, comprising over 20,000 hours of single-lead ECG recordings from 48 subjects, we assessed the complete authentication pipeline. To ensure subject-level indepen- dence, no subjects used for training were included in the evaluation or experimental sets. Our results demonstrate that contrastive learning ef- fectively mitigates physiological variability; specifically, CLOCS achieves the highest recall (0.99) and F1 Score (0.94), while TS-TCC matches it only in precision. Furthermore, an analysis of reference-set construction indicates that approximately 100 readings collected over a 10-day span are sufficient to achieve consistent performance for both models. These findings provide a comprehensive benchmark for state-of-the-art meth- ods and offer practical guidelines for the design of physiological biometric authentication systems. |
| 14:00 | EBC-CARS: Energy-Based Context-Aware Recommendation with Energy Distance PRESENTER: Linh Nguyen ABSTRACT. Context-aware recommender systems aim to improve personalization by incorporating situational information beyond traditional user–item interactions, yet they often struggle with sparse and heterogeneous contextual signals. In this paper, we propose EBC-CARS, an energy-based context-aware recommendation framework that formulates user–item–context interactions as an energy minimization problem, enabling contextual conditions to directly shape the compatibility landscape. Furthermore, we introduce ED-EBC-CARS, which integrates Energy Distance as a principled statistical measure to quantify contextual similarity and regularize neighborhood-based predictions under varying contextual distributions. Experimental results on MovieLens- 25M, Amazon Reviews, and Yelp datasets demonstrate that EDEBC- CARS consistently outperforms conventional collaborative filtering and state-of-the-art context-aware baselines, achieving up to 6% RMSE reduction and improved Precision@10 across datasets, highlighting its robustness under sparse and heterogeneous contextual conditions. |
| 13:20 | Hybrid Fuzzy Utility Mining with Graph-Reinforcement Learning for Circular Shrimp-Rice Farming PRESENTER: Tan Duy Le ABSTRACT. Shrimp-rice systems in Vietnam’s Mekong Delta face volatile water quality, rising energy costs, and underutilized nutrient waste. We present a hybrid fuzzy-GNN-MOERL-circular framework that unifies prediction and prescription for sustainable control. First, fuzzy spatiotemporal utility mining extracts interpretable early-warning rules under sensor uncertainty. Second, a graph neural encoder captures inter-pond/canal dependencies to form compact states. Third, multi-objective evolutionary RL (NSGA-II/MOEA-D) discovers Pareto policies that balance yield, energy, CO2, and reuse, with fuzzy rules enabling safety-aware reward shaping and action masking. Finally, a graph-based circular-allocation module optimizes sludge/effluent routing to rice fields and biogas units. On ten Mekong farms, our approach improved early-warning F1 by +19% with -31% false alarms, and achieved -18% energy and -22% CO2 versus baselines with minimal yield trade-off. Circular allocation raised sludge reuse from 45 → 83% while cutting transport cost by 26%. An XAI dashboard (SHAP + rule summaries) increased expert trust and auditability. The results demonstrate a practical, explainable pathway to low-carbon, circular aquaculture at cooperative scale. |
| 13:40 | Fast-NSTBC: A scalable topological-based clus-tering method for large network-constrained geo-spatial data PRESENTER: Loan Nguyen ABSTRACT. The rapid growth of geospatial data has introduced substantial challenges to geospatial clustering in network space, particularly in terms of computa-tional efficiency. Although recent network-constrained clustering ap-proaches achieve improved clustering performance over earlier methods such as NS-TBC, they still incur considerable computational overhead due to unprioritized expansion and full adjacency-matrix traversal, leading to redundant edge processing and high time complexity. To address these limi-tations, this paper proposes Fast-NSTBC (Fast Network-Space Topological-Based Clustering). This improved topology-based clustering algorithm pri-oritizes expansion to the nearest adjacent point and uses an adjacency-list representation rather than an adjacency-matrix. These enhancements reduce unnecessary edge expansion and restrict traversal to actual adjacent points rather than traverse all points in the dataset. Experimental results on eight real-world datasets from three different network environments confirm that Fast-NSTBC considerably accelerates clustering execution compared with NS-TBC, while maintaining comparable clustering quality. On average, Fast-NSTBC reduces computational cost by 86% and memory usage by 97% without compromising accuracy. These results demonstrate the effec-tiveness of the proposed enhancement, particularly for large-scale sparsely connected datasets, and highlight its potential for real-time geospatial ap-plications such as intelligent transportation systems, emergency response, and smart-city analytics |
| 14:00 | Amortized Gradient Estimation for Ecient EBM Box Renement PRESENTER: Dung Dang ABSTRACT. Bounding box regression is confronted with critical limitations in current approaches: coordinate-wise losses are observed to ignore geometric relationships, while IoU-based losses are found to suffer from gradient instability in non-overlapping scenarios. In this paper, EBM-BBR is proposed as a novel framework that reformulates bounding box regression as an energy minimization problem. A composite energy function is designed, which comprises spatial, contextual, and shape components. Well-dened gradients are provided everywhere in the box space, and iterative renement through gradient descent is enabled by this formulation. Comprehensive experiments are conducted on Pascal VOC 2012, including convergence analysis, computational cost evaluation, sensitivity studies, and per-category analysis. Consistent improvements are demonstrated when the proposed method is integrated with YOLOv8n, RetinaNet, and Faster R-CNN. An average improvement of +1.6 AP (3.2%) over the strongest baselines is achieved, with 62% of boxes converging within 4 iterations and only 9.1% inference overhead. |
Seg-JEPA: Joint-Embedding Predictive Architecture for Self-Supervised Medical Image Segmentation PRESENTER: Khoa Le ABSTRACT. Medical image segmentation is essential for clinical diagnos- tics, yet it traditionally relies on large-scale, expertly annotated datasets that are costly and labor-intensive to produce. This challenge spans di- verse imaging modalities and applications, from detecting tuberculosis (TB) in chest X-rays to identifying pathologies across various medi- cal imaging domains. Supervised learning approaches, while effective, face significant limitations due to their dependency on extensive labeled data. Annotating medical images requires specialized expertise, and vari- ations in imaging conditions, anatomical structures, and low-contrast features further hinder generalization and robustness. In this paper, Seg- JEPA introduces a self-supervised learning framework based on a Joint- Embedding Predictive Architecture (JEPA) that learns by leveraging a world model, designed to address annotation scarcity in medical im- age analysis and the application of Image World Model (IWM). Unlike reconstruction-based methods, Seg-JEPA learns high-level semantic rep- resentations by predicting features in latent space, avoiding pixel-level redundancy. The paper presents a method for combining knowledge- based masking strategies, prioritizing anatomically relevant regions, and enhancing feature learning capabilities for diagnostic tasks. Evaluated on the Chest X-ray Dataset for Tuberculosis Segmentation, Seg-JEPA achieved a robust Dice score of 96.3%, comparable to the industry- standardnnU-Net(96.4%).Mostnotably,intheextremefew-shotregime using only 1% of labeled data, it achieved 91.9% Dice, outperforming the Masked Autoencoder (MAE) baseline (76.3%) by a massive 15.6%. Fur- thermore,withjust10%oflabeleddata,itreached95.1%Dice,effectively matching the performance of fully supervised models trained on 100% data. |
End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering PRESENTER: Nhi Dang ABSTRACT. Large language models (LLMs) combined with retrieval augmented generation have enabled the deployment of domain-specific chatbots, but these systems remain prone to generating unsupported or incorrect answers. Reliable evaluation is therefore critical, yet manual review is costly and existing frameworks often depend on curated test sets and static metrics, limiting scalability. We propose an end-to-end automatic evaluator designed to substantially reduce human effort. Our system generates Q\&A pairs directly from the underlying knowledge base, uses LLMs to judge chatbot responses against reference answers, and applies confidence-based filtering to highlight uncertain cases. Applied to a Vietnamese news dataset, the evaluator achieves high agreement with human judgments while significantly lowering review overhead. The framework is modular and language-agnostic, making it readily adaptable to diverse domains. This work introduces a practical, scalable solution for evaluating chatbots with minimal reliance on manual intervention. |
PRESENTER: Phong Chung ABSTRACT. Multi-intent detection and slot filling are fundamental tasks of natural language understanding in task-oriented dialog systems. Early approaches treated them as separate tasks, which undermines the direct connection between intents and their associated slots. This limitation becomes more pronounced when multiple intents are expressed within a single utterance. In the Vietnamese language landscape, research on this topic remains limited, largely due to its low-resource status. To address this gap, we introduce VSLIM, a joint model designed for multi-intent detection and slot filling in Vietnamese. Inspired by the SLIM framework (Cai et al. 2022), VSLIM builds on its foundation with a biaffine classifier that more directly captures the relationship between intents and slots. This design allows the model to better understand and represent the dependencies across sequence labels in multi-intent settings. Experiments on the Vietnamese PhoATIS dataset and our newly introduced VPED corpus show that VSLIM outperforms strong NLU baselines, highlighting its potential for improving Vietnamese task-oriented dialog systems. We publish our VSLIM implementation and VPED at https://github.com/dongphong543/VSLIM |
Advancing Vietnamese Visual Question Answering via Adapter-Augmented Cross-Lingual Embedding Integration PRESENTER: Tung Le ABSTRACT. Visual Question Answering (VQA) requires deep integration of visual and linguistic understanding to answer questions about images. While state-of-the-art Vision–Language Models (VLMs) achieve remarkable performance in English, their effectiveness declines sharply in low-resource languages such as Vietnamese. Existing Vietnamese VQA studies primarily rely on task-specific architectures, overlooking the potential of leveraging powerful pre-trained VLMs. In this work, we propose a lightweight and modular framework for cross-lingual adaptation of VLMs to Vietnamese VQA. Our approach performs a “model surgery” by transplanting pre-trained PhoBERT word embeddings into established VLM backbones, followed by parameter-efficient fine-tuning for downstream adaptation. Experiments with BEiT-3, BLIP, and BLIP-2 demonstrate that our method surpasses translation-based baselines and achieves competitive results compared to strong state-of-the-art models, including BARTPhoBEiT and AViVQA-TrainConI. This study highlights an effective and scalable pathway to extend large vision–language models to low-resource languages without extensive retraining. |
A Context-Aware Framework for Time Series Forecasting of Energy Consumption: A Maritime Case Study PRESENTER: Marwa Boulakbech ABSTRACT. Accurate energy consumption forecasting plays a critical role in improving operational efficiency and sustainability in maritime transport. This paper introduces a context-aware framework for time series forecasting of energy consumption in refrigerated maritime containers. The proposed framework integrates environmental and operational context through five iterative stages: data collection, context curation, enrichment, modeling, and evaluation, forming a continuous feedback process that refines contextual representations and improves forecasting accuracy. Unlike traditional model-centric workflows, the framework emphasizes the explicit modeling of contextual dependencies rather than relying solely on temporal dynamics. A maritime case study demonstrates that incorporating contextual information, such as ambient temperature and vessel operation conditions, enhances predictive performance and interpretability. The proposed approach contributes to the development of intelligent, context-aware forecasting systems applicable to dynamic and data-rich industrial environments. |
FOCUS+: Enhancing Sustained Attention Through Comparative Evaluation of Digital and Physical Interventions PRESENTER: Trang Nguyen ABSTRACT. Sustained attention is critical for success in work, education, and daily life, yet maintaining prolonged focus remains challenging. To address this, vari-ous interventions such as meditation, cognitive training, and auditory stimu-lation have been proposed. However, the comparative effectiveness of these interventions in improving sustained attention in Human-Computer Interac-tion (HCI) remains underexplored. This study investigates and constructs the efficacy of three widely used attention-enhancing techniques: binaural beats, cognitive training, and mindfulness meditation. Forty participants were as-sessed using continuous performance tests (CPTs) before and after undergo-ing one of the interventions. Our findings indicate that cognitive training had the most effect on improving attentional performance (mean difference = 0.06, t =3.16, p<0.05). These results highlight the effectiveness of interven-tions that actively engage working memory and cognitive control. We pro-vide practical implications and design guidelines for HCI researchers and practitioners developing interactive systems aimed at supporting and improv-ing sustained attention. |
Prediction of employee absence in a telecommunication company using machine learning models PRESENTER: Ewa Walaszczyk ABSTRACT. Telecommunication companies rely on programming teams, whose core competencies enable them to complete orders successfully. Employee time management is one of the most important challenges facing planning teams. Employee absences lead to delays, productivity losses, and overloading of other staff, who are forced to take over the responsibilities of absent colleagues. The aim of the paper was to evaluate two types of machine learning models, Random Forrest and XGBoost, in the context of predicting employee absence on a dataset from the telecommunications company's human resources department. The database included 4,088 records, which were then aggregated into 889 records representing the cumulative number of days absent by type. The basic XGBoost model proved to be the most effective simulator, achieving an R² of 0.90, MAE of 1.08, and RMSE of 1.41. In comparison, the Random Forest model achieved an R2 of 0.33, an MAE of 4.74, and an RMSE of 7.88. Analysis of the impact of individual variables on tested models indicated that the most important features were vacation leave, day of the year, and the 7-day rolling average of all absences. Future research will include further optimization of model hyperparameters and the application of feature engineering, the effects of which could be significant in identifying patterns related to absences. The conducted analysis indicates the high potential of the use of machine learning in human resources management. Organizations can use the results of the selected model as a reference point for making planning decisions. |
Improving Skin Lesion Prediction Performance by Integrating Clinical Metadata PRESENTER: Van-Hiep Do ABSTRACT. Skin cancer is one of the most common cancers, requiring early anticipation to improve treatment efficiency. Expected-aided computer systems, especially deep learning models, often only use images while ignoring available clinical information such as age, gender, and anatomical location of the lesion. This research introduces the new method based on MaxViT architecture, which simultaneously combines specific images and available data to classify 7 types of lesions on the HAM10000 dataset. The proposed model uses MaxViT to extract specific images, combined with the technique of handling missing data for age field. The cross-attention fusion mechanism helps to refine image features based on clinical data embedding. With the training strategy is two stages and Layer-wise Learning Rate Decay, the proposed model achieves 92.16% accuracy, 87.18% macro Precision, 87.77% macro Recall, and 87.32% macro F1-score, which are superior to the image only model. The results confirm that integrating clinical data through the fusion mechanism significantly improves the accuracy and reliability of the automatic skin cancer diagnosis system. |
A Content-Based Music Recommender System by Integrating Music Repeating Patterns and User Preference Ratings PRESENTER: Ja-Hwung Su ABSTRACT. In recent years, music has become a widely used form of multimedia. For online music platforms, enabling users to quickly find the music they like from a vast music database has become an urgent challenge. Therefore, mu-sic recommendation technology has emerged and become an important ap-proach to addressing the problem of information overload. Though there have been considerable successes achieved by traditional recommender sys-tems, problems of cold start and data monotony have limited the perfor-mances. To overcome these problems, this paper proposes a rating-based recommender system that incorporates musical patterns’ frequencies into item-based rating calculation. For musical patterns’ frequencies, repeating patterns are extracted from each music piece based on their occurrences. For item-based rating calculation, the unknown rating for each target music is thereby derived. The experimental results on a collected dataset show the proposed method performs more promising than the compared method in terms of RMSE (Root Mean Squared Errors). This demonstrates that the dis-covered musical patterns effectively capture user-preferred musical features. |
PRESENTER: Thuan Huynh ABSTRACT. We propose a novel framework for abstractive title generation from scientific abstracts by integrating cross-model keyword augmentation with a denoising autoencoder (DAE) auxiliary objective. In our approach, T5 is employed to extract salient keywords that guide BART’s title generation, ensuring improved lexical and semantic alignment. To further enhance robustness, a DAE auxiliary objective is incorporated into BART’s fine-tuning, refining abstract representations by mitigating noise and emphasizing core semantic concepts. This joint mechanism strengthens the interaction between keyword guidance and generative modeling, enabling more concise, informative, and contextually accurate titles. Experimental results demonstrate that the keyword-augmented, DAE-enhanced framework consistently outperforms existing baselines, offering a robust and interpretable approach to automatic title generation in the scientific domain. |
CLIP-based dual-encoder approach for Sketch-to-Image Retrieval PRESENTER: Ky Quoc Binh Le ABSTRACT. Sketch-Based Image Retrieval (SBIR) enables users to search visual databases using freehand sketches, yet effective zero-shot retrieval remains challenging due to the large modality gap between sparse line drawings and natural images. This paper introduces an efficient dual-encoder SBIR framework built upon CLIP’s ViT-B/32 backbone and a hybrid loss that combines InfoNCE contrastive alignment with hard-batch triplet mining. The resulting embedding space jointly captures global semantic structure and fine-grained distinctions, enabling robust retrieval even when sketches are abstract, incomplete, or stylistically diverse. Evaluated on the Sketchy benchmark, our approach converges rapidly-achieving >=99% Recall@100 in only 50 epochs, a 3× speed-up over Doodle2Search while also improving retrieval accuracy (mAP 81.4 vs. 74.2). Automatic mixed precision and selective fine-tuning further reduce computational cost, enabling larger batch sizes and more stable optimization on a single consumer GPU. Qualitative results show consistent retrieval across organic, geometric, and fine-structure categories, with improved resilience to noise and occlusion. Overall, our CLIP-guided dual-encoder framework delivers fast, scalable, and semantically aligned zero-shot SBIR, setting a strong foundation for future extensions to hierarchical semantics, generative augmentation, and large-scale indexing. |
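The hybrid loss this abstract describes, InfoNCE contrastive alignment plus hard-batch triplet mining, can be sketched in plain Python on a toy batch of paired sketch/image embeddings. Real implementations operate on GPU tensors; the temperature, margin, and mixing weight below are illustrative assumptions.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def hybrid_loss(sketch_emb, image_emb, temperature=0.07, margin=0.2, alpha=0.5):
    n = len(sketch_emb)
    # InfoNCE: each sketch must identify its paired image among all batch images
    sims = [[cosine(s, im) / temperature for im in image_emb] for s in sketch_emb]
    info_nce = sum(
        -math.log(math.exp(sims[i][i]) / sum(math.exp(x) for x in sims[i]))
        for i in range(n)
    ) / n
    # hard-batch triplet: penalize the hardest in-batch negative per anchor
    triplet = sum(
        max(0.0, margin
            + max(cosine(sketch_emb[i], image_emb[j]) for j in range(n) if j != i)
            - cosine(sketch_emb[i], image_emb[i]))
        for i in range(n)
    ) / n
    return alpha * info_nce + (1 - alpha) * triplet
```

With correctly paired embeddings the loss is near zero; permuting the image batch drives both terms up, which is the behavior the joint objective exploits during training.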
Temporal Graph Refinement and Reasoning Path Extraction for Legal Document Retrieval PRESENTER: Sinh Nguyen Van ABSTRACT. Neural retrieval systems excel at semantic matching but fail to model temporal dynamics and domain constraints critical for specialized collections. In legal document retrieval, regulations have explicit lifecycles, evolve through amendments, and form citation networks encoding legal authority. Existing systems cannot distinguish active regulations from superseded ones or explain retrieval decisions through legal connections. We propose Legal-GraphRetriever, augmenting hybrid retrieval with two novel components. First, temporal graph refinement reranks candidates using domain knowledge from a legal citation graph. Unlike static graph features in prior work, we compute query-specific scores integrating temporal relevance with specialized decay functions, legal validity tracking, dynamic citation authority over retrieved documents, and contextual clustering through co-citation. This multiplicative refinement preserves semantic ranking quality while incorporating legal constraints without training. Second, interpretable reasoning path extraction traverses the citation graph to discover multi-hop paths between documents. Paths are scored by length, edge quality, document authority, and temporal coherence, then converted to natural language explanations via templates. This makes decisions transparent and verifiable by legal professionals. Our approach differs from existing legal IR in three ways: explicit modeling of document lifecycles rather than static text, rule-based graph refinement requiring no training data for new legal systems, and graph-derived explanations instead of black-box neural models. |
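The multiplicative refinement idea, combining a semantic score with temporal decay, validity tracking, and citation authority, can be sketched as follows. The specific decay rate, the demotion factor for superseded regulations, and the logarithmic citation boost are illustrative assumptions, not the paper's actual scoring functions.

```python
import math

def refine_score(semantic_score, years_since_amendment, is_active,
                 citation_count, decay_rate=0.1, citation_weight=0.05):
    # hypothetical multiplicative refinement over a neural ranking score
    temporal = math.exp(-decay_rate * years_since_amendment)  # recency decay
    validity = 1.0 if is_active else 0.1   # superseded regulations are demoted
    authority = 1.0 + citation_weight * math.log1p(citation_count)  # citation boost
    return semantic_score * temporal * validity * authority
```

Because every factor multiplies the semantic score, documents keep their relative semantic ordering within each validity/recency class, which is the "preserves semantic ranking quality" property the abstract claims.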
Optimizing Technical-indicator-based Trading Strategies Using a Modified Grey Wolf Optimizer PRESENTER: Chun-Hao Chen ABSTRACT. Constructing a profitable and robust trading strategy portfolio has become a fundamental challenge in algorithmic trading. To reach this goal, several critical issues must be considered, including identifying suitable technical indicators to form strategies and the parameters used in these strategies. Therefore, in this paper, we propose a grey wolf optimizer (GWO)-based approach for optimizing a trading strategy portfolio that exhibits strong profitability and risk resilience across diverse market conditions. A comprehensive pool of technical indicators that covers trend-based, momentum, oscillation, and volume categories is first used with randomized parameter combinations to construct candidate trading strategies. Recognizing the intrinsically discrete and nonlinear nature of technical-strategy search spaces, we modify the traditional GWO mechanism to simultaneously handle continuous decision variables, such as capital weights and stop-loss levels, and discrete components, including indicator types and parameter sets. In addition, a fitness-difference-guided inheritance mechanism is incorporated, enabling ω wolves to adaptively assimilate features from the top three wolves, α, β, and δ, while random strategy injection maintains exploration capacity and prevents premature convergence. Experiments were conducted on a real dataset to demonstrate the performance of the proposed approach. Results demonstrate that the proposed approach outperforms the existing approach in terms of return, execution time, and risk aversion ability. |
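For context, the canonical GWO position update that the paper modifies moves each wolf toward the three leaders α, β, and δ. The sketch below shows only this standard continuous update (not the authors' discrete/hybrid extension); the control parameter `a` decays from 2 to 0 over iterations.

```python
import random

def gwo_step(wolf, alpha, beta, delta, a):
    # one standard GWO update for continuous decision variables
    # (e.g., capital weights or stop-loss levels in this paper's setting)
    new_pos = []
    for d in range(len(wolf)):
        guided = []
        for leader in (alpha, beta, delta):
            r1, r2 = random.random(), random.random()
            A = 2 * a * r1 - a          # exploration/exploitation coefficient
            C = 2 * r2                  # leader emphasis coefficient
            D = abs(C * leader[d] - wolf[d])
            guided.append(leader[d] - A * D)
        new_pos.append(sum(guided) / 3.0)  # average of the three guided positions
    return new_pos
```

When `a` reaches 0 the update collapses to the mean of the three leader positions, which is why late iterations exploit rather than explore; the paper's inheritance and random-injection mechanisms counteract the resulting premature convergence.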
PRESENTER: Viet Vo ABSTRACT. Human activity recognition has attracted many researchers in computer vision because of numerous applications such as health care systems, sports video analysis, and human-computer interaction. Deep learning has become dominant in activity recognition, but it requires substantial computational resources and processing time. This work tackles these issues by introducing a key-frame extraction method that captures representative frames of an activity while discarding redundant video frames. Kernel Temporal Segmentation is applied to divide a video into meaningful segments without training data. The dynamic key frames for each video are then the middle frames of each segment. Human activity is represented by these key frames instead of the video as a whole. A Vision Transformer is applied to extract visual features from the key frames for activity representation. In addition, RAFT is used to extract optical flow for motion features around each key frame. Finally, an efficient machine learning method is utilized to recognize the action performed by humans based on the key-frame representation. Our experiments are conducted on four human activity datasets, UCF11, UCF50, UCF101, and HMDB51, to demonstrate the generalizability, robustness, and scalability of our method. The best recognition rates are 98.32%, 95.53%, 93.72%, and 89.49%, respectively. These recognition rates show that our proposed method outperforms competing methods and is stable and scalable. |
Modeling Emergent Group Processes from Verbal Interaction Dynamics PRESENTER: Marcin Jodłowiec ABSTRACT. Group interaction processes evolve over time and exhibit recurrent patterns of coordination, conflict, and integration. Although group development is often described in terms of stages or phases, such models remain largely conceptual and poorly grounded in observable interaction data. This paper presents an unsupervised framework for identifying emergent interaction archetypes from verbal communication dynamics. Group interaction is modeled as a multivariate temporal process in which utterance-level semantic features, derived from theoretically grounded interactional components, are temporally aggregated and segmented using change-point detection. Each segment is summarized by an archetypal configuration representing a stable organization of the group interaction field. Experiments on scenario-based group processes from the AMI Meeting Corpus show that a small number of archetypes consistently captures recurrent interactional configurations and that these configurations are not aligned with formal task-defined phases. The proposed approach enables process-level analysis of group dynamics without supervision or predefined stage models. |
An Association-based Multi-document Summarization Optimization Algorithm by Using the Non-dominated Sorting Genetic Algorithm II PRESENTER: Chun-Hao Chen ABSTRACT. The continual expansion of digital content presents a significant challenge in extracting and summarizing large amounts of textual information from multiple documents. Multi-document summarization plays a crucial role in enabling users to efficiently grasp key information. However, generating various high-quality summaries requires considering multiple conflicting objectives, such as ensuring content relevance, reducing redundancy, maximizing the coverage of essential information, and maintaining conceptual centrality. To address this issue, this paper proposes an association-based multi-document summarization optimization algorithm by using the non-dominated sorting genetic algorithm II, named AMDS-NSGAII. The objective functions used in the proposed approach include the relevance, redundancy, coverage, and association-based centrality factors. AMDS-NSGAII utilizes the non-dominated sorting and crowding distance mechanisms to evolve a population of candidate summaries, resulting in a diverse set of Pareto solutions that effectively balance multiple quality criteria. Experiments were conducted on the DUC2002 dataset, and the results demonstrate that the proposed method outperforms several existing multi-document summarization techniques in terms of ROUGE scores. |
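The non-dominated sorting step at the heart of NSGA-II can be sketched compactly. This is the simple O(n²)-per-front version rather than NSGA-II's bookkeeping-optimized fast sort, and it assumes all objectives are expressed in maximization form (e.g., relevance, coverage, centrality, and negated redundancy).

```python
def dominates(a, b):
    # a dominates b if a is no worse in every objective and strictly better in one
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def fast_nondominated_sort(points):
    # peel off successive Pareto fronts of candidate summaries
    fronts, remaining = [], list(range(len(points)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

The first front is the Pareto set of summaries no other candidate beats on all criteria; crowding distance (not shown) then spreads selection pressure within each front.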
ViQR-Decider: Lightweight Vietnamese Query Rewriting with Decider Tokens and Synthetic Data PRESENTER: Thanh Tho Quan ABSTRACT. Conversational search is an information retrieval paradigm that simulates natural human dialogue. In multi-turn interactions, each user query often depends on the context of previous turns, which challenges traditional retrieval systems. We introduce ViQR-Decider, a lightweight Vietnamese query rewriting (QR) model that transforms user queries into self-contained forms for effective retrieval. The model employs decider tokens, a controllable mechanism that determines whether rewriting is necessary, enabling adaptive trade-offs between accuracy and efficiency. ViQR-Decider is fine-tuned on a synthetic dataset automatically generated from Vietnamese Wikipedia, allowing training without expensive manual annotation. Experimental results demonstrate that ViQR-Decider achieves comparable or superior retrieval accuracy to LLM-based rewriters while being up to 25× smaller and 33–37% faster. These results highlight the potential of lightweight and controllable query rewriting for practical Vietnamese conversational search systems. |
Enhancing Speaker Verification with Whispered Speech via Post-Processing PRESENTER: Magdalena Gołębiowska ABSTRACT. Speaker verification is the task of confirming an individual's identity through the analysis of their voice. Whispered speech differs from phonated speech in its acoustic characteristics, which degrades the performance of speaker verification systems in real-life scenarios, such as when speakers avoid fully phonated speech to protect privacy or to avoid disturbing others, or when full vocalization is prevented by a medical condition. In this paper we propose a model with a training recipe to obtain representations that are more robust to the hindrances of whispered speech. The proposed system employs an encoder–decoder structure built atop a fine-tuned speaker verification backbone, optimized jointly using cosine similarity–based classification and triplet loss. We gain a relative improvement of 22.26% over the baseline (baseline 6.77% vs. ours 5.27%) in normal vs. whispered speech trials, achieving an AUC of 98.16%. In whispered-to-whispered tests, our model attains an EER of 1.88% with an AUC of 99.73%, a 15% relative enhancement over the prior leading ReDimNet-B2. We also offer a summary of the most popular and state-of-the-art speaker verification models in terms of their performance with whispered speech. Additionally, we evaluate how these models perform on noisy audio, finding that, in general, the same relative level of noise degrades speaker verification performance more on whispered speech than on normal speech. |
SegViT-Based Segmentation of Common Carotid Artery Ultrasound Images PRESENTER: Ching-Fu Xu ABSTRACT. The carotid artery is the main vessel supplying blood to the brain, and its obstruction or stenosis is a major cause of ischemic stroke. Carotid atherosclerotic plaques may lead to vascular narrowing and thrombosis, resulting in cerebral ischemia. Ultrasound imaging of the common carotid artery (CCA) provides a non-invasive technique to evaluate vessel wall thickness and plaque burden, supporting early detection of atherosclerosis and cardiovascular risk assessment. This article presents a deep learning–based method for accurate segmentation of the common carotid artery (CCA) region in ultrasound images. An attention-driven SegViT model is employed to automatically delineate the CCA, reducing the manual annotation workload and improving diagnostic efficiency and consistency. Experimental results show that cropping and enlarging the CCA region prior to recognition yields superior segmentation performance compared to direct processing of original images. The proposed method achieves a Precision of 96.81%, Recall of 96.65%, F1-score of 96.73%, Intersection over Union (IoU) of 93.68%, and mean IoU (mIoU) of 96.76%. These results demonstrate that the refinement strategy not only improves boundary delineation but also provides more stable and consistent performance across different dataset splits. |
Causal Time-Series GNN with XAI for Stock Market Fraud Detection PRESENTER: Thu Le ABSTRACT. Stock market fraud exhibits complex temporal dynamics, causal interdependencies between firms, and severe data gaps due to delayed disclosures. Traditional anomaly detectors fail under such conditions due to evolving normality, long-tail fraud types, and missing regulatory signals. This paper introduces Causal GNN, a novel method that integrates: (i) causal graph construction from regulatory enforcement chains; (ii) self-supervised pre-training via masked reconstruction and contrastive learning; (iii) ImputeGAP for temporal gap imputation under MNAR missingness; and (iv) multi-level explainable AI using SHAP, counterfactuals, and path tracing. Evaluated on a new dataset of 1,226 real-world fraud cases from Vietnam (SSC) and Taiwan (FSC), enriched with stock prices and filing delays up to September 2025, CausalTS-GNN achieves 0.847 AUROC and up to 42% relative improvement over unsupervised baselines on zero-shot fraud types. The dataset, source code, and an intelligent interactive XAI dashboard, demonstrating early fraud alerts to regulators and investors for proactive risk mitigation, are publicly released at: https://github.com/levominhthu/Causal-GNN. |
A Variational Graph Neural Network with Multi-View Integration for Predicting lncRNA–Disease Associations PRESENTER: Luong Hai Nguyen ABSTRACT. Long non-coding RNAs (lncRNAs) are critical regulators in diverse biological processes, and their dysregulation is implicated in numerous diseases. Identifying lncRNA-disease associations is crucial for understanding disease mechanisms and discovering therapeutic targets. However, experimental validation is labor-intensive and time-consuming, while existing computational methods struggle with network sparsity and inadequate integration of heterogeneous biological data. We propose a novel end-to-end deep learning framework that integrates attention-weighted multi-view graph convolutional networks (GCNs), a variational graph autoencoder (VGAE), and a gated multilayer perceptron to predict potential lncRNA-disease associations. The method constructs a unified heterogeneous graph from multiple similarity networks of lncRNAs, diseases, and miRNAs, along with known associations. This architecture first employs attention-weighted GCNs to extract deep topological features, then uses a VGAE to learn robust, probabilistic topology-aware node embeddings, and finally predicts associations through a specialized gated MLP predictor. Comprehensive evaluation on a primary benchmark dataset demonstrates good performance (AUC 0.9584, AUPR 0.9590), with a second dataset confirming generalizability, proving the approach effectively addresses data sparsity, heterogeneity, and class imbalance as a robust tool for prioritizing lncRNA-disease candidates. |
Brain Tumor Detection by Improving Deep Learning Techniques and Image Processing PRESENTER: Thong Nguyen Dinh ABSTRACT. Advanced deep learning techniques have shown remarkable success in medical image analysis, with many state-of-the-art methods that are being applied to brain tumor segmentation from MRI scans. Among these approaches, combining image processing with deep learning architectures has proven particularly effective. In this paper, we propose improved deep learning techniques that integrate image preprocessing and a hybrid CNN-transformer architecture for automated brain tumor segmentation. We first apply robust preprocessing steps including percentile-based normalization and data augmentation to enhance image quality. Next, we employ a U-Net-based architecture enhanced with CBAM (Convolutional Block Attention Module) for adaptive feature refinement and multi-scale fusion to capture hierarchical tumor structures. Our architecture combines the strengths of CNNs and transformers through four key contributions: (i) a hybrid encoder-decoder with residual blocks and CBAM attention for multi-scale feature extraction, (ii) a multi-scale transformer bottleneck that processes features at multiple patch sizes for better multi-resolution reasoning, (iii) a multi-component loss function combining Dice, focal, and IoU objectives with deep supervision to address class imbalance and metric alignment, and (iv) multi-scale feature fusion for enhanced localization. We evaluate on standard BraTS 4-class segmentation (background, necrotic core, edema, enhancing tumor) and report results on three hierarchical regions (WT, TC, ET). Our model achieves Dice scores of 0.9113, 0.8647, and 0.8642 for WT, TC, and ET, respectively, demonstrating strong performance with improved HD95 boundary precision metrics. The architecture also supports multi-task learning with a classification branch for tumor grade prediction, achieving 96.23% accuracy in HGG/LGG classification. |
| 16:00 | Energy-Based Motion Pattern Detection with Partial Distance Correlation PRESENTER: Kieu Phan ABSTRACT. An energy-based motion pattern detection framework integrated with partial distance correlation was proposed for skeleton-based action recognition under nuisance effects. Motion patterns were defined on fixed temporal windows of 30 consecutive frames and were represented by 3D joint sequences and kinematic descriptors for representation learning; the motion categories of Jumping, Sitting, Waving, and Throwing were used to construct the windowed dataset and train the model. Within the proposed framework, class evidence was derived from two complementary sources: an energy score was used to measure the compatibility between a query sample and each class, and a conditional dependence measure was used to quantify nonlinear similarity in motion dynamics after the context variable modeling confounding factors (e.g., subject variation, viewpoint changes, and measurement noise) had been controlled. The dependence scores were aggregated at the class level and normalized to a common scale for direct cross-class comparison, and were then combined with the energy scores through a geometric fusion rule to produce the final decision. The proposed method was evaluated on the NTU RGB+D skeleton dataset and was compared with kNN+DTW, a plain EBM baseline, EBM combined with distance correlation, and EBM combined with partial distance correlation. Experimental results showed that the EBM combined with partial distance correlation achieved the best performance, with an accuracy of approximately 0.99 and a macro-F1 of 0.78, while inter-class confusion was reduced relative to kNN+DTW. |
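The geometric fusion of energy scores and normalized dependence scores can be sketched as a weighted geometric mean. The min-max normalization, the exponential conversion of energies into compatibilities, and the equal weighting below are illustrative assumptions, not the paper's exact fusion rule.

```python
import math

def fuse_scores(energy_scores, dependence_scores, weight=0.5):
    # normalize per-class dependence scores to a common [0, 1] scale
    lo, hi = min(dependence_scores), max(dependence_scores)
    norm_dep = [(d - lo) / (hi - lo + 1e-12) for d in dependence_scores]
    # lower energy means higher class compatibility
    compat = [math.exp(-e) for e in energy_scores]
    # weighted geometric mean of the two evidence sources per class
    fused = [c ** weight * (d + 1e-12) ** (1 - weight)
             for c, d in zip(compat, norm_dep)]
    return fused.index(max(fused))  # predicted class index
```

A geometric (rather than arithmetic) combination requires a class to score well on both evidence sources, so a class with high dependence but poor energy compatibility cannot win outright.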
| 16:20 | ViHERMES: A Graph-Grounded Multihop Question Answering Benchmark and System for Vietnamese Healthcare Regulations PRESENTER: Long Nguyen ABSTRACT. Question Answering (QA) over regulatory documents is inherently challenging due to the need for multihop reasoning across legally interdependent texts, a requirement that is particularly pronounced in the healthcare domain where regulations are hierarchically structured and frequently revised through amendments and cross-references. Despite recent progress in retrieval-augmented and graph-based QA methods, systematic evaluation in this setting remains limited, especially for low-resource languages such as Vietnamese, due to the lack of benchmark datasets that explicitly support multihop reasoning over healthcare regulations. In this work, we introduce the Vietnamese Healthcare Regulations–Multihop Reasoning Dataset (ViHERMES), a benchmark designed for multihop QA over Vietnamese healthcare regulatory documents. ViHERMES consists of high-quality question–answer pairs that require reasoning across multiple regulations and capture diverse dependency patterns, including amendment tracing, cross-document comparison, and procedural synthesis. To construct the dataset, we propose a controlled multihop QA generation pipeline based on semantic clustering and graph-inspired data mining, followed by large language model–based generation with structured evidence and reasoning annotations. We further present a graph-aware retrieval framework that models formal legal relations at the level of legal units and supports principled context expansion for legally valid and coherent answers. Experimental results demonstrate that ViHERMES provides a challenging benchmark for evaluating multihop regulatory QA systems and that the proposed graph-aware approach consistently outperforms strong retrieval-based baselines. The ViHERMES dataset and system implementation are publicly available at https://github.com/ura-hcmut/ViHERMES. |
| 16:40 | Topology-Driven Rough Set Classification Using Ball Mapper Coverings for Healthcare Intelligence PRESENTER: Quang-Thinh Bui ABSTRACT. The covering-based rough set (CBRS) theory provides a flexible framework for data approximation and classification under uncertainty, where its effectiveness critically depends on the construction of coverings. Traditional approaches employing the Mapper algorithm often suffer from instability and parameter sensitivity due to their reliance on projections and clustering. This study establishes a novel direction for geometric rough set modeling with topological constraints by proposing a topology-driven classification framework that integrates the Ball Mapper algorithm into the CBRS theory, known as the Ball Mapper-based rough set (BMbRS). Furthermore, it enables the construction of projection-free coverings based solely on metric geometry. Within the BMbRS model, we examine all derived coverings in order to obtain a more comprehensive view of the underlying data structure, allowing us to identify the most suitable covering configuration for a given classification task. An experimental evaluation across five benchmark healthcare datasets demonstrates the advantages of the proposed approach. It achieves high average classification quality, with scores of 0.910 (Diabetes), 0.967 (Heart disease), 0.943 (Lung cancer), 0.971 (Fetal health), and 0.962 (Breast cancer). These outcomes indicate that the novel approach offers better interpretability and stability while maintaining a manageable number of parameters, especially in comparison with conventional Mapper-based methods. |
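The projection-free covering that Ball Mapper produces can be sketched with the standard greedy eps-net construction: pick landmark points so that every point lies within `eps` of some landmark, then form one (possibly overlapping) ball per landmark. Treating these balls as the covering for a covering-based rough set is the paper's idea; the Euclidean metric and greedy order here are simplifying assumptions.

```python
def ball_mapper_cover(points, eps, dist=None):
    # greedy eps-net: each point is covered by at least one ball of radius eps
    # centered at a landmark; the overlapping balls form the covering
    if dist is None:
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    landmarks = []
    for p in points:
        if all(dist(p, l) > eps for l in landmarks):
            landmarks.append(p)
    balls = [[i for i, p in enumerate(points) if dist(p, l) <= eps]
             for l in landmarks]
    return landmarks, balls
```

Varying `eps` yields the family of coverings the paper examines to pick the configuration best suited to a given classification task.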
| 17:00 | Energy-Based Video Captioning with Statistical Energy Distance: A Unified Framework PRESENTER: Trung Le ABSTRACT. Video captioning is a central vision–language task for accessibility, retrieval and human–computer interaction, yet state-of-the-art encoder–decoder models trained only with token-level cross-entropy often exhibit weak semantic grounding, sensitivity to noisy web videos and spurious multimodal correlations. These limitations are particularly evident when audio carries strong but potentially misleading cues. In this work, we propose a new unified framework, termed EBM-EoD-VC, which augments a standard Transformer-based video captioning pipeline with both sample-level energy-based modeling and distribution-level energy statistics. Concretely, global video, audio and caption embeddings are coupled with a lightweight energy head and optimized using a joint objective that combines cross-entropy with a margin-based energy compatibility loss and three distance-based criteria: Energy Distance to encourage distribution matching, Distance Correlation to promote nonlinear dependence, and Partial Distance Correlation to explicitly control audio as a potential confounder. Extensive experiments on MSVD and MSR-VTT indicate that, while a carefully tuned cross-entropy baseline already attains competitive performance on the compact MSVD benchmark, the proposed EBM-EoD-VC framework yields consistent and reproducible gains in CIDEr and SPICE on the larger and noisier MSR-VTT dataset, especially when audio conditioning is enabled. Ablation studies and alignment diagnostics further suggest that energy-statistics regularization can tighten video–caption alignment without architectural complexity, making it a promising ingredient for future multimodal captioning systems. |
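The Energy Distance criterion used for distribution matching is a standard statistic: 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, which is zero when the two distributions coincide. A minimal sample-based (V-statistic) sketch over scalar embeddings:

```python
def mean_pairwise(xs, ys, dist):
    # average distance over all cross pairs
    return sum(dist(a, b) for a in xs for b in ys) / (len(xs) * len(ys))

def energy_distance(X, Y, dist=lambda a, b: abs(a - b)):
    # sample energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|;
    # vanishes when the two samples have the same distribution,
    # so minimizing it pulls embedding distributions together
    return (2 * mean_pairwise(X, Y, dist)
            - mean_pairwise(X, X, dist)
            - mean_pairwise(Y, Y, dist))
```

In the captioning setting X and Y would be batches of video and caption embeddings with a Euclidean `dist`; the scalar version here only illustrates the statistic's behavior.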
| 16:00 | Cross-Quantized Hyperbolic Representations for Enhancing Cartoon Image Retrieval PRESENTER: Thanh-Hung Nguyen ABSTRACT. Approximate Nearest Neighbor search via Product Quantization has become a popular approach for large-scale unsupervised image retrieval. However, existing systems primarily focus on real-world domains and lack evaluation in animation and cartoon contexts, where hierarchical semantic structure plays a crucial role in creative workflows. In animation production, artists often require flexible retrieval tools that can suggest visually coherent yet stylistically diverse content based on a given frame. To address this gap, we propose the Cross-Quantized Hyperbolic (CQH) retrieval approach, which captures hierarchical semantic similarity within the context of hyperbolic geometry. In addition, we contribute Cartoon18K, the first large-scale dataset of cartoon images with multi-label hierarchical annotations. Extensive experiments on Flickr25K, NUS-WIDE, and CIFAR-10 demonstrate that CQH outperforms state-of-the-art unsupervised baselines. This accomplishment underscores the efficacy of our approach and contributes to the advancement of cartoon retrieval amidst the rapid proliferation of creative multimedia content. |
| 16:20 | PRESENTER: Mandira Neog ABSTRACT. The increasing use of regional languages in digital communication has intensified the need for accurate, context-aware spelling detection tools that can handle the language’s rich morphology and complex orthographic patterns. This paper presents a transformer-based framework for Assamese spelling error detection, utilizing three multilingual pre-trained models—IndicBERT v2, XLM-RoBERTa, and mBERT—fine-tuned on a curated dataset of 115,925 word-level samples. Each word is represented as a tuple (word, label, corrected_word), enabling both binary error detection and lexicon-assisted correction. The models were trained using standardized preprocessing, subword tokenization, and a weighted loss function to address class imbalance. Experimental results demonstrate that XLM-RoBERTa achieves the highest detection performance, with 94.84% accuracy and an F1-score of 0.9683, while IndicBERT v2 attains the best correction accuracy (85.96%), highlighting complementary strengths across the two architectures. A hybrid pipeline that combines neural detection with a Levenshtein-based nearest-neighbor correction mechanism further enhances real-world applicability. The findings underscore the effectiveness of transformer models for Assamese spelling detection and lay the groundwork for future advances in contextual correction, real-world error handling, and deployment in educational, social media, and content moderation applications. |
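The Levenshtein-based nearest-neighbor correction step in this hybrid pipeline can be sketched directly: once the neural detector flags a word, pick the lexicon entry at minimum edit distance. The tie-breaking by lexicon order is an assumption of this sketch.

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance (insert/delete/substitute)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct(word, lexicon):
    # nearest-neighbor correction: closest lexicon entry by edit distance
    return min(lexicon, key=lambda w: levenshtein(word, w))
```

In practice the search would run over an Assamese lexicon with Unicode-aware tokenization; restricting candidates by length or first character keeps the scan tractable at scale.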
| 16:40 | PRESENTER: Kim-Son Nguyen ABSTRACT. Context management remains a critical bottleneck in multi- agent Geospatial Artificial Intelligence systems, limiting task completion rates and causing context overflow in complex workflows. We propose ContextGeo, a systematic Context Engineering framework that integrates four Retrieval-Augmented Generation strategies through five specialized agents coordinated by a Spatial Context Manager. The framework introduces three novel mechanisms: Hierarchical Spatial Memory for structured multi-scale retrieval, geospatial-aware compression that preserves topological relationships, and Temporal Context Versioning for maintaining spatial-temporal consistency across workflow stages. Through comprehensive experiments on 150 real-world geospatial tasks spanning three complexity levels, ContextGeo demonstrates substantial performance improvements with 91.2% Task Completion Rate (+34.7 percentage points over state-of-the-art), 87.6% Spatial Analysis Accuracy (+28.5 percentage points), and 42.3% token reduction, all statistically significant at p < 0.001. These results provide empirical evidence that systematic Context Engineering can address fundamental limitations in multi-agent Geospatial Artificial Intelligence, with immediate implications for urban planning, environmental monitoring, and disaster response applications. |
| 17:00 | Adaptive Online Concept Drift Detection in Process Mining using GAN-based Representation Learning ABSTRACT. Detecting concept drift in high-velocity event streams is critical for maintaining the reliability of process monitoring systems. However, traditional approaches often rely on rigid statistical tests or handcrafted features, struggling to capture complex, evolving process behaviors. To address this limitation, this paper proposes a novel adversarial representation learning framework for adaptive online drift detection. The proposed framework leverages a Generative Adversarial Network (GAN) where the discriminator functions as a dynamic anomaly scorer, capable of identifying distributional shifts without explicit supervision. Crucially, to mitigate catastrophic forgetting during model updates, the framework incorporates an experience replay mechanism that retains a buffer of recent normal traces. Extensive experiments on eight synthetic datasets demonstrate that this approach achieves near-perfect recall and reduces detection latency by over 50% compared to the state-of-the-art PrefixCDD baseline. Furthermore, evaluations on the real-world BPI Challenge 2017 log highlight the model’s robustness to noise and irregular human behaviors, yielding significant improvements in precision and F1-score while maintaining real-time efficiency. |
| 16:00 | Exploring Large Language Models' Ability in Text Correction and Restoration for Vietnamese Hate Speech Detection PRESENTER: Hoang-Quan Tran ABSTRACT. This paper addresses the challenge of Vietnamese Hate Speech Detection (HSD) on social media, where non-standard and noisy text reduces model effectiveness by obscuring meaning. We explore the use of large language models for text restoration and correction (standardization), employing few-shot prompting with Gemini 2.0-Flash and Qwen3 in the context of Vietnamese HSD. The experiments demonstrate that LLM-based standardization, particularly with Gemini 2.0-Flash, improves downstream model performance by increasing the Macro-F1 score from 66.66% on non-standardized data to as high as 68.30%. This result reveals a prominent aspect of LLM-driven standardization as a practical and effective direction for advancing Vietnamese HSD. |
| 16:05 | PRESENTER: Przemysław Spurek ABSTRACT. Creating realistic and controllable 3D facial avatars is a significant challenge in computer graphics. While traditional methods are often difficult and time-consuming, emerging technologies like Neural Radiance Fields (NeRFs), Gaussian Splatting (GS), and deepfake methods offer new possibilities. However, a way to combine them to create plausible 3D avatars is still needed. Consequently, we developed the ImplicitDeepfake framework to address this gap. It works by first applying a 2D deepfake algorithm to individual training images. Then, a neural rendering model (either NeRF or GS) is trained on these altered images. This process allows a consistent 3D avatar to emerge implicitly from the 2D data, avoiding complex direct 3D manipulation. This method can create high-fidelity static and dynamic 3D avatars. We also expand the framework to allow for stylistic and semantic adjustments using diffusion models like Stable Diffusion with ControlNet via text prompts. By combining these emerging technologies, our method offers a functional approach to the next generation of digital avatar creation. |
| 16:10 | A Hybrid Method for Emotions and Sarcasm Detection in Polish Language PRESENTER: Monika Kontna-Zygowska ABSTRACT. In an era of increasing user activity on social media, the automatic detection of emotions and sarcasm in online communication is gaining importance. The aim of this study was to develop a hybrid method for classifying emotions, sarcasm, and sentiment in content published in Polish on platform X (formerly Twitter). The proposed solution combines a transformer-based language model (HerBERT) with lexical analysis, utilizing an Emotion and Sentiment Dictionary based on information drawn from plWordNet. Research conducted on a Polish-language dataset showed that the hybrid approach using the dictionary improves classification precision compared to methods relying only on transformers. Additionally, data balancing techniques contributed to a slight improvement in the F1-Score metric of the HerBERT-base model on the dataset. |
| 16:15 | How Smart is Smart Enough? Benchmarking LLMs with Embedding-Based Similarity in Python Code Generation PRESENTER: Dominik Palla ABSTRACT. The growing capabilities of generative AI, particularly Large Language Models (LLMs), are reshaping software development by enabling automated code generation. This study presents a comparative evaluation of state-of-the-art models—OpenAI GPT-4.5 Preview, GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-3.5 Turbo, GPT-o1, GPT-o3 Mini; Google’s Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, 2.0 Flash Lite; Anthropic’s Claude 3 Opus, 3 Sonnet, 3 Haiku, 3.5 Sonnet, 3.5 Haiku, 3.7 Sonnet; and Meta’s LLaMA 3.0 and 3.1 8B Instruct—across ten Python programming tasks of varying complexity. Model outputs were assessed using an embedding-based semantic similarity metric against expert-crafted reference solutions. Results show that top performers like GPT-4.5 Preview and GPT-4o Mini achieve consistently high similarity scores, while LLaMA 3.1 8B ranks lowest. Interestingly, complex tasks yielded higher similarity, possibly due to more structured outputs. The findings highlight the strengths and limitations of current LLMs and advocate for complementary evaluation criteria such as execution correctness and efficiency in practical use. |
| 16:20 | Solving Soft-Rectangle Packing with Guillotine Constraints for Field Allocation in Vietnamese Agriculture PRESENTER: Quoc-Trung Bui ABSTRACT. The soft-rectangle packing problem with guillotine constraints seeks to partition a fixed-shape rectangle of length L and height W into n soft rectangles of given areas using two-stage guillotine cuts, with the objective of minimizing the sum of the differences between the lengths of the two sides of all soft rectangles. This problem arises in Vietnamese agricultural land allocation and the optimization of matrix multiplication algorithms. We prove that this problem is NP-hard. We also propose two mixed-integer linear programming models for the problem: one model is used to develop a branch-and-cut algorithm, and the other is used to solve the problem via a column generation approach. The proposed algorithms and a state-of-the-art heuristic algorithm were evaluated through experiments on near-real-world instances, yielding valuable insights. |
| 16:25 | Comparative Analysis of Seven YOLO Architectures Applied for the Detection of Objects of Different Scale at Different Scene Densities PRESENTER: Farid Kazimov ABSTRACT. Selecting the most effective lightweight YOLO model for object detection remains a complex task, especially when object scale and scene density vary significantly. This study systematically compares seven nano/tiny YOLO architectures (v5n, v6n, v8n, v9t, v10n, 11n, 12n) trained and evaluated under identical conditions using a unified Ultralytics pipeline. Three custom datasets were designed to represent distinct visual challenges: small objects (bees), medium-scale objects (sheep), and dense, occluded human heads. Both quantitative and qualitative analyses revealed that model superiority shifts with object scale and crowding, and that conventional accuracy metrics such as mAP may fail to reflect real-world detection reliability. While most prior studies focus on a single dataset or rely solely on numerical accuracy, our work introduces a multiscale, multi-density benchmark that bridges statistical evaluation with visual error analysis. This approach exposes hidden performance tradeoffs between detection precision and contextual robustness. The findings underscore the importance of dataset-specific validation and highlight that mAP alone is insufficient for comprehensive model assessment in lightweight YOLO deployment. |
| 16:30 | GMPA: Enhancing Continual Learning with Rehearsal-based Method via Gaussian Mixture Prototype Augmentation PRESENTER: Quynh-Trang Pham Thi ABSTRACT. Continual learning addresses the fundamental challenge of catastrophic forgetting, wherein models significantly decrease in performance on previously learned tasks when acquiring new knowledge. The rehearsal-based method is an effective approach for mitigating this problem by replaying samples from old tasks while learning new tasks. In this paper, we propose the GMPA model, which uses Gaussian Mixture Prototype Augmentation to generate exemplars from the approximate distribution of previously learned classes for rehearsal during training on new tasks. The GMPA parameters are stored in memory and used to create augmented features proportional to the current task’s class distribution, maintaining a balanced representation between old and new classes. The experimental results on various benchmark datasets, including CIFAR-100, ImageNet-subset, CUB-200, Stanford-cars, and Food-101, demonstrate that our method achieves state-of-the-art performance across various continual learning scenarios. |
| 16:35 | IOF-DP: Iterative Inner–Outer Fusion with Dual-Projection for Exemplar-Free Class Incremental Learning PRESENTER: Quynh-Trang Pham Thi ABSTRACT. Exemplar-free class-incremental learning (EFCIL) trains models on a sequence of tasks without storing past samples, which is crucial under privacy or memory constraints. However, EFCIL remains challenging due to (i) catastrophic forgetting of old knowledge, (ii) semantic drift of features when the backbone adapts to new tasks, and (iii) decision bias favoring recently learned classes. We propose IOF-DP, an iterative inner-outer task-vector framework with importance score-based fusion and a DPCR-based calibration stage. At each mini-batch, an inner step adapts to new classes, while an outer step distills to the previous model, yielding two displacement vectors; importance scores provide parameter-wise soft masks to fuse them and balance plasticity vs. stability. After fusion, we adopt Dual-Projection to align old statistics and ridge-based reconstruction to rebalance the classifier. Experiments on standard EFCIL benchmarks show that IOF-DP achieves a superior stability–plasticity trade-off and improves overall accuracy over strong baselines. Code is available at: https://github.com/Ngahoang2005/IOF-DP. |
| 16:40 | PRESENTER: Van Tham Nguyen ABSTRACT. Merging probabilistic knowledge bases (PKBs) is an essential task in knowledge representation and reasoning, as it enables the merging of information from multiple sources. Dempster-Shafer (D-S) theory provides a powerful framework for knowledge merging, enabling reasoning and decision-making with incomplete or conflicting information. However, how to employ fusion rules derived from Dempster's rule for merging PKBs is still an open issue. This paper examines the logical relationship between classical probability theory, which represents PKBs through probabilistic constraints, and D-S theory, which represents them using basic probability assignment functions (BPAs). Several fusion rules are investigated, including the Dempster rule (DSR) and generalized combination rules (GCR, mGCR). In addition, basic belief functions (BBFs) are studied as a means to refine BPAs and enhance merging quality. Based on these rules and BBFs, methods for merging PKBs are developed, incorporating both single-rule and multi-rule strategies, and algorithms implementing these methods are also proposed. Finally, simulation results, along with discussion, are provided to validate the effectiveness of the proposed methods. |
| 16:45 | An Explainable Framework for Brain Connectivity-Based Autism Diagnosis PRESENTER: Imen Hmida ABSTRACT. Recent advancements in deep learning have significantly contributed to improving the accuracy of Autism Spectrum Disorder (ASD) diagnosis using functional MRI (fMRI) data. However, existing methods either use all the connections derived from fMRI or face challenges in selecting the most relevant connections for classification. In this study, we propose a structured and explainable framework that integrates connectivity selection and attention-based modeling for ASD diagnosis. The Transformer-Encoder model, known for its ability to capture long-range dependencies in complex data, is used here to analyze the fMRI connectivity data. The Random Forest algorithm is employed for automatic brain connection selection, ranking and refining the set of connections according to their feature importance scores to retain the most discriminative ones. Experimental results show that our approach achieves an accuracy of 76.29%, surpassing leading methods in the literature and highlighting its effectiveness in providing a reliable and robust ASD diagnosis. |
| 16:50 | LUMOS: LLM-Driven Contrastive Federated Sequential Recommendation PRESENTER: Minh Hieu Nguyen ABSTRACT. Federated sequential recommendation (FedSeqRec) enables privacy-preserving next-item prediction by training models locally on user devices, but its effectiveness is often limited by the short, noisy, and low-diversity interaction sequences available on each client. Existing FedSeqRec methods typically rely on handcrafted augmentations or server-side regularization, which either provide weak self-supervised signals or introduce additional communication and system complexity. In this paper, we present LUMOS, a parameter-transmission-free FedSeqRec framework that leverages large language models (LLMs) as on-device semantic augmenters. For each user, LUMOS privately prompts an LLM to synthesize three types of behavioral views from the original interaction history: (i) predictive future sequences that extrapolate likely next actions, (ii) intent-preserving paraphrased sequences that maintain preferences while varying surface behavior, and (iii) counterfactual sequences that are inconsistent with the user's interests. These LLM-generated views are encoded by the federated backbone and coupled via a tri-view contrastive objective, yielding richer and more discriminative user representations without exposing raw data or model parameters. Comprehensive experiments on three real-world benchmarks demonstrate that LUMOS consistently outperforms strong centralized and federated baselines in terms of HR@20 and NDCG@20. Moreover, by learning from semantically grounded positives and counterfactual negatives, LUMOS exhibits improved robustness under noisy and adversarial training conditions, despite requiring no explicit server-side defense mechanism. Our results highlight LLM-driven behavioral synthesis as a promising direction for enhancing privacy-preserving federated recommendation. |
| 16:55 | Fine-tuning of Multilingual Language Models for Low-Resource Smishing Detection using LoRA PRESENTER: Natalia Krawczyk ABSTRACT. Smishing, or SMS phishing, is a fast-growing cybersecurity threat that targets mobile users through deceptive text messages. Detecting smishing in low-resource languages remains challenging because labeled datasets are limited, and training language models for each language is computationally expensive. This paper presents the use of Low-Rank Adaptation (LoRA) for efficient fine-tuning of multilingual models in such settings. We use XLM-RoBERTa Base as the base model and evaluate three training strategies: a monolingual English baseline, full fine-tuning, and LoRA-based fine-tuning. In the experiments, we use English, Bengali and Swahili datasets, representing both high- and low-resource conditions. Models are trained with small, incrementally sized subsets (50–250 samples per language) to simulate realistic data scarcity. The results show that while full fine-tuning achieves the best overall accuracy, LoRA performs competitively, reaching up to 97% F1 with only a small fraction of trainable parameters. LoRA's performance improves steadily with additional labeled samples, offering a strong balance between efficiency and effectiveness. These findings demonstrate that LoRA is a practical alternative for multilingual smishing detection in low-resource contexts, enabling cost-effective adaptation of large language models. |
| 17:00 | Out-of-Time Analysis of the Stability of Machine Learning Techniques for Churn Management in Non-Contractual Businesses PRESENTER: Adriana dos Reis ABSTRACT. Churn is a metric widely used by companies to quantify customer attrition, defined as the event in which a customer terminates a contractual relationship or discontinues purchasing a company’s products or services. This work addresses the prediction of churn in B2B customers in the agribusiness sector, utilizing machine learning techniques applied to a real dataset with a three-year transaction history. The study evaluates three algorithms that are widely used in the field: Logistic Regression, Random Forests, and XGBoost. The main objective is to analyze the predictive stability of these models over time (Out-of-Time analysis), without re-training, verifying their ability to maintain consistent performance on future data. The predictive quality was gauged utilizing the AUC-ROC metric, supplemented by a 10-fold cross-validation technique to assess the robustness of the models. The results revealed that the Random Forest model exhibited enhanced stability and superior overall performance, with AUC-ROC varying between 0.849 and 0.877 in the four quarters analyzed, while Logistic Regression and XGBoost showed greater variability. The Student's t-test confirmed a statistically significant difference between RF and LR (p ≤ 0.05). The conclusions of this analysis serve to fortify the practical applicability of the proposed methodology. This applicability manifests in the form of a reduction in operating costs associated with frequent model maintenance and the facilitation of customer retention strategies. |
| 17:05 | Effect of Training Window Length on the Performance of Machine Learning Models for PM10 Forecasting PRESENTER: Przemysław Juszczuk ABSTRACT. Accurate forecasting of PM10 concentrations is essential for effective air quality management; however, the influence of historical training window length on the performance of machine learning models remains poorly understood. This study investigates how the amount of historical data used for model training affects PM10 prediction accuracy. Using hourly meteorological and air quality data from six urban locations in southern Poland, models were trained with progressively extended historical windows and evaluated separately for the cold and warm seasons. Model performance was assessed using the Mean Absolute Error and the coefficient of determination. The results indicate that extending the training window does not consistently improve predictive performance. In most cases, the highest accuracy was achieved using relatively short training histories, while longer data records often led to performance degradation. These findings highlight the importance of data recency and seasonal effects in data-driven air quality forecasting. |
| 17:10 | SHAP-Guided LightGBM Classification of Neuropathic EMG Signals PRESENTER: Behçet Uğur Töreyin ABSTRACT. Accurate identification of neuropathies from electromyography (EMG) is crucial for automated diagnosis and future wearable screening systems. In this work, invasive EMG signals were processed to extract a 17-dimensional feature vector and classified as Healthy or Neuropathy using Light Gradient Boosting Machine (LightGBM). Model robustness was ensured through stratified 5-fold cross-validation and repeated evaluation over 100 random dataset splits. SHapley Additive exPlanations (SHAP) were then applied to assess feature relevance and interpretability. Based on the SHAP ranking, we introduced a SHAP-guided Iterative Feature Elimination (SHIFE) strategy, which removes features according to their estimated importance. This approach was compared with an unguided Iterative Feature Elimination (IFE) baseline that evaluates multiple feature combinations at each reduction step. Both methods improve performance with respect to the full feature set: IFE reduces the feature vector to 6 features and increases mean accuracy and AUC, while SHIFE reduces it to 7 features, preserving accuracy and improving AUC. |
| 17:15 | Exploring Action Selection Methods in User-Driven Image Generation PRESENTER: Urszula Markowska-Kaczmar ABSTRACT. User-driven image generation can be a valuable solution for many engineering tasks. In this work, we investigate how such techniques can support architects in designing Art Nouveau façades for tenement buildings. The proposed approach combines user-guided image generation with reinforcement learning. To overcome the challenge of specifying reward functions, we introduce a reward network trained on user preferences. We explore strategies for image generation, preference-pair formation, and reward-network optimization. Experiments were conducted on the CelebA and NeoFacade datasets. Given the high cost of human evaluation, we also examine the use of artificial preferences to improve efficiency. Our findings provide insights into preference learning for user-driven image generation. |
| 17:20 | GRIT-R: A Graph–Text Fusion Framework with Gated Integration for Rumor Detection on Social Media PRESENTER: Bay Vo ABSTRACT. Rumor detection on social media is challenging due to short text length, ambiguous language, and strong dependence on propagation context. Text-based models often overlook social interaction signals, while graph-based models struggle to capture linguistic semantics. To address this, we propose GRIT-R (Graph–Text Rumor Integration with Gating), which integrates representations from a transformer-based language encoder and a graph attention network through a gated fusion mechanism, allowing dynamic adjustment between the two modalities. On the benchmark Twitter15 dataset, our approach achieves an Accuracy and macro-F1 of 0.923, surpassing baseline models and reaching state-of-the-art performance. |
| 17:40 | PRESENTER: Indra Aulia ABSTRACT. This study presents a theoretical-informational analysis of URL characteristics to identify the most relevant and least redundant attributes for phishing detection in the Indonesian top-level domain (.id). Using Mutual Information (MI), Information Gain (IG), and Maximum Relevance with Minimum Redundancy (MRRM), this study systematically evaluates 83 lexical, structural, and entropy-based features extracted from legitimate and phishing URLs in the Indonesian domain space. The results reveal distinct patterns that consistently characterize local phishing attempts. Higher relevance scores are dominated by path-based attributes (such as path depth, symbol frequency, and digit concentration), indicating the attackers' strong reliance on deeply nested and irregular directory structures. Entropy-based features in URL components, domains, and paths also prove prominent, reflecting the widespread use of scrambled and obfuscated lexical sequences as key evasion strategies. Further optimization by MRRM indicates that some highly relevant features exhibit redundancy, while certain host-level descriptors retain unique discriminatory value. These findings offer a solid, data-driven foundation for understanding the structural signals most closely associated with phishing behavior in Indonesian TLDs. By mapping the relevance-redundancy landscape of key URL attributes, this study lays the groundwork for the future development of lightweight, interpretable, and feature-based phishing detection models specifically calibrated for the Indonesian cyber ecosystem. |
| 16:00 | Towards plausibility in time series counterfactual explanations PRESENTER: Marcin Kostrzewa ABSTRACT. We present a new method for generating plausible counterfactual explanations for time series classification problems. The approach performs gradient-based optimization directly in the input space. To enforce plausibility, we integrate soft-DTW (dynamic time warping) alignment with k-nearest neighbors from the target class, which effectively encourages the generated counterfactuals to adopt a realistic temporal structure. The overall optimization objective is a multi-faceted loss function that balances key counterfactual properties. It incorporates losses for validity, sparsity, and proximity, alongside the novel soft-DTW-based plausibility component. We conduct an evaluation of our method against several strong reference approaches, measuring the key properties of the generated counterfactuals across multiple dimensions. The results demonstrate that our method achieves competitive performance in validity while significantly outperforming existing approaches in distributional alignment with the target class, indicating superior temporal realism. Furthermore, a qualitative analysis highlights the critical limitations of existing methods in preserving realistic temporal structure. This work shows that the proposed method consistently generates counterfactual explanations for time series classifiers that are not only valid but also highly plausible and consistent with temporal patterns. |
| 16:20 | Conflict-Based Classification with Parameterized Coalition Thresholds: Unified and Diverse Approaches for Dispersed Data PRESENTER: Jakub Sacewicz ABSTRACT. The paper addresses the challenge of classifying dispersed data collected from independent sources. The proposed approach constructs ensembles of random trees for local datasets and applies conflict analysis to identify coalitions of models with similar or diverse predictions. Two strategies for forming global decisions are examined: unified coalitions (grouping models with similar opinions) and diverse coalitions (combining models with differing opinions). Each strategy is evaluated in weighted and unweighted variants, where weights are based on validation accuracy. The key novelty of this work lies in introducing parameterization of the threshold that determines when local models are considered to form a coalition or to be in conflict within Pawlak's conflict model. This parameter is dynamically optimized using a random search procedure, which significantly influences coalition structure and classification performance. To the best of our knowledge, systematic exploration of parameter tuning in conflict-based coalition frameworks (for unified and diverse coalitions) has not been previously addressed. The proposed method is assessed on multiple public datasets and compared with traditional ensemble and optimization strategies, demonstrating improved accuracy and robustness. These findings highlight the importance of adaptive coalition formation and conflict parameter optimization in distributed learning environments. |
| 16:00 | New Mixed-Integer Linear Programming Formulations with Redundant Variables for Elementary Shortest Path and Quorumcast Routing Problems PRESENTER: Quoc-Trung Bui ABSTRACT. The Elementary Shortest Path (ESP) problem on graphs with negative cycles and the Quorumcast Routing (QR) problem are two important NP-hard routing combinatorial optimization problems in operations research. The ESP problem seeks an elementary shortest path between two given nodes such that no node appears more than once, while the QR problem computes a minimum-cost tree that includes a predefined root node and at least a specified number of multicast nodes from a given set. Currently, Mixed-Integer Linear Programming (MILP) represents the state-of-the-art exact approach for both problems. This research proposes new MILP formulations by incorporating redundant variables and constraints into existing state-of-the-art formulations. Experimental results demonstrate that algorithms implementing these formulations using the CPLEX solver are faster and require less memory than existing methods. |
| 16:20 | Infrastructure Abstract Graph for Modeling and Comparative Evaluation of Optimization Methods PRESENTER: Artur Basiura ABSTRACT. Road lighting is a key factor for traffic safety and visual comfort, but also represents a significant share of urban energy consumption. Numerous optimization methods have been proposed to design more efficient installations; however, their results are difficult to compare because they rely on different input models, lighting standards, and problem formulations. Thus, a common formal representation is needed that enables comparison not only of optimization algorithms but also of their normative and contextual assumptions. This paper introduces the Infrastructure Abstract Graph (IAG), a typed attributed graph model that can represent the road lighting infrastructure in a way that is independent of any particular optimization method. The approach is illustrated in a real case study, where an existing urban lighting network is mapped to IAG and three different optimization strategies are applied. The example demonstrates how IAG supports systematic comparison of methods and the resulting design configurations. |
| 16:40 | PRESENTER: Paweł Lorek ABSTRACT. The Expectation-Maximization (EM) algorithm is a standard method for parameter estimation in Gaussian mixture models (GMMs). However, in high-dimensional settings with limited data, the classical EM algorithm often suffers from numerical instabilities due to ill-conditioned or singular covariance estimates. To address this issue, regularized variants of EM often introduce shrinkage of covariance matrices toward predefined target structures. Yet, their performance remains highly sensitive to the choice of the regularization parameter. In this paper, we propose a new gradient-based approach to adaptively update the regularization parameter during the iterations of the EM algorithm. Our method, denoted GMM-grad, leverages automatic differentiation to continuously adjust the penalization strength, in contrast to grid-search strategies. We provide a comprehensive empirical evaluation on both synthetic and real datasets, including text and image data, demonstrating that the proposed method achieves more stable and accurate clustering in challenging scenarios. |
| 17:00 | Haar Decomposition with Cross Attention for Time Series Anomaly Detection PRESENTER: Duy Nguyen ABSTRACT. Time series anomaly detection remains challenging due to non-stationary dynamics, multi-scale patterns, and scarce labeled anomalies. Existing deep models often struggle to learn reliable normal behavior, particularly for short and low-dimensional time series. We propose an unsupervised architecture that combines a fixed one-level Haar wavelet decomposition with a shared-parameter bidirectional cross-attention encoder and branch-specific Temporal Convolutional Network (TCN) decoders. The Haar transform decomposes each input window into low-frequency (trend) and high-frequency (transient) components, while cross-attention mutually conditions the two bands to expose inconsistent dynamics that may indicate anomalies. Experiments on six public benchmarks show that our method outperforms twelve representative baselines on five datasets, achieving F1 scores of 100% on UCR and 2D-Gesture, 99.39% on Power Demand, 99.09% on ECG, and 99.85% on SMD, while remaining competitive on MSL (95.09%). The source code and pretrained checkpoints are available at https://github.com/NNNguyenDuyyy/ACIIDS_2026_Anomaly_Timeseries_Detection.git. |
| 16:00 | DOM-GraphIE: HTML-Aware GNNs for Web Information Extraction PRESENTER: Nguyen Thi Thuy Loan ABSTRACT. We introduce DOM-GraphIE, a web-native framework for joint entity and relation extraction that combines a Transformer cross-encoder with a graph neural network (GNN) over HTML Document Object Model (DOM) graphs enhanced with inter-page links. Raw HTML is parsed into DOM nodes and hyperlink edges, a tensor builder extracts node features and anchor pairs, and a fusion module integrates the Transformer’s pooled sequence representation with GNN embeddings of the anchor nodes and a pooled subgraph embedding. On three benchmarks (DocRED, CDR, and DWIE), DOM-GraphIE outperforms strong Text-only and GNN-only baselines, achieving F1 (positive) scores of 0.9026 on DocRED, 0.7318 on CDR, and 0.8886 on DWIE. Beyond accuracy, the model provides evidence nodes that highlight DOM regions supporting each decision, enhancing interpretability. These results demonstrate that combining deep semantic encoding with DOM-level message passing is an effective approach for document-level relation extraction on the web. |
| 16:20 | Secure Semantic Communications with Adversarial Training and Active Eavesdropper PRESENTER: Phuong Vo ABSTRACT. Semantic communication is gaining popularity in wireless image transmission systems due to its high efficiency. However, this efficiency comes with security drawbacks. It makes transmissions easier for eavesdroppers to capture and gives adversaries opportunities to inject misleading content. These weaknesses expose the current semantic communication systems to both eavesdropping and semantic attacks. To mitigate these threats, this paper proposes a secure framework that combines the Swin Transformer-based Joint Source-Channel Coding (SwinJSCC) architecture with two defense mechanisms: adversarial training and physical-layer artificial noise (AN) injection. The adversarial training improves resilience against over-the-air attacks by training the system on disturbed signals. Meanwhile, AN injection jams the eavesdropper’s reception without affecting legitimate receivers. This dual strategy protects semantic communications from unauthorized interception while preserving robust reconstruction quality. Extensive simulations show that the proposed secure SwinJSCC system achieves superior performance compared to baseline and traditional methods at high signal-to-noise ratio (SNR) and maintains comparable performance at low SNR. These results confirm that integrating adversarial training with AN injection effectively secures semantic communications without compromising transmission efficiency. |
| 16:40 | WSUI: An Efficient Weighted Sequential Pattern Mining Algorithm Using Indices and Frequency-Based Weighting PRESENTER: Thi-Thiet Pham ABSTRACT. Sequential pattern mining discovers frequently occurring ordered patterns but treats all items as equally important. Weighted sequential pattern mining (WSPM) overcomes this drawback by taking the importance of items into consideration through corresponding weights. However, currently available approaches suffer from high memory and computational costs. Motivated by this, we propose WSUI, which combines memory-efficient pseudo-IDList structures with strict upper bounds and automatic frequency-based weight assignment. Our method guarantees pattern completeness by keeping track of all possible ends for i-extensions and taking advantage of pseudo-IDLists in s-extensions. Experiments on three real-world datasets show that WSUI consistently outperforms the state-of-the-art EWSPM. It provides substantial speed-up while maintaining exhaustive pattern discovery and competitive memory consumption. |
| 17:30 | Meeting point: Howard Plaza Hotel, Kaohsiung |