Program for Thursday, November 20th

09:00-10:00 Session 14: Keynote 2

Location: RHLT1

09:00

Kate Smith-Miles

The search for quantum advantage in optimisation: myths, maths, and the travelling salesman problem

ABSTRACT. Over the last two decades, the travelling salesperson problem (TSP) has been used as a benchmark problem to explore the advantage of quantum computers over conventional computers. Its appeal lies in being easy to understand, in highlighting the challenge of searching through an exponentially growing number of possible solutions, and in its direct applications to large-scale transportation and logistics problems in industry. However, we argue that the TSP is not well suited to current (QUBO-based) quantum optimisation methods. At what point should we conclude that quantum advantage is unlikely, and that efforts should be redirected to other problems or directions? This talk discusses the requirements for demonstrating quantum advantage, and offers mathematical arguments for why current (QUBO-based) quantum methods are poorly suited to the challenges of the TSP landscape. Drawing parallels with similar observations made almost four decades ago for (QUBO-based) neural networks, we highlight lessons that can be learned and discuss the numerous challenges that current quantum methods must overcome to offer quantum advantage for constrained optimisation. Finally, we discuss more promising directions for quantum optimisation, in which quantum search could accelerate components of classical state-of-the-art algorithms.
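
The abstract refers to QUBO-based formulations of the TSP without spelling one out. As a minimal sketch, assuming the widely used one-hot encoding (binary variable x[i][t] = 1 when city i is visited at tour position t, with quadratic penalties for the two permutation constraints), the Python below evaluates the QUBO energy of an assignment. The distance matrix DIST and the penalty weight are illustrative assumptions, not taken from the talk.

```python
# Illustrative 4-city distance matrix (cities on the corners of a unit square).
DIST = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

def tsp_qubo_energy(x, dist, penalty):
    """Energy of the one-hot TSP QUBO for binary assignment x[i][t]."""
    n = len(dist)
    # Constraint 1: each city occupies exactly one tour position.
    city_pen = sum((1 - sum(x[i][t] for t in range(n))) ** 2 for i in range(n))
    # Constraint 2: each tour position holds exactly one city.
    pos_pen = sum((1 - sum(x[i][t] for i in range(n))) ** 2 for t in range(n))
    # Objective: distance between consecutive positions of a cyclic tour.
    length = sum(
        dist[i][j] * x[i][t] * x[j][(t + 1) % n]
        for i in range(n) for j in range(n) for t in range(n)
    )
    return penalty * (city_pen + pos_pen) + length

def tour_to_assignment(tour):
    """Convert a permutation of cities into the n x n one-hot matrix."""
    n = len(tour)
    x = [[0] * n for _ in range(n)]
    for t, city in enumerate(tour):
        x[city][t] = 1
    return x
```

For a feasible tour the penalties vanish and the energy equals the tour length. Only n! of the 2^(n^2) binary assignments are feasible (24 of 65,536 for four cities, about 0.04%), and the penalty weight must be large enough that no infeasible state undercuts the best tour.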

10:00-10:30 Coffee Break

10:30-12:30 Session 15A: Best Paper Session

Chair:

Yi Mei

Location: RHLT2

10:30	Chao Xue and Ziyuan Gao
StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching
PRESENTER: Ziyuan Gao
ABSTRACT. Text semantic matching requires a nuanced understanding of both structural relationships and fine-grained semantic distinctions. While pre-trained language models excel at capturing token-level interactions, they often overlook hierarchical structural patterns and struggle with subtle semantic discrimination. In this paper, we propose StructCoh, a graph-enhanced contrastive learning framework that combines structural reasoning with representation-space optimization. Our approach features two key innovations: (1) a dual-graph encoder constructs semantic graphs via dependency parsing and topic modeling, then employs graph isomorphism networks to propagate structural features across syntactic dependencies and cross-document concept nodes; (2) a hierarchical contrastive objective enforces consistency at multiple granularities: node-level contrastive regularization preserves core semantic units, while graph-aware contrastive learning aligns inter-document structural semantics through both explicit and implicit negative sampling strategies. Experiments on three legal document matching benchmarks and on academic plagiarism detection datasets demonstrate significant improvements over state-of-the-art methods. Notably, StructCoh achieves an 86.7% F1-score (+6.2% absolute gain) on legal statute matching by effectively identifying argument structure similarities.
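
StructCoh's hierarchical objective combines node-level and graph-level contrastive terms whose exact form is not given here. The sketch below shows only the generic InfoNCE-style loss that such objectives commonly build on: the anchor should score its positive higher than its negatives under cosine similarity. The temperature value and the use of plain Python lists as embeddings are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

The loss approaches zero when the positive is far more similar to the anchor than any negative, and grows when a negative outranks the positive.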
10:50	Mingze Han, Shuang Liu, Peng Chen, Mingliang Xue, Simon Kolmanič and Dabao Zhang
Structure-Aware Dynamic Fusion with Modality Balance for Multimodal KGC
ABSTRACT. Multimodal knowledge graphs (MMKGs) integrate structural, visual, and textual modalities to enhance entity and relation representations. However, existing MMKG completion methods often rely on static fusion strategies that overlook context-specific modality relevance, and they tend to underutilize the structural information encoded in the graph topology. In this paper, we present SDMF-MKG, a structure-aware dynamic fusion framework designed to address modality bias and structural underrepresentation in MMKGs. The model incorporates three key components: a structure-guided semantic encoder that preserves topological signals, a dynamic weighting mechanism that adaptively calibrates modality contributions based on triple context, and a KL-regularized loss that encourages balanced modality utilization. We evaluate SDMF-MKG on four benchmark datasets spanning both multimodal-rich and structure-only settings. The model achieves state-of-the-art or competitive performance across most metrics, with notable gains on multimodal datasets such as VTKG-C. Ablation studies further confirm the complementary effects of structure awareness, adaptive fusion, and modality balancing.
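
The abstract names a dynamic weighting mechanism and a KL-regularized loss for modality balance without giving formulas. A minimal sketch follows. It assumes softmax weights over per-modality context scores and uses the KL divergence from those weights to the uniform distribution as the balance penalty. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(modality_embeddings, context_scores):
    """Weighted sum of per-modality embeddings with context-dependent weights."""
    weights = softmax(context_scores)
    dim = len(modality_embeddings[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, modality_embeddings))
             for d in range(dim)]
    return fused, weights

def balance_penalty(weights):
    """KL(weights || uniform): zero when every modality contributes equally."""
    m = len(weights)
    return sum(w * math.log(w * m) for w in weights if w > 0)
```

Adding balance_penalty to the task loss discourages the fusion weights from collapsing onto a single modality. Its strength relative to the task loss would be a tuning choice.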
11:10	Hai Dang Nguyen, Nguyen Dang Huy Pham, The Minh Duc Nguyen, Dac Thai Nguyen, Hang Thi Nguyen and Duong M. Nguyen
MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
PRESENTER: Hai Dang Nguyen
ABSTRACT. Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information, offering critical insights into tissue architecture and disease pathology. Recent developments have explored the use of hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles through deep neural networks. This task is commonly framed as a regression problem, where each input corresponds to a localized image patch extracted from the WSI. However, predicting spatial gene expression from histological images remains a challenging problem due to the significant modality gap between visual features and molecular signals. Recent studies have attempted to incorporate both local and global information into predictive models. Nevertheless, existing methods still suffer from two key limitations: (1) insufficient granularity in local feature extraction, and (2) inadequate coverage of global spatial context. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both challenges simultaneously. To enhance local feature granularity, MMAP leverages multi-magnification patch representations that capture fine-grained histological details. To improve global contextual understanding, it learns a set of latent prototype embeddings that serve as compact representations of slide-level information. Extensive experimental results demonstrate that MMAP consistently outperforms all existing state-of-the-art methods across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Pearson Correlation Coefficient (PCC).
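
MMAP is evaluated with MAE, MSE, and PCC. For reference, these metrics can be computed as follows. The per-gene aggregation in mean_pcc_per_gene (PCC over spots for each gene, then averaged) is one common way to summarize spatial expression prediction. Whether MMAP's reported numbers use exactly this aggregation is an assumption.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def pcc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    st = math.sqrt(sum((a - mt) ** 2 for a in y_true))
    sp = math.sqrt(sum((b - mp) ** 2 for b in y_pred))
    return cov / (st * sp)

def mean_pcc_per_gene(true_matrix, pred_matrix):
    """Rows are spots (image patches), columns are genes; average PCC over genes."""
    genes = len(true_matrix[0])
    scores = []
    for g in range(genes):
        t = [row[g] for row in true_matrix]
        p = [row[g] for row in pred_matrix]
        scores.append(pcc(t, p))
    return sum(scores) / genes
```

Because PCC is scale- and offset-invariant while MAE and MSE are not, the two metric families can rank models differently. This is one reason to report both.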
11:30	Yi Qi, Shufeng Chen, Xiangyu Yin, Wenjie Ruan, Siddartha Khastgir, Ji Ruan, Xingyu Zhao and Xiaowei Huang
Interpreting Safety: A LLM and STPA Approach
PRESENTER: Ji Ruan
ABSTRACT. Artificial Intelligence (AI) models are increasingly used in complex systems such as autonomous vehicles (AVs), where safety and explainability are critical. However, existing explainable AI (xAI) methods focus on model-level transparency while neglecting system-level safety explanations (Gap 1), and prior applications of large language models (LLMs) in AVs often view the AV as a whole, overlooking potential risks arising from interactions among its internal components (Gap 2). To address these gaps, we propose a framework that integrates LLMs with System Theoretic Process Analysis (STPA), a structured method for analysing hazards and assessing safety, to improve AV safety assurance. Our framework leverages LLMs for scenario analysis while incorporating STPA to identify unsafe control actions (UCAs) and filter them with real-world video data. We evaluated our method against Lingo-2 (a vision-language-action model developed by Wayve) in a simulated environment, demonstrating superior STPA-based explanations. To evaluate the framework, we used two ground-truth references for accuracy verification and conducted robustness testing; the framework outperformed traditional LLM-based explainers, as also confirmed by expert evaluations.
11:50	Xiaoyu Han, Yonghui Xu, Haotian Chen and Lizhen Cui
Test-Time Recommendation for Safe Medication Combination
ABSTRACT. Recommending appropriate medication combinations is crucial to intelligent healthcare. Recent studies leverage Large Language Models (LLMs) to streamline traditional recommendation architectures. However, LLMs are prone to hallucinations during training, often generating nonexistent medications or suggesting incompatible combinations. Moreover, patient queries (i.e., personal information, medical history, and current symptoms) often arrive as an online stream from distributions that differ from the training data. To address these issues, we propose a novel test-time recommendation method for medication combination that enables on-the-fly and robust recommendations. During training, we introduce a learnable output layer and a drug–drug interaction (DDI)-aware objective to guide the LLM in generating clinically valid and safe medication combination recommendations. To handle distribution shifts at test time, we further design a self-distillation task that enables off-the-shelf pretrained models to dynamically adapt to unseen patient queries based on their feature representations. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that our approach excels in both recommendation accuracy and safety, showing its potential for deployment in real-world clinical settings.
12:10	Hangzhi Guo, Firdaus Ahmed Choudhury, Tinghua Chen and Amulya Yadav
Watermarking Counterfactual Explanations
ABSTRACT. Counterfactual (CF) explanations for ML model predictions provide actionable recourse recommendations to individuals adversely impacted by predicted outcomes. However, despite being preferred by end-users, CF explanations have been shown to pose significant security risks in real-world applications; in particular, malicious adversaries can exploit CF explanations to perform query-efficient model extraction attacks on the underlying proprietary ML model. To address this security challenge, we propose CFMark, a novel model-agnostic watermarking framework for detecting unauthorized model extraction attacks that rely on CF explanations. CFMark solves a novel bi-level optimization problem to embed an indistinguishable watermark into the generated CF explanations, such that any future model extraction attack using these watermarked CF explanations can be detected with a null hypothesis significance testing (NHST) scheme. At the same time, the embedded watermark does not compromise the quality of the CF explanations. We evaluate CFMark across diverse real-world datasets, CF explanation methods, and model extraction techniques. Our empirical results demonstrate CFMark's effectiveness, achieving an F1 score of ~0.89 in identifying unauthorized model extraction attacks using watermarked CF explanations. Importantly, this watermarking incurs only a negligible degradation in the quality of the generated CF explanations (i.e., ~1.3% degradation in validity and ~1.6% in proximity). Our work establishes a critical foundation for the secure deployment of CF explanations in real-world applications.
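
The abstract says extraction is detected with a null hypothesis significance test, but it does not specify the test. One plausible sketch rests on three assumptions:

- Each watermarked CF explanation carries a hidden binary signal.
- A model trained without those explanations matches the signal with probability p0 = 0.5.
- The defender counts matches on a set of probe queries.

A one-sided binomial test then flags extraction. The probe design, p0, and alpha are all assumptions, not CFMark's actual scheme.

```python
from math import comb

def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the one-sided p-value."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def detect_extraction(agreements, trials, p0=0.5, alpha=0.01):
    """Reject H0 ('suspect model never saw the watermark') if the p-value < alpha."""
    p_value = binomial_tail(agreements, trials, p0)
    return p_value < alpha, p_value
```

With 100 probes, 50 matches are consistent with chance, while 80 matches give a p-value far below 0.01. Lowering alpha reduces false accusations at the cost of needing more probes.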

10:30-12:30 Session 15B: Large Language Model 3

Chair:

Ruwang Jiao

Location: RHMZ02

10:30	Yuanyi Wang and Ichiro Kobayashi
Enhancing LLM Abductive Reasoning through MCMC Premise Retrieval
PRESENTER: Yuanyi Wang
ABSTRACT. We present a framework that leverages Markov Chain Monte Carlo (MCMC) to enhance abductive reasoning in large language models (LLMs). Abductive reasoning, the task of inferring the most plausible explanation for a given observation, remains a significant challenge for LLMs, particularly in scenarios with incomplete or ambiguous information. Existing methods typically rely on static retrieval strategies that struggle to adapt to diverse reasoning contexts. In contrast, our approach employs an unsupervised MCMC algorithm to efficiently explore large premise spaces, balancing exploration and exploitation to identify the most relevant supporting evidence. These premises are dynamically reordered to appear at the beginning of the prompt, guiding LLMs toward generating more accurate and coherent hypotheses. Experimental results demonstrate substantial gains in both premise recall and hypothesis consistency, highlighting the effectiveness of probabilistic modeling in complex reasoning tasks. When evaluated on the EntailmentBank dataset, our method significantly improves premise retrieval and enables LLMs to generate hypotheses that better align with the ground truth.
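
The abstract describes MCMC exploration of the premise space followed by reordering the chosen premises to the front of the prompt. A minimal Metropolis-Hastings sketch over fixed-size premise subsets is shown below. It assumes each candidate premise already has a relevance score (here a hypothetical precomputed list) and a subset score equal to the sum of its members' scores. Neither assumption is the paper's unsupervised scoring.

```python
import math
import random

def mcmc_select(relevance, k, steps=2000, temperature=0.1, seed=0):
    """Metropolis-Hastings search for a high-scoring size-k premise subset (k < n)."""
    rng = random.Random(seed)
    n = len(relevance)

    def score(subset):
        return sum(relevance[i] for i in subset)

    current = set(rng.sample(range(n), k))
    cur_score = score(current)
    best, best_score = set(current), cur_score
    for _ in range(steps):
        # Symmetric proposal: swap one selected premise for one unselected premise.
        out_item = rng.choice(sorted(current))
        in_item = rng.choice([i for i in range(n) if i not in current])
        proposal = (current - {out_item}) | {in_item}
        new_score = score(proposal)
        # Metropolis acceptance: always take improvements, sometimes take worse moves.
        if new_score >= cur_score or rng.random() < math.exp((new_score - cur_score) / temperature):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    # Most relevant premises first, ready to be placed at the start of the prompt.
    return sorted(best, key=lambda i: relevance[i], reverse=True)
```

With an additive score like this one, greedy top-k selection gives the same answer. The sampler becomes useful when the subset score is non-additive, for example when it rewards coverage or penalizes redundant premises.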
10:50	Suchun Xie, Shota Sasaki, Hwichan Kim, Yunmeng Li, Reina Akama and Jun Suzuki
Understanding Cross-Lingual Generalization of English-Centric LLMs: The Role of Representation Similarity and Data Exposure
PRESENTER: Suchun Xie
ABSTRACT. English-centric large language models (LLMs), such as LLaMA, have gained prominence in NLP research and practice. Although these models are predominantly trained on English data, their widespread adoption has drawn attention to their cross-lingual generalization capabilities. While cross-lingual capabilities have been extensively explored for multilingual masked language models (MMLMs), corresponding research on English-centric LLMs remains limited. Because of their decoder-only architecture and constrained access to multilingual training data, it remains unclear whether insights gained from MMLMs apply to these English-centric models. To fill this gap, we conduct a systematic analysis of cross-lingual generalization capabilities in English-centric LLMs. Our experiments demonstrate that even when fine-tuned solely on English data, English-centric LLMs generalize across languages in both classification and generation tasks. Further analysis reveals that representation similarity to English plays a crucial role in enabling this generalization, outweighing the influence of the multilingual data ratio during pretraining. This finding contrasts with prevailing assumptions in the MMLM literature. Additionally, we propose and empirically validate a similarity-reversed data allocation strategy, which assigns more data to languages less similar to English and can effectively enhance overall multilingual performance, particularly under constrained data budgets.
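
The similarity-reversed strategy is described only qualitatively. A minimal sketch follows. It assumes a per-language similarity-to-English score in [0, 1) and allocates the budget in proportion to (1 - similarity), using largest-remainder rounding so the integer counts sum to the budget. The language codes and scores in the test are made up.

```python
def similarity_reversed_allocation(similarity, budget):
    """Split an example budget so languages less similar to English get more data.

    Assumes every similarity is < 1, so at least one weight is positive.
    """
    weights = {lang: 1.0 - s for lang, s in similarity.items()}
    total = sum(weights.values())
    raw = {lang: budget * w / total for lang, w in weights.items()}
    alloc = {lang: int(r) for lang, r in raw.items()}
    # Hand out the examples lost to flooring, largest fractional remainder first.
    leftover = budget - sum(alloc.values())
    for lang in sorted(raw, key=lambda l: raw[l] - alloc[l], reverse=True)[:leftover]:
        alloc[lang] += 1
    return alloc
```

Proportional-to-(1 - similarity) is only one way to reverse the ordering. A softer or steeper mapping (for example raising the weights to a power) would be an equally valid design choice under the abstract's description.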
11:10	Junxin Li, Yifu Guo, Zishan Xu, Fengyu Yang, Siyue Chen, Siyan Wu and Lihua Cai
Text2Omni: A Text-only Training Strategy for MLLMs
PRESENTER: Lihua Cai
ABSTRACT. Text2Omni is a framework for generating high-quality multimodal synthetic data using text alone, aimed at advancing Multimodal Large Language Models (MLLMs). Text2Omni addresses the significant challenge of acquiring large-scale multimodal datasets by eliminating the need for real images or audio. The framework leverages the geometric structure of multimodal contrastive representations to generate diverse, high-quality datasets that facilitate pretraining and instruction-tuning for multimodal models. The process involves a three-stage pipeline: (1) Diverse Caption Data Synthesis, where text descriptions are enriched with more detailed semantic information; (2) Instruction-Tuning Data Generation, which produces data for complex tasks such as multiple-choice questions and reasoning; and (3) Modality Representation Transfer, where textual descriptions are converted into synthetic image or audio representations. The resulting datasets, Text2Omni-1.8M for pretraining and Text2Omni-540K-Instruction for instruction-tuning, significantly reduce training costs while supporting the development of small- to medium-scale multimodal models. The paper also introduces a two-phase multimodal training paradigm to enhance multimodal understanding and reasoning capabilities efficiently. Experimental results across image-to-text and audio-to-text tasks demonstrate that the Text2Omni framework improves the performance of existing models on a variety of benchmarks, establishing its potential as an effective tool for advancing multimodal learning without requiring large-scale real-world data.
11:30	Xiaodan Wang, Yanbin Liu, Weihua Li and Quan Bai
Tunnel Vision in Online Discourse: Formalization and Entropy-Based Quantification with LLM-Simulated Agents
ABSTRACT. Online social networks have reshaped public discourse by enabling large-scale, user-driven discussions on societal topics. However, such discussions often exhibit a narrowing of attention, where collective focus converges on a limited subset of topic aspects while neglecting alternative viewpoints. We term this phenomenon tunnel vision. Distinct from ideological echo chambers or filter bubbles, tunnel vision emerges at the aspect level, reflecting reduced topical diversity rather than alignment of opinions. This paper presents a formal framework for defining and quantifying tunnel vision in online discourse. We propose two entropy-based metrics, Aspect-Sentiment Pairwise Entropy (ASPE) and Coverage-Adjusted Aspect-Sentiment Entropy (CASE), to measure both the distribution and the completeness of aspect-level engagement. To investigate the emergence and dynamics of tunnel vision, we simulate discourse using LLM-simulated agents guided by Bayesian cognitive modeling within an artificial social environment. Our experiments show that tunnel vision naturally arises over time and is shaped by cognitive constraints, including users’ perception windows and expressive capacity. These results offer a new lens on attention dynamics in digital spaces and establish a quantitative basis for detecting and mitigating aspect-level discourse narrowing.
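
The abstract does not give the exact definitions of ASPE and CASE. The sketch below illustrates only the general idea and defines two functions:

- pair_entropy computes the Shannon entropy, in bits, of the observed (aspect, sentiment) pair distribution.
- coverage_adjusted_entropy is a hypothetical variant that scales normalized entropy by the fraction of possible pairs that appear at all.

Both formulas are assumptions, not the paper's metrics.

```python
import math
from collections import Counter

def pair_entropy(pairs):
    """Shannon entropy (bits) of the empirical (aspect, sentiment) distribution."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def coverage_adjusted_entropy(pairs, aspects, sentiments):
    """Normalized entropy times the share of possible pairs observed.

    Assumes len(aspects) * len(sentiments) > 1 so the maximum entropy is nonzero.
    """
    possible = len(aspects) * len(sentiments)
    observed = len(set(pairs))
    max_entropy = math.log2(possible)
    return (pair_entropy(pairs) / max_entropy) * (observed / possible)
```

Under these definitions, a discussion stuck on a single aspect-sentiment pair scores 0. A discussion spread evenly over every possible pair scores 1 on the coverage-adjusted measure.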
11:50	Chao Guo, Shaolin Wang, Ruwang Jiao and Jin Wang
From Evolution to Generation: Leveraging LLMs to Redefine Genetic Programming for Symbolic Regression
ABSTRACT. Mathematical equations describe fundamental laws across various disciplines, yet discovering concise and effective mathematical expressions from data remains a challenging task. Traditional symbolic regression methods often overlook domain-specific prior knowledge that scientists rely on, while large language model (LLM)-driven symbolic regression approaches can effectively leverage it. However, existing LLM-driven symbolic regression methods typically require substantial computational resources to generate equations while still suffering from low efficiency in producing high-quality expressions. To address this issue, we propose LLM-Guided Genetic Programming for Symbolic Regression (LLMGP-SR), a prompt-guided equation evolutionary search algorithm. LLMGP-SR integrates LLMs into the initialization, crossover, and mutation operations of genetic programming, achieving an organic integration of semantic generation and structural evolution of expressions. By leveraging an adaptive prompt strategy, LLMGP-SR constructs carefully designed prompts to guide LLMs in generating effective expressions. Experimental results demonstrate that LLMGP-SR significantly outperforms traditional genetic programming in symbolic regression problems across six common benchmark datasets, while maintaining diversity in the solution space.
12:10	Wang Hongkai, Yang Xiaocui, Wang Daling, Feng Shi and Zhang Yifei
Adaptive Persona Context Modulation for Personalized Emotional Support Conversation
ABSTRACT. Personalized Emotional Support Conversation (ESC) systems (i.e., supporters) assist users (i.e., seekers) in navigating negative emotional states through personalized, empathetic interactions, and are often equipped with a persona extractor. Personalized ESC systems currently face two key challenges. First, while existing persona extractors attempt to infer a persona from the dialogue to understand seekers, they often struggle to distinguish the speakers’ roles and thus consider only the utterances from the seeker’s side. Second, incorporating personal information without considering its contextual relevance risks damaging the naturalness and coherence of responses. We therefore present a novel Adaptive Persona Context Modulation approach (APCM) for the ESC task. For more effective persona extraction, we reconstruct the Persona-Chat dataset to adapt it to our task and propose a role-cognitive persona extractor, which builds a more comprehensive understanding of the seeker’s persona from the utterances of both sides while preventing role ambiguity. For persona-context integration, our model introduces an Adaptive Attention Balancing Module that dynamically adjusts the influence of persona and context information during response generation, better reflecting real-world conversation patterns in which the seeker’s persona is considered only in appropriate circumstances. Extensive experiments on benchmark datasets demonstrate the effectiveness of APCM, which achieves state-of-the-art (SOTA) performance in emotional support dialogue generation. Our code is publicly available at https://anonymous.4open.science/r/APCM.

10:30-12:30 Session 15C: Computer Vision 3

Chair:

Zhenshou Song

Location: RHMZ03

10:30	Xinxu Xie, Minghao Kong, Yu Xin, Chen Xu and Shuzi Niu
Prompt Efficient Generation Agent for Skin Lesion Segmentation
PRESENTER: Xinxu Xie
ABSTRACT. Skin lesion segmentation is a medical image analysis task that involves automatically delineating lesion boundaries in dermatoscopic or clinical images. It plays a critical role in the early detection of skin cancers such as melanoma. Low contrast and fuzzy boundaries are the main challenges, especially when training images are limited. To tackle these challenges with limited annotated data, we propose a controllable prompt generation agent that activates the skin lesion segmentation capability of vision foundation models for medical image analysis. Specifically, we reduce the pixel-level action space to the grid level for efficient search. With a Convolutional Neural Network as the backbone, the agent performs spatial reasoning over an image to simultaneously find the prompt coordinate and its label using a policy function, and provides the selected prompt points to the vision foundation model. The interaction terminates after a fixed number of iterations. For optimization, we propose asymmetric rewards aligned with the value function and introduce them into proximal policy optimization to save computation and memory cost. Interestingly, better performance is achieved with fewer prompt points than the threshold number, along with some background points. The proposed agent is therefore referred to as the Prompt Efficient Generation (PEG) agent. Experimental results on public skin lesion segmentation benchmarks show that PEG outperforms state-of-the-art methods, improving IoU by at least 4% compared with SAM.
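
The abstract reduces the pixel-level action space to a grid for efficient search. A minimal sketch of such an action decoding rests on three assumptions:

- The image is divided into a square G x G grid.
- There is one discrete action per (cell, label) pair.
- Labels follow the common point-prompt convention of 1 for foreground and 0 for background.

Each action maps to the pixel at the centre of its cell. The grid size and function name are hypothetical.

```python
def decode_action(action, grid_size, width, height):
    """Map a discrete action index to an (x, y, label) point prompt."""
    cells = grid_size * grid_size
    if not 0 <= action < 2 * cells:
        raise ValueError("action out of range")
    label, cell = divmod(action, cells)      # label 1 = foreground, 0 = background
    row, col = divmod(cell, grid_size)
    x = int((col + 0.5) * width / grid_size)   # centre of the grid cell
    y = int((row + 0.5) * height / grid_size)
    return x, y, label
```

With G = 16 on a 256x256 image, the policy chooses among 512 actions instead of 131,072 pixel-level ones. The cost is that prompt points can only land at cell centres.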
10:50	Pierre Lefebvre, Houda Saidi, Mohammed Azzakhini, Ahmed Azough and Nicolas Travers
CMoD-VD: Cross-Modal Distillation with Privileged Motion Supervision for Violence Detection
PRESENTER: Pierre Lefebvre
ABSTRACT. Automatic violence detection in videos (VD) has become a major challenge in the field of Computer Vision with the deployment of smart cameras and the increasing volume of videos shared online. Recent works primarily rely on CNN-based models paired with 3D or recurrent layers to capture the spatiotemporal dynamics of video streams. The integration of additional modalities, such as audio or optical flow, has recently attracted growing interest. In particular, optical flow has demonstrated strong relevance in modeling motion patterns associated with violent events. However, its estimation is computationally intensive, limiting its use in real-time applications. In this work, we introduce CMoD-VD, a novel method for violence detection based on two CNN+BiLSTM models enhanced with spatial, channel, and temporal attention. Our method relies on cross-modal distillation with privileged motion supervision. A teacher model is first trained with both RGB and optical flow videos. Then, a student model learns to reproduce its behavior using RGB frames only. This strategy enables accurate inference without relying on motion estimation. Experiments on three public datasets (RWF-2000, Hockey Fight, and Violent-Flows) demonstrate that our student model achieves competitive results, close to the teacher and to state-of-the-art methods, while significantly reducing computational costs.
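
CMoD-VD trains an RGB-only student to reproduce a teacher that also sees optical flow, but the abstract does not state the distillation loss. A common choice, sketched here as an assumption, mixes two terms:

- The temperature-softened KL divergence between teacher and student class distributions, scaled by T^2.
- The usual cross-entropy on ground-truth labels.

The temperature, the mixing weight alpha, and the two-class setup are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy(student, label)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

The T^2 factor keeps the gradient magnitude of the soft-target term roughly comparable across temperatures. At inference time only the student runs, so no optical flow needs to be computed.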
11:10	Enhui Chai, Tianxiang Cui, Ta Lin, Yujian Ye and Ning Xue
BLAH: Enhancing Small Object Detection via a Bi-Level Interactive Head with Multi-Level Self-Attention
ABSTRACT. The detection head framework critically influences the balance between classification and localization in small object detection, yet existing designs often neglect task-specific feature interactions, leading to optimization conflicts. To address this, we propose Bi-Level Attention Head (BLAH), a novel framework that harmonizes dual-task learning through structured attention mechanisms and adaptive loss optimization. BLAH introduces two key innovations: (1) Channel Group Self-Attention (CGSA) stacks, which dynamically recalibrate channel-group dependencies to align classification and localization features, resolving spatial-channel decoupling limitations in conventional attention. (2) Dual-Task Attention (DTA), integrating global channel attention for classification robustness (translation invariance) and local spatial attention for precise localization (translation variability), enabling synergistic task interaction without computational overhead. Further, we design a Differentiable Task-Balanced Loss (DTBL) that adaptively modulates gradients between tasks via cosine similarity constraints, ensuring stable optimization without extra parameters. Extensive experiments on MS COCO and VisDrone demonstrate BLAH’s superiority. When integrated with DETR, Deformable DETR, and YOLOv10, BLAH achieves +1.2% mAP on COCO over state-of-the-art detectors (e.g., YOLO-based, DETR-based) while maintaining inference efficiency, and significantly improves small-object detection (e.g., +4.5% AP_S on YOLOv12). Ablation studies validate each component's necessity.
11:30	Ziyi Sun, Bing Xue, Mengjie Zhang and Jan Schindler
YOLOv8-UT: A Unified Training Approach for Cross-Environment Tree Crown Instance Segmentation in Aerial Imagery
ABSTRACT. Instance segmentation of individual tree crowns in aerial imagery is a critical task for forest management, carbon storage estimation, and biodiversity modeling. However, achieving effective segmentation faces significant challenges including dense canopy overlapping, diverse crown characteristics, and varying environmental conditions across different geographical regions. This paper proposes YOLOv8-UT, a unified training approach for cross-environment tree crown instance segmentation that enhances model generalization across rural and urban environments. YOLOv8-UT employs a two-stage training strategy that leverages unified pre-training on combined datasets followed by environment-specific fine-tuning to learn robust cross-environment features. Moreover, YOLOv8-UT incorporates the Large Kernel Attention mechanism to enhance feature representation for complex tree crown identification. Comprehensive experiments on aerial imagery from the Greater Wellington region demonstrate that YOLOv8-UT outperforms other recent peer competitors, achieving Box AP of 39.7 and Mask AP of 34.4 on the rural dataset, and Box AP of 48.2 and Mask AP of 40.5 on the urban dataset.

10:30-12:30 Session 15D: Machine Learning 3

Chair:

Jan Steckel

Location: RH103

10:30	Juntao Zhang, Shaogeng Liu, Jun Zhou, Kun Bian, You Zhou, Jianning Liu, Pei Zhang and Bingyan Liu
Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain
PRESENTER: Kun Bian
ABSTRACT. In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as Mamba deep learning models, have made significant progress in modeling long sequences. Compared to Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), Vision Mamba (ViM) methods have not yet achieved fully competitive performance. To enable SSMs to process image data, ViMs typically flatten 2D images into 1D sequences, inevitably ignoring some 2D local dependencies, thereby weakening the model's ability to interpret spatial relationships from a global perspective. We believe that the introduction of frequency domain information can enable ViM to achieve a better global receptive field during the scanning process. We propose a novel model called Vim-F, which employs pure Mamba encoders and scans in both the frequency and spatial domains. Moreover, considering that Mamba remains essentially a recurrent neural network (RNN), we question the necessity of position embedding in ViM and remove it accordingly in Vim-F. Vim-F has good scalability. As far as we know, its variant Vim-F(CF) is the first ViM model to use a convolution-free ViM encoder. Another variant, Vim-F(H), introduces a linear attention mechanism. This reduces the model's sensitivity to the input sequence and achieves better performance.
10:50	Lisha Peng, Canghong Jin, Longxiang Shi and Qihao Shi
D-FRGAT: Event Prediction Technology Based on Temporal Knowledge Graph Reasoning
ABSTRACT. An "event" is a specific incident or occurrence that has a significant impact on human society and the natural world. Predicting such events helps reduce the risk of potential losses. Event prediction technology plays a vital role in ensuring safety, reducing risks, and controlling infectious diseases, among other aspects. Some studies use temporal knowledge graphs to mine the relationships between events and predict future events by analyzing the evolution patterns of historical events. However, the prediction process faces several challenges. First, some studies model historical events as a temporal point process to learn entity evolution representations, ignoring the interactions between concurrent events occurring at the same time. Second, when modeling the interaction of concurrent events, most studies update node features through simple static linear transformations, without considering the importance of semantic information or dynamically distinguishing the contributions of different edges of the same relation type to a node's representation. Third, existing methods cannot jointly model the temporal and semantic information of historical events and lack the ability to dynamically perceive the importance of each. To improve prediction accuracy, we propose a temporal event prediction model, D-FRGAT. Specifically, D-FRGAT incorporates a third-level control mechanism to decay temporal information, and integrates multi-relational attention networks and node feature adjustment techniques to better capture the correlations among events over time. Experimental results demonstrate that our method achieves superior performance, increasing MRR on the GDELT and ICEWS18 datasets by 10.93% and 11.39%, respectively.
11:10	Joeri Winckelmans, Bart De Clerck and Jan Steckel
A Comparative Study of Variational and Vector Encoders in Graph User Matching
ABSTRACT. Cross-Platform User Identification (CPUI) aims to identify social media accounts belonging to the same real-world user across different platforms. This task is vital for combating cybercrime, where malicious users create multiple accounts, and for enhancing user modeling in fields such as sociology, economics, and epidemiology. Prior research suggests that vector-based encoding of a local network graph may fall short when faced with real-world inconsistencies such as platform dependency and data sparsity. In response, variational encoding, which explicitly models the data as normal distributions, has been proposed as a more robust alternative. In this paper, we present a comparative study of vector and variational encoding approaches in the context of a binary CPUI classification task. For this purpose, we constructed a synthetic heterogeneous graph derived from 277 research papers authored within an engineering department. Using vector embeddings of the textual content of the papers as features, we trained various models to evaluate the advantage of variational encoding in CPUI. Experimental results show that standard vector encoding consistently outperforms the variational models in terms of accuracy, F1-score, and AUC-ROC. While all models achieved high performance (accuracy around 90%), there was no empirical advantage to using variational encoding in our experiments. These findings suggest that the benefits of variational encoding may depend on the presence of real-world data inconsistencies that our synthetic dataset lacks.
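
The comparison hinges on how a node embedding is produced. A variational encoder outputs a mean and log-variance per dimension, samples an embedding with the reparameterization trick, and adds a KL penalty toward a standard normal prior. A vector encoder simply uses the mean. The sketch below shows those pieces with the encoder network omitted. It assumes a diagonal Gaussian posterior and a standard normal prior. The function names are illustrative, not the paper's code.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def encode(mu, log_var, variational, rng):
    """Variational encoders sample an embedding; vector encoders use the mean directly."""
    return reparameterize(mu, log_var, rng) if variational else list(mu)
```

The sampling noise and KL term act as regularizers. This is the mechanism expected to help under noisy, inconsistent real-world data, and it may simply cost accuracy on clean synthetic data.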
11:30	Shuaiwei Zhang, Xu Liang and Yazhou Hu Meta Learning-enhanced Iterative Learning Control for Tracking PRESENTER: Shuaiwei Zhang ABSTRACT. Iterative Learning Control (ILC) fundamentally suffers from sensitivity to uncertainties and poor cross-task generalization. To address these limitations, we propose the first unified Meta Learning-enhanced Iterative Learning Control (Meta-ILC) framework, which integrates meta-learning principles with neural network-based adaptive control. Our approach replaces fixed ILC gain matrices with context-sensitive operators Lp(t) generated by deep residual networks, while a meta-optimizer extracts transferable knowledge across tasks to initialize near-optimal controllers. The framework autonomously adapts to system variations without model reliance or manual tuning, resolving the core deficiencies of conventional ILC. Experimental validation confirms significant advantages: accelerated convergence, minimal initial tracking errors, and robust performance across unseen trajectories, demonstrating transformative potential for high-precision applications including multi-axis robotics and semiconductor manufacturing.

10:30-12:30 Session 15E: Online Session

Zoom link: https://vuw.zoom.us/j/93664289896

Chair:

Yuye Zhang

Location: RH104

10:30	Nian Wei, Rongzuo Guo and Jun Huang Evolutionary Weight Pruning: A PSO-Based Approach ABSTRACT. Weight pruning is a widely used technique to compress deep neural networks by reducing parameters. However, most existing methods depend on magnitude thresholds, sparsity constraints, or heuristics, often resulting in suboptimal pruning and degraded model performance. In this paper, we propose PSOWeightPruner, a novel pruning approach based on Particle Swarm Optimization (PSO) to automatically find optimal pruning configurations. Unlike manual pruning criteria, PSOWeightPruner treats pruning as an optimization problem, where each particle encodes a candidate pruning strategy. By iteratively updating particle positions and velocities, PSOWeightPruner efficiently explores the search space and leverages PSO’s global search ability to converge on superior pruning solutions without manual tuning. Extensive experiments demonstrate that PSOWeightPruner outperforms conventional pruning methods in compression ratio and accuracy, enabling efficient end-to-end fine-tuning with minimal performance loss.
10:50	Yuxuan Xie, Bochuang Yang and Yuxin Xie MACT: Mutation-Aware CNN-Transformer for ESG Forecasting ABSTRACT. Environmental, Social, and Governance (ESG) indicators are core metrics for evaluating corporate sustainability and long-term resilience. Against the backdrop of escalating climate risks and increasingly stringent environmental regulations, timely and reliable forecasting of the environmental (E) dimension has become critical, yet it remains challenging in the presence of abrupt structural changes. Using the Huazheng ESG Ratings dataset, which covers 2,270 mainland A-share and Hong Kong–listed companies from 2013 to 2022, we formulate E-score prediction as a multivariate annual time-series task and generate training samples via overlapping sliding windows. We propose the Mutation-Aware CNN-Transformer (MACT), the first hybrid architecture explicitly designed to model ESG “mutations.” MACT employs convolutional encoders to capture short-term patterns, Transformer blocks to learn long-range dependencies, and two mutation-aware augmentation strategies, synthetic mutation injection and temporal masking, that introduce sudden shocks and missing segments during training. Extensive experiments show that MACT reduces the Root Mean Square Error (RMSE) to 3.7951 and the Mean Absolute Percentage Error (MAPE) to 3.67%. These results correspond to reductions in RMSE and MAPE of 34.02% and 46.73%, respectively, compared to the state-of-the-art (SOTA) Long Short-Term Memory (LSTM) model, and of 34.96% and 46.89% relative to the Transformer baseline. Our findings demonstrate that integrating convolutional feature extraction, attention-based sequence modeling, and mutation-aware augmentation yields a highly accurate and robust framework for forecasting corporate environmental performance.
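The RMSE and MAPE figures above, and the percentage reductions quoted against the baselines, follow the usual definitions. Below is a minimal NumPy sketch. It assumes that a "reduction" is measured relative to the baseline's error, which the abstract does not spell out. The function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (undefined if any y_true == 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (assumed definition)."""
    return (baseline_error - model_error) / baseline_error * 100.0
```

Under that assumption, an RMSE of 3.7951 that is 34.02% lower than the LSTM's implies an LSTM RMSE of roughly 3.7951 / (1 - 0.3402) ≈ 5.75.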
11:10	Ji Feng, Qingjun Zhang, Yongqiang Xu, Peilin Li and Degang Yang Multi-Stage Variance-Controlled Gradient Updates: Toward Robust Continual Learning ABSTRACT. Large Language Models (LLMs) have demonstrated remarkable performance and strong generalization across diverse tasks; however, catastrophic forgetting remains a fundamental challenge in continual learning scenarios. MIGU, a label-free approach, alleviates forgetting by selectively updating parameters based on gradient magnitude, thereby improving adaptability. Despite its effectiveness, MIGU relies heavily on manually tuned mask generation thresholds, which incur significant computational overhead and limit scalability. To address these limitations, this paper proposes MVGU, an improved method employing multi-stage variance-controlled gradient updates. At its core, MVGU optimizes pre-mask vector generation and threshold selection strategies to reduce dependence on empirical hyperparameter tuning inherent in MIGU, enhancing training efficiency. Extensive continual learning experiments on T5-Large and LLaMA3-8B Instruct architectures demonstrate that MVGU achieves comparable or superior performance to MIGU with fewer training iterations. Results indicate that MVGU is an effective continual learning strategy, capable of reducing training overhead, mitigating task interference during continual learning, and strengthening model adaptability in dynamic learning environments.
11:30	Jinda Du, Jian Hou and Huaqiang Yuan Improving Nyström Spectral Clustering with Unsupervised Vector Quantization and Incomplete Cholesky Decomposition ABSTRACT. Spectral clustering is a popular clustering approach with wide applications in various fields, including machine learning and pattern recognition. However, for large-scale datasets it requires building a very large pairwise similarity matrix, which is very time-consuming. The Nyström method is well known for its ability to approximate the feature space with a small number of samples (landmarks), thereby reducing the computational overhead significantly. Motivated by this observation, in this paper we address this problem by approximating the similarity matrix and its eigenvectors with the Nyström method. First, we present a sampling method to determine the landmarks, which the Nyström method then uses to obtain the approximate similarity matrix and eigenvectors. By carefully combining the k-means++ method with cosine similarity, our method improves the quality of the landmarks. Second, we use incomplete Cholesky decomposition to accelerate the approximation, thereby improving the efficiency of the whole algorithm. In experiments on synthetic and real datasets, our algorithm is shown to be effective compared with several other approaches.
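The general Nyström idea can be sketched as follows. Kernel values are computed only between all n points and m landmarks. Approximate eigenvectors of the full normalized affinity matrix are then recovered from an m × m eigenproblem.

This is a minimal NumPy sketch under stated assumptions:

- Landmarks come from plain k-means++ (D²) seeding on Euclidean distance.
- The affinity is a Gaussian kernel.
- An ordinary eigendecomposition is used throughout.

The paper's cosine-similarity landmark refinement and incomplete Cholesky acceleration are not reproduced. All function names are hypothetical.

```python
import numpy as np

def kmeanspp_seeds(X, m, rng):
    """k-means++ (D^2) seeding: indices of m rows of X, spread out in space."""
    idx = [int(rng.integers(len(X)))]
    d2 = np.sum((X - X[idx[0]]) ** 2, axis=1)
    for _ in range(m - 1):
        if d2.sum() == 0:                # all remaining points coincide with a seed
            break
        nxt = int(rng.choice(len(X), p=d2 / d2.sum()))
        idx.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return np.array(idx)

def rbf(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_embedding(X, k, m, gamma=1.0, seed=0):
    """Top-k normalized spectral embedding via the Nystrom approximation."""
    rng = np.random.default_rng(seed)
    L = X[kmeanspp_seeds(X, m, rng)]
    C = rbf(X, L, gamma)                 # n x m: all points vs. landmarks
    W = rbf(L, L, gamma)                 # m x m: landmarks vs. landmarks
    ev, U = np.linalg.eigh(W)
    keep = ev > 1e-8 * ev.max()          # drop near-singular directions
    U, ev = U[:, keep], ev[keep]
    W_pinv = (U / ev) @ U.T              # pseudo-inverse of W
    W_isqrt = (U / np.sqrt(ev)) @ U.T    # pseudo-inverse square root of W
    # approximate degrees d = (C W^+ C^T) 1 without forming the n x n matrix
    d = np.maximum(C @ (W_pinv @ C.sum(axis=0)), 1e-12)
    B = (C / np.sqrt(d)[:, None]) @ W_isqrt   # D^{-1/2} A D^{-1/2} ~= B B^T
    lam, V = np.linalg.eigh(B.T @ B)     # small m x m eigenproblem
    top = np.argsort(lam)[::-1][:k]
    emb = B @ V[:, top] / np.sqrt(lam[top])   # orthonormal eigenvectors of B B^T
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def kmeans(Y, k, seed=0, iters=50):
    """Plain Lloyd iterations with k-means++ initialisation."""
    rng = np.random.default_rng(seed)
    centers = Y[kmeanspp_seeds(Y, k, rng)]
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Y[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels
```

On two well-separated Gaussian blobs, `kmeans(nystrom_embedding(X, k=2, m=10, gamma=0.5), 2)` recovers the blobs. Throughout, it forms only n × m and m × m kernel matrices; the n × n similarity matrix is never built.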
11:50	Kasra Mojallal, Pouria Sadr, Sepideh Ahmadian and Dima Alhadidi Prompt Attacks and Safeguards in Large Language Models: A Survey ABSTRACT. Large Language Models (LLMs) have quickly improved in how well they understand and generate human-like language. But as they become more capable, they also become more vulnerable to adversarial manipulation. This survey looks at different types of prompt-based attacks that take advantage of the tendency of models to follow instructions, often in ways that can undermine safety, privacy, or reliability. We organize these threats into a clear taxonomy and also explore a range of defense strategies. In addition, we review tools and benchmarks used to test how robust these models are (including PyRIT, Giskard, Garak, and PromptBench). By mapping attacks to defenses in a layered framework, this work emphasizes the need for thoughtful, flexible safeguards when using LLMs in real-world settings.
12:10	Yuhao Sun, Peng Zhang, Wei Zhao, Fuqiang Wang, Xiangzhi Liu and Xiaoming Wu Prompt-Enhanced Multimodal Learning for Robust Sentiment Analysis with Incomplete Data ABSTRACT. Multimodal sentiment analysis faces significant challenges when processing incomplete data, a common scenario in real-world applications due to sensor failures or transmission errors. In this paper, we propose a Prompt-Enhanced Multimodal Learning (PEML) framework that mimics the human cognitive process of handling incomplete information. It comprises three core components: (1) a Modality-Specific Prompt Encoder (MSPE) that activates prior knowledge through learnable prompt templates, providing adaptive enhancement for different missing patterns; (2) a Cross-Modal Adaptive Alignment (CMAA) module that establishes inter-modal information exchange channels through a dynamic gating mechanism; and (3) a Quality-Aware Fusion (QAF) module that dynamically fuses high-quality features based on multi-level quality assessment, achieving confidence-based information integration. Extensive experiments across various missing-data scenarios demonstrate that PEML outperforms existing state-of-the-art methods, validating the effectiveness of modeling human cognitive processes for robust multimodal learning.

12:30-13:30 Lunch Break

13:30-15:30 Session 16A: Large Language Model 4

Chair:

Guanqun Cao

Location: RHLT2

13:30	Ryo Nishida, Masayuki Kawarada, Tatsuya Ishigaki, Hiroya Takamura and Masaki Onishi A Comparative Study of Demonstration Selection for Practical Large Language Models-based Next POI Prediction ABSTRACT. This paper investigates demonstration selection strategies for predicting a user's next point-of-interest (POI) using large language models (LLMs), aiming to accurately forecast a user's subsequent location based on historical check-in data. While in-context learning (ICL) with LLMs has recently gained attention as a promising alternative to traditional supervised approaches, the effectiveness of ICL depends significantly on the selected demonstrations. Although previous studies have examined methods such as random selection, embedding-based selection, and task-specific selection, a comprehensive comparative analysis of these strategies is still lacking. To bridge this gap and clarify best practices for real-world applications, we comprehensively evaluate existing demonstration selection methods alongside simpler heuristic approaches such as geographical proximity, temporal ordering, and sequential patterns. Extensive experiments conducted on three real-world datasets indicate that these heuristic methods consistently outperform more complex and computationally demanding embedding-based methods. Notably, in certain scenarios, these simpler heuristic methods even surpass fine-tuned models without requiring further training. Our source code is available at: https://anonymous.4open.science/r/poi-demonstration-selection-5B1C.
13:50	Shengchang Wang, Yue Han, Yanjun Qin, Yongke Li, Zhaoru Guo, Feng Yan, Lei Su, Haoxiang Huang, Tiquan Gu and Panpan Zheng CLIP-LMFA: Few-Shot Anomaly Detection via Large Language Model-Driven Hybrid Prompts and Multi-Scale Adaptive Fusion ABSTRACT. Industrial anomaly detection plays a key role in ensuring production quality and operational safety. Although large-scale vision-language models are increasingly applied to industrial anomaly detection owing to their advantages in few-shot scenarios, their limited semantic generalization ability and lack of fine-grained spatial sensitivity hinder their deployment in real-world, high-precision industrial environments. To address these challenges, we propose CLIP-LMFA, a CLIP-based framework for few-shot anomaly detection. We introduce a hybrid textual prompt strategy driven by a large language model (LLM) to enhance semantic discrimination while reducing manual design cost. We design a multi-scale local adaptive fusion (MFEAF) encoder that jointly captures global semantics and local fine-grained anomalies to achieve pixel-level anomaly segmentation. Without additional fine-tuning or retraining, CLIP-LMFA achieves significant performance improvements on benchmark datasets, outperforming the baseline by 1.3% and 4.5% in I-AUROC on the MVTec-AD and Brain datasets, respectively, demonstrating its effectiveness and practicality in real-world industrial applications. Our code is available at https://github.com/PRICAI25/CLIP-LMFA.
14:10	Jie Chen, Yunpeng Hong, Jiacheng Liu and Chenyang Bu Towards Automation in Log Parsing: Auto-Prompt Optimization with Natural Language Gradients ABSTRACT. Log parsing aims to transform raw log messages into fixed log templates and dynamic log parameters, producing structured data that removes barriers to automated analysis and improves the performance of anomaly analysis applications. Existing LLM-based log parsing methods primarily rely on manually crafted prompts, or on a single static candidate prompt selected through prompt optimization, to guide the model in gradually understanding the parsing task. However, due to the high diversity of real-world log data, existing log parsing methods may not be able to dynamically update prompt information based on the input logs, resulting in limited adaptability to different parsing scenarios. To overcome this limitation, we propose a framework called NLGLP, which achieves automated prompt tuning to adapt to diverse data scenarios. Specifically, NLGLP first automatically selects, from a candidate example set, several examples that best match the current task, then applies automated prompt optimization to dynamically adjust prompts based on natural language gradients, thereby improving adaptability and parsing performance across different log data. Experiments conducted on 16 public Loghub datasets demonstrate that NLGLP achieves an average parsing accuracy of 98.9%, the best performance among current methods.
14:30	Guanqun Cao, Ryan Mckenna, Erich Graf and John Oyekan Learn from the Past: Language-conditioned Object Rearrangement with Large Language Models ABSTRACT. Object manipulation for rearrangement into a specific goal state is a significant task for collaborative robots. Accurately determining object placement is a key challenge, as misalignment can increase task complexity and the risk of collisions, affecting the efficiency of the rearrangement process. Most current methods rely heavily on pre-collected datasets to train the model for predicting the goal position. As a result, these methods are restricted to specific instructions, which limits their broader applicability and generalisation. In this paper, we propose a framework for flexible language-conditioned object rearrangement based on a Large Language Model (LLM). Our approach mimics human reasoning by using successful past experiences as a reference to infer the best strategies for achieving a current desired goal position. Building on the LLM's strong natural language comprehension and inference abilities, our method generalises to various everyday objects and free-form language instructions in a zero-shot manner. Experimental results demonstrate that our method can effectively execute robotic rearrangement tasks, even those involving long sequences of orders.
14:50	Wen Gu, Zhaoxing Li, Jan Buermann, Jim Dilkes, Dimitris Michailidis, Shinobu Hasegawa, Vahid Yazdanpanah and Sebastian Stein PTFA: An LLM-based Agent that Facilitates Online Consensus Building through Parallel Thinking ABSTRACT. Consensus building is inherently challenging due to the diverse opinions held by stakeholders. Effective facilitation is crucial to support the consensus building process and enable efficient group decision making. However, the effectiveness of facilitation is often constrained by human factors such as limited experience and scalability. In this research, we propose a Parallel Thinking-based Facilitation Agent (PTFA) that facilitates online, text-based consensus building processes. The PTFA automatically collects real-time textual input and leverages large language models (LLMs) to perform all six distinct roles of the well-established Six Thinking Hats technique in parallel thinking. To illustrate the potential of the agent, a pilot study was conducted, demonstrating its capabilities in idea generation, emotional probing, and deeper analysis of idea quality. Additionally, open research challenges for future work are identified, such as optimizing scheduling and managing behaviors in the divergent phase. Furthermore, we construct a comprehensive dataset for future study that contains conversations not only among the participants but also between the participants and the agent.
15:10	Soma Watanabe, Shiyao Ding and Takayuki Ito Towards Autonomous Building Construction: A Multi-Agent Framework Leveraging Large Multimodal Models ABSTRACT. Large Language Models (LLMs) have advanced creative AI agents capable of complex planning and decision-making. Yet, autonomous building construction remains difficult due to challenges in handling diverse structures and inferring precise spatial layouts from visual data. We propose a novel multi-agent framework that leverages Large Multimodal Models (LMMs) to integrate textual and visual inputs for iterative building reproduction. In our system, an Advisor Agent performs high-level visual reasoning and planning, while a Constructor Agent translates this guidance into precise blueprints for execution. Additional innovations include part-based structural modeling, blueprint consistency across cycles, and multi-view perception to handle occlusion. Experiments in Minecraft demonstrate that this multi-agent architecture substantially improves accuracy, flexibility, and error correction compared to single-agent baselines.

13:30-15:30 Session 16B: Machine Learning 4

Chair:

Zeqiong Lv

Location: RHMZ02

13:30	Roshan Birjais, Kevin Wang and Waleed Abdulla Enhancing the Forward Forward Algorithm with Label Based Similarity for Improved Neural Network Training ABSTRACT. The Forward Forward (FF) algorithm has been proposed as a biologically plausible alternative to backpropagation for training deep neural networks. It replaces backward gradient computations with a dual forward pass strategy, in which each layer independently optimizes a local "goodness" function to distinguish between positively and negatively labeled data. However, because it lacks global gradient flow and trains each layer locally, FF-based training suffers from poor inter-layer coordination and suboptimal label alignment. In this research, we enhance the Forward Forward framework by using the Hilbert-Schmidt Independence Criterion (HSIC) to improve the goodness function at each layer. HSIC serves as a label-aware statistical dependence measure, encouraging each layer’s output to retain relevant input structure while aligning more closely with the true class labels. Our formulation introduces distinct HSIC-based objectives for the positive and negative passes: the positive pass maximizes dependence with the true label, while the negative pass penalizes alignment with incorrect labels. This design maintains the local, backpropagation-free nature of FF training while promoting global task coherence and refining the network's ability to differentiate between positive and negative data, leading to more robust feature representations and improved learning dynamics. Our experimental results demonstrate that this approach substantially improves the accuracy of the FF algorithm across multiple benchmark datasets, narrowing the performance gap with backpropagation while preserving the FF algorithm’s intrinsic advantages.
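HSIC, the dependence measure used above, has a simple biased empirical estimator, tr(KHLH)/(n-1)². Here K and L are kernel matrices on the two variables, and H is the centering matrix.

The NumPy sketch below computes this estimator between a layer's outputs and one-hot labels. The Gaussian kernel width and the linear label kernel are assumptions. The sketch does not reproduce how the paper turns this quantity into positive-pass and negative-pass goodness objectives. The function names are hypothetical.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2)) over rows of X."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def hsic(X, Y_onehot, sigma=1.0):
    """Biased empirical HSIC estimate tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    K = gaussian_gram(X, sigma)          # kernel on layer outputs
    L = Y_onehot @ Y_onehot.T            # linear kernel on one-hot labels: 1 if same class
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

Outputs that separate the classes give a clearly larger HSIC than outputs drawn independently of the labels. This is the property a label-aware goodness term would reward.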
13:50	Jiawen Deng, Han Ji and Yanan Sun BMSR: A Bidirectional Multi-hop Predictor with Structure-aware Ranking Loss for NAS ABSTRACT. Performance evaluation is crucial in neural architecture search (NAS), but full training is costly and slow. Performance predictors offer an efficient way to quickly evaluate architectures, significantly speeding up the process. However, existing predictors often trade accuracy for speed or depend on complex encoders and costly pretraining, making it difficult to balance accuracy and efficiency with limited labeled data. In this paper, we propose a Bidirectional Multi-hop predictor with Structure-aware Ranking Loss (BMSR), which is designed for speedy and accurate performance prediction. During feature extraction, BMSR applies a bidirectional multi-hop graph convolution network with hop-aware attention to capture long-range and directional dependencies from architectures. Once the architecture embeddings are obtained, a progressively shrinking MLP is employed to compress them layer by layer, enhancing nonlinear modeling and improving representation quality. In the optimization stage, BMSR adopts a structure-aware ranking loss that leverages topological and operational similarity to encourage stable rankings among architectures. Experiments across multiple NAS benchmarks demonstrate that BMSR achieves competitive performance in both efficiency and accuracy. On NAS-Bench-201, BMSR identifies the optimal architecture using only 100 labeled samples and 8.45 seconds—just 0.3% of the computation time required by prior SOTA methods. Code is anonymously available.
14:10	Fuyuan Ma, Yaodi Zhu, Xin Wang and Ying Wang More Imperceptible Adversarial Attack Method on Graph Neural Networks ABSTRACT. Graph Neural Networks (GNNs) have achieved notable success in various graph-related tasks, yet they remain highly susceptible to adversarial attacks. Minor perturbations to graph data, particularly to its structure, can significantly impair model performance. Most existing attacks manipulate the graph structure (e.g., adding or removing edges), but due to the sparsity of real-world graphs, such changes often violate the imperceptibility constraint. Furthermore, these methods typically perturb either structure or features in isolation, neglecting the inherent coupling between them. To address these issues, we propose More Imperceptible Adversarial Attack (MIAA), a novel method that jointly perturbs both structural and feature information while maintaining imperceptibility. MIAA introduces a Gradient-Adaptive Permutation Attack (GAPA) method, which disrupts node-edge-feature semantics by permuting existing elements in the graph. Guided by the gradient direction, the attack maximizes prediction loss while preserving global statistical properties. Experiments on benchmark datasets including Cora, Citeseer, and Polblogs show that MIAA significantly outperforms existing methods. Under a perturbation budget constrained to half the node degree, it improves attack success rates by 5.31% to 35.06%, demonstrating both effectiveness and subtlety in adversarial manipulation.
14:30	Hao Dai, Yuqing Zhu, Xueying Zhu, Chong Tang, Ni Li and Yang Wang CoTraX: An Efficient Parallel Training Method for On-Policy Deep Reinforcement Learning PRESENTER: Hao Dai ABSTRACT. Deep reinforcement learning (DRL) has significantly advanced artificial agents in complex environments by integrating deep learning with reinforcement learning, demonstrating success in domains such as robotics, reinforcement learning from human feedback (RLHF), and game-playing. However, the alternating training and execution phases, particularly in on-policy methods, introduce substantial synchronization overhead, limiting efficiency. To address this challenge, we analyze the computational interplay between these phases and propose CoTraX, a novel framework that strategically overlaps training and execution to optimize resource utilization and accelerate training. Furthermore, we develop an adaptive control algorithm to mitigate potential adverse effects of overlapping. Extensive experiments demonstrate that CoTraX reduces training time by an average of 9.89% without compromising performance.
14:50	Dejiao Niu, Chengyu Zhang, Tao Cai, Lei Li, Yuxuan Yang and Ye Wang SWD-HTM: A novel Hierarchical Temporal Memory Model Integrating Optimal Transport and Sparse Autoencoder PRESENTER: Ye Wang ABSTRACT. Hierarchical Temporal Memory (HTM) is a biologically inspired, online learning algorithm that emulates neocortical computation for time series modeling. However, its reliance on hand-crafted encoders limits adaptability. Meanwhile, independent encoding and concatenation of multivariate feature embeddings often cause dimension explosion. To overcome these limitations, we propose a novel HTM architecture that integrates deep representation learning via a Sparse Autoencoder (SAE) with optimal transport theory. The SAE replaces the original hand-crafted encoder and spatial pooler components with a data-driven, end-to-end framework, enhancing generalization. The Sliced Wasserstein Distance (SWD) is introduced to align the distribution of the SAE’s hidden-layer activations with the target Sparse Distributed Representation (SDR), ensuring sparsity, similarity, and distributivity simultaneously. This alignment minimizes distributional discrepancy while reducing computational complexity. Extensive experiments demonstrate that the proposed SWD-HTM model significantly improves prediction accuracy, achieving gains of 14.3% and 22.1% on short-term and long-term forecasting tasks, respectively, and outperforming traditional HTM and state-of-the-art baselines.
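The Sliced Wasserstein Distance projects both distributions onto random one-dimensional directions, where optimal transport reduces to matching sorted samples, and averages the resulting costs. A minimal Monte Carlo sketch for two equally sized point clouds follows. It illustrates only the distance itself, not the paper's SAE-to-SDR alignment. The function name and defaults are hypothetical.

```python
import numpy as np

def sliced_wasserstein(A, B, n_projections=200, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between equally sized
    point clouds A and B, each of shape (n, d)."""
    if A.shape != B.shape:
        raise ValueError("this sketch assumes equally sized point clouds")
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, A.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # random unit directions
    # In 1-D, optimal transport between equal-size samples matches sorted values.
    pa = np.sort(A @ theta.T, axis=0)                        # (n, n_projections)
    pb = np.sort(B @ theta.T, axis=0)
    # average the 1-D costs |.|^p over samples and projections, then take the p-th root
    return float(np.mean(np.abs(pa - pb) ** p) ** (1.0 / p))
```

Each projection needs only a matrix product and a sort. This is why sliced variants are typically much cheaper than solving a full multi-dimensional transport problem.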

13:30-15:30 Session 16C: Real-World Applications 2

Chair:

Junhao Huang

Location: RHMZ03

13:30	Guanyuan Zeng, Yingyi Fu, Sikai Lin, Xinyang Chen and Guoting Chen UniTCP: Traffic Prediction via UniBasis Spectral Filtering and Temporal Convolutional Projection PRESENTER: Guoting Chen ABSTRACT. Accurate and efficient traffic flow prediction is crucial for modern urban transportation systems, directly impacting the effectiveness of intelligent traffic management and sustainable mobility solutions. Current spatio-temporal graph neural networks often fail to balance prediction accuracy and computational efficiency when modeling complex traffic patterns, a critical limitation for real-time applications requiring both precision and rapid processing. This paper presents UniTCP, a novel framework advancing urban traffic flow prediction through three key innovations: (1) The introduction of Universal Polynomial Basis (UniBasis) overcomes limitations of traditional spectral graph convolution by adaptively constructing optimal polynomial filters through data-driven learning, extending the concept of homophily ratio from node classification to multivariate time series forecasting and enabling dynamic modeling of complex spatial dependencies across heterogeneous traffic networks. (2) The innovative Temporal Convolutional Projection Module (TCPM) synergizes multi-scale convolutional branches with trend-aware pooling to comprehensively capture both transient traffic fluctuations and persistent periodic patterns, establishing a new paradigm for efficient temporal feature extraction. (3) A unified architecture integrating node-adaptive parameter learning with time-variant graph structure generation, which achieves an optimal performance-efficiency balance through spectral domain parameterization and spatio-temporal embedding fusion.
Experimental validation across four public datasets confirms the framework's superior performance in addressing three core challenges: precise modeling of nonlinear spatio-temporal dependencies, computational resource optimization, and effective generalization across diverse traffic networks. The results demonstrate significant improvements in both prediction accuracy and operational efficiency compared to existing state-of-the-art approaches.
13:50	Tao Yu, Yiwei Lin, Zhenqin Chen and Jinshan Xu CASD: An Accelerated Sampling Framework for ECG Denoising ABSTRACT. Real-time and accurate analysis of electrocardiogram (ECG) signals is a core requirement for clinical applications such as ambulatory monitoring. However, existing ECG denoising methods struggle to balance efficiency and fidelity. Traditional approaches, such as filtering and methods based on deep encoders, often compromise the fidelity of critical waveforms, especially in the presence of strong noise interference. Although methods based on Denoising Diffusion Probabilistic Models (DDPMs) have achieved significant progress in signal reconstruction, their high iterative inference cost and architectural limitations in modeling long-range dependencies severely restrict their clinical application potential. To address these challenges, this paper proposes a Conditional Accelerated Sampling Denoising (CASD) framework. This framework formulates the denoising task as an efficient, deterministic sampling process, capable of achieving high-fidelity signal reconstruction in as few as 10 sampling steps. At the core of CASD is a Global Feature Enhancement (GFE) module, which explicitly captures the long-range dependencies of the signal via a self-attention mechanism, thereby effectively suppressing baseline wander noise.
14:10	Junhao Huang, Romain Chaput, Bing Xue, Ivy Liu, Ross Vennell and Mengjie Zhang Automated Design of Neural Networks for River Flow Prediction using Weather Data PRESENTER: Junhao Huang ABSTRACT. River flow plays a vital role in the hydrologic cycle. In recent years, numerous machine learning approaches, particularly deep neural networks (DNNs), have demonstrated success in forecasting river flow. However, many of these models depend heavily on extensive historical flow data and very specific hydrological features, which are often labor-intensive to gather. Moreover, most existing predictive models, especially DNN-based ones, are handcrafted, requiring substantial domain expertise and extensive hyperparameter tuning, which can make the modeling process inefficient. In this study, we propose a novel approach to achieve flexible and efficient river flow prediction that relies solely on openly accessible weather forecast data. A one-dimensional convolutional neural network (1D-CNN) is designed to effectively capture the temporal relationships between weather variables and river flow. Additionally, an efficient evolutionary neural architecture search (NAS) algorithm is developed to automatically discover the best 1D-CNN architectures, thereby improving predictive performance while reducing the need for manual architecture tuning. To evaluate our approach (named AutoNN-Flow), experiments are conducted using the Kaeo River in New Zealand as a case study. Without access to river-specific attributes, AutoNN-Flow achieves highly accurate 7-day and 14-day river flow predictions, greatly outperforming classic machine learning models.
14:30	Haru Momozu, Yuki Uehara, Naoki Nishimura, Koya Ohashi, Deddy Jobson, Yilin Li, Phuong Dinh, Noriyoshi Sukegawa and Yuichi Takano Subset Selection for Stratified Sampling in Online Controlled Experiments ABSTRACT. Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional variance reduction technique for improving the sensitivity (or statistical power) of controlled experiments; it first divides the population into strata (homogeneous subgroups) based on stratification variables and then draws samples from each stratum to avoid sampling bias. To enhance the estimation accuracy of stratified sampling, we focus on the problem of selecting a subset of stratification variables that are effective in variance reduction. We design an efficient algorithm that selects stratification variables one by one by simulating a series of stratified sampling processes. We also estimate the computational complexity of our subset selection algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our method can outperform other variance reduction techniques, especially when multiple variables have a certain correlation with the outcome variable. Our subset selection method for stratified sampling can improve the sensitivity of online controlled experiments, thus enabling more reliable marketing decisions.
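The greedy selection described above can be sketched as forward selection over categorical stratification variables. To be explicit about the assumptions in this sketch:

- It does not run simulated stratified sampling as the paper does. Instead, each candidate is scored by the analytic within-stratum variance Σ_h W_h σ_h² of the outcome, which is (up to a 1/n factor) the variance of a stratified mean under proportional allocation.
- Strata are the cross-product of the selected variables' categories.
- All names are hypothetical.

```python
import numpy as np

def within_strata_variance(y, strata):
    """sum_h W_h * Var(y | stratum h), with W_h the stratum's share of units.

    Up to a 1/n factor this is the variance of a stratified mean under
    proportional allocation (finite-population corrections ignored)."""
    total = 0.0
    for h in np.unique(strata):
        y_h = y[strata == h]
        total += len(y_h) / len(y) * y_h.var()
    return total

def greedy_stratification(X, y, max_vars):
    """Forward selection of categorical (non-negative integer) columns of X."""
    selected = []
    strata = np.zeros(len(y), dtype=np.int64)    # start with a single stratum
    for _ in range(max_vars):
        best = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # cross current strata with column j, then relabel to 0..S-1
            combo = strata * (int(X[:, j].max()) + 1) + X[:, j]
            _, candidate = np.unique(combo, return_inverse=True)
            score = within_strata_variance(y, candidate)
            if best is None or score < best[0]:
                best = (score, j, candidate)
        if best is None:                         # every column already selected
            break
        selected.append(best[1])
        strata = best[2]
    return selected
```

Adding a variable can only split strata further, so this score never increases. In practice one would stop early, or cap `max_vars`, before strata become too small to sample from reliably.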
14:50	Prabhashrini Manage, Yutong Wu, Jinglang Zhang and Yuefeng Li BERT-TCPE: A Term-Class Probability-Enhanced BERT Framework for Uncertainty-Aware Medical Text Classification ABSTRACT. Modern deep learning models such as BERT have achieved remarkable success in text classification tasks, yet they often struggle with uncertainty, particularly when dealing with ambiguous inputs or overlapping class boundaries in high-stakes domains like healthcare. This paper introduces TCPE-BERT, a lightweight framework that augments BERT with Term-Class Probability Embeddings (TCPE) derived from interpretable statistical features. Unlike ensemble-based uncertainty-handling methods that introduce significant computational overhead, TCPE-BERT fuses probabilistic term semantics with contextual embeddings in a scalable and transparent manner. By incorporating uncertainty-aware statistical cues, the framework improves the model's ability to resolve borderline cases and enhance decision confidence. Experiments on two benchmark medical datasets (PubMed and Ohsumed) demonstrate consistent accuracy improvements over BERT alone, particularly on samples with high classification uncertainty, while maintaining low inference time and minimal parameter overhead. Embedding space visualizations further reveal that TCPE-BERT enhances class separation, validating the synergy between statistical and contextual signals. Our findings suggest that probabilistic-semantic augmentation offers a practical and interpretable solution for improving robustness, uncertainty resolution, and decision boundaries in medical NLP without sacrificing efficiency.
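The 14:30 talk by Momozu et al. selects stratification variables one at a time. As a rough illustration only, the sketch below scores each candidate by the pooled within-stratum variance, the sum over strata h of W_h σ_h², which under proportional allocation is proportional to the variance of the stratified mean estimator. This analytic criterion is an assumption standing in for the simulation-based procedure described in the abstract, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def within_stratum_variance(rows, outcome, variables):
    """Pooled within-stratum variance, sum over strata h of W_h * sigma_h^2.

    Under proportional allocation with sample size n, the variance of the
    stratified mean estimator is approximately this value divided by n
    (finite-population correction ignored). With no variables, every row
    falls into one stratum and the result is the plain population variance.
    """
    strata = defaultdict(list)
    for row in rows:
        key = tuple(row[v] for v in variables)  # a stratum is one value combination
        strata[key].append(row[outcome])
    total = len(rows)
    return sum(len(ys) / total * pvariance(ys) for ys in strata.values())


def greedy_select(rows, outcome, candidates, k):
    """Hypothetical forward selection: add one stratification variable at a
    time, choosing the candidate that most lowers the pooled variance."""
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda v: within_stratum_variance(rows, outcome, selected + [v]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: spend depends strongly on device, barely on region.
toy_rows = [
    {"device": "mobile", "region": "north", "spend": 10.0},
    {"device": "mobile", "region": "south", "spend": 12.0},
    {"device": "desktop", "region": "north", "spend": 30.0},
    {"device": "desktop", "region": "south", "spend": 32.0},
]

if __name__ == "__main__":
    print(greedy_select(toy_rows, "spend", ["region", "device"], 1))  # ['device']
```

On the toy data the pooled variance is 101 without stratification, 100 when stratifying by region, and 1 when stratifying by device, so the greedy step picks device first. A simulation-based variant would replace the analytic score with the empirical variance of repeated stratified estimates.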

13:30-15:30 Session 16D: Online Session

Zoom link: https://vuw.zoom.us/j/97046252862

Chair:

Jiyuan Pei

Location: RH103

13:30	Wei Liu, Yuhe An, Senrong Xu, Xingshen Wei and Weiyong Yang Enhancing Graph Anomaly Detection with Contrastive Pre-training and Pseudo-label Learning ABSTRACT. Graph anomaly detection has been instrumental in many domains, playing a critical role in identifying unusual patterns in graph-structured data. Recent advancements that integrate graph neural networks with contrastive learning have demonstrated significant potential and achieved promising results. Despite this progress, designing a method that is both effective and efficient remains a key challenge. In this paper, we propose a new graph anomaly detection method, TSGAD. Specifically, TSGAD adopts a two-stage design of pre-training and pseudo-label learning. The first stage uses contrastive pre-training to make initial predictions, and the second stage uses pseudo-label learning based on the predictions of the first stage. We also carefully design the two-stage framework so that it scales linearly with the graph size. Extensive experimental results on four real-world datasets demonstrate both the effectiveness and efficiency of the proposed method. Overall, TSGAD is fully unsupervised and does not require any manual label annotations, yet it achieves results comparable to those of its supervised counterpart.
13:50	Lei Shu, Tao Zhu, Jinlong Jiang, Qi Yu, Shiyu Li and Yu Peng LN_Net: Lightweight Non-interventionist Network for Weakly-Supervised Video Anomaly Detection ABSTRACT. Traditional spatiotemporal modeling for video anomaly detection often suffers from high computational costs and relies on heuristic assumptions (e.g., feature magnitude) that can be flawed and hinder robust representation learning. To address this, we propose LN_Net, a Lightweight Network adopting a non-interventionist strategy to overcome the pitfalls of imposing such potentially misleading priors. This approach grants the model greater flexibility to learn discriminative patterns directly from observational data. LN_Net implements this strategy through two core, efficient innovations: (1) an Efficient Temporal Modeling Module (ETMM) capturing multi-faceted temporal dynamics without convolutions, and (2) an Adaptive Focusing Module (AFM) highlighting salient temporal evidence. Our non-interventionist method achieves competitive detection accuracy (97.77% SH-AUC, 86.21% UCF-AUC). Simultaneously, it demonstrates state-of-the-art efficiency, requiring only about 1/35th the parameters, 1/135th the model size, and 1/51st the inference time of recent complex methods like VadCLIP. This highlights its significant practical value for deployment. Our code is available at: https://anonymous.4open.science/r/LN_Net-0324
14:10	Haoyi Gu PCGS: Pose Conditioned Generative Steganography ABSTRACT. Secure transmission of private information over public channels without arousing suspicion remains a fundamental challenge in steganography. Traditional methods modify pixel-level or frequency-domain features, making them vulnerable to detection and degradation. Recent synthesis-based approaches leverage generative models to embed data but often suffer from limited capacity or visual artifacts. In this work, we propose a pose-conditioned generative steganographic framework that decouples message representation from image content. Binary messages are first mapped to human poses using a geometry-aware codebook derived from real-world data. These poses then serve as structural conditions to guide diffusion-based image generation, producing semantically coherent and visually natural stego images. By encoding multiple human poses in a single image, our framework increases message capacity while preserving visual coherence. To enhance robustness, we introduce a randomized linear expansion scheme to stabilize pose-code mapping under occlusion and detection noise. We evaluate the method under various perturbations and assess detectability using state-of-the-art steganalysis models. Experimental results show strong imperceptibility, decoding accuracy, and semantic flexibility, highlighting the effectiveness of our framework in enabling secure and coverless generative steganography. The code for PCGS will be released prior to publication.
14:30	Peiyu Fan, Xuanpu Zhao, Can Bu, Ping Lin, Guodong Ma and Yongming Huang MASM-Net: A Multi-Scale Adaptive Mamba Network for Cuffless Blood Pressure Estimation Using Photoplethysmographic Signals PRESENTER: Peiyu Fan ABSTRACT. Continuous blood pressure monitoring is crucial for the prevention and management of cardiovascular diseases. However, conventional cuff-based measurement methods are unsuitable for prolonged ambulatory use due to discomfort and restricted mobility. To overcome these challenges, we present MASM-Net, a deep learning architecture for accurate cuffless blood pressure estimation from single-channel photoplethysmographic (PPG) signals. The proposed model integrates three key components: stacked Adaptive Multi-Scale Convolution (AMSC) modules for comprehensive temporal feature extraction, a Dimension Expansion (DE) module for enhanced feature representation, and a Mamba module for efficient long-range temporal dependency modeling with linear computational complexity. Extensive experiments on two public datasets demonstrate that MASM-Net achieves state-of-the-art performance, with mean absolute errors and standard deviations of 2.25 ± 3.90 mmHg (systolic) and 1.26 ± 2.14 mmHg (diastolic) on the UCI dataset, and 2.69 ± 3.80 mmHg (systolic) and 1.63 ± 2.26 mmHg (diastolic) on the BCG dataset. These results surpass those of existing methods, establishing a robust framework for continuous, noninvasive blood pressure monitoring.
14:50	Yun Wu, Ziyi Wang, Wei Zheng, Yan Du, Jieming Yang, Kai Yang, Ning An and Nan Xu Wind Power Curve Data Cleaning Model Based on LOF-ITSM ABSTRACT. Traditional image threshold segmentation methods cannot effectively clean outlier data from wind power curves and struggle to adapt to the fuzzy boundaries in these curves. To address these problems, this paper proposes a data cleaning method that combines the local outlier factor with image threshold segmentation (LOF-ITSM). First, the LOF algorithm pre-cleans the wind power data by identifying and removing outlying anomalous data. Then, the image threshold segmentation method is improved by introducing a locally adaptive threshold optimization mechanism, which adapts better to the fuzzy boundaries in the curves and also compensates for the LOF algorithm's low sensitivity to stacked anomalous data. Experimental results indicate that this method cleans abnormal data from wind power curves more effectively, laying a data foundation for the efficient utilization of wind energy and the green transformation of the energy structure.
15:10	Fengjie Chang, Xinning Zhu, Zheng Hu and Yang Qin HGTUL: A Hypergraph-based Model For Trajectory User Linking ABSTRACT. Trajectory User Linking (TUL) focuses on linking anonymous trajectories with the users who generated them. It is essential for understanding and modeling human mobility patterns. Despite significant advancements in this field, existing studies largely neglect high-order inter-trajectory relationships: complex associations among multiple trajectories, often revealed through co-occurrence across multiple locations. Furthermore, they fail to consider the variable influence of Points of Interest (POIs) on different trajectories, as well as the user class imbalance problem caused by disparities in user activity levels and check-in frequencies. To address these limitations, we propose a novel HyperGraph-based Trajectory User Linking model (HGTUL). Our model learns trajectory representations from both relational and spatiotemporal perspectives: (1) it models high-order trajectory associations via a hypergraph and incorporates an attention mechanism to learn the variable impact of POIs; (2) it encodes spatiotemporal characteristics by feeding the temporal and spatial features of each trajectory into a sequential encoder. Furthermore, we introduce a data balancing method to mitigate user class imbalance, and experimentally validate its significance in TUL. Extensive experiments on three real-world datasets show that HGTUL outperforms state-of-the-art baselines, achieving improvements of 2.57%∼20.09% in ACC@1 and 5.68%∼26.00% in Macro-F1 scores. The code is available at https://github.com/changfengjie3003/HGTUL.
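The 14:50 talk by Wu et al. pre-cleans wind power data with the local outlier factor (LOF) before the threshold segmentation stage. For readers unfamiliar with LOF, the sketch below is a naive O(n²) implementation of the standard definition (k-distance, reachability distance, local reachability density). The neighbourhood size k = 3, the cut-off of 1.5, and the toy (wind speed, power) points are hypothetical choices, not values from the paper, and the improved image threshold segmentation step is not shown. In practice a library routine such as scikit-learn's `LocalOutlierFactor` would normally be used instead.

```python
import math


def lof_scores(points, k=3):
    """Naive O(n^2) Local Outlier Factor for a list of 2-D points.

    A score close to 1 means a point is about as dense as its neighbours;
    scores well above 1 mark candidate outliers.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more points than k")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicate points
    # LOF: mean neighbour density relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def pre_clean(points, k=3, threshold=1.5):
    """Keep only points whose LOF is at or below a (hypothetical) threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]


# Hypothetical toy data: (wind speed, power) pairs along a flat segment of
# the curve, plus one point far above it.
toy_curve = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (2.5, 10)]

if __name__ == "__main__":
    print(pre_clean(toy_curve))  # the point (2.5, 10) is dropped
```

On the toy data the isolated point scores about 4.5 while the others score roughly 0.86 to 1.06. LOF flags isolated points well but, as the abstract notes, is less sensitive to dense stacks of anomalous points, which is what the segmentation stage is meant to address.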

13:30-15:30 Session 16E: Online Session

Zoom link: https://vuw.zoom.us/j/93664289896

Chair:

Yuye Zhang

Location: RH104

13:30	Shengjie Lu, Zhibin Wan, Shufan Zhou, Jiejie Liu, Quan Zhang and Mingjie Sun Training-free Clothing Region of Interest Self-correction for Virtual Try-On PRESENTER: Shengjie Lu ABSTRACT. VTON (Virtual Try-ON) aims to synthesize the target clothing on a given person, preserving the details of the target clothing while keeping the rest of the person unchanged. Existing methods suffer from discrepancies between the generated clothing and the target clothing in terms of patterns, textures and boundaries. Therefore, we propose using an energy function to impose constraints on the attention map extracted during the generation process. Thus, at each generation step, the attention is more focused on the clothing region of interest, making the generation results more consistent with the target clothing details. Furthermore, to address the limitation that existing evaluation metrics concentrate solely on image realism and overlook alignment with target elements, we design a new metric, Virtual Try-on Inception Distance (VTID), to bridge this gap and enable a more comprehensive assessment. On the VITON-HD and DressCode datasets, our approach outperforms the previous state-of-the-art (SOTA) methods by 1.4%, 2.3%, 12.3%, and 5.8% on the traditional metrics LPIPD, FID, and KID and the new VTID metric, respectively. Additionally, by applying the generated data to downstream Clothing-Change Re-identification (CC-Reid) methods, we achieve Rank-1 improvements of 2.5%, 1.1%, and 1.6% on the LTCC, PRCC, and VC-Clothes datasets, respectively. The code of our method will be made public after acceptance.
13:50	Dehai Zhang, Jiahao Zhang, Yujing Huang and Kuang Hu A Vision-Language Fusion Framework for Ethnic Minority Costume Image Captioning PRESENTER: Yujing Huang ABSTRACT. Image description generation for ethnic minority costumes is an important research direction at the intersection of computer vision and natural language processing, with application potential in cultural protection, education, tourism and other fields. In the domain of ethnic minority images, existing image description methods face challenges such as oversimplified descriptions, insufficient semantic accuracy and the "hallucination phenomenon", compounded by a lack of well-annotated datasets. To address these issues, this paper proposes a framework for generating image descriptions of ethnic minority costumes that integrates memory mechanisms, cognitive computing and multimodal large language models. First, an abstract semantic memory bank is constructed to store ethnic semantic information and be dynamically invoked. Second, an image semantic understanding method based on cognitive computing is designed, which enhances the expressiveness and interpretability of descriptions through entity recognition, attribute analysis and relationship reasoning. Finally, a multimodal large model is combined to generate more detailed and accurate descriptions through modality alignment. Experiments show that this framework significantly outperforms traditional methods in the accuracy and semantic richness of image descriptions of ethnic minority costumes, achieving the best results in both image recognition accuracy and comprehensive indicators of description generation, providing an innovative technical solution for the digital protection and dissemination of ethnic minority cultures.
14:10	Man Zhang, Yun Xiang and Zhi Wang Improved GJO optimized CNN-BiLSTM-Attention touchdown speed prediction model ABSTRACT. Accurately predicting the touchdown point speed of civil aviation aircraft is crucial for identifying unstable approaches and estimating runway occupancy time post-landing. To achieve accurate prediction of the touchdown point speed, a CNN-BiLSTM-ATTENTION speed prediction model optimized by an improved golden jackal optimization (IGJO) algorithm is proposed. Multiple factors affecting the aircraft's touchdown speed are comprehensively considered. Input feature data for the model are constructed by combining ADS-B data, flight plan information, meteorological data, and runway information. The CNN-BiLSTM-ATTENTION network is used to extract the deep spatial and temporal features of the data. Meanwhile, the GJO algorithm is improved through a Gaussian random walk, spiral search, a sine-cosine search strategy, and a lens-imaging opposition-based learning strategy. The resulting IGJO algorithm has stronger global search ability and higher convergence accuracy. The CNN-BiLSTM-ATTENTION model optimized by IGJO has higher prediction accuracy. Experimental results show that the MAE, MAPE, and RMSE of the prediction results of the IGJO-CNN-BiLSTM-ATTENTION model are 3.2017, 3.06%, and 3.8817, respectively. It has higher prediction accuracy compared with the unoptimized model and the models optimized by GJO, PSO, and DA. This prediction method provides strong support for air traffic control departments to obtain accurate touchdown speed predictions.
14:30	Chao Zhang and Haojie Zhou BAD: Bidirectional Attention-Guided Distillation for Object Detection ABSTRACT. Feature-based knowledge distillation has been widely adopted to boost lightweight detectors. However, existing methods often rely on ground-truth bounding boxes to localize salient regions for guiding the feature learning of student models. These approaches ignore the significant structural and semantic gap between teacher and student networks, making it challenging to transfer effective knowledge. We propose BAD, a novel distillation framework that introduces bidirectional attention guidance in both spatial and channel dimensions. The core of BAD is to extract spatial response patterns from both teacher and student networks, and fuse them into a joint attention mask. This mask identifies semantically aligned regions without relying on ground-truth boxes, enabling adaptive guidance for the student to focus on task-relevant areas while preserving its representational flexibility. Meanwhile, the masking mechanism helps mitigate training imbalance caused by overly strict alignment. The spatial attention module helps localize contextually important regions, while the channel attention enhances global semantic alignment. We conduct extensive experiments on several popular object detectors, including YOLOv8, RetinaNet, FCOS and Faster R-CNN. The results demonstrate that BAD achieves stable and effective knowledge transfer under heterogeneous architectures and limited model capacity.
14:50	Lei Zuo, Jing Chen, Zihao Yu and Jun Sun Towards Effective Event Argument Extraction via Enhanced Contextual Understanding ABSTRACT. Event Argument Extraction is a crucial task that aims to extract the arguments of specified events from a text and predict their roles. Recent mainstream methods for event argument extraction still fall short in handling long-distance dependencies of arguments, resulting in limited contextual understanding and suboptimal performance. To address these limitations, we propose an effective event argument extraction model named DSEAE. The proposed DSEAE model mainly consists of a dependency-guided module and a structure-aware module, each of which employs a distinct, newly improved self-attention mechanism. The dependency-guided module guides the model in associating different prompts with their corresponding event contexts, whereas the structure-aware module strengthens the interaction between event information for better contextual understanding. Experimental results show that our method outperforms the baselines, demonstrating its effectiveness.
15:10	Keisuke Sugawara, Kento Uchida and Shinichi Shirakawa Neural Architecture Search of Sample Reweighting Networks for Complex Distribution Shift ABSTRACT. Sample reweighting is a major approach to addressing distribution shifts, such as label noise and class imbalance. Meta-Weight-Net (MW-Net) is a promising sample reweighting network that computes weights based on classification loss. Although MW-Net improves prediction performance under a single type of distribution shift using a simple neural network, its performance degrades when facing both label noise and class imbalance, where it is hard to determine appropriate weights solely from classification loss and using a simple network. In this study, we introduce neural architecture search to MW-Net to mitigate such performance degradation. Using the tree-structured Parzen estimator, we explore the optimal number of hidden layers and nodes and select the most suitable intermediate layer in the classification model to serve as the input for MW-Net. Experimental results on the CIFAR-10 and CIFAR-100 datasets that were modified to include both label noise and class imbalance demonstrate the effectiveness of neural architecture search for MW-Net.

16:00-17:00 Session 17: Industry Panel

Chairs:

Mengjie Zhang and Andrew Lensen

Location: RHLT1

18:30-22:00 Conference Dinner at Tākina Wellington Convention and Exhibition Centre